# Manual Changes

## template mapping files are in the git repository
## original data in _CyVerse Discovery Environment_ 
### data file is: "1987-2019 Cougar Weight-Length Public Request.xlsx"

### _verbatimLocality_
- concatenation of "Management Unit" and "County"

### _yearCollected_
- in _Date_
- create new column _yearCollected_
- separate out year
- include century as well (e.g., 1999)
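
A minimal sketch of the year extraction described above. The sample dates are hypothetical, because the actual format of the _Date_ column is not shown here; the sketch only assumes that each date contains a four-digit year.

```python
import pandas as pd

# Hypothetical sample; the real "Date" values may be formatted differently.
dates = pd.Series(["11/03/1999", "1/15/2004", None], name="Date")

# Take the first run of four digits, which keeps the century (e.g., 1999).
# Missing dates stay missing instead of raising an error.
year_collected = dates.str.extract(r"(\d{4})", expand=False)
print(year_collected.tolist())
```

Matching four digits with a regular expression is a little more tolerant than slicing the last four characters, since it does not depend on the year being at the end of the string.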

## To Code
### _sex_
- change "F" to "female"
- change "M" to "male"
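
One possible way to apply this recoding is a dictionary lookup with `Series.map`. The sample values are hypothetical, and the "not collected" fallback for anything other than "F" or "M" is an assumption borrowed from the code cell further down.

```python
import pandas as pd

# Hypothetical sample; "U" and None stand in for any value that is not "F" or "M".
sex = pd.Series(["F", "M", "U", None], name="Sex")

sex_terms = {"F": "female", "M": "male"}

# map() yields NaN for codes missing from the dictionary;
# fillna() then labels those records explicitly.
sex_clean = sex.map(sex_terms).fillna("not collected")
print(sex_clean.tolist())
```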

### _ageUnit_
- all in "year" (spelled out and singular)

### _measurementUnit_
- Weight ({body mass}) is in "lb" (keep abbreviated)
- Length ({body length}) is in "in" (keep abbreviated)

### _materialSampleType_
- in "Status" column
- "A" = Intact 
    - change "A" to "whole"
- "B" = Field Dressed
    - change "B" to "part - gutted"
- "C" = Skinned
    - change "C" to "part - skinned"
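
The status recoding listed above could be sketched as a single lookup table. The sample values are hypothetical, the target terms are the ones from the list above, and handling a lower-case "c" is an assumption based on the code cell further down.

```python
import pandas as pd

# Hypothetical sample of "Status" codes, including a lower-case entry.
status = pd.Series(["A", "B", "C", "c"], name="Status")

status_terms = {
    "A": "whole",           # Intact
    "B": "part - gutted",   # Field Dressed
    "C": "part - skinned",  # Skinned
}

# Upper-case first so that "c" is treated the same as "C".
material_sample_type = status.str.upper().map(status_terms)
print(material_sample_type.tolist())
```

Keeping the codes in one dictionary makes it harder to lose a case by accident than building one boolean mask per code.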

In [182]:
import pandas as pd
import numpy as np 
import uuid
import re

In [183]:
# Import Oregon Department of Fish and Wildlife (ODFW) cougar data locally
cougar_data = pd.read_csv("../Original Data/cougar_data.csv")

# Drop the four rows above the real header row
cougar_data = cougar_data.iloc[4:]

# Promote the first remaining row to the header, then drop it from the data
new_header = cougar_data.iloc[0]
cougar_data = cougar_data[1:].copy()
cougar_data.columns = new_header

In [184]:
# Import locality data
cougar_locality = pd.read_csv("../Original Data/oregonManagementAreas.csv")

In [185]:
# Create verbatimLocality column by concatenating Management Unit and County
cougar_data=cougar_data.assign(verbatimLocality = cougar_data['Management Unit'] 
                                                + ", "
                                                + cougar_data['County'])

In [186]:
# Add lat and long columns
cougar_data=cougar_data.assign(decimalLatitude = "")
cougar_data=cougar_data.assign(decimalLongitude = "")

unit_name=cougar_locality["Unit Name"]
management_name=cougar_data["Management Unit"]

#Add coordinateUncertaintyInMeters column
cougar_data=cougar_data.assign(coordinateUncertaintyInMeters=50000)

# Match unit_name to management_name and transfer coordinate information.
# Write through .loc with the row label; chained indexing such as
# cougar_data[col][i] = value triggers SettingWithCopyWarning and may not
# update the DataFrame.
for i in management_name.index:
    for j in unit_name.index:
        if management_name[i] == unit_name[j]:
            cougar_data.loc[i, "decimalLatitude"] = cougar_locality["latitude"][j]
            cougar_data.loc[i, "decimalLongitude"] = cougar_locality["longitude"][j]
        elif management_name[i] == "McKenzie":
            # McKenzie coordinates are hardcoded
            cougar_data.loc[i, "decimalLatitude"] = "44.1083926996967"
            cougar_data.loc[i, "decimalLongitude"] = "-122.417312310006"
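
The nested loop above compares every record against every management unit. As a hedged alternative, the same lookup could be expressed as a left merge. The two small frames below are hypothetical stand-ins for `cougar_data` and `cougar_locality`, and the "UnitA" name and its coordinates are made up for illustration; only the McKenzie coordinates come from the cell above.

```python
import pandas as pd

# Hypothetical stand-ins for cougar_data and cougar_locality.
records = pd.DataFrame({"Management Unit": ["UnitA", "McKenzie", "UnitA"]})
units = pd.DataFrame({"Unit Name": ["UnitA"],
                      "latitude": [45.0],      # made-up value
                      "longitude": [-120.0]})  # made-up value

# A left merge keeps every record in its original order;
# units with no match get NaN coordinates.
merged = records.merge(units, how="left",
                       left_on="Management Unit", right_on="Unit Name")
merged = merged.rename(columns={"latitude": "decimalLatitude",
                                "longitude": "decimalLongitude"})
merged = merged.drop(columns="Unit Name")

# Apply the hardcoded McKenzie coordinates from the cell above.
is_mckenzie = merged["Management Unit"] == "McKenzie"
merged.loc[is_mckenzie, "decimalLatitude"] = 44.1083926996967
merged.loc[is_mckenzie, "decimalLongitude"] = -122.417312310006
print(merged)
```

Two caveats apply: the merge returns a frame with a fresh 0-based index, and it would duplicate records if a unit name appeared more than once in the locality table.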


In [187]:
# Create yearCollected column in cougar data from the last four characters of Date
cougar_data['yearCollected'] = cougar_data.Date.str[-4:]

In [188]:
# Correct sex column
female = cougar_data['Sex'] == "F"
male = cougar_data['Sex'] == "M"
# Use .loc with boolean masks to avoid SettingWithCopyWarning
cougar_data.loc[~female & ~male, 'Sex'] = "not collected"
cougar_data.loc[female, 'Sex'] = "female"
cougar_data.loc[male, 'Sex'] = "male"


In [189]:
# Create ageUnit Column and assign it to "year"
cougar_data = cougar_data.assign(ageUnit = "year")

In [190]:
# Fix status column to use GEOME terms
whole = cougar_data['Status'] == "A"
gutted = cougar_data['Status'] == "B"
# Match both "C" and "c"; previously the second assignment overwrote the first,
# so upper-case "C" was never recoded
skinned = cougar_data['Status'].isin(["C", "c"])

cougar_data.loc[whole, 'Status'] = "whole organism"
cougar_data.loc[gutted, 'Status'] = "part organism"
cougar_data.loc[skinned, 'Status'] = "part organism"


In [191]:
# Rearrange columns so that template columns are first, followed by measurement values

# Specify desired columns, in the desired order
cols = ['verbatimLocality',
        'yearCollected',
        'decimalLatitude', 
        'decimalLongitude',
        'coordinateUncertaintyInMeters',
        'Date',
        'Sex',
        'ageUnit',
        'Status',
        'Age',
        'Weight',
        'Length']

# Subset dataframe
cougar_data = cougar_data[cols]

In [192]:
# Matching template and column terms

# Renaming columns 
cougar_data = cougar_data.rename(columns = {'Sex':'sex',
                                            'Date':'verbatimEventDate',
                                            'Status':'materialSampleType',
                                            'Age': 'verbatimAgeValue'})

In [193]:
# Matching trait and ontology terms

# Renaming columns
cougar_data = cougar_data.rename(columns={'Weight': 'body mass',
                                          'Length': 'body length'})

In [194]:
# Create measurementUnit column
cougar_data = cougar_data.assign(measurementUnit="")

In [195]:
# Fill in blanks for required columns 
cougar_data=cougar_data.assign(country="USA")
cougar_data=cougar_data.assign(stateProvince="Oregon")
cougar_data=cougar_data.assign(basisOfRecord="PreservedSpecimen")
cougar_data=cougar_data.assign(scientificName="Puma concolor")
cougar_data=cougar_data.assign(locality="Unknown")
cougar_data=cougar_data.assign(samplingProtocol="Unknown")
cougar_data=cougar_data.assign(measurementMethod="Unknown")

In [196]:
# Adding an additional column for ageValue, copied from verbatimAgeValue
cougar_data = cougar_data.assign(ageValue=cougar_data["verbatimAgeValue"])

In [197]:
# Create materialSampleID, a UUID for each record, with hyphens replaced by underscores
# Create eventID and populate it with materialSampleID

cougar_data['materialSampleID'] = [str(uuid.uuid4()).replace("-", "_")
                                   for _ in range(len(cougar_data.index))]

cougar_data = cougar_data.assign(eventID=cougar_data["materialSampleID"])


In [198]:
#  Create long version so that each trait has its own row

# Creating long version, first specifying keep variables, then naming variable and value
longVers=pd.melt(cougar_data, 
                id_vars=['verbatimLocality',
                         'yearCollected',
                         'sex',
                         'ageUnit',
                         'materialSampleType',
                         'verbatimAgeValue',
                         'ageValue',
                         'verbatimEventDate',
                         'country',
                         'stateProvince',
                         'eventID',
                         'locality',
                         'decimalLatitude', 
                         'decimalLongitude',
                         'coordinateUncertaintyInMeters',
                         'measurementMethod',
                         'samplingProtocol',
                         'basisOfRecord',
                         'scientificName',
                         'materialSampleID',
                         'measurementUnit'], 
                var_name = 'measurementType', 
                value_name = 'measurementValue')


In [199]:
# Populating measurementUnit column with appropriate measurement units in long version

long_body_mass_filter = longVers['measurementType'] == "body mass"
long_no_body_filter = longVers['measurementType'] != "body mass"

# Use .loc with boolean masks to avoid SettingWithCopyWarning
longVers.loc[long_body_mass_filter, 'measurementUnit'] = "lb"
longVers.loc[long_no_body_filter, 'measurementUnit'] = "in"
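
The reshaping and unit assignment in the two cells above can be illustrated on a tiny, made-up frame. The IDs and measurement values below are hypothetical; the column names and units follow the notebook.

```python
import pandas as pd

# Hypothetical wide table: one row per specimen, one column per trait.
wide = pd.DataFrame({
    "materialSampleID": ["id_1", "id_2"],
    "body mass": [100.0, None],    # made-up values
    "body length": [80.0, 75.0],   # made-up values
})

# Wide to long: each trait value moves to its own row.
long_df = wide.melt(id_vars=["materialSampleID"],
                    var_name="measurementType",
                    value_name="measurementValue")

# Look up the unit from the trait name instead of using two boolean filters.
unit_by_trait = {"body mass": "lb", "body length": "in"}
long_df["measurementUnit"] = long_df["measurementType"].map(unit_by_trait)

# Rows without a measurement carry no information in the long format.
long_df = long_df.dropna(subset=["measurementValue"])
print(long_df)
```

A dictionary keyed by trait name would leave the unit as NaN for an unexpected trait, whereas the "everything that is not body mass" filter would silently label it "in".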


In [200]:
# Create diagnosticID which is a unique number for each measurement
longVers=longVers.assign(diagnosticID = '')
longVers['diagnosticID'] = np.arange(len(longVers))

In [201]:
# Drop every row whose measurementValue is missing (NaN)
longVers = longVers.dropna(subset=['measurementValue'])

# Drop first row of data, it contains no measurementValue but is still retained
#longVers = longVers.drop(longVers.index[0])

In [202]:
# Writing long data csv file
longVers.to_csv('../Mapped Data/FuTRES_Puma_concolor_ODFW_OR_USA_Modern.csv', index=False)