# Manual Changes

## Template mapping files are in the Git repository

## Original data in the _CyVerse Discovery Environment_
### Data file: "ODOVIRGCLEAN.csv"

### _lifeStage_ and _ageValue_
- in the _lifestage_ column
- create new columns _ageValue_ and _ageUnit_
- separate out lifeStage (e.g., juvenile, adult) from ageValue and ageUnit
- make sure ageUnit is spelled out and singular (e.g., "year")
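These steps are done by hand here. If they were scripted, a pandas sketch might look like the following. It assumes, hypothetically, that _lifestage_ values read like "adult 2 years" (an optional stage word, then a number and a unit); the regex, the UNIT_NAMES table and split_lifestage are illustrative names, not part of the original workflow.

```python
import pandas as pd

# Hypothetical mapping from abbreviated/plural units to the spelled-out singular form
UNIT_NAMES = {"yr": "year", "yrs": "year", "years": "year",
              "mo": "month", "mos": "month", "months": "month",
              "d": "day", "days": "day"}

# Assumed format: optional stage word, then an optional "<number> <unit>"
PATTERN = (r"^\s*(?P<lifeStage>[A-Za-z]+)?\s*"
           r"(?:(?P<ageValue>\d+(?:\.\d+)?)\s*(?P<ageUnit>[A-Za-z]+))?\s*$")


def split_lifestage(lifestage: pd.Series) -> pd.DataFrame:
    """Split raw lifestage strings into lifeStage, ageValue and ageUnit columns."""
    parts = lifestage.str.extract(PATTERN)
    parts["ageValue"] = pd.to_numeric(parts["ageValue"])
    # Spell out and singularize units; unrecognized units are kept unchanged
    parts["ageUnit"] = parts["ageUnit"].str.lower().replace(UNIT_NAMES)
    return parts


raw = pd.Series(["adult 2 years", "juvenile 6 months", "adult", "1.5 yr"])
print(split_lifestage(raw))
```

Rows that do not fit the assumed pattern come back as all-missing, which makes them easy to find and fix by hand.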

### _yearCollected_
- in the _eventDate_ column
- create new column _yearCollected_
- extract the year
- use the full four-digit year, including the century (e.g., 1999, not 99)
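A comparable sketch for the year, assuming each _eventDate_ string contains a four-digit year between 1600 and 2099 somewhere in it (year_collected is a hypothetical helper). Two-digit years are left missing, because the century cannot be recovered from them safely.

```python
import pandas as pd


def year_collected(event_date: pd.Series) -> pd.Series:
    """Pull the four-digit year (century included) out of free-text eventDate values."""
    # Match a standalone 16xx-20xx year, e.g. in "1999-05-12" or "12/05/1999"
    years = event_date.astype(str).str.extract(r"\b(1[6-9]\d{2}|20\d{2})\b", expand=False)
    # Nullable integer dtype so unmatched dates stay <NA> instead of becoming floats
    return pd.to_numeric(years).astype("Int64")


print(year_collected(pd.Series(["1999-05-12", "12/05/1999", "99-05-12"])))
```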

### _unused columns_
- LocationCode
- Note
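If these columns were dropped in code instead of by hand, it is a single pandas call. In this sketch drop_unused is a hypothetical helper, and errors="ignore" keeps it from failing when a column has already been removed.

```python
import pandas as pd

# Columns from the original file that are not mapped to the template
UNUSED_COLUMNS = ["LocationCode", "Note"]


def drop_unused(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the unused columns (missing ones are skipped)."""
    return df.drop(columns=UNUSED_COLUMNS, errors="ignore")
```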

## To Code
### _measurementValue_
- select only the "1st_" measurement columns
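Selecting the "1st_" columns by prefix, rather than listing them one by one, could look like this sketch. first_measurements is a hypothetical name, and any other measurement columns are assumed to use a different prefix; DataFrame.filter with regex= matches against the column labels.

```python
import pandas as pd


def first_measurements(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns whose names start with "1st_" (e.g. 1st_body_mass)."""
    return df.filter(regex=r"^1st_")
```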

### _measurementUnit_
- make sure the unit is either "g" (mass) or "mm" (length)
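A sketch of normalizing units to "g" and "mm", assuming (hypothetically) that raw values may also arrive in kg, mg, cm or m; the conversion factors are the standard metric ones, and to_standard_unit is an illustrative name.

```python
# Standard metric conversion factors into the two allowed units
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "mg": 0.001}
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def to_standard_unit(value: float, unit: str) -> tuple[float, str]:
    """Convert a (value, unit) pair so the unit is either "g" or "mm"."""
    unit = unit.strip().lower()
    if unit in TO_GRAMS:
        return value * TO_GRAMS[unit], "g"
    if unit in TO_MM:
        return value * TO_MM[unit], "mm"
    # Anything else needs a manual look rather than a silent guess
    raise ValueError(f"unexpected unit: {unit!r}")
```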

In [3]:
import pandas as pd

## For large datasets
- compress the file
- create a script that checks whether the downloaded file is valid; if not, it re-downloads it, and if it is _still_ invalid, it waits before retrying (exponential backoff)
- use wget, a command-line tool that downloads files from websites (install in a terminal with Homebrew: brew install wget)
- download the file, then unzip it

*Note*: we need a hash (fingerprint) of the file and a stable link to the specific version of the dataset
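The download-verify-retry idea above could be sketched in Python roughly as follows. This is a minimal sketch, not the notebook's method: it uses the standard library's urllib.request instead of wget, assumes the SHA-256 hash of the correct file is known in advance, and the function names and the max_attempts/base_delay parameters are hypothetical. Unzipping is left out.

```python
import hashlib
import time
import urllib.request
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest (the file's fingerprint), reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_with_retry(url: str, dest: Path, expected_sha256: str,
                        max_attempts: int = 5, base_delay: float = 1.0) -> Path:
    """Download url to dest, re-downloading until the SHA-256 matches.

    After each failed attempt, wait base_delay * 2**attempt seconds
    (exponential backoff); give up with RuntimeError after max_attempts.
    """
    for attempt in range(max_attempts):
        try:
            urllib.request.urlretrieve(url, dest)
            if sha256_of(dest) == expected_sha256:
                return dest
        except OSError:
            pass  # network/file errors (URLError is an OSError) count as a failed attempt
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no valid copy of {url} after {max_attempts} attempts")
```

Doubling the wait after each failure gives a flaky server progressively more time to recover without hammering it.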

In [4]:
%%bash
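# wget needs a URL: the stable link to a specific version of the dataset.
# No URL has been filled in yet, so this cell currently fails with "wget: missing URL".
wget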


In [None]:
#Import mammal VertNet data from CyVerse
mammal = pd.read_csv("https://de.cyverse.org/dl/d/338C987D-F776-4439-910F-3AD2CD1D06E2/mammals_no_bats_2019-03-13.csv")
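For a large file, one option (a sketch, not what this notebook currently does) is to read only the needed columns with read_csv's usecols parameter. The tiny in-memory CSV below is made up for illustration. Note that usecols keeps the file's column order rather than the list's, so the reordering step below would still be needed, and pandas raises a ValueError if a requested column does not exist in the file.

```python
import io

import pandas as pd

# Made-up stand-in for the CyVerse CSV; the real file has far more columns and rows
csv_text = io.StringIO(
    "catalognumber,sex,notes,1st_body_mass,2nd_body_mass\n"
    "A1,female,x,120.5,119.0\n"
    "A2,male,y,98.0,\n"
)

# Only these columns are parsed and kept in memory
wanted = ["catalognumber", "sex", "1st_body_mass"]
mammal_small = pd.read_csv(csv_text, usecols=wanted)
print(mammal_small)
```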

In [None]:
#Rearrange columns so that template columns are first, followed by measurement values

#Specify desired columns, in the order they should appear
cols = ['catalognumber',
        'collectioncode',
        'decimallatitude',
        'decimallongitude',
        'maximumelevation',
        'minimumelevation',
        'eventdate',
        'institutioncode',
        'lifestage',
        'locality',
        'sex',
        'scientificname',
        'references',
        '1st_body_mass',
        '1st_ear_length',
        '1st_hind_foot_length',
        '1st_tail_length',
        '1st_total_length']

#Subset dataframe
mammal = mammal[cols]


In [None]:
#Matching template and column terms

#Renaming columns 
mammal = mammal.rename(columns = {'catalognumber': 'catalogNumber',
                                 'collectioncode':'collectionCode',
                                 'decimallatitude':'decimalLatitude',
                                 'decimallongitude':'decimalLongitude',
                                 'maximumelevation':'maximumElevationInMeters',
                                 'minimumelevation':'minimumElevationInMeters',
                                 'eventdate':'verbatimEventDate',
                                 'institutioncode' :'institutionCode',
                                 'lifestage':'lifeStage',
                                 'locality':'verbatimLocality',
                                 'scientificname':'scientificName'})

In [None]:
#Matching trait and ontology terms

#Renaming columns
mammal = mammal.rename(columns={'1st_body_mass':'body mass',
                                '1st_ear_length': 'ear length',
                                '1st_hind_foot_length':'hind foot length',
                                '1st_tail_length':'tail length',
                                '1st_total_length':'full body length'})

In [None]:
#Create long version so that each trait has its own row

#First specify the identifier columns to keep, then name the variable (trait) and value (measurement) columns

longVersMammal = pd.melt(mammal,
                         id_vars=['catalogNumber',
                                  'collectionCode',
                                  'decimalLatitude',
                                  'decimalLongitude',
                                  'maximumElevationInMeters',
                                  'minimumElevationInMeters',
                                  'verbatimEventDate',
                                  'institutionCode',
                                  'lifeStage',
                                  'verbatimLocality',
                                  'sex',
                                  'scientificName',
                                  'references'],
                         var_name='trait',
                         value_name='measurement')

#Writing long data csv file
longVersMammal.to_csv('Mammal_Data_Long.csv')
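To see what pd.melt does, here is the same kind of call on a made-up two-specimen frame: every non-identifier column becomes one row per specimen, with the old column name in trait and its value in measurement.

```python
import pandas as pd

# Two made-up specimens in wide format: one column per trait
wide = pd.DataFrame({
    "catalogNumber": ["A1", "A2"],
    "body mass": [120.5, 98.0],
    "tail length": [85.0, 90.0],
})

# Keep catalogNumber as the identifier; stack the trait columns into rows
long = pd.melt(wide, id_vars=["catalogNumber"], var_name="trait", value_name="measurement")
print(long)
```

In the real data, a specimen missing a trait still gets a row, with NaN in measurement; if those rows are not wanted, longVersMammal.dropna(subset=['measurement']) would remove them.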

In [None]:
#Populating measurementUnit column with appropriate measurement units in long version
#Body mass is recorded in grams; every other trait is a length in millimetres
longVersMammal['measurementUnit'] = 'mm'
longVersMammal.loc[longVersMammal['trait'] == 'body mass', 'measurementUnit'] = 'g'

In [None]:
#Writing long data csv file
longVersMammal.to_csv('../Mapped Data/Mammal_Data_Long.csv')