## Code used to generate data for ontology annotations
#### Values needed for ontology table:
* `trait`
* `trait_description`
* `units`
* `method_name`

In [1]:
import pandas as pd

In [6]:
df = pd.read_csv('data/raw/mac_season_4.csv')
print(f'Shape of dataframe: {df.shape}')

Shape of dataframe: (372363, 39)


In [7]:
unique_traits = df.trait.unique()
print(f'{df.trait.nunique()} unique traits within dataset')

75 unique traits within dataset


### I. Find all traits that have more than one unique value for:
* `trait_description`
* `units`
* `method_name`

In [25]:
for trait in unique_traits:
    # Filter the rows for this trait once, then check each annotation column
    trait_rows = df.loc[df.trait == trait]

    if trait_rows.trait_description.nunique() > 1:
        print(f'Trait with more than one unique trait description: {trait}')

    if trait_rows.units.nunique() > 1:
        print(f'Trait with more than one unique value for units: {trait}')

    if trait_rows.method_name.nunique() > 1:
        print(f'Trait with more than one unique method name: {trait}')

Trait with more than one unique method name: canopy_height
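
The same check can probably be done without an explicit loop. The sketch below is one possible approach using `groupby` and `nunique`. It runs on a small hypothetical frame (`sample`, with made-up descriptions and method names), not on the real dataset, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the dataset, for illustration only
sample = pd.DataFrame({
    'trait': ['canopy_height', 'canopy_height', 'leaf_thickness'],
    'trait_description': ['top of canopy', 'top of canopy', 'thickness of leaf'],
    'units': ['cm', 'cm', 'mm'],
    'method_name': ['method A', 'method B', 'method C'],
})

columns = ['trait_description', 'units', 'method_name']

# Count distinct non-null values per trait for each annotation column
counts = sample.groupby('trait')[columns].nunique()

# Keep only traits where at least one column has more than one unique value
multi_valued = counts[(counts > 1).any(axis=1)]
print(multi_valued)
```

On the full data, replacing `sample` with `df` should give one row per trait that needs manual review, with the per-column counts side by side.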


### II. Trait descriptions
Four traits have ontology identifiers in their trait description:
* `lodging_present`
* `stem_elongated_internodes_number`
* `emergence_count`
* `plant_basal_tiller_number`

In [11]:
for trait in unique_traits:
    
    print(f'All unique descriptions for {trait}: {df.loc[df.trait == trait].trait_description.unique()}')

All unique descriptions for leaf_desiccation_present: ['Presence or absence of leaves showing desiccation. 1 = present, 0 = absent']
All unique descriptions for lodging_present: ['Plant lodging: presence or absence of lodging or severe leaning within a plot. 1 = present, 0 = absent. Sorghum Crop Ontology Identifier CO_324:0000283.']
All unique descriptions for leaf_temperature: ['temperature of the surface of a sunlit leaf']
All unique descriptions for planter_seed_drop: ['Number of seeds dropped per planted length of subplot row during planting']
All unique descriptions for roll: ['The angle that the handheld instrument is held along the long axis']
All unique descriptions for PhiNO: ['Chlorophyll fluorescence-derived photosynthesis parameter, ratio of incoming light lost via non-regulated processes']
All unique descriptions for PhiNPQ: ['Chlorophyll fluorescence-derived photosynthesis parameter, ratio of incoming light that goes towards non-photochemical quenching']
All unique descri

#### Find all trait descriptions containing ontology identifiers
* Search for string values containing `CO_`

In [20]:
ontology_identifiers = df.loc[df.trait_description.str.contains('CO_', na=False)]
print(f'Traits with ontology identifiers in trait description: {ontology_identifiers.trait.unique()}')

Traits with ontology identifiers in trait description: ['lodging_present' 'stem_elongated_internodes_number' 'emergence_count'
 'plant_basal_tiller_number']
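
Once the traits are known, it may be useful to pull the identifier itself out of the description. This is a minimal sketch, assuming identifiers always follow the pattern `CO_<digits>:<digits>` seen in the `lodging_present` description above. The column name `ontology_id` and the frame `sample` are hypothetical.

```python
import pandas as pd

# Sample rows; both descriptions are copied from the output shown earlier
sample = pd.DataFrame({
    'trait': ['lodging_present', 'leaf_temperature'],
    'trait_description': [
        'Plant lodging: presence or absence of lodging or severe leaning within a plot. '
        '1 = present, 0 = absent. Sorghum Crop Ontology Identifier CO_324:0000283.',
        'temperature of the surface of a sunlit leaf',
    ],
})

# Assumed identifier pattern: 'CO_', digits, a colon, then digits
sample['ontology_id'] = sample.trait_description.str.extract(r'(CO_\d+:\d+)', expand=False)

# Rows without a match get NaN, so drop them to keep only annotated traits
identifiers = sample.dropna(subset=['ontology_id'])[['trait', 'ontology_id']]
print(identifiers)
```

If other identifier formats exist in the data, the regular expression would need to be adjusted.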


### III. Units

In [12]:
for trait in unique_traits:
    
    print(f'All unique units for {trait}: {df.loc[df.trait == trait].units.unique()}')

All unique units for leaf_desiccation_present: [nan]
All unique units for lodging_present: ['binary']
All unique units for leaf_temperature: ['K']
All unique units for planter_seed_drop: ['count']
All unique units for roll: [nan]
All unique units for PhiNO: [nan]
All unique units for PhiNPQ: [nan]
All unique units for absorbance_530: ['arbitrary absorbance units']
All unique units for absorbance_605: ['arbitrary absorbance units']
All unique units for absorbance_730: ['arbitrary absorbance units']
All unique units for absorbance_880: ['arbitrary absorbance units']
All unique units for absorbance_940: ['arbitrary absorbance units']
All unique units for Fs: [nan]
All unique units for NPQt: [nan]
All unique units for qL: [nan]
All unique units for qP: [nan]
All unique units for RFd: ['ratio']
All unique units for SPAD_530: [nan]
All unique units for SPAD_605: [nan]
All unique units for SPAD_730: [nan]
All unique units for leaf_thickness: ['mm']
All unique units for ambient_humidity: ['%']
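
Many traits above show `[nan]` for units. A compact way to list them might look like the sketch below, which flags traits whose `units` value is null in every row. The frame `sample` is a hypothetical stand-in built from a few of the values printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the trait/unit pairs mirror the output above
sample = pd.DataFrame({
    'trait': ['roll', 'roll', 'leaf_thickness', 'lodging_present'],
    'units': [np.nan, np.nan, 'mm', 'binary'],
})

# True for traits whose units are null in every row
all_null = sample.units.isna().groupby(sample.trait).all()

traits_without_units = all_null[all_null].index.tolist()
print(traits_without_units)
```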

#### Example exploration code for null values
* View one complete row for more information

In [14]:
df.loc[df.trait == 'leaf_desiccation_present'].iloc[0]

Unnamed: 0                                                           1
checked                                                              0
result_type                                                     traits
id                                                          6001958927
citation_id                                                      6e+09
site_id                                                     6000005673
treatment_id                                                     6e+09
sitename                  MAC Field Scanner Season 4 Range 11 Column 5
city                                                          Maricopa
lat                                                            33.0749
lon                                                           -111.975
scientificname                                         Sorghum bicolor
commonname                                                     sorghum
genus                                                          Sorghum
specie

### IV. Method Name
* `canopy_height` is the only trait with more than one method
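
To see how the rows for `canopy_height` are split across its methods, something like the following sketch could be used. The method names here (`method A`, `method B`, `method C`) are placeholders, since the actual names are not shown in this section.

```python
import pandas as pd

# Hypothetical sample with placeholder method names
sample = pd.DataFrame({
    'trait': ['canopy_height'] * 3 + ['leaf_thickness'],
    'method_name': ['method A', 'method A', 'method B', 'method C'],
})

# Number of rows recorded under each method for a single trait
method_counts = sample.loc[sample.trait == 'canopy_height', 'method_name'].value_counts()
print(method_counts)
```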

In [21]:
for trait in unique_traits:
    
    print(f'All unique method names for {trait}: {df.loc[df.trait == trait].method_name.unique()}')

All unique method names for leaf_desiccation_present: ['Visual assessment of leaf dessication']
All unique method names for lodging_present: ['Plant lodging incidence - Presence estimation']
All unique method names for leaf_temperature: ['temperature of sunlit leaf with [instrument name]']
All unique method names for planter_seed_drop: ['Planter seed drop count']
All unique method names for roll: ['MultispeQ v1.0 field measurements of fluorescence-based and absorbance-based parameters']
All unique method names for PhiNO: ['MultispeQ v1.0 field measurements of fluorescence-based and absorbance-based parameters']
All unique method names for PhiNPQ: ['MultispeQ v1.0 field measurements of fluorescence-based and absorbance-based parameters']
All unique method names for absorbance_530: ['MultispeQ v1.0 field measurements of fluorescence-based and absorbance-based parameters']
All unique method names for absorbance_605: ['MultispeQ v1.0 field measurements of fluorescence-based and absorbance-