# Exploring properties of the raw color data exported from the TRY database

Here is the request, submitted and downloaded June 16, 2022:

TRY Data Request 21604  
Only public data were requested.  
Title:  
21604  
Authors:  
Patrick McKenzie (Columbia University) PI  
Trait List:  
207,  
Species List:  

Description:

In [5]:
import pandas as pd
import numpy as np

In [3]:
color_dat = pd.read_csv('../data/TRY_cleaned_colordata.csv')

In [4]:
color_dat

| Unnamed: 0 | AccSpeciesName | OrigValueStr |
| --- | --- | --- |
| 0 | Aconitum napellus | blue |
| 1 | Aconitum degenii | blue |
| 2 | Aconitum pilipes | blue |
| 3 | Aconitum plicatum | blue |
| 4 | Aconitum tauricum | blue |
| ... | ... | ... |
| 17051 | Zygotritonia bongensis | orange |
| 17052 | Zygotritonia bongensis | yellow |
| 17053 | Zygotritonia nyassana | green |
| 17054 | Zygotritonia nyassana | purple |


### Count the number of colors in the dataset, recognizing that some labels differ only in capitalization

In [14]:
len(color_dat.OrigValueStr.unique())

19

In [15]:
len(color_dat.OrigValueStr.str.lower().unique())

11
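The two counts above differ only because of letter case. One way to carry that normalization forward is sketched below. It uses a small toy frame rather than the real `color_dat`, and the `color` column name is an assumption introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for color_dat; these values are illustrative, not taken from TRY.
toy = pd.DataFrame({"OrigValueStr": ["blue", "Blue", "RED", "red", "yellow"]})

# Strip surrounding whitespace and lowercase so case variants collapse to one label.
# "color" is a hypothetical column name, not one present in the exported file.
toy["color"] = toy["OrigValueStr"].str.strip().str.lower()

raw_count = toy["OrigValueStr"].nunique()   # distinct labels before normalization
clean_count = toy["color"].nunique()        # distinct labels after normalization
color_counts = toy["color"].value_counts()  # rows per normalized label
```

Applied to the real data, `value_counts()` on the normalized column would also show how many rows fall under each color, which `unique()` alone does not reveal.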

### Count the number of species represented, and see that capitalization is not a factor here

In [7]:
len(color_dat.AccSpeciesName.unique())

10440

In [10]:
len(color_dat.AccSpeciesName.str.lower().unique())

10440

### Recognize that some species names have many words, and some have only one

In [18]:
np.unique([len(i.split()) for i in color_dat.AccSpeciesName])

array([1, 2, 3, 4, 5, 6, 7, 8])
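The word counts above come from a list comprehension. As an alternative sketch, the same counts can be computed with pandas string methods, which also makes it easy to tabulate how many names fall into each length class. The toy `names` series below is an assumption standing in for `color_dat.AccSpeciesName`.

```python
import pandas as pd

# Toy stand-in for color_dat.AccSpeciesName; entries mirror name shapes seen in the data.
names = pd.Series(["Aconitum napellus", "Viola canina subsp. montana", "Albuca"])

# Split on whitespace, then take the length of each resulting list.
n_words = names.str.split().str.len()

# Number of names per word count, ordered by word count.
word_count_table = n_words.value_counts().sort_index()
```

Storing `n_words` once would avoid repeating the comprehension in each of the filtering cells that follow.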

In [21]:
# extract those that have four names
color_dat.AccSpeciesName[np.array([len(i.split()) for i in color_dat.AccSpeciesName]) == 4]

9         Anagallis arvensis subsp. foemina
10        Anagallis arvensis subsp. foemina
152      Veronica austriaca subsp. teucrium
162             Viola canina subsp. montana
164           Viola canina subsp. schultzii
                        ...                
16797      Vitex madiensis subsp. madiensis
16798      Vitex madiensis subsp. madiensis
16813      Vitex madiensis subsp. madiensis
16814      Vitex madiensis subsp. madiensis
16815      Vitex madiensis subsp. madiensis
Name: AccSpeciesName, Length: 553, dtype: object

In [22]:
# extract those that have eight names
color_dat.AccSpeciesName[np.array([len(i.split()) for i in color_dat.AccSpeciesName]) == 8]

7240    Chrysophyllum boukoko nse (Aubrev. & Pellegr.)...
7241    Chrysophyllum boukoko nse (Aubrev. & Pellegr.)...
8115     Cynanchum daltonii (Decne. ex Webb) Liede & Meve
8116     Cynanchum daltonii (Decne. ex Webb) Liede & Meve
Name: AccSpeciesName, dtype: object
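The longer names carry infraspecific ranks or author citations after the binomial. A rough sketch of reducing each name to its first two words is shown below. It assumes the genus and species epithet are always the first two whitespace-separated tokens, which may not hold for every record (a name that looks split by a stray space would be truncated incorrectly), so the results would need manual checking.

```python
import pandas as pd

# Toy stand-in for color_dat.AccSpeciesName.
names = pd.Series([
    "Cynanchum daltonii (Decne. ex Webb) Liede & Meve",
    "Viola canina subsp. montana",
    "Albuca",
])

# Keep at most the first two tokens (assumed genus + species epithet) and rejoin them.
# Genus-only names have a single token and pass through unchanged.
binomial = names.str.split().str[:2].str.join(" ")
```

This collapses subspecies into their parent species, which is a design choice: it is only appropriate if the analysis is meant to run at the species level.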

In [23]:
# extract those that have one name
color_dat.AccSpeciesName[np.array([len(i.split()) for i in color_dat.AccSpeciesName]) == 1]

5447           Albuca
5665       Amaranthus
6055        Asclepias
8193          Cyperus
8194          Cyperus
8846         Dissotis
9563         Eulophia
9564         Eulophia
9565         Eulophia
10246       Gladiolus
11650       Kniphofia
11738          Kumara
11739          Kumara
11740          Kumara
12172      Lipocarpha
12173      Lipocarpha
12303        Ludwigia
13201           Ochna
13202           Ochna
13203           Ochna
13658        Paspalum
13659        Paspalum
13932      Petalidium
13933      Petalidium
14657     Raphionacme
14658     Raphionacme
14844         Romulea
14845         Romulea
15390           Senna
15465            Sida
15466            Sida
16838    Wahlenbergia
16977     Xysmalobium
16978     Xysmalobium
Name: AccSpeciesName, dtype: object

### How many have the expected binomial nomenclature?

In [26]:
# extract those that have two names (genus + species epithet)
color_dat.AccSpeciesName[np.array([len(i.split()) for i in color_dat.AccSpeciesName]) == 2]

0             Aconitum napellus
1              Aconitum degenii
2              Aconitum pilipes
3             Aconitum plicatum
4             Aconitum tauricum
                  ...          
17051    Zygotritonia bongensis
17052    Zygotritonia bongensis
17053     Zygotritonia nyassana
17054     Zygotritonia nyassana
17055     Zygotritonia nyassana
Name: AccSpeciesName, Length: 16362, dtype: object
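To put the binomial subset in context, it can help to express it as a share of all rows and to count the distinct species it contains. The sketch below does this on a toy series, again an assumption standing in for `color_dat.AccSpeciesName`.

```python
import pandas as pd

# Toy stand-in for color_dat.AccSpeciesName, with one repeated species.
names = pd.Series([
    "Aconitum napellus",
    "Aconitum napellus",
    "Zygotritonia nyassana",
    "Albuca",
])

# Boolean mask: True where the name has exactly two whitespace-separated words.
is_binomial = names.str.split().str.len() == 2

# The mean of a boolean mask is the fraction of rows that are True.
share_binomial = is_binomial.mean()

# Distinct two-word names, ignoring repeated rows for the same species.
n_unique_binomial = names[is_binomial].nunique()
```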