## Trialing the new UK Flora dataset for data exploration
The dataset contains a current inventory of vascular plant species and their attributes present in the flora of Britain and Ireland. The species list is based on the most recent key to the flora of Britain and Ireland, with taxon names linked to unique Kew taxon identifiers and the World Checklist of Vascular Plants, and includes both native and non-native species. Attribute data stem from a variety of sources to give an overview of the current state of the vascular flora. Attributes include functional traits, distribution and ecologically relevant data (e.g. genome size, chromosome numbers, spatial distribution, growth form, hybridization metrics and native/non-native status). The data include previously unpublished genome size measurements, chromosome counts and CSR life strategy assessments. The database aims to provide an up-to-date starting point for flora-wide analyses.

This dataset will be available under the terms of the Open Government Licence: https://eidc.ceh.ac.uk/licences/OGL/plain

Publication date: 2021-09-20

https://catalogue.ceh.ac.uk/documents/9f097d82-7560-4ed2-af13-604a9110cf6d

You need to register to download the data.

You must always use the following attribution statement to acknowledge the source of the information: "Contains data supplied by Natural Environment Research Council."

You must include any copyright notice identified in the metadata record for the Data on all copies of the Data, publications and reports, including but not limited to, use in presentations to any audience.

You will ensure that citation of any relevant key publications, Digital Object Identifiers and any other required acknowledgments identified in the metadata record for the Data are included in full in the reference list of any reports or publications that describe any research in which the Data have been used.

I downloaded the data and the supporting information.

In [1]:
! ls New_Flora_datasets/data

BI_main.csv      GS_BI.csv        GS_Kew_BI.csv    chrom_num_BI.csv


In [2]:
! head -3 New_Flora_datasets/data/*.csv

==> New_Flora_datasets/data/BI_main.csv <==
kew_id,unclear_species_marker,extinct_species_marker,taxon_name,taxon_name_binom,authors,taxon_name_WCVP,authors_WCVP,order,family,genus,subgenus,section,subsection,series,species,group,aggregate,members_of_agg.,taxonomic_status,accepted_kew_id,accepted_name,accepted_authors,imperfect_match_with_Stace_IV,WCVP_URL,POWO_URL,IPNI_URL,accepted_WCVP_URL,StaceIV_nativity,Atlas_nativity_viaALIENATT_PLANTATT,Stace_Crawley_nativity_aliens,SLA,LDMC,seed_mass,leaf_area,mean_veg_height,max_veg_height,L_PLANTATT,F_PLANTATT,R_PLANTATT,N_PLANTATT,S_PLANTATT,L_Doring,F_Doring,R_Doring,N_Doring,S_Doring,T_Doring,ECPE_CSR,predicted_CSR,growth_form,succulence,life_form,biome,origin,TDWG_level_1_code,GB_Man_hectads_post2000,Ire_hectads_post2000,CI_hectads_post2000,GB_Man_hectads_1987_1999,Ire_hectads_1987_1999,CI_hectads_1987_1999,GB_Man_hectads_2000_2009,Ire_hectads_2000_2009,CI_hectads_2000_2009,GB_Man_hectads_2010_2019,Ire_hectads_2010_2019,CI_hectads_2010_2

In [3]:
# Analysis modules
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import statsmodels.api as sm
np.set_printoptions(precision=5, suppress=True)  # suppress scientific notation
sns.set(color_codes=True)
%matplotlib inline
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

Check the dataframe

In [5]:
Flora = pd.read_csv('New_Flora_datasets/data/BI_main.csv', sep=",", encoding='latin-1')

In [6]:
Flora.head(3)

Unnamed: 0,kew_id,unclear_species_marker,extinct_species_marker,taxon_name,taxon_name_binom,authors,taxon_name_WCVP,authors_WCVP,order,family,genus,subgenus,section,subsection,series,species,group,aggregate,members_of_agg.,taxonomic_status,accepted_kew_id,accepted_name,accepted_authors,imperfect_match_with_Stace_IV,WCVP_URL,POWO_URL,IPNI_URL,accepted_WCVP_URL,StaceIV_nativity,Atlas_nativity_viaALIENATT_PLANTATT,Stace_Crawley_nativity_aliens,SLA,LDMC,seed_mass,leaf_area,mean_veg_height,max_veg_height,L_PLANTATT,F_PLANTATT,R_PLANTATT,N_PLANTATT,S_PLANTATT,L_Doring,F_Doring,R_Doring,N_Doring,S_Doring,T_Doring,ECPE_CSR,predicted_CSR,growth_form,succulence,life_form,biome,origin,TDWG_level_1_code,GB_Man_hectads_post2000,Ire_hectads_post2000,CI_hectads_post2000,GB_Man_hectads_1987_1999,Ire_hectads_1987_1999,CI_hectads_1987_1999,GB_Man_hectads_2000_2009,Ire_hectads_2000_2009,CI_hectads_2000_2009,GB_Man_hectads_2010_2019,Ire_hectads_2010_2019,CI_hectads_2010_2019,hybrid_propensity,scaled_hybrid_propensity,BOLD_link1,BOLD_link2,BOLD_link3,GS_1C_pg,GS_2C_pg,GS_1C_Mbp,GS_2C_Mbp,from_BI_material,data_source,sporophytic_chromosome_number,infraspecific_variation_chrom_number,other_reported_sporophytic_chromosome_number,source_of_other_chrom_num
0,60468511-2,,,Abies alba Mill.,Abies alba,Mill.,Abies alba,Mill.,Pinales,Pinaceae,Abies,,,,,alba,,,,Accepted,,,,,https://wcvp.science.kew.org/taxon/60468511-2,http://plantsoftheworldonline.org/taxon/604685...,https://ipni.org/n/60468511-2,,Neo-natd,AN,Neo,7.698508,0.529816,65.612834,255.029158,46.843893,68.0,,,,,,3.0,,,,0.0,5.0,,S,Tree,,phanerophyte / tree,,mountains in C Europe,1,382.0,230.0,0.0,230.0,28.0,0.0,120.0,179.0,0.0,303.0,89.0,0.0,,,,,,17.27,34.54,16891.68,33783.36,n,marda et al. 2019,,,24.0,"marda et al. 2019, Zonneveld, 2019"
1,325658-2,,,Abies amabilis Douglas ex J.Forbes,Abies amabilis,Douglas ex J.Forbes,Abies amabilis,(Douglas ex Loudon) J.Forbes,Pinales,Pinaceae,Abies,,,,,amabilis,,,,Accepted,,,,,https://wcvp.science.kew.org/taxon/325658-2,http://plantsoftheworldonline.org/taxon/325658-2,https://ipni.org/n/325658-2,,,,,86.690769,,42.277126,,50.148522,75.0,,,,,,,,,,,,,,Tree,,phanerophyte / tree,,W N America,7,11.0,0.0,0.0,7.0,0.0,0.0,5.0,0.0,0.0,8.0,0.0,0.0,,,,,,,,,,,,,,,
2,261486-1,,,Abies cephalonica Loudon,Abies cephalonica,Loudon,Abies cephalonica,Loudon,Pinales,Pinaceae,Abies,,,,,cephalonica,,,,Accepted,,,,,https://wcvp.science.kew.org/taxon/261486-1,http://plantsoftheworldonline.org/taxon/261486-1,https://ipni.org/n/261486-1,,Neo-natd,AN,Neo,6.530926,,71.43,,25.875,40.0,,,,,,,,,,,,,,Tree,,phanerophyte / tree,,Greece,1,11.0,0.0,0.0,6.0,0.0,0.0,1.0,0.0,0.0,9.0,0.0,0.0,,,,,,18.14,36.27,17738.0,35476.0,,C-ValueDB,,,,


What do we have data on?

In [7]:
Flora.columns

Index(['kew_id', 'unclear_species_marker', 'extinct_species_marker',
       'taxon_name', 'taxon_name_binom', 'authors', 'taxon_name_WCVP',
       'authors_WCVP', 'order', 'family', 'genus', 'subgenus', 'section',
       'subsection', 'series', 'species', 'group', 'aggregate',
       'members_of_agg.', 'taxonomic_status', 'accepted_kew_id',
       'accepted_name', 'accepted_authors', 'imperfect_match_with_Stace_IV',
       'WCVP_URL', 'POWO_URL', 'IPNI_URL', 'accepted_WCVP_URL',
       'StaceIV_nativity', 'Atlas_nativity_viaALIENATT_PLANTATT',
       'Stace_Crawley_nativity_aliens', 'SLA', 'LDMC', 'seed_mass',
       'leaf_area', 'mean_veg_height', 'max_veg_height', 'L_PLANTATT',
       'F_PLANTATT', 'R_PLANTATT', 'N_PLANTATT', 'S_PLANTATT', 'L_Doring',
       'F_Doring', 'R_Doring', 'N_Doring', 'S_Doring', 'T_Doring', 'ECPE_CSR',
       'predicted_CSR', 'growth_form', 'succulence', 'life_form', 'biome',
       'origin', 'TDWG_level_1_code', 'GB_Man_hectads_post2000',
       'Ire_hecta

What types are these?

In [8]:
Flora.dtypes

kew_id                                           object
unclear_species_marker                           object
extinct_species_marker                           object
taxon_name                                       object
taxon_name_binom                                 object
authors                                          object
taxon_name_WCVP                                  object
authors_WCVP                                     object
order                                            object
family                                           object
genus                                            object
subgenus                                         object
section                                          object
subsection                                       object
series                                           object
species                                          object
group                                            object
aggregate                                       
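With this many columns, the full `dtypes` listing is hard to scan. A minimal sketch of a more compact summary follows. It runs on a small toy frame (`toy`, a hypothetical stand-in for `Flora`) whose values are copied from the three Abies rows shown earlier, so it can be run without the downloaded files.

```python
import pandas as pd

# Toy stand-in for Flora; values copied from the three Abies rows shown above.
toy = pd.DataFrame({
    "taxon_name_binom": ["Abies alba", "Abies amabilis", "Abies cephalonica"],
    "growth_form": ["Tree", "Tree", "Tree"],
    "max_veg_height": [68.0, 75.0, 40.0],
    "GS_1C_pg": [17.27, None, 18.14],  # missing value becomes NaN
})

# Count how many columns share each dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# List only the numeric columns, i.e. those usable directly in plots and models.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The same two calls should work unchanged on `Flora` itself.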

What does the data look like?

In [9]:
Flora.iloc[1]

kew_id                                                                                  325658-2
unclear_species_marker                                                                       NaN
extinct_species_marker                                                                       NaN
taxon_name                                                    Abies amabilis Douglas ex J.Forbes
taxon_name_binom                                                                  Abies amabilis
authors                                                                      Douglas ex J.Forbes
taxon_name_WCVP                                                                   Abies amabilis
authors_WCVP                                                        (Douglas ex Loudon) J.Forbes
order                                                                                    Pinales
family                                                                                  Pinaceae
genus                         

There is much more data in the other dataset (root form, stomatal distribution, etc.).

Where is there missing data?

In [10]:
Flora.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3227 entries, 0 to 3226
Data columns (total 83 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   kew_id                                        3227 non-null   object 
 1   unclear_species_marker                        575 non-null    object 
 2   extinct_species_marker                        18 non-null     object 
 3   taxon_name                                    3227 non-null   object 
 4   taxon_name_binom                              3227 non-null   object 
 5   authors                                       3227 non-null   object 
 6   taxon_name_WCVP                               3226 non-null   object 
 7   authors_WCVP                                  3226 non-null   object 
 8   order                                         3215 non-null   object 
 9   family                                        3227 non-null   o

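The non-null counts answer the question, but a sorted fraction of missing values per column is easier to read. The sketch below shows one way to get it. The frame `toy` is a hypothetical stand-in for `Flora`, with values taken from the three Abies rows displayed earlier.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Flora; values copied from the three Abies rows shown earlier.
toy = pd.DataFrame({
    "taxon_name_binom": ["Abies alba", "Abies amabilis", "Abies cephalonica"],
    "SLA": [7.698508, 86.690769, 6.530926],
    "LDMC": [0.529816, np.nan, np.nan],
    "GS_1C_pg": [17.27, np.nan, 18.14],
})

# isna() marks missing cells; the column mean of that boolean mask is the
# fraction of missing values per column.
missing_fraction = toy.isna().mean().sort_values(ascending=False)
print(missing_fraction)
```

Applied to `Flora`, the top of this Series would list the sparsest attributes first.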


In [12]:
daffs = Flora[Flora['taxon_name_binom']=='Narcissus pseudonarcissus']

In [13]:
daffs

Unnamed: 0,kew_id,unclear_species_marker,extinct_species_marker,taxon_name,taxon_name_binom,authors,taxon_name_WCVP,authors_WCVP,order,family,genus,subgenus,section,subsection,series,species,group,aggregate,members_of_agg.,taxonomic_status,accepted_kew_id,accepted_name,accepted_authors,imperfect_match_with_Stace_IV,WCVP_URL,POWO_URL,IPNI_URL,accepted_WCVP_URL,StaceIV_nativity,Atlas_nativity_viaALIENATT_PLANTATT,Stace_Crawley_nativity_aliens,SLA,LDMC,seed_mass,leaf_area,mean_veg_height,max_veg_height,L_PLANTATT,F_PLANTATT,R_PLANTATT,N_PLANTATT,S_PLANTATT,L_Doring,F_Doring,R_Doring,N_Doring,S_Doring,T_Doring,ECPE_CSR,predicted_CSR,growth_form,succulence,life_form,biome,origin,TDWG_level_1_code,GB_Man_hectads_post2000,Ire_hectads_post2000,CI_hectads_post2000,GB_Man_hectads_1987_1999,Ire_hectads_1987_1999,CI_hectads_1987_1999,GB_Man_hectads_2000_2009,Ire_hectads_2000_2009,CI_hectads_2000_2009,GB_Man_hectads_2010_2019,Ire_hectads_2010_2019,CI_hectads_2010_2019,hybrid_propensity,scaled_hybrid_propensity,BOLD_link1,BOLD_link2,BOLD_link3,GS_1C_pg,GS_2C_pg,GS_1C_Mbp,GS_2C_Mbp,from_BI_material,data_source,sporophytic_chromosome_number,infraspecific_variation_chrom_number,other_reported_sporophytic_chromosome_number,source_of_other_chrom_num
1922,66177-1,,,Narcissus pseudonarcissus L.,Narcissus pseudonarcissus,L.,Narcissus pseudonarcissus,L.,Asparagales,Amaryllidaceae,Narcissus,,,,,pseudonarcissus,,,,Accepted,,,,,https://wcvp.science.kew.org/taxon/66177-1,http://plantsoftheworldonline.org/taxon/66177-1,https://ipni.org/n/66177-1,,N,N,,19.29892,0.1375,5.32,1773.5,0.28,0.4,7.0,5.0,6.0,5.0,0.0,7.0,5.0,6.0,5.0,0.0,,,CR,Herb,,geophyte,Southern Temperate,,,1032.0,78.0,7.0,883.0,4.0,10.0,645.0,21.0,1.0,810.0,59.0,6.0,6.0,28.571429,https://www.boldsystems.org/index.php/Public_R...,https://www.boldsystems.org/index.php/Public_R...,,11.75,23.5,11515.0,23030.0,n,C-ValueDB,14| 43,v,14,"C-ValueDB, Zonneveld, 2019"
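The daffodil row stores its chromosome counts as the string `14| 43`, so that column cannot be used numerically as it stands. The sketch below splits such strings into lists of integers. It assumes that `|` is the only separator and that every part is a plain integer, which the single row shown here cannot confirm for the whole column. The helper `parse_chrom_numbers` and the frame `toy` are hypothetical names introduced for illustration.

```python
import pandas as pd

def parse_chrom_numbers(value):
    """Split a pipe-separated chromosome count string into a list of ints.

    Assumes '|' is the only separator; raises ValueError on non-integer parts.
    """
    if pd.isna(value):
        return []
    return [int(part) for part in str(value).split("|") if part.strip()]

# Toy stand-in for Flora; values copied from the rows displayed above.
toy = pd.DataFrame({
    "taxon_name_binom": ["Narcissus pseudonarcissus", "Abies cephalonica"],
    "sporophytic_chromosome_number": ["14| 43", None],
})

toy["chrom_counts"] = toy["sporophytic_chromosome_number"].apply(parse_chrom_numbers)
print(toy[["taxon_name_binom", "chrom_counts"]])
```

If the full column holds other formats, the `ValueError` will point to the entries that need separate handling.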


One line per species.
The columns cover ecology values, genome size and chromosome number, range, and change in range.
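Change in range can be approximated from the hectad counts for two recording periods. The sketch below is a rough first pass and uses the `GB_Man_hectads_1987_1999` and `GB_Man_hectads_2010_2019` values from the four rows displayed above. The frame `toy` and the derived column names are hypothetical. A raw difference does not account for differences in recording effort between periods, so it should be read as indicative only.

```python
import pandas as pd

# Toy stand-in for Flora; hectad counts copied from the rows displayed above.
toy = pd.DataFrame({
    "taxon_name_binom": [
        "Abies alba", "Abies amabilis", "Abies cephalonica",
        "Narcissus pseudonarcissus",
    ],
    "GB_Man_hectads_1987_1999": [230.0, 7.0, 6.0, 883.0],
    "GB_Man_hectads_2010_2019": [303.0, 8.0, 9.0, 810.0],
})

# Absolute change in the number of occupied hectads between the two periods.
toy["hectad_change"] = (
    toy["GB_Man_hectads_2010_2019"] - toy["GB_Man_hectads_1987_1999"]
)

# Relative change in percent; this is infinite or NaN where the earlier count is 0.
toy["hectad_change_pct"] = (
    100 * toy["hectad_change"] / toy["GB_Man_hectads_1987_1999"]
)

print(toy.sort_values("hectad_change"))
```

On the full table, species with an earlier count of zero would need to be filtered out or handled separately before interpreting the percentage column.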