# GBIF API request for species naming explanation

In [1]:
import json
import requests

import pandas as pd

### Introduction: example request for info

The requirements are discussed in https://github.com/LifeWatchINBO/invasive-t0-occurrences/issues/6, which also includes an example.

Let us test the example by making a request:

In [2]:
temp = requests.get("http://api.gbif.org/v1/species/match?verbose=false&kingdom=Plantae&name=Heracleum%20mantegazziaum&strict=false")

In [3]:
temp.content

b'{"usageKey":3034825,"scientificName":"Heracleum mantegazzianum Sommier & Levier","canonicalName":"Heracleum mantegazzianum","rank":"SPECIES","status":"ACCEPTED","confidence":96,"matchType":"FUZZY","kingdom":"Plantae","phylum":"Tracheophyta","order":"Apiales","family":"Apiaceae","genus":"Heracleum","species":"Heracleum mantegazzianum","kingdomKey":6,"phylumKey":7707728,"classKey":220,"orderKey":1351,"familyKey":6720,"genusKey":3034824,"speciesKey":3034825,"synonym":false,"class":"Magnoliopsida"}'
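The raw bytes are easier to inspect once they are decoded. A minimal sketch using the `json` module imported above; the byte string is an excerpt of the response shown in the output, so the snippet runs without a network call:

```python
import json

# Excerpt of the response body shown above (a subset of its fields).
content = b'{"usageKey":3034825,"scientificName":"Heracleum mantegazzianum Sommier & Levier","canonicalName":"Heracleum mantegazzianum","rank":"SPECIES","status":"ACCEPTED","confidence":96,"matchType":"FUZZY","kingdom":"Plantae"}'

match = json.loads(content)  # json.loads accepts bytes as well as str

# The requested name was spelled "mantegazziaum", while the matched name is
# "mantegazzianum": the API reports this as a FUZZY match.
print(match["canonicalName"], match["matchType"], match["confidence"])
# Heracleum mantegazzianum FUZZY 96
```

In a live session, `temp.json()` should give the same dictionary directly from the `requests` response object.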

### Development of the building blocks of our small application

Two functions are available to extract the information:
* `extract_gbif_species_names_info`: sends a request for a given species/kingdom combination
* `extract_species_information`: repeats this request for every species in the list provided at https://github.com/LifeWatchINBO/invasive-t0-occurrences/blob/master/species-list/species-list.tsv

Importing them makes them available:

In [4]:
from gbif_species_name_extraction import extract_gbif_species_names_info, extract_species_information
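To make the species/kingdom request more concrete, here is a rough sketch of how the query URL from the introduction could be assembled. The helper `build_match_url` is hypothetical and not taken from `gbif_species_name_extraction`; only the endpoint and the parameter names come from the example request above.

```python
from urllib.parse import quote, urlencode

GBIF_MATCH_URL = "http://api.gbif.org/v1/species/match"


def build_match_url(name, kingdom, strict=False, verbose=False):
    """Build a species/match URL (hypothetical helper, not from the module)."""
    params = {
        "verbose": str(verbose).lower(),
        "kingdom": kingdom,
        "name": name,
        "strict": str(strict).lower(),
    }
    # quote_via=quote encodes spaces as %20, as in the example request
    return GBIF_MATCH_URL + "?" + urlencode(params, quote_via=quote)


url = build_match_url("Heracleum mantegazziaum", "Plantae")
print(url)
```

The printed URL is the one used in the introduction. In practice `requests.get` also accepts a `params` dictionary and does the encoding itself, so the explicit encoding is shown only to make the structure of the query visible.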

### Extract info from GBIF about provided species

`extract_species_information` provides the functionality to request info from the GBIF API for every species in a list. It can be used as a Python function or from the command line:

#### Python function

In [5]:
updated_tsv = extract_species_information("../../../metadata/species-list/species-list.tsv", 
                                          output=None,
                                          api_terms=["usageKey", 
                                                     "scientificName", 
                                                     "canonicalName",
                                                     "status", 
                                                     "rank", 
                                                     "matchType", 
                                                     "confidence"],
                                          update_cols=True
                                         )

Using columns gbifapi_scientificName and kingdom for API request.


In [6]:
updated_tsv.head()

Unnamed: 0,name,kingdom,euConcernStatus,gbifapi_usageKey,gbifapi_scientificName,gbifapi_canonicalName,gbifapi_status,gbifapi_rank,gbifapi_matchType,gbifapi_confidence,gbifapi_acceptedKey,gbifapi_acceptedScientificName,nbnTaxonID,NBN_scientificName
0,Acer negundo,Plantae,under consideration,3189866,Acer negundo L.,Acer negundo,ACCEPTED,SPECIES,EXACT,100,3189866,Acer negundo L.,NBNSYS0000014604,Acer negundo
1,Alopochen aegyptiaca,Animalia,under consideration,2498252,"Alopochen aegyptiaca (Linnaeus, 1766)",Alopochen aegyptiaca,ACCEPTED,SPECIES,EXACT,100,2498252,"Alopochen aegyptiaca (Linnaeus, 1766)",NHMSYS0001689380,Alopochen aegyptiacus
2,Alternanthera philoxeroides,Plantae,under consideration,3084923,Alternanthera philoxeroides (Mart.) Griseb.,Alternanthera philoxeroides,ACCEPTED,SPECIES,EXACT,100,3084923,Alternanthera philoxeroides (Mart.) Griseb.,,
3,Ameiurus melas,Animalia,under consideration,2340977,"Ameiurus melas (Rafinesque, 1820)",Ameiurus melas,ACCEPTED,SPECIES,EXACT,100,2340977,"Ameiurus melas (Rafinesque, 1820)",NHMSYS0000544615,Ameiurus melas
4,Asclepias syriaca,Plantae,under consideration,3170247,Asclepias syriaca L.,Asclepias syriaca,ACCEPTED,SPECIES,EXACT,100,3170247,Asclepias syriaca L.,INBSYS0000005932,Asclepias syriaca


The available options are:
* output : if None, nothing is written to file and a pandas DataFrame is returned; if a string, the output is written to that tsv file
* api_terms : either a list of existing terms, or 'all' to include all the information returned by the API

In [8]:
extract_species_information("../../../metadata/species-list/species-list.tsv",
                            output=None,
                            api_terms="all")

Using columns gbifapi_scientificName and kingdom for API request.


Unnamed: 0,name,kingdom,euConcernStatus,gbifapi_usageKey,gbifapi_scientificName,gbifapi_canonicalName,gbifapi_status,gbifapi_rank,gbifapi_matchType,gbifapi_confidence,...,gbifapi_orderKey,gbifapi_phylum,gbifapi_phylumKey,gbifapi_rank.1,gbifapi_scientificName.1,gbifapi_species,gbifapi_speciesKey,gbifapi_status.1,gbifapi_synonym,gbifapi_usageKey.1
0,Acer negundo,Plantae,under consideration,3189866,Acer negundo L.,Acer negundo,ACCEPTED,SPECIES,EXACT,100,...,933,Tracheophyta,7707728,SPECIES,Acer negundo L.,Acer negundo,3189866,ACCEPTED,False,3189866
1,Alopochen aegyptiaca,Animalia,under consideration,2498252,"Alopochen aegyptiaca (Linnaeus, 1766)",Alopochen aegyptiaca,ACCEPTED,SPECIES,EXACT,100,...,1108,Chordata,44,SPECIES,"Alopochen aegyptiaca (Linnaeus, 1766)",Alopochen aegyptiaca,2498252,ACCEPTED,False,2498252
2,Alternanthera philoxeroides,Plantae,under consideration,3084923,Alternanthera philoxeroides (Mart.) Griseb.,Alternanthera philoxeroides,ACCEPTED,SPECIES,EXACT,100,...,422,Tracheophyta,7707728,SPECIES,Alternanthera philoxeroides (Mart.) Griseb.,Alternanthera philoxeroides,3084923,ACCEPTED,False,3084923
3,Ameiurus melas,Animalia,under consideration,2340977,"Ameiurus melas (Rafinesque, 1820)",Ameiurus melas,ACCEPTED,SPECIES,EXACT,100,...,708,Chordata,44,SPECIES,"Ameiurus melas (Rafinesque, 1820)",Ameiurus melas,2340977,ACCEPTED,False,2340977
4,Asclepias syriaca,Plantae,under consideration,3170247,Asclepias syriaca L.,Asclepias syriaca,ACCEPTED,SPECIES,EXACT,99,...,412,Tracheophyta,7707728,SPECIES,Asclepias syriaca L.,Asclepias syriaca,3170247,ACCEPTED,False,3170247
5,Symphyotrichum salignum,Plantae,,3151811,Symphyotrichum salignum (Willd.) G.L.Nesom,Symphyotrichum salignum,DOUBTFUL,SPECIES,EXACT,100,...,414,Tracheophyta,7707728,SPECIES,Symphyotrichum salignum (Willd.) G.L.Nesom,Symphyotrichum salignum,3151811,DOUBTFUL,False,3151811
6,Baccharis halimifolia,Plantae,listed,3129663,Baccharis halimifolia L.,Baccharis halimifolia,ACCEPTED,SPECIES,MANUAL,,...,414,Tracheophyta,7707728,SPECIES,Baccharis halimifolia L.,Baccharis halimifolia,3129663,DOUBTFUL,False,3129663
7,Bison bison,Animalia,under consideration,2441176,"Bison bison (Linnaeus, 1758)",Bison bison,ACCEPTED,SPECIES,EXACT,100,...,731,Chordata,44,SPECIES,"Bison bison (Linnaeus, 1758)",Bison bison,2441176,ACCEPTED,False,2441176
8,Cabomba caroliniana,Plantae,listed,2882443,Cabomba caroliniana A. Gray,Cabomba caroliniana,ACCEPTED,SPECIES,EXACT,100,...,636,Tracheophyta,7707728,SPECIES,Cabomba caroliniana A. Gray,Cabomba caroliniana,2882443,ACCEPTED,False,2882443
9,Callosciurus erythraeus,Animalia,listed,2437394,"Callosciurus erythraeus (Pallas, 1779)",Callosciurus erythraeus,ACCEPTED,SPECIES,EXACT,100,...,1459,Chordata,44,SPECIES,"Callosciurus erythraeus (Pallas, 1779)",Callosciurus erythraeus,2437394,ACCEPTED,False,2437394
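Not every row in this output is an accepted, exact match: one row has status `DOUBTFUL` and another has match type `MANUAL` with no confidence value. A small sketch of how such rows could be flagged for review follows. The rule in `needs_review` is an assumption made for illustration, and the three rows are copied from the output above, reduced to four columns.

```python
import csv
import io

# Three rows taken from the output above, reduced to four columns.
excerpt = (
    "name\tgbifapi_status\tgbifapi_matchType\tgbifapi_confidence\n"
    "Acer negundo\tACCEPTED\tEXACT\t100\n"
    "Symphyotrichum salignum\tDOUBTFUL\tEXACT\t100\n"
    "Baccharis halimifolia\tACCEPTED\tMANUAL\t\n"
)


def needs_review(row):
    """Hypothetical rule: flag anything that is not an accepted, exact match."""
    return row["gbifapi_status"] != "ACCEPTED" or row["gbifapi_matchType"] != "EXACT"


rows = list(csv.DictReader(io.StringIO(excerpt), delimiter="\t"))
to_review = [row["name"] for row in rows if needs_review(row)]
print(to_review)
# ['Symphyotrichum salignum', 'Baccharis halimifolia']
```

On the returned DataFrame, the same rule could be expressed as a boolean mask on the `gbifapi_status` and `gbifapi_matchType` columns.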


#### Command line

From the command line, the script takes the first argument as the input file and the last argument as the file to write the result to, both given as relative paths:

```bash
python gbif_species_name_extraction.py ../../species-list/species-list.tsv ../../species-list/species-list-gbif-match.tsv
```
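The first-argument/last-argument convention can be sketched as below. `parse_paths` is a hypothetical helper written for illustration; how the script itself reads its arguments is not shown in this notebook.

```python
def parse_paths(argv):
    """Return (input_file, output_file) from an argv-style list.

    Hypothetical helper: argv[0] is the script name, the first argument
    after it is the input file and the last argument is the output file.
    """
    if len(argv) < 3:
        raise ValueError("expected an input file and an output file")
    return argv[1], argv[-1]


# The same arguments as in the shell command above
example_argv = [
    "gbif_species_name_extraction.py",
    "../../species-list/species-list.tsv",
    "../../species-list/species-list-gbif-match.tsv",
]
input_file, output_file = parse_paths(example_argv)
print(input_file, "->", output_file)
```

Inside a script, `sys.argv` would be passed instead of `example_argv`.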

By default, the following terms are added to the tsv file:

```python
["usageKey", "scientificName", "canonicalName", "status", "rank", "matchType", "confidence"]
```
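As an illustration of how a list of terms could turn into the `gbifapi_`-prefixed columns seen in the outputs above, here is a minimal sketch. `select_terms` is a hypothetical helper and not the module's implementation; the prefix is taken from the column names in the output, and the example dictionary holds a few fields of the response shown in the introduction.

```python
DEFAULT_TERMS = ["usageKey", "scientificName", "canonicalName",
                 "status", "rank", "matchType", "confidence"]


def select_terms(match, api_terms=None, prefix="gbifapi_"):
    """Pick terms from a match response and prefix them (hypothetical helper)."""
    if api_terms is None:
        api_terms = DEFAULT_TERMS
    elif api_terms == "all":
        api_terms = list(match)
    # Terms missing from the response become None instead of raising KeyError
    return {prefix + term: match.get(term) for term in api_terms}


# A few fields from the example response in the introduction
match = {"usageKey": 3034825, "canonicalName": "Heracleum mantegazzianum",
         "matchType": "FUZZY", "confidence": 96, "kingdom": "Plantae"}

print(select_terms(match, ["usageKey", "matchType"]))
# {'gbifapi_usageKey': 3034825, 'gbifapi_matchType': 'FUZZY'}
print(sorted(select_terms(match, "all")))
```

Using `dict.get` here is a design choice for the sketch: a term that is absent from a response yields an empty value in that column rather than stopping the whole run.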

### Extract info of individual species

The `extract_gbif_species_names_info` function is useful for a request of a single species/kingdom combination:

In [None]:
extract_gbif_species_names_info("Dinebra panicea (Retz.) P.M. Peterson & N. Snow var. brachiata (Steud.) P.M. Peterson & N. Snow", kingdom="Plantae")