# GBIF API request for species naming explanation

In [1]:
import json
import requests

import pandas as pd

### Introduction: example request for info

The requirements are discussed in https://github.com/LifeWatchINBO/invasive-t0-occurrences/issues/6, which also includes an example request.

Let us test the example by making a request:

In [9]:
temp = requests.get("http://api.gbif.org/v1/species/match?verbose=false&kingdom=Plantae&name=Heracleum%20mantegazziaum&strict=false")
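
The query string above is percent-encoded by hand. As a minimal sketch (the helper name `build_match_url` is hypothetical and not part of the project code), the same URL can be assembled from plain parameters with the standard library, which avoids encoding mistakes for names that contain spaces:

```python
from urllib.parse import quote, urlencode

GBIF_MATCH_URL = "http://api.gbif.org/v1/species/match"


def build_match_url(name, kingdom=None, strict=False, verbose=False):
    """Build a species match URL (hypothetical helper, for illustration only)."""
    params = [("verbose", str(verbose).lower())]
    if kingdom is not None:
        params.append(("kingdom", kingdom))
    params.append(("name", name))
    params.append(("strict", str(strict).lower()))
    # quote_via=quote encodes spaces as %20 instead of '+'
    return GBIF_MATCH_URL + "?" + urlencode(params, quote_via=quote)


url = build_match_url("Heracleum mantegazziaum", kingdom="Plantae")
print(url)
```

The resulting string can be passed to `requests.get` exactly like the literal URL; `requests` can also do the encoding itself when given a `params` dictionary.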

In [10]:
temp.json()

{'canonicalName': 'Heracleum mantegazzianum',
 'class': 'Magnoliopsida',
 'classKey': 220,
 'confidence': 96,
 'family': 'Apiaceae',
 'familyKey': 6720,
 'genus': 'Heracleum',
 'genusKey': 3034824,
 'kingdom': 'Plantae',
 'kingdomKey': 6,
 'matchType': 'FUZZY',
 'order': 'Apiales',
 'orderKey': 1351,
 'phylum': 'Tracheophyta',
 'phylumKey': 7707728,
 'rank': 'SPECIES',
 'scientificName': 'Heracleum mantegazzianum Sommier & Levier',
 'species': 'Heracleum mantegazzianum',
 'speciesKey': 3034825,
 'status': 'ACCEPTED',
 'synonym': False,
 'usageKey': 3034825}
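
Note that the requested name was spelled `mantegazziaum`, so the API reports a `FUZZY` match with a confidence of 96 and returns the corrected name. A small sketch of how such a response could be triaged is shown below; the helper `describe_match` and the threshold of 90 are assumptions made for illustration, not part of the project code:

```python
def describe_match(match, min_confidence=90):
    """Summarise a species match response (hypothetical helper)."""
    match_type = match.get("matchType", "NONE")
    confidence = match.get("confidence", 0)
    if match_type == "NONE":
        return "no match"
    if confidence < min_confidence:
        # The threshold is an arbitrary choice for this sketch
        return "low confidence, review manually"
    if match_type == "EXACT":
        return "exact match"
    # Any other match type: report it with the name that GBIF matched
    return "{} match on {}".format(match_type.lower(), match.get("canonicalName"))


# Abbreviated copy of the response shown above
response = {
    "canonicalName": "Heracleum mantegazzianum",
    "confidence": 96,
    "matchType": "FUZZY",
    "status": "ACCEPTED",
}
print(describe_match(response))
```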

### Development of the building blocks of our small application

The following functions are available to extract the information:
* `extract_gbif_species_names_info`: sends a request for a given species/kingdom combination.
* `extract_species_information`: does this iteratively for every species in the provided list at https://github.com/LifeWatchINBO/invasive-t0-occurrences/blob/master/species-list/species-list.tsv.
* `extract_gbif_accepted_key`: returns the acceptedKey corresponding to a GBIF usage key.

Importing them makes them available:

In [39]:
from gbif_species_name_match import (extract_gbif_species_names_info, 
                                     extract_species_information,
                                     extract_gbif_accepted_key)

### Extract info from GBIF about provided species

`extract_species_information` provides the functionality to request species info from the GBIF API for every species in an input file. It can be used as a Python function or from the command line:

#### Python function

In [28]:
!cat sample.tsv

name,kingdom,euConcernStatus
Alopochen aegyptiaca,Animalia,under consideration
Cotoneaster ganghobaensis,Plantae,
Cotoneaster hylmoei,Plantae,
Cotoneaster x suecicus,Plantae,
Euthamia graminifolia,Plantae,under preparation


In [33]:
updated_tsv = extract_species_information("sample.csv", 
                                          output=None,                              
                                          api_terms=["usageKey", 
                                                     "scientificName", 
                                                     "canonicalName",
                                                     "status", 
                                                     "rank", 
                                                     "matchType", 
                                                     "confidence"],
                                         )

Using columns name and kingdom for API request.


In [34]:
updated_tsv.head()

| | name | kingdom | euConcernStatus | gbifapi_usageKey | gbifapi_scientificName | gbifapi_canonicalName | gbifapi_status | gbifapi_rank | gbifapi_matchType | gbifapi_confidence | gbifapi_acceptedKey | gbifapi_acceptedScientificName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | Alopochen aegyptiaca | ACCEPTED | SPECIES | EXACT | 100 | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) |
| 1 | Cotoneaster ganghobaensis | Plantae | | 3025989 | Cotoneaster ganghobaensis J. Fryer & B. Hylmö | Cotoneaster ganghobaensis | SYNONYM | SPECIES | EXACT | 100 | 3026007 | Cotoneaster lambertii G. Klotz |
| 2 | Cotoneaster hylmoei | Plantae | | 3025758 | Cotoneaster hylmoei K.E. Flinck & J. Fryer | Cotoneaster hylmoei | SYNONYM | SPECIES | EXACT | 100 | 3026076 | Cotoneaster salicifolius var. rugosus (E. Prit... |
| 3 | Cotoneaster x suecicus | Plantae | | 3026040 | Cotoneaster ×suecicus G. Klotz | Cotoneaster suecicus | ACCEPTED | SPECIES | EXACT | 100 | 3026040 | Cotoneaster ×suecicus G. Klotz |
| 4 | Euthamia graminifolia | Plantae | under preparation | 3092782 | Euthamia graminifolia (L.) Nutt. | Euthamia graminifolia | DOUBTFUL | SPECIES | EXACT | 99 | 3092782 | Euthamia graminifolia (L.) Nutt. |


The available options are:
* `output`: if `None`, nothing is written to file and a pandas DataFrame is returned; if a string, the output is written to that tsv file.
* `api_terms`: either a list of existing terms, or `"all"` to include all the information returned by the API.
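
To make the effect of `api_terms` concrete, the sketch below mimics the row-wise enrichment with the standard library only. It is not the project's implementation: `add_api_terms` and `fake_lookup` are hypothetical names, and the lookup is a stub that returns fixed values so the example runs without network access. The `gbifapi_` prefix and the default term list are taken from the outputs and defaults shown in this notebook.

```python
import csv
import io

DEFAULT_TERMS = ["usageKey", "scientificName", "canonicalName",
                 "status", "rank", "matchType", "confidence"]


def add_api_terms(rows, lookup, api_terms=None, prefix="gbifapi_"):
    """Return copies of the rows with prefixed API terms added (sketch)."""
    if api_terms is None:
        api_terms = DEFAULT_TERMS
    enriched = []
    for row in rows:
        match = lookup(row["name"], row["kingdom"])
        # "all" keeps every key of the response, otherwise only the listed terms
        terms = sorted(match) if api_terms == "all" else api_terms
        new_row = dict(row)
        for term in terms:
            new_row[prefix + term] = match.get(term)
        enriched.append(new_row)
    return enriched


def fake_lookup(name, kingdom):
    # Stand-in for a real API request; returns fixed values for any name
    return {"usageKey": 2498252, "status": "ACCEPTED",
            "rank": "SPECIES", "kingdom": kingdom}


sample = io.StringIO("name,kingdom\nAlopochen aegyptiaca,Animalia\n")
rows = list(csv.DictReader(sample))
selected = add_api_terms(rows, fake_lookup, api_terms=["usageKey", "status"])
everything = add_api_terms(rows, fake_lookup, api_terms="all")
print(sorted(selected[0]))
print(sorted(everything[0]))
```

Passing the lookup as an argument keeps the enrichment logic testable without calling the API.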

In [36]:
extract_species_information("sample.csv", output=None, api_terms="all")

Using columns name and kingdom for API request.


| | name | kingdom | euConcernStatus | gbifapi_acceptedKey | gbifapi_acceptedScientificName | gbifapi_canonicalName | gbifapi_class | gbifapi_classKey | gbifapi_confidence | gbifapi_family | ... | gbifapi_orderKey | gbifapi_phylum | gbifapi_phylumKey | gbifapi_rank | gbifapi_scientificName | gbifapi_species | gbifapi_speciesKey | gbifapi_status | gbifapi_synonym | gbifapi_usageKey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | Alopochen aegyptiaca | Aves | 212 | 100 | Anatidae | ... | 1108 | Chordata | 44 | SPECIES | Alopochen aegyptiaca (Linnaeus, 1766) | Alopochen aegyptiaca | 2498252 | ACCEPTED | False | 2498252 |
| 1 | Cotoneaster ganghobaensis | Plantae | | 3026007 | Cotoneaster lambertii G. Klotz | Cotoneaster ganghobaensis | Magnoliopsida | 220 | 100 | Rosaceae | ... | 691 | Tracheophyta | 7707728 | SPECIES | Cotoneaster ganghobaensis J. Fryer & B. Hylmö | Cotoneaster lambertii | 3026007 | SYNONYM | True | 3025989 |
| 2 | Cotoneaster hylmoei | Plantae | | 3026076 | Cotoneaster salicifolius var. rugosus (E. Prit... | Cotoneaster hylmoei | Magnoliopsida | 220 | 100 | Rosaceae | ... | 691 | Tracheophyta | 7707728 | SPECIES | Cotoneaster hylmoei K.E. Flinck & J. Fryer | Cotoneaster salicifolius | 3026074 | SYNONYM | True | 3025758 |
| 3 | Cotoneaster x suecicus | Plantae | | 3026040 | Cotoneaster ×suecicus G. Klotz | Cotoneaster suecicus | Magnoliopsida | 220 | 100 | Rosaceae | ... | 691 | Tracheophyta | 7707728 | SPECIES | Cotoneaster ×suecicus G. Klotz | Cotoneaster suecicus | 3026040 | ACCEPTED | False | 3026040 |
| 4 | Euthamia graminifolia | Plantae | under preparation | 3092782 | Euthamia graminifolia (L.) Nutt. | Euthamia graminifolia | Magnoliopsida | 220 | 99 | Asteraceae | ... | 414 | Tracheophyta | 7707728 | SPECIES | Euthamia graminifolia (L.) Nutt. | Euthamia graminifolia | 3092782 | DOUBTFUL | False | 3092782 |


#### Command line

On the command line, the script takes the first argument as the input file and the last argument as the output file, both given as relative paths:

```bash
python gbif_species_name_extraction.py sample.csv sample_dump.csv
```

The terms added to the tsv file are those of the default list:

```python
["usageKey", "scientificName", "canonicalName", "status", "rank", "matchType", "confidence"]
```
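
How the script reads its arguments is not shown in this notebook. One possible way to handle an input and an output path, assuming exactly two positional arguments, is sketched below with `argparse`; the function name `parse_arguments` and the argument names are hypothetical:

```python
import argparse


def parse_arguments(argv=None):
    """Parse input and output file paths (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Add GBIF species match information to a species list."
    )
    parser.add_argument("input_file", help="path of the species list to read")
    parser.add_argument("output_file", help="path of the file to write")
    # argv=None makes argparse fall back to the real command line arguments
    return parser.parse_args(argv)


args = parse_arguments(["sample.csv", "sample_dump.csv"])
print(args.input_file, "->", args.output_file)
```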

### Extract info of individual species

The `extract_gbif_species_names_info` function is useful for a request of a single species/kingdom combination:

In [37]:
extract_gbif_species_names_info("Dinebra panicea (Retz.) P.M. Peterson & N. Snow var. brachiata (Steud.) P.M. Peterson & N. Snow", kingdom="Plantae")

{'canonicalName': 'Dinebra panicea brachiata',
 'class': 'Liliopsida',
 'classKey': 196,
 'confidence': 100,
 'family': 'Poaceae',
 'familyKey': 3073,
 'genus': 'Leptochloa',
 'genusKey': 2703864,
 'kingdom': 'Plantae',
 'kingdomKey': 6,
 'matchType': 'EXACT',
 'order': 'Poales',
 'orderKey': 1369,
 'phylum': 'Tracheophyta',
 'phylumKey': 7707728,
 'rank': 'VARIETY',
 'scientificName': 'Dinebra panicea var. brachiata (Steud.) P.M.Peterson & N.Snow',
 'species': 'Leptochloa panicea',
 'speciesKey': 2703872,
 'status': 'SYNONYM',
 'synonym': True,
 'usageKey': 8306067}

The `extract_gbif_accepted_key` function is useful to get the acceptedKey corresponding to any GBIF usage key:

In [40]:
extract_gbif_accepted_key(3025758)

(3026076,
 'Cotoneaster salicifolius var. rugosus (E. Pritz.) Rehd. & E.H. Wilson')
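
As the tables above show, a synonym resolves to a different accepted usage, while an accepted name resolves to itself. The sketch below illustrates that rule on hand-written, abbreviated records. The field names `key`, `acceptedKey`, `accepted` and `scientificName` are assumed to follow the GBIF name usage response, and `accepted_from_usage` is a hypothetical helper rather than the project's implementation:

```python
def accepted_from_usage(usage):
    """Return (accepted key, accepted scientific name) for a usage record."""
    if "acceptedKey" in usage:
        # Synonym: the record points to its accepted usage
        return usage["acceptedKey"], usage.get("accepted")
    # Accepted name: the record is its own accepted usage
    return usage["key"], usage.get("scientificName")


# Abbreviated, hand-written records based on the values shown in this notebook
synonym = {
    "key": 3025758,
    "scientificName": "Cotoneaster hylmoei K.E. Flinck & J. Fryer",
    "acceptedKey": 3026076,
    "accepted": "Cotoneaster salicifolius var. rugosus (E. Pritz.) Rehd. & E.H. Wilson",
}
accepted = {
    "key": 3026040,
    "scientificName": "Cotoneaster ×suecicus G. Klotz",
}

print(accepted_from_usage(synonym))
print(accepted_from_usage(accepted))
```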