# GBIF mapper

This Python notebook uses the GBIF API to access the specimen data for a selected family (Orchidaceae) collected in a selected country (PG, the ISO code for Papua New Guinea) between two years (1990 and 1999), and plots all records that have coordinates on an interactive, zoomable map. It uses two Python libraries:

- **pygbif** to simplify working with the GBIF API (documentation at: https://pygbif.readthedocs.io/en/latest/)
- **folium** to produce the map (documentation at: https://python-visualization.github.io/folium/)

These libraries are specified in `requirements.txt` and the notebook project is set up to install these when the virtual machine is created.

We will use two parts of the GBIF API: the [species API](https://www.gbif.org/developer/species) to translate our selected family name to a key in the GBIF taxonomy, and the [occurrence API](https://www.gbif.org/developer/occurrence) to access specimen metadata.

Two `import` statements are required to import these from the `pygbif` package; the second uses an import alias to abbreviate the module name:

In [1]:
from pygbif import species
from pygbif import occurrences as occ

Use the species API to translate our family name into a taxon key:

In [2]:
taxon_key=species.name_suggest(q='Orchidaceae',rank='FAMILY')[0]['key']
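
Taking element `[0]` relies on the best match being listed first. As a minimal sketch of a more defensive selection, the helper below scans a list of suggestions for an exact name and rank match. The helper name `pick_taxon_key` is hypothetical, and the `canonicalName` and `rank` field names are assumptions that should be checked against the actual `name_suggest` output; the list used here is a mock, not a live API response.

```python
def pick_taxon_key(suggestions, name, rank):
    """Return the key of the first suggestion matching name and rank exactly.

    `suggestions` is assumed to be a list of dicts shaped like the output of
    species.name_suggest; the field names used here are assumptions.
    """
    for suggestion in suggestions:
        if suggestion.get('canonicalName') == name and suggestion.get('rank') == rank:
            return suggestion['key']
    raise LookupError('no {} named {!r} in suggestions'.format(rank, name))


# Mocked suggestions standing in for a live API response
mock_suggestions = [
    {'key': 1, 'canonicalName': 'Orchidaceae', 'rank': 'GENUS'},
    {'key': 7689, 'canonicalName': 'Orchidaceae', 'rank': 'FAMILY'},
]
checked_key = pick_taxon_key(mock_suggestions, 'Orchidaceae', 'FAMILY')
print(checked_key)
```

Raising an error when nothing matches makes a misspelled name fail early rather than silently querying the wrong taxon.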

Next, set up the rest of the query parameters for the occurrence search. We will save these in variables so that they are simpler to reuse:

In [3]:
basisOfRecord='PRESERVED_SPECIMEN'
country='PG'
year_min=1990
year_max=1999
year='{},{}'.format(year_min,year_max)
pagesize=300

We will do an initial (restricted) search in order to:

1. Examine the kind of data that we can access from the occurrence API
1. Determine how many occurrence records our query will retrieve

**Note**: the parameter name in the call to `occ.search` depends on the taxon rank, so if you modify the code to look for, e.g., a species, the calls to `occ.search` here and in the map-generating code below will need to be modified. Currently we are looking for a family, so the parameter is named `familyKey`.

In [4]:
initial_record_limit=1
res=occ.search(familyKey=taxon_key, basisOfRecord=basisOfRecord, country=country,hasCoordinate=True,year=year,limit=initial_record_limit)
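
One way to avoid editing every call when the rank changes is to build the keyword arguments once. The sketch below is an illustration rather than part of the notebook: `RANK_PARAM` and `build_query` are hypothetical names, and falling back to the generic `taxonKey` parameter for ranks not listed is an assumption about how you would want unlisted ranks handled.

```python
# Rank-specific parameter names accepted by the occurrence search
RANK_PARAM = {
    'KINGDOM': 'kingdomKey',
    'PHYLUM': 'phylumKey',
    'CLASS': 'classKey',
    'ORDER': 'orderKey',
    'FAMILY': 'familyKey',
    'GENUS': 'genusKey',
}


def build_query(rank, key, **fixed):
    """Return search keyword arguments with the rank-dependent key filled in.

    Ranks missing from RANK_PARAM fall back to 'taxonKey' (an assumption).
    """
    params = dict(fixed)
    params[RANK_PARAM.get(rank, 'taxonKey')] = key
    return params


query = build_query('FAMILY', 7689, basisOfRecord='PRESERVED_SPECIMEN',
                    country='PG', hasCoordinate=True, year='1990,1999')
print(query)
# The dict can then be unpacked into the call: occ.search(limit=1, **query)
```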

Printing the results of the API call shows that:
1. We get some metadata about the results of the query, and how to access the full result set:
    - `count` indicates the number of records matched
    - There are fields for `offset` and `limit`
    - Also a flag for `endOfRecords`
1. Each occurrence record includes the fields shown in the GBIF data portal

In [5]:
res

{'offset': 0,
 'limit': 1,
 'endOfRecords': False,
 'count': 1950,
 'results': [{'key': 2513806628,
   'datasetKey': '15f819bd-6612-4447-854b-14d12ee1022d',
   'publishingOrgKey': '396d5f30-dea9-11db-8ab4-b8a03c50a862',
   'installationKey': '65eb93af-41b8-4917-a5fc-cfe2a45aa4bf',
   'publishingCountry': 'NL',
   'protocol': 'DWC_ARCHIVE',
   'lastCrawled': '2020-01-15T12:50:23.918+0000',
   'lastParsed': '2020-01-15T13:37:04.066+0000',
   'crawlId': 129,
   'extensions': {},
   'basisOfRecord': 'PRESERVED_SPECIMEN',
   'taxonKey': 2819708,
   'kingdomKey': 6,
   'phylumKey': 7707728,
   'classKey': 196,
   'orderKey': 1169,
   'familyKey': 7689,
   'genusKey': 2819708,
   'acceptedTaxonKey': 2819708,
   'scientificName': 'Glomera Blume',
   'acceptedScientificName': 'Glomera Blume',
   'kingdom': 'Plantae',
   'phylum': 'Tracheophyta',
   'order': 'Asparagales',
   'family': 'Orchidaceae',
   'genus': 'Glomera',
   'genericName': 'Glomera',
   'taxonRank': 'GENUS',
   'taxonomicStatus
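
Because the full response is long, it can be easier to look at the paging metadata and the record field names separately. The following is a minimal sketch using a trimmed-down mock of the response above; `summarise_response` is a hypothetical helper, and the mock keeps only a few of the fields shown in the real output.

```python
def summarise_response(res):
    """Return the paging metadata and the field names of the first record."""
    meta = {k: v for k, v in res.items() if k != 'results'}
    fields = sorted(res['results'][0].keys()) if res['results'] else []
    return meta, fields


# Trimmed-down stand-in for the response printed above
mock_res = {
    'offset': 0,
    'limit': 1,
    'endOfRecords': False,
    'count': 1950,
    'results': [{'key': 2513806628, 'family': 'Orchidaceae', 'genus': 'Glomera'}],
}
meta, fields = summarise_response(mock_res)
print(meta)
print(fields)
```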

The GBIF occurrence API returns no more than 300 records per call, so accessing more data than this requires paging. Here we calculate how many pages we will need to access the full dataset:

In [6]:
# Calculate how many pages of results we need:
total=res['count']
pages=total//pagesize
if total%pagesize != 0:
    pages += 1
print('{} total records means {} pages'.format(total, pages))

1950 total records means 7 pages
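
The same calculation can be written as a ceiling division, and the `endOfRecords` flag offers an alternative that does not need the page count in advance. The sketch below shows both under stated assumptions: `page_offsets`, `iter_records`, and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that mimics only the `results` and `endOfRecords` parts of the response so the example runs without network access.

```python
def page_offsets(total, pagesize):
    """Return the offset of every page needed to cover `total` records."""
    pages = -(-total // pagesize)  # ceiling division using integers only
    return [page * pagesize for page in range(pages)]


def iter_records(fetch, pagesize):
    """Yield records page by page until the response reports endOfRecords.

    `fetch(offset, limit)` is a hypothetical callable returning a dict with
    'results' and 'endOfRecords' entries, like the occurrence search response.
    """
    offset = 0
    while True:
        res = fetch(offset, pagesize)
        for record in res['results']:
            yield record
        if res['endOfRecords'] or not res['results']:
            break
        offset += pagesize


def fake_fetch(offset, limit, _data=tuple(range(7))):
    """Stand-in for the API: serves 7 fake records in pages."""
    chunk = list(_data[offset:offset + limit])
    return {'results': chunk, 'endOfRecords': offset + limit >= len(_data)}


offsets = page_offsets(1950, 300)
records = list(iter_records(fake_fetch, 3))
print(len(offsets), 'pages;', len(records), 'fake records fetched')
```

With the real API, `fetch` could wrap `occ.search` with the query parameters fixed and only `offset` and `limit` varying.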


This next part generates the map, so we first import the necessary `folium` classes.

In [7]:
import folium
from folium.plugins import MarkerCluster

We then make paged calls to the GBIF API, and for each page of results, loop over the contained specimen records and read off their lat/long coordinates. These are used to plot markers on the map.  Each marker is shown with a pop-up which is a link to the full data record in the GBIF data portal:

In [8]:
latlong=[res['results'][0]['decimalLatitude'],res['results'][0]['decimalLongitude']]
m = folium.Map(latlong, zoom_start=5)
marker_cluster = MarkerCluster()
for page in range(0,pages):    
    print('{}/{}'.format(page+1, pages),end='\r')
    # Pass limit explicitly so that the page size always matches the offset step
    res=occ.search(familyKey=taxon_key, basisOfRecord=basisOfRecord, country=country,hasCoordinate=True,year=year,offset=page*pagesize,limit=pagesize)
    for result in res['results']:
        if 'locality' in result.keys():
            popuptext='<a href="http://gbif.org/occurrence/{gbifid}">{locality}</a>'.format(gbifid=result['gbifID'],locality=result['locality'])
        elif 'verbatimLocality' in result.keys():
            popuptext='<a href="http://gbif.org/occurrence/{gbifid}">{locality}</a>'.format(gbifid=result['gbifID'],locality=result['verbatimLocality'])
        else:
            popuptext='<a href="http://gbif.org/occurrence/{gbifid}">{gbifid}</a>'.format(gbifid=result['gbifID'])
        folium.Marker([result['decimalLatitude'], result['decimalLongitude']],
                         popup=popuptext,
                        icon=folium.Icon(color='darkblue')).add_to(marker_cluster)
m.add_child(marker_cluster)
m

7/7
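
The three pop-up branches in the loop differ only in the label they display, so they could be folded into one helper. This is a sketch, not the notebook's method: `popup_html` and `has_coordinates` are hypothetical names, the label is HTML-escaped as an added precaution, and, unlike the original key checks, the `or` chain also skips a locality that is present but empty. The record used here is a mock.

```python
from html import escape


def popup_html(result):
    """Build the pop-up link for one occurrence record (a dict).

    Prefers 'locality', then 'verbatimLocality', then the GBIF ID itself.
    """
    label = (result.get('locality')
             or result.get('verbatimLocality')
             or result['gbifID'])
    return '<a href="http://gbif.org/occurrence/{}">{}</a>'.format(
        result['gbifID'], escape(str(label)))


def has_coordinates(result):
    """True when the record carries both coordinate fields."""
    return 'decimalLatitude' in result and 'decimalLongitude' in result


# Mock record for illustration only
mock_record = {'gbifID': '123', 'locality': 'Camp A & B',
               'decimalLatitude': -5.0, 'decimalLongitude': 145.0}
print(popup_html(mock_record))
print(has_coordinates(mock_record))
```

Inside the loop, the marker would then be created with `popup=popup_html(result)` for each record where `has_coordinates(result)` is true.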