# Assessing species thermotolerance
In this notebook, we'll explore the use of APIs for accessing climate and species occurrence data to automate the determination of whether a species is thermotolerant.

In [1]:
import pandas as pd
import requests
import json
import io
import sys
sys.path.append('../utils/')
from functions_query_from_species_list import *

## Reading in our species list

In [42]:
species = pd.read_csv('../data/species_info_full.csv')
species.head()

| Unnamed: 0 | abbreviated_species | species_gbif | species | common_name | original_source | thermotolerance | reference | notes |
|---|---|---|---|---|---|---|---|---|
| 0 | A. americanus | Acorus americanus | Acorus americanus |  | phytozome |  |  |  |
| 1 | A. tequilanavar | Agave tequilana | Agave tequilana |  | phytozome |  |  |  |
| 2 | A. officinalis | Althaea officinalis | Althaea officinalis | marsh-mallow | phytozome |  |  |  |
| 3 | A. linifolium | Alyssum linifolium | Alyssum linifolium |  | phytozome |  |  |  |
| 4 | A. hypochondriacus | Amaranthus hypochondriacus | Amaranthus hypochondriacus |  | phytozome |  |  |  |
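During key matching further down, the name `Agave tequilana ` is printed with a trailing space. This suggests that some names in the CSV carry stray whitespace, which could prevent an exact match. Below is a minimal sketch for cleaning the name columns before matching. `normalize_name_columns` is a hypothetical helper and is not part of `functions_query_from_species_list`.

```python
import pandas as pd

def normalize_name_columns(df, columns=("species", "species_gbif")):
    """Return a copy of df with repeated and leading/trailing whitespace removed from name columns."""
    out = df.copy()
    for col in columns:
        # Collapse runs of whitespace to one space, then trim both ends (NaN values pass through)
        out[col] = out[col].str.replace(r"\s+", " ", regex=True).str.strip()
    return out

# Made-up names with stray spaces, for illustration only
example = pd.DataFrame({
    "species": ["Agave tequilana ", "Acorus  americanus"],
    "species_gbif": ["Agave tequilana ", "Acorus americanus"],
})
print(normalize_name_columns(example)["species"].tolist())
```

Applying it as `species = normalize_name_columns(species)` right after reading the CSV would clean both columns used during matching.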


## Getting occurrence data
To do this, we'll use the [GBIF](https://www.gbif.org/) database's [Python package](https://pygbif.readthedocs.io/en/latest/index.html). Specifically, we'll follow the example [on this page](https://data-blog.gbif.org/post/downloading-long-species-lists-on-gbif/) to get the latitude and longitude of each species' occurrence records.

First, we need to match the taxon names in our dataframe to the numeric keys used by GBIF. We'll keep only exact matches whose taxonomic status is not `DOUBTFUL`. We'll then count how many species were matched confidently, how many were matched ambiguously, and how many were not matched at all.
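The implementation of `match_species` lives in `../utils/` and is not shown here. The sketch below gives a rough idea of what such a step involves. It queries the GBIF backbone through pygbif's `species.name_backbone`. The function name `match_names` and its output columns are our own assumptions. It returns only the best match for each name, so unlike `match_species`, it would not produce the multiple candidate keys per name that the filtering cell has to resolve.

```python
import pandas as pd

FIELDS = ["usageKey", "canonicalName", "matchType", "status"]

def match_names(names, lookup=None):
    """Match each name against the GBIF backbone; one row per input name (hypothetical helper).

    `lookup` is any callable that takes a name and returns a GBIF match dict.
    By default it uses pygbif's species.name_backbone, which needs network access.
    """
    if lookup is None:
        from pygbif import species as gbif_species  # imported lazily so a stub can replace it
        lookup = gbif_species.name_backbone
    rows = []
    for name in names:
        match = lookup(name)
        # Unmatched names come back without a usageKey, so .get() fills the gaps with None
        row = {"inputName": name}
        row.update({field: match.get(field) for field in FIELDS})
        rows.append(row)
    return pd.DataFrame(rows, columns=["inputName"] + FIELDS)

def stub_lookup(name):
    """Offline stand-in for name_backbone with made-up values, for illustration only."""
    if name == "Althaea officinalis":
        return {"usageKey": 1, "canonicalName": name, "matchType": "EXACT", "status": "ACCEPTED"}
    return {"matchType": "NONE"}

print(match_names(["Althaea officinalis", "Nonexistent plant"], lookup=stub_lookup))
```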

In [43]:
taxon_keys = match_species(species, nameCol="species_gbif")

In [44]:
# Keep only exact matches that GBIF does not flag as doubtful
taxon_keys_filtered = taxon_keys[(taxon_keys['matchType'] == 'EXACT') & (taxon_keys['status'] != 'DOUBTFUL')]
num_confident = 0
num_missing = 0
num_ambiguous = 0
# Note: keys are looked up by the `species` column, while the query used `species_gbif`,
# so subspecies, varieties, hybrids, or names with stray whitespace are reported as missing
for sp in species.species:
    # All remaining candidate keys whose canonical name matches this species
    matches = taxon_keys_filtered[taxon_keys_filtered["canonicalName"] == sp]
    num_keys = len(matches)
    input_name = species[species.species == sp].species_gbif.values[0]
    if num_keys == 0:
        num_missing += 1
        print(f'Species {sp} (input name {input_name}) does not have a numeric key')
    elif num_keys == 1:
        num_confident += 1
    else:
        if 'ACCEPTED' not in matches.status.tolist():
            num_ambiguous += 1
            # All candidates are synonyms, so arbitrarily keep the first one
            taxon_keys_filtered = taxon_keys_filtered.drop(matches.index[1:])
        else:
            # Keep only the accepted key(s)
            taxon_keys_filtered = taxon_keys_filtered.drop(matches[matches.status != 'ACCEPTED'].index)
            num_confident += 1
print(f'\n\nThere are {num_confident} confident keys, {num_ambiguous} ambiguous keys, and {num_missing} missing keys. The first key for ambiguous species was arbitrarily chosen.')

Species Agave tequilana  (input name Agave tequilana ) does not have a numeric key
Species Beta vulgaris subsp. vulgaris (input name Beta vulgaris) does not have a numeric key
Species Brassica oleracea var. capitata (input name Brassica oleracea) does not have a numeric key
Species Camelina sativa var. DH55 (input name Camelina sativa) does not have a numeric key
Species Corylus americana var. rush (input name Corylus americana) does not have a numeric key
Species Populus nigra x maximowiczii (input name Populus nigra) does not have a numeric key
Species Populus tremula x Populus alba (input name Populus tremula) does not have a numeric key
Species Saccharum officinarum x spontaneum (input name Saccharum officinarum) does not have a numeric key
Species Ziziphus spina christi (input name Ziziphus spina) does not have a numeric key


There are 152 confident keys, 3 ambiguous keys, and 9 missing keys. The first key for ambiguous species was arbitrarily chosen.
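The loop above resolves duplicate keys one species at a time. The same rule can also be sketched with vectorized pandas operations: drop non-exact and doubtful matches, prefer an `ACCEPTED` key, and otherwise keep the first candidate. `pick_one_key_per_name` is a hypothetical helper. It differs from the loop in two ways. It keeps a single row even when a name has several `ACCEPTED` keys, and it does not report missing species.

```python
import pandas as pd

def pick_one_key_per_name(keys):
    """Keep one row per canonicalName: an ACCEPTED key if present, else the first candidate."""
    filtered = keys[(keys["matchType"] == "EXACT") & (keys["status"] != "DOUBTFUL")].copy()
    # False sorts before True, so ACCEPTED rows move to the front;
    # mergesort is stable, so the original order is kept within each group
    filtered["_not_accepted"] = filtered["status"] != "ACCEPTED"
    filtered = filtered.sort_values("_not_accepted", kind="mergesort")
    picked = filtered.drop_duplicates("canonicalName", keep="first")
    # Restore the original row order and drop the helper column
    return picked.sort_index().drop(columns="_not_accepted")

# Made-up candidate keys, for illustration only
demo_keys = pd.DataFrame({
    "canonicalName": ["A a", "A a", "B b", "B b", "C c"],
    "usageKey": [10, 11, 20, 21, 30],
    "matchType": ["EXACT"] * 5,
    "status": ["SYNONYM", "ACCEPTED", "SYNONYM", "SYNONYM", "DOUBTFUL"],
})
print(pick_one_key_per_name(demo_keys))
```

Calling `pick_one_key_per_name(taxon_keys)` would stand in for the filtering loop, minus its per-species reporting.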


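Next, we build a download request for every matched taxon key and submit it to the GBIF occurrence download API. Downloads require a GBIF account, so `login`, `password`, and `creator` must be filled in before running the cell. With empty credentials, as in the run below, the API responds with `401 Unauthorized`.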
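The request below asks for every record of each taxon, including records without coordinates, even though only latitude and longitude are needed here. One option, based on our reading of GBIF's predicate syntax, is to combine the `TAXON_KEY` filter with a `HAS_COORDINATE` filter. GBIF would then drop records without coordinates before building the file. `build_download_query` is a hypothetical helper and is not part of `functions_query_from_species_list`.

```python
def build_download_query(taxon_keys, creator, require_coordinates=True, email=None):
    """Assemble the JSON body of a GBIF occurrence download request (hypothetical helper)."""
    # Cast to plain int so the body stays JSON-serializable even if keys are numpy integers
    predicates = [{"type": "in", "key": "TAXON_KEY", "values": [int(k) for k in taxon_keys]}]
    if require_coordinates:
        # Only georeferenced records are useful for lat/lon, so filter them on GBIF's side
        predicates.append({"type": "equals", "key": "HAS_COORDINATE", "value": "true"})
    query = {
        "creator": creator,
        "sendNotification": email is not None,
        "format": "SIMPLE_CSV",
        # A single predicate is sent as-is; several are wrapped in an "and" predicate
        "predicate": predicates[0] if len(predicates) == 1 else {"type": "and", "predicates": predicates},
    }
    if email is not None:
        query["notificationAddresses"] = [email]
    return query

# Placeholder username and keys, for illustration only
query = build_download_query([111, 222], creator="your_gbif_username")
print(query["predicate"]["type"])
```

Passing `taxon_keys_filtered.usageKey.tolist()` as `taxon_keys` would reproduce the query in the next cell, with the added coordinate filter.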

In [45]:
key_list = taxon_keys_filtered.usageKey.tolist()
# GBIF account credentials: downloads require authentication, so fill these in
login = ""
password = ""
# Build the download query
download_query = {}
download_query["creator"] = ""  # GBIF username of the person requesting the download
download_query["notificationAddresses"] = [""]
download_query["sendNotification"] = False  # if set to True, add an address to notificationAddresses above
download_query["format"] = "SIMPLE_CSV"
download_query["predicate"] = {
    "type": "in",
    "key": "TAXON_KEY",
    "values": key_list
}
# Submit the download request
create_download_given_query(login, password, download_query)

<Response [401]>


<Response [401]>

Inspecting the full response object confirms the `401 Unauthorized` status:

In [31]:
create_download_given_query(login, password, download_query).__dict__

<Response [401]>


{'_content': b'',
 '_content_consumed': True,
 '_next': None,
 'status_code': 401,
 'headers': {'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '0', 'Pragma': 'no-cache', 'Expires': '0', 'X-Frame-Options': 'DENY', 'Content-Length': '0', 'Date': 'Tue, 29 Jul 2025 18:49:16 GMT', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'X-Varnish': '846922285', 'Age': '0', 'Via': '1.1 varnish (Varnish/6.6)', 'Connection': 'keep-alive'},
 'raw': <urllib3.response.HTTPResponse at 0x7f1fe624cdc0>,
 'url': 'http://api.gbif.org/v1/occurrence/download/request',
 'encoding': None,
 'history': [],
 'reason': 'Unauthorized',
 'cookies': <RequestsCookieJar[]>,
 'elapsed': datetime.timedelta(microseconds=273799),
 'request': <PreparedRequest [POST]>,
 'connection': <requests.adapters.HTTPAdapter at 0x7f1fe6212aa0>}
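Both 401 responses most likely come from the empty `login` and `password` strings. The sketch below shows a hypothetical replacement for the request step. `load_gbif_credentials`, `prepare_download_request`, and `submit` are our own names, since the implementation of `create_download_given_query` is not shown. The sketch reads credentials from environment variables and fails before contacting GBIF if they are missing. The variable names `GBIF_USER` and `GBIF_PWD` follow pygbif's convention for its download functions, so the same settings would also work for `occ.download`. The URL is taken from the failed request above, with `https` instead of `http`.

```python
import json
import os
import requests

GBIF_DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"

def load_gbif_credentials(env=None):
    """Read the GBIF username and password from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ("GBIF_USER", "GBIF_PWD") if not env.get(name)]
    if missing:
        # Failing early beats sending empty credentials and getting a 401 back
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return env["GBIF_USER"], env["GBIF_PWD"]

def prepare_download_request(query, env=None):
    """Build, but do not send, the authenticated POST for a GBIF download."""
    login, password = load_gbif_credentials(env)
    request = requests.Request("POST", GBIF_DOWNLOAD_URL, auth=(login, password), json=query)
    return request.prepare()

def submit(prepared):
    """Send a prepared request; raises requests.HTTPError on 4xx/5xx responses."""
    with requests.Session() as session:
        response = session.send(prepared)
    response.raise_for_status()
    # On success, our understanding is that GBIF replies with the download key as plain text
    return response.text

# Offline demonstration with placeholder credentials; nothing is sent
demo = prepare_download_request({"format": "SIMPLE_CSV"}, env={"GBIF_USER": "user", "GBIF_PWD": "secret"})
print(demo.method, demo.url)
print(json.loads(demo.body))
```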

In [29]:
download_query

{'creator': '',
 'notificationAddresses': [''],
 'sendNotification': False,
 'format': 'SIMPLE_CSV',
 'predicate': {'type': 'in',
  'key': 'TAXON_KEY',
  'values': [7795888,
   2930137,
   5285951,
   3133927,
   3045203,
   7499732,
   5290143,
   7219519,
   7068845,
   5372392,
   8324121,
   3052680,
   3934083,
   7560481,
   5290052,
   2926379,
   2982583,
   5359660,
   3052436,
   3084711,
   5375920,
   2949728,
   5341297,
   2651591,
   5373124,
   2705081,
   5415040,
   5350523,
   3933330,
   5289698,
   2594602,
   3152666,
   4097481,
   2964167,
   3043392,
   5823097,
   5281381,
   4142653,
   5350466,
   2704161,
   5357027,
   5418431,
   5373475,
   2972043,
   5421389,
   3933091,
   7911813,
   2974832,
   4932035,
   2703316,
   5353583,
   2965280,
   5361880,
   2895345,
   8149923,
   2863967,
   5288819,
   6026968,
   2754476,
   2669001,
   3152205,
   3001244,
   3040232,
   2705116,
   2775744,
   3049314,
   2706056,
   2985993,
   10961835,
   288838
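Once a download succeeds, GBIF serves the SIMPLE_CSV result as a zip archive. Despite the format's name, our understanding is that the file inside is tab-delimited. The sketch below pulls only the taxon and coordinate columns from such an archive. The helper name, the single-file assumption, and the column selection are our own assumptions, so check them against the header of an actual download.

```python
import csv
import io
import zipfile

import pandas as pd

COLUMNS = ["taxonKey", "species", "decimalLatitude", "decimalLongitude"]

def read_simple_csv(zip_source):
    """Read taxon and coordinate columns from a GBIF SIMPLE_CSV download archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_source) as archive:
        # Assumes the archive holds a single data file
        data_name = archive.namelist()[0]
        with archive.open(data_name) as handle:
            # QUOTE_NONE stops quote characters inside free-text fields from being treated as quoting
            occurrences = pd.read_csv(handle, sep="\t", usecols=COLUMNS, quoting=csv.QUOTE_NONE)
    # Drop records without usable coordinates
    return occurrences.dropna(subset=["decimalLatitude", "decimalLongitude"])

# Build a tiny in-memory archive with made-up rows, for illustration only
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "occurrence.csv",
        "gbifID\ttaxonKey\tspecies\tdecimalLatitude\tdecimalLongitude\n"
        "1\t10\tA a\t1.5\t2.5\n"
        "2\t10\tA a\t\t\n",
    )
buffer.seek(0)
print(read_simple_csv(buffer))
```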

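The remaining cells experiment with querying GBIF directly through pygbif. They are adapted from the pygbif documentation examples, including its example species list.

In [11]:
from pygbif import species as gbif_species  # aliased so it does not overwrite the `species` DataFrame
splist = ['Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa',
  'Ursus americanus', 'Pinus conorta', 'Poa annuus']
keys = [ gbif_species.name_backbone(x)['usageKey'] for x in splist ]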

In [4]:
keys

[2482598, 9362842, 2498387, 2433407, 5285750, 2704179]
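Two names in `splist` are misspelled (`Pinus conorta`, `Poa annuus`), yet every name received a key. Presumably this is because the backbone lookup falls back to fuzzy matching. If you keep the whole match dict instead of only `usageKey`, such cases become visible. Below is a small sketch; `flag_inexact_matches` is a hypothetical helper.

```python
def flag_inexact_matches(names, matches):
    """Return (queried name, matchType, canonicalName) for every match that is not EXACT."""
    flagged = []
    for name, match in zip(names, matches):
        if match.get("matchType") != "EXACT":
            flagged.append((name, match.get("matchType"), match.get("canonicalName")))
    return flagged

# Usage with live data (requires network access):
#   matches = [gbif_species.name_backbone(x) for x in splist]
#   keys = [m.get('usageKey') for m in matches]
#   flag_inexact_matches(splist, matches)
```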

In [7]:
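from pygbif import occurrences as occ
# limit=0 returns only the search metadata (including the record count), not the records
out = [ occ.search(taxonKey = x, limit=0) for x in keys ]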
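With `limit=0`, each `occ.search` call returns a dict of metadata, and the total number of matching records is stored under its `count` key. The sketch below collects those totals per species so they can be compared at a glance. `occurrence_counts` is a hypothetical helper.

```python
def occurrence_counts(names, search_results):
    """Map each species name to the record count from its limit=0 search, most records first."""
    counts = {name: result["count"] for name, result in zip(names, search_results)}
    # Sort from most to fewest records
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

# Usage after the cell above: occurrence_counts(splist, out)
```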

In [18]:
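from pygbif import occurrences as occ
# Occurrence search filtered by a taxon key
occ.search(taxonKey = 3329049)
# occ.get() expects an occurrence key, not a taxon key; passing the taxon key
# 3329049 is what raised the 404 below, so the call is disabled here
# occ.get(key = 3329049)
occ.count(isGeoreferenced = True)
# Creating or listing downloads requires GBIF credentials, passed as user/pwd/email
# arguments or set in the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables
occ.download('basisOfRecord = PRESERVED_SPECIMEN')
occ.download('taxonKey = 3119195')
occ.download('decimalLatitude > 50')
occ.download_list(user = "sckott", limit = 5)
# Look up existing downloads by their download keys
occ.download_meta(key = "0000099-140929101555934")
occ.download_get("0000066-140928181241064")
occ.download_citation("0002526-241107131044228")
occ.download_describe("simpleCsv")
occ.download_sql("SELECT gbifid,countryCode FROM occurrence WHERE genusKey = 2435098")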

HTTPError: 404 Client Error: Not Found for url: https://api.gbif.org/v1/occurrence/3329049
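For small checks, `occ.search` can fetch coordinates directly without going through the download API. Each record in the `results` list carries `decimalLatitude` and `decimalLongitude` when it is georeferenced. Search results are paged and capped, so the download API remains the route for complete data. Below is a sketch that turns one search response into a lat/lon table. `results_to_latlon` and the chosen columns are our assumptions.

```python
import pandas as pd

COLUMNS = ["key", "taxonKey", "scientificName", "decimalLatitude", "decimalLongitude"]

def results_to_latlon(search_result):
    """Turn the 'results' list of an occ.search response into a lat/lon DataFrame."""
    records = search_result.get("results", [])
    # Selecting columns from a list of dicts fills absent keys with NaN
    frame = pd.DataFrame(records, columns=COLUMNS)
    return frame.dropna(subset=["decimalLatitude", "decimalLongitude"]).reset_index(drop=True)

# Usage with live data (requires network access):
#   res = occ.search(taxonKey = keys[0], hasCoordinate = True, limit = 100)
#   results_to_latlon(res)
```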