# German Mammals GBIF

Using data from GBIF to create a list of German mammals.

In [None]:
from pygbif import species
from pygbif import occurrences as occ  # we refer to the occurrences module as occ

import os

# Set GBIF credentials (replace the placeholders with your own account details)
os.environ["GBIF_USER"] = "your_username"
os.environ["GBIF_PWD"] = "your_password"
os.environ["GBIF_EMAIL"] = "your_email@example.com"

Finding the keys for the different taxon levels can be a little tricky if you do not know where or how to look.

For example, here we want to find mammals in Germany. We can check the documentation of occ.search() by following this link: https://pygbif.readthedocs.io/en/latest/docs/usecases.html

There we find that the class key is an integer: "classKey – [int] Class classification key". But we are not given a list of classes and their corresponding integers, which is confusing (if such a list exists, please send me the link). If you are like me, you will try entering a string like 'mammalia' anyway, only to get a traceback. What you can do instead is go to the GBIF page for the taxon of interest and pull the key from its URL. For mammals that page is https://www.gbif.org/species/359, so classKey = 359.
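Pulling the key out of the URL can also be done in code. The following is a minimal sketch using only the standard library; the helper name `key_from_species_url` is made up for this example and is not part of pygbif. It assumes the URL has the form `https://www.gbif.org/species/<number>`.

```python
from urllib.parse import urlparse


def key_from_species_url(url: str) -> int:
    """Return the numeric key at the end of a gbif.org species URL."""
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    # Expect a path like /species/359; anything else is rejected.
    if (
        len(path_parts) < 2
        or path_parts[-2] != "species"
        or not path_parts[-1].isdigit()
    ):
        raise ValueError(f"not a GBIF species URL: {url!r}")
    return int(path_parts[-1])


class_key = key_from_species_url("https://www.gbif.org/species/359")
print(class_key)
```

The resulting integer can then be passed as classKey to occ.search().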

occ.search() allows us to specify several parameters, including country. Here the documentation is fairly straightforward: 'country – [str] The 2-letter country code (as per ISO-3166-1) of the country in which the occurrence was recorded. See here http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2'

We can visit the link and find that the two letter string for Germany is 'DE', or that Samoa is 'WS'. 

There are several other arguments that are important:
1. limit: the number of records to return. The default is 300 and the limit is 1000.
2. offset: the index of the record to start from.
3. q: search with a word or phrase.
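Together, limit and offset let us page through more records than a single call returns. The following is a rough sketch of that loop, not something taken from the pygbif documentation. The function `fetch_records` and the stand-in `fake_search` are hypothetical names for this example. The stand-in only mimics the 'offset', 'limit', 'endOfRecords' and 'results' fields of a search response, so the sketch runs without network access.

```python
from typing import Any, Callable, Dict, List


def fetch_records(search: Callable[..., Dict[str, Any]], max_records: int,
                  page_size: int = 300, **filters: Any) -> List[Dict[str, Any]]:
    """Collect up to max_records results by advancing offset page by page."""
    records: List[Dict[str, Any]] = []
    offset = 0
    while len(records) < max_records:
        limit = min(page_size, max_records - len(records))
        page = search(limit=limit, offset=offset, **filters)
        results = page.get("results", [])
        records.extend(results)
        offset += len(results)
        # Stop when the response reports the end, or when a page is empty.
        if page.get("endOfRecords", True) or not results:
            break
    return records


def fake_search(limit: int, offset: int, **filters: Any) -> Dict[str, Any]:
    """Offline stand-in for occ.search that serves 25 fake records."""
    data = [{"key": i} for i in range(25)]
    return {
        "offset": offset,
        "limit": limit,
        "endOfRecords": offset + limit >= len(data),
        "results": data[offset:offset + limit],
    }


first_ten = fetch_records(fake_search, max_records=10, page_size=4,
                          classKey=359, country="DE")
print(len(first_ten))
```

With pygbif installed, the same helper should work as fetch_records(occ.search, max_records=600, classKey=359, country='DE'), assuming occ.search accepts limit and offset as keyword arguments in the way described above.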

Alright, now let's try it by looking at the first 10 mammal records.

In [7]:
occ.search(classKey=359,country='DE', limit=10)

{'offset': 0,
 'limit': 10,
 'endOfRecords': False,
 'count': 767703,
 'results': [{'key': 5028885891,
   'datasetKey': 'aa6c5ee6-d4d7-4a65-a04f-379cffbf4842',
   'publishingOrgKey': '2754e9c0-0e43-4f65-968a-6f16b9c378ce',
   'installationKey': 'dcceb601-2fb0-49dc-9cd2-7c00056f2b2c',
   'hostingOrganizationKey': '2754e9c0-0e43-4f65-968a-6f16b9c378ce',
   'publishingCountry': 'DE',
   'protocol': 'BIOCASE',
   'lastCrawled': '2025-05-09T16:19:17.871+00:00',
   'lastParsed': '2025-05-09T16:32:59.595+00:00',
   'crawlId': 345,
   'extensions': {},
   'basisOfRecord': 'HUMAN_OBSERVATION',
   'occurrenceStatus': 'PRESENT',
   'taxonKey': 5220126,
   'kingdomKey': 1,
   'phylumKey': 44,
   'classKey': 359,
   'orderKey': 731,
   'familyKey': 5298,
   'genusKey': 2440927,
   'speciesKey': 5220126,
   'acceptedTaxonKey': 5220126,
   'scientificName': 'Capreolus capreolus (Linnaeus, 1758)',
   'acceptedScientificName': 'Capreolus capreolus (Linnaeus, 1758)',
   'kingdom': 'Animalia',
   'phylum

The output above is not particularly readable, and it is not in the table format I would like for a list of species. The total number of records is also easy to miss in it.
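One way to get closer to a table is to keep only a few fields from each record. The following is a small sketch using the standard library. The helpers `to_rows` and `to_csv` and the choice of columns in `FIELDS` are assumptions made for this example. The sample dictionary is a cut-down copy of the first record shown above, so nothing is fetched from GBIF.

```python
import csv
import io
from typing import Any, Dict, List, Sequence

# Columns to keep; an arbitrary selection for this example.
FIELDS = ["key", "scientificName", "basisOfRecord", "publishingCountry"]


def to_rows(response: Dict[str, Any],
            fields: Sequence[str] = FIELDS) -> List[Dict[str, Any]]:
    """Keep only the chosen fields of each record; missing fields become ''."""
    return [{name: record.get(name, "") for name in fields}
            for record in response.get("results", [])]


def to_csv(rows: List[Dict[str, Any]], fields: Sequence[str] = FIELDS) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Cut-down copy of the search response shown above.
sample = {
    "count": 767703,
    "results": [{
        "key": 5028885891,
        "publishingCountry": "DE",
        "crawlId": 345,
        "basisOfRecord": "HUMAN_OBSERVATION",
        "scientificName": "Capreolus capreolus (Linnaeus, 1758)",
    }],
}

rows = to_rows(sample)
print(to_csv(rows))
```

The same rows could be handed to a data frame library instead of the csv module. The response's 'count' field holds the total number of matching records.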

To find the number of records, let's use occ.count(). Here, however, there is no classKey argument; instead we use taxonKey.

In [5]:
occ.count(taxonKey=359,country='DE')

767703

767,703 records is a lot.

In [8]:
occ.count(taxonKey=359,country='DE', isGeoreferenced=True)

690231

We can see there are fewer records if we require the records to be georeferenced.
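To put the two counts side by side, here is a quick calculation using the numbers returned above. The variable names are only for illustration.

```python
total_records = 767703   # occ.count(taxonKey=359, country='DE')
georeferenced = 690231   # same call with isGeoreferenced=True

# Records that would be dropped by requiring coordinates.
without_coordinates = total_records - georeferenced
share_georeferenced = georeferenced / total_records

print(without_coordinates)
print(f"{share_georeferenced:.1%}")
```

So roughly nine in ten of the German mammal records carry coordinates, and about 77 thousand do not.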

Let's see if we can download the data and then simplify the output.

An interesting quirk of the occ.download() method is that filters are not passed as keyword arguments, as they are in occ.search(), but as query strings such as 'taxonKey = 359'. Here they are given as a list:

In [None]:
download_key = occ.download(
    [
        'taxonKey = 359',
        'country = DE',
        'hasCoordinate = true',
        'hasGeospatialIssue = false'
    ],
    format="DWCA"  # or "SIMPLE_CSV", "SPECIES_LIST"
)

ValueError: GBIF_USER not supplied and no entry in environmental
                           variables
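The error suggests that GBIF_USER was not present in the environment when the call ran, for example because the credentials cell had not been executed first. The following is a small pre-flight sketch that checks for the three variables set at the top of this notebook and builds the query strings from a dictionary. Both helpers, `missing_credentials` and `build_queries`, are hypothetical names written for this example and are not part of pygbif.

```python
import os
from typing import Any, Dict, List, Mapping

# The variable names used in the credentials cell at the top.
REQUIRED_VARS = ("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_queries(filters: Dict[str, Any]) -> List[str]:
    """Turn {'taxonKey': 359} into ['taxonKey = 359'], lower-casing booleans."""
    queries = []
    for key, value in filters.items():
        if isinstance(value, bool):
            value = str(value).lower()
        queries.append(f"{key} = {value}")
    return queries


queries = build_queries({
    "taxonKey": 359,
    "country": "DE",
    "hasCoordinate": True,
    "hasGeospatialIssue": False,
})
print(queries)

# Example with an incomplete environment, passed in as a plain dict.
print(missing_credentials({"GBIF_USER": "your_username"}))
```

If missing_credentials() returns an empty list when called without arguments, the list from build_queries() can be passed to occ.download() in place of the hand-written strings.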