This notebook works from a local MongoDB dump of the GAP range data that I assembled for processing. Some code values in the data did not match the code values defined in the gapproduction package. The notebook checks the data and reports every record that has a code value with no corresponding definition/term in the dictionary.

In [1]:
from pymongo import MongoClient
from IPython.display import display

In [2]:
mdbClient = MongoClient()
bis = mdbClient["bis"]
gapRangeDataCache = bis["gapRangeData"]

This is the dictionary of codes and values for range data that I pulled from https://github.com/nmtarr/GAPProduction/blob/master/gapproduction/dictionaries.py.

Eventually, we need to move this content to an appropriate master vocabulary source as part of the Biogeographic Information System. I set up a group of vocabularies for this [here](https://www.sciencebase.gov/vocab/category/59f60211e4b09d26336e76eb) and tried to use the ScienceBase Vocab API to drive the code that assembles a CSV cache of the range data for a given species. Unfortunately, the Vocab app seems to have issues with how it builds its index, so I could not use that source directly. If those issues cannot be resolved, we may need to find another platform to hold all of the BIS vocabularies.

In [3]:
RangeCodesDict = {"Presence": {1: "Known/extant", 2: "Possibly present", 3: "Potential for presence",
                               4: "Extirpated/historical presence",
                               5: "Extirpated purposely (applies to introduced species only)",
                               6: "Occurs on indicated island chain", 7: "Unknown"},
                  "Origin": {1: "Native", 2: "Introduced", 3: "Either introduced or native",
                             4: "Reintroduced", 5: "Either introduced or reintroduced",
                             6: "Vagrant", 7: "Unknown"},
                  "Reproduction": {1: "Breeding", 2: "Nonbreeding",
                                   3: "Both breeding and nonbreeding", 7: "Unknown"},
                  "Season": {1: "Year-round", 2: "Migratory", 3: "Winter", 4: "Summer",
                             5: "Passage migrant or wanderer", 6: "Seasonal permanence uncertain",
                             7: "Unknown", 8: "Vagrant"}}

The next four code blocks check the full set of range data across all four taxa groups for errant code numbers, one block per coded field. The presence and origin checks return nothing. The reproduction and season checks return records for only two species (bCOEIx and bCOTEx), all with a code value of 0. These records need to be examined to determine whether we made a mistake in assigning codes or need to add new definitions.
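The four checks share one pattern: a `$nin` filter on a coded field, built from the valid code numbers for the matching category. A minimal sketch of that pattern follows. The names `valid_codes`, `field_to_category`, and `build_errant_code_filters` are hypothetical and not part of the notebook or of gapproduction; the code lists are copied from `RangeCodesDict` so the block runs on its own.

```python
# Valid code numbers per category, copied from the keys of RangeCodesDict.
valid_codes = {
    "Presence": [1, 2, 3, 4, 5, 6, 7],
    "Origin": [1, 2, 3, 4, 5, 6, 7],
    "Reproduction": [1, 2, 3, 7],
    "Season": [1, 2, 3, 4, 5, 6, 7, 8],
}

# Coded field names as they appear in the gapRangeData records.
field_to_category = {
    "intGapPres": "Presence",
    "intGapOrigin": "Origin",
    "intGapRepro": "Reproduction",
    "intGapSeas": "Season",
}


def build_errant_code_filters(field_to_category, valid_codes):
    """Return one MongoDB $nin filter per coded field (hypothetical helper)."""
    return {
        field: {field: {"$nin": list(valid_codes[category])}}
        for field, category in field_to_category.items()
    }


errant_filters = build_errant_code_filters(field_to_category, valid_codes)

# In the notebook, each filter could then be passed to the collection:
# for field, query in errant_filters.items():
#     for rangeRecord in gapRangeDataCache.find(query):
#         display(rangeRecord)
```

Building the filters from a mapping keeps the field names in one place, so a new coded field would only need one new entry.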

In [4]:
# Records whose presence code has no definition in the dictionary
for rangeRecord in gapRangeDataCache.find({"intGapPres": {"$nin": list(RangeCodesDict["Presence"].keys())}}):
    display(rangeRecord)

In [5]:
# Records whose origin code has no definition in the dictionary
for rangeRecord in gapRangeDataCache.find({"intGapOrigin": {"$nin": list(RangeCodesDict["Origin"].keys())}}):
    display(rangeRecord)

In [6]:
# Records whose reproduction code has no definition in the dictionary
for rangeRecord in gapRangeDataCache.find({"intGapRepro": {"$nin": list(RangeCodesDict["Reproduction"].keys())}}):
    display(rangeRecord)

{'Origin': 'Native',
 'Presence': 'Known/extant',
 'Reproduction': 'Unknown',
 'Season': 'Unknown',
 '_id': ObjectId('59f68b8a3339a20cad26f13b'),
 'intGapOrigin': 1,
 'intGapPres': 1,
 'intGapRepro': 0,
 'intGapSeas': 0,
 'strHUC12RNG': 20600020101,
 'strUC': 'bCOEIx'}

{'Origin': 'Native',
 'Presence': 'Known/extant',
 'Reproduction': 'Unknown',
 'Season': 'Unknown',
 '_id': ObjectId('59f68b8a3339a20cad26f13d'),
 'intGapOrigin': 1,
 'intGapPres': 1,
 'intGapRepro': 0,
 'intGapSeas': 0,
 'strHUC12RNG': 20600020103,
 'strUC': 'bCOEIx'}

In [7]:
# Records whose season code has no definition in the dictionary
for rangeRecord in gapRangeDataCache.find({"intGapSeas": {"$nin": list(RangeCodesDict["Season"].keys())}}):
    display(rangeRecord)

{'Origin': 'Native',
 'Presence': 'Known/extant',
 'Reproduction': 'Unknown',
 'Season': 'Unknown',
 '_id': ObjectId('59f68b8a3339a20cad26f13b'),
 'intGapOrigin': 1,
 'intGapPres': 1,
 'intGapRepro': 0,
 'intGapSeas': 0,
 'strHUC12RNG': 20600020101,
 'strUC': 'bCOEIx'}

{'Origin': 'Native',
 'Presence': 'Known/extant',
 'Reproduction': 'Unknown',
 'Season': 'Unknown',
 '_id': ObjectId('59f68b8a3339a20cad26f13d'),
 'intGapOrigin': 1,
 'intGapPres': 1,
 'intGapRepro': 0,
 'intGapSeas': 0,
 'strHUC12RNG': 20600020103,
 'strUC': 'bCOEIx'}

{'Origin': 'Native',
 'Presence': 'Known/extant',
 'Reproduction': 'Nonbreeding',
 'Season': 'Unknown',
 '_id': ObjectId('59f68d9c3339a20cad302b6f'),
 'intGapOrigin': 1,
 'intGapPres': 1,
 'intGapRepro': 2,
 'intGapSeas': 0,
 'strHUC12RNG': 30202040404,
 'strUC': 'bCOTEx'}
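With a larger result set, a tally by species, field, and code value may be easier to review than individual records. The sketch below runs the same check in plain Python against the three distinct records displayed above, reduced to their coded fields. The names `valid_codes_by_field`, `sample_records`, and `tally_errant_codes` are hypothetical. Like `$nin`, the check also flags a record where the field is missing.

```python
from collections import Counter

# Valid code numbers per coded field, copied from the keys of RangeCodesDict.
valid_codes_by_field = {
    "intGapPres": {1, 2, 3, 4, 5, 6, 7},
    "intGapOrigin": {1, 2, 3, 4, 5, 6, 7},
    "intGapRepro": {1, 2, 3, 7},
    "intGapSeas": {1, 2, 3, 4, 5, 6, 7, 8},
}

# The three distinct records shown in the output above (coded fields only).
sample_records = [
    {"strUC": "bCOEIx", "strHUC12RNG": 20600020101,
     "intGapPres": 1, "intGapOrigin": 1, "intGapRepro": 0, "intGapSeas": 0},
    {"strUC": "bCOEIx", "strHUC12RNG": 20600020103,
     "intGapPres": 1, "intGapOrigin": 1, "intGapRepro": 0, "intGapSeas": 0},
    {"strUC": "bCOTEx", "strHUC12RNG": 30202040404,
     "intGapPres": 1, "intGapOrigin": 1, "intGapRepro": 2, "intGapSeas": 0},
]


def tally_errant_codes(records, valid_codes_by_field):
    """Count records per (species code, field, value) with an undefined code."""
    tally = Counter()
    for record in records:
        for field, codes in valid_codes_by_field.items():
            value = record.get(field)  # None if the field is missing
            if value not in codes:
                tally[(record["strUC"], field, value)] += 1
    return tally


errant_tally = tally_errant_codes(sample_records, valid_codes_by_field)
for (species, field, value), count in errant_tally.most_common():
    print(species, field, value, count)
```

Against the full collection, the same function could take the cursor returned by `gapRangeDataCache.find()` in place of `sample_records`.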