This notebook evaluates and populates a new SGCN-specific set of information in the TIR with a logical taxonomic group value for use in the SGCN National List and elsewhere. It uses a table of logical mappings from the SGCN schema (sgcn.adjustments) that translates submitted taxonomic groups into the groups we want to use for consistency and clarity in the National List. In the future, we will update this process to build the mapping from the taxonomy provided by the taxonomic authorities (e.g., "aves" to "birds"), but the data structures in the TIR and SGCN will remain consistent.
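
As a rough illustration of the mapping logic described above, the sketch below applies a few rows from the mapping table to submitted group strings in plain Python. The helper names (`logical_group`, `sgcn_pair`) are hypothetical and are not part of the notebook. The substring test only approximates the SQL `LIKE '%name%'` match that the notebook runs against the database.

```python
# A few (ProvidedName, PreferredName) rows taken from the notebook's printed output.
# The real table is read from a ScienceBase file; this list is only for illustration.
TG_MAPPINGS = [
    ("Bird", "Birds"),
    ("Freshwater Mussel", "Mollusks"),
    ("Snail", "Mollusks"),
]


def logical_group(submitted_group, mappings=TG_MAPPINGS, default="other"):
    """Return the preferred group for a submitted group string (hypothetical helper)."""
    result = default
    for provided_name, preferred_name in mappings:
        # Case-sensitive substring test, approximating LIKE '%name%'
        if provided_name in submitted_group or preferred_name in submitted_group:
            # Later rows overwrite earlier matches, as successive UPDATE statements would
            result = preferred_name
    return result


def sgcn_pair(group):
    """Format the key/value pair string that the notebook writes to the sgcn column."""
    return '"taxonomicgroup"=>"' + group + '"'


print(sgcn_pair(logical_group("Freshwater Mussels")))
print(sgcn_pair(logical_group("Lichens")))
```

Anything that matches no row falls through to the default, which corresponds to the final "other" step in this notebook.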

In [1]:
import requests
import pandas as pd
from bis2 import gc2

In [2]:
# Retrieve the logical mappings from the configuration file attached to the SGCN collection item in ScienceBase
sb_sgcnCollectionItem = requests.get("https://www.sciencebase.gov/catalog/item/56d720ece4b015c306f442d5?format=json&fields=files").json()

for file in sb_sgcnCollectionItem["files"]:
    if file["title"] == "Configuration:Taxonomic Group Mappings":
        # The file is comma-delimited with ProvidedName and PreferredName columns
        tgMappings = pd.read_csv(file["url"], encoding="utf-8")
        break

def updateGroupsQuery(providedName, preferredName):
    # Build the UPDATE that sets the sgcn pair on every TIR record whose registered
    # taxonomic group contains either the provided name or the preferred name
    preferredNamePair = '"taxonomicgroup"=>"'+preferredName+'"'
    return "UPDATE tir.tir SET sgcn = '"+preferredNamePair+"' WHERE registration->'taxonomicgroups' LIKE '%"+providedName+"%' OR registration->'taxonomicgroups' LIKE '%"+preferredName+"%'"

for index, row in tgMappings.iterrows():
    providedName = str(row["ProvidedName"])
    preferredName = str(row["PreferredName"])
    print(providedName, preferredName)
    q_updateGroups = updateGroupsQuery(providedName, preferredName)
    r = requests.get(gc2.sqlAPI("DataDistillery","BCB")+"&q="+q_updateGroups).json()

    # Names starting with "Ec" and "Ce" come back with an error message instead of a result.
    # Likely cause (not confirmed): the query is not URL-encoded, so "%Ec" and "%Ce" are read as percent escapes.
    # Workaround: strip the first character from the provided name and try the query again.
    # Stop before the name is empty so the LIKE pattern never degrades to '%%'.
    while "message" in r.keys() and len(providedName) > 1:
        providedName = providedName[1:]
        print(providedName, preferredName)
        q_updateGroups = updateGroupsQuery(providedName, preferredName)
        r = requests.get(gc2.sqlAPI("DataDistillery","BCB")+"&q="+q_updateGroups).json()

Amphibian Amphibians
Arthropod Other Invertebrates
Bird Birds
Bivalves Mollusks
Bryophytes Plants
Cnidarians Other Invertebrates
Flatworm Other Invertebrates
Freshwater Mussel Mollusks
Gastropods Mollusks
Insect Insects
Invertebrate Other Invertebrates
Invertebrates Other Invertebrates
Mammal Mammals
Mite Arachnids
Mussel Mollusks
Myriapods Other Invertebrates
Plant Plants
Poriferans Other Invertebrates
Reptile Reptiles
Snail Mollusks
Spider Arachnids
Vascular Plants Plants
Worms Other Invertebrates
Fishes Fish
Echinoderms Other Invertebrates
chinoderms Other Invertebrates
Cephalopods Other Invertebrates
ephalopods Other Invertebrates
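
The two retried names in the output above ("Echinoderms" and "Cephalopods") are the only ones whose first two letters form a valid hexadecimal pair. One plausible explanation, not verified against the server, is that the SQL is appended to the URL without percent-encoding, so `%Ec` and `%Ce` are decoded as escape sequences before PostgreSQL sees them. The sketch below reproduces that effect locally with Python's standard `urllib.parse`. It assumes the server decodes the `q` parameter in a comparable way.

```python
from urllib.parse import quote, unquote


def like_clause(name):
    """Build the same kind of LIKE pattern the notebook concatenates into its query."""
    return "LIKE '%" + name + "%'"


raw_ok = like_clause("Amphibian")
raw_bad = like_clause("Echinoderms")

# "%Am" is not a valid percent escape, so decoding leaves the pattern untouched
decoded_ok = unquote(raw_ok)

# "%Ec" is a valid escape (byte 0xEC), so decoding consumes the wildcard and two letters
decoded_bad = unquote(raw_bad)

# Percent-encoding the query before appending it makes the round trip lossless
encoded = quote(raw_bad, safe="")
round_trip = unquote(encoded)

print(decoded_ok == raw_ok)    # True
print(decoded_bad == raw_bad)  # False
print(round_trip == raw_bad)   # True
```

If this is indeed the cause, wrapping the query string in `quote(..., safe="")` before appending it after `&q=` would remove the need to strip characters. That change has not been tested against the API used here.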


This last step sets the taxonomic group to "other" on any records whose sgcn value is still null, which catches corner cases that the mappings did not cover. We'll eventually come back and add a few more mappings to deal with these.

In [3]:
otherGroupPair = '"taxonomicgroup"=>"other"'
q_updateOther = "UPDATE tir.tir SET sgcn = '"+otherGroupPair+"' WHERE sgcn IS NULL"
r = requests.get(gc2.sqlAPI("DataDistillery","BCB")+"&q="+q_updateOther).json()
print (r)

{'success': True, '_execution_time': 0.064, 'affected_rows': 0, 'auth_check': {'success': True, 'session': None, 'auth_level': None}}
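
The response above reports `affected_rows` of 0, meaning no record was left with a null sgcn value when this cell ran. A small check like the following could make that reading explicit. `affectedRows` is a hypothetical helper. It assumes only the keys visible in the printed response, plus the `message` key that the mapping loop tests for on failure.

```python
def affectedRows(response):
    """Return the number of rows changed, or raise if the API reported a failure."""
    if "message" in response or not response.get("success", False):
        # Failed calls are assumed to carry a "message" key, as the mapping loop expects
        raise RuntimeError(response.get("message", "SQL API call failed"))
    return response.get("affected_rows", 0)


# The response printed by the cell above
r = {'success': True, '_execution_time': 0.064, 'affected_rows': 0,
     'auth_check': {'success': True, 'session': None, 'auth_level': None}}

print(affectedRows(r))
```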
