This notebook is a work in progress that compares my derivation of Abby's original name-processing code, which used the R taxize package interactively, against her results. The query looks for cases where Abby's process found a match but my process, designed to run in a completely automated way, did not. I used this comparison to tease out some remaining pesky issues in the ITIS notebook for the Taxonomic Information Registry (TIR). When I set this aside, there were only 29 names that the new process did not resolve.

In reviewing these remaining cases, I found the following issues, which I don't think are resolvable but also don't think are showstoppers for running this in an automated way:

* There are cases at higher taxonomic levels where more than one taxon is "valid" or "accepted" in ITIS. Other annotation properties indicate the level of completeness, and in a human-driven process with R taxize or something like it, a user can choose based on that annotation. My current process instead moves on and records a negative result whenever a search constrained to "valid" or "accepted" records returns more than one match. We could tweak the algorithm to look for clues in the completeness properties and make a reasonable selection, but I don't think that added complexity is worth it for the relatively rare cases where this comes up.

* There are cases where it looks like Abby did some additional work while the script was running. Some names do not come up in any kind of search, and tracing the selections I think Abby made interactively, I can see that genus names changed or that other parts of the name differ substantially. We might eventually handle this in the Taxonomic Information Registry with an annotation structure that links names encountered in the registry to knowledge not captured in a taxonomic authority; processing algorithms could then use those annotations when looking for improved information. We could probably also run more complex searches that start at the genus or another higher taxonomic level and then explore the tree, but the gain in additional name matches doesn't seem worth it at this time.

* There are a couple of cases of additional scientific name shorthand that Abby cleaned up (e.g., "Procambarus h. hagenianus" to "Procambarus hagenianus hagenianus"). We could try handling this in the cleaning function, but it will be challenging to distinguish those cases from ones where an abbreviated string in the name means something different. Again, the gain is not worth it at this time.
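The "exactly one valid or accepted match" rule from the first point can be sketched as follows. This is a minimal sketch, not the notebook's actual code: it assumes each search hit is a dict with a `usage` field holding values such as "valid" or "accepted", and the function name `select_single_accepted` and the example records are hypothetical.

```python
from typing import Dict, List, Optional

# Usage values treated as acceptable. Assumption: the ITIS search returns
# them in a field named "usage".
ACCEPTED_USAGES = {"valid", "accepted"}


def select_single_accepted(hits: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the single valid/accepted hit, or None.

    Zero matches and more than one match are both treated as a negative
    result, leaving ambiguous higher-level taxa for a human to resolve.
    """
    accepted = [h for h in hits if h.get("usage", "").lower() in ACCEPTED_USAGES]
    if len(accepted) == 1:
        return accepted[0]
    return None


# Example with made-up records: one valid hit and one invalid hit.
example_hits = [
    {"tsn": "1", "usage": "valid"},
    {"tsn": "2", "usage": "invalid"},
]
print(select_single_accepted(example_hits))  # {'tsn': '1', 'usage': 'valid'}
```

Treating "more than one" the same as "none" keeps the automated run conservative: it never guesses between competing accepted taxa, which is the trade-off described above.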
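The shorthand cleanup in the last point could be approached with a narrow heuristic like the one below. It is a hypothetical sketch (`expand_repeated_epithet` is not part of the notebook) that only expands a single-letter abbreviation when the letter matches the first letter of the following epithet, on the assumption that such names are nominate subspecies like the Procambarus example. It would still mis-expand a name whose abbreviated species epithet happens to share an initial with a different subspecies epithet, which is the ambiguity noted above.

```python
import re

# Matches "Genus x. epithet": a capitalized genus, a single lowercase letter
# followed by a period, then a lowercase epithet.
ABBREV_PATTERN = re.compile(r"^([A-Z][a-z]+) ([a-z])\. ([a-z]+)$")


def expand_repeated_epithet(name: str) -> str:
    """Expand 'Genus h. hagenianus' to 'Genus hagenianus hagenianus'.

    Only expands when the abbreviation's letter matches the first letter of
    the following epithet; any other name is returned unchanged so that
    ambiguous shorthand is not rewritten.
    """
    match = ABBREV_PATTERN.match(name.strip())
    if match is None:
        return name
    genus, initial, epithet = match.groups()
    if epithet[0] != initial:
        return name
    return f"{genus} {epithet} {epithet}"


print(expand_repeated_epithet("Procambarus h. hagenianus"))
```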

In [1]:
import configparser

import requests
from IPython.display import display


In [2]:
# Get API keys and any other config details from a file that is external to the code.
config = configparser.RawConfigParser()
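# Use a context manager so the config file handle is closed after reading.
with open(r'../config/stuff.py') as configFile:
    config.read_file(configFile)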

In [3]:
# Build base URL with API key using input from the external config.
def getBaseURL():
    gc2APIKey = config.get('apiKeys','apiKey_GC2_BCB').replace('"','')
    apiBaseURL = "https://gc2.mapcentia.com/api/v1/sql/bcb?key="+gc2APIKey
    return apiBaseURL
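The external config file is not shown here. Assuming it is an INI-style file with an `[apiKeys]` section (the key name comes from `getBaseURL` above; the value below is a placeholder), the sketch below parses it from a string and builds a request URL with the SQL percent-encoded by `urllib.parse.urlencode`, so characters like `%` in a `LIKE '%itis%'` pattern reach the server intact. `build_sql_url` is a hypothetical helper, and the sketch assumes the GC2 API decodes standard form-encoded query strings.

```python
import configparser
from urllib.parse import urlencode

# Assumed layout of the external config file; the real file may hold more.
SAMPLE_CONFIG = """
[apiKeys]
apiKey_GC2_BCB = "not-a-real-key"
"""


def build_sql_url(config: configparser.RawConfigParser, sql: str) -> str:
    """Build a GC2 SQL API URL with the key and query percent-encoded."""
    # The stored value is quoted, so strip the quotes as getBaseURL does.
    api_key = config.get("apiKeys", "apiKey_GC2_BCB").replace('"', "")
    # urlencode escapes '%', spaces, and quotes in the SQL.
    params = urlencode({"key": api_key, "q": sql})
    return "https://gc2.mapcentia.com/api/v1/sql/bcb?" + params


# Named sample_config so it does not overwrite the notebook's real config.
sample_config = configparser.RawConfigParser()
sample_config.read_string(SAMPLE_CONFIG)
print(build_sql_url(sample_config, "SELECT 1 WHERE x LIKE '%itis%'"))
```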

In [22]:
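# ITIS IDs from Abby's matches (public.sgcn) that do not appear in the
# automated process's results (sgcn.sgcn_nationallist).
q = """
SELECT DISTINCT ON (taxonomicauthorityid_accepted)
    taxonomicauthorityid_accepted, scientificname_submitted, scientificname_cleaned
FROM public.sgcn
WHERE taxonomicauthorityid_accepted LIKE '%itis%'
    AND taxonomicauthorityid_accepted NOT IN
    (SELECT taxonomicauthorityid FROM sgcn.sgcn_nationallist)
"""
# Pass the SQL via params so requests percent-encodes it, including the % signs in LIKE '%itis%'.
r = requests.get(getBaseURL(), params={"q": q}).json()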

recordCount = 0

for feature in r["features"]:
    print ("ITIS ID: "+feature["properties"]["taxonomicauthorityid_accepted"])
    print ("Submitted name: "+feature["properties"]["scientificname_submitted"])
    print ("Abby's clean name: "+feature["properties"]["scientificname_cleaned"])
    print ("----")
    # Optional: clear the ITIS result in tir.tir2 for this submitted name.
#    print (requests.get(getBaseURL()+"&q=UPDATE tir.tir2 SET itis = Null WHERE registration->'SGCN_ScientificName_Submitted' = '"+feature["properties"]["scientificname_submitted"]+"'").json())
    recordCount += 1

print (recordCount)

ITIS ID: http://services.itis.gov/?q=tsn:100825
Submitted name: Baetis brunneicolor
Abby's clean name: Baetis brunneicolor
----
ITIS ID: http://services.itis.gov/?q=tsn:101584
Submitted name: Tortopus circumfluus
Abby's clean name: Tortopus circumfluus
----
ITIS ID: http://services.itis.gov/?q=tsn:101869
Submitted name: Erythemis vesiculosa
Abby's clean name: Erythemis vesiculosa
----
ITIS ID: http://services.itis.gov/?q=tsn:102026
Submitted name: Cordulegastridae
Abby's clean name: Cordulegastridae
----
ITIS ID: http://services.itis.gov/?q=tsn:102802
Submitted name: Taeniopteryx starki
Abby's clean name: Taeniopteryx starki
----
ITIS ID: http://services.itis.gov/?q=tsn:103067
Submitted name: Isoperla sagittata
Abby's clean name: Isoperla sagittata
----
ITIS ID: http://services.itis.gov/?q=tsn:110182
Submitted name: Pseudanophthalmus grandis orthosulc
Abby's clean name: Pseudanophthalmus grandis orthosulc
----
ITIS ID: http://services.itis.gov/?q=tsn:115700
Submitted name: Hydroptila m