This notebook explores the records retrieved from the consultation between the WLCI scientific species name list and the ITIS API. The cached ITIS results are examined and next steps are determined.

In [1]:
#Import needed packages
import json
import bispy
import requests
from IPython.display import display
from joblib import Parallel, delayed

bis_utils = bispy.bis.Utils()

In [2]:
#Open cache ITIS list created from WLCI Species List from Literature.ipynb
with open("cache/itis.json", "r") as f:
    itis_cache = json.load(f)

In [3]:
#Print ITIS list 
itis_cache

[{'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-28T22:13:01.784573',
   'status_message': 'Exact Match',
   'details': [{'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Setophaga\\%20discolor'}]},
  'parameters': {'Scientific Name': 'Setophaga discolor'},
  'itis_data': [{'tsn': '950052',
    'nameWInd': 'Setophaga discolor',
    'nameWOInd': 'Setophaga discolor',
    'unit1': 'Setophaga',
    'unit2': 'discolor',
    'usage': 'valid',
    'credibilityRating': 'TWG standards met',
    'taxonAuthor': '(Vieillot, 1809)',
    'kingdom': 'Animalia',
    'parentTSN': '178978',
    'rankID': '220',
    'rank': 'Species',
    'synonyms': ['950052:$Dendroica discolor$'],
    'synonymTSNs': ['950052:$178918$'],
    '_version_': 1641159186509201413,
    'date_created': '2014-09-17 11:43:16',
    'date_modified': '2014-09-17 00:00:00',
    'geographicDivision': [{'geographic_value': 'North America',
      'update_date': '2014-09-17 00:00:00'},
 

In some cases the function returned more than one ITIS record. In each of these cases, the "processing_metadata" structure indicates that the scientific names used at the point of discovery were accepted. However, the "itis_data" structure indicates that the names provided by the WLCI literature database or xDD source information may themselves be invalid, even though ITIS holds a valid record for the same species. The code below lets us examine what is going on in these cases.

The "processing_metadata" structure records what the function did. It includes the ITIS API URLs that resulted in some action. Both the valid/accepted and invalid/not accepted names from ITIS are recorded. We reach back to itis_cache to show those records.

In [4]:
#Select cached entries whose "itis_data" holds more than one ITIS record
validate = [i for i in itis_cache if len(i.get("itis_data", [])) > 1]
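The "status_message" values in "processing_metadata" can be tallied to see which lookup paths occur among the cached records. This is a minimal sketch rather than part of the original workflow: `summarize_status` is a hypothetical helper, and the toy records copy only the fields used here from the output shown above.

```python
from collections import Counter

def summarize_status(cache):
    """Count cached entries per processing_metadata status_message."""
    return Counter(
        doc.get("processing_metadata", {}).get("status_message", "unknown")
        for doc in cache
    )

# Toy entries mimicking the cached structure (hypothetical sample, not real cache contents)
sample_cache = [
    {"processing_metadata": {"status_message": "Exact Match"}},
    {"processing_metadata": {"status_message": "Followed Accepted TSN"}},
    {"processing_metadata": {"status_message": "Exact Match"}},
]

status_counts = summarize_status(sample_cache)
print(status_counts)
```

Passing `itis_cache` or `validate` instead of the toy list would give the same kind of summary for the real records.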

In [5]:
validate

[{'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-28T22:13:02.362286',
   'status_message': 'Followed Accepted TSN',
   'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:503283'},
    {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Bassia\\%20prostrata'}]},
  'parameters': {'Scientific Name': 'Bassia prostrata'},
  'itis_data': [{'tsn': '503283',
    'nameWInd': 'Kochia prostrata',
    'nameWOInd': 'Kochia prostrata',
    'unit1': 'Kochia',
    'unit2': 'prostrata',
    'usage': 'accepted',
    'credibilityRating': 'TWG standards met',
    'taxonAuthor': '(L.) Schrad.',
    'kingdom': 'Plantae',
    'parentTSN': '20693',
    'rankID': '220',
    'rank': 'Species',
    'synonyms': ['503283:$Bassia prostrata$Kochia prostrata villosissima$'],
    'synonymTSNs': ['503283:$822851$822929$'],
    '_version_': 1641159123757170695,
    'date_created': '1996-11-04 17:57:08',
    'date_modified': '2011-08-30 00:00:00',


This list indicates a few things:
1. In one case a source name was misspelled: Equus burchelli should have been Equus quagga burchellii. Here the search used the TSN of the misspelled name to correctly identify the species. ITIS considers the record containing the misspelled name to be invalid and the other record valid, even though both refer to the same taxon.
2. In three cases, the names provided were considered invalid or not accepted because ITIS treats them as synonyms or junior synonyms of the correct species names.
3. In one case, ITIS considered the name provided to be invalid because it was an “original name/combination”.
4. In one case, ITIS considered the name provided to be invalid because it was a “subsequent name/combination”.
    
In the case of any disagreement between WLCI scientists and the taxonomic authority, both the invalid/not accepted and valid/accepted ITIS names from the WLCI species names list will be used for further consultations with additional systems. Two separate lists, invalid_itis and valid_itis, will distinguish the invalid/not accepted and valid/accepted ITIS names in these consultations. Any information that results from consultations using the invalid_itis list will be cached separately and evaluated for its utility.
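The split into two lists can be sketched by filtering on the ITIS usage value alone. This is an illustrative sketch under the assumption that "invalid" and "not accepted" are the only rejected usage values; `split_by_usage` is a hypothetical helper, and the sample rows reuse names and usage values that appear in this notebook's output.

```python
def split_by_usage(records):
    """Return (valid, invalid) lists based on each record's itis_usage."""
    rejected = {"invalid", "not accepted"}  # assumed set of rejected usage values
    valid = [r for r in records if r["itis_usage"] not in rejected]
    invalid = [r for r in records if r["itis_usage"] in rejected]
    return valid, invalid

sample = [
    {"scientific_name": "Setophaga discolor", "itis_usage": "valid"},
    {"scientific_name": "Kochia prostrata", "itis_usage": "accepted"},
    {"scientific_name": "Bassia prostrata", "itis_usage": "not accepted"},
    {"scientific_name": "Equus burchelli", "itis_usage": "invalid"},
]

valid_sample, invalid_sample = split_by_usage(sample)
print([r["scientific_name"] for r in invalid_sample])
```

Membership is tested against a set of usage strings, so each usage value must match one of them exactly.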

In [6]:
#Create list containing scientific names and ITIS usage classifications
itis_explore = list()
for itis_doc_set in itis_cache:
    #Use .get so entries without "itis_data" are skipped instead of raising a KeyError
    for itis_doc in itis_doc_set.get("itis_data", []):
        itis_explore.append({"scientific_name": itis_doc["nameWInd"], "itis_usage": itis_doc["usage"]})

In [7]:
#Display the ITIS list of species names and ITIS usage (valid/invalid and accepted/not accepted)
itis_explore

[{'scientific_name': 'Setophaga discolor', 'itis_usage': 'valid'},
 {'scientific_name': 'Pascopyrum smithii', 'itis_usage': 'accepted'},
 {'scientific_name': 'Lasiurus cinereus semotus', 'itis_usage': 'valid'},
 {'scientific_name': 'Pseudoroegneria spicata', 'itis_usage': 'accepted'},
 {'scientific_name': 'Cronartium ribicola', 'itis_usage': 'accepted'},
 {'scientific_name': 'Lepidochelys olivacea', 'itis_usage': 'valid'},
 {'scientific_name': 'Pterodroma sandwichensis', 'itis_usage': 'valid'},
 {'scientific_name': 'Anoplopoma fimbria', 'itis_usage': 'valid'},
 {'scientific_name': 'Gopherus agassizii', 'itis_usage': 'valid'},
 {'scientific_name': 'Microtus californicus', 'itis_usage': 'valid'},
 {'scientific_name': 'Tympanuchus pallidicinctus', 'itis_usage': 'valid'},
 {'scientific_name': 'Pinus flexilis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Betula nana', 'itis_usage': 'accepted'},
 {'scientific_name': 'Peromyscus maniculatus', 'itis_usage': 'valid'},
 {'scientific_name': 

In [None]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/itis_explore.json", itis_explore))


In [8]:
#Create list of species names considered invalid or not accepted by ITIS
invalid_itis = [i for i in itis_explore if i['itis_usage'] == 'invalid']
invalid_itis.extend([i for i in itis_explore if i['itis_usage'] == 'not accepted'])

In [9]:
#Display list of species names considered invalid or not accepted by ITIS
invalid_itis

[{'scientific_name': 'Brucella abortus', 'itis_usage': 'invalid'},
 {'scientific_name': 'Tetrao tetrix', 'itis_usage': 'invalid'},
 {'scientific_name': 'Amphispiza belli', 'itis_usage': 'invalid'},
 {'scientific_name': 'Clupea pallasi', 'itis_usage': 'invalid'},
 {'scientific_name': 'Thrichomys fosteri', 'itis_usage': 'invalid'},
 {'scientific_name': 'Equus burchelli', 'itis_usage': 'invalid'},
 {'scientific_name': 'Melanitta deglandi', 'itis_usage': 'invalid'},
 {'scientific_name': 'Bassia prostrata', 'itis_usage': 'not accepted'}]

In [10]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/invalid_itis.json", invalid_itis))

{'Doc Cache File': 'cache/invalid_itis.json',
 'Number of Documents in Cache': 8,
 'Document Number 7': {'scientific_name': 'Bassia prostrata',
  'itis_usage': 'not accepted'}}

In [11]:
#Create an updated list of species names that drops the invalid/not accepted names listed here
updated_itis = [e for e in itis_explore if e['scientific_name'] not in {'Amphispiza belli', 'Thrichomys fosteri', 'Tetrao tetrix', 'Brucella abortus', 'Equus burchelli', 'Bassia prostrata'}]
len(updated_itis)

164

In [12]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/valid_itis.json", updated_itis))

{'Doc Cache File': 'cache/valid_itis.json',
 'Number of Documents in Cache': 164,
 'Document Number 62': {'scientific_name': 'Corvus corax',
  'itis_usage': 'valid'}}
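The names displayed in invalid_itis can be compared with the names excluded when building updated_itis. The sketch below uses only the values shown in the cells above; the variable names are hypothetical and it does not read the cache files.

```python
# Names displayed in invalid_itis above
invalid_names = {
    "Brucella abortus", "Tetrao tetrix", "Amphispiza belli", "Clupea pallasi",
    "Thrichomys fosteri", "Equus burchelli", "Melanitta deglandi", "Bassia prostrata",
}

# Names excluded when updated_itis was built
excluded_names = {
    "Amphispiza belli", "Thrichomys fosteri", "Tetrao tetrix",
    "Brucella abortus", "Equus burchelli", "Bassia prostrata",
}

# Names flagged by ITIS that are not in the exclusion set
still_present = sorted(invalid_names - excluded_names)
print(still_present)  # ['Clupea pallasi', 'Melanitta deglandi']
```

Whether those two names are meant to remain in the valid list is not stated here, so the check only surfaces the difference for review.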

Here we examine the geographic division and jurisdiction information provided by ITIS for each species record to better understand the geographical ranges and jurisdictions of the species included in the species name list created in the Build Species List notebook. 

First we create a set of lists, one that includes species whose designated geographic values are listed as "North America" and another that includes species whose jurisdictions are listed as "Continental US". Both of these lists are intended to represent the species whose ranges might overlap the WLCI boundary.

Then we create another set of lists, one that includes species whose designated geographic values are listed as places other than "North America" and another that includes species with their jurisdictions listed as areas other than the "Continental US". These lists are intended to indicate species that may not be appropriate to include in the WLCI species list. 
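The two steps described above, flattening the nested ITIS lists and then splitting on a target value, can be sketched compactly. This is an illustration under stated assumptions, not the notebook's own implementation: `flatten` and `partition` are hypothetical helpers, and the sample records carry only values that appear in this notebook's output.

```python
def flatten(records, list_key, value_key):
    """Return one row per entry in a record's nested list (e.g. geographicDivision)."""
    rows = []
    for record in records:
        # A missing list yields no rows rather than raising a KeyError
        for entry in record.get(list_key, []):
            rows.append({
                "tsn": record["tsn"],
                "scientific_name": record["nameWInd"],
                "value": entry[value_key],
            })
    return rows

def partition(rows, target):
    """Split rows into those whose value equals target and those that differ."""
    inside = [r for r in rows if r["value"] == target]
    outside = [r for r in rows if r["value"] != target]
    return inside, outside

sample_records = [
    {"tsn": "950052", "nameWInd": "Setophaga discolor",
     "geographicDivision": [{"geographic_value": "North America"},
                            {"geographic_value": "Caribbean"}],
     "jurisdiction": [{"jurisdiction_value": "Continental US", "origin": "Native"},
                      {"jurisdiction_value": "Canada", "origin": "Native"}]},
    {"tsn": "202344", "nameWInd": "Lasiurus cinereus semotus",
     "geographicDivision": [{"geographic_value": "Oceania"}],
     "jurisdiction": [{"jurisdiction_value": "Hawaii", "origin": "Native"}]},
]

geo_rows = flatten(sample_records, "geographicDivision", "geographic_value")
in_na, not_in_na = partition(geo_rows, "North America")
jur_rows = flatten(sample_records, "jurisdiction", "jurisdiction_value")
in_us, not_in_us = partition(jur_rows, "Continental US")
```

A species can land in both halves of a partition, as Setophaga discolor does here, because each geographic value or jurisdiction produces its own row.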

In [14]:
#Create a list that includes only the TSN, scientific name, ITIS usage, geographic division information, and jurisdiction information.
geo_info = []
for item in itis_cache:
    for record in item['itis_data']:
        #Only records that carry jurisdiction information are kept
        if 'jurisdiction' not in record:
            continue
        geo_info.append({
            'tsn': record['tsn'],
            'scientific_name': record['nameWInd'],
            'itis_usage': record['usage'],
            #Default to an empty list so a missing geographicDivision never reuses the previous record's value
            'geo_division': record.get('geographicDivision', []),
            'jurisdiction': record['jurisdiction'],
        })

Here we look at the species records with geographic values listed as "North America"

In [61]:
#Create a list which only includes tsn number, scientific name, and geographic value information.
geographic_value=[]
for record in geo_info:
    tsn=record['tsn']
    scientific_name=record['scientific_name']
    for item in record['geo_division']:
        geo_value=item['geographic_value']
        geographic_value.append({'tsn':tsn, 'scientific_name':scientific_name, 'geo_value':geo_value})

In [28]:
#Select all records that list the geographic value as "North America"
north_america = [i for i in geographic_value if i['geo_value'] == 'North America']
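When filtering against a single string, the choice of operator matters: `in` against a string is a substring test, whereas `==` requires the whole value to match. A short illustration with hypothetical candidate values:

```python
target = "North America"
candidates = ["North America", "America", "North", ""]

# Substring test: every candidate is contained in the target, including the empty string
substring_hits = [c for c in candidates if c in target]

# Equality test: only the exact value matches
exact_hits = [c for c in candidates if c == target]

print(substring_hits)
print(exact_hits)
```

For the geographic values shown in this notebook the two tests select the same records, but the equality test does not depend on that.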

In [58]:
north_america

[{'tsn': '950052',
  'scientific_name': 'Setophaga discolor',
  'geo_value': 'North America'},
 {'tsn': '504637',
  'scientific_name': 'Pseudoroegneria spicata',
  'geo_value': 'North America'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'North America'},
 {'tsn': '173856',
  'scientific_name': 'Gopherus agassizii',
  'geo_value': 'North America'},
 {'tsn': '180305',
  'scientific_name': 'Microtus californicus',
  'geo_value': 'North America'},
 {'tsn': '175838',
  'scientific_name': 'Tympanuchus pallidicinctus',
  'geo_value': 'North America'},
 {'tsn': '183343',
  'scientific_name': 'Pinus flexilis',
  'geo_value': 'North America'},
 {'tsn': '19479',
  'scientific_name': 'Betula nana',
  'geo_value': 'North America'},
 {'tsn': '180276',
  'scientific_name': 'Peromyscus maniculatus',
  'geo_value': 'North America'},
 {'tsn': '503283',
  'scientific_name': 'Kochia prostrata',
  'geo_value': 'North America'},
 {'tsn': '179628',
  'scientific_name': '

Here we look at the species records with jurisdiction information listed as "Continental US"

In [65]:
#Create list which only contains the tsn number, scientific name, jurisdiction information, and origin information
jurisdiction_info=[]
for record in geo_info:
    tsn=record['tsn']
    scientific_name=record['scientific_name']
    for item in record['jurisdiction']:
        jurisdiction=item['jurisdiction_value']
        origin=item['origin']
        jurisdiction_info.append({'tsn':tsn,'scientific_name':scientific_name,'jurisdiction':jurisdiction, 'origin':origin})

In [36]:
#Select all records that list the jurisdiction as "Continental US"
continental_us=[i for i in jurisdiction_info if i['jurisdiction'] == 'Continental US']

In [59]:
continental_us

[{'tsn': '950052',
  'scientific_name': 'Setophaga discolor',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '504637',
  'scientific_name': 'Pseudoroegneria spicata',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '173856',
  'scientific_name': 'Gopherus agassizii',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '180305',
  'scientific_name': 'Microtus californicus',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '175838',
  'scientific_name': 'Tympanuchus pallidicinctus',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '183343',
  'scientific_name': 'Pinus flexilis',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '180276',
  'scientific_name': 'Peromyscus maniculatus',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '503

Here we look at the species with geographic values listed as places other than "North America"

In [63]:
not_north_america = [e for e in geographic_value if e['geo_value'] != 'North America']

In [64]:
not_north_america

[{'tsn': '950052',
  'scientific_name': 'Setophaga discolor',
  'geo_value': 'Caribbean'},
 {'tsn': '202344',
  'scientific_name': 'Lasiurus cinereus semotus',
  'geo_value': 'Oceania'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Middle America'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Oceania'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Africa'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Australia'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Caribbean'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'East Pacific'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Eastern Atlantic Ocean'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'geo_value': 'Indo-West Pacific'},
 {'tsn': '173840',
  'scientific_name

Here we look at species with jurisdiction information listed as areas other than the "Continental US"

In [70]:
not_continental_us = [e for e in jurisdiction_info if e['jurisdiction'] != 'Continental US']

In [68]:
not_continental_us

[{'tsn': '950052',
  'scientific_name': 'Setophaga discolor',
  'jurisdiction': 'Canada',
  'origin': 'Native'},
 {'tsn': '202344',
  'scientific_name': 'Lasiurus cinereus semotus',
  'jurisdiction': 'Hawaii',
  'origin': 'Native'},
 {'tsn': '504637',
  'scientific_name': 'Pseudoroegneria spicata',
  'jurisdiction': 'Alaska',
  'origin': 'Native'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'jurisdiction': 'Hawaii',
  'origin': 'Native'},
 {'tsn': '173840',
  'scientific_name': 'Lepidochelys olivacea',
  'jurisdiction': 'Mexico',
  'origin': 'Native'},
 {'tsn': '173856',
  'scientific_name': 'Gopherus agassizii',
  'jurisdiction': 'Mexico',
  'origin': 'Native'},
 {'tsn': '180305',
  'scientific_name': 'Microtus californicus',
  'jurisdiction': 'Mexico',
  'origin': 'Native'},
 {'tsn': '183343',
  'scientific_name': 'Pinus flexilis',
  'jurisdiction': 'Canada',
  'origin': 'Native'},
 {'tsn': '19479',
  'scientific_name': 'Betula nana',
  'jurisdiction': 'Canada