This notebook explores the records retrieved from the consultation between the WLCI scientific species name list and the ITIS API. The cached ITIS results are examined and next steps are determined.

__Source(s)__

_'cache/itis.json'_ : Information from ITIS on WLCI referenced species.  This file is created in _'workflow/Consult-with-ITIS.ipynb'_

__Output(s)__

'cache/valid_itis.json' : Species names classified as valid or accepted in ITIS

'cache/invalid_itis.json' : Species names classified as invalid or not accepted in ITIS



In [2]:
#Import needed packages
import json
import bispy
import requests
from IPython.display import display
from joblib import Parallel, delayed

bis_utils = bispy.bis.Utils()

In [3]:
#Open the cached ITIS results listed under Source(s) above
with open("../cache/itis.json", "r") as f:
    itis_cache = json.load(f)

In [4]:
#Print first record in ITIS list as an example
itis_cache[0]

{'processing_metadata': {'status': 'success',
  'date_processed': '2019-09-25T16:45:51.362570',
  'status_message': 'Exact Match',
  'details': [{'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Otis\\%20tarda'}]},
 'parameters': {'Scientific Name': 'Otis tarda'},
 'data': [{'tsn': '176419',
   'nameWInd': 'Otis tarda',
   'nameWOInd': 'Otis tarda',
   'unit1': 'Otis',
   'unit2': 'tarda',
   'usage': 'valid',
   'credibilityRating': 'TWG standards met',
   'taxonAuthor': 'Linnaeus, 1758',
   'kingdom': 'Animalia',
   'parentTSN': '176418',
   'rankID': '220',
   'rank': 'Species',
   '_version_': 1643585546269753353,
   'date_created': '1996-06-13 14:51:08',
   'date_modified': '2006-11-28 00:00:00',
   'expert': [{'reference_type': 'EXP',
     'expert_id': '11',
     'expert_name': 'Alan P. Peterson, M.D.',
     'expert_comment': 'PO Box 1999 Walla Walla, Washington 99362-0999',
     'create_date': '2001-09-28 00:00:00',
     'update_date': ''}],
   'publication

In some cases the function returned more than one ITIS record. In each of these cases, the “processing_metadata” structure indicates that the scientific species name used at the point of discovery was accepted. However, the "data" structure shows that a potentially invalid name was provided by the WLCI literature database or xDD source information, and that this name is linked to a valid record in ITIS. The cells below let us examine what is going on in these cases.

The “processing_metadata” structure provides information about what the function does. It includes the URLs to the ITIS API that resulted in some action. Both the valid/accepted and invalid/unaccepted names from ITIS are recorded. We reach back to the itis_cache to show that record.
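
To get an overview of how the lookups resolved, the "status_message" values in “processing_metadata” can be tallied. The following is a minimal sketch and not part of the original workflow: `sample_cache` holds abbreviated stand-in records modeled on the record shown above, `tally_status_messages` is a hypothetical helper, and the "failure"/"Not Matched" values are assumed purely for illustration (only "Exact Match" appears in the output above).

```python
from collections import Counter

# Abbreviated stand-in records. Only 'Exact Match' appears in the notebook
# output; the 'failure' / 'Not Matched' record is a hypothetical example.
sample_cache = [
    {"processing_metadata": {"status": "success", "status_message": "Exact Match"},
     "parameters": {"Scientific Name": "Otis tarda"}},
    {"processing_metadata": {"status": "success", "status_message": "Exact Match"},
     "parameters": {"Scientific Name": "Taxidea taxus"}},
    {"processing_metadata": {"status": "failure", "status_message": "Not Matched"},
     "parameters": {"Scientific Name": "Hypothetical name"}},
]


def tally_status_messages(cache):
    """Count records per status_message, tolerating missing keys."""
    return Counter(
        rec.get("processing_metadata", {}).get("status_message", "unknown")
        for rec in cache
    )


status_counts = tally_status_messages(sample_cache)
print(status_counts)
```

In this notebook the equivalent call would be `tally_status_messages(itis_cache)`.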

In [22]:
#Select ITIS results that returned more than one record
validate = [i["data"] for i in itis_cache if len(i.get("data", [])) > 1]

In [24]:
validate

[[{'tsn': '997724',
   'nameWInd': 'Artemisiospiza belli',
   'nameWOInd': 'Artemisiospiza belli',
   'unit1': 'Artemisiospiza',
   'unit2': 'belli',
   'usage': 'valid',
   'credibilityRating': 'TWG standards met',
   'taxonAuthor': '(Cassin, 1850)',
   'kingdom': 'Animalia',
   'parentTSN': '997695',
   'rankID': '220',
   'rank': 'Species',
   'synonyms': ['997724:$Amphispiza belli$'],
   'synonymTSNs': ['997724:$179402$'],
   '_version_': 1643585615848013824,
   'date_created': '2015-10-28 14:07:43',
   'date_modified': '2015-10-28 00:00:00',
   'geographicDivision': [{'geographic_value': 'Middle America',
     'update_date': '2015-10-28 00:00:00'},
    {'geographic_value': 'North America',
     'update_date': '2015-10-28 00:00:00'}],
   'jurisdiction': [{'jurisdiction_value': 'Continental US',
     'origin': 'Native',
     'update_date': '2015-10-28 00:00:00'},
    {'jurisdiction_value': 'Mexico',
     'origin': 'Native',
     'update_date': '2015-10-28 00:00:00'}],
   'otherSourc

This list indicates a few things:
1. In one case a source name was misspelled: Equus burchelli should have been Equus quagga burchellii. The search used the TSN of the misspelled name to identify the correct species. ITIS considers the record containing the misspelled name to be invalid and the other record valid, even though both records refer to the same taxon.
2. In three cases, the names provided were considered invalid or not accepted because ITIS considered the names to be synonyms or junior synonyms of the correct species names. 
3. In one case, ITIS considered the name provided to be invalid because it was an “original name/combination”.
4. In one case, ITIS considered the name provided to be invalid because it was a “subsequent name/combination”.
    
In the case of any disagreement between WLCI scientists and the taxonomic authority, both the invalid/not accepted and valid/accepted ITIS names from the WLCI species names list will be used for further consultations with additional systems. Two separate lists, invalid_itis and valid_itis, will be used to distinguish the invalid/not accepted and valid/accepted ITIS names in these consultations. Any information that results from consultations using the invalid_itis list will be cached separately and evaluated for its utility.
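
For the synonym cases in the list above, the valid record's 'synonyms' field ties the name supplied by the source to the name ITIS treats as valid. The following is a minimal sketch of reading that field, assuming the `'<tsn>:$<name>$'` layout seen in the output above holds for other records as well. `parse_synonyms` and `synonym_lookup` are illustrative helpers, not part of the workflow.

```python
def parse_synonyms(record):
    """Return the synonym names listed on an ITIS record.

    Assumes each entry has the form '<tsn>:$<name>$' as in the output above;
    any further '$'-delimited names in the same entry are collected too.
    """
    names = []
    for entry in record.get("synonyms", []):
        _, _, payload = entry.partition(":")
        names.extend(part for part in payload.split("$") if part)
    return names


def synonym_lookup(records):
    """Map each synonym to the nameWInd of the record that lists it."""
    lookup = {}
    for record in records:
        for synonym in parse_synonyms(record):
            lookup[synonym] = record["nameWInd"]
    return lookup


# Abbreviated copies of records shown in the notebook output.
sample_records = [
    {"tsn": "997724", "nameWInd": "Artemisiospiza belli", "usage": "valid",
     "synonyms": ["997724:$Amphispiza belli$"]},
    {"tsn": "176419", "nameWInd": "Otis tarda", "usage": "valid"},
]

lookup = synonym_lookup(sample_records)
print(lookup)
```

A lookup like this could be used to pair each entry in invalid_itis with its valid counterpart when the two lists are compared.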

In [25]:
#Create list containing scientific names and ITIS usage classifications
itis_explore = list()
for itis_doc_set in itis_cache:
    #Skip results that came back without a "data" list
    for itis_doc in itis_doc_set.get("data", []):
        itis_explore.append({"scientific_name": itis_doc["nameWInd"], "itis_usage": itis_doc["usage"]})

In [26]:
#Display the ITIS list of species names and ITIS usage (valid/invalid and accepted/not accepted)
itis_explore

[{'scientific_name': 'Otis tarda', 'itis_usage': 'valid'},
 {'scientific_name': 'Clangula hyemalis', 'itis_usage': 'valid'},
 {'scientific_name': 'Artemisiospiza belli', 'itis_usage': 'valid'},
 {'scientific_name': 'Amphispiza belli', 'itis_usage': 'invalid'},
 {'scientific_name': 'Taxidea taxus', 'itis_usage': 'valid'},
 {'scientific_name': 'Schoenoplectus acutus', 'itis_usage': 'accepted'},
 {'scientific_name': 'Shepherdia canadensis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Ericameria nauseosa', 'itis_usage': 'accepted'},
 {'scientific_name': 'Festuca idahoensis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Elymus lanceolatus', 'itis_usage': 'accepted'},
 {'scientific_name': 'Brucella melitensis', 'itis_usage': 'valid'},
 {'scientific_name': 'Brucella abortus', 'itis_usage': 'invalid'},
 {'scientific_name': 'Buteo regalis', 'itis_usage': 'valid'},
 {'scientific_name': 'Bombus griseocollis', 'itis_usage': 'valid'},
 {'scientific_name': 'Oncorhynchus nerka', 'itis_usage'

In [27]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/itis_explore.json", itis_explore))


{'Doc Cache File': '../cache/itis_explore.json',
 'Number of Documents in Cache': 176,
 'Document Number 144': {'scientific_name': 'Melanitta deglandi',
  'itis_usage': 'invalid'}}
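
The internals of `bis_utils.doc_cache` are not shown in this notebook. Judging only from the summary it returns, a plain-Python stand-in might look like the sketch below. The name `cache_docs` is hypothetical, the zero-based sample index is an assumption, and this is not bispy's implementation.

```python
import json
import os
import random
import tempfile


def cache_docs(path, docs):
    """Write docs to a JSON file and return a summary with one random sample.

    Assumes docs is a non-empty list; random.randrange raises ValueError
    on an empty list.
    """
    with open(path, "w") as f:
        json.dump(docs, f, indent=2)
    index = random.randrange(len(docs))  # zero-based index (assumption)
    return {
        "Doc Cache File": path,
        "Number of Documents in Cache": len(docs),
        f"Document Number {index}": docs[index],
    }


# Two entries copied from the itis_explore output above.
sample_docs = [
    {"scientific_name": "Otis tarda", "itis_usage": "valid"},
    {"scientific_name": "Amphispiza belli", "itis_usage": "invalid"},
]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "itis_explore.json")
    summary = cache_docs(cache_path, sample_docs)
    with open(cache_path) as f:
        reloaded = json.load(f)

print(summary["Number of Documents in Cache"])
```

Reading the file back, as done here with `reloaded`, is a simple way to confirm that the cached list round-trips unchanged.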

In [28]:
#Create list of species names considered invalid or not accepted by ITIS
invalid_itis=[i for i in itis_explore if i['itis_usage'] == 'invalid']
invalid_itis.extend([i for i in itis_explore if i['itis_usage'] == 'not accepted'])

In [29]:
#Display list of species names considered invalid or not accepted by ITIS
invalid_itis

[{'scientific_name': 'Amphispiza belli', 'itis_usage': 'invalid'},
 {'scientific_name': 'Brucella abortus', 'itis_usage': 'invalid'},
 {'scientific_name': 'Thrichomys fosteri', 'itis_usage': 'invalid'},
 {'scientific_name': 'Tetrao tetrix', 'itis_usage': 'invalid'},
 {'scientific_name': 'Equus burchelli', 'itis_usage': 'invalid'},
 {'scientific_name': 'Clupea pallasi', 'itis_usage': 'invalid'},
 {'scientific_name': 'Melanitta deglandi', 'itis_usage': 'invalid'},
 {'scientific_name': 'Rangifer tarandus groenlandicus',
  'itis_usage': 'invalid'},
 {'scientific_name': 'Arctostaphylos patula', 'itis_usage': 'not accepted'},
 {'scientific_name': 'Thinopyrum ponticum', 'itis_usage': 'not accepted'},
 {'scientific_name': 'Thinopyrum ponticum', 'itis_usage': 'not accepted'},
 {'scientific_name': 'Poa secunda', 'itis_usage': 'not accepted'},
 {'scientific_name': 'Bassia prostrata', 'itis_usage': 'not accepted'},
 {'scientific_name': 'Pascopyrum smithii', 'itis_usage': 'not accepted'}]

In [30]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/invalid_itis.json", invalid_itis))

{'Doc Cache File': '../cache/invalid_itis.json',
 'Number of Documents in Cache': 14,
 'Document Number 9': {'scientific_name': 'Thinopyrum ponticum',
  'itis_usage': 'not accepted'}}

In [31]:
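#Create an updated list for the valid/accepted consultations by removing six hand-picked names that appear in invalid_itis
#Note: other entries flagged invalid or not accepted are not removed by this filter
excluded_names = {'Amphispiza belli', 'Thrichomys fosteri', 'Tetrao tetrix',
                  'Brucella abortus', 'Equus burchelli', 'Bassia prostrata'}
updated_itis=[e for e in itis_explore if e['scientific_name'] not in excluded_names]
len(updated_itis)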

170
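
Because the names removed in the cell above are typed by hand, it can be useful to cross-check them against the ITIS usage values. The sketch below uses the 14 entries of invalid_itis displayed earlier as sample data and reports which flagged names are not covered by the hand-typed set. The variable names `hand_excluded` and `still_flagged` are illustrative.

```python
# The six names removed by hand in the cell above.
hand_excluded = {
    "Amphispiza belli", "Thrichomys fosteri", "Tetrao tetrix",
    "Brucella abortus", "Equus burchelli", "Bassia prostrata",
}

# Entries of invalid_itis as displayed earlier in the notebook.
invalid_sample = [
    {"scientific_name": "Amphispiza belli", "itis_usage": "invalid"},
    {"scientific_name": "Brucella abortus", "itis_usage": "invalid"},
    {"scientific_name": "Thrichomys fosteri", "itis_usage": "invalid"},
    {"scientific_name": "Tetrao tetrix", "itis_usage": "invalid"},
    {"scientific_name": "Equus burchelli", "itis_usage": "invalid"},
    {"scientific_name": "Clupea pallasi", "itis_usage": "invalid"},
    {"scientific_name": "Melanitta deglandi", "itis_usage": "invalid"},
    {"scientific_name": "Rangifer tarandus groenlandicus", "itis_usage": "invalid"},
    {"scientific_name": "Arctostaphylos patula", "itis_usage": "not accepted"},
    {"scientific_name": "Thinopyrum ponticum", "itis_usage": "not accepted"},
    {"scientific_name": "Thinopyrum ponticum", "itis_usage": "not accepted"},
    {"scientific_name": "Poa secunda", "itis_usage": "not accepted"},
    {"scientific_name": "Bassia prostrata", "itis_usage": "not accepted"},
    {"scientific_name": "Pascopyrum smithii", "itis_usage": "not accepted"},
]

flagged_usages = {"invalid", "not accepted"}

# Unique flagged names that the hand-typed set does not remove.
still_flagged = sorted({
    doc["scientific_name"]
    for doc in invalid_sample
    if doc["itis_usage"] in flagged_usages
    and doc["scientific_name"] not in hand_excluded
})
print(still_flagged)
```

Whether the remaining names belong in the list is a project decision (see the note on disagreement with the taxonomic authority above); the check only makes the effect of the hand-typed set visible.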

In [32]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/valid_itis.json", updated_itis))

{'Doc Cache File': '../cache/valid_itis.json',
 'Number of Documents in Cache': 170,
 'Document Number 58': {'scientific_name': 'Corvus corax',
  'itis_usage': 'valid'}}

Here we examine the geographic division and jurisdiction information provided by ITIS for each species record to better understand the geographical ranges and jurisdictions of the species included in the species name list created in the Build Species List notebook. 

First we create a set of lists, one that includes species whose designated geographic values are listed as "North America" and another that includes species with their jurisdictions listed as "Continental US". Both of these lists are intended to represent the species whose ranges might include the WLCI area.

Then we create another set of lists, one that includes species whose designated geographic values are listed as places other than "North America" and another that includes species with their jurisdictions listed as areas other than the "Continental US". These lists are intended to indicate species that may not be appropriate to include in the WLCI species list. 
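
A species with a wide distribution has several geographic values, so filtering individual entries places it in both sets of lists. Grouping the values by species first makes it possible to pick out species that have no "North America" entry at all. The following is a minimal sketch, assuming flattened entries shaped like the ones built later in this notebook. `values_by_species` and `species_lacking` are hypothetical helper names, and the last sample entry is invented for illustration.

```python
from collections import defaultdict


def values_by_species(entries, value_key):
    """Group the values stored under value_key by scientific name."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry["scientific_name"]].add(entry[value_key])
    return dict(grouped)


def species_lacking(entries, value_key, required_value):
    """Return species that never list required_value, sorted by name."""
    grouped = values_by_species(entries, value_key)
    return sorted(
        name for name, values in grouped.items()
        if required_value not in values
    )


# The first four entries appear in the notebook output; the last one is a
# hypothetical species added only to show a non-empty result.
sample = [
    {"tsn": "175147", "scientific_name": "Clangula hyemalis",
     "geo_value": "North America"},
    {"tsn": "175147", "scientific_name": "Clangula hyemalis",
     "geo_value": "Oceania"},
    {"tsn": "997724", "scientific_name": "Artemisiospiza belli",
     "geo_value": "North America"},
    {"tsn": "997724", "scientific_name": "Artemisiospiza belli",
     "geo_value": "Middle America"},
    {"tsn": "000000", "scientific_name": "Hypothetical species",
     "geo_value": "Southern Asia"},
]

outside_north_america = species_lacking(sample, "geo_value", "North America")
print(outside_north_america)
```

The same helper works for jurisdictions by passing the jurisdiction key and "Continental US" as the required value.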

In [34]:
#Create a list that includes only the TSN, scientific name, ITIS usage, geographic division information, and jurisdiction information.
geo_info=[]
for item in itis_cache:
    for record in item['data']:
        #Default to empty lists so values never carry over from a previous record
        geo_info.append({
            'tsn':record['tsn'],
            'scientific_name':record['nameWInd'],
            'itis_usage':record['usage'],
            'geo_division':record.get('geographicDivision', []),
            'jurisdiction':record.get('jurisdiction', []),
        })

Here we look at the species records with geographic values listed as "North America"

In [35]:
#Create a list which only includes tsn number, scientific name, and geographic value information.
geographic_value=[]
for record in geo_info:
    tsn=record['tsn']
    scientific_name=record['scientific_name']
    for item in record['geo_division']:
        geo_value=item['geographic_value']
        geographic_value.append({'tsn':tsn, 'scientific_name':scientific_name, 'geo_value':geo_value})

In [36]:
#Select all records that list the geographic value as "North America"
north_america=[i for i in geographic_value if i['geo_value'] == 'North America']

In [37]:
north_america

[{'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'geo_value': 'North America'},
 {'tsn': '997724',
  'scientific_name': 'Artemisiospiza belli',
  'geo_value': 'North America'},
 {'tsn': '179402',
  'scientific_name': 'Amphispiza belli',
  'geo_value': 'North America'},
 {'tsn': '180565',
  'scientific_name': 'Taxidea taxus',
  'geo_value': 'North America'},
 {'tsn': '507785',
  'scientific_name': 'Schoenoplectus acutus',
  'geo_value': 'North America'},
 {'tsn': '27779',
  'scientific_name': 'Shepherdia canadensis',
  'geo_value': 'North America'},
 {'tsn': '507594',
  'scientific_name': 'Ericameria nauseosa',
  'geo_value': 'North America'},
 {'tsn': '40816',
  'scientific_name': 'Festuca idahoensis',
  'geo_value': 'North America'},
 {'tsn': '502267',
  'scientific_name': 'Elymus lanceolatus',
  'geo_value': 'North America'},
 {'tsn': '175377',
  'scientific_name': 'Buteo regalis',
  'geo_value': 'North America'},
 {'tsn': '714807',
  'scientific_name': 'Bombus griseoco

Here we look at the species records with jurisdiction information listed as "Continental US"

In [38]:
#Create list which only contains the tsn number, scientific name, jurisdiction information, and origin information
jurisdiction_info=[]
for record in geo_info:
    tsn=record['tsn']
    scientific_name=record['scientific_name']
    for item in record['jurisdiction']:
        jurisdiction=item['jurisdiction_value']
        origin=item['origin']
        jurisdiction_info.append({'tsn':tsn,'scientific_name':scientific_name,'jurisdiction':jurisdiction, 'origin':origin})

In [39]:
#Select all records that list the jurisdiction as "Continental US"
continental_us=[i for i in jurisdiction_info if i['jurisdiction'] == 'Continental US']
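
The 'origin' field distinguishes native from introduced species within a jurisdiction. As a small illustrative sketch (not part of the original workflow), the origins recorded for "Continental US" can be tallied with `collections.Counter`. The sample entries are copied from the notebook output, and `origin_counts` is a hypothetical helper.

```python
from collections import Counter

# Entries copied from the jurisdiction output shown in this notebook.
sample_jurisdictions = [
    {"tsn": "175147", "scientific_name": "Clangula hyemalis",
     "jurisdiction": "Continental US", "origin": "Native"},
    {"tsn": "175147", "scientific_name": "Clangula hyemalis",
     "jurisdiction": "Hawaii", "origin": "Incidental"},
    {"tsn": "997724", "scientific_name": "Artemisiospiza belli",
     "jurisdiction": "Continental US", "origin": "Native"},
    {"tsn": "180565", "scientific_name": "Taxidea taxus",
     "jurisdiction": "Continental US", "origin": "Introduced"},
    {"tsn": "180565", "scientific_name": "Taxidea taxus",
     "jurisdiction": "Canada", "origin": "Native"},
]


def origin_counts(entries, jurisdiction):
    """Count origin values among entries for a single jurisdiction."""
    return Counter(
        entry["origin"] for entry in entries
        if entry["jurisdiction"] == jurisdiction
    )


us_origins = origin_counts(sample_jurisdictions, "Continental US")
print(us_origins)
```

In this notebook the equivalent call would be `origin_counts(jurisdiction_info, "Continental US")`.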

In [40]:
continental_us

[{'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '997724',
  'scientific_name': 'Artemisiospiza belli',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '179402',
  'scientific_name': 'Amphispiza belli',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '180565',
  'scientific_name': 'Taxidea taxus',
  'jurisdiction': 'Continental US',
  'origin': 'Introduced'},
 {'tsn': '507785',
  'scientific_name': 'Schoenoplectus acutus',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '27779',
  'scientific_name': 'Shepherdia canadensis',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '507594',
  'scientific_name': 'Ericameria nauseosa',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '40816',
  'scientific_name': 'Festuca idahoensis',
  'jurisdiction': 'Continental US',
  'origin': 'Native'},
 {'tsn': '502267',
  'scienti

Here we look at the species with geographic values listed as places other than "North America"

In [41]:
#Select all records that list a geographic value other than "North America"
not_north_america=[e for e in geographic_value if e['geo_value'] != 'North America']

In [42]:
not_north_america

[{'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'geo_value': 'Europe & Northern Asia (excluding China)'},
 {'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'geo_value': 'Southern Asia'},
 {'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'geo_value': 'Oceania'},
 {'tsn': '997724',
  'scientific_name': 'Artemisiospiza belli',
  'geo_value': 'Middle America'},
 {'tsn': '180565',
  'scientific_name': 'Taxidea taxus',
  'geo_value': 'Middle America'},
 {'tsn': '180607',
  'scientific_name': 'Vulpes velox',
  'geo_value': 'Middle America'},
 {'tsn': '175863',
  'scientific_name': 'Colinus virginianus',
  'geo_value': 'Caribbean'},
 {'tsn': '175863',
  'scientific_name': 'Colinus virginianus',
  'geo_value': 'Middle America'},
 {'tsn': '179628',
  'scientific_name': 'Passer domesticus',
  'geo_value': 'Oceania'},
 {'tsn': '179628',
  'scientific_name': 'Passer domesticus',
  'geo_value': 'Caribbean'},
 {'tsn': '180706',
  'scientific_name': 'Bison biso

Here we look at species with jurisdiction information listed as areas other than the "Continental US"

In [43]:
#Select all records that list a jurisdiction other than "Continental US"
not_continental_us=[e for e in jurisdiction_info if e['jurisdiction'] != 'Continental US']

In [44]:
not_continental_us

[{'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'jurisdiction': 'Alaska',
  'origin': 'Native'},
 {'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'jurisdiction': 'Canada',
  'origin': 'Native'},
 {'tsn': '175147',
  'scientific_name': 'Clangula hyemalis',
  'jurisdiction': 'Hawaii',
  'origin': 'Incidental'},
 {'tsn': '997724',
  'scientific_name': 'Artemisiospiza belli',
  'jurisdiction': 'Mexico',
  'origin': 'Native'},
 {'tsn': '179402',
  'scientific_name': 'Amphispiza belli',
  'jurisdiction': 'Canada',
  'origin': 'Native'},
 {'tsn': '180565',
  'scientific_name': 'Taxidea taxus',
  'jurisdiction': 'Canada',
  'origin': 'Native'},
 {'tsn': '180565',
  'scientific_name': 'Taxidea taxus',
  'jurisdiction': 'Mexico',
  'origin': 'Native'},
 {'tsn': '507785',
  'scientific_name': 'Schoenoplectus acutus',
  'jurisdiction': 'Alaska',
  'origin': 'Native'},
 {'tsn': '507785',
  'scientific_name': 'Schoenoplectus acutus',
  'jurisdiction': 'Canada',
  'origin