To keep up with current literature on species of interest this notebook accesses eXtract Dark Data (xDD), formaly known as GeoDeepDive.  The xDD database is continuously being updated with published scientific literature from many of the large publishers including Elsiver, Taylor & Francis, GSA, and USGS.  We are currently working with the xDD team  on a number of tools and techniques for a) identifying literature potentially applicable to species-based research and b) using natural language processing tools to pull specific data from those sources for use. This is an ongoing effort that will result in improved production capabilities over time.

In the near term, we take advantage of some basic and enhanced search functionality to identify potential articles of interested in the xDD library of millions of documents that are increasing daily. The xdd module in the bispy package contains some search and packaging functionality that interfaces with the xDD REST API.

In [12]:
#Import needed packages
import requests
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed

xdd = bispy.xdd.Xdd()
bis_utils = bispy.bis.Utils()

In [2]:
with open("cache/valid_itis.json", 'r') as f:
    valid_itis=json.loads(f.read())

In [3]:
valid_itis

[{'scientific_name': 'Festuca idahoensis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Ursus americanus', 'itis_usage': 'valid'},
 {'scientific_name': 'Pseudotsuga menziesii', 'itis_usage': 'accepted'},
 {'scientific_name': 'Tamiasciurus hudsonicus', 'itis_usage': 'valid'},
 {'scientific_name': 'Otis tarda', 'itis_usage': 'valid'},
 {'scientific_name': 'Melanitta perspicillata', 'itis_usage': 'valid'},
 {'scientific_name': 'Taxidea taxus', 'itis_usage': 'valid'},
 {'scientific_name': 'Puccinellia rupestris', 'itis_usage': 'accepted'},
 {'scientific_name': 'Artemisiospiza belli', 'itis_usage': 'valid'},
 {'scientific_name': 'Rangifer tarandus groenlandicus', 'itis_usage': 'valid'},
 {'scientific_name': 'Urocitellus armatus', 'itis_usage': 'valid'},
 {'scientific_name': 'Centrocercus urophasianus', 'itis_usage': 'valid'},
 {'scientific_name': 'Procapra gutturosa', 'itis_usage': 'valid'},
 {'scientific_name': 'Juniperus communis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Cron

In [4]:
#Select all "valid" ITIS species names
valid_xdd=[i for i in valid_itis if i['itis_usage'] == 'valid' ]

In [5]:
# Use joblib to run multiple requests to xDD in parallel via "valid" scientific names
valid_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in valid_xdd)

In [6]:
# Filter to give just cases where xDD records matched with "valid" ITIS species names
success_valid_xdd=[i for i in valid_xdd_results if i['processing_metadata']['status'] == 'success' ]

In [7]:
#Select all "accepted" ITIS species names
accepted_xdd=[i for i in valid_itis if i['itis_usage'] == 'accepted']

In [8]:
# Use joblib to run multiple requests to xDD in parallel via "accepted" scientific names
accepted_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in accepted_xdd)

In [9]:
# Filter to give just cases where xDD records matched with "accepted" ITIS species names
success_accepted_xdd=[i for i in accepted_xdd_results if i['processing_metadata']['status'] == 'success' ]

In [13]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/xdd.json", success_valid_xdd and success_accepted_xdd ))

{'Doc Cache File': 'cache/xdd.json',
 'Number of Documents in Cache': 48,
 'Document Number 31': {'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-12T17:50:50.861504',
   'api': 'https://geodeepdive.org/api/snippets?full_results&clean&term=Kochia prostrata'},
  'parameters': {'Search Term': 'Kochia prostrata'},
  'xdd_documents': [{'pubname': 'Global and Planetary Change',
    'publisher': 'Elsevier',
    '_gddid': '55c918becf58f1a8110ba7ca',
    'doi': '10.1016/S0921-8181(00)00081-3',
    'coverDate': 'February 2001',
    'authors': 'Sidorchuk, Aleksey; Borisova, Olga; Panin, Andrey',
    'highlight': ['xerophytes as Ephedra distachya, Eurotia ceratoides, Kochia prostrata and others.',
     'distachya, Eurotia ceratoides, Kochia prostrata and others. Cryophytes Ž Botrychium boreale,'],
    'document_title': 'Fluvial response to the Late Valdai/Holocene environmental change on the East European Plain',
    'document_link': 'http://www.sciencedirect.com/science

Check to see if any invalid/not accepted ITIS specie names matched with xDD records

In [14]:
with open("cache/invalid_itis.json", 'r') as f:
    invalid_itis=json.loads(f.read())

In [15]:
# Use joblib to run multiple requests to xDD in parallel via scientific names
invalid_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in invalid_itis)
len(invalid_xdd_results)

6

In [17]:
# Filter to give just cases where xDD records matched with invalid/not accepted ITIS species names
invalid_result=[i for i in invalid_xdd_results if i['processing_metadata']['status'] == 'success' ]
len(invalid_result)

6

In [18]:
# Cache the retrieved document and return/display a sample for verification
display(bis_utils.doc_cache("cache/Results of Consultations with Invalid ITIS Species Names/invalid_xdd.json", invalid_result))

{'Doc Cache File': 'cache/Results of Consultations with Invalid ITIS Species Names/invalid_xdd.json',
 'Number of Documents in Cache': 6,
 'Document Number 5': {'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-12T17:57:50.870139',
   'api': 'https://geodeepdive.org/api/snippets?full_results&clean&term=Bassia prostrata'},
  'parameters': {'Search Term': 'Bassia prostrata'},
  'xdd_documents': [{'pubname': 'Acta Oecologica',
    'publisher': 'Elsevier',
    '_gddid': '5ad47446cf58f1b1b9733bb8',
    'doi': '10.1016/j.actao.2016.03.002',
    'coverDate': 'May 2016',
    'authors': "Ashouri, Parvaneh; Jalili, Adel; Danehkar, Afshin; Chahouki, Mohammad Ali Zare; Hamzeh'ee, Behnam",
    'highlight': ['Hohen.) Podlech. Stipa hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus',
     'hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus chrysostachys Boiss.'],
    'document_title': 'Is there any support for the humped-back model in some ste