To keep up with current literature on species of interest this notebook accesses eXtract Dark Data (xDD), formaly known as GeoDeepDive.  The xDD database is continuously being updated with published scientific literature from many of the large publishers including Elsiver, Taylor & Francis, GSA, and USGS.  We are currently working with the xDD team  on a number of tools and techniques for a) identifying literature potentially applicable to species-based research and b) using natural language processing tools to pull specific data from those sources for use. This is an ongoing effort that will result in improved production capabilities over time.

In the near term, we take advantage of some basic and enhanced search functionality to identify potential articles of interested in the xDD library of millions of documents that are increasing daily. The xdd module in the bispy package contains some search and packaging functionality that interfaces with the xDD REST API.

In [1]:
#Import needed packages
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed

xdd = bispy.xdd.Xdd()
bis_utils = bispy.bis.Utils()

In [2]:
# Open up the cached explore ITIS list with species names from ITIS Exploration.ipynb
with open("cache/itis_explore.json", 'r') as f:
    itis_explore=json.loads(f.read())

In [3]:
itis_explore

[{'scientific_name': 'Oncorhynchus clarkii pleuriticus',
  'itis_usage': 'valid'},
 {'scientific_name': 'Ictidomys tridecemlineatus', 'itis_usage': 'valid'},
 {'scientific_name': 'Oncorhynchus mykiss', 'itis_usage': 'valid'},
 {'scientific_name': 'Podiceps grisegena', 'itis_usage': 'valid'},
 {'scientific_name': 'Shepherdia canadensis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Grus americana', 'itis_usage': 'valid'},
 {'scientific_name': 'Connochaetes taurinus', 'itis_usage': 'valid'},
 {'scientific_name': 'Pinus flexilis', 'itis_usage': 'accepted'},
 {'scientific_name': 'Aythya affinis', 'itis_usage': 'valid'},
 {'scientific_name': 'Zea mays', 'itis_usage': 'accepted'},
 {'scientific_name': 'Juniperus communis var. depressa',
  'itis_usage': 'accepted'},
 {'scientific_name': 'Clupea pallasii', 'itis_usage': 'valid'},
 {'scientific_name': 'Clupea pallasi', 'itis_usage': 'invalid'},
 {'scientific_name': 'Urocitellus armatus', 'itis_usage': 'valid'},
 {'scientific_name': 'Agave a

In [4]:
#Select all "valid" ITIS species names
valid_xdd=[i for i in itis_explore if i['itis_usage'] == 'valid' ]

In [5]:
# Use joblib to run multiple requests to xDD in parallel via "valid" scientific names
valid_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in valid_xdd)

In [6]:
# Filter to give just cases where xDD records matched with "valid" ITIS species names
success_valid_xdd=[i for i in valid_xdd_results if i['processing_metadata']['status'] == 'success' ]

In [7]:
#Select all "accepted" ITIS species names
accepted_xdd=[i for i in itis_explore if i['itis_usage'] == 'accepted']

In [8]:
# Use joblib to run multiple requests to xDD in parallel via "accepted" scientific names
accepted_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in accepted_xdd)

In [9]:
# Filter to give just cases where xDD records matched with "accepted" ITIS species names
success_accepted_xdd=[i for i in accepted_xdd_results if i['processing_metadata']['status'] == 'success' ]

Check to see if any invalid/not accepted ITIS specie names matched with xDD records

In [10]:
#Select all "invalid" ITIS species names
invalid_xdd=[i for i in itis_explore if i['itis_usage'] == 'invalid']

In [11]:
# Use joblib to run multiple requests to xDD in parallel via "invalid" scientific names
invalid_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in invalid_xdd)

In [12]:
# Filter to give just cases where xDD records matched with "invalid" ITIS species names
success_invalid_xdd=[i for i in invalid_xdd_results if i['processing_metadata']['status'] == 'success' ]

In [13]:
#View results xDD results for names considered "invalid" by ITIS
success_invalid_xdd

[{'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-27T20:17:47.825926',
   'api': 'https://geodeepdive.org/api/snippets?full_results&clean&term=Clupea pallasi'},
  'parameters': {'Search Term': 'Clupea pallasi'},
  'xdd_documents': [{'pubname': 'Ethology Ecology & Evolution',
    'publisher': 'Taylor and Francis',
    '_gddid': '595a51f5cf58f17497a42cf4',
    'doi': '10.1080/08927014.2009.9522513',
    'coverDate': '2009 01',
    'authors': '',
    'highlight': ['of the decline the Cherry Point Pacific herring (Clupea pallasi) stock, W.G.',
     'Point Pacific herring (Clupea pallasi) stock, W.G. Landis; 15. Endocrine disruption'],
    'document_title': 'Book news',
    'document_link': 'http://www.tandfonline.com/doi/abs/10.1080/08927014.2009.9522513'},
   {'pubname': 'Journal of Aquatic Animal Health',
    'publisher': 'Taylor and Francis',
    '_gddid': '59548253cf58f159a580f66e',
    'doi': '10.1577/H04-041.1',
    'coverDate': '2005 09',
    'authors': 'H

In [14]:
#Select all "not accepted" ITIS species names
not_accepted_xdd=[i for i in itis_explore if i['itis_usage'] == 'not accepted']

In [15]:
# Use joblib to run multiple requests to xDD in parallel via "not accepted" scientific names
not_accepted_xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(r["scientific_name"]) for r in not_accepted_xdd)

In [16]:
# Filter to give just cases where xDD records matched with "not accepted" ITIS species names
success_not_accepted_xdd=[i for i in not_accepted_xdd_results if i['processing_metadata']['status'] == 'success' ]

In [17]:
#View results xDD results for names considered "not accepted" by ITIS
success_not_accepted_xdd

[{'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-27T20:18:26.152862',
   'api': 'https://geodeepdive.org/api/snippets?full_results&clean&term=Bassia prostrata'},
  'parameters': {'Search Term': 'Bassia prostrata'},
  'xdd_documents': [{'pubname': 'Acta Oecologica',
    'publisher': 'Elsevier',
    '_gddid': '5ad47446cf58f1b1b9733bb8',
    'doi': '10.1016/j.actao.2016.03.002',
    'coverDate': 'May 2016',
    'authors': "Ashouri, Parvaneh; Jalili, Adel; Danehkar, Afshin; Chahouki, Mohammad Ali Zare; Hamzeh'ee, Behnam",
    'highlight': ['Hohen.) Podlech. Stipa hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus',
     'hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus chrysostachys Boiss.'],
    'document_title': 'Is there any support for the humped-back model in some steppe and semi steppe regions of Iran?',
    'document_link': 'https://www.sciencedirect.com/science/article/pii/S1146609X16300467'},
   {'pubname': 'Feddes Repe

In [18]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/xdd.json", success_valid_xdd and success_accepted_xdd and success_invalid_xdd and success_not_accepted_xdd))

{'Doc Cache File': 'cache/xdd.json',
 'Number of Documents in Cache': 1,
 'Document Number 0': {'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-27T20:18:26.152862',
   'api': 'https://geodeepdive.org/api/snippets?full_results&clean&term=Bassia prostrata'},
  'parameters': {'Search Term': 'Bassia prostrata'},
  'xdd_documents': [{'pubname': 'Acta Oecologica',
    'publisher': 'Elsevier',
    '_gddid': '5ad47446cf58f1b1b9733bb8',
    'doi': '10.1016/j.actao.2016.03.002',
    'coverDate': 'May 2016',
    'authors': "Ashouri, Parvaneh; Jalili, Adel; Danehkar, Afshin; Chahouki, Mohammad Ali Zare; Hamzeh'ee, Behnam",
    'highlight': ['Hohen.) Podlech. Stipa hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus',
     'hohenackeriana Trin. & Rupr. Bassia prostrata (L.) Beck Astragalus chrysostachys Boiss.'],
    'document_title': 'Is there any support for the humped-back model in some steppe and semi steppe regions of Iran?',
    'document_link': 'http