To keep up with current literature on species of interest this notebook accesses eXtract Dark Data (xDD), formaly known as GeoDeepDive.  The xDD database is continuously being updated with published scientific literature from many of the large publishers including Elsiver, Taylor & Francis, GSA, and USGS.  We are currently working with the xDD team  on a number of tools and techniques for a) identifying literature potentially applicable to species-based research and b) using natural language processing tools to pull specific data from those sources for use. This is an ongoing effort that will result in improved production capabilities over time.

In the near term, we take advantage of some basic and enhanced search functionality to identify potential articles of interested in the xDD library of millions of documents that are increasing daily. The xdd module in the bispy package contains some search and packaging functionality that interfaces with the xDD REST API.

In [23]:
import requests
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed
import random

xdd = bispy.xdd.Xdd()

import warnings
warnings.filterwarnings('ignore')

In [24]:
# Open source WLCI list created from build-specie-list.ipynb
with open("cache/WLCI Species List from Literature.json", "r") as f:
    sp_list = json.loads(f.read())
sci_name_list = list(set([spp["Scientific Name"] for spp in sp_list]))

In [25]:
# Use joblib to run multiple requests to xDD in parallel via scientific names
xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(name) for name in [r for r in sp_list])

In [26]:
xdd_results

[{'processing_metadata': {'status': 'success',
   'date_processed': '2019-08-05T19:42:13.215970',
   'api': "https://geodeepdive.org/api/snippets?full_results&clean&term={'Scientific Name': 'Branta canadensis', 'xdd_id': '5c4e3f571faed655489408c3', 'wlci_id': 'http://zotero.org/groups/2341914/items/KP3R7Q33', 'n_hits': '1'}"},
  'parameters': {'Search Term': {'Scientific Name': 'Branta canadensis',
    'xdd_id': '5c4e3f571faed655489408c3',
    'wlci_id': 'http://zotero.org/groups/2341914/items/KP3R7Q33',
    'n_hits': '1'}},
  'xdd_documents': [{'pubname': 'Open-File Report',
    'publisher': 'USGS',
    '_gddid': '55b02c55e138238f9de48556',
    'coverDate': '2006',
    'authors': 'Powell, Brian F.; Schmidt, Cecilia A.; Halvorson, William Lee',
    'highlight': ['non-native.  Order  Family  Anseriformes Anatidae  Scientific name Branta',
     'non-native.  Order  Family  Anseriformes Anatidae  Scientific name Branta canadensis  Common',
     'Scientific name Branta canadensis  Common n

In [27]:
success_xdd = [i for i in xdd_results if i["processing_metadata"]["status"] == "success"]

In [37]:
# Cache the array of retrieved documents 
with open('cache/xdd.json', 'w') as f:
    f.write(json.dumps(success_xdd, indent=4))

In [38]:
with open("cache/xdd.json", "r") as f:
    xdd_cache = json.loads(f.read())

print(len(xdd_cache))
display(xdd_cache[random.randint(0,len(xdd_cache)-1)])

116


{'processing_metadata': {'status': 'success',
  'date_processed': '2019-08-05T19:42:42.133100',
  'api': "https://geodeepdive.org/api/snippets?full_results&clean&term={'Scientific Name': 'Abies lasiocarpa', 'xdd_id': '5c4e3f571faed655489408c3', 'wlci_id': 'http://zotero.org/groups/2341914/items/KP3R7Q33', 'n_hits': '1'}"},
 'parameters': {'Search Term': {'Scientific Name': 'Abies lasiocarpa',
   'xdd_id': '5c4e3f571faed655489408c3',
   'wlci_id': 'http://zotero.org/groups/2341914/items/KP3R7Q33',
   'n_hits': '1'}},
 'xdd_documents': [{'pubname': 'Botany',
   'publisher': 'Canadian Science Publishing',
   '_gddid': '57634f22cf58f1b46d57df0b',
   'doi': '10.1139/b11-040',
   'coverDate': 'October 2011',
   'authors': 'Lemly, Joanna M.; Cooper, David J.',
   'highlight': ['Cyperaceae Cyperaceae Cyperaceae Cyperaceae Cyperaceae  Scientific name Abies',
    'Cyperaceae Cyperaceae  Scientific name Abies lasiocarpa',
    'Scientific name Abies lasiocarpa Picea sp. Pinus',
    'name Abies las