I used the need to process GAP species in new ways through the Taxa Information Registry as a vehicle for testing several new technologies and methods that we will weave into the overall process. I've decided that we will be much better off storing the bits of information that fit a simple document model in an actual NoSQL database than we were trying to make the JSONB functionality in PostgreSQL work for us via the GC2 package. I tried several NoSQL technologies that I could spin up locally: CouchDB, Couchbase, MongoDB, and DynamoDB. I found that MongoDB has, by far, the lowest barrier to entry and the most developed higher-level API, in pymongo.

Because of that and our FORT team's experience with MongoDB, I think that's what we may end up with for a while, and I am reworking the code components needed to put the GAP species information fully online around MongoDB. For now, until we get something official spun up, I'm using a free account on mLab's hosted MongoDB service, which gives me enough space to test everything with the small number of GAP species items.

This notebook picks up registration information from ScienceBase for the GAP species and puts it into documents within a "gapspecies" collection in a "bis" database. In many ways, we could drive everything we need to do with GAP species from ScienceBase directly, but I keep running into challenges with the system remaining online and fully viable for our use. So I think it is best to give the records a presence online via a different API that we can use for our purposes. We'll still count on ScienceBase to provide the "bedrock" information, but we'll kick things off with this script to put those documents in a slightly different and more usable form. We'll also investigate versioning of these documents in the future.

In [1]:
import requests
import pysb
from IPython.display import display
from bis import gap
from bis2 import mlab

In [2]:
bisDB = mlab.getDB("bis")
gapspecies = bisDB["gapspecies"]

In [3]:
sb = pysb.SbSession()
username = input("Username: ")
sb.loginc(str(username))

Username: sbristol@usgs.gov
········


<pysb.SbSession.SbSession at 0x1083d76d8>

In [4]:
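# Page through the child items of the GAP species collection in ScienceBase.
items = sb.find_items("parentId=527d0a83e4b0850ea0518326&fields=title,identifiers,tags,contacts,dates,purpose,body,citation,webLinks")
while items and 'items' in items:
    for item in items['items']:
        # Convert the ScienceBase item into a flat TIR registration document.
        gapItem = gap.gapToTIR_flat(item)
        # Insert only species codes that are not yet registered.
        # count_documents replaces Cursor.count(), which PyMongo 4 removed.
        if gapspecies.count_documents({'GAP_SpeciesCode': gapItem["GAP_SpeciesCode"]}, limit=1) == 0:
            post_id = gapspecies.insert_one(gapItem).inserted_id
            print(post_id)
    # Follow the next-page link; the loop ends when no further page is returned.
    items = sb.next(items)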

59b848bb3339a2d2a5efaefd
59b848bc3339a2d2a5efaefe
59b848bc3339a2d2a5efaeff
59b848bc3339a2d2a5efaf00
59b848bc3339a2d2a5efaf01
59b848bd3339a2d2a5efaf02
59b848bd3339a2d2a5efaf03
59b848bd3339a2d2a5efaf04
59b848bd3339a2d2a5efaf05
59b848bd3339a2d2a5efaf06
59b848be3339a2d2a5efaf07
59b848be3339a2d2a5efaf08
59b848be3339a2d2a5efaf09
59b848bf3339a2d2a5efaf0a
59b848bf3339a2d2a5efaf0b
59b848c03339a2d2a5efaf0c
59b848c13339a2d2a5efaf0d
59b848c13339a2d2a5efaf0e
59b848c13339a2d2a5efaf0f
59b848c23339a2d2a5efaf10
59b848c23339a2d2a5efaf11
59b848c33339a2d2a5efaf12
59b848c33339a2d2a5efaf13
59b848c33339a2d2a5efaf14
59b848c43339a2d2a5efaf15
59b848c53339a2d2a5efaf16
59b848c53339a2d2a5efaf17
59b848c53339a2d2a5efaf18
59b848c63339a2d2a5efaf19
59b848c63339a2d2a5efaf1a
59b848c63339a2d2a5efaf1b
59b848c63339a2d2a5efaf1c
59b848c73339a2d2a5efaf1d
59b848c73339a2d2a5efaf1e
59b848c73339a2d2a5efaf1f
59b848c83339a2d2a5efaf20
59b848c83339a2d2a5efaf21
59b848c83339a2d2a5efaf22
59b848c93339a2d2a5efaf23
59b848c93339a2d2a5efaf24
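
The loop above checks for an existing document and then inserts, which takes two round trips per species and could register a code twice if two copies of the notebook ran at once. As a rough sketch of one alternative, not something this notebook currently does, the check and the insert can be folded into a single upsert using PyMongo's `update_one` with `$setOnInsert`. The helper name `register_if_new` is hypothetical, and the sketch assumes each document carries at least one field besides `GAP_SpeciesCode`.

```python
def register_if_new(collection, doc, key="GAP_SpeciesCode"):
    """Insert doc only when no document with the same key value exists.

    Hypothetical helper; `collection` is expected to behave like a
    pymongo Collection. Returns the new _id, or None if already present.
    """
    # The key comes from the filter on insert, so it is left out of the
    # payload. Assumes doc has other fields, so the payload is not empty.
    payload = {k: v for k, v in doc.items() if k != key}
    result = collection.update_one(
        {key: doc[key]},            # match on the species code
        {"$setOnInsert": payload},  # applied only when a new document is created
        upsert=True,
    )
    return result.upserted_id
```

Inside the paging loop this would be called as `register_if_new(gapspecies, gap.gapToTIR_flat(item))`, printing the returned id when it is not `None`. A unique index, for example `gapspecies.create_index("GAP_SpeciesCode", unique=True)`, would be a reasonable companion so that the database itself rejects duplicates.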
