I did some utility work on the top-level ScienceBase items, categorizing them as "NDC Organizations" with a new vocabulary I set up in the ScienceBase code lists. The tag helps distinguish the high-level containers that were established to set permissions for managing the items.

The expectation is that these top-level collections will contain the collection records that are the heart of the catalog. This notebook describes how to programmatically access the org items in ScienceBase.

In [1]:
import requests
from IPython.display import display

The following lays out the logical query path, specifying a parent ID (the National Digital Catalog in ScienceBase) and the explicit tag I added to the items. This query will unambiguously retrieve the org items we want to work with and build additional logic around.

In [2]:
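# Parent item: the National Digital Catalog in ScienceBase
parentId = '4f4e4760e4b07f02db47dfb4'
# Tag applied to each org item, drawn from the NGGDPP collection types vocabulary
tagScheme = {"scheme":"https://www.sciencebase.gov/vocab/category/NGGDPP/nggdpp_collection_types","name":"ndc_organization"}
# The dict is interpolated with Python's repr, which is why the printed URL shows single quotes
sbQueryPath = f'https://www.sciencebase.gov/catalog/items?format=json&max=50&fields=title,contacts,spatial&parentId={parentId}&filter=tags%3D{tagScheme}'
print(sbQueryPath)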

https://www.sciencebase.gov/catalog/items?format=json&max=50&fields=title,contacts,spatial&parentId=4f4e4760e4b07f02db47dfb4&filter=tags%3D{'scheme': 'https://www.sciencebase.gov/vocab/category/NGGDPP/nggdpp_collection_types', 'name': 'ndc_organization'}
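
The printed URL carries the tag as a Python dict repr (single quotes, unencoded braces and spaces). The query evidently returned results in that form, so the following is only a sketch of a more defensive way to build the same URL, using `json.dumps` for the tag and `urlencode` for percent-encoding. The `build_org_query` helper and the `encoded_query` name are my own, not part of ScienceBase, and whether the API treats the strict-JSON form identically is an assumption I have not verified.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.sciencebase.gov/catalog/items"


def build_org_query(parent_id, tag_scheme, max_items=50):
    """Return a percent-encoded query URL for the tagged org items.

    Hypothetical helper: parameter names mirror the query string used above.
    """
    params = {
        "format": "json",
        "max": max_items,
        "fields": "title,contacts,spatial",
        "parentId": parent_id,
        # json.dumps emits double-quoted JSON instead of Python's repr
        "filter": "tags=" + json.dumps(tag_scheme),
    }
    # urlencode percent-encodes braces, quotes, and the '=' inside the filter
    return BASE_URL + "?" + urlencode(params)


tag_scheme = {
    "scheme": "https://www.sciencebase.gov/vocab/category/NGGDPP/nggdpp_collection_types",
    "name": "ndc_organization",
}
encoded_query = build_org_query("4f4e4760e4b07f02db47dfb4", tag_scheme)
print(encoded_query)
```

Decoding the `filter` parameter of `encoded_query` gives back `tags=` followed by valid JSON, so the tag survives the round trip regardless of how a client or proxy handles raw braces and spaces.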


In [3]:
# Retrieve the org items
r_ndc_org = requests.get(sbQueryPath).json()
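
The one-line request above will raise a fairly opaque error if ScienceBase returns a non-JSON error page or hangs. A minimal sketch of a more guarded version follows. `fetch_items` is a hypothetical helper of mine; it accepts any object with a requests-style `get()` method (for example `requests.Session()`), and the 30-second timeout is an arbitrary choice.

```python
def fetch_items(url, session, timeout=30):
    """Fetch a ScienceBase query URL and return its list of items.

    `session` is anything with a requests-style get(), e.g. requests.Session().
    """
    response = session.get(url, timeout=timeout)
    # Surface HTTP errors (4xx/5xx) instead of trying to parse an error page
    response.raise_for_status()
    payload = response.json()
    # The response shown in this notebook keeps results under the 'items' key
    return payload.get("items", [])


# Usage in this notebook would look like:
# org_items = fetch_items(sbQueryPath, requests.Session())
```

Returning an empty list when `items` is missing keeps downstream loops simple, at the cost of hiding an unexpected response shape; raising instead would be a reasonable alternative.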

I pulled two essential properties for the org items that will likely be used to help improve metadata for the collection items in these containers:

* Contacts contains a "Data Owner" type contact that should serve as a reasonably well-populated responsible party for these records. These should be reviewed for current information, as they look like contacts that I added to the ScienceBase Directory a long time ago. I did a little manual cleanup in ScienceBase in a few cases where I did not see Data Owner contacts listed.
* The spatial property here contains a reasonable bounding box for most of the items, generated by tying a state ID to the items in ScienceBase. This might be used to supply a bounding box for collection items, giving them at least reasonable harvestable metadata in cases where no actual items are presented yet.
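
As a rough sketch of how those two properties might be pulled out of each org item: the contact fields (`type`, `name`) match the response shown in this notebook, but the layout of `spatial` is an assumption on my part (a `boundingBox` dict with `minX`, `minY`, `maxX`, `maxY`). Both helper names are hypothetical, and the coordinates in the sample item are placeholders, not values from ScienceBase.

```python
def data_owner_names(item):
    """Return the names of all 'Data Owner' contacts on an item."""
    contacts = item.get("contacts") or []
    return [c.get("name") for c in contacts if c.get("type") == "Data Owner"]


def bounding_box(item):
    """Return (minX, minY, maxX, maxY), or None if no box is present.

    ASSUMPTION: spatial looks like {'boundingBox': {'minX': ..., ...}}.
    """
    box = (item.get("spatial") or {}).get("boundingBox")
    if not box:
        return None
    return (box["minX"], box["minY"], box["maxX"], box["maxY"])


# Trimmed sample item; the spatial values are illustrative placeholders
sample_item = {
    "id": "4f4e4761e4b07f02db47dfe0",
    "contacts": [
        {"name": "North Dakota Geological Survey", "type": "Data Owner"},
    ],
    "spatial": {
        "boundingBox": {"minX": -1.0, "minY": -2.0, "maxX": 3.0, "maxY": 4.0},
    },
}

print(data_owner_names(sample_item))
print(bounding_box(sample_item))
```

Items with no Data Owner contact come back as an empty list, which makes it easy to flag the records that still need manual cleanup.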

In [4]:
display(r_ndc_org['items'])

[{'contacts': [{'active': True,
    'contactType': 'organization',
    'logoUrl': 'http://my.usgs.gov/static-cache/images/dataOwner/v1/logosMed/NDLogo.gif',
    'name': 'North Dakota Geological Survey',
    'oldPartyId': 18256,
    'onlineResource': 'https://www.dmr.nd.gov/ndgs/',
    'primaryLocation': {'mailAddress': {'city': 'Bismarck',
      'line1': '600 East Boulevard Avenue',
      'state': 'ND',
      'zip': '58505-0840'},
     'name': 'North Dakota Geological Survey',
     'officePhone': '7013288000',
     'streetAddress': {'city': 'Bismarck',
      'line1': '1016 E. Calgary Ave.',
      'state': 'ND',
      'zip': '58503'}},
    'smallLogoUrl': 'http://my.usgs.gov/static-cache/images/dataOwner/v1/logosSmall/NDLogo.gif',
    'type': 'Data Owner'}],
  'id': '4f4e4761e4b07f02db47dfe0',
  'link': {'rel': 'self',
   'url': 'https://www.sciencebase.gov/catalog/item/4f4e4761e4b07f02db47dfe0'},
  'relatedItems': {'link': {'rel': 'related',
    'url': 'https://www.sciencebase.gov/cata