One of my goals in the project is to develop a roadmap for collection metadata improvements. I will be building all of the items within those collections into a suite of search indexes with an API for discovery, but at the end of the day, the metadata at the collection level is going to be necessary for further understanding, using, and gaining access to the described materials. I may end up wanting to incorporate additional attributes in the items themselves, pulled in from collection metadata, in order to make them more discoverable and functional. Users who find something interesting in a search will need details about who to get in touch with for more information.

The collection records will also be vital in controlling the process for registering and maintaining International GeoSample Numbers (IGSNs) associated with items in the collections. The metadata will need attributes that determine how numbers should be assigned and which specific rules should be applied. Collection metadata will determine the IGSN namespace to use, and some collections might want to combine the namespace with some onboard ID to build IGSNs that can be referenced back to source material.
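
As a rough illustration of combining a namespace with an onboard ID, here is a minimal sketch. The cleaning rule (upper-case, letters and digits only) and the `NDC` namespace are assumptions for the example, not rules taken from any IGSN allocating agent or from the collections themselves.

```python
import re


def build_igsn(namespace: str, local_id: str) -> str:
    """Join a collection's IGSN namespace with an item's onboard identifier.

    Assumption: the result is the namespace followed directly by the local ID,
    upper-cased, with every character other than A-Z and 0-9 removed.
    Real registration rules (length limits, reserved prefixes) may differ.
    """
    cleaned_ns = re.sub(r'[^A-Z0-9]', '', namespace.upper())
    cleaned_id = re.sub(r'[^A-Z0-9]', '', local_id.upper())
    if not cleaned_ns or not cleaned_id:
        raise ValueError('namespace and local_id must both contain letters or digits')
    return cleaned_ns + cleaned_id


# Hypothetical namespace and onboard ID, for illustration only
print(build_igsn('NDC', 'core-00123'))
```

Keeping the onboard ID recognizable inside the IGSN is what lets a number be traced back to the source material without a separate lookup.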

My plan is to pursue high-quality metadata for collections using the ISO19115 standard. We have a start to these records via the ScienceBase transformation to ISO XML, but it is pretty rudimentary and based on first-generation ISO19115/19139. It does include the essential elements of descriptive information, contacts, ScienceBase identifier, and links (including any onboard webLinks and access to basic footprint geospatial services). If I feed back a webLink for access to the API I'll be building, then the metadata records would provide a distribution point for the actual items within a collection. As we work through ways of getting back to richer source material from data owners, we will also include those as web links, providing additional detail for collections. Spatial information may be sufficient if we either use the ScienceBase footprinting mechanism or feed something back to the collection items from the API with a bounding box for their actual contents. Some code can also be written to align metadata at the "ndc_organization" level with their associated collections (e.g., inherit the bounding box for a state if spatial information is not specified for a collection).
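
The bounding box inheritance idea could look something like the sketch below. It works on plain dictionaries and assumes the item JSON stores its extent under `spatial.boundingBox` with `minX`/`maxX`/`minY`/`maxY` keys; that layout, the function name, and the sample values are assumptions for illustration and should be checked against real ScienceBase items.

```python
def inherit_bounding_box(collection: dict, organization: dict) -> dict:
    """Return the collection, filled in with the organization's bounding box
    when the collection has no bounding box of its own.

    The input dictionaries are not modified.
    """
    collection_spatial = collection.get('spatial') or {}
    org_box = (organization.get('spatial') or {}).get('boundingBox')
    if collection_spatial.get('boundingBox') or not org_box:
        # Nothing to inherit, or the collection already has its own extent
        return collection
    updated = dict(collection)
    updated['spatial'] = {**collection_spatial, 'boundingBox': dict(org_box)}
    return updated


# Illustrative records: a state-level organization and a collection with no extent
organization = {
    'title': 'Tennessee Geological Survey',
    'spatial': {'boundingBox': {'minX': -90.31, 'maxX': -81.65, 'minY': 34.98, 'maxY': 36.68}},
}
collection = {'title': 'Collection of Coal Geology Maps from Tennessee'}

print(inherit_bounding_box(collection, organization)['spatial'])
```

Returning a copy rather than editing in place makes it easy to review the proposed change before writing anything back to the catalog.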

In [1]:
import requests
from IPython.display import display

parentId = '4f4e4760e4b07f02db47dfb4'
queryRoot = 'https://www.sciencebase.gov/catalog/items?format=json&max=1000&'

def ndc_collection_type_tag(tag_name, include_type=True):
    """Look up a vocabulary term by name and return it as a ScienceBase tag dict.

    Returns None unless exactly one term matches tag_name.
    """
    vocab_search_url = f'https://www.sciencebase.gov/vocab/5bf3f7bce4b00ce5fb627d57/terms?nodeType=term&format=json&name={tag_name}'
    r_vocab_search = requests.get(vocab_search_url).json()
    if len(r_vocab_search['list']) != 1:
        return None
    term = r_vocab_search['list'][0]
    tag = {'name': term['name'], 'scheme': term['scheme']}
    if include_type:
        tag['type'] = 'theme'
    return tag

# ISO metadata listing
There is currently no advertised method in the ScienceBase Item output from the API for the ISO XML form of the metadata. This seems to be purely a web UI construct. To help provide at least a listing for every collection, I've put together a report here grouped by organization.

In [2]:
tag_scheme_orgs = ndc_collection_type_tag('ndc_organization',False)
fields_orgs = 'title'
sb_query_orgs = f'{queryRoot}fields={fields_orgs}&folderId={parentId}&filter=tags%3D{tag_scheme_orgs}'
ndc_orgs = requests.get(sb_query_orgs).json()

In [3]:
fields_collections = 'title'
tag_scheme_collections = ndc_collection_type_tag('ndc_collection',False)
catalog_item_root = 'https://www.sciencebase.gov/catalog/item/'

for org in ndc_orgs['items']:
    print(org['title'])
    org_id = org['id']
    # Query the collections that sit under this organization
    sb_query_collections = f'{queryRoot}fields={fields_collections}&folderId={org_id}&filter=tags%3D{tag_scheme_collections}'
    org_collections = requests.get(sb_query_collections).json()
    for collection in org_collections['items']:
        print('* ', collection['title'], f"{catalog_item_root}{collection['id']}?format=iso")
    print('==============')

North Dakota Geological Survey
*  Collection of Field notes from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df458?format=iso
*  Collection of Rock cores from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1e9?format=iso
*  Collection of Well logs from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df537?format=iso
*  Collection of Paleontological samples from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1eb?format=iso
*  Collection of Photographs from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49cfe4b07f02db5da58a?format=iso
*  Collection of Hand samples from North Dakota https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1df?format=iso
Tennessee Geological Survey
*  Collection of Oil & Gas Well Data File from Tennessee https://www.sciencebase.gov/catalog/item/4f4e4aaae4b07f02db66937b?format=iso
*  Collection of Geotechnical Engineering Reports fro

*  Collection of Geologic Maps  from California https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df319?format=iso
*  Aerial photography from California https://www.sciencebase.gov/catalog/item/4fb5480ce4b04cb937751d69?format=iso
*  Seismic Hazard Zonation Geotechnical Reports for California https://www.sciencebase.gov/catalog/item/56e85bb0e4b0f59b85d712ff?format=iso
*  Aerial Photography Collection - Southern California https://www.sciencebase.gov/catalog/item/5bc79d1ce4b0fc368ebe063b?format=iso
Alaska Division of Geological and Geophysical Surveys
*  Collection of Field Photographs from Alaska https://www.sciencebase.gov/catalog/item/55ce5b1ee4b01487cbfc7104?format=iso
*  Collection of Field Notes and Unpublished Maps from Alaska https://www.sciencebase.gov/catalog/item/57bb5f55e4b03fd6b7dd0532?format=iso
*  Collection of Thin sections and polished sections from Alaska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df4fa?format=iso
*  Collection of hand samples

*  Colorado Denver, Hanson Creek, Uranium Exploration Prospect Miscellaneous Information https://www.sciencebase.gov/catalog/item/5a9d855de4b06990607186f4?format=iso
*  Colorado Denver, Bijou Creek, Uranium Exploration Prospect Miscellaneous Information https://www.sciencebase.gov/catalog/item/5a9d8465e4b06990607186ed?format=iso
U.S. Geological Survey
*  USGS Core Research Center (CRC) Collection of Core https://www.sciencebase.gov/catalog/item/4f4e49dae4b07f02db5e0486?format=iso
*  USGS Core Research Center (CRC) Collection of Cuttings https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df2d2?format=iso
*  USGS Denver Paleontology Collection https://www.sciencebase.gov/catalog/item/5873ee76e4b0a829a31f8350?format=iso
*  USGS Core Research Center - Physical Samples https://www.sciencebase.gov/catalog/item/54f86585e4b02419550d9a2f?format=iso
*  Doug Rankin Legacy Geologic Maps https://www.sciencebase.gov/catalog/item/59234e52e4b0b7ff9fb11f54?format=iso
National Ice Core Laborato

*  Collection of Chips from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1fa?format=iso
*  Collection of Rock Cuttings from Texas https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1fd?format=iso
*  Collection of Well Logs from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df202?format=iso
*  Collection of Well Logs from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df368?format=iso
*  Collection of Rock Cores from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df3d2?format=iso
*  Collection of thin sections, paleo slides from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1ff?format=iso
*  Collection of Rock Hand Specimens from TX https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df200?format=iso
*  Collection of Rock cores from TX https://www.sciencebase.gov/catalog/item/4f4e49cbe4b07f02db5d84fe?format=iso
*  Collection of Scanned Index Cards from Bureau of Economic Geology Co

*  Collection of MX Missile Project Files from Nevada https://www.sciencebase.gov/catalog/item/4f4e49cbe4b07f02db5d88b9?format=iso
*  Collection of Geothermal Resources of Nevada from Nevada https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df214?format=iso
*  Nevada Bureau of Mines and Geology Sample Collection Database https://www.sciencebase.gov/catalog/item/5494ba35e4b0a2b9adad8ee3?format=iso
*  Nevada Bureau of Mines and Geology Nevada Mining District Files https://www.sciencebase.gov/catalog/item/4f4e49cbe4b07f02db5d877f?format=iso
*  Collection of Oil and Gas Well Paper Reports from Nevada https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df325?format=iso
*  Nevada Bureau of Mines and Geology Serial Publications https://www.sciencebase.gov/catalog/item/5a162974e4b09fc93dd171ab?format=iso
*  Collection of Aerial Photographs from Nevada https://www.sciencebase.gov/catalog/item/4f4e49cbe4b07f02db5d885c?format=iso
*  Collection of Engineering Files from Nevada ht

*  Vermont Geological Survey Environmental Geology Series https://www.sciencebase.gov/catalog/item/53625987e4b0c409c6289a90?format=iso
*  Vermont Geological Survey Collection of Rock Geochemistry Data I https://www.sciencebase.gov/catalog/item/53f7913ae4b05ec1f246ef00?format=iso
*  Vermont Geological Survey Collection of Rock Geochemistry Data II https://www.sciencebase.gov/catalog/item/53f796c8e4b05ec1f246ef13?format=iso
*  Vermont Geological Survey Collection of Bedrock Thermal Conductivity Data https://www.sciencebase.gov/catalog/item/53f78eefe4b05ec1f246eef9?format=iso
*  Vermont Geological Survey Collection of Digital Photographs https://www.sciencebase.gov/catalog/item/5362a976e4b0c409c6289b8c?format=iso
*  Vermont Geological Survey Technical Report Series https://www.sciencebase.gov/catalog/item/53626183e4b0c409c6289a9f?format=iso
*  Vermont Geological Survey Open File Reports https://www.sciencebase.gov/catalog/item/536253b7e4b0c409c6289a82?format=iso
*  Vermont Geological Surv

*  Collection of Rock properties database from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df226?format=iso
*  Collection of Mineral exploration files from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df2fe?format=iso
*  Collection of Well logs from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df225?format=iso
*  Collection of Borehole geophysical logs from MN https://www.sciencebase.gov/catalog/item/5bc0f6e1e4b0fc368eb70156?format=iso
*  Collection of Karst database from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df391?format=iso
*  Collection of Geotechnical drilling from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df514?format=iso
*  Collection of Geochemical samples from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df103?format=iso
*  Collection of Sediment textural and lithological data from MN https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df306?format=iso


*  Collection of Hand samples from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df4d2?format=iso
*  Collection of lithology logs from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df485?format=iso
*  Collection of Maps from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df47c?format=iso
*  Collection of Cores from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df1ee?format=iso
*  Collection of Well logs from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df480?format=iso
*  Collection of Drilling/completion reports from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d3e4b07f02db5dc6b5?format=iso
*  Collection of Auger samples from Nebraska https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df0f3?format=iso
*  Collection from NE https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df477?format=iso
*  Collection of Thin sections and polished sections f

*  Collection of geologic maps from Louisiana https://www.sciencebase.gov/catalog/item/4f4e4814e4b07f02db4db032?format=iso
*  Collection of well logs from Louisiana https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df346?format=iso
*  Collection of environmental fluid samples from Louisiana https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df3db?format=iso
*  Collection of Louisiana Geological Survey Publications https://www.sciencebase.gov/catalog/item/5612cd9fe4b0ba4884c609ff?format=iso
*  Collection of rock cores from Louisiana https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df34a?format=iso
Kansas Geological Survey
*  Collection of routine aquifer analysis data from Kansas https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df294?format=iso
*  Collection of rock cores from Kansas https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df244?format=iso
*  Collection of rock cuttings from Kansas https://www.sciencebase.gov/catalog/item/4f4e

*  Collection of Rock cores from Connecticut https://www.sciencebase.gov/catalog/item/4f4e49cfe4b07f02db5da90e?format=iso
*  Connecticut Quaternary and Surficial Geology Quadrangle https://www.sciencebase.gov/catalog/item/5228d413e4b06291bed806d4?format=iso
*  Hartford 2 degree Maps https://www.sciencebase.gov/catalog/item/530e5361e4b0929320b22a4d?format=iso
*  Rodgers Bedrock Compilation Sheets https://www.sciencebase.gov/catalog/item/52aa1f43e4b098bc4034b247?format=iso
*  Unpublished Surficial Geology Quadrangle Maps https://www.sciencebase.gov/catalog/item/52caf61de4b017ba5c69d147?format=iso
*  Unpublished Connecticut Carbonate Set https://www.sciencebase.gov/catalog/item/5228df63e4b06291bed806fd?format=iso
*  Unpublished Smith Map Set of Mines and Quarries of Connecticut https://www.sciencebase.gov/catalog/item/5294d093e4b01cca2b11df4a?format=iso
*  Collection of Rock cuttings from Connecticut https://www.sciencebase.gov/catalog/item/4f4e49cfe4b07f02db5da945?format=iso
*  Connectic

*  Collection of Geochemical Samples from Michigan https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df3de?format=iso
*  Collection of Paleontological samples from Michigan https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df288?format=iso
*  Collection of Scout Tickets from Michigan https://www.sciencebase.gov/catalog/item/537b6e28e4b0929ba496abf2?format=iso
*  Collection of Well Logs from Michigan https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df27c?format=iso
*  Collection of hand samples from Michigan https://www.sciencebase.gov/catalog/item/4f4e49d8e4b07f02db5df289?format=iso
*  Collection of Rock Core from Michigan https://www.sciencebase.gov/catalog/item/51dddff5e4b0f72b44722554?format=iso
*  Collection of Thin Sections from Michigan https://www.sciencebase.gov/catalog/item/5b856479e4b05f6e321d03eb?format=iso
*  Collection of Rock Core Analyses from Michigan - 2013 addition https://www.sciencebase.gov/catalog/item/51e034b2e4b0d332bf22f5b8?format=

# Validation and enhancement
I think it would be an interesting exercise to approach validation and enhancement of NDC collection metadata by working toward fully built ISO19115 records, rather than staying with the current state, which is limited to the ScienceBase Item model. ScienceBase should be moving to a point where many of its items are based on the ISO standard (or a small number of other viable options). This should be the approach whenever the standard itself or an implementation community has evolved a structure suited to documenting a particular class of items. In this case, the collections document a set of physical artifacts with a partial digital representation, and the collection resource type seems well suited for this purpose. The current ScienceBase instantiation of a simplistic ISO19139 representation of an item is hard-coded to "dataset" for resource type (gmd:MD_ScopeCode).
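
To see what resource type a given record declares, something like the following sketch could be used. It reads `gmd:hierarchyLevel/gmd:MD_ScopeCode` with the standard library XML parser. The sample record is a hand-written fragment, not actual ScienceBase output, and its `codeList` URL is a placeholder.

```python
import xml.etree.ElementTree as ET

# Namespace used by ISO 19139 (gmd) encodings
GMD_NS = {'gmd': 'http://www.isotc211.org/2005/gmd'}


def resource_scope_codes(iso_xml: str) -> list:
    """Return the codeListValue of every MD_ScopeCode under gmd:hierarchyLevel."""
    root = ET.fromstring(iso_xml)
    codes = root.findall('gmd:hierarchyLevel/gmd:MD_ScopeCode', GMD_NS)
    return [code.get('codeListValue') for code in codes]


# Hand-written fragment for illustration; the codeList URL is a placeholder
sample_record = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="https://example.org/codelists#MD_ScopeCode"
                      codeListValue="dataset">dataset</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
</gmd:MD_Metadata>"""

print(resource_scope_codes(sample_record))
```

Running this over every collection's `?format=iso` record would be one way to confirm how uniformly the "dataset" value is applied.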

The rubric approach developed by Ted Habermann and others for ISO metadata would seem a good way to go here, and I'll reach out to see about soliciting interest in partnering and contributing to the project. We can use that to point out specific areas of this metadata collection that could be improved, and then engage data owners from the State Geological Surveys and other institutions in the process.
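
To give a feel for how a rubric could flag areas for improvement, here is a deliberately simple sketch that only checks whether a handful of fields are populated. It is not the published rubric; the field list, the equal weighting, and the function name are all assumptions for illustration.

```python
# Fields checked by this toy rubric (an assumed list, not an official one)
RUBRIC_FIELDS = [
    'title', 'abstract', 'place_keywords', 'thematic_keywords',
    'bounding_box', 'contacts', 'dates',
]


def score_record(record: dict) -> dict:
    """Report which rubric fields are empty and the fraction that are populated."""
    missing = [field for field in RUBRIC_FIELDS if not record.get(field)]
    populated = len(RUBRIC_FIELDS) - len(missing)
    return {'score': populated / len(RUBRIC_FIELDS), 'missing': missing}


# Illustrative record with only a title, abstract, and bounding box filled in
record = {
    'title': 'Collection of Coal Geology Maps from Tennessee',
    'abstract': 'This coal map collection consists of maps depicting coal geology.',
    'place_keywords': [],
    'thematic_keywords': [],
    'bounding_box': {'east': '-81.65', 'south': '34.98', 'west': '-90.31', 'north': '36.68'},
    'contacts': [],
    'dates': {},
}

report = score_record(record)
print(report['missing'])
```

The list of missing fields is the useful part: it gives data owners a concrete set of gaps to fill rather than just a number.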

We will need to figure out and recommend an approach to improving the records. One route would be to send data owners to a viable external metadata editor such as the [MDEditor](https://go.mdeditor.org) developed by our friends in Alaska. That tool has some built-in integration with ScienceBase that I haven't yet tested and that might prove useful here. ScienceBase currently cannot parse uploaded ISO metadata into the applicable parts of the ScienceBase Item model, so sending users elsewhere to create good metadata and then tying that back into the ScienceBase Items might be the best approach. Unfortunately, it looks like the MDEditor approach would first require helping users run a conversion to the mdJSON format (probably directly from sbJSON) to import into the tool, since it does not yet seem to support XML import. But there are also other options if we take this tack. In any case, we would probably want to take a fresh look at the sum total of current documentation, including the metadata.xml files from the original survey of State Geological Surveys about their collections, to make sure we give the documentation process the most complete start from existing information.

## Fiddling
I ran across the gis_metadata Python package some time ago and threw in some quick examples here of reading one of the current ISO XML records from ScienceBase and parsing it to display a few attributes.

A much more user-friendly, but costlier and less flexible, approach to improving collection documentation would be to provide a custom application that uses a rubric to evaluate current documentation, presents a user with a series of simple questions to help fill in specific blanks and improve existing information, and then feeds the results right back to the items. This could all be built into the API that I'm developing for this project and then incorporated into the NDC Dashboard as the user interface, but that is more involved than I will personally have time for.

In [9]:
from gis_metadata.iso_metadata_parser import IsoParser
import urllib.request

In [15]:
with urllib.request.urlopen('https://www.sciencebase.gov/catalog/item/4f4e4aaae4b07f02db669370?format=iso') as iso_file:
    iso_from_file = IsoParser(iso_file.read())

    print(iso_from_file.title)
    print(iso_from_file.abstract)
    print(iso_from_file.place_keywords)
    print(iso_from_file.thematic_keywords)
    print(iso_from_file.attributes)
    print(iso_from_file.bounding_box)
    print(iso_from_file.contacts)
    print(iso_from_file.dates)
    print(iso_from_file.digital_forms)
    print(iso_from_file.larger_works)
    print(iso_from_file.process_steps)
    print(iso_from_file.raster_info)
    
    iso_from_file.validate()
    iso_string = iso_from_file.serialize()
    print(iso_string)


Collection of Coal Geology Maps from Tennessee
This coal map collection consists of maps depicting coal geology, cropline, coal thickness, mined out areas, and drill hole locations. Map detail ranges from intricate to simple. Maps include published and unpublished data. Some information is repeated on different maps. Various maps are printed on regular bond paper, blueline paper, and tracing paper. Maps are located by 7.5 minute quadrangle or 15 minute quadrangle basis.
[]
[]
[]
{'east': '-81.6474533081', 'south': '34.9838981628', 'west': '-90.3107452393', 'north': '36.6792449951'}
[]
{}
[{'name': '', 'content': '', 'decompression': '', 'version': '', 'specification': '', 'access_desc': 'Link to the ScienceBase Item Summary page for the item described by this metadata record', 'access_instrs': 'WWW:LINK-1.0-http--link', 'network_resource': 'https://www.sciencebase.gov/catalog/item/4f4e4aaae4b07f02db669370'}, {'name': '', 'content': '', 'decompression': '', 'version': '', 'specification