# USGS Thesaurus Terms
The Science Data Catalog indexing process extracts the terms that metadata records declare as references to the USGS Thesaurus and places them in their own keyword set for search faceting. These keywords are exposed through the API and were pulled into our graphing workflow by the sdc_cache process executed previously. The "iSAID - Verify Terms in Metadata" notebook checks the asserted terms against the full USGS Thesaurus source (including the other reference vocabularies included with the Thesaurus) to find those that actually align with a defined term, and builds a dataset containing the terms, URLs/identifiers, and scope notes. In this code block, we load only these verified terms into the graph, so that we establish links only to confirmed, defined terms from that source and limit noise in the graph.
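As a rough illustration of the filtering idea described above, the sketch below keeps only rows flagged as both valid and usable before they would be loaded. It is a minimal sketch, not the verification notebook's actual logic: the `confirmed_terms` helper and the third sample row are hypothetical, while the column names and the first two rows follow the cached file previewed in this notebook. It uses only the standard library so it runs without the iSAID helpers.

```python
import csv
import io

# Sample in the shape of the graphable terms file; the last row is a
# hypothetical unverified term added purely for illustration.
SAMPLE_CSV = """term,valid_term,usable_term,name,thesaurus_name
mining hazards,True,True,mining hazards,USGS Thesaurus
basins,True,True,basins,Alexandria Digital Library Feature Type Thesaurus
not a real term,False,False,,
"""


def confirmed_terms(csv_text):
    """Return rows whose valid_term and usable_term flags are both 'True'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["valid_term"] == "True" and row["usable_term"] == "True"
    ]


confirmed = confirmed_terms(SAMPLE_CSV)
print([row["name"] for row in confirmed])
```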

In [1]:
import isaid_helpers
import pandas as pd


In [2]:
pd.read_csv(isaid_helpers.f_graphable_thesaurus_terms).head()

Unnamed: 0,term,valid_term,usable_term,code,name,parent,scope,thesaurus_name,thesaurus_id,possible_sources,url
0,mining hazards,True,True,750,mining hazards,548.0,Dangerous conditions which result from the ext...,USGS Thesaurus,2,,https://apps.usgs.gov/thesaurus/term-simple.ph...
1,basins,True,True,95,basins,816.0,"Bowl-shaped, natural depressions in the surfac...",Alexandria Digital Library Feature Type Thesaurus,3,,https://apps.usgs.gov/thesaurus/term-simple.ph...
2,air temperature,True,True,27,air temperature,67.0,,USGS Thesaurus,2,,https://apps.usgs.gov/thesaurus/term-simple.ph...
3,Seismology,True,True,1050,seismology,470.0,Branch of earth sciences concerned with the st...,USGS Thesaurus,2,"[{'code': 739, 'name': 'seismology', 'parent':...",https://apps.usgs.gov/thesaurus/term-simple.ph...
4,Algae,True,True,29,algae,843.0,Chlorophyll-bearing primarily aquatic nonvascu...,USGS Thesaurus,2,,https://apps.usgs.gov/thesaurus/term-simple.ph...


In [3]:
%%time
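# Load the verified thesaurus terms into the graph as DefinedSubjectMatter nodes.
with isaid_helpers.graph_driver.session(database=isaid_helpers.graphdb) as session:
    # MERGE matches on name; the ON CREATE properties are set only the first
    # time a node with that name is created, so existing nodes are unchanged.
    session.run("""
        LOAD CSV WITH HEADERS FROM '%(source_path)s/%(source_file)s' AS row
        WITH row
            MERGE (t:DefinedSubjectMatter {name: row.name})
            ON CREATE
                SET t.url = row.url,
                    t.description = row.scope,
                    t.thesaurus_name = row.thesaurus_name
    """ % {
        "source_path": isaid_helpers.local_cache_path,
        "source_file": isaid_helpers.f_graphable_thesaurus_terms
    })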

CPU times: user 1.06 ms, sys: 2.33 ms, total: 3.39 ms
Wall time: 383 ms
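Because the query uses `MERGE` with `ON CREATE SET`, properties are written only when a node with that `name` does not yet exist; a later row with the same `name` matches the existing node and leaves its properties untouched. The plain-Python sketch below mimics that behavior with a dictionary so the effect on duplicate names is easy to see. The `merge_on_create` function and the duplicate "seismology" row are hypothetical illustrations, not part of the notebook or of the Neo4j driver.

```python
def merge_on_create(nodes, row):
    """Mimic MERGE (t {name: row.name}) ON CREATE SET ... on a dict of nodes."""
    name = row["name"]
    if name not in nodes:
        # Node is created: properties are set from this first row only.
        nodes[name] = {
            "url": row.get("url"),
            "description": row.get("scope"),
            "thesaurus_name": row.get("thesaurus_name"),
        }
    # Node already exists: ON CREATE does not fire, so nothing changes.
    return nodes[name]


rows = [
    {"name": "seismology", "thesaurus_name": "USGS Thesaurus"},
    # Hypothetical second row with the same name from another vocabulary.
    {"name": "seismology", "thesaurus_name": "Hypothetical vocabulary"},
    {"name": "algae", "thesaurus_name": "USGS Thesaurus"},
]

nodes = {}
for row in rows:
    merge_on_create(nodes, row)

print(len(nodes), nodes["seismology"]["thesaurus_name"])
```

One consequence worth keeping in mind: if the same name appears in more than one source vocabulary, only the first row encountered determines the node's `url`, `description`, and `thesaurus_name`.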


In [4]:
pd.read_csv(isaid_helpers.f_graphable_place_names).head()

Unnamed: 0,term,possible_sources,valid_term,usable_term,code,name,parent,scope,thesaurus_name
0,TOGO,"[{'code': 'fTO', 'name': 'Togo', 'parent': 'fL...",True,True,fTO,Togo,fLD50,country,Common geographic areas (USGS Thesaurus)
1,Ronceverte,,True,True,q38082NEE3,Ronceverte,q38082NE,"map quadrangle, 7.5 minute",Common geographic areas (USGS Thesaurus)
2,Arlington,"[{'code': 'f51013', 'name': 'Arlington', 'pare...",True,True,f51013,Arlington,fUS51,county,Common geographic areas (USGS Thesaurus)
3,New Hanover,,True,True,f37129,New Hanover,fUS37,county,Common geographic areas (USGS Thesaurus)
4,Moriches,,True,True,q41074NEB2,Moriches,q41074NE,"map quadrangle, 7.5 minute",Common geographic areas (USGS Thesaurus)


In [5]:
%%time
# Load the verified place names into the graph as Location nodes.
with isaid_helpers.graph_driver.session(database=isaid_helpers.graphdb) as session:
    # The place names file has no url column, so the thesaurus code is kept
    # as local_id; properties are set only when the node is first created.
    session.run("""
        LOAD CSV WITH HEADERS FROM '%(source_path)s/%(source_file)s' AS row
        WITH row
            MERGE (l:Location {name: row.name})
            ON CREATE
                SET l.local_id = row.code,
                    l.description = row.scope,
                    l.thesaurus_name = row.thesaurus_name
    """ % {
        "source_path": isaid_helpers.local_cache_path,
        "source_file": isaid_helpers.f_graphable_place_names
    })

CPU times: user 993 µs, sys: 1.41 ms, total: 2.4 ms
Wall time: 86.6 ms
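The two load cells differ only in the node label and in the mapping from CSV columns to node properties. As a sketch of how that pattern could be factored out, the hypothetical `build_load_query` helper below renders the same Cypher text for any label and property map. The function name, its parameters, and the example file URL are assumptions for illustration; nothing here is part of `isaid_helpers`.

```python
def build_load_query(source_url, label, props):
    """Render a LOAD CSV ... MERGE ... ON CREATE SET query as text.

    props maps node property names to CSV column names.
    """
    set_clause = ",\n        ".join(
        f"n.{prop} = row.{column}" for prop, column in props.items()
    )
    return (
        f"LOAD CSV WITH HEADERS FROM '{source_url}' AS row\n"
        f"WITH row\n"
        f"    MERGE (n:{label} {{name: row.name}})\n"
        f"    ON CREATE\n"
        f"        SET {set_clause}"
    )


# Hypothetical file URL; the notebook builds this from isaid_helpers values.
query = build_load_query(
    "file:///example/place_names.csv",
    "Location",
    {
        "local_id": "code",
        "description": "scope",
        "thesaurus_name": "thesaurus_name",
    },
)
print(query)
```

Because the label and property names are interpolated directly into the query text, a helper like this should only receive trusted, hard-coded values.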
