This notebook introduces National Park Service units to the GeoKB. The originating use case came from work to represent Geo-Heritage sites, but NPS units will be needed for many other purposes. As with our other reference sources, we don't need a fully comprehensive representation of "Parks" in the GeoKB, but we do need established identities we can link to, along with whatever characteristics of those entities our uses require (e.g., attributes we need to query on or include in query outputs/reports). In the initial representation, we are including the following concepts:

* Park name and any alternates we know about
* Point location for basic spatial reference
* US states the park is located within
* URL to an official web site (because it's there and could be useful in some circumstances as a linking factor)
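
These concepts map fairly directly onto fields in the NPS parks API records used later in this notebook (`fullName`, `name`, `latitude`, `longitude`, `states`, `url`, `parkCode`). As a minimal sketch, assuming those field names and that empty strings mark missing coordinates, a typed record for one unit could look like the following. The `ParkUnit` class and `from_api` helper are hypothetical and not part of the notebook's actual workflow.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ParkUnit:
    """Hypothetical container for the concepts listed above."""
    park_code: str
    name: str
    aliases: List[str] = field(default_factory=list)
    point: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    states: List[str] = field(default_factory=list)
    url: Optional[str] = None


def from_api(record: dict) -> ParkUnit:
    """Build a ParkUnit from one NPS API 'data' entry (field names assumed)."""
    full_name = record["fullName"]
    short_name = record.get("name", "")
    # Keep the short name as an alias only when it differs from the label
    aliases = [short_name] if short_name and short_name != full_name else []
    lat, lon = record.get("latitude", ""), record.get("longitude", "")
    # Coordinates arrive as strings; an empty string means no location
    point = (float(lat), float(lon)) if lat and lon else None
    # 'states' is assumed to be a comma-separated string of two-letter codes
    states = [s.strip() for s in record.get("states", "").split(",") if s.strip()]
    return ParkUnit(
        park_code=record["parkCode"],
        name=full_name,
        aliases=aliases,
        point=point,
        states=states,
        url=record.get("url") or None,
    )


unit = from_api({
    "parkCode": "abli",
    "fullName": "Abraham Lincoln Birthplace National Historical Park",
    "name": "Abraham Lincoln Birthplace",
    "latitude": "37.5858662",
    "longitude": "-85.67330523",
    "states": "KY",
    "url": "https://www.nps.gov/abli/index.htm",
})
print(unit.point, unit.states, unit.aliases)
```

Keeping the point optional rather than defaulting to (0, 0) avoids silently placing units without coordinates off the coast of Africa.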

NPS also assigns every park unit a unique 4-character alpha code that serves as the entry point to many other details and related information through their API and other interfaces. Capturing it required introducing a new property. The formatter URL we set on the park code property essentially gives us the same functionality as the official URL, but we include both because each is useful in different ways.
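
In Wikibase, a formatter URL is a template in which `$1` is replaced by the claim value to produce a link. The sketch below shows that substitution, assuming a formatter of the form `https://www.nps.gov/$1/index.htm` (inferred from official URLs such as `https://www.nps.gov/abli/index.htm`; the GeoKB property may use a different pattern). It also assumes the parks endpoint accepts a `parkCode` query parameter for per-unit lookups. All helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed formatter pattern; Wikibase substitutes the claim value for "$1"
FORMATTER_URL = "https://www.nps.gov/$1/index.htm"
PARKS_ENDPOINT = "https://developer.nps.gov/api/v1/parks"

# Park codes are described as 4-character alphabetic codes
PARK_CODE_PATTERN = re.compile(r"[a-z]{4}")


def check_park_code(code: str) -> str:
    """Normalize and validate a 4-character alphabetic NPS park code."""
    normalized = code.strip().lower()
    if not PARK_CODE_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a 4-letter park code: {code!r}")
    return normalized


def format_park_url(code: str, formatter: str = FORMATTER_URL) -> str:
    """Apply a Wikibase-style formatter URL to a park code."""
    return formatter.replace("$1", check_park_code(code))


def park_api_url(code: str) -> str:
    """Build a per-unit API request URL (the api_key parameter is omitted)."""
    return f"{PARKS_ENDPOINT}?{urlencode({'parkCode': check_park_code(code)})}"


print(format_park_url("abli"))  # https://www.nps.gov/abli/index.htm
print(park_api_url(" ACAD "))   # https://developer.nps.gov/api/v1/parks?parkCode=acad
```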

We also needed to introduce the National Park Service itself as a Department of the Interior (DOI) bureau so we can indicate the higher-level managing entity for the units.

I initially tried to build out a "same as" linkage to Wikidata using a higher-level classification for protected areas, which covers many of the NPS units but also areas under other managing entities. I need to do more work to understand how those entities were introduced in Wikidata and what I can align with effectively. For now, I've punted on this issue in the interest of getting our GeoKB entities established.

In [1]:
from wbmaker import WikibaseConnection
import pandas as pd
import requests
import os

In [2]:
geokb = WikibaseConnection('GEOKB_CLOUD')

# Original Source

Wherever possible in the GeoKB, I am attempting to work with the best available original source: something that is an official outlet of the managing authority and is also technically easy to work with in code. In this case, I'm experimenting with the official NPS REST API, which seems to provide usable information on park units along with other assets that we might incorporate for other purposes.
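
The notebook pulls every unit in one request by setting `limit=1000`. If that ceiling ever becomes a concern, a paged pull is one option. This is a minimal sketch, assuming the endpoint honors `limit` and `start` parameters and returns a `total` count alongside the `data` list; check both against the NPS API documentation. The fetch function is injected so the paging logic can be exercised offline with a fake source; in practice it would wrap `requests.get` with the API key.

```python
from typing import Callable, Dict, List

# A fetch function takes query parameters and returns the decoded JSON body
FetchPage = Callable[[Dict[str, int]], dict]


def fetch_all_parks(fetch_page: FetchPage, page_size: int = 100) -> List[dict]:
    """Collect every record by walking the API with limit/start paging.

    Assumes each response has a 'total' count and a 'data' list; if
    'total' is missing, it stops after the first page.
    """
    records: List[dict] = []
    start = 0
    while True:
        body = fetch_page({"limit": page_size, "start": start})
        page = body.get("data", [])
        records.extend(page)
        total = int(body.get("total", len(records)))
        start += len(page)
        # Stop when nothing new came back or everything has been collected
        if not page or start >= total:
            return records


# Offline demonstration with a fake source holding 5 records
FAKE = [{"parkCode": f"p{i:03d}"} for i in range(5)]


def fake_fetch(params: Dict[str, int]) -> dict:
    s, n = params["start"], params["limit"]
    return {"total": str(len(FAKE)), "data": FAKE[s:s + n]}


print(len(fetch_all_parks(fake_fetch, page_size=2)))  # 5
```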

I'm continuing to work through the best ways to document sources in the GeoKB and haven't settled on the ultimate solution (if there is one). I'd like the GeoKB to fully explain how it gets everything, with most if not all configuration details stored within entity claims, but I need to put more time into establishing that model. At a minimum, I want everything built from a source to point to a source entity as a reference that we can continue building out. In this case, I established a [knowledgebase source item](https://geokb.wikibase.cloud/wiki/Item:Q158224) for the specific route used in the REST API. Logically, this could include the base HTTP route. We could be fully detailed and provide the version number of the API, the request parameters, and even things like the standard variable used to provide an API key (required in this case). For now, I'm deferring those details to later work. I set the source item QID as a variable so we can use it to build references on claims.
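
The details mentioned above (base route, API version, request parameters, and the variable that supplies the API key) could eventually live as claims on the source item. Here is a minimal sketch of how those pieces might be captured in one structure and reassembled into the request this notebook uses, assuming a hypothetical `SourceConfig` class. The stored configuration records only the *name* of the key variable, never the key itself; in practice `env` would be `os.environ`.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional
from urllib.parse import urlencode


@dataclass
class SourceConfig:
    """Hypothetical description of a REST source, mirroring the details above."""
    base_url: str                      # base HTTP route
    version: str                       # API version segment
    route: str                         # resource route within the API
    params: Dict[str, str] = field(default_factory=dict)
    api_key_env: Optional[str] = None  # name of the variable holding the key

    def request_url(self, env: Mapping[str, str]) -> str:
        """Assemble the request URL, reading the API key only at call time."""
        query = dict(self.params)
        if self.api_key_env:
            query["api_key"] = env[self.api_key_env]
        url = f"{self.base_url.rstrip('/')}/{self.version}/{self.route.lstrip('/')}"
        return f"{url}?{urlencode(query)}" if query else url


nps_parks = SourceConfig(
    base_url="https://developer.nps.gov/api",
    version="v1",
    route="parks",
    params={"limit": "1000"},
    api_key_env="NPS_API_KEY",
)
print(nps_parks.request_url({"NPS_API_KEY": "PLACEHOLDER"}))
# https://developer.nps.gov/api/v1/parks?limit=1000&api_key=PLACEHOLDER
```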

In [3]:
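source_item = "Q158224"  # knowledgebase source item for the NPS parks API route
nps_entity = "Q158217"   # National Park Service (DOI bureau) item

nps_parks_api = "https://developer.nps.gov/api/v1/parks"

# limit=1000 is intended to return every park unit in one request;
# the API key is read from the environment rather than hard-coded
r_nps_parks = requests.get(
    nps_parks_api,
    params={"limit": 1000, "api_key": os.environ["NPS_API_KEY"]},
    timeout=60,
)
r_nps_parks.raise_for_status()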

In [4]:
r_nps_parks.json()['data'][0]

{'id': '77E0D7F0-1942-494A-ACE2-9004D2BDC59E',
 'url': 'https://www.nps.gov/abli/index.htm',
 'fullName': 'Abraham Lincoln Birthplace National Historical Park',
 'parkCode': 'abli',
 'description': "For over a century people from around the world have come to rural Central Kentucky to honor the humble beginnings of our 16th president, Abraham Lincoln. His early life on Kentucky's frontier shaped his character and prepared him to lead the nation through Civil War. Visit our country's first memorial to Lincoln, built with donations from young and old, and the site of his childhood home.",
 'latitude': '37.5858662',
 'longitude': '-85.67330523',
 'latLong': 'lat:37.5858662, long:-85.67330523',
 'activities': [{'id': '13A57703-BB1A-41A2-94B8-53B692EB7238',
   'name': 'Astronomy'},
  {'id': 'D37A0003-8317-4F04-8FB0-4CF0A272E195', 'name': 'Stargazing'},
  {'id': '1DFACD97-1B9C-4F5A-80F2-05593604799E', 'name': 'Food'},
  {'id': 'C6D3230A-2CEA-4AFE-BFF3-DC1E2C2C4BB4', 'name': 'Picnicking'},
  

In [5]:
df_parks = pd.DataFrame(r_nps_parks.json()['data'])

# Classification

Further work will be needed to synthesize a higher-level concept for classifying NPS "parks." The list of unique designation values below shows that some units coming through the API have no designation at all. The listing presented on [this NPS web site](https://www.nps.gov/aboutus/national-park-system.htm) is likely more reasonable, but that's not what comes through in the API. We'll look at how sources like USGS PAD-US (Protected Areas Database of the United States) handle this to determine our best route. In the meantime, I'm punting on the issue and simply using a high-level temporary classification of "National Park Service Unit." I did pull the designation into the description string for now as a way to at least introduce the values (when they exist).
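
Before settling on a classification, it helps to see how the designation values are distributed and which ones are not really unit types at all. The pandas sketch below runs on an illustrative subset of values drawn from the list below (the counts are not real). It labels empty designations explicitly and flags "Part of ..." values, which name a parent unit rather than a designation.

```python
import pandas as pd

# Illustrative subset of designation values; '' means no designation
designations = pd.Series([
    "National Park", "National Monument", "", "National Park",
    "Part of Colonial National Historical Park", "",
])

# Make missing designations explicit instead of leaving empty strings
normalized = designations.str.strip().replace("", "(no designation)")
counts = normalized.value_counts()
print(counts)

# "Part of ..." values point at a parent unit rather than a unit type
part_of = normalized.str.startswith("Part of ")
print(normalized[part_of].tolist())
```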

In [6]:
classification_item = "Q158222"

df_parks['designation'].unique()

array(['National Historical Park', 'National Park', '',
       'National Monument', 'National Historic Trail', 'Wild River',
       'National Historic Area', 'National Historic Site', 'Park',
       'National Recreation Area', 'National Monument & Preserve',
       'National Battlefield', 'National Lakeshore',
       'National Scenic Trail', 'National Memorial', 'National Seashore',
       'Parkway', 'National Preserve', 'National River & Recreation Area',
       'National Scenic River', 'National Battlefield Site',
       'National River', 'Part of Colonial National Historical Park',
       'National Military Park', 'National Reserve',
       'National Park & Preserve', 'Memorial',
       'National Historical Reserve',
       'Part of Statue of Liberty National Monument',
       'National Monument and Historic Shrine', 'Memorial Parkway',
       'National Geologic Trail', 'National Historical Park and Preserve',
       'National Battlefield Park', 'National Wild and Scenic River',
   

# Data Prep

We need to do a couple of things to prep our park unit data for representation in the GeoKB.
* To link to state/territory items, we pull a reference set from the GeoKB via SPARQL, keyed on the FIPS alpha codes that NPS uses in its "states" field, and set it up as a simple key/value lookup.
* I generate a description from the designation where one exists, or a default string otherwise, just so the item is recognizable for what it is.
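
The state lookup in the first bullet can be sketched as a small function that also reports codes with no matching GeoKB item, so a gap in the reference set surfaces before anything is written. This is a minimal sketch, assuming the comma-separated `states` format the notebook splits on. `resolve_states` and the placeholder QIDs in `demo_lookup` are hypothetical; real values would come from the SPARQL query on P13.

```python
from typing import Dict, List, Tuple


def resolve_states(states: str, lookup: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split an NPS 'states' string and map each code to a GeoKB QID.

    Returns (qids, unmatched) so missing codes can be reported rather than
    raising a KeyError partway through a batch of writes.
    """
    qids: List[str] = []
    unmatched: List[str] = []
    for code in (c.strip().upper() for c in states.split(",")):
        if not code:
            continue  # ''.split(',') yields [''] for units with no states
        if code in lookup:
            qids.append(lookup[code])
        else:
            unmatched.append(code)
    return qids, unmatched


# Hypothetical lookup with placeholder QIDs
demo_lookup = {"KY": "Q_KY", "ME": "Q_ME"}
print(resolve_states("KY", demo_lookup))     # (['Q_KY'], [])
print(resolve_states("ME,ZZ", demo_lookup))  # (['Q_ME'], ['ZZ'])
print(resolve_states("", demo_lookup))       # ([], [])
```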

In [7]:
query_fips_codes = """
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?fips
WHERE {
  ?item wdt:P13 ?fips .
}
"""

fips_codes = geokb.sparql_query(
    query=query_fips_codes,
    endpoint=geokb.sparql_endpoint,
    output="dataframe"
)

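# Strip each entity URI down to its trailing QID (e.g., .../entity/Q123 -> Q123)
fips_codes['qid'] = fips_codes['item'].apply(lambda x: x.split('/')[-1])
# Map FIPS alpha code -> QID for quick lookups when building claims
fips_lookup = fips_codes.set_index('fips')['qid'].to_dict()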

In [8]:
# Keep only the fields we represent in the GeoKB
park_units = df_parks[['parkCode','fullName','name','url','states','latitude','longitude','designation']].reset_index(drop=True)

# Use the designation in the description when present, otherwise a generic fallback
park_units['description'] = park_units['designation'].apply(
    lambda x: f'a {x} managed by the National Park Service' if len(x) > 0 else 'a National Park Service Unit'
)
park_units.drop(columns='designation', inplace=True)

# 'states' is a comma-separated string of codes; turn it into a list
park_units['states'] = park_units['states'].apply(lambda x: x.split(','))

# Commit to GeoKB

In [9]:
park_units.head()

| | parkCode | fullName | name | url | states | latitude | longitude | description |
|---|---|---|---|---|---|---|---|---|
| 0 | abli | Abraham Lincoln Birthplace National Historical... | Abraham Lincoln Birthplace | https://www.nps.gov/abli/index.htm | [KY] | 37.5858662 | -85.67330523 | a National Historical Park managed by the Nati... |
| 1 | acad | Acadia National Park | Acadia | https://www.nps.gov/acad/index.htm | [ME] | 44.409286 | -68.247501 | a National Park managed by the National Park S... |
| 2 | adam | Adams National Historical Park | Adams | https://www.nps.gov/adam/index.htm | [MA] | 42.2553961 | -71.01160356 | a National Historical Park managed by the Nati... |
| 3 | afam | African American Civil War Memorial | African American Civil War Memorial | https://www.nps.gov/afam/index.htm | [DC] | 38.9166 | -77.026 | a National Park Service Unit |
| 4 | afbg | African Burial Ground National Monument | African Burial Ground | https://www.nps.gov/afbg/index.htm | [NY] | 40.71452681 | -74.00447358 | a National Monument managed by the National Pa... |


In [12]:
refs = geokb.models.References()
refs.add(
    geokb.datatypes.Item(
        prop_nr=geokb.prop_lookup['data source'],
        value=source_item
    )
)

for index, row in park_units.iterrows():
    item = geokb.wbi.item.new()

    item.labels.set('en', row['fullName'])
    item.descriptions.set('en', row['description'])

    if row['name'] != row['fullName']:
        item.aliases.set('en', row['name'])

    item.claims.add(
        geokb.datatypes.Item(
            prop_nr=geokb.prop_lookup['instance of'],
            value=classification_item,
            references=refs
        )
    )

    item.claims.add(
        geokb.datatypes.ExternalID(
            prop_nr=geokb.prop_lookup['NPS Park Code'],
            value=row['parkCode'],
            references=refs
        )
    )

    item.claims.add(
        geokb.datatypes.Item(
            prop_nr=geokb.prop_lookup['operator'],
            value=nps_entity,
            references=refs
        )
    )

    item.claims.add(
        geokb.datatypes.URL(
            prop_nr=geokb.prop_lookup['official website'],
            value=row['url'],
            references=refs
        )
    )

    if len(row['latitude']) > 0:
        item.claims.add(
            geokb.datatypes.GlobeCoordinate(
                prop_nr=geokb.prop_lookup['coordinate location'],
                latitude=float(row['latitude']),
                longitude=float(row['longitude']),
                references=refs
            )
        )

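    # Link the unit to each state/territory it falls within;
    # skip empty or unknown codes instead of failing mid-run with a KeyError
    state_location_claims = []
    for st_fips in row['states']:
        st_fips = st_fips.strip()
        if st_fips not in fips_lookup:
            print(f"No GeoKB item for state code {st_fips!r} ({row['parkCode']})")
            continue
        state_location_claims.append(
            geokb.datatypes.Item(
                prop_nr=geokb.prop_lookup['located in the administrative territorial entity'],
                value=fips_lookup[st_fips],
                references=refs
            )
        )
    if state_location_claims:
        item.claims.add(state_location_claims)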

    response = item.write(
        summary="Added initial entity representing a National Park Service Unit"
    )
    print(row['fullName'], response.id)

Abraham Lincoln Birthplace National Historical Park Q158226
Acadia National Park Q158227
Adams National Historical Park Q158228
African American Civil War Memorial Q158229
African Burial Ground National Monument Q158230
Agate Fossil Beds National Monument Q158231
Ala Kahakai National Historic Trail Q158232
Alagnak Wild River Q158233
Alaska Public Lands Q158234
Alcatraz Island Q158235
Aleutian Islands World War II National Historic Area Q158236
Alibates Flint Quarries National Monument Q158237
Allegheny Portage Railroad National Historic Site Q158238
Amache National Historic Site Q158239
American Memorial Park Q158240
Amistad National Recreation Area Q158241
Anacostia Park Q158242
Andersonville National Historic Site Q158243
Andrew Johnson National Historic Site Q158244
Aniakchak National Monument & Preserve Q158245
Antietam National Battlefield Q158246
Apostle Islands National Lakeshore Q158247
Appalachian National Scenic Trail Q158248
Appomattox Court House National Historical Park Q1