# Class activities November 20, 2023

Notes on: 

* process of extracting records from loc.gov (review)
* creation of CSV for import


## loc.gov JSON API

Querying the API

In [1]:
import requests 
import json 

In [2]:
endpoint = 'https://www.loc.gov/free-to-use/libraries/'

In [3]:
r = requests.get(endpoint, params={ 'fo':'json' })

In [4]:
r.status_code

200
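The `params` argument is how the `fo=json` option reaches the server: requests encodes the dictionary into the URL's query string. The sketch below uses the standard library's `urllib.parse.urlencode` to build the same URL. It only builds the string and sends nothing. The variable name `set_url` is just for illustration.

```python
from urllib.parse import urlencode

set_url = 'https://www.loc.gov/free-to-use/libraries/'

# Same URL that requests.get(set_url, params={'fo': 'json'}) ends up requesting
query_url = set_url + '?' + urlencode({'fo': 'json'})
print(query_url)  # https://www.loc.gov/free-to-use/libraries/?fo=json
```

Pasting that URL into a browser is a quick way to look at the same JSON. A `status_code` of 200 means the request succeeded. Calling `r.raise_for_status()` turns any 4xx/5xx response into an exception, so a failed request does not go unnoticed.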

In [5]:
libraries_set = r.json()

In [6]:
with open('library-set-json.json', 'w') as f:
    json.dump(libraries_set, f, indent=2)
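Saving the response lets later steps work from the local file instead of querying the API again. The sketch below shows the round trip with `json.load`. The dictionary `sample_set` is a hypothetical, much smaller stand-in for the real response. The file goes into a temporary directory so the example does not overwrite `library-set-json.json`.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the parsed API response
sample_set = {'title': 'Libraries', 'content': {'set': {'items': []}}}

path = os.path.join(tempfile.mkdtemp(), 'library-set-json.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump(sample_set, f, indent=2)

# Later (e.g. after restarting the notebook), reload instead of re-querying
with open(path, encoding='utf-8') as f:
    reloaded = json.load(f)

print(reloaded == sample_set)  # True
```

In the notebook itself, the equivalent reload would be `with open('library-set-json.json') as f: libraries_set = json.load(f)`.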

Open the saved file and look for the element that holds the list of items in the set.

To list the top-level keys of the response:

In [7]:
for item in libraries_set:
    print(item)

breadcrumbs
content
content_is_post
description
expert_resources
next
next_sibling
options
pages
portal
previous
previous_sibling
site_type
timestamp
title
type
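Iterating over a dictionary yields its keys, which is why the loop above prints names only. For a large response, a short recursive search can report where a given key lives, so you do not have to scan the file by eye. Here is a sketch. The helper `find_key_paths` is hypothetical and not part of any library. It runs on a trimmed-down stand-in for the response.

```python
def find_key_paths(obj, target, path=()):
    """Return every key path (as a tuple) at which `target` appears as a dict key."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                found.append(path + (key,))
            found.extend(find_key_paths(value, target, path + (key,)))
    elif isinstance(obj, list):
        # List positions become integer steps in the path
        for i, value in enumerate(obj):
            found.extend(find_key_paths(value, target, path + (i,)))
    return found

# Hypothetical, trimmed-down version of the response structure
sample_response = {'breadcrumbs': [],
                   'content': {'set': {'items': [{'link': '/resource/cph.3f05183/'}]}}}
print(find_key_paths(sample_response, 'items'))  # [('content', 'set', 'items')]
```

Run on the real `libraries_set`, the same call should point to the path used in the next cell.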


Because the response is a dictionary, nested elements can be reached by chaining keys. The items in the set are under `['content']['set']['items']`:

In [8]:
for item in libraries_set['content']['set']['items']:
    print(item)

{'image': '/static/portals/free-to-use/public-domain/libraries/libraries-1.jpg', 'link': '/resource/cph.3f05183/', 'title': 'For greater knowledge, on more subjects, use your library more often. Illinois WPA Arts Project, 1936-1941. Prints & Photographs Division'}
{'image': '/static/portals/free-to-use/public-domain/libraries/libraries-2.jpg', 'link': '/resource/highsm.20336/', 'title': 'Noyes Library for Young Children. Kensington, Maryland. Photo by Carol M. Highsmith,  2011. Prints & Photographs Division'}
{'image': '/static/portals/free-to-use/public-domain/libraries/libraries-3.jpg', 'link': '/resource/fsa.8d24709/', 'title': 'Bethune-Cookman College. Students in the library reading room, Daytona Beach, Florida. Gordon Parks, 1943. Prints & Photographs Division'}
{'image': '/static/portals/free-to-use/public-domain/libraries/libraries-4.jpg', 'link': '/resource/highsm.36052/', 'title': 'Public library in Antonito,  Colorado, near the New Mexico border. Photo by Carol M. Highsmith,
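Each item is a plain dictionary with `image`, `link`, and `title` keys. Rather than printing the values, you can collect them into a list for later steps. The sketch below uses the first two items shown above as sample data. The name `sample_items` is just for illustration.

```python
sample_items = [
    {'image': '/static/portals/free-to-use/public-domain/libraries/libraries-1.jpg',
     'link': '/resource/cph.3f05183/',
     'title': 'For greater knowledge, on more subjects, use your library more often. Illinois WPA Arts Project, 1936-1941. Prints & Photographs Division'},
    {'image': '/static/portals/free-to-use/public-domain/libraries/libraries-2.jpg',
     'link': '/resource/highsm.20336/',
     'title': 'Noyes Library for Young Children. Kensington, Maryland. Photo by Carol M. Highsmith,  2011. Prints & Photographs Division'},
]

# Keep (link, title) pairs; .get() avoids a KeyError if an item lacks a title
pairs = [(item['link'], item.get('title', '')) for item in sample_items]
links = [link for link, _ in pairs]
print(links)  # ['/resource/cph.3f05183/', '/resource/highsm.20336/']
```

In the notebook, replace `sample_items` with `libraries_set['content']['set']['items']`.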

Looking one level deeper, each item's `link` holds the relative path to its resource page, which includes the item's identifier:

In [9]:
for item in libraries_set['content']['set']['items']:
    print(item['link'])

/resource/cph.3f05183/
/resource/highsm.20336/
/resource/fsa.8d24709/
/resource/highsm.36052/
/resource/highsm.51772/
/resource/cph.3b43255/
/resource/highsm.20483/
/resource/highsm.29207/
/resource/fsa.8b32222/
/resource/highsm.64003/
/resource/ppmsca.15412/
/resource/highsm.49335/
/resource/highsm.20497/
/resource/npcc.28724/
/resource/ds.06560/
/resource/hhh.hi0135.photos
/resource/cph.3f05168/
/resource/ppmsca.15375/
/resource/highsm.53335/
/resource/hhh.ks0072.photos/?sp=2
/resource/highsm.34640/
/resource/ppmsca.17588/
/resource/ppmsca.18016/
/resource/hhh.me0057.photos/?sp=1
/resource/highsm.41101/
/resource/det.4a17925/
/resource/ppmsca.15426/
/resource/ppmscd.00084/
/resource/highsm.18384/
/resource/g3851e.ct006252/
/resource/ppmsca.35590/
/resource/hhh.il0998.sheet/?sp=1
/resource/hhh.ok0012.sheet/?sp=8&q=hhh.ok0012
/resource/hhh.sc0767.photos/?sp=1
/resource/hhh.nj0089.photos/?sp=4
/resource/highsm.31350/
/resource/hhh.ri0071.photos/?sp=8
/resource/highsm.48241/
/resource/de
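The links are not uniform. Most end in a slash, at least one has no trailing slash, and several carry a query string such as `?sp=2`. The download cell below splits on `/` and takes index 2, which happens to work for all of these forms. Parsing the path first makes the intent explicit. Here is a sketch using `urllib.parse.urlparse`. The function name `item_id_from_link` is hypothetical.

```python
from urllib.parse import urlparse

def item_id_from_link(link):
    """Return the resource ID from a loc.gov link such as '/resource/cph.3f05183/'."""
    path = urlparse(link).path                     # drops any '?sp=...' query string
    segments = [s for s in path.split('/') if s]   # e.g. ['resource', 'cph.3f05183']
    return segments[1]

for link in ['/resource/cph.3f05183/',
             '/resource/hhh.hi0135.photos',
             '/resource/hhh.ok0012.sheet/?sp=8&q=hhh.ok0012']:
    print(item_id_from_link(link))
# cph.3f05183
# hhh.hi0135.photos
# hhh.ok0012.sheet
```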

Finally, prepend the site's domain to each `link` to build full URLs. Each of these can then be sent back to the API as its own request:

In [10]:
endpoint = 'https://www.loc.gov'

for item in libraries_set['content']['set']['items']:
    url = endpoint + item['link']
    print(url)

https://www.loc.gov/resource/cph.3f05183/
https://www.loc.gov/resource/highsm.20336/
https://www.loc.gov/resource/fsa.8d24709/
https://www.loc.gov/resource/highsm.36052/
https://www.loc.gov/resource/highsm.51772/
https://www.loc.gov/resource/cph.3b43255/
https://www.loc.gov/resource/highsm.20483/
https://www.loc.gov/resource/highsm.29207/
https://www.loc.gov/resource/fsa.8b32222/
https://www.loc.gov/resource/highsm.64003/
https://www.loc.gov/resource/ppmsca.15412/
https://www.loc.gov/resource/highsm.49335/
https://www.loc.gov/resource/highsm.20497/
https://www.loc.gov/resource/npcc.28724/
https://www.loc.gov/resource/ds.06560/
https://www.loc.gov/resource/hhh.hi0135.photos
https://www.loc.gov/resource/cph.3f05168/
https://www.loc.gov/resource/ppmsca.15375/
https://www.loc.gov/resource/highsm.53335/
https://www.loc.gov/resource/hhh.ks0072.photos/?sp=2
https://www.loc.gov/resource/highsm.34640/
https://www.loc.gov/resource/ppmsca.17588/
https://www.loc.gov/resource/ppmsca.18016/
https://
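String concatenation works here because every `link` starts with `/`. `urllib.parse.urljoin` handles the joining more robustly. Because some links already carry a query string, it also helps to see what the final request URL looks like once `fo=json` is added. The helper `item_json_url` in the sketch below is hypothetical. When you pass `params=` for a URL that already has a query string, requests performs the equivalent merge and appends the new parameters with `&`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit, parse_qsl, urlencode

def item_json_url(base, link):
    """Join base + link, then append fo=json while keeping any existing query (e.g. ?sp=2)."""
    parts = urlsplit(urljoin(base, link))
    query = parse_qsl(parts.query)      # [('sp', '2')] or []
    query.append(('fo', 'json'))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

print(item_json_url('https://www.loc.gov', '/resource/cph.3f05183/'))
# https://www.loc.gov/resource/cph.3f05183/?fo=json
print(item_json_url('https://www.loc.gov', '/resource/hhh.ks0072.photos/?sp=2'))
# https://www.loc.gov/resource/hhh.ks0072.photos/?sp=2&fo=json
```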

Use the URLs to request each item's JSON and save it locally, one file per item:

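In [18]:
import time

endpoint = 'https://www.loc.gov'

for item in libraries_set['content']['set']['items']:
    url = endpoint + item['link']
    # '/resource/cph.3f05183/' -> 'cph.3f05183'; a trailing '?sp=...' segment is ignored
    item_id = item['link'].split('/')[2]   # avoid shadowing the built-in id()
    fname = 'item_metadata_' + item_id + '.json'
    try:
        r = requests.get(url, params={'fo': 'json'}, timeout=30)
        r.raise_for_status()   # count HTTP errors (e.g. 404, 429, 5xx) as failures
        data = r.json()        # parse before opening the file, so a bad response leaves no empty file
        with open(fname, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2)
        print('wrote', fname)
    except (requests.RequestException, ValueError):
        # ValueError also covers a response body that is not valid JSON
        print('error for', item_id)
    time.sleep(1)   # brief pause between requests to go easy on the API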

wrote item_metadata_cph.3f05183.json
wrote item_metadata_highsm.20336.json
wrote item_metadata_fsa.8d24709.json
wrote item_metadata_highsm.36052.json
wrote item_metadata_highsm.51772.json
wrote item_metadata_cph.3b43255.json
wrote item_metadata_highsm.20483.json
wrote item_metadata_highsm.29207.json
wrote item_metadata_fsa.8b32222.json
wrote item_metadata_highsm.64003.json
wrote item_metadata_ppmsca.15412.json
wrote item_metadata_highsm.49335.json
wrote item_metadata_highsm.20497.json
wrote item_metadata_npcc.28724.json
wrote item_metadata_ds.06560.json
wrote item_metadata_hhh.hi0135.photos.json
wrote item_metadata_cph.3f05168.json
wrote item_metadata_ppmsca.15375.json
wrote item_metadata_highsm.53335.json
wrote item_metadata_hhh.ks0072.photos.json
wrote item_metadata_highsm.34640.json
error for ppmsca.17588
wrote item_metadata_ppmsca.18016.json
wrote item_metadata_hhh.me0057.photos.json
wrote item_metadata_highsm.41101.json
wrote item_metadata_det.4a17925.json
wrote item_metadata_ppms
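One item (`ppmsca.17588`) failed. The notes do not record why. A timeout, a rate-limit response, or a body that is not JSON are all possibilities. Failures like these are often transient, so one option is to retry with a growing pause. The sketch below is generic: the helper `with_retries` is hypothetical. A simulated function stands in for the request, so the example runs without network access.

```python
import time

def with_retries(func, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call func(); on one of `exceptions`, wait and try again, doubling the wait each time.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2

# Simulated request that fails twice, then succeeds
calls = {'count': 0}

def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('simulated failure')
    return {'item': {'title': 'example'}}

result = with_retries(flaky_fetch, attempts=3, delay=0)
print(result, calls['count'])  # {'item': {'title': 'example'}} 3
```

In the download loop, the request could be wrapped as `data = with_retries(lambda: requests.get(url, params={'fo': 'json'}, timeout=30).json(), exceptions=(requests.RequestException, ValueError))`. A simpler alternative is to rerun the loop only for the IDs that printed `error for`.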

Most of the class was devoted to walking through the ETL workflow outlined in the slides for Nov 20. We worked through the full notebook and demonstrated how to create the item-list CSV in Python. Then we used the Omeka S CSV Import module to ingest the items into your Omeka S install.
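The CSV step itself lives in the full notebook rather than in these notes. The sketch below shows the general idea: one row per item, with column headers named after properties (such as `dcterms:title`), which makes them straightforward to match to properties on the CSV Import mapping screen. It assumes each saved `item_metadata_*.json` file has a top-level `item` object with a `title` field. Check your own files, since that structure is an assumption here. The helper names and the sample data are hypothetical.

```python
import csv
import io

FIELDNAMES = ['dcterms:identifier', 'dcterms:title', 'dcterms:source']

def record_from_item_json(item_id, url, data):
    """Build one CSV row from a saved item JSON (assumes a top-level 'item' dict with 'title')."""
    item = data.get('item', {})
    return {
        'dcterms:identifier': item_id,
        'dcterms:title': item.get('title', ''),
        'dcterms:source': url,
    }

def write_import_csv(rows, f):
    """Write rows to an open text file (open real files with newline='')."""
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical stand-in for json.load() on one saved file
sample = {'item': {'title': 'Noyes Library for Young Children. Kensington, Maryland'}}
rows = [record_from_item_json('highsm.20336',
                              'https://www.loc.gov/resource/highsm.20336/', sample)]

buf = io.StringIO()
write_import_csv(rows, buf)
print(buf.getvalue())
```

For a real file, open it with `open('items.csv', 'w', newline='', encoding='utf-8')` and pass it in place of `buf`. The `csv` module then quotes any titles that contain commas.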