# JSON (Dictionaries) & APIs

This notebook introduces Python dictionaries through JSON data and walks through using the `requests` library
to retrieve data from a REST API.

## Contents

* Working with [JSON in Python](https://docs.python.org/3/library/json.html) 
  * `.load()`
  * `.loads()`
  * `.dump()`
  * `.dumps()`
* Using [Requests](https://requests.readthedocs.io/en/latest/)
  * Use Python to make HTTP requests
  * Use a dictionary for parameters
  * Make an API call
  * Save the results to a local file

### Reading JSON into Python

To convert a JSON string into Python objects (dictionaries, lists, strings, numbers, booleans and `None`), use the `json` module's `loads()` function. For example:

In [1]:
import json

In [2]:
stringOfJsonData = '{ "Results": 2, "isPrinted" : true, "format" : null, "books": [{ "title": "Gone with the Wind", "author":"Mitchell" },{ "title": "Wuthering Heights", "author": "Bronte" } ]}'

jsonDataAsPythonValue = json.loads(stringOfJsonData)

jsonDataAsPythonValue

{'Results': 2,
 'isPrinted': True,
 'format': None,
 'books': [{'title': 'Gone with the Wind', 'author': 'Mitchell'},
  {'title': 'Wuthering Heights', 'author': 'Bronte'}]}

In [3]:
for key in jsonDataAsPythonValue:
    print(key)

Results
isPrinted
format
books
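Because `json.loads()` returns ordinary Python dictionaries and lists, the usual dictionary tools apply to the parsed data. A short sketch, reusing the same sample string as above, of key lookup, nested lookup, `.get()` with a default, `.items()`, and pulling one field out of a list of dictionaries:

```python
import json

stringOfJsonData = '{ "Results": 2, "isPrinted" : true, "format" : null, "books": [{ "title": "Gone with the Wind", "author":"Mitchell" },{ "title": "Wuthering Heights", "author": "Bronte" } ]}'
jsonDataAsPythonValue = json.loads(stringOfJsonData)

# Square brackets look up a key; nested structures chain lookups
print(jsonDataAsPythonValue['Results'])              # 2
print(jsonDataAsPythonValue['books'][0]['title'])    # Gone with the Wind

# .get() returns a default instead of raising KeyError when a key is missing
print(jsonDataAsPythonValue.get('publisher', 'unknown'))  # unknown

# .items() yields (key, value) pairs; JSON null became None, true became True
for key, value in jsonDataAsPythonValue.items():
    print(key, '->', type(value).__name__)

# Collect one field from every dictionary in a list
titles = [book['title'] for book in jsonDataAsPythonValue['books']]
print(titles)  # ['Gone with the Wind', 'Wuthering Heights']
```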


### Writing JSON from Python

To serialize Python data back into a JSON string, use the `json.dumps()` function. Writing that string to a file saves it as a JSON file. For example:

In [4]:
jsonDataAsPythonValue

{'Results': 2,
 'isPrinted': True,
 'format': None,
 'books': [{'title': 'Gone with the Wind', 'author': 'Mitchell'},
  {'title': 'Wuthering Heights', 'author': 'Bronte'}]}

In [5]:
# uncomment next line if you haven't yet imported the json module
#import json

stringOfJsonData = json.dumps(jsonDataAsPythonValue)

stringOfJsonData

'{"Results": 2, "isPrinted": true, "format": null, "books": [{"title": "Gone with the Wind", "author": "Mitchell"}, {"title": "Wuthering Heights", "author": "Bronte"}]}'

In [6]:
# write to a file

with open('json-out-test.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(jsonDataAsPythonValue, indent=2))
    print('wrote',f.name)

wrote json-out-test.json
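The contents list also mentions `.dump()` and `.load()`, which work directly with open file objects instead of strings. A minimal round-trip sketch, reusing the file name from the cell above (so it overwrites that file):

```python
import json

jsonDataAsPythonValue = {
    'Results': 2,
    'isPrinted': True,
    'format': None,
    'books': [{'title': 'Gone with the Wind', 'author': 'Mitchell'},
              {'title': 'Wuthering Heights', 'author': 'Bronte'}],
}

# json.dump() serializes straight into an open file object
with open('json-out-test.json', 'w', encoding='utf-8') as f:
    json.dump(jsonDataAsPythonValue, f, indent=2)

# json.load() parses JSON from an open file object back into Python values
with open('json-out-test.json', encoding='utf-8') as f:
    roundTrip = json.load(f)

print(roundTrip == jsonDataAsPythonValue)  # True
```

The `dumps()`/`loads()` pair is handy when the JSON lives in a string (for example, an HTTP response body); `dump()`/`load()` save a step when the JSON lives in a file.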


Next, we will use these JSON techniques to parse and process data from an API response, which delivers its metadata serialized as JSON.

## Making an API call

### First, using requests

Let's use requests to retrieve some data from an API endpoint. In this case, we can use the Library of Congress
search function, which is a REST API that responds to HTTP requests.

The documentation for requests can be found here: http://docs.python-requests.org/en/master/ 

The endpoint for the search query is `http://www.loc.gov/search/`

In [7]:
import requests

searchEndpoint = 'http://www.loc.gov/search/'

To pass in the query parameters, we can use a dictionary! Hand it to `requests.get()` through the `params` argument.

In [8]:
parameters = {
    'fo' : 'json',
    'q'  : 'kittens',
    'fa' : 'online-format:image'
}
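Behind the scenes, `requests` turns the `params` dictionary into a URL query string. As an offline approximation, the standard library's `urllib.parse.urlencode` produces the same kind of encoding; this is a sketch, not an exact reproduction, since requests also does extra work such as dropping keys whose value is `None`. The comments on each parameter are our reading of this search, not official documentation:

```python
from urllib.parse import urlencode

searchEndpoint = 'http://www.loc.gov/search/'
parameters = {
    'fo': 'json',                  # response format
    'q': 'kittens',                # search terms
    'fa': 'online-format:image',   # facet filter: items available online as images
}

# Keys and values are percent-encoded (':' becomes %3A) and joined with '&'
query = urlencode(parameters)
print(searchEndpoint + '?' + query)
# http://www.loc.gov/search/?fo=json&q=kittens&fa=online-format%3Aimage
```

Compare this with the `You requested:` line printed by the next cell, which shows the same encoded `fa=online-format%3Aimage`.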

In [9]:
r = requests.get(searchEndpoint, params = parameters)

print('You requested:',r.url)
print('HTTP server response code:',r.status_code)
print('HTTP response headers',r.headers)

# notice that r.headers behaves like a dictionary, too!
# It is case-insensitive, so 'content-type' matches 'Content-Type'.
# We can ask what sort of content the server is returning:

print('\nYour request has this content type:\n',r.headers['content-type'])

You requested: https://www.loc.gov/search/?fa=online-format%3Aimage&fo=json&q=kittens
HTTP server response code: 200
HTTP response headers {'Date': 'Sun, 27 Oct 2024 23:43:01 GMT', 'Content-Type': 'application/json', 'Content-Length': '185805', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'referrer-policy': 'no-referrer-when-downgrade', 'strict-transport-security': 'max-age=3600; preload', 'x-content-type-options': 'nosniff', 'x-robots-tag': 'noindex, nofollow', 'x-frame-options': 'sameorigin', 'etag': '"d0ebe27c6ccf37b4dd4c04796110cdf2"', 'expires': 'Mon, 28 Oct 2024 23:43:00 GMT', 'content-security-policy': "block-all-mixed-content;         default-src https://loc.gov/ https://*.loc.gov/ ;         media-src https://loc.gov/ https://*.loc.gov/              https://*.readspeaker.com/             https://*.arcgis.com/ https://*.arcgisonline.com/  https://webapps-cdn.esri.com/             blob:;         worker-src https://loc.gov/ https://*.loc.gov/              blob:;

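So the server has returned JSON (the content type is `application/json`). The raw response body is available as a string through the `.text` attribute, while `r.json()` parses that string into Python dictionaries and lists.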

In [10]:
r.text[:500]

'{"breadcrumbs": [{"Library of Congress": "https://www.loc.gov"}, {"Search": "https://www.loc.gov/search/?fa=online-format:image&fo=json&q=kittens"}], "expert_resources": null, "facet_trail": [{"facet": "searchTerms", "field": "searchTerms", "superset": "https://www.loc.gov/search/?fa=online-format:image&fo=json", "value": "kittens"}, {"facet": "Available Online", "field": "digitized", "superset": "https://www.loc.gov/search/?all=true&fa=online-format:image&fo=json&q=kittens", "value": "digitized'

In [11]:
for element in r.json():
    print(element)

breadcrumbs
expert_resources
facet_trail
facet_views
facets
form_facets
options
pagination
results
search
timestamp
views


In [12]:
r.json()['pagination']['total']

7179
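The `total` is far larger than the 25 results in a single response, so the results are split into pages. A sketch of working out how many pages there are and building one parameter dictionary per page. The page size of 25 comes from the outputs in this notebook; the `sp` (start page) parameter is an assumption about the loc.gov API, so check its documentation before relying on it:

```python
import math

total = 7179      # value of r.json()['pagination']['total'] above
perPage = 25      # number of results observed in one response

pageCount = math.ceil(total / perPage)
print(pageCount)  # 288

baseParameters = {'fo': 'json', 'q': 'kittens', 'fa': 'online-format:image'}

# ASSUMPTION: 'sp' selects the results page; verify against the loc.gov API docs
pageParameters = [dict(baseParameters, sp=page) for page in range(1, pageCount + 1)]
print(pageParameters[0])
print(pageParameters[-1])
```

Each dictionary could then be passed as `params=` to `requests.get()`, one request per page, ideally with a pause between requests.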

In [13]:
type(r.text)

str

#### API Call question

We made a request to the loc.gov JSON API. Can you fill in the following & explain the missing elements? 

```
http://www.loc.gov/_______/?fo=_______&q=_______
```

What other parameters might you add after the `?`...

### Parsing the Data from the API

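Now that we can get the response, let's parse it so we can save parts of it to files. To do this, use the `json` module: `json.loads(r.text)` gives the same dictionary as `r.json()` above.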

In [14]:
import json

In [15]:
data = json.loads(r.text)

# what are the keys?
for element in data:
    print(element)

breadcrumbs
expert_resources
facet_trail
facet_views
facets
form_facets
options
pagination
results
search
timestamp
views


In [16]:
for item in data['results']:
    print(item)

{'access_restricted': False, 'aka': ['http://www.loc.gov/pictures/collection/hec/item/2016892679/', 'http://www.loc.gov/item/2016892679/', 'http://www.loc.gov/pictures/item/2016892679/', 'https://hdl.loc.gov/loc.pnp/hec.43433', 'http://www.loc.gov/resource/hec.43433/', 'http://lccn.loc.gov/2016892679'], 'campaigns': [], 'contributor': ['harris & ewing'], 'date': '1923-01-01', 'dates': ['1923'], 'description': ['1 negative : glass ; 4 x 5 in. or smaller'], 'digitized': True, 'extract_timestamp': '2021-09-01T22:39:14.884Z', 'group': ['hec', 'catalog-split-02', 'catalog', 'harris-ewing', 'main-catalog-split-02', 'main-catalog'], 'hassegments': False, 'id': 'http://www.loc.gov/item/2016892679/', 'image_url': ['https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433_150px.jpg#h=116&w=150', 'https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433t.gif#h=116&w=150', 'https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433r.jpg#h=496&w=640', 'https://tile.loc.gov/

In [17]:
print(len(data['results']))

25
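Each element of `data['results']` is a dictionary like the one printed above, with many more fields than we usually need. A sketch of keeping just a few fields per result and saving them together in one file. The two records below are hypothetical stand-ins for `data['results']` (the first borrows its `date` and `url` from the output above; both titles are placeholders), the output file name is hypothetical, and `.get()` is used because not every result is guaranteed to have every key:

```python
import json

# Hypothetical stand-ins for data['results']; titles are placeholders
results = [
    {'title': 'Example title 1', 'date': '1923-01-01',
     'url': 'https://www.loc.gov/item/2016892679/', 'digitized': True},
    {'title': 'Example title 2', 'url': 'https://www.loc.gov/item/2017650796/'},
]

# Keep only the fields we care about; .get() returns None for missing keys
summary = [
    {'title': item.get('title'), 'date': item.get('date'), 'url': item.get('url')}
    for item in results
]

# Hypothetical output file name
with open('kittens-summary.json', 'w', encoding='utf-8') as f:
    json.dump(summary, f, indent=2)

print(len(summary), 'records written')  # 2 records written
print(summary[1]['date'])               # None
```

With live data, replace `results` with `data['results']`.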


When compared with the HTML version of this search, notice that the page also shows 25 results!

See https://www.loc.gov/photos/?fa=online-format:image&q=kittens

Is it possible to extract each result into its own file? 

In [18]:
# block testing an extraction of each result into a separate file

data = json.loads(r.text)

# grab the list of results
kittensList = data['results']
print(len(kittensList))

25


In [19]:
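fname = 'kitten-result-'
extension = '.json'   # avoid shadowing the built-in format()
n = 0                 # stays 0 if there are no results

# write each result dictionary to its own numbered JSON file
for n, item in enumerate(kittensList, start=1):
    filename = fname + str(n) + extension
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(item, f, sort_keys=True)
    print('wrote', filename)
print('wrote', n, 'files!')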

wrote kitten-result-1.json
wrote kitten-result-2.json
wrote kitten-result-3.json
wrote kitten-result-4.json
wrote kitten-result-5.json
wrote kitten-result-6.json
wrote kitten-result-7.json
wrote kitten-result-8.json
wrote kitten-result-9.json
wrote kitten-result-10.json
wrote kitten-result-11.json
wrote kitten-result-12.json
wrote kitten-result-13.json
wrote kitten-result-14.json
wrote kitten-result-15.json
wrote kitten-result-16.json
wrote kitten-result-17.json
wrote kitten-result-18.json
wrote kitten-result-19.json
wrote kitten-result-20.json
wrote kitten-result-21.json
wrote kitten-result-22.json
wrote kitten-result-23.json
wrote kitten-result-24.json
wrote kitten-result-25.json
wrote 25 files!


How could we extract the image URLs? Start by listing the keys of one result to see which fields are available.

In [20]:
for key in kittensList[0]:
    print(key)

access_restricted
aka
campaigns
contributor
date
dates
description
digitized
extract_timestamp
group
hassegments
id
image_url
index
item
language
location
location_country
mime_type
number
number_former_id
number_lccn
number_source_modified
online_format
original_format
partof
related
reproductions
resources
shelf_id
site
subject
timestamp
title
type
unrestricted
url


In [21]:
# 'url' is each item's landing page; the image links themselves are in 'image_url'
for kitten in kittensList:
    print(kitten['url'])

https://www.loc.gov/item/2016892679/
https://www.loc.gov/item/2017650796/
https://www.loc.gov/item/2013646722/
https://www.loc.gov/item/jukebox-668708/
https://www.loc.gov/item/2022653071/
https://www.loc.gov/item/2016796464/
https://www.loc.gov/item/2016816441/
https://www.loc.gov/item/2016817090/
https://www.loc.gov/item/2002697127/
https://www.loc.gov/item/89708607/
https://www.loc.gov/item/2002697126/
https://www.loc.gov/item/20002503/
https://www.loc.gov/item/2005681032/
https://www.loc.gov/item/2022652300/
https://www.loc.gov/item/2023835671/
https://www.loc.gov/item/2014717546/
https://www.loc.gov/item/2008660988/
https://www.loc.gov/item/2002706499/
https://www.loc.gov/item/2022653887/
https://www.loc.gov/item/afc9999005.24310/
https://www.loc.gov/item/afc9999005.27026/
https://www.loc.gov/item/jukebox-61618/
https://www.loc.gov/item/2002697129/
https://www.loc.gov/item/90708798/
https://www.loc.gov/item/07028973/
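The `url` field points at each item's landing page rather than at an image. The `image_url` field (see the first result printed earlier) holds a list of image links, each ending in a fragment such as `#h=116&w=150`. A sketch that strips that fragment and picks the widest image per result; reading `h` and `w` as pixel height and width is an assumption based on the sample output, and `largestImage` is a hypothetical helper, not part of any API. The sample list is copied from the first result above:

```python
from urllib.parse import urldefrag, parse_qs

def largestImage(imageUrls):
    """Return the image URL with the greatest width, without its '#...' fragment.

    ASSUMPTION: each URL ends in a fragment like '#h=116&w=150' giving pixel size.
    Returns None for an empty list.
    """
    best, bestWidth = None, -1
    for link in imageUrls:
        bareUrl, fragment = urldefrag(link)       # split off the '#h=..&w=..' part
        sizes = parse_qs(fragment)                # e.g. {'h': ['116'], 'w': ['150']}
        width = int(sizes.get('w', ['0'])[0])
        if width > bestWidth:
            best, bestWidth = bareUrl, width
    return best

# Three of the image_url entries from the first result shown earlier
sampleImageUrls = [
    'https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433_150px.jpg#h=116&w=150',
    'https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433t.gif#h=116&w=150',
    'https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433r.jpg#h=496&w=640',
]
print(largestImage(sampleImageUrls))
# https://tile.loc.gov/storage-services/service/pnp/hec/43400/43433r.jpg

# Applied to the live results, skipping any without images:
# for kitten in kittensList:
#     if kitten.get('image_url'):
#         print(largestImage(kitten['image_url']))
```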


## Lab Questions

See the course activities page for the lab questions.