## HTTP in Python

*Date: October 7, 2024*

Activities to demonstrate HTTP in Python using the requests library.

In [2]:
import requests

## Defining a URL

Construct a basic request, starting by defining a URL.

In [3]:
url = 'https://www.loc.gov/'
lccn_url = 'https://lccn.loc.gov/'

In [5]:
print(lccn_url)

https://lccn.loc.gov/


In [4]:
resource_list = ['resource/cph.3f05183/','resource/fsa.8d24709/','resource/highsm.64003/']

In [7]:
for resource in resource_list:
    r = requests.get(url + resource)
    print(r.url)

https://www.loc.gov/resource/cph.3f05183/
https://www.loc.gov/resource/fsa.8d24709/
https://www.loc.gov/resource/highsm.64003/
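
The loop above sends a request just to display each final URL. As a minimal sketch, the same URLs can be assembled without any network traffic using the standard library's `urllib.parse.urljoin`. The variable names mirror the cells above; `full_urls` is a name introduced here for illustration.

```python
from urllib.parse import urljoin

url = 'https://www.loc.gov/'
resource_list = ['resource/cph.3f05183/', 'resource/fsa.8d24709/', 'resource/highsm.64003/']

# urljoin resolves each relative path against the base URL; no request is sent
full_urls = [urljoin(url, resource) for resource in resource_list]

for full_url in full_urls:
    print(full_url)
```

Plain string concatenation works here because the base ends with `/` and the paths do not start with one; `urljoin` is a little more forgiving when that is not guaranteed.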


## Placing requests

Requests are placed using the HTTP verbs (methods). The one that corresponds to requesting a web page is `GET`.

Once a request is placed, you can also ask for information about the response, including status code, headers, and the response content:

In [5]:
for resource in resource_list:
    r = requests.get(url + resource)
    print(r.url, r.status_code)

https://www.loc.gov/resource/cph.3f05183/ 200
https://www.loc.gov/resource/fsa.8d24709/ 200
https://www.loc.gov/resource/highsm.64003/ 200
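
A status code such as `200` is easier to read with its standard reason phrase and class. The sketch below uses only the standard library's `http.HTTPStatus`; `describe_status` is a hypothetical helper written for this notebook, not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a label like '200 OK (success)' for a standard HTTP status code."""
    classes = {
        1: 'informational',
        2: 'success',
        3: 'redirection',
        4: 'client error',
        5: 'server error',
    }
    # HTTPStatus raises ValueError for codes it does not know about
    phrase = HTTPStatus(code).phrase
    return f'{code} {phrase} ({classes[code // 100]})'

for code in (200, 404, 503):
    print(describe_status(code))
```

In a loop like the one above, this could be called as `describe_status(r.status_code)`.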


In [8]:
for resource in resource_list:
    r = requests.get(url + resource)
    print(r.url, r.status_code, '\n', r.headers)

https://www.loc.gov/resource/cph.3f05183/ 200 
 {'Date': 'Mon, 07 Oct 2024 02:25:05 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '1335612', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'referrer-policy': 'no-referrer-when-downgrade', 'strict-transport-security': 'max-age=3600; preload', 'x-content-type-options': 'nosniff', 'x-frame-options': 'sameorigin', 'etag': '"5e37d63134a4b421208621073554d45b"', 'expires': 'Tue, 08 Oct 2024 00:20:22 GMT', 'content-security-policy': "block-all-mixed-content;         default-src https://loc.gov/ https://*.loc.gov/ ;         media-src https://loc.gov/ https://*.loc.gov/              https://*.readspeaker.com/             https://*.arcgis.com/ https://*.arcgisonline.com/  https://webapps-cdn.esri.com/             blob:;         worker-src https://loc.gov/ https://*.loc.gov/              blob:;         font-src https://loc.gov/ https://*.loc.gov/              https://*.arcgis.com/ https://*.arcgisonline.com/  h

You can look at the headers as a dictionary:

In [16]:
r = requests.get(url + resource_list[0])
print(r.url, r.status_code)
headers = r.headers
for header, value in headers.items():
    print(header, ':', value)

https://www.loc.gov/resource/cph.3f05183/ 200
Date : Mon, 07 Oct 2024 02:29:14 GMT
Content-Type : text/html; charset=UTF-8
Content-Length : 1335612
Connection : keep-alive
access-control-allow-origin : *
referrer-policy : no-referrer-when-downgrade
strict-transport-security : max-age=3600; preload
x-content-type-options : nosniff
x-frame-options : sameorigin
etag : "5e37d63134a4b421208621073554d45b"
expires : Tue, 08 Oct 2024 00:20:22 GMT
content-security-policy : block-all-mixed-content;         default-src https://loc.gov/ https://*.loc.gov/ ;         media-src https://loc.gov/ https://*.loc.gov/              https://*.readspeaker.com/             https://*.arcgis.com/ https://*.arcgisonline.com/  https://webapps-cdn.esri.com/             blob:;         worker-src https://loc.gov/ https://*.loc.gov/              blob:;         font-src https://loc.gov/ https://*.loc.gov/              https://*.arcgis.com/ https://*.arcgisonline.com/  https://webapps-cdn.esri.com/             https://
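
Header values arrive as strings, so it can help to break them into usable parts. The following sketch takes two values copied from the output above and parses them with the standard library; the `headers` dictionary here is a hand-made stand-in for `r.headers`, not a live response.

```python
from email.message import Message

# Stand-in for r.headers, using two values from the output above
headers = {
    'Content-Type': 'text/html; charset=UTF-8',
    'Content-Length': '1335612',
}

# email.message.Message knows how to split a Content-Type value
msg = Message()
msg['Content-Type'] = headers['Content-Type']
media_type = msg.get_content_type()    # media type without parameters
charset = msg.get_content_charset()    # charset parameter, lower-cased

# Content-Length is a string of digits; convert it to compare or sum sizes
size_bytes = int(headers['Content-Length'])

print(media_type, charset, size_bytes)
```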

Or, look at the content of the response:

In [17]:
r = requests.get(url + resource_list[0])
print(r.url, r.status_code)
r.text[:500]

https://www.loc.gov/resource/cph.3f05183/ 200


'<!DOCTYPE html>\n\n\n<html lang="en" class="no-js" prefix="lc: http://loc.gov/#">\n<head>\n\n    \n<meta charset="utf-8">\n<meta name="viewport" content="width=device-width,initial-scale=1"/>\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta name="version" content="$Revision$"/>\n<meta name="msvalidate.01" content="5C89FB9D99590AB2F55BD95C3A59BD81"/>\n<link title="schema(DC)" rel="schema.dc" href="http://purl.org/dc/elements/1.1/"/>\n<meta name="dc.language" content="eng" />\n<meta name="dc.source'

## Adding variables (i.e., parameters) to a request

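Query parameters are appended to the URL as `key=value` pairs after a `?`. With requests, you pass them as a dictionary through the `params` argument.

For example, asking for a JSON response (`fo=json`) lets us look up the LCCN for each item: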

In [18]:
for resource in resource_list:
    r = requests.get(url + resource)
    print(r.url, r.status_code)

https://www.loc.gov/resource/cph.3f05183/ 200
https://www.loc.gov/resource/fsa.8d24709/ 200
https://www.loc.gov/resource/highsm.64003/ 200


In [19]:
params = {'fo': 'json'}
for resource in resource_list:
    r = requests.get(url + resource, params=params)
    print(r.url, r.status_code)

https://www.loc.gov/resource/cph.3f05183/?fo=json 200
https://www.loc.gov/resource/fsa.8d24709/?fo=json 200
https://www.loc.gov/resource/highsm.64003/?fo=json 200
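
To see the `?key=value` pattern in isolation, the sketch below builds the same query string by hand with the standard library's `urllib.parse` and then reads it back. This is only an illustration of the URL pattern, under the assumption that a single simple parameter is involved; `base`, `query_url`, and `parsed` are names introduced here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = 'https://www.loc.gov/resource/cph.3f05183/'
params = {'fo': 'json'}

# urlencode turns the dictionary into 'fo=json'; the '?' separates it from the path
query_url = base + '?' + urlencode(params)
print(query_url)

# Going the other way: split the URL and recover the parameters
parsed = parse_qs(urlsplit(query_url).query)
print(parsed)
```

Note that `parse_qs` returns a list for each key, because a key may appear more than once in a query string.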


In [43]:
params = {'fo': 'json'}
for resource in resource_list:
    r = requests.get(url + resource, params=params)
    print(r.url, r.status_code)
    r_data = r.json()
    lccn = r_data['item']['library_of_congress_control_number']
    print(lccn)

https://www.loc.gov/resource/cph.3f05183/?fo=json 200
98508155
https://www.loc.gov/resource/fsa.8d24709/?fo=json 200
2017843202
https://www.loc.gov/resource/highsm.64003/?fo=json 200
2020722343
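
Indexing with `r_data['item']['library_of_congress_control_number']` raises a `KeyError` if either key is missing. A slightly more defensive sketch is shown below; `get_lccn` is a hypothetical helper, and `sample` is a hand-written fragment shaped like the response above, not the full JSON.

```python
import json

def get_lccn(data):
    """Return the LCCN from a parsed response, or None if the keys are absent."""
    return data.get('item', {}).get('library_of_congress_control_number')

# Minimal stand-in for r.json(), using an LCCN from the output above
sample = json.loads('{"item": {"library_of_congress_control_number": "98508155"}}')

print(get_lccn(sample))
print(get_lccn({}))
```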


In [48]:
params = {'fo':'json'}
for resource in resource_list:
    r = requests.get(url + resource, params=params)
    print(r.url, r.status_code)
    print('Access JSON data:')
    data = r.json()
    print('  LCCN is:')
    print('    ',data['item']['library_of_congress_control_number'])

https://www.loc.gov/resource/cph.3f05183/?fo=json 200
Access JSON data:
  LCCN is:
     98508155
https://www.loc.gov/resource/fsa.8d24709/?fo=json 200
Access JSON data:
  LCCN is:
     2017843202
https://www.loc.gov/resource/highsm.64003/?fo=json 200
Access JSON data:
  LCCN is:
     2020722343


## Try some different URI patterns

How, for example, would you find the Dublin Core metadata for these items?

The LCCN permalink followed by `/dc` should return it, so we can build that URL from each item's LCCN:

In [44]:
params = {'fo': 'json'}
for resource in resource_list:
    r = requests.get(url + resource, params=params)
    print(r.url, r.status_code)
    r_data = r.json()
    lccn = r_data['item']['library_of_congress_control_number']
    dc_url = lccn_url + lccn + '/dc'
    print(dc_url)
    dublincore = requests.get(dc_url)
    print(dublincore.text)

https://www.loc.gov/resource/cph.3f05183/?fo=json 200
https://lccn.loc.gov/98508155/dc
<?xml version="1.0" encoding="UTF-8"?><srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema" xmlns:zs="http://docs.oasis-open.org/ns/search-ws/sruResponse" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:srw/schema/1/dc-schema http://www.loc.gov/standards/sru/resources/dc-schema.xsd">
  <title xmlns="http://purl.org/dc/elements/1.1/">For greater knowledge on more subjects use your library more often [graphic].</title>
  <creator xmlns="http://purl.org/dc/elements/1.1/">Federal Art Project, sponsor.</creator>
  <type xmlns="http://purl.org/dc/elements/1.1/">still image</type>
  <type xmlns="http://purl.org/dc/elements/1.1/">Posters 1930-1950. gmgpc</type>
  <type xmlns="http://purl.org/dc/elements/1.1/">Screen prints Color 1930-1950. gmgpc</type>
  <publisher xmlns="http://purl.org/dc/elements/1.1/">Chicago : Illinois WPA Art Project,</publisher>
  <date xmlns="http://pur
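
Rather than printing the raw XML, individual Dublin Core fields can be pulled out with the standard library's `xml.etree.ElementTree`. The sketch below works on `sample_xml`, an abbreviated, hand-closed copy of the first record above (XML declaration and some attributes omitted), so it runs without a request; with a live response, `dublincore.text` would take its place.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Dublin Core elements in the output above
DC_NS = {'dc': 'http://purl.org/dc/elements/1.1/'}

# Abbreviated sample based on the first record shown above
sample_xml = (
    '<srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema">'
    '<title xmlns="http://purl.org/dc/elements/1.1/">For greater knowledge on more '
    'subjects use your library more often [graphic].</title>'
    '<creator xmlns="http://purl.org/dc/elements/1.1/">Federal Art Project, sponsor.</creator>'
    '<type xmlns="http://purl.org/dc/elements/1.1/">still image</type>'
    '<type xmlns="http://purl.org/dc/elements/1.1/">Posters 1930-1950. gmgpc</type>'
    '</srw_dc:dc>'
)

root = ET.fromstring(sample_xml)

# find returns the first match; findall returns every match
title = root.find('dc:title', DC_NS).text
creator = root.find('dc:creator', DC_NS).text
types = [element.text for element in root.findall('dc:type', DC_NS)]

print(title)
print(creator)
print(types)
```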

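Similarly, you can request MODS records by appending `/mods` to the LCCN permalink. When this cell was run, the service returned a temporary error document instead of the records: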

In [45]:
params = {'fo': 'json'}
for resource in resource_list:
    r = requests.get(url + resource, params=params)
    print(r.url, r.status_code)
    r_data = r.json()
    lccn = r_data['item']['library_of_congress_control_number']
    mods_url = lccn_url + lccn + '/mods'
    print(mods_url)
    mods_rec = requests.get(mods_url)
    print(mods_rec.text)

https://www.loc.gov/resource/cph.3f05183/?fo=json 200
https://lccn.loc.gov/98508155/mods
<?xml version="1.0" encoding="UTF-8"?><error>System Temporarily Unavailable. <a href="https://lccn.loc.gov/98508155">Retry</a>
</error>

https://www.loc.gov/resource/fsa.8d24709/?fo=json 200
https://lccn.loc.gov/2017843202/mods
<?xml version="1.0" encoding="UTF-8"?><error>System Temporarily Unavailable. <a href="https://lccn.loc.gov/2017843202">Retry</a>
</error>

https://www.loc.gov/resource/highsm.64003/?fo=json 200
https://lccn.loc.gov/2020722343/mods
<?xml version="1.0" encoding="UTF-8"?><error>System Temporarily Unavailable. <a href="https://lccn.loc.gov/2020722343">Retry</a>
</error>
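
The responses above are XML documents whose root element is `<error>`, so it is worth checking for that before treating the body as a MODS record. The sketch below is one way to do so, assuming error bodies always look like the ones shown here; `error_message` is a hypothetical helper, and `sample` is copied from the first response above.

```python
import xml.etree.ElementTree as ET

def error_message(xml_bytes):
    """Return the error text if the root element is <error>, otherwise None."""
    root = ET.fromstring(xml_bytes)
    if root.tag == 'error':
        # root.text is the text before the first child element (the Retry link)
        return (root.text or '').strip()
    return None

# Body of the first response above, as bytes (what mods_rec.content would hold)
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<error>System Temporarily Unavailable. '
    b'<a href="https://lccn.loc.gov/98508155">Retry</a>\n</error>'
)

print(error_message(sample))
```

In the loop above, a non-`None` result from `error_message(mods_rec.content)` would be a signal to skip the record or try again later.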

