# Working with Data & Files II

[urllib3 doc](https://urllib3.readthedocs.io/en/stable/)

Recommended tutorial: [Shiffman's "Working with Data and APIs in JavaScript"](https://thecodingtrain.com/tracks/data-and-apis-in-javascript), [YT](https://www.youtube.com/playlist?list=PLRqwX-V7Uu6YxDKpFzf_2D84p0cyk4T7X)

In [None]:
import json
import urllib3

[List of Free APIs](https://github.com/public-apis/public-apis?tab=readme-ov-file)

[HTTP response status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status):
- 200: OK, success
- 404: not found
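
Python's standard library also knows these codes. The following is a minimal sketch using the stdlib `http.HTTPStatus` enum; the helper name `describe_status` is made up for illustration and is not part of `urllib3`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # the code is not one of the standard HTTP status codes
        return f"{code}: unknown status code"
    return f"{code}: {status.phrase}"


for code in (200, 404, 500):
    print(describe_status(code))
```

A helper like this could replace the bare number in the status messages printed further down.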

## Random facts about dogs, anyone?!

[Dog facts API](https://dogapi.dog/), [docs](https://dogapi.dog/docs/api-v2)

In [None]:
# this API accepts a `number` query parameter; note the `?` that starts the query string
number_of_facts = 1
url_dogs = f"https://dogapi.dog/api/v2/facts?number={number_of_facts}"

# make a request
dogs_resp = urllib3.request("GET", url_dogs)

# `.status` tells us if our request was successful
print(f"Made a GET request to the Dogs API, response code? {dogs_resp.status}.")
print("---")

# on success, `.data` holds the raw bytes of the JSON response → parse them into a Python object
if dogs_resp.status == 200:
    # print(dogs_resp.data)
    dogs_json = json.loads(dogs_resp.data)
    # print(dogs_json)
    # examining the object, I can extract just the fact
    print(dogs_json["data"][0]["attributes"]["body"])
else:
    print(f"Status {dogs_resp.status} not successful, moving on...")
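
The `?number=...` part of the URL above is a query string. When there are several parameters, or values containing spaces, building it by hand gets error-prone. Here is a small sketch, assuming only the standard library's `urllib.parse.urlencode`; the helper `build_url` is a hypothetical name introduced for this example.

```python
from urllib.parse import urlencode

base_url = "https://dogapi.dog/api/v2/facts"


def build_url(base, **params):
    """Append `params` to `base` as a URL query string."""
    if not params:
        return base
    # urlencode joins the pairs with `&` and escapes special characters
    return f"{base}?{urlencode(params)}"


print(build_url(base_url, number=3))
```

The resulting string can be passed to `urllib3.request("GET", ...)` exactly like `url_dogs`.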

## ISS location

[doc](http://open-notify.org/Open-Notify-API/ISS-Location-Now/)

(Inspired by [this Shiffman video](https://www.youtube.com/watch?v=uxf0--uiX0I)...)

In [None]:
url_iss = "http://api.open-notify.org/iss-now.json"

iss_resp = urllib3.request("GET", url_iss)

print(f"Made a GET request to the ISS API, response code? {iss_resp.status}.")
print("---")

if iss_resp.status == 200:
    iss_json = json.loads(iss_resp.data)
    # print(iss_json)
    print(iss_json["iss_position"])
else:
    print(f"Status {iss_resp.status} not successful, moving on...")
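
To do anything numerical with the position (plotting it on a map, say), the coordinates need to be numbers. The sketch below works offline on a made-up sample payload. It assumes the response has the shape used above (an `iss_position` object) and that the coordinates arrive as strings; check the printed `iss_json` to confirm. `parse_position` is a hypothetical helper name.

```python
import json

# Illustrative sample only: invented values, with the shape assumed above.
sample = (
    b'{"message": "success", "timestamp": 1700000000, '
    b'"iss_position": {"latitude": "12.3456", "longitude": "-65.4321"}}'
)


def parse_position(raw):
    """Return (latitude, longitude) as floats from a raw JSON response body."""
    position = json.loads(raw)["iss_position"]
    return float(position["latitude"]), float(position["longitude"])


lat, lon = parse_position(sample)
print(f"ISS at latitude {lat}, longitude {lon}")
```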

## Extra: Scraping

[BeautifulSoup doc](https://beautiful-soup-4.readthedocs.io/en/latest/)  

Programmers and hackers often write programs that grab data from web pages and reuse it in various ways. This is called "scraping", and a program that does it is a "scraper". Below is an example of how you would do this for one web page. It requires two kinds of knowledge: Python, of course, but also how web pages are structured (mostly `HTML`, plus the other two main components, `CSS` and `JavaScript`, which are beyond the scope of this course). Don't worry about this too much. Just note that running the following cells should create a folder called `wolf-rehfeldt` and download a series of images into its `images` subfolder: beautiful poems that could serve as an inspiration for your own work!
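
To get a feel for what "structure" means here, the sketch below pulls the image addresses out of a tiny, made-up HTML fragment. It deliberately uses the standard library's `html.parser` rather than BeautifulSoup, so it runs without installing anything. The fragment and the class name `ImageCollector` are invented for illustration.

```python
from html.parser import HTMLParser

# Made-up page fragment for illustration; real pages are much longer.
sample_html = """
<div class="gallery">
  <img src="poem-1.jpg" alt="First poem">
  <p class="content">Untitled, 1975</p>
  <img src="poem-2.jpg" alt="Second poem">
</div>
"""


class ImageCollector(HTMLParser):
    """Collect the `src` attribute of every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs for the current tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


collector = ImageCollector()
collector.feed(sample_html)
print(collector.sources)
```

BeautifulSoup does the same kind of work with far less code, which is why the cells below use it.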

In [None]:
import os
import pathlib

from bs4 import BeautifulSoup

works_dir = pathlib.Path("wolf-rehfeldt")
img_dir = works_dir / "images"

# create the output directories (no error if they already exist)
works_dir.mkdir(exist_ok=True)
img_dir.mkdir(exist_ok=True)

# url of the site
url = "https://www.richardsaltoun.com/viewing-room/7-ruth-wolf-rehfeldt-letters/"

# set up urllib3 PoolManager
http = urllib3.PoolManager()

# fetch the page content (this is the **whole web page code**, all the html, as plain text)
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')

In [None]:
# write the html into a file for examination
with (works_dir / "soup.html").open("w", encoding="utf-8") as o:
    o.write(str(soup))

In [None]:
# convert string dict to dict
# https://stackoverflow.com/a/988251
import ast

# select the largest image format from the dict-like string found on the web page
def get_largest_res(data):
    data = ast.literal_eval(data)
    return data[str(max([int(k) for k in data.keys()]))]

# after checking the html manually, found the class of the container with all the images
artworks_items = soup.select(".panel_type_6")[0]
# print(artworks_items)

# select all `img` inside it
image_divs = artworks_items.select("img")
# print(image_divs)

# select all legends inside it
legends = artworks_items.select(".content")
# print(legends)

for i, (img_div, legend) in enumerate(zip(image_divs, legends)):
    
    # print("-" * 40)
    # print(i)
    # print(img_div)
    # print(legend)
    
    # get the link to the pic in the largest resolution
    artwork_max_res_url = get_largest_res(img_div["data-responsive-src"])
    extension = os.path.splitext(artwork_max_res_url)[1]
    # print(artwork_max_res_url)
    # print(extension)

    # create a filename out of the legend
    image_filename = f"{legend.get_text(strip=True).replace('/','-').replace(', ', ',').replace(' ', '-')}{extension}"
    # print(image_filename)    
    
    # Download the image
    img_response = http.request('GET', artwork_max_res_url)
    if img_response.status == 200:
        print(f" - image downloaded, writing to {img_dir}/{image_filename}")
        with open(img_dir / image_filename, 'wb') as o:
            o.write(img_response.data)
    else:
        print(f" - image {image_filename}, url response status: {img_response.status}")
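
To see what `get_largest_res` is doing, here is a self-contained sketch run on a hypothetical attribute value. The sample string and its `example.com` URLs are invented; the only assumption carried over from the cell above is that `data-responsive-src` holds a dict written out as text, with image widths as keys.

```python
import ast

# Hypothetical attribute value: widths as keys, image URLs as values.
sample_attr = (
    "{'250': 'https://example.com/w250/poem.jpg', "
    "'1000': 'https://example.com/w1000/poem.jpg', "
    "'500': 'https://example.com/w500/poem.jpg'}"
)


def get_largest_res(data):
    """Parse a dict literal and return the value stored under the largest numeric key."""
    sizes = ast.literal_eval(data)  # safely turns the text into a real dict
    return sizes[str(max(int(k) for k in sizes))]


print(get_largest_res(sample_attr))
```

The keys are converted with `int` before taking the maximum because, compared as strings, `'500'` sorts after `'1000'` and the wrong image would be picked.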