# Example: the MET API
This session examines how to use the MET API in practice to gather and explore data. 

Through the session, we will:
- learn the components of an API request
- practice with a few handy functions for examining request data
    - `type()` and `dir()`
- learn to create more requests with that data
    - `f-strings`
    - requests on requests!
- sort through request data using loops and conditionals
- transform our data into a dataframe
    - for doing simple data analysis

In [None]:
import requests

In [None]:
# the structure of our request: base_url, path, query

base_url = "https://collectionapi.metmuseum.org"
path = "/public/collection/v1/search"
query = "?q=nude"

## the anatomy of an API request:
- the *root*, which consists of the base URL.
   - https://collectionapi.metmuseum.org
- the *path* (also called the *endpoint*), which works like a directory structure (file structure) pointing to where the data is held.
   - /public/collection/v1/objects
   - /public/collection/v1/departments
   - /public/collection/v1/search
- the *query parameter*, which makes the request specific. It starts with `?` and takes the form `key=value`.
   - ?q=SearchTerm
   - ?q=cypress
   - ?q=van+gogh
  

To read more about the MET API, see here: https://metmuseum.github.io/
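
To see how the three pieces fit together without sending a request, here is a minimal sketch that assembles a search URL. It assumes the standard library's `urlencode` for building the query (the notebook itself writes the query by hand), and the search term is just one of the examples above.

```python
from urllib.parse import urlencode

base_url = "https://collectionapi.metmuseum.org"
path = "/public/collection/v1/search"

# urlencode turns a dict into "key=value" text and escapes the space as "+"
query = "?" + urlencode({"q": "van gogh"})

# root + path + query gives the full address of the request
full_url = base_url + path + query
print(full_url)
```

Building the query from a dict is optional here, but it saves us from escaping spaces by hand.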

In [None]:
# the request, saved to a variable

nudes = requests.get(base_url + path + query)

## introspection: two functions: `type()` and `dir()`
We can use `type()` and `dir()` to better understand our response data. The end goal is to sift through the data to discover interesting things: exploratory analysis.
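
Both functions work on any Python object, so we can try them on something small first. This is a rough sketch using a made-up dictionary (`sample` is a stand-in, not real MET data); the filter on double underscores is just one way to make the `dir()` output easier to read.

```python
# made-up stand-in data, shaped loosely like a search result
sample = {"total": 2, "objectIDs": [101, 102]}

# type() tells us what kind of object we are holding
print(type(sample))

# dir() lists every attribute; dropping the double-underscore names
# leaves the everyday methods such as keys() and get()
public_names = [name for name in dir(sample) if not name.startswith("__")]
print(public_names)
```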

In [None]:
# what type of object do we have?

type(nudes)

In [None]:
# what can we do with this object?
# spend some time exploring the different methods

dir(nudes)

In [None]:
nudes.raw

In [None]:
nudes.elapsed

In [None]:
nudes.ok

In [None]:
# what do these numbers mean?
nudes.content

## .json() to parse our response object

In [None]:
# why do we need to add parentheses?
parsed = nudes.json()

In [None]:
type(parsed)

what is a `dict`? A collection of key:value pairs. We will look more closely soon.
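
To get a feel for what parsing does, we can run the same kind of conversion by hand. This sketch assumes a made-up response body (`raw_text`) and uses the standard library's `json.loads()`, which does roughly the same job on a string that `.json()` does on a response.

```python
import json

# made-up response body: JSON arrives as plain text
raw_text = '{"total": 2, "objectIDs": [101, 102]}'

# parsing turns the text into Python objects we can index into
parsed_sample = json.loads(raw_text)

print(type(parsed_sample))
print(parsed_sample["objectIDs"][0])
```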

In [None]:
dir(parsed)

In [None]:
parsed.keys()

In [None]:
type(parsed['objectIDs'])

In [None]:
# okay let's check the first one. What do you think it is? Look again at the docs.

parsed['objectIDs'][0]

## passing variables into request object
Goal: to create a new request with data from the original request

In [None]:
# the code to access our first objectID is clunky. Let's save to a variable.

first = parsed['objectIDs'][0]

In [None]:
# now we paste the URL for "objects" (rather than search)
# and we do it within an f-string, putting the variable "first" 
# into curly brackets at the end

url = f"https://collectionapi.metmuseum.org/public/collection/v1/objects/{first}"
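
The f-string pattern can be tried on its own before we run the request. In this small sketch, `example_id` is a hypothetical ID chosen for illustration, not one taken from our search results.

```python
# hypothetical objectID, for illustration only
example_id = 12345

# whatever sits inside the curly brackets is swapped for the variable's value
example_url = f"https://collectionapi.metmuseum.org/public/collection/v1/objects/{example_id}"
print(example_url)
```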

In [None]:
# running the request

first_object = requests.get(url)

In [None]:
# checking the resulting object

first_object

In [None]:
dir(first_object)

look at the json for the first object

In [None]:
# all of the data about this object
# look at the URL! 

first_object.json()

In [None]:
# now we can save this first object to its own variable. 
# will make it easier to do more things to it!

first_obj = first_object.json()

## accessing items from a `dict` by keys

In [None]:
type(first_obj)

In [None]:
# what is a `dict`? 
# key:value pairs

instructor = {
    'name': ['filipa calado', 'patrick smyth', 'stephen zweibel'],
    'age': [35, 37, 38],
    'degree': ['literature', 'literature', 'library science'],
    'job': ['digital scholarship specialist', 'startup', 'digital scholarship librarian']
}

In [None]:
# see the keys

instructor.keys()

In [None]:
# access items through brackets containing keys

instructor['name']

### just like pandas

In [None]:
import pandas as pd
df = pd.DataFrame(instructor)

In [None]:
df

In [None]:
# let's try with the first object

first_obj.keys()

## individual practice: take a few minutes to inspect the dataset by using different keys

In [None]:
# no result!

first_obj['artistGender']

In [None]:
first_obj['department']

In [None]:
first_obj['culture']

## looping through our dataset

Now let's go back to the original list, and pull out all the info for the results. 

In [None]:
parsed['objectIDs']

In [None]:
ids = parsed['objectIDs']

In [None]:
type(ids)

In [None]:
len(ids)

## remember loops?

In [None]:
# let's just try with the first ten items

for item in ids[:10]:
    print(item)

In [None]:
# now let's do the first fifty

first_fifty = []
for item in ids[:50]:
    # passing the objectID variable into the URL
    url = f'https://collectionapi.metmuseum.org/public/collection/v1/objects/{item}'
    # grabbing our response for that object
    response = requests.get(url)
    # parsing our response with json
    # (named `obj` so we don't overwrite the `parsed` search results)
    obj = response.json()
    # appending the parsed object to our new list
    first_fifty.append(obj)
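
The same loop can be wrapped in a function, which makes it easy to try the logic without hitting the network. This is only a sketch under assumptions: `collect_objects`, `fake_fetch`, and the sample IDs are hypothetical names invented here, and the fetching step is passed in as a function so that we can swap in a stand-in.

```python
OBJECTS_URL = "https://collectionapi.metmuseum.org/public/collection/v1/objects"

def collect_objects(object_ids, fetch, limit=50):
    """Build one URL per ID, fetch it, and keep whatever comes back."""
    collected = []
    for object_id in object_ids[:limit]:
        url = f"{OBJECTS_URL}/{object_id}"
        record = fetch(url)
        # skip anything the fetch function could not retrieve
        if record is None:
            continue
        collected.append(record)
    return collected

def fake_fetch(url):
    """Hypothetical stand-in for a real request: returns None for one ID."""
    object_id = url.rsplit("/", 1)[-1]
    if object_id == "2":
        return None
    return {"objectID": int(object_id)}

results = collect_objects([1, 2, 3, 4], fake_fetch, limit=3)
print(results)
```

In the notebook, `fetch` could be a small function that calls `requests.get(url)` and returns `response.json()` when `response.ok` is true, and `None` otherwise.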

In [None]:
# because we already know the first, let's check the last item

first_fifty[-1]

now guess what type of data we have for `first_fifty`?

In [None]:
type(first_fifty)

let's look at some of the values

In [None]:
# what does this error mean?

for item in first_fifty:
    print(item['title'])

## looping with conditions

In [None]:
for item in first_fifty:
    name = item.get('artistDisplayName')
    print(name)

In [None]:
# combine get() with conditional to get rid of the None's

for item in first_fifty:
    if item.get('artistDisplayName'):
        print(item['artistDisplayName'])

In [None]:
for item in first_fifty:
    gender = item.get('artistGender')
    print(gender)

In [None]:
# why do you think we see only female?

for item in first_fifty:
    if item.get('artistGender'):
        print(item['artistGender'])

In [None]:
# use a try statement to skip over any objects that are missing the key

for i in first_fifty:
    try:
        print(i['title'])
    except KeyError:
        continue

In [None]:
titles = []
for item in first_fifty:
    try:
        titles.append(item['title'])
    except KeyError:
        continue

In [None]:
titles
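
The three approaches in this section behave differently when a key is missing or empty. Here is a small side-by-side sketch; `records` is made-up data, with one entry that has an empty name and one that lacks both keys entirely.

```python
# made-up records, for illustration only
records = [
    {"title": "Study A", "artistDisplayName": "Example Artist"},
    {"title": "Study B", "artistDisplayName": ""},
    {"message": "not found"},
]

# .get() never raises: a missing key comes back as None
with_get = [record.get("artistDisplayName") for record in records]

# a conditional drops both None and empty strings
non_empty = [name for name in with_get if name]

# square brackets raise KeyError on a missing key, so we catch it and move on
titles_found = []
for record in records:
    try:
        titles_found.append(record["title"])
    except KeyError:
        continue

print(with_get)
print(non_empty)
print(titles_found)
```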

## sorting our data

In [None]:
# look again at the first object
first_fifty[0]

In [None]:
# this syntax allows us to see only the non-empty values

for item in first_fifty:
    if item.get('artistGender'):
        print(item['artistGender'])

In [None]:
# but by saving the value to a variable first, we also see the empty and None values

for item in first_fifty:
    gender = item.get('artistGender')
    print(gender)

In [None]:
# let's get a bunch of this data into lists

titles = []
names = []
genders = []
depts = []
countries = []
urls = []

for item in first_fifty:
    title = item.get('title')
    titles.append(title)
    name = item.get('artistDisplayName')
    names.append(name)
    gender = item.get('artistGender')
    genders.append(gender)
    dept = item.get('department')
    depts.append(dept)
    country = item.get('country')
    countries.append(country)
    url = item.get('objectURL')
    urls.append(url)
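
With this many fields, the repeated get-and-append pattern can also be written more compactly. This sketch is one possible alternative, not the only way: `sample_objects` is made-up data, and `fields` is a hypothetical mapping from the column name we want to the key the API uses.

```python
# made-up objects, for illustration only
sample_objects = [
    {"title": "Study A", "artistDisplayName": "Example Artist", "department": "Drawings and Prints"},
    {"title": "Study B", "department": "Photographs"},
]

# column name we want -> key used in each object
fields = {
    "title": "title",
    "name": "artistDisplayName",
    "department": "department",
}

# one list per column; .get() fills in None where a key is missing
columns = {
    column: [item.get(key) for item in sample_objects]
    for column, key in fields.items()
}

print(columns)
```

Because every list has the same length, a dictionary shaped like `columns` can be handed straight to `pd.DataFrame()`.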

In [None]:
countries[:10]

In [None]:
depts[:10]

In [None]:
urls[:10]

## data analysis with pandas

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({
    'title': titles,
    'name': names,
    'gender': genders,
    'department': depts,
    'country': countries,
    'link': urls
})

In [None]:
df

In [None]:
df.value_counts('department')

In [None]:
df.department.value_counts()[:20].plot(kind = 'barh')

In [None]:
df.department.value_counts()[:10].plot(kind = 'pie')
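
To see what `value_counts()` is tallying, the same count can be sketched in plain Python. This assumes made-up department values (`sample_depts`) and uses `collections.Counter` as a stand-in; like `value_counts()` with its default settings, it leaves out the missing values and lists the most frequent first.

```python
from collections import Counter

# made-up department values, including one missing entry
sample_depts = ["Drawings and Prints", "Photographs", "Drawings and Prints", None]

# count each department, ignoring the missing value
counts = Counter(dept for dept in sample_depts if dept is not None)

# most_common() sorts from most to least frequent
print(counts.most_common())
```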