# Working with APIs in Python

Making API requests in Python can be really simple. The standard library includes a lower-level package called `urllib` that can also make the kinds of web requests that we want, but it's not as friendly as the `requests` library, which we'll be using.

In [1]:
import requests

## Authentication

You'll have to authenticate each request to the Harvard Art Museum API with an API key. Other APIs may require different kinds of authentication (sometimes very complicated auth! Look for libraries at that point), but HAM has some pretty simple authentication, which makes things easy for us.

You can sign up for a key [here](https://www.harvardartmuseums.org/collections/api). Documentation for the entire API is hosted on GitHub and can be viewed [here](https://github.com/harvardartmuseums/api-docs).

In [3]:
APIKEY = "622b6240-e14a-11e8-9234-6f5a1db33697" # Enter your API key here
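
Pasting a key directly into a shared notebook makes it easy to leak. As a minimal sketch of one alternative, the helper below reads the key from an environment variable instead. The variable name `HAM_API_KEY` and the function `load_api_key` are assumptions made for this example, not something the HAM API requires.

```python
import os

def load_api_key(env_var="HAM_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    `HAM_API_KEY` is a hypothetical name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable set in your shell before launching Jupyter, `APIKEY = load_api_key()` could stand in for the hardcoded assignment.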

## Basic request

We're going to start off with a basic request to the API. This API, like many others, has a variety of endpoints, each with its own URL, slightly modified from a base URL. We'll worry about the general case in a bit; for now, let's look at a basic API request.

In this example, we'll re-create the first example in the [Object endpoint documentation](https://github.com/harvardartmuseums/api-docs/blob/master/sections/object.md), which will give each of you the records for 10 objects that have never been viewed online in the museum's collections.

In [45]:
url = "https://api.harvardartmuseums.org/object"
parameters = {
    "q":"verificationlevel:4",
    "size":10,
    "apikey":APIKEY,
    "sort":'totalpageviews'
}
R = requests.get(url,params=parameters)
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?q=verificationlevel%3A4&size=10&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&sort=totalpageviews&page=2',
  'page': 1,
  'pages': 1582,
  'totalrecords': 15817,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Bequest',
   'accessionyear': 1936,
   'accesslevel': 1,
   'century': '19th-20th century',
   'classification': 'Paintings',
   'classificationid': 26,
   'colorcount': 0,
   'commentary': None,
   'contact': 'am_europeanamerican@harvard.edu',
   'contextualtextcount': 0,
   'copyright': None,
   'creditline': 'Harvard Art Museums/Fogg Museum, Bequest of Denman W. Ross, Class of 1875',
   'culture': 'American',
   'datebegin': 0,
   'dated': '19th-20th century',
   'dateend': 0,
   'dateoffirstpageview': None,
   'dateoflastpageview': None,
   'department': 'Department of American Paintings, Sculpture & Decorative Arts',
   'description': None,
   'dimensions': '35.2 x 25.4 cm (13 7/8 x 10 in.)\r\nframed:

In [46]:
from IPython.display import display, HTML

records = R.json()['records']
htmlOutput = "<table><tr><th>Views</th><th>Title</th></tr>"
for record in records:
    htmlOutput += f"<tr><td>{record['totalpageviews']}</td><td><a href=\"{record['url']}\" target='_blank'>{record['title']}</a></td></tr>"
htmlOutput += "</table>"
display(HTML(htmlOutput))

| Views | Title |
|-------|-------|
| 0 | Landscape with Church and Figures |
| 0 | Variations on a Theme (No. 144) |
| 0 | Variations on a Theme (No. 113) |
| 0 | Variations on a Theme (No. 137) |
| 0 | Variations on a Theme (No. 115) |
| 0 | Variations on a Theme (No. 112) |
| 0 | Variations on a Theme (No. 107) |
| 0 | Variations on a Theme (No. 133) |
| 0 | Variations on a Theme (No. 109) |
| 0 | Mexican Street |
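
Titles and URLs from an API can contain characters such as `&` or `<` that have special meaning in HTML. A slightly more defensive sketch of the same table escapes each value with the standard library's `html.escape`. The function name `records_to_html` and the sample record are made up for illustration.

```python
from html import escape

def records_to_html(records):
    """Build an HTML table of view counts and linked titles, escaping every value."""
    rows = ["<table><tr><th>Views</th><th>Title</th></tr>"]
    for record in records:
        views = escape(str(record["totalpageviews"]))
        url = escape(record["url"], quote=True)
        title = escape(record["title"])
        rows.append(
            f'<tr><td>{views}</td><td><a href="{url}" target="_blank">{title}</a></td></tr>'
        )
    rows.append("</table>")
    return "".join(rows)

# Illustrative record, not real API output
sample = [{"totalpageviews": 0, "url": "https://example.org/?a=1&b=2", "title": "Cats & <Dogs>"}]
print(records_to_html(sample))
```

The returned string can be passed to `display(HTML(...))` in the same way as `htmlOutput`.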


### Refresher on Dictionaries

Python dictionaries are sets of key / value pairs, where a value can be accessed by its key. You're essentially naming a value in a container, so you can easily call it up later. In the example above, `"size"` is a key which returns the value `10`.

Dictionaries have very fast lookups - usually `O(1)` - so you can get a value from its key very quickly, no matter how large the dictionary is. Since Python 3.7, dictionaries also preserve insertion order, so iterating through the key / value pairs yields them in the order they were added; in older versions of Python, the order was not guaranteed. If you're coming from Java, you may know dictionaries as Maps (HashMaps, etc). In JavaScript, the closest equivalent is a plain object, which is the structure that JSON (JavaScript Object Notation) text is modeled on. The general abstract data type is also referred to as an associative array.

We're going to be looking up data in dictionaries as well as setting key-value pairs, so here's a quick refresher on the syntax:

In [8]:
parameters['q']

'totalpageviews:0'

In [10]:
x = 'apikey'
parameters[x] # This also works when we've set the key string to another variable

'622b6240-e14a-11e8-9234-6f5a1db33697'

In [11]:
parameters['q'] = "totalpageviews:1" # You can also set the value of a key like you would a variable
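
A few other dictionary operations come up often when working with API parameters and responses. The sketch below uses a stand-in dictionary (the `apikey` value is a placeholder) to show safe lookups with `.get()`, membership tests with `in`, and iteration in insertion order.

```python
params = {"q": "totalpageviews:0", "size": 10, "apikey": "YOUR-KEY-HERE"}  # placeholder values

# .get() returns a default instead of raising KeyError when a key is missing
sort_field = params.get("sort", "no sort set")

# `in` tests whether a key is present
has_size = "size" in params

# Adding a key appends it to the end; iteration follows insertion order (Python 3.7+)
params["sort"] = "totalpageviews"
keys_in_order = list(params)

print(sort_field)     # no sort set
print(has_size)       # True
print(keys_in_order)  # ['q', 'size', 'apikey', 'sort']
```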

## Making a Request

The request syntax is so simple, you might have missed it. Let's query again for objects with only one pageview, and take a closer look.

In [12]:
R = requests.get(url,params=parameters)

### Formatted parameters

That call has returned a `Response` object, which contains not only the data that we get from the Harvard Art Museums, but also information on the request we sent, like the URL that it used. Notice that `requests` has turned our query parameter dictionary into a query string at the end of our URL.

If you've been working with API requests or web scraping before, you might be used to seeing URLs get constructed like this:

```python
url = "https://api.harvardartmuseums.org/object?q=" + query + "&apikey=" + apikey
```

If you have, I'm sure you'll appreciate how much simpler this is, especially when dealing with more query parameters.

In [13]:
R.url

'https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&sort=random%3A8675309'
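
To see roughly what `requests` does with the `params` dictionary, you can build a similar query string yourself with the standard library's `urllib.parse.urlencode`. This is only an illustration of the encoding (note how `:` becomes `%3A`), not a description of the internals of `requests`, and the API key is left out on purpose.

```python
from urllib.parse import urlencode

base_url = "https://api.harvardartmuseums.org/object"
query = {"q": "totalpageviews:1", "size": 10}

# urlencode percent-encodes reserved characters and joins the pairs with "&"
full_url = f"{base_url}?{urlencode(query)}"
print(full_url)
# https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10
```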

### Taking a look at the results

`Response` objects have a built-in method, `.json()`, which parses the JSON text in the response body into native Python data structures, like lists, dictionaries, numbers and strings. We can use this method to see a dictionary representation of what we've gotten from the API request.

In [21]:
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&sort=random%3A8675309&page=2',
  'page': 1,
  'pages': 2352,
  'totalrecords': 23512,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Gift',
   'accessionyear': None,
   'accesslevel': 1,
   'century': '18th century',
   'classification': 'Prints',
   'classificationid': 23,
   'colorcount': 7,
   'colors': [{'color': '#7d7d64',
     'css3': '#808080',
     'hue': 'Yellow',
     'percent': 0.18415204678363,
     'spectrum': '#6cbd45'},
    {'color': '#e1e1c8',
     'css3': '#dcdcdc',
     'hue': 'Green',
     'percent': 0.18391812865497,
     'spectrum': '#e9715f'},
    {'color': '#64644b',
     'css3': '#696969',
     'hue': 'Green',
     'percent': 0.17479532163743,
     'spectrum': '#59ba4a'},
    {'color': '#96967d',
     'css3': '#808080',
     'hue': 'Green',
     'percent': 0.16625730994152,
     'spectrum': '#8e5ea7'},
    {'color

In [23]:
print(R.status_code)

200
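
A status code of 200 means the request succeeded. In a script, rather than printing the code and checking it by eye, you might wrap the check in a small helper. The sketch below assumes a `requests`-style response object (one with `raise_for_status()` and `.json()`); the name `records_from` is invented for this example.

```python
def records_from(response):
    """Return the 'records' list from a successful API response.

    `response` is expected to behave like a requests.Response.
    """
    # raise_for_status() raises an exception for 4xx and 5xx status codes
    response.raise_for_status()
    payload = response.json()
    # Fall back to an empty list if the payload has no 'records' key
    return payload.get("records", [])
```

For example, `records_from(requests.get(url, params=parameters))` would return the list of record dictionaries, or raise an error if the server rejected the request.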


In [34]:
R.json()['info']

{'next': 'https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&sort=random%3A8675309&page=2',
 'page': 1,
 'pages': 2352,
 'totalrecords': 23512,
 'totalrecordsperquery': 10}
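
The `info` block is what makes paging possible: `pages` is `totalrecords` divided by the page size, rounded up, and `next` holds the URL of the following page. A hedged sketch of walking those links is below. `expected_pages`, `iter_records`, and the `fetch_json` argument are hypothetical names; `fetch_json` is any function that takes a URL and returns the parsed JSON, and the code assumes the last page simply has no `next` entry.

```python
import math

def expected_pages(total_records, size):
    """Number of pages needed to hold total_records at `size` records per page."""
    return math.ceil(total_records / size)

def iter_records(first_page, fetch_json, max_pages=None):
    """Yield records from first_page, then follow info['next'] links.

    fetch_json(url) must return the parsed JSON for that URL.
    """
    page = first_page
    pages_seen = 0
    while True:
        yield from page.get("records", [])
        pages_seen += 1
        next_url = page.get("info", {}).get("next")
        # Stop when there is no next page or the optional page limit is reached
        if not next_url or (max_pages is not None and pages_seen >= max_pages):
            break
        page = fetch_json(next_url)

print(expected_pages(23512, 10))  # 2352, matching 'pages' above
```

With `requests`, `fetch_json` could be `lambda u: requests.get(u).json()`. Passing `max_pages` keeps an experiment from downloading thousands of pages by accident.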

In [32]:
set(R.json()['records'][0].keys())

{'accessionmethod',
 'accessionyear',
 'accesslevel',
 'century',
 'classification',
 'classificationid',
 'colorcount',
 'colors',
 'commentary',
 'contact',
 'contextualtextcount',
 'copyright',
 'creditline',
 'culture',
 'datebegin',
 'dated',
 'dateend',
 'dateoffirstpageview',
 'dateoflastpageview',
 'department',
 'description',
 'dimensions',
 'division',
 'edition',
 'exhibitioncount',
 'groupcount',
 'id',
 'imagecount',
 'imagepermissionlevel',
 'images',
 'labeltext',
 'lastupdate',
 'markscount',
 'mediacount',
 'medium',
 'objectid',
 'objectnumber',
 'people',
 'peoplecount',
 'period',
 'periodid',
 'primaryimageurl',
 'provenance',
 'publicationcount',
 'rank',
 'relatedcount',
 'seeAlso',
 'signed',
 'standardreferencenumber',
 'state',
 'style',
 'technique',
 'techniqueid',
 'title',
 'titlescount',
 'totalpageviews',
 'totaluniquepageviews',
 'url',
 'verificationlevel',
 'verificationleveldescription',
 'worktypes'}

## Changing our request

Let's say we're not interested in the most obscure parts of the collection, but rather in the most popular parts of the collection. There are a few ways we might go about doing this. One way might be to sort our search results by `totalpageviews`, and see what the top 10 are.

To do that, we can go back to the [Object API documentation](https://github.com/harvardartmuseums/api-docs/blob/master/sections/object.md) and look for hints about what we might be able to do.

In [35]:
parameters = {
    "apikey":APIKEY,
    "sort":"totalpageviews",
    "sortorder":"desc",
    "size":10
} # Add your own query parameters here
R = requests.get(url,params=parameters)
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?apikey=622b6240-e14a-11e8-9234-6f5a1db33697&sort=totalpageviews&sortorder=desc&size=10&page=2',
  'page': 1,
  'pages': 23361,
  'totalrecords': 233609,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Bequest',
   'accessionyear': 1951,
   'accesslevel': 1,
   'century': '19th century',
   'classification': 'Paintings',
   'classificationid': 26,
   'colorcount': 10,
   'colors': [{'color': '#64af7d',
     'css3': '#5f9ea0',
     'hue': 'Green',
     'percent': 0.2979781420765,
     'spectrum': '#4fb94f'},
    {'color': '#64c896',
     'css3': '#66cdaa',
     'hue': 'Green',
     'percent': 0.21289617486339,
     'spectrum': '#47b853'},
    {'color': '#323219',
     'css3': '#2f4f4f',
     'hue': 'Brown',
     'percent': 0.19814207650273,
     'spectrum': '#3db657'},
    {'color': '#7d7d4b',
     'css3': '#696969',
     'hue': 'Green',
     'percent': 0.056775956284153,
     'spectrum': '#6cbd45'},
    {'color

### Looking at the results

Often, you'll want to look at some specific aspect of the data you're getting. Since the API returns everything, you'll have to format the output in some friendly, readable format.

We're being pretty low-level with the text formatting here. One key to understanding this bit is that `"\t"` is an escape sequence that means "tab": the backslash tells Python to insert a tab character rather than a literal letter t.

Feel free to play around with this cell to format it more to your liking. The string `format()` method allows you to interpolate variables or expressions into a string. You use curly braces (`{}`) in the string where you'd like to substitute variables; you can also use named arguments (`"Test {foo}".format(foo="bar")` evaluates to "Test bar"). F-strings (`f"..."`, available since Python 3.6) use the same curly-brace syntax but evaluate the expressions inline. In this next cell, we'll iterate through the results and print them out in a nicer format.

In [36]:
records = R.json()['records']
print("views\tartwork")
print()
for record in records:
    # An f-string fills each {} with the value of the expression inside it,
    # converting the integer view count to text for us
    print(f"{record['totalpageviews']}\t{record['title']}")
    # Format strings can do a lot more than that, with more advanced documentation here:
    # https://docs.python.org/3.4/library/string.html#id1

views	artwork

30662	Self-Portrait Dedicated to Paul Gauguin
22823	The Gare Saint-Lazare: Arrival of a Train
16562	Bahram Gur Fights the Horned Wolf (painting, verso; text, recto), illustrated folio from a manuscript of the Great Ilkhanid Shahnama (Book of Kings)
14965	Odalisque, Slave, and Eunuch
13478	Jeanne-Antoinette Poisson, Marquise de Pompadour
13061	A Mother and Child and Four Studies of Her Right Hand, 1904; verso:  Self-Portrait Standing, 1903
11909	Red Boats, Argenteuil
10624	Court of Gayumars (painting, recto; text, verso), folio from a manuscript of the Shahnama by Firdawsi
8781	Light Prop for an Electric Stage (Light-Space Modulator)
8770	Self-Portrait in Tuxedo
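
As a small illustration of these formatting options, the sketch below formats two of the rows in three ways: with `str.format()`, with an f-string, and with a width specifier that right-aligns the numbers. The `rows` list just copies two lines of the output above.

```python
rows = [
    (30662, "Self-Portrait Dedicated to Paul Gauguin"),
    (8770, "Self-Portrait in Tuxedo"),
]

for views, title in rows:
    with_format = "{}\t{}".format(views, title)  # positional {} placeholders
    with_fstring = f"{views}\t{title}"           # same result, expressions inline
    aligned = f"{views:>6}  {title}"             # right-align the number in 6 columns
    assert with_format == with_fstring
    print(aligned)
```

The aligned version avoids the ragged columns that tabs produce when the numbers have different lengths.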


The top result from this query is a Van Gogh painting titled "Self-Portrait Dedicated to Paul Gauguin." You can grab just the first object by accessing the records list (which is indexed from 0):

In [38]:
topResult = R.json()['records'][0]
topResult

{'accessionmethod': 'Bequest',
 'accessionyear': 1951,
 'accesslevel': 1,
 'century': '19th century',
 'classification': 'Paintings',
 'classificationid': 26,
 'colorcount': 10,
 'colors': [{'color': '#64af7d',
   'css3': '#5f9ea0',
   'hue': 'Green',
   'percent': 0.2979781420765,
   'spectrum': '#4fb94f'},
  {'color': '#64c896',
   'css3': '#66cdaa',
   'hue': 'Green',
   'percent': 0.21289617486339,
   'spectrum': '#47b853'},
  {'color': '#323219',
   'css3': '#2f4f4f',
   'hue': 'Brown',
   'percent': 0.19814207650273,
   'spectrum': '#3db657'},
  {'color': '#7d7d4b',
   'css3': '#696969',
   'hue': 'Green',
   'percent': 0.056775956284153,
   'spectrum': '#6cbd45'},
  {'color': '#969664',
   'css3': '#808080',
   'hue': 'Green',
   'percent': 0.043715846994536,
   'spectrum': '#84c441'},
  {'color': '#afaf7d',
   'css3': '#bdb76b',
   'hue': 'Green',
   'percent': 0.042622950819672,
   'spectrum': '#9ecb3b'},
  {'color': '#4b9664',
   'css3': '#2e8b57',
   'hue': 'Green',
   'perc

You can easily access properties from the object record:

In [40]:
topResult['verificationlevel']

4

### Exercise
- Try using the `person` endpoint to search for information about Van Gogh. Get his `id` number.
- Try displaying all HAM works by Van Gogh using that `id`. Filter your results to only include records with an image associated.

In [47]:
url

'https://api.harvardartmuseums.org/object'

In [49]:
url = "https://api.harvardartmuseums.org/person"
parameters = {
    "apikey":APIKEY,
    "q":"displayname:'van AND Gogh' AND gender:male"
} # Add your own query parameters here
R = requests.get(url,params=parameters)
R.json()['records'][0]['id']

22730

In [52]:
url = "https://api.harvardartmuseums.org/object"
parameters = {
    "apikey":APIKEY,
    "person":"22730",
    'size':100
} # Add your own query parameters here
R = requests.get(url,params=parameters)
R.json()

{'info': {'page': 1,
  'pages': 1,
  'totalrecords': 24,
  'totalrecordsperquery': 100},
 'records': [{'accessionmethod': 'Bequest',
   'accessionyear': 1934,
   'accesslevel': 1,
   'century': '19th century',
   'classification': 'Paintings',
   'classificationid': 26,
   'colorcount': 8,
   'colors': [{'color': '#967d64',
     'css3': '#808080',
     'hue': 'Brown',
     'percent': 0.33080459770115,
     'spectrum': '#b65590'},
    {'color': '#7d644b',
     'css3': '#696969',
     'hue': 'Yellow',
     'percent': 0.1932183908046,
     'spectrum': '#b25593'},
    {'color': '#4b4b32',
     'css3': '#556b2f',
     'hue': 'Green',
     'percent': 0.15729885057471,
     'spectrum': '#4ab851'},
    {'color': '#644b4b',
     'css3': '#696969',
     'hue': 'Brown',
     'percent': 0.14574712643678,
     'spectrum': '#8362aa'},
    {'color': '#af9664',
     'css3': '#bdb76b',
     'hue': 'Yellow',
     'percent': 0.085574712643678,
     'spectrum': '#e9715f'},
    {'color': '#c8af7d',
     'c
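
The query above returns every Van Gogh record, while the exercise also asks for only those with an image. One way to do that, sketched here, is to filter the parsed records on the client side using the `imagecount` field that appears in each object record. The helper name `with_images` and the sample records are illustrative only.

```python
def with_images(records):
    """Keep only records that report at least one associated image."""
    # `or 0` treats a missing or None count as "no image"
    return [r for r in records if (r.get("imagecount") or 0) > 0]

# Illustrative stand-ins for API records
sample_records = [
    {"title": "Has an image", "imagecount": 2},
    {"title": "No image", "imagecount": 0},
    {"title": "Count missing"},
]
print([r["title"] for r in with_images(sample_records)])  # ['Has an image']
```

On the live response, this would be called as `with_images(R.json()['records'])`.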

## More endpoints to love

That exercise just brought in the `person` endpoint, but you'll notice that there are in fact many endpoints that we may want to query.

A note on terminology: an API endpoint is one place that you can go to ask specific questions about a certain part of a dataset or service. Many APIs, especially commercial APIs, contain many, many endpoints, to facilitate all sorts of different activity on a platform.

For example, you can take a look at the [reddit API documentation](https://www.reddit.com/dev/api/) (which we won't be using; this is just an example), to see all of the different endpoints that an application might need to serve as an alternative front end for reddit.

Endpoints on the same API are likely to behave similarly, but they will all serve different purposes. Looking at our HAM endpoints, it looks like they all follow the same basic formulation: `https://api.harvardartmuseums.org/RESOURCE_TYPE`. We can use this to our advantage, and create a function to query any endpoint easily.

In [53]:
def ham_query(apikey, endpoint, **kwargs):
    """Sends kwargs to the specified endpoint, using apikey for authentication"""
    params = kwargs
    params['apikey'] = apikey
    url = f"https://api.harvardartmuseums.org/{endpoint}"
    R = requests.get(url,params=params)
    return R

In [54]:
response = ham_query(APIKEY, "gallery", floor=2)

In [55]:
response.json()

{'info': {'next': 'https://api.harvardartmuseums.org/gallery?floor=2&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&page=2',
  'page': 1,
  'pages': 3,
  'totalrecords': 24,
  'totalrecordsperquery': 10},
 'records': [{'floor': 2,
   'galleryid': 2340,
   'gallerynumber': '2340',
   'id': 2340,
   'labeltext': 'For centuries, silver was among the most revered and highly valued materials in Britain. Shipped across the Atlantic from the vast mines of Central America, silver was used to craft a variety of sacred and secular objects, from cups and salvers to medals and coins. Between 1600 and 1850, silver objects also figured prominently in a wide range of religious and social rituals, from the communion service to the taking of tea.\r\n\r\n\r\nThis gallery displays pieces of silver crafted for four of these rituals, each of which required specific forms and styles of ornamentation. Ritual objects from the religious altar occupy the left side of the cabinet. The center section presents elabor

### Boy, that's convenient!

That function works because Python has this neat ability to take arbitrary arguments in functions, if you tell it to. Essentially, there are two special parameter forms in function definitions: `*args` and `**kwargs`. These make available `args` and `kwargs` objects, respectively, in your function. `args` is a tuple of the extra positional arguments, and `kwargs` is a dictionary of the extra keyword arguments. This means you don't have to specify every argument your function can take; you can just give it general rules for sequences or key-value pairs of data as input.

You might be wondering why you wouldn't just use a dictionary or list instead of those arguments. In our case, it's mostly a stylistic choice, and one that saves us a couple of keystrokes.
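
To make the `*args` / `**kwargs` behavior concrete, here is a throwaway sketch. The function `describe` exists only for this illustration; it returns whatever extra arguments Python collected, so you can inspect them.

```python
def describe(*args, **kwargs):
    """Return the extra positional and keyword arguments exactly as Python collects them."""
    return args, kwargs

positional, keyword = describe("object", 2, floor=2, size=100)
print(positional)  # ('object', 2)
print(keyword)     # {'floor': 2, 'size': 100}

# The reverse also works: ** unpacks an existing dictionary into keyword arguments
filters = {"floor": 2, "size": 100}
assert describe(**filters) == ((), filters)
```

This is why `ham_query(APIKEY, "gallery", **filters)` would behave the same as `ham_query(APIKEY, "gallery", floor=2, size=100)`.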

#### Try out some other endpoints!

In [56]:
# Here's an example: all of the current exhibits with their begin and end dates
response = ham_query(APIKEY, "exhibition", status="current", size=100, sort="chronological")
current_exhibits = response.json()['records']
current_exhibits
print()
for exhibit in current_exhibits:
    print(f"({exhibit['begindate']} to {exhibit['enddate']}) {exhibit['title']}")


(2016-09-07 to 2020-09-07) Davis Museum Permanent Gallery reinstallation
(2018-11-17 to 2021-11-14) Clay—Modeling African Design
(2019-03-09 to 2021-01-01) Charlotte Posenenske: A Retrospective
(2019-05-24 to 2020-01-26) Gauguin: Portraits
(2019-05-26 to 2020-01-12) Manet and Modern Beauty
(2019-07-13 to 2020-02-23) Hyman Bloom: A Matter of Life and Death
(2019-08-31 to 2019-12-01) Through a Glass, Darkly: Allegory and Faith in Netherlandish Prints from Lucas van Leyden to Rembrandt
(2019-08-31 to 2020-01-05) Woven Interiors: Early Medieval Textiles of the Eastern Mediterranean
(2019-08-31 to 2020-01-05) Winslow Homer: Eyewitness
(2019-08-31 to 2020-01-05) Early Christian Africa: Arts of Transformation
(2019-08-31 to 2020-01-05) Critical Printing
(2019-09-04 to 2019-12-14) Dharma and Pūnya: Buddhist Ritual Art of Nepal
(2019-09-05 to 2019-12-08) Unto This Last: Two Hundred Years of John Ruskin
(2019-09-05 to 2020-01-12) In a Cloud, in a Wall, in a Chair: Modernists in Mexico at Midcen

### Endpoint Exercises

[HAM API Documentation](https://github.com/harvardartmuseums/api-docs)

- Get a list of all the medium types in the museum
- How many levels of mediums are there?
- Choose a medium from the most specific (highest numerical value) level. Note the medium id. Create a new query to the object endpoint, using the medium id as a filter. How many objects are there created from this medium?
- Choose a level 2 medium instead. Create a new query to the object endpoint, again filtering by this medium id. Print out the medium types of the returned records. What do you notice about the types?
- BONUS: reorganize your list of medium objects into a nested structure so that child media are accessed through a list under their parents. Print out the list like so:
- Metal
    - Pb
    - potin
    - ...
    - copper alloy 
        - copper-antimony-arsenic alloy
        - copper-iron alloy
        - copper-tin-antimony-arsenic alloy
        - leaded copper-tin-antimony alloy
        - etc ...

In [68]:
# How many levels of mediums are there?
mediums = ham_query(APIKEY, 'medium', size=100)
levels = [r['level'] for r in mediums.json()['records']]
set(levels)

{1, 2, 3}

In [69]:
l3s = ham_query(APIKEY,'medium',size=100,q="level:3").json()

In [70]:
l3s

{'info': {'page': 1,
  'pages': 1,
  'totalrecords': 18,
  'totalrecordsperquery': 100},
 'records': [{'haschildren': 0,
   'id': 2040159,
   'lastupdate': '2019-11-06T05:25:12-0500',
   'level': 3,
   'mediumid': 2040159,
   'name': 'leaded arsenical copper',
   'objectcount': 1,
   'parentmediumid': 2040148,
   'pathforward': 'Metal\\copper alloy\\'},
  {'haschildren': 0,
   'id': 2040162,
   'lastupdate': '2019-11-06T05:25:12-0500',
   'level': 3,
   'mediumid': 2040162,
   'name': 'leaded copper-antimony-arsenic alloy',
   'objectcount': 1,
   'parentmediumid': 2040148,
   'pathforward': 'Metal\\copper alloy\\'},
  {'haschildren': 0,
   'id': 2040153,
   'lastupdate': '2019-11-06T05:25:12-0500',
   'level': 3,
   'mediumid': 2040153,
   'name': 'copper-antimony-arsenic alloy',
   'objectcount': 1,
   'parentmediumid': 2040148,
   'pathforward': 'Metal\\copper alloy\\'},
  {'haschildren': 0,
   'id': 2040154,
   'lastupdate': '2019-11-06T05:25:12-0500',
   'level': 3,
   'mediumid':

In [71]:
l3_id = l3s['records'][0]['id']
l3_id

2040159

In [72]:
ham_query(APIKEY,'object',medium=l3_id).json()

{'info': {'page': 1,
  'pages': 1,
  'totalrecords': 1,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Gift',
   'accessionyear': 1968,
   'accesslevel': 1,
   'century': '2nd millennium BCE',
   'classification': 'Weapons and Ammunition',
   'classificationid': 155,
   'colorcount': 0,
   'commentary': None,
   'contact': 'am_asianmediterranean@harvard.edu',
   'contextualtextcount': 1,
   'copyright': None,
   'creditline': 'Harvard Art Museums/Arthur M. Sackler Museum, Gift of Richard R. Wagner',
   'culture': 'Levantine',
   'datebegin': -1900,
   'dated': '19th-18th century BCE',
   'dateend': -1700,
   'dateoffirstpageview': '2010-03-03',
   'dateoflastpageview': '2019-10-29',
   'department': 'Department of Ancient and Byzantine Art & Numismatics',
   'description': None,
   'details': {'technical': [{'formattedtext': '<P><SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: \'Times New Roman\',\'serif\'; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-lat
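
For the BONUS item in the exercise list, one possible approach is to index every medium by its `id` and then attach each record to the record named by its `parentmediumid`. The sketch below runs on a hand-made sample rather than live data. The field names come from the medium records shown above, but the id used for "Metal" is hypothetical, and the assumption that a top-level medium has a `parentmediumid` matching no other record is a guess you would need to check against the real response.

```python
def build_tree(records):
    """Nest medium records under their parents; return the list of top-level nodes."""
    nodes = {r["id"]: {"name": r["name"], "children": []} for r in records}
    roots = []
    for r in records:
        parent = nodes.get(r.get("parentmediumid"))
        if parent is None:
            roots.append(nodes[r["id"]])  # no known parent: treat as top level
        else:
            parent["children"].append(nodes[r["id"]])
    return roots

def tree_lines(nodes, depth=0):
    """Return one indented '- name' line per node, depth first."""
    lines = []
    for node in nodes:
        lines.append("    " * depth + "- " + node["name"])
        lines.extend(tree_lines(node["children"], depth + 1))
    return lines

# Hand-made sample; the id 1 for "Metal" is hypothetical
sample = [
    {"id": 1, "name": "Metal", "parentmediumid": None},
    {"id": 2040148, "name": "copper alloy", "parentmediumid": 1},
    {"id": 2040159, "name": "leaded arsenical copper", "parentmediumid": 2040148},
]
print("\n".join(tree_lines(build_tree(sample))))
```

Because the lookup table is built before any linking happens, the records can arrive in any order.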

## Individual Objects and IIIF

The HAM object API can provide more information (such as `exhibition`, `citation`, `publication`, and `marks`) if you ask for a specific object by its objectid. For some records that have been extensively annotated (often those with `verificationlevel` == 4) the lists for these properties can contain hundreds of entries.

In [73]:
def ham_query_with_id(apikey, endpoint, ID, **kwargs):
    """Sends kwargs to the specified endpoint, using apikey for authentication. ID is a required arg (appears in the route)"""
    params = kwargs
    params['apikey'] = apikey
    url = "https://api.harvardartmuseums.org/{}/{}".format(endpoint,ID)
    R = requests.get(url,params=params)
    return R

In [74]:
objectid = topResult['objectid']
topResultFull = ham_query_with_id(APIKEY, "object", objectid)

print(topResultFull.url)
print("Verification Level 4: {}".format(topResultFull.json()['verificationlevel'] == 4))
print()
print(topResultFull.json())

https://api.harvardartmuseums.org/object/299843?apikey=622b6240-e14a-11e8-9234-6f5a1db33697
Verification Level 4: True

{'objectid': 299843, 'objectnumber': '1951.65', 'accessionyear': 1951, 'dated': '1888', 'datebegin': 1888, 'dateend': 1888, 'classification': 'Paintings', 'classificationid': 26, 'medium': 'Oil on canvas', 'technique': None, 'techniqueid': None, 'period': None, 'periodid': None, 'century': '19th century', 'culture': 'Dutch', 'style': None, 'signed': 'red paint, l.r.: Vincent/Arles', 'state': None, 'edition': None, 'standardreferencenumber': None, 'dimensions': '61.5 x 50.3 cm (24 3/16 x 19 13/16 in.)\r\nframed: 90.4 x 79.7 x 8.3 cm (35 9/16 x 31 3/8 x 3 1/4 in.)', 'copyright': None, 'creditline': 'Harvard Art Museums/Fogg Museum, Bequest from the Collection of Maurice Wertheim, Class of 1906', 'department': 'Department of Paintings, Sculpture & Decorative Arts', 'division': 'European and American Art', 'contact': 'am_europeanamerican@harvard.edu', 'description': None,

When we printed the 10 most popular records above (under **Looking at the results**), you may have noticed a sharp dropoff after the first few records. Our Van Gogh painting is particularly popular, with ~8000 more views than the second most popular record and about 3.5x as many as the tenth most popular. This particular Art Museum record is used as the default image asset for the demo installation of [Project Mirador](http://projectmirador.org/demo/), an image viewer for [IIIF (International Image Interoperability Framework)](https://iiif.io/) media assets.

We're not going to go deep into IIIF in this workshop, but want to mention that IIIF is both a community of developers and a collection of APIs and API-compliant tools that you can use to share, manipulate, and display visual materials. The [Image API](https://iiif.io/api/image/2.1/) and [Presentation API](https://iiif.io/api/presentation/2.1/) are the most used outputs as of now, though there are also APIs for Authentication, Search, and beta versions for other media (video and VR).

### IIIF Image API

Within our `topResultFull` object, there is an images list, which contains IIIF baseurls as well as Image Delivery Service URLs:

In [75]:
ham_images = topResultFull.json()['images']
ham_images

[{'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC251942_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 1,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 47174896,
  'iiifbaseuri': 'https://ids.lib.harvard.edu/ids/iiif/47174896',
  'imageid': 429030,
  'publiccaption': None,
  'renditionnumber': 'DDC251942',
  'width': 2087},
 {'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC000072_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 2,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 18737483,
  'iiifbaseuri': 'https://ids.lib.harvard.edu/ids/iiif/18737483',
  'imageid': 185978,
  'publiccaption': None,
  'renditionnumber': 'DDC000072',
  'width': 2088},
 {'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC251934_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 3,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 47174892,
  'iiifbaseuri': 'htt

This particular record has 6 images associated with it. Try copying and pasting some of the `baseimageurl`s in your browser:

In [76]:
for index, image in enumerate(ham_images, start=1):
    print(f"image {index} baseimageurl: {image['baseimageurl']}\nimage {index} iiifbaseuri: {image['iiifbaseuri']}\n")

image 1 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC251942_dynmc
image 1 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/47174896

image 2 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC000072_dynmc
image 2 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/18737483

image 3 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC251934_dynmc
image 3 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/47174892

image 4 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:30033_dynmc
image 4 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43182083

image 5 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:50493_dynmc
image 5 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43183405

image 6 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:50849_dynmc
image 6 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43183422



You'll notice that the `baseimageurl`s use Harvard's Name Resolution service, which redirects to an Image Delivery Service URL that displays the image. This is nice, but we're more interested in the `iiifbaseuri`s because we can manipulate IIIF resources using the Image API. Try opening one of those. What happens?

The IIIF Image API spec requires that we pass not just a baseurl, but a well-formed IIIF-compliant URI to get an image. Let's check out that [documentation](https://iiif.io/api/image/2.1/) and see what else we need to construct one of those.

From the docs:

>The IIIF Image API URI for requesting an image must conform to the following URI Template:
>
>`{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`
>
>For example:
>
>`http://www.example.org/image-service/abcd1234/full/full/0/default.jpg`
>
>The parameters of the Image Request URI include region, size, rotation, quality and format, which define the characteristics of the returned image. These are described in detail in Image Request Parameters.

The `iiifbaseuri`s include up through the `{identifier}`, but we need to include additional parameters to get the server to actually render the image for us. These parameters are passed within the URI itself, rather than in a query string appended after a delimiter (usually `?`), which is what we've been using `requests` to do. Let's write a function that can generate IIIF URIs for us. Because all of the parameters we want to insert are required, we won't use `**kwargs` - instead we'll set default params which you can override by passing in new ones.

In [77]:
def iiif_query(baseuri, region="full", size="full", rotation=0, quality="default", format="jpg", info=False):
    """Creates a valid IIIF URL, with the option to request image information"""
    # Make sure the base URI ends with a slash before appending path segments
    if not baseuri.endswith("/"):
        baseuri += "/"
    if info:
        return baseuri + "info.json"
    return baseuri + "{}/{}/{}/{}.{}".format(region, size, rotation, quality, format)

Now let's try using this function to display the images and links within our notebook. We'll need to import a few more helpers to do this: `display`, `Image`, and `HTML`, all from `IPython.display`.

In [78]:
from IPython.display import display, Image, HTML
for img in ham_images:
    image_url = iiif_query(img['iiifbaseuri'])
    display(HTML("<a href='{}'>{}</a>".format(image_url,image_url)))
    display(Image(url=image_url, height=200, width=200))

Now we have some valid image URLs! We've displayed the content of those URLs here directly using Jupyter's display and image libraries, but you can also open them in your browser directly!

This is nice, but the Image API lets us do a lot more just by passing in some parameters. Maybe we want to generate some square, grayscale images for a gallery:

In [79]:
for img in ham_images:
    image_url = iiif_query(img['iiifbaseuri'], quality="gray", region="square")
    display(HTML("<a href='{}'>{}</a>".format(image_url,image_url)))
    display(Image(url=image_url, height=200, width=200))

### IIIF Image API Exercise
Let's try requesting only the right half of an image (using `region`), in black and white, and getting back a PNG:

In [87]:
# Write your code here
for img in ham_images:
    image_url = iiif_query(img['iiifbaseuri'], quality="bitonal", region="pct:50,0,50,100", format="png")
    display(HTML("<a href='{}'>{}</a>".format(image_url,image_url)))
    display(Image(url=image_url, height=200, width=200))

Feel free to try to manipulate the images in other ways as well! That's it for our quick introduction to the Image API.
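
The `pct:` region used in the exercise follows the IIIF Image API pattern `pct:x,y,w,h`, where every value is a percentage of the full image. As a minimal sketch, a small helper can build and sanity-check that string. Note that `pct_region` is a hypothetical name of our own, not part of any library:

```python
def pct_region(x, y, w, h):
    """Build an IIIF `pct:x,y,w,h` region string (hypothetical helper).

    x and y locate the top-left corner; w and h are the width and height.
    All four values are percentages of the full image.
    """
    for name, value in (("x", x), ("y", y), ("w", w), ("h", h)):
        if not 0 <= value <= 100:
            raise ValueError("{} must be between 0 and 100, got {}".format(name, value))
    if w == 0 or h == 0:
        raise ValueError("w and h must be greater than 0")
    # :g drops trailing zeros, so 50.0 becomes "50" and 12.5 stays "12.5"
    return "pct:{:g},{:g},{:g},{:g}".format(x, y, w, h)


right_half = pct_region(50, 0, 50, 100)   # start halfway across, keep 50% of the width
top_left = pct_region(0, 0, 50, 50)       # top-left quarter
print(right_half, top_left)
```

The resulting string can be passed straight to `iiif_query` as its `region` argument.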

### IIIF Presentation API

If you're interested in the Presentation API (for presenting structured IIIF resources as part of a more fully-functional web app), check out [this documentation](https://iiif.io/api/presentation/2.1/) to learn how IIIF manifests structure sequences of canvases which image viewers then present to end users. You can find a HAM object's manifest in the `seeAlso` field, or by appending the object ID to a base URL:

In [None]:
print(topResult['seeAlso'])
print('https://iiif.harvardartmuseums.org/manifests/object/{}'.format(topResult['id']))
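
To give a feel for that structure, here is a minimal sketch that walks a Presentation API 2.x manifest and lists each canvas label with the URL of its first image. The `sample_manifest` dictionary is a made-up, heavily trimmed stand-in, and we are assuming HAM's manifests follow the 2.x layout (`sequences`, then `canvases`, then `images`, then `resource`), so check a real manifest before relying on this:

```python
def list_canvases(manifest):
    """Return (label, image_url) pairs for every canvas in a IIIF 2.x manifest."""
    canvases = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            images = canvas.get("images", [])
            # Each entry in "images" is an annotation whose "resource" is the image itself
            image_url = images[0]["resource"]["@id"] if images else None
            canvases.append((canvas.get("label"), image_url))
    return canvases


# Made-up example data, trimmed to the keys used above
sample_manifest = {
    "@type": "sc:Manifest",
    "label": "Example object",
    "sequences": [{
        "canvases": [
            {"label": "1", "images": [{"resource": {"@id": "https://example.org/iiif/abc/full/full/0/default.jpg"}}]},
            {"label": "2", "images": []},
        ]
    }],
}

for label, image_url in list_canvases(sample_manifest):
    print(label, image_url)
```

With a real manifest you would first load it with something like `requests.get(manifest_url).json()` and pass the result to `list_canvases`.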

#### Mirador

You can consume these resources using [Mirador](http://projectmirador.org/), an image viewer which uses the IIIF Image and Presentation APIs. We used to use Mirador in a different version of this workshop which integrated Omeka, a content management system.

If you head to the [Project Mirador Demo](http://projectmirador.org/) page, you can add a new manifest in the top left ("four boxes icon" -> "Replace Object" -> "Add new object from URL"). Paste in your manifest URL there.

Example manifest URL for the second most viewed painting, "The Gare Saint-Lazare: Arrival of a Train": https://iiif.harvardartmuseums.org/manifests/object/228649

## More stuff!

So far, we've only been getting limited sets of object data. But what if there were a big query we wanted to make? Let's try it out on "Unidentified culture" materials in the museum.

In [88]:
unknown = ham_query(APIKEY, "object", culture="Unidentified culture", size=100).json()

Looking at our previous queries, it looks like we've got some information about our query in the "info" section. Let's take a look at that...

In [89]:
unknown['info']

{'next': 'https://api.harvardartmuseums.org/object?culture=Unidentified%20culture&size=100&apikey=622b6240-e14a-11e8-9234-6f5a1db33697&page=2',
 'page': 1,
 'pages': 7,
 'totalrecords': 607,
 'totalrecordsperquery': 100}

### Iterating through pages

It looks like we have 7 pages of data to get, and our response gives us a "next" url for easy iteration. Nice!

However, let's look at how we would iterate even without this convenience factor.

In [90]:
unknown.keys()

dict_keys(['info', 'records'])

It looks like we have two components to our response, info and records. Since `info` is request specific, we're just after `records`, and we'll want to combine them all. 

We could set this up in a regular loop, which would query the API as fast as it can. That can mean many queries per second, limited more by network speed than by processor speed. However, this can put a strain on the API endpoint, so it's good practice to build in a delay when making many requests. Sometimes an API will specify the number of requests per second that you're allowed to make, sometimes not. Putting even a fraction of a second of delay in your code will help make sure that you don't accidentally get yourself banned from the API.

In [91]:
import time

In [92]:
unknown_records = []
keepGoing = True
page = 1

while keepGoing:
    R = ham_query(APIKEY, "object", culture="Unidentified culture", size=100, page=page)
    time.sleep(0.5)  # pause between requests so we don't hammer the API
    response = R.json()
    unknown_records.extend(response['records'])
    # Stop after the last page; >= also covers an empty result set (0 pages)
    if page >= response['info']['pages']:
        keepGoing = False
    else:
        page += 1

In [93]:
len(unknown_records)

607
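
For comparison, here is a minimal sketch of the other approach: following the `next` URL in each response's `info` block until it disappears. `fetch_json` is a hypothetical stand-in for something like `lambda url: requests.get(url).json()`, and we are assuming the last page simply omits `next`. The fake pages below exist only so the sketch runs without network access:

```python
import time

def collect_all_records(first_url, fetch_json, delay=0.5):
    """Follow `info.next` links, gathering `records` from every page."""
    records = []
    url = first_url
    while url:
        response = fetch_json(url)
        records.extend(response["records"])
        # Assumption: the final page has no "next" key, which ends the loop
        url = response["info"].get("next")
        if url:
            time.sleep(delay)  # be polite between requests
    return records


# Fake two-page API used only to demonstrate the loop
fake_pages = {
    "page1": {"info": {"next": "page2"}, "records": [{"id": 1}, {"id": 2}]},
    "page2": {"info": {}, "records": [{"id": 3}]},
}
all_records = collect_all_records("page1", fake_pages.__getitem__, delay=0)
print(len(all_records))
```

The advantage of this design is that the loop never needs to know the page size or the query parameters; it only trusts whatever URL the API hands back.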

In [94]:
unknown_records[0]

{'accessionmethod': 'Gift',
 'accessionyear': 1952,
 'accesslevel': 1,
 'century': '19th century',
 'classification': 'Drawings',
 'classificationid': 21,
 'colorcount': 0,
 'commentary': None,
 'contact': 'am_europeanamerican@harvard.edu',
 'contextualtextcount': 0,
 'copyright': None,
 'creditline': 'Harvard Art Museums/Fogg Museum, Gift of Agnes Mongan',
 'culture': 'Unidentified culture',
 'datebegin': 0,
 'dated': '19th century',
 'dateend': 0,
 'dateoffirstpageview': '2009-08-30',
 'dateoflastpageview': '2015-12-13',
 'department': 'Department of Drawings',
 'description': None,
 'dimensions': 'actual: 10.7 x 10 cm (4 3/16 x 3 15/16 in.)',
 'division': 'European and American Art',
 'edition': None,
 'exhibitioncount': 0,
 'groupcount': 0,
 'id': 296822,
 'imagecount': 1,
 'imagepermissionlevel': 0,
 'images': [{'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:INV002983_dynmc',
   'copyright': 'President and Fellows of Harvard College',
   'displayorder': 1,
   'format': 'image

# Enhancing Data

We've now used Python and the Art Museum API to create a custom dataset. We've also played with the IIIF Image API to programmatically alter images. In this next section, we're going to provide you space to experiment and enhance your dataset by using a third API to join an additional data source. We're going to provide some directions for an exercise using GeoNames and Folium, a wrapper for the JavaScript mapping library Leaflet. But you're also free to find another API such as Wikidata, the Getty Union List of Artist Names, or the Google Vision API.

## Overview
- Pick a current exhibit to examine; find fields that can be geocoded
- Use the GeoNames API to geocode those fields
- Display the points as markers on a Leaflet map

## Working with HAM Exhibits
Let's start by picking one current exhibit to examine. We've already stored this data in `current_exhibits`.
- We'll want to pick an exhibit that has a number of people associated with it, so that we have multiple locations to georeference.
- You'll need to do a new query to the `object` endpoint for each exhibit to get its object records.
- Within each object record, people are stored in a `people` list (e.g. `record['people']`), and the number of people is stored in a `peoplecount` field (e.g. `record['peoplecount']`). Take a look at these and see what we could geocode.
- Let's sum `peoplecount` across each exhibit's objects to get a quick count of people associated with each exhibit. Sort the exhibits by this count, and print those along with the exhibit names and IDs.

In [None]:
# Write code here!

In [106]:
for e in current_exhibits:
    R = ham_query(APIKEY, 'object', exhibition=e['id'], size=100)
    time.sleep(0.5)
    print(e['title'], e['id'])
    print(sum([r['peoplecount'] for r in R.json()['records']]))
    print()

Davis Museum Permanent Gallery reinstallation 5172
0

Clay—Modeling African Design 5820
9

Charlotte Posenenske: A Retrospective 5593
2

Gauguin: Portraits 5471
1

Manet and Modern Beauty 5477
1

Hyman Bloom: A Matter of Life and Death 5777
4

Through a Glass, Darkly: Allegory and Faith in Netherlandish Prints from Lucas van Leyden to Rembrandt 5697
15

Woven Interiors: Early Medieval Textiles of the Eastern Mediterranean 5506
1

Winslow Homer: Eyewitness 5660
122

Early Christian Africa: Arts of Transformation 5924
5

Critical Printing 5925
26

Dharma and Pūnya: Buddhist Ritual Art of Nepal 5281
1

Unto This Last: Two Hundred Years of John Ruskin 5713
2

In a Cloud, in a Wall, in a Chair: Modernists in Mexico at Midcentury 5805
1

Crossing Lines, Constructing Home: Displacement and Belonging in Contemporary Art 5834
40

William Blake 5759
3

City of Dreams. Lyonel Feininger and His Villages 5559
5

Zarina: Atlas of Her World  5866
1

John Singer Sargent: Portraits in Charcoal 5652
1
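
The solution above prints the counts in the order the exhibits came back, without sorting them. As a minimal sketch of the sorting step, we can collect `(title, id, count)` tuples and sort on the count. The sample rows are copied from the output above, and `rank_exhibits` is a hypothetical helper name:

```python
def rank_exhibits(exhibit_counts, top=3):
    """Sort (title, exhibit_id, people_count) tuples by count, largest first."""
    ranked = sorted(exhibit_counts, key=lambda item: item[2], reverse=True)
    return ranked[:top]


# A few rows taken from the printed output above
exhibit_counts = [
    ("Hyman Bloom: A Matter of Life and Death", 5777, 4),
    ("Winslow Homer: Eyewitness", 5660, 122),
    ("Critical Printing", 5925, 26),
    ("Crossing Lines, Constructing Home: Displacement and Belonging in Contemporary Art", 5834, 40),
    ("William Blake", 5759, 3),
]

for title, exhibit_id, count in rank_exhibits(exhibit_counts):
    print(count, exhibit_id, title)
```

In the notebook, the tuples would be built inside the loop over `current_exhibits` instead of being typed in by hand.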



Pick two of the top three exhibits to further examine.
- Store all the objects in those exhibits
- Store the exhibition information (endpoint `exhibition`) as well

In [117]:
ham_query(APIKEY, 'object', exhibition=5660, size=100).json()['info']['pages']

1

In [118]:
eyewitness_objects = ham_query(APIKEY, 'object', exhibition=5660, size=100).json()['records']

In [119]:
len(eyewitness_objects)

56

In [116]:
eyewitness_exhibit = ham_query_with_id(APIKEY, 'exhibition', 5660).json()
eyewitness_exhibit

{'begindate': '2019-08-31',
 'color': None,
 'description': None,
 'enddate': '2020-01-05',
 'exhibitionid': 5660,
 'id': 5660,
 'images': [{'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:LEG270275',
   'caption': 'Winslow Homer, Prisoners from the Front, 1866. Oil on canvas. The Metropolitan Museum of Art, New York, Gift of Mrs. Frank B. Porter, 1922, 22.207, TL42108. Photo: © The Metropolitan Museum of Art. Image source: Art Resource, NY.',
   'copyright': 'Image copyright © The Metropolitn Museum of Art. Image source: Art Resource, NY',
   'displayorder': 1,
   'format': 'image/jpeg',
   'height': 1606,
   'idsid': 462607066,
   'iiifbaseuri': 'https://ids.lib.harvard.edu/ids/iiif/462607066',
   'imageid': 495715,
   'renditionnumber': 'LEG270275',
   'width': 2550}],
 'lastupdate': '2019-11-06T05:06:53-0500',
 'people': [{'displayname': 'Curated by Ethan Lasser',
   'displayorder': 1,
   'name': 'Ethan Lasser',
   'personid': 58267,
   'prefix': 'Curated by',
   'role': 'Curat

## GeoNames API

Let's start by checking out the JSON version of the [GeoNames API](http://www.geonames.org/export/JSON-webservices.html).

- Write a function that hits `searchJSON` and geocodes a placename. It should return the latitude and longitude (`lat` and `lng`) of the top result. You'll need to include your GeoNames username in the API calls.

In [136]:
USERNAME = 'cdc43339' # Enter your GeoNames username here
def search_place(placename):
    """Searches the GeoNames searchJSON endpoint for a placename, returning a latitude and longitude"""
    endpoint = "http://api.geonames.org/searchJSON"
    parameters = {
        "username": USERNAME,
        "q": placename,
        "maxRows": 1  # we only use the top result
    }
    R = requests.get(endpoint, params=parameters)
    results = R.json().get('geonames', [])
    if not results:
        # No match (or an error response without a 'geonames' key)
        return None
    top = results[0]
    return top['lat'], top['lng']

In [137]:
print(search_place("London, UK"))


('51.50853', '-0.12574')
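
Note that GeoNames returns `lat` and `lng` as strings, as the output above shows. Here is a minimal sketch of a parsing step that converts them to floats and copes with an empty result list. The helper name `top_coordinates` and the sample payloads are ours, for illustration only:

```python
def top_coordinates(payload):
    """Return (lat, lng) floats for the first GeoNames result, or None if there is none."""
    results = payload.get("geonames") or []
    if not results:
        return None
    top = results[0]
    return float(top["lat"]), float(top["lng"])


# Payloads shaped like a searchJSON response, trimmed to the fields we use
london = {"geonames": [{"name": "London", "lat": "51.50853", "lng": "-0.12574"}]}
nothing = {"geonames": []}

print(top_coordinates(london))
print(top_coordinates(nothing))
```

Numeric coordinates are what mapping libraries generally expect, so converting once at this point saves repeated conversions later.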


- Write another function which takes an exhibit's objects, gets all the people attached to them, and geocodes their birthplace and deathplace. The function should return a dictionary, keyed by object ID, of people objects that also have `birthplace_coordinates` and `deathplace_coordinates` attributes. Be warned that not every person will have a birthplace and/or deathplace.

In [146]:
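def geocode_exhibit_people_locations(exhibit):
    """Geocodes the birthplace and deathplace of every person attached to an exhibit's object records"""
    exhibit_people = {}
    for record in exhibit:
        # Skip records with no people; otherwise record_people would be
        # undefined or left over from the previous record
        if record['peoplecount'] == 0:
            continue
        record_people = record['people']
        for person in record_people:
            if person.get('birthplace') is not None:
                person['birthplace_coordinates'] = search_place(person['birthplace'])
            if person.get('deathplace') is not None:
                person['deathplace_coordinates'] = search_place(person['deathplace'])
        exhibit_people[record['id']] = record_people
    return exhibit_people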

In [148]:
eyewitness_people = geocode_exhibit_people_locations(eyewitness_objects)

In [151]:
print(eyewitness_people)

{256334: [{'alphasort': 'Homer, Winslow', 'birthplace': 'Boston, MA', 'name': 'Winslow Homer', 'prefix': 'After', 'personid': 26501, 'gender': 'male', 'role': 'Artist', 'displayorder': 1, 'culture': 'American', 'displaydate': '1836 - 1910', 'deathplace': 'Prouts Neck, ME', 'displayname': 'After Winslow Homer', 'birthplace_coordinates': ('42.35843', '-71.05977'), 'deathplace_coordinates': ('43.53342', '-70.31449')}, {'alphasort': 'Unidentified Artist', 'birthplace': None, 'name': 'Unidentified Artist', 'prefix': 'Engraved by', 'personid': 34147, 'gender': 'unknown', 'role': 'Artist', 'displayorder': 2, 'culture': None, 'displaydate': None, 'deathplace': None, 'displayname': 'Engraved by Unidentified Artist'}], 256471: [{'alphasort': 'Homer, Winslow', 'birthplace': 'Boston, MA', 'name': 'Winslow Homer', 'prefix': 'After', 'personid': 26501, 'gender': 'male', 'role': 'Artist', 'displayorder': 1, 'culture': 'American', 'displaydate': '1836 - 1910', 'deathplace': 'Prouts Neck, ME', 'display
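
Because the same artist appears on many objects (Winslow Homer's birthplace is geocoded again for every record above), it can be worth caching lookups so each distinct place name hits the API only once. Here is a minimal sketch using `functools.lru_cache`. `fake_search_place` is a stand-in we made up so the example runs offline; in the notebook you would wrap `search_place` instead:

```python
from functools import lru_cache

calls = []  # records every lookup that actually reaches the "API"

def fake_search_place(placename):
    """Offline stand-in for search_place, with coordinates copied from the output above."""
    calls.append(placename)
    known = {
        "Boston, MA": ("42.35843", "-71.05977"),
        "Prouts Neck, ME": ("43.53342", "-70.31449"),
    }
    return known.get(placename)

@lru_cache(maxsize=None)
def cached_search_place(placename):
    """Look up a place once, then reuse the stored answer on later calls."""
    return fake_search_place(placename)

for place in ["Boston, MA", "Prouts Neck, ME", "Boston, MA", "Boston, MA"]:
    cached_search_place(place)

print(len(calls))  # number of lookups that were not served from the cache
```

Fewer requests also means less waiting and less load on the GeoNames service.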

## Folium and Leaflet

We can use [Folium](https://python-visualization.github.io/folium/index.html) to incorporate [Leaflet](https://leafletjs.com/reference-1.5.0.html), an open-source JavaScript library for interactive web maps. Here are some more Folium examples: https://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/.
- Check out both the Folium library documentation and the Leaflet API documentation to understand how Leaflet maps are implemented.
- Install Folium if you haven't already: `conda install folium -c conda-forge` or `pip install folium`
- Import Folium and create a map object. Folium has a number of different basemap options and other possible configurations - check out the documentation to customize the look of your map.

In [139]:
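import folium
people_map = folium.Map(
    zoom_start=8,
    tiles='Stamen Toner'  # recent Folium releases may no longer bundle Stamen tiles; 'OpenStreetMap' is a safe fallback
)

# A test marker at latitude 0, longitude 0, with a red icon and a popup
mk = folium.Marker([0, 0])
pp = folium.Popup('hello')
ic = folium.Icon(color='red')

mk.add_child(ic)
mk.add_child(pp)
people_map.add_child(mk)

people_map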

- Create a function which can add people to the map as markers. The function should accept an object or list of people; a map, to which the markers will be added; and any other parameters you'd like.
- Label each marker with the person's name and the location.
- Bonus points for creating different colored markers for birth and death locations and for styling the label / making it more readable.

In [None]:
def add_people_to_map(people, leaflet_map):
    """Adds people as markers to a Leaflet map"""
    for painting_id in people:
        p = people[painting_id]
        for person in p:
            # your code here
            pass
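
One possible way to approach this exercise is to split it in two: first flatten the geocoded people into plain marker descriptions, then hand each one to Folium. The sketch below covers the first half in plain Python. `marker_specs`, the color choices, and the label format are our own assumptions, and the sample data is trimmed from the `eyewitness_people` output above. People without coordinates are skipped:

```python
def marker_specs(people):
    """Yield (lat, lng, label, color) for each geocoded birth/death place."""
    for object_id, persons in people.items():
        for person in persons:
            for kind, color in (("birthplace", "green"), ("deathplace", "red")):
                coords = person.get(kind + "_coordinates")
                if coords is None:
                    continue  # place missing or not geocoded
                lat, lng = coords
                label = "{} ({}: {})".format(person["name"], kind, person[kind])
                yield float(lat), float(lng), label, color


# Sample trimmed from the eyewitness_people output above
sample_people = {
    256334: [
        {"name": "Winslow Homer", "birthplace": "Boston, MA", "deathplace": "Prouts Neck, ME",
         "birthplace_coordinates": ("42.35843", "-71.05977"),
         "deathplace_coordinates": ("43.53342", "-70.31449")},
        {"name": "Unidentified Artist", "birthplace": None, "deathplace": None},
    ]
}

for lat, lng, label, color in marker_specs(sample_people):
    print(lat, lng, label, color)
    # With Folium installed, each spec could become a marker, e.g.:
    # folium.Marker([lat, lng], popup=label, icon=folium.Icon(color=color)).add_to(leaflet_map)
```

Keeping the data wrangling separate from the map calls makes it easier to check the coordinates and labels before anything is drawn.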

## Exporting

Now we have some cool data, but maybe we want to do something with it outside of Python. It's common to see CSV data traded around, since it's just a plain text spreadsheet file, so most things can parse it. Let's make one of those! We could use the relatively low-level `csv` library, but instead, let's use a higher-level library, `pandas`.

In [152]:
import pandas as pd # Common invocation of pandas. Gotta save those 4 keystrokes.

### "Be a dataframe!" - us

Pandas thinks of things in terms of dataframes, which will be familiar if you work in R. Basically, they're really efficient arrays of data. They also translate really well to a tabular format.

To make an iterable object into a dataframe, sometimes you can just get away with shouting "Hey you! Be a dataframe!" at it (in code). Since we have a list of dictionaries with consistent keys, there's a good chance this process will do something smart for us:

In [154]:
pd.DataFrame(unknown_records).head()

Unnamed: 0,accessionmethod,accessionyear,accesslevel,century,classification,classificationid,colorcount,colors,commentary,contact,...,technique,techniqueid,title,titlescount,totalpageviews,totaluniquepageviews,url,verificationlevel,verificationleveldescription,worktypes
0,Gift,1952.0,1,19th century,Drawings,21,0,,,am_europeanamerican@harvard.edu,...,,,Madonna and Child and Saints,1,4,3,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '125', 'worktype': 'drawing'}]"
1,Purchase,2004.0,1,19th century,Photographs,17,0,,,am_europeanamerican@harvard.edu,...,Albumen silver print,110.0,Untitled (album page with architectural studie...,1,0,0,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
2,Gift,1995.0,1,Unidentified century,Amulets,180,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Phallic Amulet with Snail-Shell Testicles,1,247,224,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '12', 'worktype': 'amulet'}]"
3,Gift,1969.0,1,Unidentified century,Sculpture,30,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Double Hand,1,4,3,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '317', 'worktype': 'sculpture'}]"
4,Gift,1995.0,1,Unidentified century,Amulets,180,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Phallic Amulet with Wings,1,189,160,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '12', 'worktype': 'amulet'}]"


What do you know! It worked. But let's take a look at a more hands-on approach to the same thing.

In [156]:
pd.DataFrame.from_dict(unknown_records).head()

Unnamed: 0,accessionmethod,accessionyear,accesslevel,century,classification,classificationid,colorcount,colors,commentary,contact,...,technique,techniqueid,title,titlescount,totalpageviews,totaluniquepageviews,url,verificationlevel,verificationleveldescription,worktypes
0,Gift,1952.0,1,19th century,Drawings,21,0,,,am_europeanamerican@harvard.edu,...,,,Madonna and Child and Saints,1,4,3,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '125', 'worktype': 'drawing'}]"
1,Purchase,2004.0,1,19th century,Photographs,17,0,,,am_europeanamerican@harvard.edu,...,Albumen silver print,110.0,Untitled (album page with architectural studie...,1,0,0,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
2,Gift,1995.0,1,Unidentified century,Amulets,180,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Phallic Amulet with Snail-Shell Testicles,1,247,224,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '12', 'worktype': 'amulet'}]"
3,Gift,1969.0,1,Unidentified century,Sculpture,30,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Double Hand,1,4,3,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '317', 'worktype': 'sculpture'}]"
4,Gift,1995.0,1,Unidentified century,Amulets,180,0,,,am_asianmediterranean@harvard.edu,...,"Cast, lost-wax process",1311.0,Phallic Amulet with Wings,1,189,160,https://www.harvardartmuseums.org/collections/...,4,"Best. Object is extensively researched, well d...","[{'worktypeid': '12', 'worktype': 'amulet'}]"


`pd.DataFrame.from_dict` gives you more control over the conversion process through arguments such as `orient` and `dtype`, so you can provide more options if things don't look how you expect them to.

As a side note, we do have some data structures in here that don't make a lot of sense in a tabular format. Look at `worktypes` at the very end. That's a list, and each cell has list data in it. We won't be able to do much with that in Excel or some other tabular data processing tool, but it also won't break anything for us. It just looks weird. Within the dataframe, they still work like lists though, so you can access the data while you're still in Python if you're clever about it.
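
If you would rather have something spreadsheet-friendly in that column, one option is to flatten each list to a string before building the dataframe. Here is a minimal sketch in plain Python. `flatten_worktypes` is a hypothetical helper, and the two sample records are trimmed from the output above:

```python
def flatten_worktypes(record, separator="; "):
    """Return a copy of a record with `worktypes` collapsed to a single string."""
    flat = dict(record)  # shallow copy so the original record is untouched
    worktypes = record.get("worktypes") or []
    flat["worktypes"] = separator.join(w["worktype"] for w in worktypes)
    return flat


sample_records = [
    {"title": "Double Hand", "worktypes": [{"worktypeid": "317", "worktype": "sculpture"}]},
    {"title": "Madonna and Child and Saints", "worktypes": [{"worktypeid": "125", "worktype": "drawing"}]},
]

flat_records = [flatten_worktypes(r) for r in sample_records]
print(flat_records[0]["worktypes"])
```

Passing `flat_records` to `pd.DataFrame` then gives a plain text `worktypes` column.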

### Exporting

From here, our export process is really easy. We just say "Hey you! Be a CSV file now!", and so it shall be.

In [157]:
df = pd.DataFrame(unknown_records)

In [158]:
df.to_csv("unknown_ham_records.csv", index=False)  # index=False leaves out the row-number column
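
For comparison, here is roughly what the same export looks like with the lower-level `csv` module mentioned earlier. This is a sketch under a couple of assumptions: the records are flat dictionaries, and we write to an in-memory buffer (`io.StringIO`) with partly made-up sample rows so it runs anywhere. Swap in `open("unknown_ham_records.csv", "w", newline="")` to write a real file:

```python
import csv
import io

def records_to_csv(records, handle):
    """Write a list of dictionaries to `handle` as CSV, one column per key seen."""
    # Records may not all share the same keys, so gather every key in first-seen order
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in the cells for keys a record does not have
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


sample_records = [
    {"id": 296822, "title": "Madonna and Child and Saints", "century": "19th century"},
    {"id": 1, "title": "Made-up record", "technique": "Cast, lost-wax process"},
]

buffer = io.StringIO()
records_to_csv(sample_records, buffer)
print(buffer.getvalue())
```

This is the bookkeeping (collecting column names, filling gaps, quoting values that contain commas) that `pandas` handles for us in a single call.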

# Data collected!

Now we've got some interesting data and exported it. We could throw it at a program like Tableau or Excel to visualize it or explore it further. We could also continue to explore it in Python. Another option would be to remix it into an Omeka site. This is a good option if you're interested in exploring other RESTful methods, like `POST`, `PUT`, or `DELETE`. Check out the `Omeka` notebook for more information on how to pull off a big data heist!

*Not a real heist, we are using freely available data that the museum has generously made available. Please do not steal any physical art.*