# Introduction

This project is a (very) brief exploration/proof of concept of using Jupyter Notebooks to get started with APIs and related code libraries. In this case, we're using the DPLA API and the DPyLA code library. Why a Jupyter Notebook? Well, originally, because I couldn't get the requests package working in PyCharm. But once I started working with it, I realized that Jupyter Notebooks solve a lot of problems for teaching people how to work with APIs: markdown for lengthy explanations, live links, embedded images, HTML support, and a single environment to support everything from code to display. What follows is a simple tutorial with a few asides highlighting how Jupyter is helping us along.

## Abouts

About the DPLA API
Explore the documentation: https://dp.la/info/developers/codex/. There is a lot of critical information there about what the API supports and the metadata model - that's enough to get started with. 

About the DPyLA Code Library
See the ReadMe (https://github.com/bibliotechy/DPyLA/blob/master/README.md) for the source of most of my code examples and more information about the code library. 

# API keys

The first thing you need to do to access the DPLA API is make sure you have a key. In slightly oversimplified terms, an API key tells the service who you are so it knows it's okay to send you data. There are two ways to go about getting an API key - choose your own adventure: 

1. The DPyLA code library makes it pretty simple to request. You'll have to have your own Jupyter Notebook or other Python environment running. Then simply execute the following code and check your email:
>from dpla.api import DPLA
>DPLA.new_key("YourEmail@address.com")

2. You can send a simple request to the API directly as well. You'll need some sort of Unix environment with curl installed (most have it, I think). Execute this command and check your email (the DPLA site has more info about what you can expect to see on the screen to know the command was successful).
>curl -v -XPOST https://api.dp.la/v2/api_key/YOUR_EMAIL@example.com

Once you have a key, you're off and running! Since you're unlikely to be sending any sensitive information back and forth, the method for using the key is fairly simple.

Note about API keys: Keep it secret! Keep it safe! Don't share or publish your key anywhere. 
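
One way to keep a key out of a shared notebook is to read it from an environment variable instead of typing it into a cell. This is only a minimal sketch: the variable name `DPLA_API_KEY` and the helper `load_api_key` are my own inventions for illustration, not something DPLA or DPyLA defines.

```python
import os


def load_api_key(var_name="DPLA_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch;
    use whatever name you set in your own environment.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# In a notebook you could then write something like:
#     dpla = DPLA(load_api_key())
# so the key itself never appears in a cell you might share.
```

With this approach the notebook can be published as-is, and each reader supplies their own key.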

# Searching DPLA using DPyLA

This illustration starts with a simple search and a look at the results. The API will only send back 10 records by default, but there's so much metadata here that even 10 records looks like a lot (don't be intimidated!). To keep the response readable, this first example prints just one record from that first page. One of the benefits of doing this in a notebook is that we can move on with our code and still come back and look at the raw results to remind ourselves about the data structure as we go. 

In [7]:
from dpla.api import DPLA
import pprint
import json

dpla = DPLA('YourKeyGoesHere')
result = dpla.search('little kittens')
print("There are", result.count, "matching records. Here is one of them:")
# result.items is a list of records; index 3 is the fourth record on the first page
print(result.items[3])

There are 68 matching records. Here is one of them:
{'@context': 'http://dp.la/api/items/context', 'isShownAt': 'https://texashistory.unt.edu/ark:/67531/metapth30672/', 'dataProvider': ['Star of the Republic Museum'], '@type': 'ore:Aggregation', 'provider': {'@id': 'http://dp.la/api/contributor/the_portal_to_texas_history', 'name': 'The Portal to Texas History'}, 'object': 'https://texashistory.unt.edu/ark:/67531/metapth30672/small/', 'ingestionSequence': 28, 'id': '384e70f3aa6c07f6f395b968259cd0d4', 'ingestDate': '2017-10-04T01:13:16.293209Z', '_rev': '17-dc821f98ab2b81c14fdb3521004cbb33', 'aggregatedCHO': '#sourceResource', '_id': 'texas--info:ark/67531/metapth30672', 'sourceResource': {'title': ['The Story of the Three Little Kittens'], 'description': ['Front and back covers of a book titled "The Story of the Three Little Kittens." from the "Little Kitten Series.', '[2] p. : col. ill. ; 27 cm.'], 'subject': [{'name': 'Communication Artifacts'}, {'name': 'Documentary Artifact'}, {'n
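
Since the search reports 68 matches but only 10 records come back at a time by default, it can help to see the paging arithmetic spelled out. This is just a rough sketch; `pages_needed` is a hypothetical helper of mine, not part of DPyLA.

```python
import math


def pages_needed(total_records, page_size=10):
    """Return how many requests it takes to see every record."""
    return math.ceil(total_records / page_size)


# 68 matching records at the default of 10 records per response
print(pages_needed(68))  # 7
```

So the first response is only the first of seven pages for this search.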

## Looking at the metadata

We have successfully done a search! But, wow, what is going on with the data? Most APIs return data using a format called JSON (I mostly hear it pronounced jay-sawn). It's a good idea to stop for a second here and learn a little more about it. You won't have to deal with it directly too much, because we'll end up using tools to convert it, but here's a good overview to start with (go ahead, I'll wait here): https://code.tutsplus.com/tutorials/understanding-json--active-8817

To work with it in Python, we need it in a format Python understands. There are tools to convert it and I think the code library has already done that for us. What we're seeing here now is actually a dictionary, well several dictionaries, some with dictionaries embedded in them. 
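
To make the JSON-to-dictionary step concrete, here is a small sketch using Python's built-in `json` module. The JSON string is a hand-trimmed sample I wrote for illustration, borrowing a couple of values from the record printed above; a real response has many more fields.

```python
import json

# A tiny, hand-written stand-in for an API response (not a real response).
raw = (
    '{"count": 1, "docs": [{"provider": {"name": "The Portal to Texas History"}, '
    '"sourceResource": {"title": ["The Story of the Three Little Kittens"]}}]}'
)

data = json.loads(raw)          # JSON text becomes nested Python dicts and lists
print(type(data).__name__)      # dict

record = data["docs"][0]        # "docs" holds a list; take its first record
print(record["provider"]["name"])             # a dictionary inside a dictionary
print(record["sourceResource"]["title"][0])   # a list inside a dictionary
```

Each pair of square brackets steps one level deeper into the structure, which is the same pattern the later loops rely on.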

Let's see if we can make that data a little more readable so we can see what we're dealing with a little bit better. We can use the pprint ("pretty printer": https://docs.python.org/3/library/pprint.html) module (imported above) to look at a formatted version of a given record - in this case, the second one (Python counts from zero, so index 1 is the second record). Take a good look at it, but don't stress about it yet. And again, since we're using a notebook, we can just leave this right here and come back when we need to refer to it. 

In [9]:
result = dpla.search('little kittens')
pprint.pprint(result.items[1])

{'@context': 'http://dp.la/api/items/context',
 '@id': 'http://dp.la/api/items/7c8e3f71923ca01d13c3c4fa01618585',
 '@type': 'ore:Aggregation',
 '_id': 'hathitrust--008673631',
 '_rev': '16-34ef523ea7814284d1f128bd98c5a87b',
 'admin': {'sourceResource': {'title': 'Three little kittens /'}},
 'aggregatedCHO': '#sourceResource',
 'dataProvider': ['New York Public Library'],
 'id': '7c8e3f71923ca01d13c3c4fa01618585',
 'ingestDate': '2017-09-22T20:52:19.239097Z',
 'ingestType': 'item',
 'ingestionSequence': 31,
 'isShownAt': 'http://catalog.hathitrust.org/Record/008673631',
 'originalRecord': {'_id': '008673631',
                    'controlfield': [{'#text': '008673631', 'tag': '001'},
                                     {'#text': 'MiAaHDL', 'tag': '003'},
                                     {'#text': '20100728000000.0',
                                      'tag': '005'},
                                     {'#text': 'm d', 'tag': '006'},
                                     {'#text': 
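
If the full pretty-printed record is still too much to take in, pprint's `depth` argument can collapse the nested parts so only the top-level keys show. A small sketch, using a cut-down record I assembled from a few of the fields above:

```python
import pprint

# A trimmed-down record for illustration; real records have many more keys.
record = {
    "id": "7c8e3f71923ca01d13c3c4fa01618585",
    "dataProvider": ["New York Public Library"],
    "admin": {"sourceResource": {"title": "Three little kittens /"}},
}

# depth=1 prints the top level only; nested lists and dicts are
# replaced with [...] and {...} placeholders.
pprint.pprint(record, depth=1)

# pformat returns the same text as a string instead of printing it.
summary = pprint.pformat(record, depth=1)
```

This gives a quick table of contents for a record before diving into any one section.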

The nice thing about using a code library is that someone has generally done some of the heavy lifting for you. In this case, the DPyLA readme gives us an example of how to loop through all of our search results (not just the first 10) and pull out a specific field. I've adapted the readme example, using our 'little kittens' search results.

In [10]:
result = dpla.search('little kittens')
for item in result.all_records():
    print(item["sourceResource"]["title"])

Three little kittens
['Three little kittens /']
Three little kittens
['The Story of the Three Little Kittens']
Three little kittens
The 3 little kittens
Three little kittens
Three little kittens
Three little kittens
['The Three Little Kittens']
['Three little kittens, Chicken Little']
['Three little kittens, Chicken Little']
The Color Kittens
Little girl with kittens, Morrison County
['Tales from catland, for little kittens']
['More About the Three Little Kittens']
['Three little kittens : a story for little tots /']
Woman holding child with kittens
['Wonderful history of three little kittens who lost their mittens']
Three little kittens and other favorite nursery rhymes [book review]
Wonderful history of three little kittens who lost their mittens
Woman with kittens and chickens, C. J. French collection
Four Swedish Hospital School of Nursing students who performed in a Skit, "Ma and Three Little Kittens Lost Their Mittens."
['[Brown wax home recording of man reciting Old mother Hubba
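
Notice that some titles come back as plain strings and others as lists. Here is a rough sketch of one way to normalize them and count repeats. The `titles` list is copied by hand from a few lines of the output above, and `title_text` is a hypothetical helper of mine, not part of DPyLA.

```python
from collections import Counter


def title_text(title):
    """Return a title as a single string, whether it arrived as a str or a list."""
    if isinstance(title, list):
        return "; ".join(title)
    return title


# A few titles taken from the output above, in both shapes.
titles = [
    "Three little kittens",
    ["Three little kittens /"],
    "Three little kittens",
    ["The Story of the Three Little Kittens"],
    ["Three little kittens, Chicken Little"],
    ["Three little kittens, Chicken Little"],
]

counts = Counter(title_text(t) for t in titles)
for text, n in counts.most_common():
    if n > 1:
        print(n, "x", text)
```

Matching on the title alone is a blunt instrument, since two different items can share a title, so treat the counts as a hint rather than proof of duplication.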

## Adding metadata and formatting results

Hmmm. It looks like there are some duplicates in there. I'd like to get some more data back to differentiate the results. I can use the same loop and the same method to pull out more fields, so I need to think about how to find the fields I might want. The code library example has given us a lot to go on. We know there is probably more data we want in the "sourceResource" section, so we can start by looking there. Here again you have a choice. You can look at the example we printed above, but keep in mind that not every item will have every data element. A better option is to study the metadata object structure in the API documentation (https://dp.la/info/developers/codex/responses/object-structure/).  

We are still using the code library and we're still iterating through results. It would be simple to just add a bunch of the fields we've identified, but Python will throw an error if an element is missing from the structure (in Python terms, not every key will be in the dictionary for every record). So we want to tell Python that it's okay to keep going if that's the case. I also ran into a problem with date formats, so I've added an exception for that as well. There are a few ways to go about this, and this is one simple method. 

Just to make the results a little easier to read, I've used another little helper from the code library, "result.count", to tell us how many search results we have. Then I added an accumulator to number each result and a line break at the end of each loop so I can have some visual space between the records.

In [11]:
result = dpla.search('little kittens')
print("There are", result.count, "matching records.")
res_no = 0
for item in result.all_records():
    res_no = res_no + 1
    print("Result:", res_no)
    try:
        print("Title:",item["sourceResource"]["title"])
        print("Image:", item["object"])
        print("Creator:",item["sourceResource"]["creator"])
        print("Provider:", item["provider"]["name"])
        print("Description:", item["sourceResource"]["description"])
        print("Date:", item["sourceResource"]["date"]["displayDate"])
        print("Type:", item["sourceResource"]["type"])
        print("Rights:", item["sourceResource"]["rights"])
        print("\n")
    except KeyError:
        print("Incomplete Data", "\n")
    except TypeError:
        print("Date Error", "\n")

There are 68 matching records.
Result: 1
Title: Three little kittens
Image: http://images.nypl.org/index.php?id=1692922&t=t
Creator: ['Peek, Geo. W']
Provider: The New York Public Library
Description: ['Cover contains a photograph of three kittens with the bottom part of a ladder.', 'Dedication on cover: To A. C. R. Stevens, M. D., New York City.']
Date: 1893
Type: image
Incomplete Data 

Result: 2
Title: ['Three little kittens /']
Incomplete Data 

Result: 3
Title: Three little kittens
Image: http://images.nypl.org/index.php?id=1692923&t=t
Creator: ['Peek, Geo. W']
Provider: The New York Public Library
Description: ['Cover contains a photograph of three kittens with the bottom part of a ladder.', 'Dedication on cover: To A. C. R. Stevens, M. D., New York City.']
Date: 1893
Type: image
Incomplete Data 

Result: 4
Title: ['The Story of the Three Little Kittens']
Image: https://texashistory.unt.edu/ark:/67531/metapth30672/small/
Incomplete Data 

Result: 5
Title: Three little kittens
Ima

Result: 31
Title: ['Little poems, from the German. Part first']
Incomplete Data 

Result: 32
Title: ['Standard Sewing Machine Co.']
Image: http://digitalcollections.philau.edu/utils/getthumbnail/collection/TCards/id/1263
Creator: ['Charles E. Buck']
Provider: PA Digital
Description: ['Picture of a little girl holding kittens as the mother cat looks on.']
Date Error 

Result: 33
Title: Nervous persons who cannot sleep and who also suffer from dyspepsia, should use Carter's little nerve pills, made specially for nervous and dyspeptic men and women
Image: http://digital.library.musc.edu/utils/getthumbnail/collection/weartatc/id/546
Creator: Carter Medicine Co
Provider: South Carolina Digital Library
Description: This apothecary card for Carter's Little Liver Pills displays an illustration of two kittens playing in the snow. Card is shaped like an artist's palette. "Carter Medicine Co., N. Y. City," appears towards the bottom of the back of the card. Circa 1870 through 1920.
Incomplete Dat

Result: 51
Title: Coupon
Image: ['http://contentdm.lib.byu.edu/utils/getthumbnail/collection/HuntBag/id/4677', 'http://contentdm.lib.byu.edu/utils/getthumbnail/collection/HuntBag/id/4677']
Creator: ['Huntington, Elfie, 1868-1949', 'Bagley, Joseph Daniel, 1874-1936']
Provider: Mountain West Digital Library
Description: ['Taken outdoors; a full portrait of a little girl and a baby standing in white dresses, holding kitten in white dresses; a dog is next to them, and trees are in the background.', 'Electronic reproduction', 'Gelatin silver print from a gelatin dry plate negative; 17.78 x 12.7 cm. (7 x 5 in.)']
Date: ca. 1919-1922
Type: image
Rights: Http://lib.byu.edu/about/copyright/special_collections.php; Public Domain; Courtesy L. Tom Perry Special Collections, Harold B. Lee Library, Brigham Young University; Public


Result: 52
Title: ['Boy Pulling Kitten in Wagon; Boy and Kitten in Wagon']
Image: https://texashistory.unt.edu/ark:/67531/metapth54105/small/
Incomplete Data 

Result: 5
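
One thing the output above shows is that the try/except approach stops printing a record at the first missing field, so everything after it is lost too. As an alternative sketch (not the DPyLA way, just plain Python), a small lookup helper can fall back to a placeholder for each field separately. The name `get_field` and the placeholder text are my own, and the sample `item` is hand-trimmed from one of the records above.

```python
def get_field(record, *keys, default="(not provided)"):
    """Walk down nested dictionaries, returning default if any step is missing."""
    current = record
    for key in keys:
        # Also covers the case where a value is a list or string, not a dict.
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


# A trimmed sample record with no creator and no date.
item = {
    "object": "https://texashistory.unt.edu/ark:/67531/metapth30672/small/",
    "provider": {"name": "The Portal to Texas History"},
    "sourceResource": {"title": ["The Story of the Three Little Kittens"]},
}

print("Title:", get_field(item, "sourceResource", "title"))
print("Creator:", get_field(item, "sourceResource", "creator"))
print("Date:", get_field(item, "sourceResource", "date", "displayDate"))
print("Provider:", get_field(item, "provider", "name"))
```

The trade-off is a little more code up front in exchange for seeing every field that is present, even when an earlier one is missing.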

Those look a little bit like search results, don't they? And we've gotten this far with basically only having to work with Python! Let's recap: We've used the DPyLA code library to get our API key, to execute a search, and to create a readable result list. Our project is definitely off to a good start. 

What if we wanted to mock up what we think a result display should look like? Maybe we're working with a web developer in our library and we want to show them exactly what we want. Or, maybe we're learning those skills ourselves and we want to try our HTML knowledge here. Jupyter Notebooks make it relatively easy to embed linked images, and we have the URLs for our images in our results. 

If I knew some HTML, I could use a built-in function to change the cell to HTML and really mark up the record to make it look just like a web page. Unfortunately, I don't know enough HTML to make this pretty, so I'm just going to use Jupyter Notebook functionality to mock up an image display. You should be able to double-click on the text below to see how it is marked up.



**'Tales from catland, for little kittens'** *by Grimalkin, Tabitha*
![Tales from catland, for little kittens](https://books.google.com/books/content?id=NEwDAAAAYAAJ&printsec=frontcover&img=1&zoom=5&edge=curl)

Published 1852

Dedication signed: Tabitha Grimalkin

Rights: Public domain. Learn more at http://www.hathitrust.org/access_use
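
For anyone who does want to try the HTML route, here is a rough sketch that builds a small HTML snippet from the same record details. The function name `record_to_html` and the markup layout are my own guesses at a display, not anything DPLA prescribes.

```python
import html


def record_to_html(title, creator, image_url, published, rights):
    """Build a small HTML block for one record, escaping every value."""
    return (
        "<div>\n"
        f"<p><strong>{html.escape(title)}</strong> <em>by {html.escape(creator)}</em></p>\n"
        f'<img src="{html.escape(image_url)}" alt="{html.escape(title)}">\n'
        f"<p>Published {html.escape(published)}</p>\n"
        f"<p>Rights: {html.escape(rights)}</p>\n"
        "</div>"
    )


markup = record_to_html(
    title="Tales from catland, for little kittens",
    creator="Grimalkin, Tabitha",
    image_url="https://books.google.com/books/content?id=NEwDAAAAYAAJ&printsec=frontcover&img=1&zoom=5&edge=curl",
    published="1852",
    rights="Public domain. Learn more at http://www.hathitrust.org/access_use",
)
print(markup)

# In a Jupyter notebook, the string can be rendered in the cell output with:
#     from IPython.display import HTML
#     HTML(markup)
```

Escaping the values matters because titles and URLs can contain characters like `&` or `<` that would otherwise break the markup.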



# Calling the DPLA API directly

Now that we've built up a little confidence, let's see if we can ride the DPLA bike without the DPyLA code library as training wheels. First, let's decide what to query. One of the limitations of the code library is that it doesn't support queries at the collection level. We've decided we absolutely have to query collections, so we'll use the API to do that. 

Once you start reviewing the requests documentation (https://dp.la/info/developers/codex/requests/) you'll see that DPLA has given us plenty to get started with. Sending a request isn't going to be all that difficult. We just need to get some help with a couple of pieces that the code library was handling for us: how to send requests, and how to format JSON. 

We must be living right because the documentation for the Python requests package gives us some good, basic info on both. Take a look at my code below and then review the documentation: http://docs.python-requests.org/en/latest/user/quickstart/#more-complicated-post-requests. 

We have to import the requests package, and then we can both send the request and format the response.

In [5]:
import requests

resp = requests.get('https://api.dp.la/v2/collections?q=american&fields=title&api_key=YourKeyGoesHere')

print(resp.json())

{'count': 5, 'start': 0, 'limit': 10, 'docs': [{'title': 'Records of the Government of American Samoa, 1900 - 1966'}, {'title': 'Records of the American Commission to Negotiate Peace, 1914 - 1931'}, {'title': 'Records of the American Battle Monuments Commission, 1918 - ca. 1995'}, {'title': 'Records of the American Expeditionary Forces (World War I), 1848 - 1942'}, {'title': 'Records of the American Commission for the Protection and Salvage of Artistic and Historic Monuments in War Areas, 1942 - 1946'}], 'facets': []}


We have successfully sent a query to the API directly! We cheated a little, didn't we? Well, not really. If you know exactly what fields you want back from the query, it's actually going to be more efficient to request only those fields rather than all the rest of the metadata that we don't really want. 
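
Typing a query string by hand makes it easy to slip in a stray `&` or forget to encode a space. Here is a small sketch that builds the same URL from a dictionary of parameters using only the standard library. `build_query_url` is a hypothetical helper of mine; the parameter names `q`, `fields`, and `api_key` are the ones used in the request above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.dp.la/v2/collections"


def build_query_url(api_key, **params):
    """Return the collections URL with the parameters encoded into the query string."""
    params["api_key"] = api_key
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("YourKeyGoesHere", q="american", fields="title")
print(url)
# https://api.dp.la/v2/collections?q=american&fields=title&api_key=YourKeyGoesHere

# The requests package can do the same encoding for you:
#     requests.get(BASE_URL, params={"q": "american", "fields": "title", "api_key": "..."})
```

Keeping the parameters in a dictionary also makes it easier to change the search term or add a field later without retyping the whole URL.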

Now, we still need to get the collection titles out. First, we'll assign a variable to the json response and then loop through it and print out the titles. We know we're working with dictionaries and we got a hint about how to do this while using the code library to loop through results. We're going to have to get down to the keys we want and it looks like the 'title' keys are within the 'docs' keys. Python can handle that. 

This time we don't have our handy result.count - remember it's part of the code library and only worked on item searches. So we'll throw in another accumulator to both number and total our results. 

In [13]:
coll = resp.json()
coll_num = 0
for title in coll["docs"]:
    coll_num = coll_num + 1
    print(coll_num, title["title"])
print("\n")
print(coll_num, "collections found")    

1 Records of the Government of American Samoa, 1900 - 1966
2 Records of the American Commission to Negotiate Peace, 1914 - 1931
3 Records of the American Battle Monuments Commission, 1918 - ca. 1995
4 Records of the American Expeditionary Forces (World War I), 1848 - 1942
5 Records of the American Commission for the Protection and Salvage of Artistic and Historic Monuments in War Areas, 1942 - 1946


5 collections found
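
As a variation on the loop above, Python's built-in `enumerate` can do the numbering for us, and the response itself carries a 'count' value we can compare against. This sketch uses a copy of the response printed earlier, pasted in as a plain dictionary so it runs without a key or a network connection.

```python
# The response shown earlier, pasted in as a Python dictionary.
coll = {
    "count": 5,
    "start": 0,
    "limit": 10,
    "docs": [
        {"title": "Records of the Government of American Samoa, 1900 - 1966"},
        {"title": "Records of the American Commission to Negotiate Peace, 1914 - 1931"},
        {"title": "Records of the American Battle Monuments Commission, 1918 - ca. 1995"},
        {"title": "Records of the American Expeditionary Forces (World War I), 1848 - 1942"},
        {"title": "Records of the American Commission for the Protection and Salvage "
                  "of Artistic and Historic Monuments in War Areas, 1942 - 1946"},
    ],
    "facets": [],
}

# enumerate hands back a running number alongside each record.
for number, doc in enumerate(coll["docs"], start=1):
    print(number, doc["title"])

print("\n")
print(coll["count"], "collections found;", len(coll["docs"]), "returned in this response")
```

Comparing the two numbers is a quick check on whether everything fit in one response; here 'count' and the number of docs are both 5, which is under the 'limit' of 10.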


# Success!

Look at us go! Because Jupyter Notebooks are interactive, you can copy this notebook, get your own API key and start playing with the code that's already right here. Or you can copy some bits into another notebook and take it in a whole new direction. Whether you want to query the API directly or play with the code library more (it supports several other functions we didn't explore), Jupyter Notebooks are a great place to play, leave notes for yourself about what you learned - and what's not working - and capture the results in a way that's easy to share with other people. 

Have fun!