Application Programming Interfaces (APIs) are one of the standard ways of interacting with data and software services on the internet. Learning how to use them in your own programs is one of the fundamental steps in becoming a fluent developer. Here we will explore one API in particular, the one for the Digital Public Library of America (DPLA). But first, what exactly is an API?

Imagine the following scenario: you have just finished the big task of putting the entire run of your university's literary journal online. People can explore the full text of each issue, and they can also download the images for your texts. Hooray! As we just learned in the lesson on web scraping, an interested digital humanist could use just this information to pull down your materials. They might scrape each page for the full texts, titles, and dates of your journal run, and put together their own little corpus for analysis. But that's a lot of work. Web scraping seems fun at first, but the novelty quickly wears off. We wouldn't want to scrape _every_ resource from the web. Surely there must be a better way, and there is! What if we were to package all that data up in a more usable way for our users to consume with their programs? That's where APIs come in.

APIs are a way of exchanging information and services between one piece of software and another. In this case, we're theorizing an API that would provide data. When a user comes to our journal site, we might imagine them saying, "hey - could you give me all the issues published in the 1950s?" And then our fledgling API would respond with something like the following:

[{"ArticleID":[42901,42902,42903,42904,42905,42906,42907,42908,42909,42910,42911,42912],"ID":1524,"Issue":1,"IssueLabel":"1","Season":"Spring","Volume":1,"Year":1950,"YearLabel":"1950"},{"ArticleID":[42913,42914,42915,42916,42917,42918,42919,42920,42921,42922],"ID":1525,"Issue":2,"IssueLabel":"2","Season":"Summer","Volume":1,"Year":1950,"YearLabel":"1950"},{"ArticleID":[42923,42924,42925,42926,42927,42928,42929,42930,42931,42932],"ID":1526,"Issue":3,"IssueLabel":"3","Season":"Winter","Volume":1,"Year":1950,"YearLabel":"1950"},……]

As we've discussed all along, computers are pretty bad at inferring things, so our API provides a neatly structured way for your program to interface with an application that we've made (Application Programming Interface - get it?). The results give us a list of all the articles in each issue, as well as relevant metadata for the issue. In this case, we learn the year, season, volume, and issue number. With this information, we could make further API requests for particular articles. But data isn't the only thing you can get from APIs - they can also do things for us! Have you ever used a social media account to log in to a different website - say, using Facebook to log in to the New York Times website? Behind the scenes, the NY Times is using the Facebook API to authenticate you and confirm who you are. APIs let you do an awful lot, and they let you build on the work that others have done.
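
To make this concrete, here is a minimal sketch of how a Python program might consume a response like the one above. The `response_text` string is a shortened copy of the sample output (article lists trimmed to three IDs each), not the result of a real API call, and the variable names are only illustrative.

```python
import json

# Shortened, hypothetical copy of the sample response shown above.
response_text = '''
[{"ArticleID": [42901, 42902, 42903], "ID": 1524, "Issue": 1,
  "Season": "Spring", "Volume": 1, "Year": 1950},
 {"ArticleID": [42913, 42914, 42915], "ID": 1525, "Issue": 2,
  "Season": "Summer", "Volume": 1, "Year": 1950}]
'''

# json.loads turns the JSON text into ordinary Python lists and dictionaries.
issues = json.loads(response_text)

# Summarize each issue: season, year, and how many articles it lists.
summaries = [
    "{} {}: {} articles".format(
        issue["Season"], issue["Year"], len(issue["ArticleID"])
    )
    for issue in issues
]

for line in summaries:
    print(line)
```

Because the data arrives already structured, there is no scraping step: each issue is a dictionary, and each field can be looked up by name.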

But this isn't a lesson about how to build APIs - we're going to talk about how to use them. There are a couple of different ways in which we can do this: from scratch or with a wrapper. In the former, we go through all the different steps of putting together a request for information from the DPLA API. In the latter, we use someone else's code to do the heavy lifting for us. First, we'll do things the easy way by working with DPyLA, a Python wrapper for the DPLA API. Let's pull in the relevant Python pieces. Notice the $, which indicates that we're working on the command line and not in Python. We'll need to install the wrapper first.

$ pip install DPLA

We've got the DPLA wrapper installed, so now we'll import it into our Python script. Remember, "from X.Y import Z" saves us from writing X.Y.Z every time. In this case, it keeps us from writing dpla.api.DPLA when we would rather just write DPLA.

In [2]:
from dpla.api import DPLA
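
The same import shorthand works for any module. As a small illustration that uses the standard library rather than the wrapper, the two calls below reach the same function; `os.path.join` and the file path are chosen only as a familiar example.

```python
import os.path
from os.path import join

# Long form: spell out the full dotted path every time.
long_form = os.path.join("data", "dpla", "results.json")

# Short form: 'from X.Y import Z' lets us use the bare name Z.
short_form = join("data", "dpla", "results.json")

print(long_form == short_form)
```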

APIs generally require you to prove that you are an authentic user (not a bot), and, in some cases, that you have permission to access their interface. You generally do this by authenticating with the service using credentials that you have registered with them. DPLA lets you register by sending a request from the terminal. Below, change "YOUR_EMAIL@example.com" to an email address of your choice.

$ curl -v -XPOST https://api.dp.la/v2/api_key/YOUR_EMAIL@example.com

After running the command you should get an email with your API key. You'll then need to include this API key in every request you send to the DPLA API. For the sake of not sharing my own API key, I won't write it here. In fact, there is a handy way of making sure that we don't share our login details in situations just like this. We will store our key locally as an environment variable, outside of any file that we might push to GitHub. Python will read that variable from our system and have access to the key in a safe way. This process is sometimes called **sanitizing**, because you're cleaning your code to make sure that sensitive information is hidden. Run the following command in the terminal (again, note the $):

$ export API_KEY=YOUR_API_KEY_HERE

Now our API key is stored locally in an environment variable, so we'll pull it into Python. To do that, we will import 'os', a Python module for interacting with the operating system on your computer, including its environment variables.

In [7]:
import os
my_api_key = os.getenv('API_KEY')
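
One thing to watch for: `os.getenv` quietly returns `None` when the variable is not set, for instance if you ran `export` in a different terminal window from the one that launched Python. A small guard like the following sketch can catch that early. The helper name `load_api_key` is hypothetical and not part of the wrapper.

```python
import os

def load_api_key(name="API_KEY"):
    """Return the key stored in the environment variable `name`.

    Raises a RuntimeError with a hint if the variable is missing or empty.
    """
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            "No value found for {0}. Run 'export {0}=...' in the terminal "
            "that launches Python.".format(name)
        )
    return key
```

With this in place, `my_api_key = load_api_key()` either gives you the key or tells you right away that it is missing, rather than failing later when the API rejects the request.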

Now you should have your own API key stored, and we can use it to make requests. The DPyLA wrapper makes this easy. First we open a connection to the DPLA API. Notice how we're calling it with the key we stored in 'my_api_key'.

In [8]:
dpla_connection = DPLA(my_api_key)

If you follow along with the [documentation for the wrapper on GitHub](https://github.com/bibliotechy/DPyLA), you can actually see that the wrapper gives us a handy way of requesting our own API key for the first time. We could have done this instead of calling a command from the terminal to get that email sent to us. This is the line of code the documentation gives us for doing so:

In [9]:
DPLA.new_key("your.email.address@here.com")

b'{"message":"API key created and sent via email. Be sure to check your Spam folder, too."}'


But we did this from the command line instead. This is a good first indication of the ways that wrappers can make your life easier. They provide easy shortcuts for things that we would otherwise have to do from scratch. Now that we're all set up with the API, we can use our dpla_connection object to get information from it! Let's do a quick search for something.

In [10]:
result = dpla_connection.search('austen')
print(type(result))

<class 'dpla.api.Results'>


Python's built-in type() function tells us what we're dealing with - notice that the API did not return a list of items as you might expect. Instead, it has returned a Results object. This means that we can do all sorts of things with what we've gotten back, and simply dumping out the list of search results is only one such choice. To see what this object contains, you can look at its built-in __dict__ attribute. Or, if you're working in an interactive session that supports tab completion, you can type "result." and hit tab to see the options.

In [11]:
result.__dict__

{'count': 1431,
 'dpla': <dpla.api.DPLA at 0x10c240400>,
 'items': [{'@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/f5fda5d6d8d7011a575f944de8e108c4',
   '@type': 'ore:Aggregation',
   '_id': 'nypl--510d47dc-7ea0-a3d9-e040-e00a18064a99',
   '_rev': '9-0a2b767b300f8ef56a28a2c4d19c2758',
   'admin': {'object_status': 1,
    'sourceResource': {'title': 'J. Austen'},
    'valid_after_enrich': True,
    'validation_message': None},
   'aggregatedCHO': '#sourceResource',
   'dataProvider': 'The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Print Collection. The New York Public Library',
   'id': 'f5fda5d6d8d7011a575f944de8e108c4',
   'ingestDate': '2017-01-17T01:02:42.938459Z',
   'ingestType': 'item',
   'ingestionSequence': 21,
   'isShownAt': 'http://digitalcollections.nypl.org/items/510d47dc-7ea0-a3d9-e040-e00a18064a99',
   'object': 'http://images.nypl.org/index.php?id=1103428&t=t',
   'originalRecord': {'_id': '510d47dc-7ea0-a3d9-e0
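
The same kind of inspection works on any Python object. Here is a minimal sketch using a stand-in class (`FakeResults` is made up for illustration and is not the wrapper's real `Results` class): `vars()` returns the same dictionary as `__dict__`, and `dir()` lists the names that tab completion would offer.

```python
class FakeResults:
    """A made-up stand-in for a search result object."""

    def __init__(self, items):
        self.items = items
        self.count = len(items)

    def first(self):
        return self.items[0]

fake = FakeResults([{"title": "J. Austen"}, {"title": "Jane Austen"}])

# vars(obj) is the built-in way of asking for obj.__dict__.
attributes = vars(fake)
print(sorted(attributes))

# dir(obj) also includes methods; filter out the underscore-prefixed names.
public_names = [name for name in dir(fake) if not name.startswith("_")]
print(public_names)
```

Note that `__dict__` only shows the data stored on the object, while `dir()` also reveals methods such as `first`.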

The __dict__ attribute shows us a range of options. We can get the number of search results, the list of all items, and a couple of other bits about the particular connection we've opened up. You can use these same tricks - __dict__ and dot + tab - to explore virtually everything else that you will encounter in Python. They give you information about the objects that you're working with, which is half the battle in any Python situation. But for now let's get some more information about the API results we see. We'll take a look at the first object here.

In [24]:
item = result.items[0]
item

{'@context': 'http://dp.la/api/items/context',
 '@id': 'http://dp.la/api/items/f5fda5d6d8d7011a575f944de8e108c4',
 '@type': 'ore:Aggregation',
 '_id': 'nypl--510d47dc-7ea0-a3d9-e040-e00a18064a99',
 '_rev': '9-0a2b767b300f8ef56a28a2c4d19c2758',
 'admin': {'object_status': 1,
  'sourceResource': {'title': 'J. Austen'},
  'valid_after_enrich': True,
  'validation_message': None},
 'aggregatedCHO': '#sourceResource',
 'dataProvider': 'The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Print Collection. The New York Public Library',
 'id': 'f5fda5d6d8d7011a575f944de8e108c4',
 'ingestDate': '2017-01-17T01:02:42.938459Z',
 'ingestType': 'item',
 'ingestionSequence': 21,
 'isShownAt': 'http://digitalcollections.nypl.org/items/510d47dc-7ea0-a3d9-e040-e00a18064a99',
 'object': 'http://images.nypl.org/index.php?id=1103428&t=t',
 'originalRecord': {'_id': '510d47dc-7ea0-a3d9-e040-e00a18064a99',
  'collection': {'@id': 'http://dp.la/api/collections/3f371a92211aa17caadbb21fb98f3b

We get a _lot_ of information from a source like this - far more than you probably wanted to know about this individual object. This is what makes APIs both useful and tricky to work with. They often want to set up users with everything they could possibly need, but they can't know what it is that users will be interested in. So very often they err on the side of completeness, which can sometimes make it difficult to parse the results. One difficult piece here is that the information is hierarchical - the data is organized a bit like a tree. So you have to respect that hierarchy by unfolding it as it expects. The first cell below does not work, but the second does. Can you see why?

In [None]:
item['stateLocatedIn']

In [None]:
item['sourceResource']['stateLocatedIn']

There is no top-level key for 'stateLocatedIn'. That data is actually organized under 'sourceResource', so we have to tell the script exactly where we want to look. We can confirm this by walking down the tree towards the data we're interested in. 

In [38]:
result.items[0]['sourceResource']

{'@id': 'http://dp.la/api/items/f5fda5d6d8d7011a575f944de8e108c4#sourceResource',
 'collection': {'@id': 'http://dp.la/api/collections/3f371a92211aa17caadbb21fb98f3bd4',
  'id': '3f371a92211aa17caadbb21fb98f3bd4',
  'title': 'Jane Austen.'},
 'relation': 'Jane Austen',
 'rights': 'The copyright and related rights status of this item has been reviewed by The New York Public Library, but we were unable to make a conclusive determination as to the copyright status of the item. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.',
 'stateLocatedIn': [{'name': 'New York'}],
 'subject': [{'name': 'Clippings'}, {'name': 'Portraits'}],
 'title': 'J. Austen',
 'type': 'image'}

This confirms what we had suggested before - things are nested in ways that can be difficult to parse. Imagine saying, "to find Y, first you have to look under X" rather than saying "look at Y. Why can't you find it?" We can get more information by querying 'item.keys()'. We're dealing with a dictionary object, so we can use all the normal dictionary commands.

In [None]:
item.keys()
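
As a sketch of what this looks like in practice, the snippet below works on a trimmed, hand-copied version of the record above, so it runs without an API key. It lists the keys at each level and uses the dictionary method `.get()`, which returns a default instead of raising a KeyError when a key is missing.

```python
# Trimmed copy of the first search result, keeping only a few fields.
sample_item = {
    "id": "f5fda5d6d8d7011a575f944de8e108c4",
    "sourceResource": {
        "title": "J. Austen",
        "type": "image",
        "stateLocatedIn": [{"name": "New York"}],
    },
}

top_level_keys = sorted(sample_item.keys())
nested_keys = sorted(sample_item["sourceResource"].keys())

# Missing at the top level: .get() hands back None rather than an error.
top_level_lookup = sample_item.get("stateLocatedIn")

# Present one level down, so we walk the tree step by step.
nested_lookup = sample_item["sourceResource"].get("stateLocatedIn", [])
state_names = [state["name"] for state in nested_lookup]

print(top_level_keys)
print(nested_keys)
print(top_level_lookup, state_names)
```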

Let's loop over the first nine items here to get some interesting information about them. Notice here that 'item' is our name for the individual object that we are looking at in each iteration of the loop.

In [36]:
for item in result.items[:9]:
    print(item['sourceResource']['stateLocatedIn'])

[{'name': 'New York'}]
[{'name': 'New York'}]
[{'name': 'New York'}]
[{'name': 'New York'}]


KeyError: 'stateLocatedIn'

Several New York objects, and then an error. Let's look at the fifth object to see what is wrong with it.

In [37]:
result.items[4]['sourceResource']

{'@id': 'http://dp.la/api/items/fc468f34a8666570091a694b6ef7259a#sourceResource',
 'creator': ['Malden, Charles, Mrs'],
 'date': {'begin': '1896', 'displayDate': '1896 [c1889]', 'end': '2017'},
 'extent': ['224 p. 18 cm.'],
 'format': ['Electronic resource', 'Language material'],
 'identifier': ['sdr-nrlfGLAD117809692-B',
  '(OCoLC)217878889',
  'LC call number: PR4036 .M3',
  '(OCoLC)764476',
  'Hathi: 006155179'],
 'language': [{'iso639_3': 'eng', 'name': 'English'}],
 'publisher': ['Boston, Roberts'],
 'rights': 'Public domain. Learn more at http://www.hathitrust.org/access_use',
 'specType': ['Book'],
 'subject': [{'name': 'Austen, Jane, 1775-1817'}],
 'title': ['Jane Austen'],
 'type': 'text'}

The first several results all appear to be held by libraries, while the fifth result is an electronic resource. It makes sense that this resource would not be held in a particular state. Let's make our loop a bit more nuanced so as to account for these edge cases.

In [45]:
for item in result.items[:9]:
    if 'stateLocatedIn' in item['sourceResource']:
        print(item['sourceResource']['stateLocatedIn'])
    else:
        print(item['sourceResource']['format'])
result.items[:9]

[{'name': 'New York'}]
[{'name': 'New York'}]
[{'name': 'New York'}]
[{'name': 'New York'}]
['Electronic resource', 'Language material']
['Electronic resource', 'Language material']
['Electronic resource', 'Language material']
['Electronic resource', 'Language material']
[{'name': 'UT'}]


[{'@context': 'http://dp.la/api/items/context',
  '@id': 'http://dp.la/api/items/f5fda5d6d8d7011a575f944de8e108c4',
  '@type': 'ore:Aggregation',
  '_id': 'nypl--510d47dc-7ea0-a3d9-e040-e00a18064a99',
  '_rev': '9-0a2b767b300f8ef56a28a2c4d19c2758',
  'admin': {'object_status': 1,
   'sourceResource': {'title': 'J. Austen'},
   'valid_after_enrich': True,
   'validation_message': None},
  'aggregatedCHO': '#sourceResource',
  'dataProvider': 'The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Print Collection. The New York Public Library',
  'id': 'f5fda5d6d8d7011a575f944de8e108c4',
  'ingestDate': '2017-01-17T01:02:42.938459Z',
  'ingestType': 'item',
  'ingestionSequence': 21,
  'isShownAt': 'http://digitalcollections.nypl.org/items/510d47dc-7ea0-a3d9-e040-e00a18064a99',
  'object': 'http://images.nypl.org/index.php?id=1103428&t=t',
  'originalRecord': {'_id': '510d47dc-7ea0-a3d9-e040-e00a18064a99',
   'collection': {'@id': 'http://dp.la/api/collections/3f371a92211

Above we checked to see if the 'sourceResource' dictionary has a particular key, which allows us to handle electronic resources by printing their format instead of raising an error. And notice how we have a couple of different formats for state names already! The first four list the full state name, while the last item lists an abbreviation. This can get very tricky very quickly, and it points to why data cleaning is one of the most important tasks you do as a programming humanist. If we were interested in working across these states, but they are formatted inconsistently, we would have to clean them up.
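
As a small illustration of that kind of cleanup, the sketch below maps abbreviations onto full state names. The lookup table is deliberately tiny and hand-written for this example; a real project would need all of the states and probably more spelling variants.

```python
# Hand-written, deliberately incomplete lookup table for this example.
STATE_ABBREVIATIONS = {
    "NY": "New York",
    "UT": "Utah",
}

def normalize_state(name):
    """Return the full name for a known abbreviation, else the trimmed input."""
    cleaned = name.strip()
    return STATE_ABBREVIATIONS.get(cleaned.upper(), cleaned)

# Values in the shape the API returned them above.
raw_states = [{"name": "New York"}, {"name": "UT"}]
cleaned_states = [normalize_state(state["name"]) for state in raw_states]
print(cleaned_states)
```

Names that are not in the table pass through unchanged, so nothing is lost while the table is still being filled in.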


APIs frequently limit the number of requests you can make to their service during a particular time period. For example, at the time of writing, Twitter limits some kinds of requests to 15 per 15 minutes. This ensures that you don't accidentally blow up their system with requests while you're learning, but it also helps to ensure that people are using their service for legitimate scripts rather than incessant spam bots.
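
One common way to stay under a limit like this is to pause between requests. The helper below is a generic sketch, not part of any API or wrapper: `run_throttled` and `min_interval` are names invented for this example, and the right interval depends on the limits that the service documents.

```python
import time

def run_throttled(calls, min_interval):
    """Run each zero-argument callable in `calls`, in order.

    Consecutive calls are started at least `min_interval` seconds apart.
    Returns the list of results.
    """
    results = []
    last_start = None
    for call in calls:
        if last_start is not None:
            # Sleep only for whatever part of the interval has not yet passed.
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(call())
    return results

# Stand-in "requests" that just return a value; a real script would call the API.
demo_results = run_throttled([lambda: "first", lambda: "second"], min_interval=0.01)
print(demo_results)
```

For a limit of 15 requests per 15 minutes, an interval of 60 seconds between calls would keep a script within bounds.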

That's the easy way to do things. When you're interested in using the data provided by a service, you should always look to see if they have an API. And whenever there is an API, it is worth looking to see whether there is also a wrapper for you to use. There are often different wrappers for different programming languages as well, but Python is a pretty common and popular language. So you'll often find someone else's work that you can build on.

Before we move on, I want to give just a taste of how to do things the hard way, if you didn't have a wrapper for this particular API. There are a few things you need to know:

* API Endpoint: the base URL that will be responding to your requests. Think of APIs as data that live at particular URLs. If you've ever looked at the URL for a page you're on and seen something like shoppingbaseurl.com/?q=shoes&type=mens&cost=expensive, you're using something similar to an API. Basically, using an API entails constructing a URL that points to exactly the data you want. The endpoint is the base URL that gets you to the root/heart/doorway of the API, and then you give params to nuance your search/request.
* Search Parameters (Params for Short): the particular things you send with your request to get back the information you want. Remember how Python dictionaries use key: value pairs? We'll do the same thing here. In the case of the previous example, you have three params: a search query, a type, and a cost. If this were a Python dictionary, we might write that as {'q': 'shoes', 'type': 'mens', 'cost': 'expensive'}.
* API key: we've already covered this, but the API key is what authenticates you with the API you're using so that it will allow you to use it. Sometimes, you simply pass your key as an additional search parameter. Other times, you might have to authenticate with a separate service (like OAuth).
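
To see how these pieces fit together, here is a sketch that assembles a URL from a base URL and a dictionary of params using Python's standard library. The shopping site is the made-up one from the list above, so nothing is actually requested.

```python
from urllib.parse import urlencode

base_url = "https://shoppingbaseurl.com/"   # made-up endpoint
params = {"q": "shoes", "type": "mens", "cost": "expensive"}

# urlencode turns the dictionary into 'key=value' pairs joined by '&'.
query_string = urlencode(params)
full_url = base_url + "?" + query_string
print(full_url)

# It also escapes characters that are not safe in a URL, such as
# commas and spaces.
escaped = urlencode({"q": "Austin, Texas"})
print(escaped)
```

Libraries for making requests generally do this encoding for you when you hand them a params dictionary, but it helps to know what the resulting URL looks like.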

First, let's import the Python libraries that we'll need:

In [1]:
import requests

'requests' is a library that allows you to make HTTP requests, including requests to an API. Now we'll store our API endpoint so that we know where we will be making requests to. In this case, we can find our API endpoint by looking at DPLA's great [documentation](https://dp.la/info/developers/codex/api-basics/). Not every API provides you with such great guides to their work, so thanks to the wonderful people at DPLA for making this information available!

In [2]:
endpoint = 'https://api.dp.la/v2/'

Now we will set up our search params. If we're still working in the same Python session, we should have our API key stored in 'my_api_key'. It's important to note that not just any search parameters will work: the API documentation specifies which ones are allowed. If we sent over 'how_great_is_ethan' as a parameter, the request would not work.

In [3]:
import os

# Re-read the key from the environment in case this is a fresh Python session.
my_api_key = os.getenv('API_KEY')

params = {
    'api_key': my_api_key,
    'q': 'Austin, Texas',
    'type': 'items'
    }

I've set up a basic search here for information about Austin, Texas, and I've given the params my personal api key to authenticate me. I've also specified that I want to get items back.

In [None]:
res = requests.get(endpoint, params=params)