# Application Programming Interface (API)

This document introduces the use of Application Programming Interfaces (APIs), focusing on how to use them to access and query data.

## What is an API?
Broadly speaking, an API is a set of rules and procedures that lets software applications communicate with one another.

A very common type of API is the **Web API**, which, among other things, allows users to query a remote database over the internet.

An API specifies **how** a user or application accesses the data.

Examples of web APIs and the data they expose:
- Twitter: tweets, users, replies, etc.
- Art Institute of Chicago: artworks, exhibits, ticketing, etc.
- New York Times archive: articles, headlines, book reviews, etc.

### Representational State Transfer (REST)

RESTful APIs
- are convenient for querying databases via URLs;
- are stateless: each request is self-contained and does not rely on previous requests;
- allow responses to be cached to improve server response time.

**Common HTTP methods:**

- GET: retrieve the resource located at a URL
- POST: send new data to the server
- PUT: update an existing resource
- DELETE: delete a resource on the server
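To make the mapping concrete, here is a minimal sketch that only *constructs* (never sends) one request per method using the standard library's `urllib.request.Request`, so it runs offline. The URLs under `https://example.com/api/items` and the JSON body are hypothetical placeholders.

```python
import json
from urllib.request import Request

# Hypothetical endpoint used only for illustration; nothing is sent over the network.
COLLECTION_URL = 'https://example.com/api/items'
ITEM_URL = COLLECTION_URL + '/42'
BODY = json.dumps({'name': 'example'}).encode('utf-8')
HEADERS = {'Content-Type': 'application/json'}

demo_requests = {
    # GET: read the resource at the URL (no body)
    'GET': Request(ITEM_URL, method='GET'),
    # POST: send new data to the collection
    'POST': Request(COLLECTION_URL, data=BODY, headers=HEADERS, method='POST'),
    # PUT: update the existing resource with the supplied data
    'PUT': Request(ITEM_URL, data=BODY, headers=HEADERS, method='PUT'),
    # DELETE: ask the server to remove the resource
    'DELETE': Request(ITEM_URL, method='DELETE'),
}

for req in demo_requests.values():
    print(req.get_method(), req.full_url, 'with body' if req.data else 'no body')
```

With the `requests` library used later in this document, the corresponding calls are `requests.get`, `requests.post`, `requests.put`, and `requests.delete`.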

## Request via a URL

```{figure} ../img/url.svg
---
width: 90%
name: url-example
---
Anatomy of a URL request (rows.com)
```
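The same anatomy can be taken apart programmatically with `urllib.parse.urlsplit`. The example URL below points at the Art Institute of Chicago artists endpoint used later in this document; the `limit` and `page` values are arbitrary choices for illustration.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://api.artic.edu/api/v1/artists?limit=100&page=2'
parts = urlsplit(url)

print(parts.scheme)           # protocol: https
print(parts.netloc)           # host (domain): api.artic.edu
print(parts.path)             # path to the resource (endpoint): /api/v1/artists
print(parse_qs(parts.query))  # query parameters: {'limit': ['100'], 'page': ['2']}
```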

## Example: Art Institute of Chicago API

The Art Institute of Chicago hosts an API that returns JSON responses. The preferred way to use any API is to start from its documentation, e.g. https://api.artic.edu/docs/.

**Important for any API usage (Terms and conditions):**

- Check rate limits
- Check authentication methods

```python
import requests  # requests.readthedocs.io/
import pandas as pd
```

### First request

```python
url_artist = 'https://api.artic.edu/api/v1/artists'

# Send a GET request to the artists endpoint
r = requests.get(url_artist)
```

**Note:** By default, this request retrieves only 12 records (one page) out of more than 10,000 records in total.
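A quick way to see what came back is to check the status code and the structure of the JSON body. The sketch below assumes, per the API documentation, that the response contains a `pagination` block (with `total` and `limit`) and a `data` list of records; `describe_page` is a hypothetical helper that works on the parsed dictionary, so it can be tried without a network connection.

```python
def describe_page(payload):
    """Summarize one page of a paginated JSON response (already parsed into a dict)."""
    pagination = payload['pagination']
    return {
        'total_records': pagination['total'],      # records available on the server
        'page_size': pagination['limit'],          # records requested per page
        'records_returned': len(payload['data']),  # records actually in this response
    }

# Usage with the response `r` from the cell above (requires network access):
# print(r.status_code)          # 200 means the request succeeded
# print(r.json().keys())
# print(describe_page(r.json()))
```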

### Request with parameters

```python
# Ask for up to 100 records per page instead of the default
r = requests.get(url_artist, params={'limit': 100})
```
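Under the hood, `requests` encodes the `params` dictionary into the query string of the URL. The sketch below reproduces that encoding with the standard library's `urllib.parse.urlencode`. The `page` and `fields` parameters are taken to be the paging and field-selection options described in the Art Institute documentation, and the field names `id` and `title` are assumptions for illustration.

```python
from urllib.parse import urlencode

url_artist = 'https://api.artic.edu/api/v1/artists'

params = {
    'limit': 100,          # records per page
    'page': 2,             # which page of results to return
    'fields': 'id,title',  # only return these fields (assumed field names)
}
full_url = url_artist + '?' + urlencode(params)
print(full_url)  # the comma is percent-encoded as %2C
```

After a real request, `r.url` shows the URL that `requests` actually sent, which is a handy way to check that the parameters were applied.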

### A reasonable way to retrieve multiple pages of data

1. Check how much data there is (the total number of records).
2. Use a `for` loop over the pages, pausing with `time.sleep()` between requests.

**Note:** Even though we could "scrape" the entire database this way, we typically should not. See, for example, the discussion in [Data dump vs API](https://api.artic.edu/docs/#data-dumps-vs-api).
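A minimal sketch of those two steps, assuming the Art Institute response layout (a `pagination.total` count and a `data` list) and 1-based `page` numbers. `page_plan` and `fetch_pages` are hypothetical helper names, `max_pages` deliberately caps the crawl, and the one-second `delay` is an arbitrary choice rather than a documented limit, so check the API's actual rate limits.

```python
import math
import time


def page_plan(total, limit=100, max_pages=None):
    """Return the params dict for each page needed to cover `total` records."""
    n_pages = math.ceil(total / limit)
    if max_pages is not None:
        n_pages = min(n_pages, max_pages)
    return [{'limit': limit, 'page': page} for page in range(1, n_pages + 1)]


def fetch_pages(url, limit=100, max_pages=3, delay=1.0):
    """Step 1: learn the total; step 2: loop over pages, pausing between requests."""
    import requests  # imported here so page_plan() can be used without it

    first = requests.get(url, params={'limit': 1}, timeout=10)
    first.raise_for_status()
    total = first.json()['pagination']['total']

    records = []
    for params in page_plan(total, limit, max_pages):
        r = requests.get(url, params=params, timeout=10)
        r.raise_for_status()
        records.extend(r.json()['data'])
        time.sleep(delay)  # be polite: stay well under the rate limit
    return records
```

Usage, with network access: `artists = pd.DataFrame(fetch_pages(url_artist, limit=100, max_pages=3))`.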

### Search query

Simply having access to the data, or a copy of it, is not inherently useful. Most APIs allow for either filtering or searching.

```python
# The search endpoint is reached by appending '/search' to the resource URL
url_artist_search = url_artist + '/search'
```
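Search endpoints take the search terms as a query parameter. The sketch below assumes the `q` parameter for full-text search and a `title` field on each hit (for artists, the artist's name); `search_url` and `search` are hypothetical helpers. Spaces in the search term end up encoded as `+` in the query string.

```python
from urllib.parse import urlencode

url_artist = 'https://api.artic.edu/api/v1/artists'
url_artist_search = url_artist + '/search'


def search_url(base, term, limit=10):
    """Build the full search URL for inspection (spaces become '+')."""
    return base + '?' + urlencode({'q': term, 'limit': limit})


def search(base, term, limit=10):
    """Run the search and return (id, title) pairs; requires network access."""
    import requests  # imported here so search_url() can be used without it

    r = requests.get(base, params={'q': term, 'limit': limit}, timeout=10)
    r.raise_for_status()
    # Each hit is assumed to carry at least an 'id' and a 'title' field.
    return [(hit['id'], hit['title']) for hit in r.json()['data']]


print(search_url(url_artist_search, 'van gogh', 5))
```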

## Practice 9 - API usage
This API's endpoints include `artists`, `exhibitions`, `artworks`, and others. Refer to https://api.artic.edu/docs/#endpoints.

1. How many artworks are in the Art Institute collection?
2. Find the exhibitions that have "Van Gogh" in the title.
3. Search for a painting called "A Sunday on La Grande Jatte".  Who is the artist?  When was it painted?

## Publicly available APIs, with or without authorization

- See https://github.com/public-apis/public-apis.
- See https://mixedanalytics.com/blog/list-actually-free-open-no-auth-needed-apis/.

## Example with New York Times API


The New York Times API uses an API key to authenticate users. The [FAQs](https://developer.nytimes.com/faq) page has useful information about using these APIs. You can request an NYT API key by following the instructions at https://developer.nytimes.com/get-started.

### A brief word about API keys
An API key is a unique, generated string that

- identifies a particular application (or project),
- authenticates and grants access to a user,
- provides guardrails for controlling API traffic, and
- enables management of resources and the flow of data.

### Best practices in using an API key

- **Do not** hardcode your API key in any script.
  - Instead, save it in a local file (or an environment variable) and read it from your code.
- **Do not** upload your API key anywhere publicly accessible.
- Rotate API keys periodically.
- Revoke keys once a project ends.
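One way to follow the first two practices is to look for the key in an environment variable and fall back to a local file that is kept out of version control. A minimal sketch; the variable name `NYT_API_KEY` and the helper `load_api_key` are hypothetical, while the file path matches the one used below.

```python
import os
from pathlib import Path


def load_api_key(env_var='NYT_API_KEY', key_file='../data/nyt_api.key'):
    """Return an API key from an environment variable, else from a local file.

    NYT_API_KEY is an illustrative variable name; keep the key file out of
    version control (e.g. list it in .gitignore).
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    path = Path(key_file)
    if path.exists():
        return path.read_text().strip()  # strip() drops the trailing newline
    raise RuntimeError(f'No API key found in ${env_var} or {key_file}')
```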

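```python
import requests
import pandas as pd

# Read the key from a local file kept out of version control;
# strip() removes the trailing newline that would otherwise end up in the URL
with open('../data/nyt_api.key', 'r') as f:
    apikey = f.read().strip()
```
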
```python
# Two equivalent ways to insert the key into the URL
# 1. str.format()
url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key={:s}'.format(apikey)
# 2. f-string
url = f'https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key={apikey}'
```

### Search articles

Endpoint documentation: https://developer.nytimes.com/docs/articlesearch-product/1/routes/articlesearch.json/get

```python
r = requests.get(url, params={'q': 'carlos alvarez',
                              'begin_date': '20140101',     # YYYYMMDD
                              'facet': 'true',
                              'facet_fields': 'pub_year'})  # count hits per publication year
```

```python
r.json()['response']['facets']
```

```
{'pub_year': {'terms': [{'term': '2023', 'count': 36},
   {'term': '2024', 'count': 32},
   {'term': '2019', 'count': 25},
   {'term': '2014', 'count': 24},
   {'term': '2016', 'count': 24},
   {'term': '2015', 'count': 23},
   {'term': '2022', 'count': 21},
   {'term': '2021', 'count': 20},
   {'term': '2017', 'count': 16},
   {'term': '2020', 'count': 16}]}}
```
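The facet block is a nested list of `term`/`count` pairs. A small sketch (`facet_counts` is a hypothetical helper name) that flattens it into a dictionary keyed by year, which is easier to sort, tabulate, or plot. Sorting compares the year strings, which orders four-digit years correctly.

```python
def facet_counts(facets, field='pub_year'):
    """Turn a facet block into a {term: count} dict sorted by term."""
    terms = facets[field]['terms']
    return dict(sorted((t['term'], t['count']) for t in terms))

# Usage with the response `r` from above:
# counts = facet_counts(r.json()['response']['facets'])
# pd.Series(counts).plot(kind='bar')
```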

### How many articles matched the search?
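Assuming the article search response reports the total number of matches under `response.meta.hits` (worth confirming in the endpoint documentation linked above), a minimal sketch; `count_hits` is a hypothetical helper name.

```python
def count_hits(payload):
    """Total number of articles matching the query, not just those returned."""
    return payload['response']['meta']['hits']

# Usage with the search response `r` from above:
# count_hits(r.json())
```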

### How many articles did we get?
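Each request returns only one page of results. Assuming the articles are listed under `response.docs` and each has a `headline.main` field (the page size is worth confirming in the documentation), the number retrieved is the length of that list. `articles_returned` and `headlines` are hypothetical helpers.

```python
def articles_returned(payload):
    """Number of articles included in this page of results."""
    return len(payload['response']['docs'])


def headlines(payload):
    """Main headline of each returned article."""
    return [doc['headline']['main'] for doc in payload['response']['docs']]

# Usage with the search response `r` from above:
# articles_returned(r.json())   # typically far fewer than the total number of hits
# headlines(r.json())
```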

### How many articles are there for the last 10 years (2014 - 2024)?
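One approach is to bound the search with both `begin_date` and `end_date` and read off the total number of hits. The `begin_date` parameter and its YYYYMMDD format appear in the request above; `end_date` is assumed here to follow the same format. `date_range_params` is a hypothetical helper, and January 1, 2014 through December 31, 2024 is one reading of "2014 - 2024".

```python
from datetime import date


def date_range_params(query, start, end):
    """Search params restricted to [start, end], with dates formatted as YYYYMMDD."""
    return {
        'q': query,
        'begin_date': start.strftime('%Y%m%d'),
        'end_date': end.strftime('%Y%m%d'),
    }


params = date_range_params('carlos alvarez', date(2014, 1, 1), date(2024, 12, 31))
print(params)

# Usage (requires network access and the `url` from above):
# r = requests.get(url, params=params)
# r.json()['response']['meta']['hits']
```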

### How many articles are there in each of the last 10 years?
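Besides faceting, one option is to issue one date-bounded query per year. A minimal sketch, assuming `begin_date`/`end_date` in YYYYMMDD format and the total under `response.meta.hits`; `yearly_params` and `hits_per_year` are hypothetical helpers. The 12-second pause keeps the loop at about five requests per minute; check the FAQ linked above for the current rate limits.

```python
import time


def yearly_params(query, years):
    """One params dict per calendar year, each spanning January 1 to December 31."""
    return {
        year: {'q': query, 'begin_date': f'{year}0101', 'end_date': f'{year}1231'}
        for year in years
    }


def hits_per_year(url, query, years, delay=12):
    """Query each year separately, pausing between calls to respect rate limits."""
    import requests  # imported here so yearly_params() can be used without it

    counts = {}
    for year, params in yearly_params(query, years).items():
        r = requests.get(url, params=params, timeout=10)
        r.raise_for_status()
        counts[year] = r.json()['response']['meta']['hits']
        time.sleep(delay)
    return counts

# Usage (requires network access and the `url` from above):
# pd.Series(hits_per_year(url, 'carlos alvarez', range(2014, 2025)))
```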

### Get the first 50 articles
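Collecting more articles than one response holds means requesting several pages. The sketch assumes the `page` parameter is zero-based and that each page holds 10 articles, both of which should be checked against the endpoint documentation; `page_numbers` and `first_articles` are hypothetical helpers, and the 12-second pause is there to stay within the per-minute rate limit.

```python
import math
import time


def page_numbers(n_articles, page_size=10):
    """Zero-based page numbers needed to collect `n_articles` results."""
    return list(range(math.ceil(n_articles / page_size)))


def first_articles(url, query, n_articles=50, delay=12):
    """Collect up to `n_articles` article records, one page at a time."""
    import requests  # imported here so page_numbers() can be used without it

    docs = []
    for page in page_numbers(n_articles):
        r = requests.get(url, params={'q': query, 'page': page}, timeout=10)
        r.raise_for_status()
        batch = r.json()['response']['docs']
        if not batch:  # fewer matches than requested: stop early
            break
        docs.extend(batch)
        time.sleep(delay)
    return docs[:n_articles]

# Usage (requires network access and the `url` from above):
# articles = pd.json_normalize(first_articles(url, 'carlos alvarez'))
```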

## Practice 10 - MTFBWY

1. Find out (if you don't already know) the year when the first Star Wars movie came out.
2. Retrieve the number of hits per year since then for the following terms:
   - 'Carrie Fisher'
   - 'Harrison Ford'
   - 'Daisy Ridley'
   - 'Star Wars'
   - Add more if you wish...
3. Store the numbers in a DataFrame.
4. Plot the number of hits per year for all terms on the same graph.