# Lab 1 - Data Collection - Querying Web APIs

In [None]:
# General dependencies
import sys
import json
from IPython.display import display, HTML, IFrame
from collections import defaultdict
%matplotlib inline

## A real life example: Spotify API

In this first part, we'll look at *The Beatles* and query the [Spotify API](https://developer.spotify.com/web-api/) for various details about the band.

The Spotify ID of *The Beatles* is:

```
spotify:artist:3WrFJ7ztbogyGnTHbHJFl2
```

As described in the [Artist API documentation](https://developer.spotify.com/web-api/get-artist/), the following URL returns information about the artist:

```
https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2
```

If we open this URL directly in a browser (https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2), we get back a set of information about the artist:

```
{
    external_urls: {
        spotify: "https://open.spotify.com/artist/3WrFJ7ztbogyGnTHbHJFl2"
    },
    followers: {
        href: null,
        total: 2271088
    },
    genres: [ ],
    href: "https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2",
    id: "3WrFJ7ztbogyGnTHbHJFl2",
    images: [
        {
            height: 1000,
            url: "https://i.scdn.co/image/934c57df9fbdbbaa5e93b55994a4cb9571fd2085",
            width: 1000
        },
        {
            height: 640,
            url: "https://i.scdn.co/image/5f70d98d3e4616a02a3afe2aa9a840b9157b92a1",
            width: 640
        },
        {
            height: 200,
            url: "https://i.scdn.co/image/7fe1a693adc52e274962f1c61d76ca9ccc62c191",
            width: 200
        },
        {
            height: 64,
            url: "https://i.scdn.co/image/857b1ce5b1b372b873b0a8bdb3ff8023b6c61d39",
            width: 64
        }
    ],
    name: "The Beatles",
    popularity: 85,
    type: "artist",
    uri: "spotify:artist:3WrFJ7ztbogyGnTHbHJFl2"
}
```

This is a [JSON](https://en.wikipedia.org/wiki/JSON) object, whose structure is documented at https://developer.spotify.com/web-api/object-model/#artist-object-full.

The list of API endpoints is available at https://developer.spotify.com/web-api/endpoint-reference/.

### Interacting with APIs in Python
To query the Spotify API programmatically, we'll use the `requests` module, which makes it easy to interact with HTTP services.

You can consult the `requests` [quickstart](http://docs.python-requests.org/en/master/user/quickstart/#quickstart) documentation for more details.

The following statement allows us to import the module:

In [None]:
import requests

### First requests

Once the module is imported, we'll first fetch details about the band:

```
r = requests.get('https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2')
```
Some more details about this request:
* We are performing a `GET` request (`.get(...)`), asking the server to return some information
* `https://api.spotify.com` is the URL of the Spotify API
* `/v1/artists` is the API endpoint to retrieve information about artists
* `3WrFJ7ztbogyGnTHbHJFl2` is the Spotify artist ID for *The Beatles*

This endpoint is fully documented on Spotify at https://developer.spotify.com/web-api/get-artist/
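
The way the URL is assembled from the parts listed above can be sketched as follows. The helper names (`artist_url`, `artist_id_from_uri`) and constants are hypothetical and only serve this illustration; they are not part of `requests` or of the Spotify API.

```python
# Sketch: compose the artist URL from its parts (helper names are hypothetical).
API_BASE = 'https://api.spotify.com'
ARTISTS_ENDPOINT = '/v1/artists'


def artist_url(artist_id):
    """Return the API URL for a Spotify artist ID."""
    return '%s%s/%s' % (API_BASE, ARTISTS_ENDPOINT, artist_id)


def artist_id_from_uri(uri):
    """Extract the ID from a URI of the form 'spotify:artist:<id>'."""
    scheme, kind, artist_id = uri.split(':')
    if (scheme, kind) != ('spotify', 'artist'):
        raise ValueError('not a Spotify artist URI: %r' % uri)
    return artist_id


beatles_id = artist_id_from_uri('spotify:artist:3WrFJ7ztbogyGnTHbHJFl2')
print(artist_url(beatles_id))
```

The resulting string is the same URL that is passed to `requests.get(...)` in this section.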

In [None]:
r = requests.get('https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2')

The response returned is an object of type `requests.models.Response`:

In [None]:
type(r)

The documentation is available at http://docs.python-requests.org/en/latest/api/#requests.Response, or inline in this notebook:

In [None]:
help(r)

### HTTP Status & Errors
As you can see, one of the attributes is `status_code`, which can be used to test whether the query was successful:

In [None]:
print('Status code:', r.status_code)

You can consult the list of HTTP status codes at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for more details.
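
As a quick orientation, status codes are grouped into families by their first digit. The sketch below uses the standard library's `http.HTTPStatus` to look up the reason phrase; the `describe_status` helper is a hypothetical name introduced for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'."""
    if 100 <= code < 200:
        family = 'informational'
    elif 200 <= code < 300:
        family = 'success'
    elif 300 <= code < 400:
        family = 'redirection'
    elif 400 <= code < 500:
        family = 'client error'
    elif 500 <= code < 600:
        family = 'server error'
    else:
        family = 'unknown'
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not a registered status
        phrase = 'Unknown'
    return '%d %s (%s)' % (code, phrase, family)


for code in (200, 404, 500):
    print(describe_status(code))
```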

Alternatively, a Python exception can be raised if the status code does not indicate success:

In [None]:
r.raise_for_status() # does nothing, since 200 is a success status

For an unknown artist, you should get a 404 error, which is the standard HTTP error code for a resource that cannot be found by the server. The content of the response contains a more detailed message about the error.

In [None]:
r1 = requests.get('https://api.spotify.com/v1/artists/XXrFJ7ztbogyGnTHbHJFXX')
print('Status code:', r1.status_code)
print('Error content:\n', r1.text)
r1.raise_for_status()
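
The error content is itself text that can be parsed. The sketch below assumes the body is a JSON object of the form `{"error": {"status": ..., "message": ...}}`; that shape, the sample body, and the `error_message` helper are assumptions made for illustration, so compare them with what `r1.text` actually prints.

```python
import json


def error_message(body_text):
    """Extract a readable message from an error body, falling back to the raw text."""
    try:
        payload = json.loads(body_text)
    except ValueError:  # the body is not valid JSON
        return body_text
    error = payload.get('error') if isinstance(payload, dict) else None
    if isinstance(error, dict) and 'message' in error:
        return '%s (status %s)' % (error['message'], error.get('status'))
    return body_text


# Hypothetical error body, shaped like the assumption described above.
sample_body = '{"error": {"status": 404, "message": "non existing id"}}'
print(error_message(sample_body))
```

Falling back to the raw text keeps the helper usable when a server answers with plain text or HTML instead of JSON.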

### Response content
Let's come back to the original response `r`, which contains the result of the API call.

The raw response can be retrieved with the `text` attribute:

In [None]:
print(r.text)

This is a JSON object which can be transformed into a native Python dictionary with the `json()` method:

In [None]:
data = r.json()
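
As a rough sketch of what this conversion does, the standard `json` module can parse the same kind of text directly. The string below is an abridged copy of the response shown earlier, rewritten as valid JSON text (with quoted keys); it is only a stand-in for `r.text`.

```python
import json

# Abridged copy of the response body shown earlier, written as valid JSON text.
raw_text = '''
{
    "followers": {"href": null, "total": 2271088},
    "genres": [],
    "id": "3WrFJ7ztbogyGnTHbHJFl2",
    "name": "The Beatles",
    "popularity": 85,
    "type": "artist"
}
'''

parsed = json.loads(raw_text)
print(type(parsed).__name__)         # JSON objects become Python dicts
print(parsed['followers']['total'])  # nested objects become nested dicts
print(parsed['followers']['href'])   # JSON null becomes Python None
print(parsed['genres'])              # JSON arrays become Python lists
```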

The `data` object contains all the properties and is a standard Python dictionary:

In [None]:
print('Artist ID:', data['id'])
print('Artist URI:', data['uri'])
print('Artist name:', data['name'])

print('Artist properties:')
for key in data:
    print('\tkey=[%s] value=[%s]' % (key, data[key]))

### Some Jupyter goodies
We can also embed HTML code in this notebook.

In [None]:
display(HTML("""<img src="%s"></img>""" % data['images'][-1]['url']))
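
Indexing with `[-1]` picks the smallest image only if the list is ordered from largest to smallest, as it is in the sample response. A minimal sketch that does not depend on the ordering is shown below; `smallest_image` is a hypothetical helper, and it assumes every image entry carries a numeric `width`.

```python
def smallest_image(images):
    """Return the image dict with the smallest width, or None if the list is empty."""
    if not images:
        return None
    return min(images, key=lambda image: image['width'])


# Three entries from the sample response, deliberately out of order.
images = [
    {'height': 640, 'width': 640,
     'url': 'https://i.scdn.co/image/5f70d98d3e4616a02a3afe2aa9a840b9157b92a1'},
    {'height': 64, 'width': 64,
     'url': 'https://i.scdn.co/image/857b1ce5b1b372b873b0a8bdb3ff8023b6c61d39'},
    {'height': 200, 'width': 200,
     'url': 'https://i.scdn.co/image/7fe1a693adc52e274962f1c61d76ca9ccc62c191'},
]
print(smallest_image(images)['url'])
```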

or an iframe:

In [None]:
# Display the result of an API call in an iframe
IFrame("https://api.spotify.com/v1/search?q=artist:beatles&type=track&market=FR&limit=50",
       width=600, height=200)
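
The query string in the URL above (`q`, `type`, `market`, `limit`) can also be built from a dictionary rather than typed by hand. The sketch below uses the standard library's `urllib.parse`; the `SEARCH_URL` and `params` names are introduced only for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_URL = 'https://api.spotify.com/v1/search'

params = {
    'q': 'artist:beatles',
    'type': 'track',
    'market': 'FR',
    'limit': 50,
}

# urlencode percent-encodes reserved characters such as ':' in the values.
url = SEARCH_URL + '?' + urlencode(params)
print(url)

# Round trip: parsing the query string gives the parameters back (as lists of strings).
print(parse_qs(urlsplit(url).query))
```

With `requests`, the same dictionary can be passed as `requests.get(SEARCH_URL, params=params)`, which performs the encoding for you.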

The iframe will be useful to grab a preview of a song and play it in the notebook.

In [None]:
IFrame("https://p.scdn.co/mp3-preview/f7913ebb647d47835c34fa4db7e889c8a87c6d10",
       width=300, height=50)

## Exercises

### Exercise 1

`568ZhdwyaiCyOGJRtNYhWf` is the Spotify Artist ID for *Deep Purple*

- Create a function to retrieve and print the following information from Spotify:
 - Name of the artist
 - Popularity
 - Number of followers
 - All images of width lower than 200
- Apply this function to *Deep Purple* and *The Beatles*
- Test your code with other artists and invalid IDs (randomly change characters or change the length of the string). Which HTTP error codes do you have to deal with?

If needed, consult https://developer.spotify.com/web-api/get-artist/ to see the properties returned in the JSON object.

In [None]:
# Your code / answers here

### Exercise 2

iTunes also has an API, documented at https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api

`135532` is the iTunes artist ID of *Deep Purple*

* Query the iTunes Search API to retrieve details about Deep Purple
* Print all fields present in the reply


In [None]:
# Your code / answers here

### Exercise 3

Given a Spotify artist ID, we can retrieve the list of albums available for this artist.

This new API endpoint is documented at: https://developer.spotify.com/web-api/get-artists-albums/

For *The Beatles*:
* retrieve the list of all albums available in France
* store it into a python list for later use
* print the first 10 albums ordered by release date (ascending)
* print the name of the albums alongside the smallest image associated

*Note: the endpoint implements paging. Pay attention to the `offset` and `limit` parameters; you may need to query the API several times to fetch all albums.*
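
As a hint, the general shape of an offset/limit loop can be sketched without touching the network. Here `fetch_page` is a hypothetical callable standing in for one API request, and the in-memory `FAKE_ALBUMS` list is made-up data; in the exercise, the callable would issue a request with the `offset` and `limit` parameters and return the items of that page.

```python
def fetch_all(fetch_page, limit=50):
    """Collect every item by calling fetch_page(offset, limit) until a short page comes back."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:  # a short (or empty) page means nothing is left
            break
        offset += limit
    return items


# Stand-in for the real API: 120 made-up album names served in slices.
FAKE_ALBUMS = ['album %03d' % i for i in range(120)]


def fake_fetch_page(offset, limit):
    """Return one page of the fake data, like a paged endpoint would."""
    return FAKE_ALBUMS[offset:offset + limit]


albums = fetch_all(fake_fetch_page, limit=50)
print(len(albums))
```

Stopping on a short page is one possible design; a paged response may also report a total count that could be used as the stopping condition instead.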

In [None]:
# your code / answers here

### Exercise 4

Using the Spotify [Search API](https://developer.spotify.com/web-api/search-item/):

* Retrieve all tracks having the following properties:
 * *market*: France
 * *artist name*: contains the string "*marley*"
* Print how many tracks were retrieved
* Find the 10 most popular tracks and print
 * The artist's name and Spotify ID
 * The name of the track
 * The popularity of the track
 * An image (if one is associated with the track)
* Identify and print all the distinct artists for the retrieved tracks with their respective count of tracks, ordered by count of tracks
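
As a hint for the last point, counting occurrences and sorting by count can be sketched with `collections.Counter`. The track records below are made-up placeholders with a simplified structure; the real search results are nested differently, so the field access will need adapting.

```python
from collections import Counter

# Made-up track records; only the 'artist' field matters for this sketch.
tracks = [
    {'name': 'track a', 'artist': 'artist 1'},
    {'name': 'track b', 'artist': 'artist 2'},
    {'name': 'track c', 'artist': 'artist 1'},
    {'name': 'track d', 'artist': 'artist 3'},
    {'name': 'track e', 'artist': 'artist 1'},
    {'name': 'track f', 'artist': 'artist 2'},
]

# Count how many tracks each artist has.
counts = Counter(track['artist'] for track in tracks)

# most_common() yields (artist, count) pairs ordered by decreasing count.
for artist, count in counts.most_common():
    print('%s: %d' % (artist, count))
```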

In [None]:
# your code / answers here