# API Advanced (oDCM)

*In practice, most APIs require user authentication to get started. Each and every API has its own workflow to generate a kind of "password" that you need to include in your requests. In this tutorial, we'll have a look at how this works for the Spotify API.*

--- 

## Learning Objectives

Students will be able to: 
* Obtain authentication credentials and tokens, check for a valid connection, and renew tokens if expired. 
* Apply a multitude of filters, possibly from multiple endpoints, to narrow down search requests
* Iterate over a variety of API search pages 
* Read API documentation independently


--- 

## Acknowledgements
This course draws on a variety of online resources that can be retrieved from the [course website](https://odcm.hannesdatta.com/docs/about/).

--- 

## Contact
For technical issues outside of scheduled classes, please check the [support section](https://odcm.hannesdatta.com/docs/course/support) on the course website.

## 1. Authentication

### 1.1 Client Key & Client Secret 
**Importance**  
As you may remember, the `icanhazdadjoke` and `Reddit` APIs can be used right out of the box. They did not require you to create an account, log in with your credentials, or provide any information about yourself. In this tutorial, we will request data from the Spotify Web API, which takes a little more preparation. 

First, you need to [sign up](https://www.spotify.com/us/signup/) for a Spotify user account (Premium or free) if you do not already have one. Second, you log in to the [developer portal](https://developer.spotify.com/dashboard/applications) of Spotify and create a new app (you can give it any name and description you want). Third, you take note of the `Client ID` and `Client Secret` (we'll need those later on!). 


<img src="images/Spotify_credentials.gif" align="left" width=70%/>

**Let's try it out!**  
Follow the steps above and assign the client key and secret you obtained to the variables below.

In [None]:
# your Spotify App credentials
client_id = "60f45fe73bef4bfbb7549dde2b02cab5"
client_secret = "6855d391f816439dbbeb54b997708efe"

Next, the authorization guide in the [API documentation](https://developer.spotify.com/documentation/general/guides/authorization-guide/) tells us that the request requires a so-called Base64 encoded string that contains the client ID and client secret, in the format `Authorization: Basic <base64 encoded client_id:client_secret>`.

Base64 is the standard way to pack these credentials into a request header. We start out with an f-string that joins the `client_id` and `client_secret` variables with a colon. Thereafter, we encode this string into Base64 using the `b64encode` function of the `base64` module. 

In [407]:
import base64
client_creds = f"{client_id}:{client_secret}"
print(f"f-string: {client_creds}")

client_creds_encoded = base64.b64encode(client_creds.encode())
print(f"Base64 encoded: {client_creds_encoded}")

f-string: 60f45fe73bef4bfbb7549dde2b02cab5:6855d391f816439dbbeb54b997708efe
Base64 encoded: b'NjBmNDVmZTczYmVmNGJmYmI3NTQ5ZGRlMmIwMmNhYjU6Njg1NWQzOTFmODE2NDM5ZGJiZWI1NGI5OTc3MDhlZmU='


Keep in mind that Base64 is an encoding, not encryption: anyone who gets their hands on `client_creds_encoded` can decode it and recover your `client_id` and `client_secret`. What protects your credentials in transit is the HTTPS connection to the API, so treat the encoded string as carefully as the secret itself. The API, in turn, decodes the string to verify your credentials. 
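
A quick round trip illustrates what Base64 does and does not do. The sketch below uses made-up placeholder credentials (`example_id` and `example_secret` are not real keys) and shows that `b64decode` reverses `b64encode` without needing any key or password.

```python
import base64

# Placeholder credentials, made up for this illustration (not real keys)
example_id = "my-client-id"
example_secret = "my-client-secret"

encoded = base64.b64encode(f"{example_id}:{example_secret}".encode())

# Decoding needs no key or password: b64decode simply reverses b64encode
decoded = base64.b64decode(encoded).decode()
recovered_id, recovered_secret = decoded.split(":", 1)

print(decoded)
```

In other words, the encoded string should never be shared or committed to a public repository.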

Finally, we turn the base64 encoded string into the requested format: 

In [439]:
token_headers = {
    "Authorization": f"Basic {client_creds_encoded.decode()}"
}

token_headers

{'Authorization': 'Basic NjBmNDVmZTczYmVmNGJmYmI3NTQ5ZGRlMmIwMmNhYjU6Njg1NWQzOTFmODE2NDM5ZGJiZWI1NGI5OTc3MDhlZmU='}

### 1.2 Access Tokens
**Importance**  
Once you have your `client_id` and `client_secret`, you need to exchange them for an access token: a temporary key associated with your account that expires after 60 minutes (3600 seconds). In practice, this means you need to request a new access token every once in a while. To obtain an access token, you make a POST request to the Spotify Accounts Service at the endpoint `https://accounts.spotify.com/api/token` and pass two additional arguments: `token_data` (the grant type) and `token_headers` (the encoded client ID and secret).

In [440]:
import requests

token_url = "https://accounts.spotify.com/api/token"

token_data = {
    "grant_type": "client_credentials"
}

r = requests.post(token_url, data=token_data, headers=token_headers)
token_response_data = r.json()

**Let's try it out!**  
Look up the `r.status_code` of your POST request. What does this tell you? Tip: have a look at the [response status codes](https://developer.spotify.com/documentation/web-api/)!
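
If you are unsure how to read a status code, Python's standard library can translate it for you. The sketch below is only an illustration: `describe_status` is a hypothetical helper, and the phrases come from the generic HTTP standard via `http.HTTPStatus`, not from Spotify's own documentation.

```python
from http import HTTPStatus

def describe_status(status_code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(status_code)
    # Codes in the 2xx range signal success; anything else needs attention
    outcome = "success" if 200 <= status_code < 300 else "problem"
    return f"{status_code} {status.phrase} ({outcome})"

for code in (200, 401, 429):
    print(describe_status(code))
```

In the notebook you could call `describe_status(r.status_code)` right after the POST request.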

The `r.json()` method returns a dictionary that contains the `access_token` we're after: 

In [441]:
token_response_data

{'access_token': 'BQA3xn9RiYlQMhV69Vg3Slnohfviiut3yED3tIQOofR4h79d_StF_pnY7vB1Ulw10dIuTN8k5rBpIbt06vo',
 'token_type': 'Bearer',
 'expires_in': 3600,
 'scope': ''}
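
Because the response reports `expires_in`, one option is to remember when a token was obtained and request a new one only shortly before it expires. The sketch below shows that idea under a few assumptions: `TokenCache`, `fake_fetch`, and the 60-second safety margin are hypothetical and not part of the Spotify API, and `fake_fetch` stands in for the POST request so the example runs without network access.

```python
import time

class TokenCache:
    """Hand out a cached access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin=60):
        self.fetch_token = fetch_token  # callable returning a token response dict
        self.clock = clock              # injectable clock, handy for testing
        self.margin = margin            # refresh this many seconds early
        self.token = None
        self.expires_at = 0.0

    def get(self):
        now = self.clock()
        if self.token is None or now >= self.expires_at - self.margin:
            response = self.fetch_token()
            self.token = response["access_token"]
            self.expires_at = now + response["expires_in"]
        return self.token


# Stand-in for the POST request, so the sketch runs offline
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])

first = cache.get()    # no token yet, so one is fetched
second = cache.get()   # still valid, so the cached token is reused
fake_now[0] = 3590.0   # pretend almost an hour has passed
third = cache.get()    # inside the safety margin, so a new token is fetched

print(first, second, third)
```

With a real request you would pass a function that performs the POST and returns `r.json()` instead of `fake_fetch`.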

**Let's try it out!**  
Store the access token of the `token_response_data` object into the `access_token` variable below. What happens to the access token once you make another POST request? 

In [412]:
access_token = ####

### 1.3 Endpoints
**Importance**  
Spotify collects large-scale data on multiple entities: artists, albums, playlists, tracks, not to mention all individual user-level data. These collections of data can be accessed through endpoints that prescribe the required parameters and the expected output. Each request URL consists of a base URL and an endpoint. For example, the base URL of the Spotify Web API is `https://api.spotify.com/v1` and the endpoint for retrieving information about a track from the Spotify catalog is `/tracks/{id}`. Taken together, an API request to `https://api.spotify.com/v1/tracks/2EqlS6tkEnglzr7tkKAAYD` returns track-level data (e.g., duration, popularity, artist) of `Come Together - Remastered 2009` by `The Beatles`. 
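
The way a request URL is assembled can be sketched in a few lines. `BASE_URL` and `build_url` are names made up for this illustration; the base URL matches the full request URL quoted above.

```python
BASE_URL = "https://api.spotify.com/v1"

def build_url(endpoint, **path_values):
    """Fill in an endpoint template such as '/tracks/{id}' and prepend the base URL."""
    return BASE_URL + endpoint.format(**path_values)

track_url = build_url("/tracks/{id}", id="2EqlS6tkEnglzr7tkKAAYD")
print(track_url)
```

The same helper works for any other endpoint template that contains an `{id}` placeholder.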

In a bit, we'll show you how to obtain this track-level `id`. For now, it's good to know that you can paste the identifier (`id`) into the search bar of Spotify to get to the song. For example, this is what `spotify:track:2EqlS6tkEnglzr7tkKAAYD` looks like: 

<img src="images/spotify_search.gif" align="left" width=60%/>

**Let's try it out!**
* Find the items associated with each of the following ids: `6oJ6le65B3SEqPwMRNXWjY` (track), `3fMbdgg4jU18AjLCKBhRSm` (artist), and `0IomjU2bXFng4LQBYn7Het` (album). Tip: depending on the type of collection, you may need to swap `track` for the respective collection you want to search for (e.g., `spotify:artist:{id}`). 

* What happens once you paste the API request URL (`https://api.spotify.com/v1/tracks/2EqlS6tkEnglzr7tkKAAYD`) into your browser? Why is that? 

Next, we create a function `renew_access_token()` that requests a new access token and returns it wrapped in a `headers` object. This way, we never have to worry about whether our access token has expired. 

Then, we make a request to the API endpoint associated with the track `Come Together - Remastered 2009`:

In [445]:
def renew_access_token(token_data=token_data, token_url=token_url, token_headers=token_headers): 
    # request a fresh access token with the encoded client credentials
    r = requests.post(token_url, data=token_data, headers=token_headers)
    token_response_data = r.json()
    access_token = token_response_data["access_token"]
    headers = {"Authorization": f"Bearer {access_token}"}
    return headers

r = requests.get("https://api.spotify.com/v1/tracks/2EqlS6tkEnglzr7tkKAAYD", headers=renew_access_token())
r.json()

As you can see, it returns a wide variety of information including the album (`Abbey Road (Remastered)`), artist (`The Beatles`), release date (26th of September 1969), total number of tracks on the album (`17`), duration (`259946` ms), popularity (`80`). 
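
Since the full response is long, it can help to pull out just the fields you care about. The sketch below works on `sample_track`, a trimmed-down stand-in for `r.json()` built from the values mentioned above. The exact `release_date` format is an assumption, and `summarize_track` is a hypothetical helper.

```python
# Trimmed-down stand-in for r.json(); a real response contains many more fields
sample_track = {
    "name": "Come Together - Remastered 2009",
    "artists": [{"name": "The Beatles"}],
    "album": {
        "name": "Abbey Road (Remastered)",
        "release_date": "1969-09-26",  # assumed date format
        "total_tracks": 17,
    },
    "duration_ms": 259946,
    "popularity": 80,
}

def summarize_track(track):
    """Collect a handful of track-level fields in a flat dictionary."""
    # Convert milliseconds to whole seconds, then split into minutes and seconds
    minutes, seconds = divmod(track["duration_ms"] // 1000, 60)
    return {
        "track": track["name"],
        "artists": ", ".join(artist["name"] for artist in track["artists"]),
        "album": track["album"]["name"],
        "release_date": track["album"]["release_date"],
        "duration": f"{minutes}:{seconds:02d}",
        "popularity": track["popularity"],
    }

summary = summarize_track(sample_track)
print(summary)
```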


Similarly, you could retrieve data from any of the following endpoints: 

| Endpoint | Usage | Returns | 
| :----- | :---- | :----- | 
| `/albums/{id}` | Get an album | Album name, total tracks, all separate tracks, release date |
| `/artists/{id}` | Get an artist | Artist, popularity, followers, and primary music genres |
| `/artists/{id}/related-artists` | Get an artist's related artists | A list of artists with a similar music repertoire |
| `/artists/{id}/albums` | Get an artist's albums | A list of albums from a given artist |
| `/audio-features/{id}` | Get audio features for a track | Music characteristics (e.g., `loudness`, `energy`, `speechiness`) |

**Exercise 1**  
You are asked to conduct a market analysis of the listening behavior of The Beatles fans. Using one or more of the endpoints above, compile a list of related artists the fans frequently listen to. Rank the artists in terms of their popularity (see [API documentation](https://developer.spotify.com/documentation/web-api/reference/artists/get-artist/)). How do The Beatles rank overall? Tip: don't mix up the artist, album, and track ids!

In [448]:
# solution
from operator import itemgetter

r = requests.get("https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2/related-artists", headers=renew_access_token())
responses = r.json()

artists = {}
for artist in responses["artists"]:
    name = artist['name']
    popularity = artist['popularity']
    artists[name] = popularity
    
sorted(artists.items(), key=itemgetter(1), reverse=True)
# only Elvis Presley, The Rolling Stones, and Paul McCartney are more popular than The Beatles

[('Elvis Presley', 84),
 ('The Rolling Stones', 82),
 ('Paul McCartney', 81),
 ('John Lennon', 80),
 ('The Beach Boys', 78),
 ('Bob Dylan', 77),
 ('Chuck Berry', 76),
 ('Eric Clapton', 76),
 ('Simon & Garfunkel', 75),
 ('Jimi Hendrix', 74),
 ('George Harrison', 71),
 ('The Kinks', 70),
 ('Roy Orbison', 68),
 ('Wings', 67),
 ('The Hollies', 64),
 ('Buddy Holly', 61),
 ('The Byrds', 60),
 ('Donovan', 60),
 ('Badfinger', 57),
 ('Ringo Starr', 55)]

**Exercise 2**   
A good friend of yours, a true Beatles fan for years, has asked you to take care of the music at his birthday party next week. In your search for tracks, you decide to consult the Spotify Web API and select the best dance numbers from the album `Abbey Road (Super Deluxe Edition)` to get the party going. Perform a comprehensive search query and argue which song should not be missed in any case. Give it a listen on Spotify, do you agree? 

In [450]:
# solution
def retrieve_album_ids(artist_id):
    r = requests.get(f"https://api.spotify.com/v1/artists/{artist_id}/albums", headers=renew_access_token())
    albums = r.json()

    albums_dict = {}

    for album in albums["items"]: 
        album_id = album["id"]
        name = album["name"]
        albums_dict[name] = album_id
        
    return albums_dict


def retrieve_song_id_names(album_id):
    r = requests.get(f"https://api.spotify.com/v1/albums/{album_id}", headers=renew_access_token())
    songs_album = r.json()
    song_name_ids = {}

    for song in songs_album["tracks"]["items"]: 
        song_name_ids[song["id"]] = song["name"]
    
    return song_name_ids


def retrieve_audio_features(song_name_ids, feature="danceability"):
    features_songs = {}
    
    for song_id, song_name in song_name_ids.items():
        r = requests.get(f"https://api.spotify.com/v1/audio-features/{song_id}", headers=renew_access_token())
        audio_features = r.json()
        features_songs[song_name] = audio_features[feature]
        
    return features_songs


# retrieve a list of all album ids for the Beatles    
albums_dict = retrieve_album_ids("3WrFJ7ztbogyGnTHbHJFl2")

# obtain song ids for Abbey Road (Super Deluxe Edition) album
song_name_ids = retrieve_song_id_names(albums_dict['Abbey Road (Super Deluxe Edition)'])

# obtain danceability scores for songs
danceability_songs = retrieve_audio_features(song_name_ids)

print(f"The song with the highest danceability score: {max(danceability_songs, key=danceability_songs.get)}")

The song with the highest danceability score: Maxwell's Silver Hammer - 2019 Mix


### 1.4 Multiple Query Parameters
**Importance**  
By now, you have probably experienced how time-consuming it can be to look up the `id` that belongs to a human-readable track or album name. Fortunately, there is a more efficient way: the search endpoint. As we can derive from the [documentation](https://developer.spotify.com/documentation/web-api/reference/search/search/), it requires both a search query (`q`) and an item type (`type`). For example, we can easily obtain the track id of `Come Together - Remastered 2009` as follows (note that spaces are encoded as `+` or as the hex code `%20`, and that the `q` and `type` parameters are separated by an `&` symbol): 

In [436]:
r = requests.get(f"https://api.spotify.com/v1/search?q=Come+Together+-+Remastered+2009&type=track", headers=renew_access_token())
search_request = r.json()
search_request

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=Come+Together+-+Remastered+2009&type=track&offset=0&limit=20',
  'items': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3WrFJ7ztbogyGnTHbHJFl2'},
       'href': 'https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2',
       'id': '3WrFJ7ztbogyGnTHbHJFl2',
       'name': 'The Beatles',
       'type': 'artist',
       'uri': 'spotify:artist:3WrFJ7ztbogyGnTHbHJFl2'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AT',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
      'CA',
      'CH',
      'CL',
      'CO',
      'CR',
      'CY',
      'CZ',
      'DE',
      'DK',
      'DO',
      'DZ',
      'EC',
      'EE',
      'EG',
      'ES',
      'FI',
      'FR',
      'GB',
      'GR',
      'GT',
      'HK',
      'HN',
      'HR',
      'HU',
      'ID',
      'IE',
      'I

**Let's try it out!**  
How many search results are there? Why is that? What's the difference between these results? 
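
The search URL above was encoded by hand. As a sketch of an alternative, Python's standard `urllib.parse` module can do the encoding for you; the variable names below are made up for this illustration.

```python
from urllib.parse import quote, quote_plus, urlencode

title = "Come Together - Remastered 2009"

plus_encoded = quote_plus(title)   # spaces become '+'
percent_encoded = quote(title)     # spaces become '%20'

# urlencode joins several parameters with '&' and encodes each value
query_string = urlencode({"q": title, "type": "track"})
search_url = f"https://api.spotify.com/v1/search?{query_string}"

print(plus_encoded)
print(percent_encoded)
print(search_url)
```

Both encodings describe the same query, so either form can be used in a request URL.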

---
The [documentation](https://developer.spotify.com/documentation/web-api/reference/search/search/) is a great resource to learn more about how to refine your search queries. In the table below, we have summarized these guidelines: 

| Technique | Example | Interpretation | 
| :---- | :------ | :------------ | 
| Quotation | `q='Come+Together+2009'` | Matches with `Come Together (2009)` but not with `Come Together Remastered (2009)` |
| Union | `q=2009+OR+Remastered+2009`| Matches with both `(2009)` and `Remastered (2009)`|
| Exclusion | `q=Come+Together+2009+NOT+Remastered` | Matches with `Come Together (2009)` but not with `Come Together Remastered (2009)` |
| Multiple queries | `q=track:Come+Together+artist:The+Beatles` | Matches with `Come Together` from `The Beatles` |
| Multiple types | `q=Come+Together&type=album,track` | Matches with both `albums` and `tracks` named `Come Together`|
| Genre | `q=Come+Together+genre:rock` | Matches with `rock` tracks named `Come Together` |
| Year | `q=Come+Together+year:2009` | Matches with tracks named `Come Together` from `2009` |
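
The field filters from the table can also be combined programmatically. The sketch below is one possible approach, not an official client: `build_search_query` is a hypothetical helper that joins free-text keywords with `field:value` filters, and `urlencode` from the standard library takes care of the encoding.

```python
from urllib.parse import urlencode

def build_search_query(keywords="", **field_filters):
    """Combine free-text keywords with field filters such as artist or year."""
    parts = [keywords] if keywords else []
    parts.extend(f"{field}:{value}" for field, value in field_filters.items())
    return " ".join(parts)

q = build_search_query("Come Together", artist="The Beatles", year=2009)

# safe=":," keeps colons and commas readable instead of percent-encoding them
query_string = urlencode({"q": q, "type": "album,track"}, safe=":,")

print(q)
print(query_string)
```

The resulting `query_string` can be appended to `https://api.spotify.com/v1/search?` in the same way as the hand-written queries above.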


**Exercise 3**  
Suppose that you have set yourself a goal to run half a marathon by the end of this year. Define an appropriate search strategy to find a collection of `workout` tracks aimed at runners that have been released this year. Since you don't want to continuously pick up your phone while running, the `album` should have listed at least 10 tracks. Note that a variety of solutions are possible here.

In [401]:
# solution
r = requests.get(f"https://api.spotify.com/v1/search?q=running+genre:workout+year:2021&type=track", headers=renew_access_token())
workout_tracks = r.json()

workout_albums = []

for workout_track in workout_tracks["tracks"]["items"]:
    if workout_track["album"]["total_tracks"] >= 10: 
        workout_albums.append(workout_track["album"]["name"])
        
print(workout_albums) # can you think of a plausible reason why there are so many duplicates? 

['Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)', 'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & W

In [402]:
# to avoid duplicates you may want to switch to a different data structure: a `set()`
# its most important characteristic is that it only stores unique values
# to add an item to a set you use `.add()` as opposed to `.append()` for lists

workout_albums_set = set()

for workout_track in workout_tracks["tracks"]["items"]:
    if workout_track["album"]["total_tracks"] >= 10: 
        workout_albums_set.add(workout_track["album"]["name"])
        
print(workout_albums_set) 

{'Happy Running Hits 2021 Workout Session (60 Minutes Non-Stop Mixed Compilation for Fitness & Workout 128 Bpm)'}
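
A set does not remember the order in which items were added. If the original order matters, one alternative (sketched here with made-up album names rather than real API output) is to use the keys of a dictionary, which are unique and keep their insertion order.

```python
# Made-up album names standing in for `workout_albums`
albums = ["Running Hits", "Gym Mix", "Running Hits", "Cardio Beats", "Gym Mix"]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_albums = list(dict.fromkeys(albums))
print(unique_albums)
```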


**Exercise 4**  
After listening to this running playlist for a while, you become more and more selective about the listed tracks. In particular, you find that although the rhythm of the tracks follows your ideal running pace (127-128 bpm), some of them lack a bit of energy. Hence, you decide to create a playlist yourself that only contains tracks with an `energy` level of at least `.8`. Pick one of the playlists from Exercise 3 and curate the selection of tracks that match your criterion. 

In [435]:
# retrieve album id
r = requests.get(f"https://api.spotify.com/v1/search?q=happy+running+hits+2021+workout+session&type=album", headers=renew_access_token())
album = r.json()
album_id = album["albums"]["items"][0]["id"]

# get songs for album id (see Exercise 2)
album_tracks = retrieve_song_id_names(album_id)

# get audio feature for tracks (see Exercise 2)
energy_tracks =  retrieve_audio_features(album_tracks, "energy")
selected_tracks = []
    
# check whether each track meets the energy criterion (at least .8)
for track, energy in energy_tracks.items():
    if energy >= .8: 
        selected_tracks.append(track)

print(selected_tracks)

['Good As Hell - Workout Remix 128 Bpm', 'Everything I Wanted - Workout Remix 128 Bpm', 'Into The Unknown - Workout Remix 128 Bpm', 'Adore You - Workout Remix 128 Bpm', 'Hot Girl Bummer - Workout Remix 128 Bpm', 'South Of The Border - Workout Remix 128 Bpm', 'Memories - Workout Remix 128 Bpm', 'Ritmo (Bad Boys For Life) - Workout Remix 128 Bpm', 'So Am I - Workout Remix 128 Bpm', 'Trampoline - Workout Remix 128 Bpm', 'Roxanne - Workout Remix 128 Bpm', 'Nice To Meet Ya - Workout Remix 128 Bpm', 'Lose Control - Workout Remix 128 Bpm']


### 1.5 Iterate over Pages


**Importance**  
Like the `icanhazdadjoke` API, the Spotify API only returns a subset of all search results at a time. For example, if you make a generic search request such as `q=come+together&type=track` you end up with thousands of results, including: `Come Together - Remastered 2009`, `Come Together - Live From Fox Theatre Detroit, MI/2012`, `Come Together - 2019 Mix`, and many more! 

By default, the Spotify API only returns the first 20 results. You can change this with the `limit` parameter (up to 50 results):

In [471]:
def number_results(limit):
    search_url = "https://api.spotify.com/v1/search?q=come+together&type=track"
    r = requests.get(search_url + f"&limit={limit}", headers=renew_access_token())
    search_results = r.json()
    print(f"Number of results for &limit={limit}: {len(search_results['tracks']['items'])}")

number_results(20)
number_results(50)

Number of results for &limit=20: 20
Number of results for &limit=50: 50


**Let's try it out!**  
What happens once you run `number_results(100)`? Are the first 20 results identical for `limit=20` and `limit=50`? 

At the very bottom of the search request, you find the following information: 

* `next`: The URL you need to request to get the next batch of results. Note that it looks very similar to the current URL; only the `offset` value has changed (i.e., it has been incremented by the value of `limit`).
* `offset`: Think of it as a starting index for the search results. For example, `offset=20` means: skip the first `20` results and return the next `limit` results.
* `previous`: Similar to `next`, but here the value of `limit` has been subtracted from `offset`. For example, if `offset=20` and `limit=20` for the current request, `previous` can be found at `offset=0`. 
* `total`: The total number of search results. Together with the `offset` value you can determine whether you have reached the final result. 
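
To see how `offset`, `limit`, and `total` fit together, here is a small worked example with made-up numbers (130 results, 50 per page). `page_offsets` is a hypothetical helper, not part of the API.

```python
import math

def page_offsets(total, limit):
    """Return the `offset` value of every request needed to cover `total` results."""
    return list(range(0, total, limit))

offsets = page_offsets(total=130, limit=50)
number_of_calls = math.ceil(130 / 50)

print(offsets)
print(number_of_calls)
```

The last request starts at the final offset in the list and returns whatever results remain.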

**Let's try it out!**  
Suppose that the search API returns 7094 results and you set `limit` equal to `50`. How many times do you need to make an API call to obtain all results? What's the `offset` value of the last API call in that case? 

Below we give an example of how to follow the `next` URL so that the loop keeps iterating over the search results until it has stored all track names and ids. First, we make our initial request to determine the total number of results (`total_results`). Second, we store the names and ids of the tracks in a list `track_names`. Third, we look up the `next_url` and check whether it exists (`None` would indicate this is the last page after all!). Fourth, we repeat until the number of items in `track_names` equals the total number of results. In other words, until we have stored all records!

In [535]:
def search_results(search_query):
    r = requests.get(search_query, headers=renew_access_token())
    return r.json()
    
track_names = []
results = search_results("https://api.spotify.com/v1/search?q=track:come+together+year:2020&type=track")
total_results = results['tracks']['total']

while len(track_names) < total_results: 
    track_names.extend([[track["name"], track["id"]] for track in results["tracks"]["items"]])
    next_url = results['tracks']['next']
    if next_url is None: 
        break  # last page reached; avoids adding the same page twice
    results = search_results(next_url)
        
print(track_names)

[['Come Together', '7DpfOkks38EfsrVcG9Zmhw'], ['Come Together', '2Vf7umz71NibHBgzU3sQav'], ['Come Together', '7n8sDrEcuMt0yezLDhIbnN'], ['Come Together', '170DYhXuUVDyuEZsLb0MBB'], ['Come Together - Mixed', '6xWDBHCxuP7OhCNF2sylKu'], ['Come Together', '75Y9iaqeq3y9cP4ecwnkqY'], ['Come Together', '7GA49BEANCELzwyBxQVxU1'], ['Come Together', '3tui2rMOT8HYr05PRK4S77'], ['We the People (Come Together)', '1iKAD3PTIsjfcw2AinyKVp'], ['Come Together - Extended Mix', '10TCB5AtmzLirlAHM0PzVi'], ['Come Together', '2OVNBbPqoktC11yqbCDgV3'], ['Come Together', '2PPzcXr4zU2XkXRquUdceG'], ['Come Together - Live / Ultimate Mix', '6K6QJTaOBZ9BhbavY9AzB0'], ['Rise/Come Together - Live', '6lv06xUOGsvdrY3CwrydCV'], ['Come Together', '0aITsSU1pXt3Tt3noutwzM'], ["Let's Come Together", '3bmyokSvoVZbWggzSWtnWD'], ['Come Together - Kevin McKay, Fhaken & Yo Land ViP Edit', '3w2EFCQEPlqelhP9RGOHt9'], ['Come Together', '5oOpqalnYqsoJfboSumBT4'], ['Come Get Me', '1AmYc2VeJgVgEQC5aJGTN1'], ['Come Together', '1flV5Tm
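
The same idea can be written as a reusable generator that simply follows `next` links until none are left. This is a sketch under assumptions: `iterate_items` is a hypothetical helper, and `fake_pages` is made-up data that mimics the shape of the search response so the example runs without network access.

```python
def iterate_items(first_url, fetch_page):
    """Yield every track, following `next` links until there are none left."""
    url = first_url
    while url is not None:
        page = fetch_page(url)["tracks"]
        yield from page["items"]
        url = page["next"]

# Made-up pages that mimic the shape of the search response
fake_pages = {
    "page-1": {"tracks": {"items": [{"name": "A"}, {"name": "B"}], "next": "page-2"}},
    "page-2": {"tracks": {"items": [{"name": "C"}], "next": None}},
}

names = [track["name"] for track in iterate_items("page-1", fake_pages.get)]
print(names)
```

In the notebook you could pass `search_results` as `fetch_page` and the search URL as `first_url`.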

**Exercise 5**  
Suppose that you'd listen to all tracks in `track_names` in one go. How long would it take you? Your code should still work if new tracks were added along the way.

In [541]:
# solution
# since our program should be future proof we cannot simply pass all track ids to `/tracks/id` 
# rather, we modify the code snippet above and store the `duration_ms` for each track in a list

track_duration = []
results = search_results("https://api.spotify.com/v1/search?q=track:come+together+year:2020&type=track")
total_results = results['tracks']['total']

while len(track_duration) < total_results: 
    track_duration.extend(track["duration_ms"] for track in results["tracks"]["items"])
    next_url = results['tracks']['next']
    if next_url is None: 
        break  # last page reached
    results = search_results(next_url)

print(f"The total duration is: {round(sum(track_duration)/1000/60/60, 2)} hours")

The total duration is: 43.77 hours


### 1.6 Wrap-Up

Good job - you've made it! We hope that working with various endpoints from the Spotify API has given you the confidence to explore other [endpoints](https://developer.spotify.com/documentation/web-api/reference/) on your own and - perhaps - has even sparked your interest in analyzing the online streaming market. As a suggestion, you may want to look into which tracks are listed on Spotify [playlists](https://developer.spotify.com/documentation/web-api/reference/playlists/), and which playlists are in turn [featured](https://developer.spotify.com/documentation/web-api/reference/browse/) on Spotify. If you're interested in the relevance of playlist curation, have a look at [this](https://www.youtube.com/watch?v=EbmCVRkmCAc) web lecture I recorded for the Universiteit van Nederland (in Dutch).

