# ETL through Spotify public API

# Assignment 1

This notebook contains a set of exercises that will guide you through the different steps of this assignment. Solutions need to be code-based, i.e. hard-coded or manually computed results will not be accepted. Remember to write your solutions to each exercise in the dedicated cells and to not modify the test cells. When you are done completing all the exercises submit this same notebook back to moodle in .ipynb format.

<div class="alert alert-success">The aim of this assignment is to create and save a dataset containing information about every song in a given playlist by requesting data from Spotify's API. You will then use this dataset during the Artificial Intelligence I course to train a predictive model.</div>

<div class="alert alert-danger"><b>Submission deadline:</b> Sunday, October 3rd, 23:55</div>

***Collaborated with Carla Sureda, Lee Yen Tang Cheng, Matilda Minarelli***

## Getting client credentials

Spotify's API uses OAuth as an Authentication scheme. Hence, before starting to make requests, you need to get your client credentials to the Spotify API. 

To do so, you need to have a Spotify account (free or paid). If you don't have one yet, please create a free account before moving on. Once you do, head over to Spotify for Developers, open your [Dashboard](https://developer.spotify.com/dashboard/) and log in with your account. 

![Dashboard](https://www.dropbox.com/s/cpfepk5fbq6ic5a/dashboard.png?raw=1)

Click on “CREATE AN APP”, choose a name and description for your project and work your way through the checkboxes. 

<img src="https://www.dropbox.com/s/afubgs4ar99uh80/app.png?raw=1" width="300">

Don't worry about the actual name and description. The only thing we are interested in is getting the credentials.

![Credentials](https://www.dropbox.com/s/3mmxxeet61nha4l/credentials.png?raw=1)

Once your App has been created, you should see a “Client ID” and “Client Secret” on the left-hand side. These two strings are your client credentials.

<div class="alert alert-info">Create two new variables, <b>client_id</b> and <b>client_secret</b>, that store your Client ID and Client Secret, respectively.</div>

In [1]:
# YOUR CODE HERE
client_id = '262a7bd7ced54a3f8c3897e598e214ef'
client_secret = '360701d35c8e4e6c87cef887a59e3ce1'

Great! We are good to go. Next step is getting an access token.

## Getting an access token

In order to access the various endpoints of the Spotify API, we need to pass an access token. 

To get one, we need to send a ```POST``` request with our client credentials. This request creates a token resource on the server, and the server responds with it. We can build this ```POST``` request using the ```requests``` library. Remember that this library provides a function for each of the HTTP methods available when interacting with an API. 

<div class="alert alert-info">Run the following cell to build your POST request</div>

In [2]:
import requests

# URL for token resource
auth_url = 'https://accounts.spotify.com/api/token'

# request body
params = {'grant_type': 'client_credentials',
          'client_id': client_id,
          'client_secret': client_secret}

# POST the request
auth_response = requests.post(auth_url, params).json()

<div class="alert alert-info">Retrieve your token from <b>auth_response</b> and save it in a new variable called <b>access_token</b>.</div>

In [3]:
# YOUR CODE HERE
access_token = auth_response['access_token']
#print(access_token)

This token is your golden ticket to access Spotify's API. A copy of this string is now stored in the server, so that every time you make a request to the API the server will check that the token you provide and the one it has in store match.
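The token does not last forever. As a rough sketch, and assuming the token response also carries an `expires_in` field holding the lifetime in seconds, you could record when the token is expected to stop working. The helper name `token_expiry_time` and the sample response below are illustrative and not part of the assignment.

```python
import time


def token_expiry_time(auth_response, issued_at=None):
    """Return the Unix timestamp at which the token is expected to expire.

    Assumes `auth_response` is the parsed JSON of the token request and
    contains an `expires_in` field (lifetime in seconds). Returns None if
    that field is missing.
    """
    if issued_at is None:
        issued_at = time.time()
    lifetime = auth_response.get('expires_in')
    if lifetime is None:
        return None
    return issued_at + lifetime


# Made-up response for illustration only; a real one comes from the POST request.
sample_response = {'access_token': 'not-a-real-token',
                   'token_type': 'Bearer',
                   'expires_in': 3600}
expires_at = token_expiry_time(sample_response, issued_at=1000.0)
```

Comparing `time.time()` against the stored timestamp before a long batch of requests is one way to decide whether to request a fresh token first.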

<img src="https://www.dropbox.com/s/hgb02k4h1mtdv22/header.png?raw=1" width="500">

As opposed to NASA's API, where we provided our API Key as part of the request body, Spotify's API expects you to include your access token in the request header. There is a specific header called 'Authorization' for this purpose. Providing this information is sometimes tricky. Hence, the header has already been formatted for you. 

<div class="alert alert-info">Run the following cell to save the header in a new variable so that you can use it later on.</div>

In [4]:
headers = {'Authorization': 'Bearer {token}'.format(token=access_token)}

## Poking around

Spotify's API provides numerous endpoints to access things like album listings, artist information, playlists, even Spotify-generated audio analysis of individual tracks, which include their time signature or measurements such as their “danceability” or "loudness". You can take a look at all the information available by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/). In this assignment you will use several of these endpoints.

In order to get a feel of how the API works, we will begin by making a ```GET``` request to the ```audio-features``` endpoint to extract data for a specific track. In particular, let's retrieve all the information for Radiohead's *Creep* song. 

The first thing you need is to identify the appropriate URL or path to direct your request to. The URLs for all Spotify API endpoints follow the same structure. They all use the base URL for the API and are then defined as a concatenation of ```base_url + endpoint```. Sometimes, you will also need to provide some additional information as part of the URL. In the case of ```audio-features```, however, the ```base_url``` and the ```endpoint``` name are enough.
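Plain string concatenation works as long as exactly one slash ends up between the pieces. The following is a minimal sketch of a helper that takes care of the slashes. The name `build_url` is hypothetical, and the example base URL is the same one used in this notebook.

```python
def build_url(base_url, *segments):
    """Join a base URL and path segments with exactly one slash between them."""
    parts = [base_url.rstrip('/')]
    parts.extend(str(segment).strip('/') for segment in segments)
    return '/'.join(parts)


example_base = 'https://api.spotify.com/v1/'
audio_features_url = build_url(example_base, 'audio-features')
```

The same helper also covers endpoints that need extra information in the path, for example `build_url(example_base, 'artists', some_id, 'top-tracks')`, where `some_id` stands for any artist id.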

The ```base_url``` is defined below:

In [5]:
base_url = 'https://api.spotify.com/v1/'

<div class="alert alert-info">Define the url for the audio-features endpoint by following the instructions above. Store it in a variable called <b>audio_features_endpoint</b>.</div>

In [6]:
# YOUR CODE HERE
endpoint = 'audio-features'
audio_features_endpoint = base_url + endpoint
#audio_features_endpoint

The next thing we need is to fill in the request parameters. If you check the documentation you'll see that the ```audio-features``` endpoint takes the following query parameters.

<img src="https://www.dropbox.com/s/s4zs6wlue0u16cu/body.png?raw=1" width="500">

Hence, the final thing you need to extract data about Radiohead's Creep song is to locate its ```id```. This is its unique identifier. Spotify has unique ids for tracks, for artists, for albums, for playlists, etc.

![Creep](https://www.dropbox.com/s/kufj6ww2yn069gb/creep.png?raw=1)

You can get the ```id``` for any song by going to Spotify, looking for the song, clicking the “…” by the song name, then “Share” and then “Copy Spotify URI”. 

<div class="alert alert-warning">Note that this procedure also works for retrieving ids for artists, albums or any other resource type.</div>

This URI should be a string that includes something like **spotify:track:**, followed by an alphanumeric sequence. This sequence is the ID you are looking for.
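If you would rather not copy the ID by hand, it can be split off the URI in code. This is a small sketch that assumes the three-part form `spotify:<type>:<id>` described above. The function name `spotify_id_from_uri` is made up for this example.

```python
def spotify_id_from_uri(uri):
    """Return the ID at the end of a URI such as 'spotify:track:<id>'."""
    parts = uri.split(':')
    # Expect exactly: the literal 'spotify', a resource type, and a non-empty id.
    if len(parts) != 3 or parts[0] != 'spotify' or not parts[2]:
        raise ValueError('not a Spotify URI: {!r}'.format(uri))
    return parts[2]


creep_uri = 'spotify:track:6b2oQwSGFkzsMtQruIWm2p'
creep_id = spotify_id_from_uri(creep_uri)
```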

<div class="alert alert-info">Create a new variable called <b>track_id</b> that stores the ID for Radiohead's song Creep.</div>

In [7]:
# YOUR CODE HERE
track_id = '6b2oQwSGFkzsMtQruIWm2p'

Now that we have the id, let's format the body of our request. As we did for NASA's API, we'll provide the body in dictionary form using a variable called *params*. Remember that the keys of this dictionary should correspond to the different query parameters defined in the documentation.
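To see what such a dictionary turns into, the sketch below encodes one by hand with the standard library. It only approximates what an HTTP client does with query parameters, and the variable names are illustrative.

```python
from urllib.parse import urlencode

# Keys are the query parameter names, values are what we want to send.
example_params = {'ids': '6b2oQwSGFkzsMtQruIWm2p'}

# Encode the dictionary as a query string and append it to the endpoint URL.
query_string = urlencode(example_params)
full_url = 'https://api.spotify.com/v1/audio-features' + '?' + query_string
```

Passing the dictionary to the request function is usually preferable to building the string yourself, since the encoding of special characters is then handled for you.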

<div class="alert alert-info">Create a dictionary called <b>params</b> that stores the body of your request. Make sure you format it in the right way.</div>

In [8]:
# YOUR CODE HERE
params = {'ids' : track_id}

Now that everything is ready, you can run the actual GET request to retrieve the data.

<div class="alert alert-info">Write the code to make your get request using the requests library. When doing so, remember to pass the <i>url</i>, the <i>headers</i> and the <i>params</i> dictionary as arguments to the <i>get</i> function. Convert the response to <i>json</i> format and store it in a new variable called <b>creep</b>.</div>

<div class="alert alert-warning">If you leave your notebook open for too long, your token might expire. When this happens, you will get an error {'error': {'status': 401, 'message': 'The access token expired'}} when making your request to the server. No worries. Just renew your token by executing the corresponding cell again and you should be good to go</div>
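A quick way to notice this situation in code is to inspect the parsed response before using it. The sketch below assumes the error payload has the shape shown in the warning above. The helper name `is_token_expired` is hypothetical, and note that a 401 status can also signal other authorization problems.

```python
def is_token_expired(payload):
    """Return True if a parsed JSON response reports a 401 error."""
    error = payload.get('error') if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get('status') == 401


# Sample payloads for illustration only.
expired_payload = {'error': {'status': 401, 'message': 'The access token expired'}}
ok_payload = {'audio_features': []}
```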

In [9]:
# YOUR CODE HERE
import requests
creep = requests.get(audio_features_endpoint, headers = headers, params = params).json()
#creep

You can run the following cell to check if you obtained the right answer. If you get no error when running the cell it means that you did it right. Otherwise, revise your code to ensure you get no error. You can run this cell as many times as you want, just **remember not to modify it**.

<div class="alert alert-danger"> Make sure you pass this test successfully before moving on to the remaining exercises.</div>

In [10]:
import unittest

tc = unittest.TestCase()
tc.assertEqual(creep, {'audio_features': [{'danceability': 0.515,
                                         'energy': 0.43,
                                         'key': 7,
                                         'loudness': -9.935,
                                         'mode': 1,
                                         'speechiness': 0.0369,
                                         'acousticness': 0.0102,
                                         'instrumentalness': 0.000141,
                                         'liveness': 0.129,
                                         'valence': 0.104,
                                         'tempo': 91.841,
                                         'type': 'audio_features',
                                         'id': '6b2oQwSGFkzsMtQruIWm2p',
                                         'uri': 'spotify:track:6b2oQwSGFkzsMtQruIWm2p',
                                         'track_href': 'https://api.spotify.com/v1/tracks/6b2oQwSGFkzsMtQruIWm2p',
                                         'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6b2oQwSGFkzsMtQruIWm2p',
                                         'duration_ms': 238640,
                                         'time_signature': 4}]})

Congrats! You just made your first successful request to Spotify's API! 

Feel free to take a look at the information included in the response. Pay special attention to the way in which information is presented. Once you are done, let's move on to some actual work!

## Getting data from a playlist

In the following exercise you will build a dataset containing data about different songs. You can either use a playlist of your own, or use the one we have created for this purpose. You can find our playlist in the following [link](https://open.spotify.com/playlist/4NVeFUEHBybfh3ITNG1b8n?si=js9BKt5aTOiCWMm_Cx4Vvg). If you prefer not to use ours, feel free to complete the exercise with a playlist of your choosing. 

<div class="alert alert-warning">If you use a playlist of your own, make sure it contains at least 15 different songs, preferably from different artists.</div>

<div class="alert alert-info"><b>Exercise 1 </b>Create a variable called <b>playlist_id</b> that stores the id of your playlist of choice.
    <br><i>[0.5 points]</i></div>

In [11]:
playlist_id = '4NVeFUEHBybfh3ITNG1b8n'

The following cells contain the tests that will grade your code. **Don't modify them**. Simply leave them as they are.

In [12]:
# LEAVE BLANK

In [13]:
# LEAVE BLANK

Next step will be making a request to get full details of the tracks included in your chosen playlist. Remember that you can take a look at all the information available at the different endpoints in Spotify's API by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/). Locate the right endpoint for your query and read the Docs to find out how to build your request. 

<div class="alert alert-info"><b>Exercise 2 </b>Write the code to retrieve all the items from your chosen playlist. When making your request don't use any of the optional arguments. Store the raw response in a new variable called <i>playlist_response</i>.
<br><i>[0.75 points]</i></div>

<div class="alert alert-warning">To complete this exercise you must use the requests library</div>

In [14]:
# YOUR CODE HERE
# Build the URL from the variables defined earlier instead of hard-coding the id.
# The playlist id is part of the path, so no query parameters are needed.
url_playlist = base_url + 'playlists/' + playlist_id + '/tracks'

playlist_response = requests.get(url_playlist, headers = headers)

The following cells contain the tests that will grade your code. **Don't modify them**. Simply leave them as they are.

In [15]:
# LEAVE BLANK

In [16]:
# LEAVE BLANK

In [17]:
# LEAVE BLANK

<div class="alert alert-info"><b>Exercise 3 </b>Convert the response to JSON format and update the variable <i>playlist</i>.<br><i>[0.25 points]</i></div>

In [18]:
# YOUR CODE HERE
playlist = playlist_response.json()
playlist

{'href': 'https://api.spotify.com/v1/playlists/4NVeFUEHBybfh3ITNG1b8n/tracks?offset=0&limit=100',
 'items': [{'added_at': '2020-10-11T08:39:57Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/niakha'},
    'href': 'https://api.spotify.com/v1/users/niakha',
    'id': 'niakha',
    'type': 'user',
    'uri': 'spotify:user:niakha'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb'},
       'href': 'https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb',
       'id': '4Z8W4fKeB5YxbusRsdQVPb',
       'name': 'Radiohead',
       'type': 'artist',
       'uri': 'spotify:artist:4Z8W4fKeB5YxbusRsdQVPb'}],
     'available_markets': [],
     'external_urls': {'spotify': 'https://open.spotify.com/album/6400dnyeDyD2mIFHfkwHXN'},
     'href': 'https://api.spotify.com/v1/albums/6400dnyeDyD2mIFHfkwHXN',
     'id': '6400dn

The following cells contain the tests that will grade your code. **Don't modify them**. Simply leave them as they are.

In [19]:
# LEAVE BLANK

Take your time to familiarize yourself with the data and how they are presented. Note that, by default, Spotify's API only returns information about a maximum of 100 tracks in a playlist. If your playlist of choice has more than 100 tracks, you'll retrieve the data only for the first 100 of them.
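For longer playlists the remaining tracks would have to be fetched page by page. The sketch below assumes each page of the response carries a `next` field with the URL of the following page (or `None` on the last page). The function name `collect_all_items` is made up, and `fetch_page` is a stand-in for whatever function performs the actual request, so the example runs offline on fake pages.

```python
def collect_all_items(first_page, fetch_page):
    """Gather the `items` of every page, following each page's `next` URL.

    `fetch_page` is a caller-supplied function that takes a URL and returns
    the parsed JSON of that page.
    """
    items = []
    page = first_page
    while page is not None:
        items.extend(page.get('items', []))
        next_url = page.get('next')
        page = fetch_page(next_url) if next_url else None
    return items


# Offline stand-in for the API: fake pages keyed by made-up URLs.
fake_pages = {
    'page-2': {'items': ['track 3'], 'next': None},
}
fake_first_page = {'items': ['track 1', 'track 2'], 'next': 'page-2'}
all_items = collect_all_items(fake_first_page, fake_pages.__getitem__)
```

With the `requests` library, `fetch_page` could be something like `lambda url: requests.get(url, headers=headers).json()`.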

<div class="alert alert-danger">Throughout the following exercises you may come across data that are missing. If so, encode these data using a <b>None</b>. </div>
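One possible way to handle missing data uniformly is a small helper that walks through the nested response and falls back to `None` when a step is absent. This is only a sketch: the name `dig` and the sample item are illustrative, and the real response contains many more fields.

```python
def dig(data, *path, default=None):
    """Follow `path` (dict keys or list indexes) through nested data.

    Returns `default` as soon as a step is missing or the current value is None.
    """
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
        if current is None:
            return default
    return current


# Made-up item with a missing album, shaped loosely like a playlist entry.
sample_item = {'track': {'name': 'Creep',
                         'artists': [{'name': 'Radiohead'}],
                         'album': None}}
```

For instance, `dig(sample_item, 'track', 'album', 'release_date')` yields `None` instead of raising an error, so the corresponding list still gets one entry per track.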

## Retrieving basic track information

In what follows, you are going to retrieve specific data for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 4 </b>Write the code to extract the title (in string form), the name of the album (in string form), the name of the artist (in string form), the duration (in integer form), the track number (in integer form), the release date (in string form), the popularity (in integer form), the id (in string form) and the number of available markets (in integer form) for all the tracks included in your chosen playlist. Store these data in separate lists called <i>title</i>, <i>album</i>, <i>artist</i>, <i>duration</i>, <i>track_number</i>, <i>release_date</i>, <i>track_popularity</i>, <i>track_id</i> and <i>n_available_markets</i>, respectively. In those cases where more than one value is available for these items, retain only the first. In all cases, entries should appear in the same order as they are presented in the playlist.<br><i>[1.5 points]</i></div>

<div class="alert alert-warning">Make sure you correctly set all variable names. They have to be written <b>exactly</b> as given in the instructions.</div>

In [20]:
# YOUR CODE HERE
title = [] #string
album = [] #string
artist = [] #string
duration = [] #int
track_number = [] #int
release_date = [] #string
track_popularity = [] #int
track_id = [] #string
n_available_markets = [] #int

for element in playlist['items']:
    track = element['track']
    album_info = track.get('album') or {}
    artists = track.get('artists') or []
    markets = track.get('available_markets') or []

    #look up each field directly so that every list gets exactly one entry per track
    title.append(track.get('name'))
    album.append(album_info.get('name'))
    #retain only the first listed artist
    artist.append(artists[0]['name'] if artists else None)
    duration.append(track.get('duration_ms'))
    track_number.append(track.get('track_number'))
    release_date.append(album_info.get('release_date'))
    track_popularity.append(track.get('popularity'))
    track_id.append(track.get('id'))
    #an empty market list is treated as missing data and stored as None
    n_available_markets.append(len(markets) or None)


#check data types
#type(title[1]),type(album[1]),type(artist[1]),type(duration[1]),type(track_number[1]),type(release_date[1]),type(track_popularity[1]),type(track_id[1]),type(n_available_markets[1])

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [21]:
# LEAVE BLANK

In [22]:
# LEAVE BLANK

In [23]:
# LEAVE BLANK

In [24]:
# LEAVE BLANK

In [25]:
# LEAVE BLANK

In [26]:
# LEAVE BLANK

In [27]:
# LEAVE BLANK

In [28]:
# LEAVE BLANK

In [29]:
# LEAVE BLANK

In [30]:
# LEAVE BLANK

In [31]:
# LEAVE BLANK

In [32]:
# LEAVE BLANK

In [33]:
# LEAVE BLANK

In [34]:
# LEAVE BLANK

In [35]:
# LEAVE BLANK

In [36]:
# LEAVE BLANK

In [37]:
# LEAVE BLANK

In [38]:
# LEAVE BLANK

In [39]:
# LEAVE BLANK

In [40]:
# LEAVE BLANK

In [41]:
# LEAVE BLANK

In [42]:
# LEAVE BLANK

In [43]:
# LEAVE BLANK

In [44]:
# LEAVE BLANK

In [45]:
# LEAVE BLANK

In [46]:
# LEAVE BLANK

In [47]:
# LEAVE BLANK

In [48]:
# LEAVE BLANK

In [49]:
# LEAVE BLANK

In [50]:
# LEAVE BLANK

In [51]:
# LEAVE BLANK

In [52]:
# LEAVE BLANK

In [53]:
# LEAVE BLANK

In [54]:
# LEAVE BLANK

In [55]:
# LEAVE BLANK

In [56]:
# LEAVE BLANK

## Retrieving additional track information

In what follows, you are going to retrieve additional data for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 5 </b>Write the code to extract data about the danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence and tempo for all the tracks included in your chosen playlist. Store these data in separate lists called <i>danceability</i> (in float form), <i>energy</i>, <i>loudness</i> (in float form), <i>mode</i> (in int form), <i>speechiness</i> (in float form), <i>acousticness</i> (in float form), <i>instrumentalness</i> (in int and float form), <i>liveness</i> (in float form), <i>valence</i> (in float form) and <i>tempo</i> (in float form), respectively. In those cases where more than one value is available for these items, retain only the first. In all cases, entries should appear in the same order as they are presented in the playlist.<br><i>[2 points]</i></div>
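Since each feature ends up in its own list, a single helper can build any of them. The sketch below works on made-up data shaped like the `audio_features` list of the response. It assumes, as a precaution, that an entry may be `None` when no features are available for a track. The name `feature_column` is hypothetical.

```python
def feature_column(audio_features, name):
    """Return one value per track for feature `name`, keeping playlist order.

    Entries that are None, or that lack the feature, yield None.
    """
    return [entry.get(name) if entry is not None else None
            for entry in audio_features]


# Made-up data for illustration; the second track has no features at all.
sample_features = [{'danceability': 0.515, 'tempo': 91.841},
                   None,
                   {'danceability': 0.7}]
```

Keeping a `None` placeholder for such tracks means every feature list stays aligned with the lists built in the previous exercise.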

In [57]:
# YOUR CODE HERE
#join all track ids into a single string, separating each element with a comma
track_ids_str = ','.join(track_id)

#use the string track_ids_str as input for params
url_playlist_5 = 'https://api.spotify.com/v1/audio-features'

params = {'ids' : track_ids_str}

response_5 = requests.get(url_playlist_5, params = params, headers = headers).json()
#check length of response
#len(response_5['audio_features'])
#response_5

In [58]:
iteration_list = response_5['audio_features']

In [59]:
danceability = [] #float
energy = [] #float
loudness = [] #float
mode = [] #int
speechiness = [] #float
acousticness = [] #float
instrumentalness = [] #int and float form
liveness = [] #float
valence = [] #float
tempo = [] #float

for item in iteration_list: 
    #extract danceability
    danceability.append(item['danceability'])
    
    #extract energy
    energy.append(item['energy'])
    
    #extract loudness
    loudness.append(item['loudness'])
    
    #extract mode
    mode.append(item['mode'])
    
    #extract speechiness
    speechiness.append(item['speechiness'])
    
    #extract acousticness
    acousticness.append(item['acousticness'])
    
    #extract instrumentalness 
    instrumentalness.append(item['instrumentalness'])
    
    #extract liveness
    liveness.append(item['liveness'])
    
    #extract valence
    valence.append(item['valence'])
    
    #extract tempo
    tempo.append(item['tempo'])

#print(danceability, '\n', energy, '\n', loudness, '\n',  mode, '\n',  speechiness, '\n',  acousticness, '\n',  instrumentalness, '\n',  liveness,  '\n', valence,  '\n', tempo)

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [60]:
# LEAVE BLANK

In [61]:
# LEAVE BLANK

In [62]:
# LEAVE BLANK

In [63]:
# LEAVE BLANK

In [64]:
# LEAVE BLANK

In [65]:
# LEAVE BLANK

In [66]:
# LEAVE BLANK

In [67]:
# LEAVE BLANK

In [68]:
# LEAVE BLANK

In [69]:
# LEAVE BLANK

In [70]:
# LEAVE BLANK

In [71]:
# LEAVE BLANK

In [72]:
# LEAVE BLANK

In [73]:
# LEAVE BLANK

In [74]:
# LEAVE BLANK

In [75]:
# LEAVE BLANK

In [76]:
# LEAVE BLANK

In [77]:
# LEAVE BLANK

In [78]:
# LEAVE BLANK

In [79]:
# LEAVE BLANK

In [80]:
# LEAVE BLANK

In [81]:
# LEAVE BLANK

In [82]:
# LEAVE BLANK

In [83]:
# LEAVE BLANK

In [84]:
# LEAVE BLANK

In [85]:
# LEAVE BLANK

In [86]:
# LEAVE BLANK

In [87]:
# LEAVE BLANK

In [88]:
# LEAVE BLANK

In [89]:
# LEAVE BLANK

In [90]:
# LEAVE BLANK

In [91]:
# LEAVE BLANK

In [92]:
# LEAVE BLANK

In [93]:
# LEAVE BLANK

In [94]:
# LEAVE BLANK

In [95]:
# LEAVE BLANK

In [96]:
# LEAVE BLANK

In [97]:
# LEAVE BLANK

In [98]:
# LEAVE BLANK

In [99]:
# LEAVE BLANK

## Retrieving artist information

In what follows, you are going to retrieve data about the artists for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 6 </b>Write the code to extract data about the total number of followers (in int form), the first listed genre (in string form) and the popularity (in int form) for the artists of all the tracks included in your chosen playlist. Store these data in separate lists called <i>artist_followers</i>, <i>genre</i> and <i>artist_popularity</i>. In cases where no genre is given, fill in the corresponding entry using a None.<br><i>[1.5 points]</i></div>
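Endpoints that accept several ids at once usually cap how many can be sent per request. As a sketch, assuming a limit of 50 ids per request for artists, the ids can be split into batches before joining them with commas. The helper name `chunked` and the example ids are made up.

```python
def chunked(values, size):
    """Split `values` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError('size must be at least 1')
    return [values[start:start + size] for start in range(0, len(values), size)]


# Hypothetical ids, only to show the shape of the result.
example_ids = ['id{}'.format(n) for n in range(120)]
batches = chunked(example_ids, 50)

# One comma-separated string per request.
id_strings = [','.join(batch) for batch in batches]
```

Unlike splitting the list in half, this approach keeps working when the playlist has more or fewer than 100 tracks.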

In [100]:
# YOUR CODE HERE
#find the artist IDs for the relevant artists
artist_ids = []

relevant_list = playlist['items']
for each_element in relevant_list:
    artist_ids.append(each_element['track']['artists'][0]['id'])

In [101]:
artist_ids_1 = []
artist_ids_2 = []

#split artist_ids into two halves (50 ids each for a 100-track playlist),
#so that each request stays within the 50-id limit of the artists endpoint
half_length = len(artist_ids) // 2

artist_ids_1 = artist_ids[:half_length]
artist_ids_2 = artist_ids[half_length:]

#unify all ids into 2 strings, separating the ids with commas
artist_ids_1_str = ','.join(artist_ids_1)
artist_ids_2_str = ','.join(artist_ids_2)

In [102]:
#retrieve first 50 artists
url_artists = 'https://api.spotify.com/v1/artists'

params = {'ids' : artist_ids_1_str}

first_50 = requests.get(url_artists, params = params, headers = headers).json()

In [103]:
#retrieve second 50 artists
params = {'ids' : artist_ids_2_str}

second_50 = requests.get(url_artists, params = params, headers = headers).json()
second_50

{'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2BpAc5eK7Rz5GAwSp9UYXa'},
   'followers': {'href': None, 'total': 598359},
   'genres': ['indie folk',
    'stomp and holler',
    'swedish americana',
    'swedish singer-songwriter'],
   'href': 'https://api.spotify.com/v1/artists/2BpAc5eK7Rz5GAwSp9UYXa',
   'id': '2BpAc5eK7Rz5GAwSp9UYXa',
   'images': [{'height': 640,
     'url': 'https://i.scdn.co/image/ab6761610000e5ebe790a3bccaa22e827475113b',
     'width': 640},
    {'height': 320,
     'url': 'https://i.scdn.co/image/ab67616100005174e790a3bccaa22e827475113b',
     'width': 320},
    {'height': 160,
     'url': 'https://i.scdn.co/image/ab6761610000f178e790a3bccaa22e827475113b',
     'width': 160}],
   'name': 'The Tallest Man On Earth',
   'popularity': 57,
   'type': 'artist',
   'uri': 'spotify:artist:2BpAc5eK7Rz5GAwSp9UYXa'},
  {'external_urls': {'spotify': 'https://open.spotify.com/artist/5INjqkS1o8h1imAzPqGZBb'},
   'followers': {'href': None, 'tota

In [104]:
#put the two lists of artist dictionaries back together
first_50_artists = first_50['artists']
second_50_artists = second_50['artists']

all_100_artists = first_50_artists + second_50_artists
all_100_artists
#len(all_100_artists)

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb'},
  'followers': {'href': None, 'total': 6445637},
  'genres': ['alternative rock',
   'art rock',
   'melancholia',
   'oxford indie',
   'permanent wave',
   'rock'],
  'href': 'https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb',
  'id': '4Z8W4fKeB5YxbusRsdQVPb',
  'images': [{'height': 640,
    'url': 'https://i.scdn.co/image/ab6761610000e5eba03696716c9ee605006047fd',
    'width': 640},
   {'height': 320,
    'url': 'https://i.scdn.co/image/ab67616100005174a03696716c9ee605006047fd',
    'width': 320},
   {'height': 160,
    'url': 'https://i.scdn.co/image/ab6761610000f178a03696716c9ee605006047fd',
    'width': 160}],
  'name': 'Radiohead',
  'popularity': 79,
  'type': 'artist',
  'uri': 'spotify:artist:4Z8W4fKeB5YxbusRsdQVPb'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/4CvTDPKA6W06DRfBnZKrau'},
  'followers': {'href': None, 'total': 745663},
  'genres': ['art pop',


In [105]:
artist_followers = [] #int 
genre = [] #string
artist_popularity = [] #int

for artist_info in all_100_artists:
    #retrieve artists' followers
    artist_followers.append(artist_info['followers']['total'])
    
    #retrieve artists' popularity
    artist_popularity.append(artist_info['popularity'])
    
    #check whether a genre is provided; an empty genre list would make
    #genres[0] raise an IndexError, so test the list itself and append None
    genres = artist_info['genres']
    if genres:
        genre.append(genres[0])
    else:
        genre.append(None)

    
#len(artist_followers), len(genre), len(artist_popularity)
#print(artist_followers, '\n', genre,'\n' ,artist_popularity)

[6445637, 745663, 549069, 6445637, 6445637, 1473368, 1473368, 1752572, 1752572, 737470, 326116, 312308, 562053, 562053, 31951841, 31951841, 32104887, 6630543, 10814125, 10814125, 10814125, 5192909, 5192909, 823890, 7181550, 4434389, 4434389, 4434389, 6445637, 3239460, 968760, 968760, 1282079, 1282079, 199045, 199045, 598359, 354654, 354654, 5150777, 2669291, 212252, 2669291, 893990, 779005, 893990, 473470, 1983582, 1341600, 1066353, 598359, 5150777, 191498, 15472234, 5079462, 1737458, 10814125, 3061747, 3061747, 3061747, 3061747, 893990, 950407, 1137207, 1137207, 1148321, 1281041, 5288343, 1907737, 1907737, 1907737, 1285944, 1750771, 965946, 1388411, 879303, 1956205, 1956205, 14113105, 3267426, 2290600, 271019, 12698277, 1585235, 1585235, 1504269, 1085285, 1515590, 1515590, 271019, 6651161, 7426092, 338556, 57884, 341970, 141322, 977066, 68129, 2221788, 154547] 
 ['alternative rock', 'art pop', 'alternative dance', 'alternative rock', 'alternative rock', 'electronica', 'electronica', '

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [106]:
# LEAVE BLANK

In [107]:
# LEAVE BLANK

In [108]:
# LEAVE BLANK

In [109]:
# LEAVE BLANK

In [110]:
# LEAVE BLANK

In [111]:
# LEAVE BLANK

In [112]:
# LEAVE BLANK

In [113]:
# LEAVE BLANK

In [114]:
# LEAVE BLANK

In [115]:
# LEAVE BLANK

In [116]:
# LEAVE BLANK

In [117]:
# LEAVE BLANK

In addition to the above, there's also additional information we can retrieve about each artist. For this purpose, let's first retrieve the list of distinct artists in our playlist.

<div class="alert alert-info"><b>Exercise 7 </b>Write the code to identify the list of unique artist ids that correspond to the different tracks in your chosen playlist. Store these data in a list called <i>unique_artist_id</i>.<br><i>[1 points]</i></div>
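Converting a list to a set removes duplicates but does not keep the original order. If the order of first appearance matters to you, one alternative is sketched below. It relies on dictionaries preserving insertion order (Python 3.7 and later). The function name `unique_in_order` and the example ids are illustrative.

```python
def unique_in_order(values):
    """Remove duplicates while keeping the first occurrence of each value."""
    return list(dict.fromkeys(values))


# Placeholder ids standing in for real artist ids.
example_artist_ids = ['a', 'b', 'a', 'c', 'b']
deduplicated = unique_in_order(example_artist_ids)
```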

In [118]:
# YOUR CODE HERE
#turn artist_ids into a set and then again into a list to remove all duplicates
unique_artist_id = list(set(artist_ids))

#test
#print(unique_artist_id)
#len(unique_artist_id)

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [119]:
# LEAVE BLANK

In [120]:
# LEAVE BLANK

In [121]:
# LEAVE BLANK

In [122]:
# LEAVE BLANK

We are now interested in retrieving catalog information about each artist’s top tracks. Spotify's API provides this information on a per-country basis. Here, we will retrieve the information corresponding to Spain, whose *ISO 3166-1 alpha-2* code is **ES**. The information we are looking for is stored under the ```top-tracks``` endpoint for ```artists```. Requests to this location retrieve the 10 most popular tracks for a given artist id.

<div class="alert alert-info"><b>Exercise 8 </b>Write the code to retrieve the 10 top tracks for each of the unique artists in your chosen playlist. Store these data in a dictionary called <i>top_tracks</i>. The keys of this dictionary should correspond to the unique artist ids stored in the list <i>unique_artist_id</i>. The values of this dictionary should include the names of the 10 most popular songs in a list.<br><i>[1 points]</i></div>

In [123]:
# YOUR CODE HERE
url_temp = 'https://api.spotify.com/v1/artists/' # add {id}
endpoint_top_tracks = '/top-tracks'

response_all_artists = {}

for each_id in unique_artist_id:
    url_top_tracks = url_temp + each_id + endpoint_top_tracks
    #the artist id is already part of the URL path, so only the market goes in params
    params = {'market' : 'ES'}
    response_all_artists[each_id] = requests.get(url_top_tracks, params = params, headers = headers).json()
    
#response_all_artists

In [124]:
top_tracks = {}

#map each artist id to the names of its own top tracks;
#slicing a flat list in steps of 10 would misalign artists that have fewer than 10
for each_id, response in response_all_artists.items():
    top_tracks[each_id] = [song['name'] for song in response['tracks']]

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [125]:
# LEAVE BLANK

In [126]:
# LEAVE BLANK

In [127]:
# LEAVE BLANK

We can now use this information to identify those songs in your chosen playlist that correspond to each artist's top tracks.

<div class="alert alert-info"><b>Exercise 9 </b>Write the code to learn whether the different tracks in your chosen playlist are included among the corresponding artist's top tracks. Store the results in a list called <i>is_top</i>. This list should include the entry 'yes' whenever the considered track is among the top tracks for that artist and not otherwise. Your code should only look for exact matches.<br><i>[1 point]</i> </div>

In [128]:
# YOUR CODE HERE
is_top = []

#compare each track only against the top tracks of its own (first listed) artist;
#artist_ids holds one artist id per track, in playlist order
for song, each_id in zip(title, artist_ids):
    if song in top_tracks[each_id]:
        is_top.append('yes')
    else: 
        is_top.append('no')

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [129]:
# LEAVE BLANK

In [130]:
# LEAVE BLANK

In [131]:
# LEAVE BLANK

In [132]:
# LEAVE BLANK

## Saving the data

The final step is to save the data you have collected to a csv file in order to be able to load it into your BigML account. You can do so by first saving the data to a pandas DataFrame and then exporting it to a csv file of your choosing, in the same directory where your notebook is located.

Run the following cell to save your DataFrame to a .csv file called 'spotify.csv'.

In what follows, you are going to store all the data you just retrieved in a convenient form.

<div class="alert alert-info"><b>Exercise 10 </b>Write the code to save the data about the title, the name of the album, the name of the artist, the duration, the track number, the release date, the popularity, the id, the number of available markets, the danceability, the energy, the loudness, the mode, the speechiness, the acousticness, the instrumentalness, the liveness, the valence and the tempo for all the tracks included in your chosen playlist, as well as the data about the total number of followers, the first listed genre, the popularity for the artists and whether the tracks are included in the top 10 in a dataframe called <i>df</i> and save this DataFrame to a .csv file called 'spotify.csv'. When creating the dataframe, make sure the column names are <b>exactly</b> the same as those of the lists you created in previous exercises to store the different values.<br><i>[0.5 points]</i></div>

In [133]:
# YOUR CODE HERE
import pandas as pd

dict_data = {'title':title, 'album':album, 'artist':artist, 'duration':duration, 'track_number':track_number, 
             'release_date':release_date, 'track_popularity':track_popularity, 'track_id':track_id, 'n_available_markets':n_available_markets, 
             'danceability':danceability, 'energy':energy, 'loudness':loudness, 'mode':mode, 'speechiness':speechiness, 
             'acousticness':acousticness, 'instrumentalness':instrumentalness, 'liveness':liveness, 'valence':valence, 
             'tempo':tempo, 'artist_followers':artist_followers, 'genre':genre, 'artist_popularity':artist_popularity, 'is_top':is_top}

#create pandas dataframe with all necessary variables
df = pd.DataFrame(dict_data)

#store dataframe in a .csv file (spotify.csv)
df.to_csv('spotify.csv')
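
By default, `to_csv` also writes the DataFrame's row index as an unnamed first column, which shows up as an extra column when the file is read back. The sketch below illustrates the difference on a small made-up DataFrame; `sample_df` and the file name `sample.csv` are hypothetical, and whether the index column is wanted depends on how the file is consumed later.

```python
import os
import tempfile

import pandas as pd

# Small made-up DataFrame standing in for df
sample_df = pd.DataFrame({'title': ['Song A', 'Song B'], 'tempo': [120.0, 98.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.csv')  # hypothetical file name

    # Default behaviour: the row index is written as an unnamed first column
    sample_df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False keeps only the data columns
    sample_df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))     # ['Unnamed: 0', 'title', 'tempo']
print(list(without_index.columns))  # ['title', 'tempo']
```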

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [134]:
# LEAVE BLANK

In [135]:
# LEAVE BLANK

In [136]:
# LEAVE BLANK

## Bonus exercise

In what follows, you are going to extract certain information from the dataframe you created.

<div class="alert alert-danger"><b>Bonus 1 </b>Write the code to find the most popular song in your playlist. If several songs have the same popularity, choose the one for which the artist has the most followers. Save the name of the song to a new variable called <i>most_popular</i>. This variable should <b>only</b> contain the name of the most popular song in string form.<br><i>[1 point]</i></div>

In [137]:
# YOUR CODE HERE
most_popular = ''
most_followers = 0

#find max value of popularity column
max_value = df['track_popularity'].max()
#select rows in which track_popularity equals the max_value found
most_pop_rows = df.loc[df['track_popularity'] == max_value]

#check whether most_pop_rows has more than 1 row
if len(most_pop_rows.index) == 1:
    #store in most_popular the title of the song in the only row 
    most_popular = most_pop_rows.iloc[0]['title']
elif len(most_pop_rows.index) > 1:
    #if most_pop_rows has more than 1 row, find the max value of artist_followers
    most_followers = most_pop_rows['artist_followers'].max()
    #retrieve the row of the song of the artist with the highest number of followers
    most_followers_row = most_pop_rows.loc[most_pop_rows['artist_followers'] == most_followers]
    #store in most_popular the title of the song of the most popular artist
    most_popular = most_followers_row.iloc[0]['title']
    
#print(most_popular)
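
The same tie-breaking rule can also be expressed as a single sort on two columns. This is an alternative sketch rather than the approach used in the cell above; it runs on a made-up DataFrame, and the names `sample_df`, `ranked` and `sample_most_popular` are hypothetical.

```python
import pandas as pd

# Made-up data: "Song B" and "Song C" tie on popularity
sample_df = pd.DataFrame({
    'title': ['Song A', 'Song B', 'Song C'],
    'track_popularity': [80, 95, 95],
    'artist_followers': [5000, 1000, 3000],
})

# Sort by popularity first, then by artist followers, both descending,
# so the first row is the most popular song with ties broken by followers
ranked = sample_df.sort_values(
    by=['track_popularity', 'artist_followers'],
    ascending=[False, False],
)

sample_most_popular = str(ranked.iloc[0]['title'])
print(sample_most_popular)  # Song C
```

Sorting handles the single-winner and tie cases with the same code path, at the cost of ordering the whole DataFrame instead of filtering it.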

The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [138]:
# LEAVE BLANK