<h1 style="text-align: center;"><strong>SPOTIFY API REQUEST PROJECT</strong></h1>

<p>This notebook is a practical introduction to the Spotify API. By following the steps, readers will learn how to authenticate their requests, make API calls, collect raw data, and later transform it, all using Python.</p>
<p>In the upcoming sections, the process will be broken down into simple, actionable steps. By the end, readers will gain a comprehensive understanding of Spotify&rsquo;s API and the confidence to explore other APIs to create diverse and powerful applications. Let&rsquo;s dive right in!</p>

<p><strong>Pre-requisites from the API</strong></p>
<ul>
<li>You have read the <a href="https://developer.spotify.com/documentation/web-api/concepts/authorization">authorization guide</a>.</li>
<li>You have created an app following the <a href="https://developer.spotify.com/documentation/web-api/concepts/apps">app guide</a>.</li>
</ul>

# IMPORTS

In [2]:
import requests
import json
import base64
import os
import sqlite3
import datetime

import pandas as pd
import numpy  as np

from sqlalchemy      import create_engine
from IPython.display import display, HTML

## HELPER FUNCTION

In [3]:
# increase the notebook cell size (optional)
display(HTML("<style>.container { width:75% !important; }</style>"))

## ENVIRONMENT VARIABLES

<p>For safety and privacy reasons, it is advised to store your keys as environment variables so that publishing your notebook doesn't expose sensitive data.
</p>
<p><strong>Remember both keys can be accessed in the Spotify App dashboard if you followed the <a href="https://developer.spotify.com/documentation/web-api/concepts/apps">app guide</a></strong></p>
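Since os.environ.get silently returns None when a variable is missing, a small guard can make the failure obvious before any request is sent. The sketch below assumes a hypothetical helper named require_env; it is only an illustration and not part of the Spotify documentation.

```python
import os

def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    # `environ` can be any mapping; it defaults to the real process environment
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set')
    return value

# example with an in-memory mapping instead of the real environment
fake_env = {'SPOTIFY_CLIENT_ID': 'my-id'}
print(require_env('SPOTIFY_CLIENT_ID', fake_env))
```

In the notebook, this would be called as require_env('SPOTIFY_CLIENT_ID') and require_env('SPOTIFY_CLIENT_SECRET').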

In [4]:
# loading our environment variables
client_id = os.environ.get('SPOTIFY_CLIENT_ID')
client_secret = os.environ.get('SPOTIFY_CLIENT_SECRET')

In [5]:
# # load your client keys
# client_id = <your-client-id-key>
# client_secret = <your-client-secret-key>

# REQUESTING ACCESS TOKEN FOR API

In [6]:
# check whether a bare request meets the API criteria
r = requests.post('https://accounts.spotify.com/api/token')

# Response 400 -> Bad Request - The request could not be understood by the server due to malformed syntax
r

<Response [400]>

<p>If the response received is 400, it indicates that the request does not meet the required parameters to access the API. In such cases, it's essential to refer to the documentation and provide the correct set of information. Specifically for this project, consult the <a href="https://developer.spotify.com/documentation/web-api/tutorials/client-credentials-flow">Client Credentials Flow</a> documentation for the necessary details.</p>
<p>To request authorization, follow these steps:</p>
<ul>
<li>Use the POST method with the base URL '<a href="https://accounts.spotify.com/" target="_new">https://accounts.spotify.com/</a>' and endpoint '/api/token'.</li>
<li>Set the grant_type parameter to 'client_credentials'.</li>
</ul>
<p>For the header parameters, ensure the following:</p>
<ul>
<li>Authorization should be in the format: Basic &lt;base64 encoded client_id:client_secret&gt;.</li>
<li>Set Content-type to 'application/x-www-form-urlencoded'.</li>
</ul>

In [7]:
# set the base url with the correct endpoint
url = 'https://accounts.spotify.com/api/token'

In [8]:
# utilize the correct parameters
data = {'grant_type': 'client_credentials'}

In [9]:
# client_id and client_secret encoded in Base64, as the API documentation requires
base64_auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()

In [10]:
# provide the correct headers following the documentation
headers = {'Authorization': 'Basic ' + base64_auth,
           'Content-Type' : 'application/x-www-form-urlencoded'}

In [11]:
# check if the API responds with the code 200 
r = requests.post(url=url, headers=headers, data=data)

r # OK - The request has succeeded

<Response [200]>

In [12]:
# verify the response of the API
#r.json()
r.json().keys()

dict_keys(['access_token', 'token_type', 'expires_in'])

## FUNCTION TO RETRIEVE THE ACCESS TOKEN

<p>Having successfully tested access to the Spotify API with the provided keys, it's easier to wrap the request in a function. Based on the JavaScript example in the <a href="https://developer.spotify.com/documentation/web-api/tutorials/client-credentials-flow">documentation</a>, the Python version below includes a clear description of its use and responses for both successful and failed attempts.</p>

In [13]:
def get_token(client_id, client_secret):
    """
    Get a Spotify API access token using the Client Credentials flow.

    Args:
        client_id (str): Spotify API client ID.
        client_secret (str): Spotify API client secret.

    Returns:
        str: Authorization header value in the format '<token_type> <access_token>',
            or None if the token could not be retrieved.
    """
    base64_auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()

    auth_options = {
        'url': 'https://accounts.spotify.com/api/token',
        'headers': {
            'Authorization': 'Basic ' + base64_auth,
            'Content-Type' : 'application/x-www-form-urlencoded'
        },
        'data': {
            'grant_type': 'client_credentials'
        }
    }

    response = requests.post(auth_options['url'], headers=auth_options['headers'], data=auth_options['data'])

    if response.status_code != 200:
        # return early: token and token_type are only defined for a successful response
        print(f'Failed to retrieve access token (status code {response.status_code})')
        return None

    r = response.json()
    token = r['access_token']
    token_type = r['token_type']
    token_duration = r['expires_in']
    print('Access Token requested successfully!')
    print(f'Token Type: {token_type}')
    print(f'Token duration: {token_duration} seconds')

    return f'{token_type} {token}'

In [117]:
# test the function
access_token = get_token(client_id, client_secret)

Access Token requested successfully!
Token Type: Bearer
Token duration: 3600 seconds
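Because the token is only valid for the reported duration, a long collection run may need a fresh one. The following is a minimal sketch of a cache that requests a new token shortly before the old one expires. TokenCache, its fetch callable, and the 60-second margin are hypothetical names and values chosen for illustration; fetch is assumed to return both the header value and the expires_in seconds, which get_token as written above does not do.

```python
import time

class TokenCache:
    """Caches an access token and fetches a new one shortly before it expires."""

    def __init__(self, fetch, clock=time.monotonic, margin=60):
        self._fetch = fetch      # callable returning (header_value, expires_in_seconds)
        self._clock = clock      # injectable clock, handy for testing
        self._margin = margin    # refresh this many seconds before expiry
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value, expires_in = self._fetch()
            self._expires_at = now + expires_in - self._margin
        return self._value

# demonstration with a fake fetch function and a manually controlled clock
calls = []

def fake_fetch():
    calls.append(1)
    return f'Bearer token-{len(calls)}', 3600

now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get()      # fetches a token
second = cache.get()     # reuses the cached token
now[0] = 3600.0          # pretend an hour has passed
third = cache.get()      # fetches again
print(first, second, third)
```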


# RETRIEVING DATA FROM THE API

<p>To initiate our analysis, let's start by selecting three specific genres: Blues, Rock, and Soul. Before proceeding, it's necessary to consult the documentation to identify the correct endpoints for the data we intend to access. For this project, our focus is on genres and their respective tracks.</p>
<p>First, refer to the <a href="https://developer.spotify.com/documentation/web-api/reference/get-recommendation-genres">Genres Reference</a> section in the documentation to confirm the availability of the chosen genres: Blues, Rock, and Soul. Once verified, access the specified URL to retrieve the seed data for these genres.</p>

## GENRES AVAILABLE

In [55]:
# url with the available genres
url = 'https://api.spotify.com/v1/recommendations/available-genre-seeds'

In [56]:
# if the request is correct it should return a code 200
response = requests.get(url, headers={'Authorization': access_token})
response

<Response [200]>

In [57]:
# check the response text or json
response.text

'{\n  "genres" : [ "acoustic", "afrobeat", "alt-rock", "alternative", "ambient", "anime", "black-metal", "bluegrass", "blues", "bossanova", "brazil", "breakbeat", "british", "cantopop", "chicago-house", "children", "chill", "classical", "club", "comedy", "country", "dance", "dancehall", "death-metal", "deep-house", "detroit-techno", "disco", "disney", "drum-and-bass", "dub", "dubstep", "edm", "electro", "electronic", "emo", "folk", "forro", "french", "funk", "garage", "german", "gospel", "goth", "grindcore", "groove", "grunge", "guitar", "happy", "hard-rock", "hardcore", "hardstyle", "heavy-metal", "hip-hop", "holidays", "honky-tonk", "house", "idm", "indian", "indie", "indie-pop", "industrial", "iranian", "j-dance", "j-idol", "j-pop", "j-rock", "jazz", "k-pop", "kids", "latin", "latino", "malay", "mandopop", "metal", "metal-misc", "metalcore", "minimal-techno", "movies", "mpb", "new-age", "new-release", "opera", "pagode", "party", "philippines-opm", "piano", "pop", "pop-film", "post-d

In [58]:
# check the response keys
response.json().keys()

dict_keys(['genres'])

**Funny to see that Brazil and Disney have their own genre**

In [59]:
# create a list with all the genres
spotify_genres = response.json()['genres']
len(spotify_genres)

126

In [60]:
# the example genres for this project in lowercase
example_genres = ['blues', 'rock', 'soul']

selected_genres = []

# checking if our genres are available in the genres list
for genre in example_genres: 
    if genre in spotify_genres: 
        selected_genres.append(genre)
        
selected_genres

['blues', 'rock', 'soul']
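The same check can also report which requested genres are not available, which helps when the list grows. This is a small sketch; split_genres is a hypothetical helper, and the available list below is a shortened stand-in for spotify_genres.

```python
def split_genres(requested, available):
    """Split requested genres into those available and those missing."""
    available_set = {g.lower() for g in available}
    # genres are compared in lowercase, the form used by the API response
    found = [g.lower() for g in requested if g.lower() in available_set]
    missing = [g for g in requested if g.lower() not in available_set]
    return found, missing

available = ['acoustic', 'blues', 'rock', 'soul']   # shortened stand-in for spotify_genres
found, missing = split_genres(['Blues', 'rock', 'soul', 'not-a-genre'], available)
print(found)
print(missing)
```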

## TRACK DATA

<p>Once the genres on the list are confirmed, let's explore the data from each track. In this case, the better choice is the <a href="https://developer.spotify.com/documentation/web-api/reference/search">Search for Item Reference</a>. This method allows filtering by genre and enables requests for information related to albums, artists, playlists, tracks, shows, episodes, or audiobooks.</p>
<p>While it's possible to use the <a href="https://developer.spotify.com/documentation/web-api/reference/get-several-tracks">Get Several Tracks</a> option from the API, it requires providing track IDs, with a maximum of 50 IDs per batch. However, this isn't possible at the moment as we don't yet have access to the track IDs.</p>

<p>In the&nbsp;<a href="https://developer.spotify.com/documentation/web-api/reference/search">Search for Item Reference</a> we can observe some of the request parameters, here is a summary of each:</p>
<ul>
<li><strong> q (string) [Required]</strong>: <br />Your search query. Can be narrowed down using field filters, including album, artist, track, year, upc, tag:hipster, tag:new, isrc, and genre. Each filter applies to specific result types.<br />- Artist and year filters apply to albums, artists, and tracks.<br />- Album filter applies to albums and tracks.<br />- Genre filter applies to artists and tracks.<br />- ISRC and track filters apply to tracks.<br />- UPC, tag:new, and tag:hipster filters are for albums. tag:new returns albums from the past two weeks, and tag:hipster filters the lowest 10% popularity albums.<br />Example value: "remaster%20track:Doxy%20artist:Miles%20Davis"</li>
<li><strong> type (array of strings) [Required]</strong>: <br />Comma-separated list of item types to search across. Results include hits from all specified item types: "album", "artist", "playlist", "track", "show", "episode", "audiobook".<br />Example: type=album,track returns both albums and tracks matching the query.</li>
<li><strong> market (string)</strong>: <br />ISO 3166-1 alpha-2 country code. Specifies the market for content availability. If a user access token is provided, the user's associated country takes priority. If neither market nor user country are provided, content is considered unavailable for the client.<br />Example value: "ES"</li>
<li><strong> limit (integer)</strong>: <br />Maximum number of results to return for each item type.<br />Example value: 10<br />Default value: 20<br />Range: 0 - 50</li>
<li><strong> offset (integer)</strong>: <br />Index of the first result to return. Used with limit to paginate through search results.<br />Example value: 5<br />Default value: 0<br />Range: 0 - 1000</li>
<li><strong> include_external (string)</strong>: <br />If include_external=audio is specified, it indicates the client can play externally hosted audio content. By default, externally hosted audio content is marked as unplayable in the response. Allowed value: "audio".</li>
</ul>

Examples of endpoints with the parameters above:
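One way to assemble such endpoints is to let the standard library encode the query string. The sketch below is an illustration, not the method used in the rest of this notebook; build_search_url is a hypothetical helper, and the range checks simply mirror the limit and offset ranges listed above.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.spotify.com/v1/search'

def build_search_url(q, item_type='track', market=None, limit=20, offset=0):
    """Build a Search endpoint URL, checking the documented ranges first."""
    if not 0 <= limit <= 50:
        raise ValueError('limit must be between 0 and 50')
    if not 0 <= offset <= 1000:
        raise ValueError('offset must be between 0 and 1000')
    params = {'q': q, 'type': item_type, 'limit': limit, 'offset': offset}
    if market is not None:
        params['market'] = market
    # urlencode percent-encodes reserved characters, e.g. ':' becomes '%3A'
    return f'{SEARCH_URL}?{urlencode(params)}'

print(build_search_url('genre:rock', market='BR'))
print(build_search_url('genre:soul', item_type='album,track', market='ES', limit=10, offset=5))
```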



<p>The following parameters are selected for this project:</p>
<ul>
<li><strong>q</strong> = genre</li>
<li><strong>type</strong> = track</li>
<li><strong>market</strong> = BR</li>
<li><strong>limit</strong> = 20</li>
<li><strong>offset</strong> = 0</li>
</ul>

In [118]:
# defining our endpoint based on the documentation
url = 'https://api.spotify.com/v1/search?q=genre:rock&type=track&market=BR&limit=20&offset=0'

In [119]:
# if the request is correct it should return a code 200
response = requests.get(url, headers={'Authorization': access_token})
response

<Response [200]>

In [120]:
# check the data in the full response
response.json()

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=genre%3Arock&type=track&market=BR&offset=0&limit=20',
  'items': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7Ln80lUS6He07XvHI8qqHH'},
       'href': 'https://api.spotify.com/v1/artists/7Ln80lUS6He07XvHI8qqHH',
       'id': '7Ln80lUS6He07XvHI8qqHH',
       'name': 'Arctic Monkeys',
       'type': 'artist',
       'uri': 'spotify:artist:7Ln80lUS6He07XvHI8qqHH'}],
     'external_urls': {'spotify': 'https://open.spotify.com/album/78bpIziExqiI9qztvNFlQu'},
     'href': 'https://api.spotify.com/v1/albums/78bpIziExqiI9qztvNFlQu',
     'id': '78bpIziExqiI9qztvNFlQu',
     'images': [{'height': 640,
       'url': 'https://i.scdn.co/image/ab67616d0000b2734ae1c4c5c45aabe565499163',
       'width': 640},
      {'height': 300,
       'url': 'https://i.scdn.co/image/ab67616d00001e024ae1c4c5c45aabe565499163',
       'width': 300},
      {'height': 64,
       'url': 'htt

In [121]:
# the keys we can access from the response
response.json().keys()

dict_keys(['tracks'])

In [122]:
# check the data in the full response with the tracks key
response.json()['tracks']

{'href': 'https://api.spotify.com/v1/search?query=genre%3Arock&type=track&market=BR&offset=0&limit=20',
 'items': [{'album': {'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7Ln80lUS6He07XvHI8qqHH'},
      'href': 'https://api.spotify.com/v1/artists/7Ln80lUS6He07XvHI8qqHH',
      'id': '7Ln80lUS6He07XvHI8qqHH',
      'name': 'Arctic Monkeys',
      'type': 'artist',
      'uri': 'spotify:artist:7Ln80lUS6He07XvHI8qqHH'}],
    'external_urls': {'spotify': 'https://open.spotify.com/album/78bpIziExqiI9qztvNFlQu'},
    'href': 'https://api.spotify.com/v1/albums/78bpIziExqiI9qztvNFlQu',
    'id': '78bpIziExqiI9qztvNFlQu',
    'images': [{'height': 640,
      'url': 'https://i.scdn.co/image/ab67616d0000b2734ae1c4c5c45aabe565499163',
      'width': 640},
     {'height': 300,
      'url': 'https://i.scdn.co/image/ab67616d00001e024ae1c4c5c45aabe565499163',
      'width': 300},
     {'height': 64,
      'url': 'https://i.scdn.co/image/ab67616d

In [123]:
# the keys we can access from the response tracks
response.json()['tracks'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

<p>Inside the 'tracks' key there are several other keys, each holding different data. Comparing the <a href="https://developer.spotify.com/documentation/web-api/reference/search">documentation</a> with the response shows that the 'items' key holds the track-related data, from which the fields of interest will be selected.</p>

In [127]:
# the keys available for each track in the items list
response.json()['tracks']['items'][0].keys()

dict_keys(['album', 'artists', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'is_playable', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])

In [67]:
# first item in the items key
response.json()['tracks']['items'][0]

{'album': {'album_type': 'album',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7Ln80lUS6He07XvHI8qqHH'},
    'href': 'https://api.spotify.com/v1/artists/7Ln80lUS6He07XvHI8qqHH',
    'id': '7Ln80lUS6He07XvHI8qqHH',
    'name': 'Arctic Monkeys',
    'type': 'artist',
    'uri': 'spotify:artist:7Ln80lUS6He07XvHI8qqHH'}],
  'external_urls': {'spotify': 'https://open.spotify.com/album/78bpIziExqiI9qztvNFlQu'},
  'href': 'https://api.spotify.com/v1/albums/78bpIziExqiI9qztvNFlQu',
  'id': '78bpIziExqiI9qztvNFlQu',
  'images': [{'height': 640,
    'url': 'https://i.scdn.co/image/ab67616d0000b2734ae1c4c5c45aabe565499163',
    'width': 640},
   {'height': 300,
    'url': 'https://i.scdn.co/image/ab67616d00001e024ae1c4c5c45aabe565499163',
    'width': 300},
   {'height': 64,
    'url': 'https://i.scdn.co/image/ab67616d000048514ae1c4c5c45aabe565499163',
    'width': 64}],
  'is_playable': True,
  'name': 'AM',
  'release_date': '2013-09-09',
  'release_date_precisio

<p>There is a lot of data to extract from here. The selected fields are:</p>
<ul>
<li>Track ID</li>
<li>Track Name</li>
<li>Track Duration</li>
<li>Artist Name</li>
<li>Album Name</li>
<li>Album Release Date</li>
<li>Popularity</li>
</ul>
<p>Let's see how we can access each one.</p>

In [68]:
# # creating a variable with the items data
# json_data_items = response.json()['tracks']['items'][0]

In [69]:
# # checking the items keys
# print(json_data_items.keys())

# # creating another variable with items keys
# items_keys = json_data_items.keys()

In [70]:
# track ID from the first item
response.json()['tracks']['items'][0]['id']

'5XeFesFbtLpXzIVDNQP22n'

In [71]:
# track name from the first item
response.json()['tracks']['items'][0]['name']

'I Wanna Be Yours'

In [72]:
# track duration in milliseconds from the first item
response.json()['tracks']['items'][0]['duration_ms']

183956

In [73]:
# artist name from the first item
response.json()['tracks']['items'][0]['artists'][0]['name']

'Arctic Monkeys'

In [74]:
# album name from the first item
response.json()['tracks']['items'][0]['album']['name']

'AM'

In [75]:
# album release date from the first item
response.json()['tracks']['items'][0]['album']['release_date']

'2013-09-09'

In [76]:
# popularity from the first item
response.json()['tracks']['items'][0]['popularity']

95

In [77]:
track_id = response.json()['tracks']['items'][0]['id']

Now let's build a DataFrame to visualize our data

In [78]:
# each feature selected from the API response is assigned to a variable
item = response.json()['tracks']['items'][0]  # parse the JSON once and reuse the first item
track_id = item['id']
track_name = item['name']
track_duration_ms = item['duration_ms']
artist_name = item['artists'][0]['name']
album_name = item['album']['name']
album_release_date = item['album']['release_date']
popularity = item['popularity']
genre = 'rock'  # the genre passed in the URL request at the beginning of this section

# create a new DataFrame to visualize the data
track_df = pd.DataFrame(
    {
    'track_id': [track_id],
    'track_name': [track_name],
    'track_duration_ms':[track_duration_ms],
    'artist_name': [artist_name],
    'album_name': [album_name],
    'album_release_date': [album_release_date],
    'popularity': [popularity],
    'genre': [genre]
    }
    )

In [79]:
print(f'There are {track_df.shape[1]} columns in this dataframe')
track_df

There are 8 columns in this dataframe


|  | track_id | track_name | track_duration_ms | artist_name | album_name | album_release_date | popularity | genre |
|---|---|---|---|---|---|---|---|---|
| 0 | 5XeFesFbtLpXzIVDNQP22n | I Wanna Be Yours | 183956 | Arctic Monkeys | AM | 2013-09-09 | 95 | rock |


## FUNCTION TO COLLECT THE DATA

<p>Now, all that is needed is a function to access the API and collect the data. As before, this function will include a clear description of its use and responses.</p>
<p><strong>This code is built step by step so that I can share how it was approached and my thought process during its construction.</strong></p>

<p><em>FYI: As I've learned the hard way, it's better to check first how the URL is constructed to understand the way the information needs to be provided. This ensures that the parameters are given in the correct order.</em></p>

### STEP 1 - API CALL

<p>Let's transform the code that was built to call the API into a function. Since it will be used with parameters that we determine, it is necessary to generalize the URL. When it's called, the parameters will be provided.</p>

In [80]:
def api_call(url, access_token):
    """
    Calls the Spotify API using a URL endpoint and access token to retrieve data in JSON format.

    Args:
        url (str): URL endpoint to access track data from the Brazilian Market. 
            Default URL provided - https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={offset}
        
        access_token (str): Spotify API access token provided with the use of the function get_token.

    Returns:
        dict: JSON object with the API response.
    """
    response = requests.get(url, headers={'Authorization': access_token})
    api_response = response.json()
    
    return api_response

In [81]:
# example
genre = 'soul'
limit = 5
offset = 0

# defining our endpoint based on the documentation
url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={offset}'

# if the request is correct it should print the json
api_call(url, access_token)

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=genre%3Asoul&type=track&market=BR&offset=0&limit=5',
  'items': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/4dpARuHxo51G3z768sgnrY'},
       'href': 'https://api.spotify.com/v1/artists/4dpARuHxo51G3z768sgnrY',
       'id': '4dpARuHxo51G3z768sgnrY',
       'name': 'Adele',
       'type': 'artist',
       'uri': 'spotify:artist:4dpARuHxo51G3z768sgnrY'}],
     'external_urls': {'spotify': 'https://open.spotify.com/album/0Lg1uZvI312TPqxNWShFXL'},
     'href': 'https://api.spotify.com/v1/albums/0Lg1uZvI312TPqxNWShFXL',
     'id': '0Lg1uZvI312TPqxNWShFXL',
     'images': [{'height': 640,
       'url': 'https://i.scdn.co/image/ab67616d0000b2732118bf9b198b05a95ded6300',
       'width': 640},
      {'height': 300,
       'url': 'https://i.scdn.co/image/ab67616d00001e022118bf9b198b05a95ded6300',
       'width': 300},
      {'height': 64,
       'url': 'https://i.scd

In [82]:
# saving the response in a variable
api_response = api_call(url, access_token)

In [83]:
# checking the response length to match the limit - OK
len(api_response['tracks']['items'])

5
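The api_call function above returns whatever JSON comes back, including error bodies. A more defensive variant could raise on HTTP errors and wait when the API answers with status 429 (rate limited). The sketch below is an assumption-laden illustration: api_call_checked is a hypothetical name, the get argument is expected to behave like requests.get, and the retry count and the use of the Retry-After header should be checked against the Spotify rate-limit documentation.

```python
import time

def api_call_checked(url, access_token, get, max_retries=3, sleep=time.sleep):
    """
    Variant of api_call that raises on HTTP errors and waits when rate limited.
    `get` is expected to behave like requests.get.
    """
    for attempt in range(max_retries + 1):
        response = get(url, headers={'Authorization': access_token})
        if response.status_code == 429 and attempt < max_retries:
            # wait for the number of seconds suggested by the server, then retry
            sleep(int(response.headers.get('Retry-After', 1)))
            continue
        response.raise_for_status()   # raises for any remaining 4xx/5xx status
        return response.json()
```

In the notebook it would be called as api_call_checked(url, access_token, get=requests.get).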

### STEP 2 - ITERATE TRACK NUMBER, GENRE, LIMIT AND OFFSET

<p>The starting point is the code that was developed for creating the DataFrame. Since the data will be collected from many tracks, the code needs to be generalized so that it can iterate.</p>
<p>To achieve this, the [0] after ['items'] is changed to [i], the position of each track in the response. A list of genres supplies the 'genre' value, and the limit dictates how many times the code iterates per page.</p>
<p>Additionally, response.json() is replaced with api_response to match the object returned by our api_call function.</p>

In [84]:
# example
selected_genres = ['blues', 'rock', 'soul']
limit = 5
offset = 0

# defining our endpoint based on the documentation
url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={offset}'

# if the request is correct it should print the json
api_response = api_call(url, access_token)

In [85]:
for genre in selected_genres:

    for i in range(limit):
        track_id = api_response['tracks']['items'][i]['id']
        track_name = api_response['tracks']['items'][i]['name']
        track_duration_ms = api_response['tracks']['items'][i]['duration_ms']
        artist_name = api_response['tracks']['items'][i]['artists'][0]['name']
        album_name = api_response['tracks']['items'][i]['album']['name']
        album_release_date = api_response['tracks']['items'][i]['album']['release_date']
        popularity = api_response['tracks']['items'][i]['popularity']
        genre = genre  # which we have informed in the url request in the beginning of this section
        # create a new DataFrame with the data to append
        track_df = pd.DataFrame([
            {
            'track_id': track_id,
            'track_name': track_name,
            'track_duration_ms':track_duration_ms,
            'artist_name': artist_name,
            'album_name': album_name,
            'album_release_date': album_release_date,
            'popularity': popularity,
            'genre': genre
            }], index=None
            )

In [86]:
track_df

|  | track_id | track_name | track_duration_ms | artist_name | album_name | album_release_date | popularity | genre |
|---|---|---|---|---|---|---|---|---|
| 0 | 1zwMYTA5nlNjZxYrvBB2pV | Someone Like You | 285240 | Adele | 21 | 2011-01-24 | 81 | soul |


<p>The code returns only one row because track_df is overwritten with a new single-row DataFrame on every iteration instead of accumulating rows. To fix this, it is easier to fill an empty list with dict objects and convert it to a DataFrame at the end. The code also ignores the genre list: the URL is built and the API is called once, before the loop, so every iteration reads the same 'soul' response.</p>
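The accumulate-then-convert idea can be tried without calling the API. The sketch below uses a hypothetical helper, extract_track, and a trimmed stand-in response built from the values shown earlier in this notebook; it only illustrates the pattern.

```python
def extract_track(item, genre):
    """Flatten one entry of response['tracks']['items'] into a plain dict."""
    return {
        'track_id': item['id'],
        'track_name': item['name'],
        'track_duration_ms': item['duration_ms'],
        'artist_name': item['artists'][0]['name'],
        'album_name': item['album']['name'],
        'album_release_date': item['album']['release_date'],
        'popularity': item['popularity'],
        'genre': genre,
    }

# trimmed stand-in for an API response, built from the values shown earlier
sample_response = {'tracks': {'items': [{
    'id': '5XeFesFbtLpXzIVDNQP22n',
    'name': 'I Wanna Be Yours',
    'duration_ms': 183956,
    'artists': [{'name': 'Arctic Monkeys'}],
    'album': {'name': 'AM', 'release_date': '2013-09-09'},
    'popularity': 95,
}]}}

rows = []
for item in sample_response['tracks']['items']:
    rows.append(extract_track(item, 'rock'))   # accumulate instead of overwriting

print(rows)
```

Passing rows to pd.DataFrame then yields one row per track.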

In [87]:
selected_genres = ['blues', 'rock', 'soul']
limit = 5
offset = 0
track_df = []

for genre in selected_genres:
    url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={offset}'
    api_response = api_call(url, access_token)
    
    for i in range(limit):
        track_id = api_response['tracks']['items'][i]['id']
        track_name = api_response['tracks']['items'][i]['name']
        track_duration_ms = api_response['tracks']['items'][i]['duration_ms']
        artist_name = api_response['tracks']['items'][i]['artists'][0]['name']
        album_name = api_response['tracks']['items'][i]['album']['name']
        album_release_date = api_response['tracks']['items'][i]['album']['release_date']
        popularity = api_response['tracks']['items'][i]['popularity']
        genre = genre

        track_df.append(
            {
            'track_id': track_id,
            'track_name': track_name,
            'track_duration_ms':track_duration_ms,
            'artist_name': artist_name,
            'album_name': album_name,
            'album_release_date': album_release_date,
            'popularity': popularity,
            'genre': genre
            }
            )
            
tracks_dataset = pd.DataFrame(track_df)

In [90]:
tracks_dataset

|  | track_id | track_name | track_duration_ms | artist_name | album_name | album_release_date | popularity | genre |
|---|---|---|---|---|---|---|---|---|
| 0 | 3dPQuX8Gs42Y7b454ybpMR | Seven Nation Army | 232106 | The White Stripes | Elephant | 2003-04-01 | 87 | blues |
| 1 | 63OFKbMaZSDZ4wtesuuq6f | Born To Be Wild | 210373 | Steppenwolf | Steppenwolf | 1968 | 76 | blues |
| 2 | 5G1sTBGbZT5o4PNRc75RKI | Lonely Boy | 193653 | The Black Keys | El Camino | 2011-12-06 | 80 | blues |
| 3 | 5MAK1nd8R6PWnle1Q1WJvh | I See Red | 230613 | Everybody Loves an Outlaw | I See Red | 2018-10-31 | 77 | blues |
| 4 | 6jHvX8ZnHKC1PnrPMJ0Emt | Cigarette Daydreams | 208760 | Cage The Elephant | Melophobia | 2013-10-08 | 77 | blues |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | 5ZTZL5UlpF3UZ8H7BhoI9N | Mr Magic (Through The Smoke) | 236504 | Amy Winehouse | Frank | 2003-10-20 | 56 | soul |
| 2996 | 2pMPWE7PJH1PizfgGRMnR9 | Bad Religion | 175453 | Frank Ocean | channel ORANGE | 2012-07-10 | 69 | soul |
| 2997 | 71XhXay6rKPZCVAaDtFlSR | Lost Ones | 333906 | Ms. Lauryn Hill | The Miseducation of Lauryn Hill | 1998-08-25 | 66 | soul |
| 2998 | 5xRP5iyVdGglqlY4Vcjhkx | Sinnerman | 622000 | Nina Simone | Pastel Blues | 1965-10-01 | 64 | soul |


<p>To page through the results, the offset parameter must be adjusted correctly. It works together with the limit: increasing the offset by the value of the limit moves to the next page of data. For each genre, the offset needs to be reset to its original value so that paging starts over. The pages variable sets how many pages are requested per genre, that is, how many times the inner loop runs.</p>
<p><span style="text-decoration: underline;"><em>To get the most out of the API, it's crucial to refer to the documentation. The API allows for a maximum of 50 tracks per page, and the maximum value for the offset is 1000. Therefore, for each genre, we can extract 20 pages, each containing 50 songs (20 x 50 = 1000). The largest dataset we can obtain theoretically, considering our code, will have 126000 tracks (126 genres x 1000 tracks limit).</em></span></p>
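The arithmetic above can be checked with a few lines of code. In this sketch, page_offsets is a hypothetical helper, and the cap of 1000 results per query is taken from the offset range quoted in this notebook; treating it as "offset plus limit may not exceed 1000" is an assumption.

```python
def page_offsets(limit, pages, start=0, max_results=1000):
    """Return the offset of each page, stopping at the assumed result cap."""
    offsets = []
    for page in range(pages):
        offset = start + page * limit
        if offset + limit > max_results:
            break
        offsets.append(offset)
    return offsets

offsets = page_offsets(limit=50, pages=20)
print(offsets[0], offsets[-1], len(offsets))   # 0 950 20
print(len(offsets) * 50)                       # 1000 tracks per genre
print(126 * len(offsets) * 50)                 # 126000 tracks across all genres
```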

In [91]:
selected_genres = ['blues', 'rock', 'soul']
limit = 50
offset = 0
pages = 20
offset_counter = offset
track_df = []

for genre in selected_genres:
    
    for page in range(pages):    
        url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={offset}'
        api_response = api_call(url, access_token)
        
        for i in range(limit):
            track_id = api_response['tracks']['items'][i]['id']
            track_name = api_response['tracks']['items'][i]['name']
            track_duration_ms = api_response['tracks']['items'][i]['duration_ms']
            artist_name = api_response['tracks']['items'][i]['artists'][0]['name']
            album_name = api_response['tracks']['items'][i]['album']['name']
            album_release_date = api_response['tracks']['items'][i]['album']['release_date']
            popularity = api_response['tracks']['items'][i]['popularity']
            genre = genre

            # append data to the empty list
            track_df.append(
                {
                'track_id': track_id,
                'track_name': track_name,
                'track_duration_ms':track_duration_ms,
                'artist_name': artist_name,
                'album_name': album_name,
                'album_release_date': album_release_date,
                'popularity': popularity,
                'genre': genre
                }
                )
        offset += limit
    offset = offset_counter
tracks_dataset = pd.DataFrame(track_df)

In [92]:
print(f'This dataset has {tracks_dataset.shape[0]} rows and {tracks_dataset.shape[1]} columns')
tracks_dataset

This dataset has 3000 rows and 8 columns


|  | track_id | track_name | track_duration_ms | artist_name | album_name | album_release_date | popularity | genre |
|---|---|---|---|---|---|---|---|---|
| 0 | 3dPQuX8Gs42Y7b454ybpMR | Seven Nation Army | 232106 | The White Stripes | Elephant | 2003-04-01 | 87 | blues |
| 1 | 63OFKbMaZSDZ4wtesuuq6f | Born To Be Wild | 210373 | Steppenwolf | Steppenwolf | 1968 | 76 | blues |
| 2 | 5G1sTBGbZT5o4PNRc75RKI | Lonely Boy | 193653 | The Black Keys | El Camino | 2011-12-06 | 80 | blues |
| 3 | 5MAK1nd8R6PWnle1Q1WJvh | I See Red | 230613 | Everybody Loves an Outlaw | I See Red | 2018-10-31 | 77 | blues |
| 4 | 6jHvX8ZnHKC1PnrPMJ0Emt | Cigarette Daydreams | 208760 | Cage The Elephant | Melophobia | 2013-10-08 | 77 | blues |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | 5ZTZL5UlpF3UZ8H7BhoI9N | Mr Magic (Through The Smoke) | 236504 | Amy Winehouse | Frank | 2003-10-20 | 56 | soul |
| 2996 | 2pMPWE7PJH1PizfgGRMnR9 | Bad Religion | 175453 | Frank Ocean | channel ORANGE | 2012-07-10 | 69 | soul |
| 2997 | 71XhXay6rKPZCVAaDtFlSR | Lost Ones | 333906 | Ms. Lauryn Hill | The Miseducation of Lauryn Hill | 1998-08-25 | 66 | soul |
| 2998 | 5xRP5iyVdGglqlY4Vcjhkx | Sinnerman | 622000 | Nina Simone | Pastel Blues | 1965-10-01 | 64 | soul |


### STEP 3 - CONSOLIDATE IN A FUNCTION

With everything in place, all that remains is to generalize the code and turn every required variable into a function parameter.

In [93]:
def tracks_dataset(genres, limit, offset, pages, access_token):
    """
    Collects track data per genre from the Spotify Search endpoint (Brazilian market).

    Args:
        genres (list): Genres to search for.
        limit (int): Number of tracks per page (maximum 50).
        offset (int): Index of the first result to return for each genre.
        pages (int): Number of pages to request per genre.
        access_token (str): Spotify API access token provided by the function get_token.

    Returns:
        pandas.DataFrame: One row per track with the selected features.
    """
    track_rows = []

    for genre in genres:

        for page in range(pages):
            # the offset of each page is derived from the page number, so no reset is needed per genre
            page_offset = offset + page * limit
            url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market=BR&limit={limit}&offset={page_offset}'
            api_response = api_call(url, access_token)

            # iterate over the returned items, since a page may hold fewer than `limit` tracks
            for item in api_response['tracks']['items']:
                track_rows.append(
                    {
                    'track_id': item['id'],
                    'track_name': item['name'],
                    'track_duration_ms': item['duration_ms'],
                    'artist_name': item['artists'][0]['name'],
                    'album_name': item['album']['name'],
                    'album_release_date': item['album']['release_date'],
                    'popularity': item['popularity'],
                    'genre': genre
                    }
                    )

    return pd.DataFrame(track_rows)

In [94]:
# example with selected genres
genres = ['alt-rock', 'alternative', 'brazil', 'blues', 'electro', 'heavy-metal', 'hip-hop', 'house', 'jazz', 'pop', 'reggae', 'rock', 'soul', 'techno', 'trance'] 

limit = 50
offset = 0
pages = 20
access_token = access_token

In [95]:
tracks_dataset(genres, limit, offset, pages, access_token)

Unnamed: 0,track_id,track_name,track_duration_ms,artist_name,album_name,album_release_date,popularity,genre
0,3nI0piSOxAik2RCpHGloB7,Só os Loucos Sabem,210493,Charlie Brown Jr.,Camisa 10 joga bola até na chuva,2009-09-16,75,alt-rock
1,2QjOHCTQ1Jl3zawyYOpxh6,Sweater Weather,240400,The Neighbourhood,I Love You.,2013-04-19,93,alt-rock
2,70dJEanFPdYuWZumkrnKeX,Ela Vai Voltar (Todos Os Defeitos De Uma Mulhe...,188200,Charlie Brown Jr.,Imunidade Musical,2005-01-01,73,alt-rock
3,1z1EeTXwnz3gvoUvzvkkdw,Céu Azul - Ao Vivo,200071,Charlie Brown Jr.,Música Popular Caiçara: Edição Luxo (Ao Vivo),2012-04-12,72,alt-rock
4,31AOj9sFz2gM0O3hMARRBx,Losing My Religion,268426,R.E.M.,Out Of Time (25th Anniversary Edition),1991-03-12,87,alt-rock
...,...,...,...,...,...,...,...,...
14995,7oM9y6V3ObY10VVauGOLOo,The Business,164000,Tiësto,House Music 2021,2021-10-01,46,trance
14996,6pyDHEzTTBaei42DKc7jOz,Hologram,459130,Blazy,Hologram,2022-11-25,29,trance
14997,1IAtShkPcFnvYQPp6Rfv3Q,Change The Formality,464026,Infected Mushroom,Vicious Delicious,2007-04-01,52,trance
14998,5CHtoMz3q6Zb87Sf3VeOcZ,Surrounded,231724,Azzura,Surrounded,2022-12-23,32,trance


<p>An error occurred when requesting all available data across the full list of genres. Troubleshooting showed that the code tried to read an item index that did not exist whenever a genre returned fewer tracks than the requested limit, because the loop assumed every page held exactly that many items. To fix this, the loop now iterates over the number of items actually returned, and it breaks when a page is empty or holds fewer items than the limit, since either case means the genre has no more results. A batch can therefore contain fewer than 50 tracks. This final iteration of the function also adds the markets parameter.</p>
<p><span style="text-decoration: underline;"><em>FYI: Specifying multiple markets multiplies the number of API requests. If the requests take longer than the token's lifetime (1 hour), the code will fail.</em></span></p>
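The stopping rule can be looked at in isolation from the Spotify-specific code. The block below is a minimal sketch, not the notebook's function: `collect_pages` and `fake_fetch` are hypothetical names, and the in-memory `catalog` stands in for a genre that has only 120 tracks.

```python
def collect_pages(fetch_page, limit, max_pages, start_offset=0):
    """Collect items page by page, stopping at an empty or short page."""
    items = []
    offset = start_offset
    for _ in range(max_pages):
        page = fetch_page(offset, limit)
        if not page:
            # nothing left for this query
            break
        items.extend(page)
        offset += limit
        if len(page) < limit:
            # a short page is the last page
            break
    return items


# hypothetical stand-in for the API: a genre with only 120 tracks
catalog = [f"track_{n}" for n in range(120)]


def fake_fetch(offset, limit):
    return catalog[offset:offset + limit]


tracks = collect_pages(fake_fetch, limit=50, max_pages=20)
print(len(tracks))  # 120
```

Only three requests are made here (50, 50, and 20 items), even though up to 20 pages were allowed.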

In [96]:
def tracks_dataset(genres, markets, limit, offset, pages, access_token):
    """
    Retrieves track data from the Spotify API based on specified genres, pagination parameters, and access token.

    Args:
        genres (list): A list of genre names for which tracks need to be fetched from the Spotify API.
        markets (list): A list of country codes for available markets on Spotify (value according to the ISO 3166-1 alpha-2 country code).
        limit (int): The maximum number of tracks to retrieve per API request.
        offset (int): The initial offset for pagination to retrieve tracks from the API.
        pages (int): The number of API pages to fetch for each genre.
        access_token (str): The access token required for authentication with the Spotify API.

    Returns:
        pandas.DataFrame: A pandas DataFrame containing the extracted track information, including track ID, 
        track name, duration, artist name, album name, album release date, popularity, genre, and market.

    Functionality:
    - Iterates through specified genres and pagination pages to fetch track data from the Spotify API.
    - Extracts relevant track information from the API response.
    - Combines the extracted data into a pandas DataFrame and returns the DataFrame to the caller.

    Note:
    - The function makes multiple API calls based on genres and pagination, processing the data into a DataFrame.
    - Uses the provided API endpoint URL format with placeholders for genre, limit, and offset.
    - Assumes the existence of an 'api_call' function for making API requests.
    - Requires proper handling of the access token using the 'get_token' function or a similar mechanism.
    """

    offset_counter = offset
    track_df = []
    
    for market in markets:
        for genre in genres:  
            for page in range(pages):    
                url = f'https://api.spotify.com/v1/search?q=genre:{genre}&type=track&market={market}&limit={limit}&offset={offset}'
                api_response = api_call(url, access_token)
                num_items = len(api_response['tracks']['items'])

                if num_items == 0:
                    break

                for i in range(num_items):
                    track_id = api_response['tracks']['items'][i]['id']
                    track_name = api_response['tracks']['items'][i]['name']
                    track_duration_ms = api_response['tracks']['items'][i]['duration_ms']
                    artist_name = api_response['tracks']['items'][i]['artists'][0]['name']
                    album_name = api_response['tracks']['items'][i]['album']['name']
                    album_release_date = api_response['tracks']['items'][i]['album']['release_date']
                    popularity = api_response['tracks']['items'][i]['popularity']
                    genre = genre
                    market = market

                    track_df.append({
                        'track_id': track_id,
                        'track_name': track_name,
                        'track_duration_ms': track_duration_ms,
                        'artist_name': artist_name,
                        'album_name': album_name,
                        'album_release_date': album_release_date,
                        'popularity': popularity,
                        'genre': genre,
                        'market': market
                    })

                offset += limit

                if num_items < limit:
                    break

            offset = offset_counter

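    # build the DataFrame only after every market has been processed;
    # returning inside the market loop would stop after the first market
    tracks_dataset = pd.DataFrame(track_df)
    return tracks_dataset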

In [97]:
# example with selected genres
genres = spotify_genres 
markets = ['BR']
limit = 50
offset = 0
pages = 20
access_token = access_token

In [98]:
tracks_df = tracks_dataset(genres, markets, limit, offset, pages, access_token)
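Because a run like this can last longer than the one-hour token lifetime, one option is to renew the token shortly before it expires instead of passing a fixed string. The block below is only a sketch of that idea: `TokenCache` and `fake_fetch_token` are hypothetical, the 60-second safety margin is an assumption, and the fetch callable would have to be replaced by whatever function requests the token in this notebook.

```python
import time


class TokenCache:
    """Hand out a cached token and renew it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_s=3600, margin_s=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning a new token string
        self._lifetime_s = lifetime_s    # assumed token lifetime in seconds
        self._margin_s = margin_s        # renew this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime_s
        return self._token


# demonstration with a fake clock and a fake token source
fake_now = [0.0]
issued = []


def fake_fetch_token():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1]


cache = TokenCache(fake_fetch_token, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 1800.0   # half an hour later: still valid
same = cache.get()
fake_now[0] = 3550.0   # inside the safety margin: renewed
renewed = cache.get()
print(first, same, renewed)  # token-1 token-1 token-2
```

To use something like this, the request loop would call `cache.get()` before each API call rather than receiving `access_token` once.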

In [99]:
tracks_df

Unnamed: 0,track_id,track_name,track_duration_ms,artist_name,album_name,album_release_date,popularity,genre,market
0,38jy6kRlPt8z1GUS9WXeNh,93 Million Miles,216386,Jason Mraz,Love Is a Four Letter Word,2012-04-13,69,acoustic,BR
1,3S0OXQeoh0w6AY8WQVckRW,I'm Yours,242946,Jason Mraz,We Sing. We Dance. We Steal Things. (Bonus Tra...,2008-05-01,79,acoustic,BR
2,5vjLSffimiIP26QG5WcN2K,Hold On,198853,Chord Overstreet,Hold On,2017-02-03,83,acoustic,BR
3,1EzrEOXmMH3G43AXT1y7pA,I'm Yours,242946,Jason Mraz,We Sing. We Dance. We Steal Things.,2008-05-12,81,acoustic,BR
4,2qLMf6TuEC3ruGJg4SMMN6,Lucky,189613,Jason Mraz,We Sing. We Dance. We Steal Things. (Bonus Tra...,2008-05-01,73,acoustic,BR
...,...,...,...,...,...,...,...,...,...
112995,6opp1KuZgwChOSKRX0e8KV,Escolho A Ti,200336,Planetshakers,Escolho A Ti,2019-10-11,18,world-music,BR
112996,3fdj319kbvmq4kRDGFdkAb,Holy Ground / Spontaneous - Live in Paris,746282,Jeremy Riddle,Holy Ground (Live Around the World),2020-03-13,37,world-music,BR
112997,2u8pYdBBiiQoBQFZ7fyx6X,Nobody But You,258393,New Life Worship,Nobody But You,2023-06-16,42,world-music,BR
112998,1XldqbigKtheiXxqCakzje,My Testimony (Live),291320,Elevation Worship,My Testimony (Live),2020-04-03,44,world-music,BR


## SAVE TO CSV OR SQLITE

The resulting DataFrame can be exported to a CSV file for easy reloading, or stored in a SQLite database for querying and analysis.

### CSV

In [100]:
# index=False keeps the row index out of the file, so reloading does not add an 'Unnamed: 0' column
tracks_df.to_csv('spotify_dataset.csv', index=False)
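Whether the row index is written affects how the file looks when it is read back. The sketch below illustrates this on a two-row sample DataFrame (made up for the example from rows shown earlier), using an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

sample = pd.DataFrame({
    "track_name": ["Seven Nation Army", "Born To Be Wild"],
    "popularity": [87, 76],
})

# default: the index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'track_name', 'popularity']
print(cols_without_index)  # ['track_name', 'popularity']
```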

### SQLITE

In [101]:
# check the data types
tracks_df.dtypes

track_id              object
track_name            object
track_duration_ms      int64
artist_name           object
album_name            object
album_release_date    object
popularity             int64
genre                 object
market                object
dtype: object

Most of the data types are correct. The only change needed is converting 'album_release_date' to a datetime type.

In [102]:
# format='mixed' (pandas >= 2.0) infers the format per value, since some albums only have year or year-month precision
tracks_df['album_release_date'] = pd.to_datetime(tracks_df['album_release_date'], format='mixed')
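To make the mixed precision concrete, the sketch below parses the three shapes a release date can take using only the standard library. `parse_release_date` is a hypothetical helper, not part of the notebook, and '2011-12' is an invented year-month value for illustration. Missing parts are padded with 1, which is also how a bare year is usually interpreted when parsed as a date.

```python
from datetime import date


def parse_release_date(value):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD', padding missing parts with 1."""
    parts = [int(p) for p in value.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected release date: {value!r}")
    year, month, day = parts + [1] * (3 - len(parts))
    return date(year, month, day)


for raw in ["2003-04-01", "2011-12", "1968"]:
    print(raw, "->", parse_release_date(raw).isoformat())
# 2003-04-01 -> 2003-04-01
# 2011-12 -> 2011-12-01
# 1968 -> 1968-01-01
```

Note that padding discards the original precision: after conversion, an album released "in 1968" is indistinguishable from one released on 1968-01-01.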

In [103]:
# check the data types again
tracks_df.dtypes

track_id                      object
track_name                    object
track_duration_ms              int64
artist_name                   object
album_name                    object
album_release_date    datetime64[ns]
popularity                     int64
genre                         object
market                        object
dtype: object

In [107]:
# create the table
create_table_spotify_tracks = """
    CREATE TABLE IF NOT EXISTS spotify_tracks(
        track_id                      TEXT,
        track_name                    TEXT,
        track_duration_ms              INT,
        artist_name                   TEXT,
        album_name                    TEXT,
        album_release_date        DATETIME,
        popularity                     INT,
        genre                         TEXT,
        market                        TEXT
    )
"""

# create the connection - if the database does not exist the command will create it
con = sqlite3.connect('spotify_tracks_db.sqlite')

# execute our query
con.execute(create_table_spotify_tracks)
con.commit()
con.close()

# insert data into the database
con = create_engine('sqlite:///spotify_tracks_db.sqlite')
tracks_df.to_sql('spotify_tracks', con=con, if_exists='append', index=False)

113000
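Running a load step like this twice can cause trouble: a plain `CREATE TABLE` raises an error when the table already exists, and appending again duplicates every row. The block below is a minimal sketch of one way to make the load repeatable with the standard `sqlite3` module. The table name `spotify_tracks_demo`, the reduced column set, and the composite key on (track_id, genre, market) are assumptions made for the example; whether that key is really unique in the full dataset would need to be checked.

```python
import sqlite3

rows = [
    ("3dPQuX8Gs42Y7b454ybpMR", "Seven Nation Army", "blues", "BR"),
    ("63OFKbMaZSDZ4wtesuuq6f", "Born To Be Wild", "blues", "BR"),
]


def load(con, rows):
    # IF NOT EXISTS makes the DDL safe to run again
    con.execute("""
        CREATE TABLE IF NOT EXISTS spotify_tracks_demo(
            track_id   TEXT,
            track_name TEXT,
            genre      TEXT,
            market     TEXT,
            PRIMARY KEY (track_id, genre, market)
        )
    """)
    # OR IGNORE skips rows whose key is already present
    con.executemany(
        "INSERT OR IGNORE INTO spotify_tracks_demo VALUES (?, ?, ?, ?)", rows
    )
    con.commit()


con = sqlite3.connect(":memory:")  # in-memory database for the demonstration
load(con, rows)
load(con, rows)  # second run neither fails nor duplicates
row_count = con.execute("SELECT COUNT(*) FROM spotify_tracks_demo").fetchone()[0]
print(row_count)  # 2
con.close()
```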

In [108]:
# querying the database
query = """
    SELECT * FROM spotify_tracks
"""

df_tracks_sqlite = pd.read_sql_query(query, con)
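Selecting every row is fine for a check, but one benefit of SQLite is that filtering and aggregation can happen in the database. The sketch below uses a small in-memory table with a few rows copied from the outputs above; the threshold `min_popularity` is a hypothetical parameter, passed through a `?` placeholder rather than string formatting.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spotify_tracks(track_name TEXT, popularity INT, genre TEXT)")
con.executemany(
    "INSERT INTO spotify_tracks VALUES (?, ?, ?)",
    [
        ("Seven Nation Army", 87, "blues"),
        ("Born To Be Wild", 76, "blues"),
        ("Lonely Boy", 80, "blues"),
        ("Bad Religion", 69, "soul"),
        ("Lost Ones", 66, "soul"),
    ],
)

min_popularity = 60
query = """
    SELECT genre, COUNT(*) AS n_tracks, AVG(popularity) AS avg_popularity
    FROM spotify_tracks
    WHERE popularity >= ?
    GROUP BY genre
    ORDER BY genre
"""
result = con.execute(query, (min_popularity,)).fetchall()
print(result)  # [('blues', 3, 81.0), ('soul', 2, 67.5)]
con.close()
```

The same SQL can be passed to `pd.read_sql_query`, which accepts the bound values through its `params` argument.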

In [109]:
df_tracks_sqlite

Unnamed: 0,track_id,track_name,track_duration_ms,artist_name,album_name,album_release_date,popularity,genre,market
0,38jy6kRlPt8z1GUS9WXeNh,93 Million Miles,216386,Jason Mraz,Love Is a Four Letter Word,2012-04-13 00:00:00.000000,69,acoustic,BR
1,3S0OXQeoh0w6AY8WQVckRW,I'm Yours,242946,Jason Mraz,We Sing. We Dance. We Steal Things. (Bonus Tra...,2008-05-01 00:00:00.000000,79,acoustic,BR
2,5vjLSffimiIP26QG5WcN2K,Hold On,198853,Chord Overstreet,Hold On,2017-02-03 00:00:00.000000,83,acoustic,BR
3,1EzrEOXmMH3G43AXT1y7pA,I'm Yours,242946,Jason Mraz,We Sing. We Dance. We Steal Things.,2008-05-12 00:00:00.000000,81,acoustic,BR
4,2qLMf6TuEC3ruGJg4SMMN6,Lucky,189613,Jason Mraz,We Sing. We Dance. We Steal Things. (Bonus Tra...,2008-05-01 00:00:00.000000,73,acoustic,BR
...,...,...,...,...,...,...,...,...,...
112995,6opp1KuZgwChOSKRX0e8KV,Escolho A Ti,200336,Planetshakers,Escolho A Ti,2019-10-11 00:00:00.000000,18,world-music,BR
112996,3fdj319kbvmq4kRDGFdkAb,Holy Ground / Spontaneous - Live in Paris,746282,Jeremy Riddle,Holy Ground (Live Around the World),2020-03-13 00:00:00.000000,37,world-music,BR
112997,2u8pYdBBiiQoBQFZ7fyx6X,Nobody But You,258393,New Life Worship,Nobody But You,2023-06-16 00:00:00.000000,42,world-music,BR
112998,1XldqbigKtheiXxqCakzje,My Testimony (Live),291320,Elevation Worship,My Testimony (Live),2020-04-03 00:00:00.000000,44,world-music,BR
