# 🎧 Scrapify notebook

Spotify generates and stores a lot of material to analyze, and fortunately the [API of Spotify](https://developer.spotify.com/) gives access to data about artists, tracks, playlists, users, and more.

This notebook explains, step by step, how to extract that data.

## 📌 Importing libraries

We are not going to use Spotipy, so we need to import these libraries and build the API calls from scratch.

In [1]:
import requests 
import pandas as pd
import numpy as np
import json
import warnings
warnings.simplefilter("ignore")

## 📌 Requirements: Adding your credentials

The first step, and a **very, very important** one, is to get your credentials: the Client ID and the Client Secret.

It's easy: go to the [Spotify for Developers Dashboard](https://developer.spotify.com/dashboard/login) and click Log in (a free or a premium account both work). Then create a project to see your IDs, and don't share them.
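One way to avoid pasting the credentials into the notebook (and sharing them by accident) is to read them from environment variables. This is a minimal sketch; the variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` and the helper `load_credentials` are hypothetical choices, not something Spotify requires.

```python
import os


def load_credentials(id_var="SPOTIFY_CLIENT_ID", secret_var="SPOTIFY_CLIENT_SECRET"):
    """Read the Client ID and Client Secret from environment variables.

    The default variable names are hypothetical; use whatever names you export.
    """
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        # Fail early with a clear message instead of sending empty credentials
        raise RuntimeError(f"Set {id_var} and {secret_var} before running the notebook")
    return client_id, client_secret
```

With the variables exported in your shell, `client_id, client_secret = load_credentials()` can replace the hard-coded placeholders.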

Spotify implements the **OAuth 2.0 authorization** framework, which defines four grant types. We are going to use the Client Credentials grant, so the only things needed to request authorization are the two credentials. This grant works for public data such as playlists, tracks, and artists, but not for user-specific endpoints.

Thus, we send a **POST request to /api/token** with our Client ID and Client Secret. The limitation is that this flow does not return a refresh token, so the access token expires after 3600 seconds (one hour) and has to be requested again.
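Spotify's authorization guide also shows the Client Credentials request with the credentials in an HTTP Basic `Authorization` header (base64 of `client_id:client_secret`) instead of in the form body. The sketch below only builds that request so it can be inspected; `build_token_request` is a hypothetical helper, and sending it would be `requests.post(TOKEN_URL, headers=headers, data=body)`.

```python
import base64
from urllib.parse import urlencode

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (headers, body) for a Client Credentials token request.

    The credentials travel in a Basic auth header; the form body only
    carries the grant type.
    """
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return headers, body
```

Both variants return the same JSON (`access_token`, `token_type`, `expires_in`); the header form just keeps the secret out of the request body.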

In [2]:
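def auth_spotify(client_id, client_secret):
    # Token endpoint of Spotify's accounts service (Client Credentials flow)
    url = 'https://accounts.spotify.com/api/token'

    auth_response = requests.post(url, data={
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
    })
    auth_response.raise_for_status()  # fail loudly if the credentials are wrong

    # The JSON body contains access_token, token_type and expires_in
    access_token = auth_response.json()['access_token']
    return access_token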

In [3]:
client_id = 'INSERT YOUR CLIENT ID HERE'
client_secret = 'INSERT YOUR CLIENT SECRET HERE'
access_token = auth_spotify(client_id, client_secret)
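Because the token from `auth_spotify` expires after 3600 seconds and this flow has no refresh token, a long session has to request a new one. A minimal sketch of a cache that re-requests the token shortly before it expires; `TokenCache` is a hypothetical helper, and `fetch` is assumed to be any callable returning the parsed `/api/token` JSON (for example `lambda: requests.post(url, data={...}).json()`).

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires (hypothetical helper)."""

    def __init__(self, fetch, clock=time.monotonic, margin=60):
        self.fetch = fetch      # callable returning {"access_token": ..., "expires_in": ...}
        self.clock = clock      # injectable clock, seconds
        self.margin = margin    # renew this many seconds before the real expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self.clock() >= self._expires_at:
            data = self.fetch()
            self._token = data["access_token"]
            # expires_in is the token lifetime in seconds (3600 for this flow)
            self._expires_at = self.clock() + data["expires_in"] - self.margin
        return self._token
```

Since the request functions in this notebook read the global `access_token`, one option is to run `access_token = cache.get()` at the start of each cell that calls the API.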

## 📌 Functions to request endpoints

The API exposes a series of endpoints, each with its own path under the base URL **https://api.spotify.com/v1**. Below are the ones I found most useful for this analysis.
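The functions below build URLs by plain string concatenation. A small sketch of an alternative that percent-encodes each path segment, so a stray character such as the `?` of a share link stays inside its segment instead of silently starting a query string; `endpoint_url` and `BASE_URL` are hypothetical names.

```python
from urllib.parse import quote

BASE_URL = "https://api.spotify.com/v1"


def endpoint_url(*parts):
    """Join path segments onto the Web API base URL, percent-encoding each one."""
    return BASE_URL + "/" + "/".join(quote(str(part), safe="") for part in parts)
```

For example, `endpoint_url("audio-features", track_id)` builds the same URL as `get_track_feature` does for a normal track ID.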

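#### 🔨 Get Playlist
[Get Playlist](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-playlist) endpoint returns the playlist's details together with its tracks under `tracks.items`, which is the structure that `add_playlist` and `add_artist` read below. (The related [Get Playlist Items](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-playlists-tracks) endpoint, `/playlists/{id}/tracks`, returns only the paging object, with `items` at the top level.)

In [4]:
def get_playlist_tracks(access_token, playlist_id):
    header = {'Authorization': "Bearer {}".format(access_token)}
    base_url = 'https://api.spotify.com/v1/'

    # Drop the "?si=..." suffix that share links add; otherwise everything after "?" becomes a query string
    playlist_id = playlist_id.split('?')[0]

    # Get Playlist: the response holds the tracks in playlist['tracks']['items']
    playlist = requests.get(base_url + 'playlists/' + playlist_id, headers=header)
    playlist = playlist.json()
    return playlist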
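The tracks embedded in a playlist response are a paging object: they contain only a first page of items (typically up to 100) plus a `next` URL, which is `null` on the last page. A minimal sketch that follows those links; `iter_paged_items` is hypothetical, and `fetch_json` is assumed to be a callable such as `lambda url: requests.get(url, headers=header).json()`.

```python
def iter_paged_items(first_page, fetch_json):
    """Yield every item of a Spotify paging object, following its `next` links.

    first_page: dict with "items" and "next" (e.g. playlist['tracks']).
    fetch_json: callable taking a URL and returning the parsed JSON page.
    """
    page = first_page
    while page is not None:
        yield from page["items"]
        next_url = page.get("next")
        # The last page has next == None, which ends the loop
        page = fetch_json(next_url) if next_url else None
```

For a long playlist, `list(iter_paged_items(playlist['tracks'], fetch_json))` would collect every item instead of only the first page.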

#### 🔨 Get Track's Audio Features
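[Get Track's Audio Features](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features) endpoint returns the audio feature information (danceability, energy, key, mode, valence, and more) for a single track identified by its unique Spotify ID. Note that in November 2024 Spotify restricted access to this endpoint for newly created apps, so a new app may receive a 403 error here.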

In [5]:
def get_track_feature(track_id):
    # Uses the global access_token obtained in the credentials cell
    header = {'Authorization': "Bearer {}".format(access_token)}
    base_url = 'https://api.spotify.com/v1/'

    track_feature = requests.get(base_url + 'audio-features/' + track_id, headers=header)
    track_feature = track_feature.json()
    return track_feature
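According to Spotify's documentation, `key` uses standard pitch-class notation (0 = C, 1 = C♯/D♭, 2 = D, ...) with -1 when no key was detected, and `mode` is 1 for major and 0 for minor. A small sketch that turns the two numbers into a readable label; `key_name` and `PITCH_CLASSES` are hypothetical names.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key, mode):
    """Combine the `key` and `mode` audio features into a label like "D minor"."""
    key = int(key)  # DataFrame cells may hold numpy or float values
    if key == -1:
        return "unknown"  # Spotify uses -1 when no key was detected
    return f"{PITCH_CLASSES[key]} {'major' if int(mode) == 1 else 'minor'}"
```

With the values in the `df_tracks.head()` output further down, "Kill This Love" (key 2, mode 0) maps to `"D minor"`.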

#### 🔨 Get Track
[Get Track](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-track) endpoint returns general information (name, artists, album, and more) for a single track identified by its unique Spotify ID.

In [6]:
def get_unique_track(track_id):
    # Uses the global access_token obtained in the credentials cell
    header = {'Authorization': "Bearer {}".format(access_token)}
    base_url = 'https://api.spotify.com/v1/'

    track = requests.get(base_url + 'tracks/' + track_id, headers=header)
    track = track.json()
    return track
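Requesting one track at a time means one HTTP call per song. The Web API also has "get several" endpoints that take a comma-separated `ids` parameter (up to 50 IDs for `/tracks` and `/artists`, up to 100 for `/audio-features`). A minimal sketch that only builds the batched URLs; `batched_urls` is a hypothetical helper, and the batch size must be set to the limit of the endpoint you call.

```python
from urllib.parse import urlencode


def batched_urls(endpoint, ids, batch_size):
    """Build one request URL per batch of IDs for a "get several" endpoint.

    e.g. endpoint="tracks" with batch_size=50, or "audio-features" with 100.
    """
    base = "https://api.spotify.com/v1/" + endpoint
    urls = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # safe="," keeps the commas readable instead of encoding them as %2C
        urls.append(base + "?" + urlencode({"ids": ",".join(chunk)}, safe=","))
    return urls
```

Each response wraps its results in a list (`{"tracks": [...]}` or `{"audio_features": [...]}`), in the same order as the requested IDs.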

#### 🔨 Get Artist
[Get Artist](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-an-artist) endpoint returns general information (name, followers, genres, popularity) for a single artist identified by their unique Spotify ID.

In [7]:
def get_unique_artist(artist_id):
    # Uses the global access_token obtained in the credentials cell
    header = {'Authorization': "Bearer {}".format(access_token)}
    base_url = 'https://api.spotify.com/v1/'

    artist_feature = requests.get(base_url + 'artists/' + artist_id, headers=header)
    artist_feature = artist_feature.json()
    return artist_feature

## 📌 Integrating the requests into a DataFrame

`add_playlist` builds a playlist DataFrame from the playlist object returned by `get_playlist_tracks`. For each track, it calls **get_unique_track** to get the names of its artists and its album, and **get_track_feature** to get its most important audio features.

In [8]:
def add_playlist(playlist, df_playlist):
    rows = []  # one dict per track, turned into a DataFrame at the end
    for item in playlist['tracks']['items']:
        track_id = item['track']['id']
        track_name = item['track']['name']

        # Full track object: artist names and album information
        track = get_unique_track(track_id)
        artists_ = [artist['name'] for artist in track['artists']]
        album = track['album']['name']
        album_id = track['album']['id']

        # Audio features of the track
        track_features = get_track_feature(track_id)

        rows.append({'Song_id': track_id, 'Song': track_name, 'Artist': artists_,
                     'Album': album, 'Album_id': album_id,
                     'danceability': track_features['danceability'],
                     'energy': track_features['energy'],
                     'key': track_features['key'],
                     'mode': track_features['mode'],
                     'instrumentalness': track_features['instrumentalness'],
                     'valence': track_features['valence']})

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat([df_playlist, pd.DataFrame(rows)], ignore_index=True)
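`add_playlist` mixes HTTP calls with the flattening of JSON into a row. Separating the flattening into a pure function makes it testable without a token. A minimal sketch; `track_row` and `FEATURE_KEYS` are hypothetical names, and the columns mirror the ones `add_playlist` builds.

```python
FEATURE_KEYS = ["danceability", "energy", "key", "mode", "instrumentalness", "valence"]


def track_row(track, features):
    """Flatten a track object and its audio-features object into one row dict.

    Pure function: no HTTP calls, so it can be checked with hand-written dicts.
    """
    row = {
        "Song_id": track["id"],
        "Song": track["name"],
        "Artist": [artist["name"] for artist in track["artists"]],
        "Album": track["album"]["name"],
        "Album_id": track["album"]["id"],
    }
    # Keep only the audio features used in this notebook
    row.update({key: features[key] for key in FEATURE_KEYS})
    return row
```

A DataFrame can then be built in one step, e.g. `pd.DataFrame([track_row(t, f) for t, f in pairs])`, where `pairs` holds the fetched track and feature objects.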

`add_artist` builds an artists DataFrame from the same playlist object: it collects the ID of every artist on every track and calls **get_unique_artist** to fetch each artist's features.

In [9]:
def add_artist(playlist, df_artists):
    # Collect every artist ID, track by track (repeated artists are kept on purpose)
    artists_id = []
    for item in playlist['tracks']['items']:
        for artist in item['track']['artists']:
            artists_id.append(artist['id'])

    rows = []
    for artist_id in artists_id:
        artist = get_unique_artist(artist_id)
        rows.append({'Artist_id': artist['id'], 'name': artist['name'],
                     'followers': artist['followers']['total'],
                     'genres': artist['genres'], 'popularity': artist['popularity']})

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat([df_artists, pd.DataFrame(rows)], ignore_index=True)
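Since the same artist can appear on many tracks, `add_artist` requests the same artist several times. A sketch that calls the API once per unique ID while still returning one entry per occurrence, so the duplicates used for counting are preserved; `fetch_artists_once` is hypothetical, and `fetch_artist` would be `get_unique_artist`.

```python
def fetch_artists_once(artist_ids, fetch_artist):
    """Return one artist object per ID, calling fetch_artist once per unique ID.

    Repeated IDs still yield repeated entries, in the original order.
    """
    cache = {}
    results = []
    for artist_id in artist_ids:
        if artist_id not in cache:
            cache[artist_id] = fetch_artist(artist_id)
        results.append(cache[artist_id])
    return results
```

A plain dictionary is used as the cache here; `functools.lru_cache` on the fetch function would be an equivalent choice within one session.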

## 📌 Requesting your endpoints

#### 🔨 Get a DataFrame of the playlist

I go to Spotify and pick a playlist I'm interested in, in this case one of my favorites. Click the **three dots of the playlist, select Share, and then copy the link**: https://open.spotify.com/playlist/77e8R7DM5kj6Y34r6krjgf?si=42b2ce68ada24171. The playlist ID is the part between `/playlist/` and `?si=`, here `77e8R7DM5kj6Y34r6krjgf`.
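Instead of cutting the ID out by hand, it can be parsed from the share link with the standard library. A minimal sketch; `playlist_id_from_link` is a hypothetical helper that drops the `si` query parameter added by the share menu.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Extract the playlist ID from a link like https://open.spotify.com/playlist/<id>?si=..."""
    # urlparse separates the path from the "?si=..." query string
    parts = [part for part in urlparse(link).path.split("/") if part]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"Not a playlist link: {link}")
    return parts[parts.index("playlist") + 1]
```

For the link above, `playlist_id_from_link(...)` returns `"77e8R7DM5kj6Y34r6krjgf"`.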

In [10]:
playlist_id= '77e8R7DM5kj6Y34r6krjgf?si=04e0f53d600345eb'
playlist= get_playlist_tracks (access_token, playlist_id)

In [11]:
df = pd.DataFrame(columns=['Song_id', 'Song', 'Artist', 'Album', 'Album_id', 'danceability', 'energy', 'key', 'mode', 'instrumentalness', 'valence'])
df

| | Song_id | Song | Artist | Album | Album_id | danceability | energy | key | mode | instrumentalness | valence |
|---|---|---|---|---|---|---|---|---|---|---|---|


In [12]:
df_tracks= add_playlist(playlist, df)

In [13]:
df_tracks.head()

| | Song_id | Song | Artist | Album | Album_id | danceability | energy | key | mode | instrumentalness | valence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6hvczQ05jc1yGlp9zhb95V | Kill This Love | [BLACKPINK] | KILL THIS LOVE | 3PNxZ3BELbUXJ1XLktXiHz | 0.763 | 0.835 | 2 | 0 | 0.00221 | 0.645 |
| 1 | 4SFknyjLcyTLJFPKD2m96o | How You Like That | [BLACKPINK] | THE ALBUM | 71O60S5gIJSIAhdnrDIh3N | 0.812 | 0.761 | 11 | 1 | 0.000135 | 0.344 |
| 2 | 1XnpzbOGptRwfJhZgLbmSr | Pretty Savage | [BLACKPINK] | THE ALBUM | 71O60S5gIJSIAhdnrDIh3N | 0.701 | 0.556 | 9 | 0 | 0.000122 | 0.333 |
| 3 | 13MF2TYuyfITClL1R2ei6e | BOOMBAYAH | [BLACKPINK] | SQUARE ONE | 0FOOodYRlj7gzh7q7IjmNZ | 0.661 | 0.836 | 5 | 0 | 0.0 | 0.396 |
| 4 | 4ZxOuNHhpyOj4gv52MtQpT | As If It's Your Last | [BLACKPINK] | As If It's Your Last | 7ikmjsvRzDRzxHN0KXSQdv | 0.786 | 0.852 | 8 | 1 | 0.0 | 0.455 |


In [14]:
df_tracks.to_csv("playlist_track.csv", index=False) ##Generates a csv file
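`to_csv` writes list columns such as `Artist` (and `genres` in the artists file) using their string representation, e.g. `['BLACKPINK']`, so reading the CSV back gives strings rather than lists. A minimal sketch of a converter that restores them; `parse_list_cell` is a hypothetical helper built on `ast.literal_eval`.

```python
import ast


def parse_list_cell(cell):
    """Turn a cell like "['BLACKPINK']" back into a Python list.

    Empty strings become [], and values that are not strings are returned unchanged.
    """
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals, unlike eval
        return ast.literal_eval(cell) if cell.strip() else []
    return cell
```

It can be passed to pandas when loading the file, e.g. `pd.read_csv("playlist_track.csv", converters={"Artist": parse_list_cell})`.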

#### 🔨 Get dataFrame of artists

In [15]:
df_artists = pd.DataFrame(columns=['Artist_id', 'name', 'followers', 'genres', 'popularity'])
df_artists

| | Artist_id | name | followers | genres | popularity |
|---|---|---|---|---|---|


In [16]:
df_artists= add_artist(playlist, df_artists)

It is totally **normal to have repeated artists**, because a playlist **can contain different songs by the same artist**. I decided not to remove the duplicates, because keeping them lets us count how often each artist appears in a future analysis.
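That count can be sketched with the standard library; in pandas, `df_artists['name'].value_counts()` gives the same result as a Series. `artist_counts` is a hypothetical helper name.

```python
from collections import Counter


def artist_counts(names):
    """Count the playlist entries per artist name, most frequent first."""
    return Counter(names).most_common()
```

For example, `artist_counts(df_artists['name'])` returns a list of `(name, count)` pairs.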

In [17]:
df_artists.head()

| | Artist_id | name | followers | genres | popularity |
|---|---|---|---|---|---|
| 0 | 41MozSoPIsD1dJM0CLPjZF | BLACKPINK | 31847764 | [k-pop, k-pop girl group] | 81 |
| 1 | 41MozSoPIsD1dJM0CLPjZF | BLACKPINK | 31847764 | [k-pop, k-pop girl group] | 81 |
| 2 | 41MozSoPIsD1dJM0CLPjZF | BLACKPINK | 31847764 | [k-pop, k-pop girl group] | 81 |
| 3 | 41MozSoPIsD1dJM0CLPjZF | BLACKPINK | 31847764 | [k-pop, k-pop girl group] | 81 |
| 4 | 41MozSoPIsD1dJM0CLPjZF | BLACKPINK | 31847764 | [k-pop, k-pop girl group] | 81 |


In [18]:
df_artists.to_csv("artists_features.csv", index=False) #Generates a csv file

## 📂 References:
- https://developer.spotify.com/