# Data Extraction

In this notebook, we demonstrate our data extraction process using the data of a single team member, Kelly. In the overall project, each member performs the same process with their own Spotify data.

We use the Spotify API and the Spotipy library to extract audio features for the songs in three playlists: Kelly's "likes", "dislikes", and "test". We first establish our credentials with Spotify, then import the playlist data into this notebook, and finally go through every song in each playlist to retrieve its audio features.


We plan to use this data to train a model of Kelly's music preferences based on the audio features of each track. We will then run this model on the "test" playlist to predict whether Kelly will like a given track. The modeling process can be found here:
https://github.com/kwong101/SpotifySongRecommender/blob/main/Final%20Prototype_480.ipynb

## Prerequisites

Before beginning, we installed the Spotipy library and imported the Pandas library.

The Spotipy library was used to extract our music information from Spotify, and the Pandas library was used to hold our data in dataframes.

In [None]:
!pip install Spotipy

Collecting Spotipy
  Downloading https://files.pythonhosted.org/packages/7a/cd/e7d9a35216ea5bfb9234785f3d8fa7c96d0e33999c2cb72394128f6b4cce/spotipy-2.16.1-py3-none-any.whl
Installing collected packages: Spotipy
Successfully installed Spotipy-2.16.1


In [None]:
import time
import requests
import pandas as pd
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.oauth2 as oauth2
from fastai.collab import *
from fastai.tabular import *
import numpy as np


## Set Up Spotify Credentials

To access our music information, we must set up Spotify credentials using a Client ID and a Client Secret. These values are unique to each app a user registers, and they can be obtained by creating a new app on the Spotify for Developers dashboard: https://developer.spotify.com/dashboard/applications.

In [None]:
CLIENT_ID = "b884f341420d4409974e6707dee78ca8"
CLIENT_SECRET = "306a2c61fa694a3d8642f2a8656f8fd4"
username = "kelly101wong"
market = ['US']
redirect_uri='http://localhost:808/callback/'
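
A Client Secret that is hard-coded in a shared notebook is visible to anyone who can read the file. One alternative, sketched here under the assumption that the credentials are stored in environment variables, is to read them at runtime. `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the variable names Spotipy itself looks for; the helper `load_spotify_credentials` is hypothetical and not part of the original notebook.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper for illustration; raises if either value is missing.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    missing = [name for name, value in
               (("SPOTIPY_CLIENT_ID", client_id),
                ("SPOTIPY_CLIENT_SECRET", client_secret))
               if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return client_id, client_secret
```

With the variables set, `CLIENT_ID, CLIENT_SECRET = load_spotify_credentials()` could stand in for the literal strings in the cell above.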

In [None]:
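# Authenticate with the Client Credentials flow.
credentials = oauth2.SpotifyClientCredentials(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET)

# Passing the manager (rather than a one-off token) lets Spotipy request
# a fresh access token when the current one expires.
sp = spotipy.Spotify(client_credentials_manager=credentials)
scope = 'user-library-read'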


## Getting Playlist With Liked Tracks

First, Kelly created a playlist of liked songs on Spotify and used the Spotipy API to grab every track in the playlist.

In [None]:
PLAYLIST_ID = '1O0LYzR7ctJh8D4hmnHbEo'

# The first argument is the playlist owner's Spotify username, not the app's client ID.
playlist_tracks = sp.user_playlist_tracks(username, PLAYLIST_ID, fields='items,uri,name,id,total')

In [None]:
# Fetch the first page of the playlist.
results = sp.user_playlist_tracks(username, PLAYLIST_ID)
tracks = results['items']

# Follow the `next` links until every track of the playlist is collected.
while results['next']:
  time.sleep(3)  # pause between pages to avoid hitting the rate limit
  results = sp.next(results)
  tracks.extend(results['items'])
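
The same page-following pattern is repeated for each playlist in this notebook, so it could be factored into one function. The sketch below is a hypothetical helper, not part of Spotipy; it assumes only that the client has a `next(page)` method and that each page is a dict with `items` and `next` keys, as the responses used above are.

```python
def fetch_all_items(client, first_page):
    """Collect the items from every page of a paged response.

    `client` only needs a `next(page)` method that returns the following
    page, and each page must have "items" and "next" keys.
    """
    items = list(first_page["items"])
    page = first_page
    while page["next"]:
        page = client.next(page)
        items.extend(page["items"])
    return items
```

In this notebook it might be called as `tracks = fetch_all_items(sp, sp.user_playlist_tracks(username, PLAYLIST_ID))`.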



In [None]:
df_tracks = pd.json_normalize(tracks)

In [None]:
df_tracks = df_tracks[['track.name', 'track.album.uri', 'track.id', 'track.uri']]
df_tracks

Unnamed: 0,track.name,track.album.uri,track.id,track.uri
0,It Goes In Waves,spotify:album:3woooDflvrTEmLXHuERWBs,6vNUlpx3Lxy3Ilr61kFkC8,spotify:track:6vNUlpx3Lxy3Ilr61kFkC8
1,Sunburn - Reimagined,spotify:album:4B8VCnt9cXMBzQctzzEYDW,0i27kJRbxmdzQzhVDJVgzO,spotify:track:0i27kJRbxmdzQzhVDJVgzO
2,Sweet,spotify:album:39bAAoJ347tffgS7788a0N,3vA6H5yARRohQkpcHKjZN9,spotify:track:3vA6H5yARRohQkpcHKjZN9
3,Figure A (NASAYA Remix),spotify:album:1q2qDZKyv6XnKsBhnFNbnT,5COquaK9Wx28EMPLydTVPI,spotify:track:5COquaK9Wx28EMPLydTVPI
4,Only One,spotify:album:3D0PZPpzrwEtzkJzXmWSVl,6ZILYi8SaRLbGIdgej1WIA,spotify:track:6ZILYi8SaRLbGIdgej1WIA
...,...,...,...,...
1177,Dynasty,spotify:album:0tLiYqolgGEXIe2pIOrDT9,3vVXzKIlFLYERxMaVFukyr,spotify:track:3vVXzKIlFLYERxMaVFukyr
1178,Slow Motion,spotify:album:3pKTKC0AAe3yTcXQLzvpSW,4NYwy0R3NdvORX2B6OZXBT,spotify:track:4NYwy0R3NdvORX2B6OZXBT
1179,Daisy,spotify:album:4jKdXIJckKh7la6xHuKwRT,4ccQmBycgXDYtIA7Z1i32V,spotify:track:4ccQmBycgXDYtIA7Z1i32V
1180,Beautiful Now,spotify:album:4jKdXIJckKh7la6xHuKwRT,2ISSQPb9LHHiV6ng2NXosL,spotify:track:2ISSQPb9LHHiV6ng2NXosL
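
The dotted column names in the table above come from `pd.json_normalize`, which flattens nested dictionaries by joining their keys with a period. A small illustration with made-up playlist items (the track names and IDs below are placeholders, not real Spotify data):

```python
import pandas as pd

# Two made-up playlist items shaped like the nested "track" objects.
sample_items = [
    {"track": {"name": "Song A", "id": "id_a", "uri": "spotify:track:id_a",
               "album": {"uri": "spotify:album:alb_a"}}},
    {"track": {"name": "Song B", "id": "id_b", "uri": "spotify:track:id_b",
               "album": {"uri": "spotify:album:alb_b"}}},
]

# Nested keys are joined with "." to form flat column names.
df_sample = pd.json_normalize(sample_items)

# Keep only the columns used in this notebook.
df_sample = df_sample[["track.name", "track.album.uri", "track.id", "track.uri"]]
print(df_sample.columns.tolist())
```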


## Getting Disliked Songs from Playlist

Next, Kelly created a playlist of disliked songs, and again used the Spotipy API to grab every track in the playlist. 


In [None]:
DISLIKED_PLAYLIST_ID = '32XKJnecHE3WYpzfNdrThK'

# The first argument is the playlist owner's Spotify username, not the app's client ID.
disliked_playlist_tracks = sp.user_playlist_tracks(username, DISLIKED_PLAYLIST_ID, fields='items,uri,name,id,total')

In [None]:
# Fetch the first page of the playlist.
results = sp.user_playlist_tracks(username, DISLIKED_PLAYLIST_ID)
disliked_tracks = results['items']

# Follow the `next` links until every track of the playlist is collected.
while results['next']:
  results = sp.next(results)
  disliked_tracks.extend(results['items'])


In [None]:
df_disliked_tracks = pd.json_normalize(disliked_tracks)
df_disliked_tracks = df_disliked_tracks[['track.name', 'track.album.uri', 'track.id', 'track.uri']]
df_disliked_tracks

## Getting Test Data
For the group's test data, we added songs from several genres (country, rap, lo-fi, EDM, drum and bass, dubstep, future bass, Latin, and trap) to a playlist and used the Spotify API to grab every track in it.

In [None]:
TEST_PLAYLIST_ID = '5ZYD1j4JoaYQAnwJw4gWb0'
# The first argument is the playlist owner's Spotify username, not the app's client ID.
test_playlist_tracks = sp.user_playlist_tracks(username, TEST_PLAYLIST_ID, fields='items,uri,name,id,total')

In [None]:
# Fetch the first page of the playlist.
results = sp.user_playlist_tracks(username, TEST_PLAYLIST_ID)
test_tracks = results['items']

# Follow the `next` links until every track of the playlist is collected.
while results['next']:
  results = sp.next(results)
  test_tracks.extend(results['items'])

In [None]:
df_test_tracks = pd.json_normalize(test_tracks)
df_test_tracks = df_test_tracks[['track.name', 'track.album.uri', 'track.id', 'track.uri']]
df_test_tracks

## Save tracks to Google Drive as .csv files

Afterwards, we saved each set of tracks as a .csv file on Google Drive so that the long-running API calls do not need to be rerun. The next time we need the data, we can load the saved .csv files directly from Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
df_tracks.to_csv("drive/My Drive/Colab Notebooks/my_playlist_tracks.csv")

In [None]:
df_test_tracks.to_csv("drive/My Drive/Colab Notebooks/test_tracks.csv")

In [None]:
df_disliked_tracks.to_csv("drive/My Drive/Colab Notebooks/disliked_tracks.csv")
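
By default, `to_csv` writes the dataframe index as an extra first column, which shows up as `Unnamed: 0` when the file is read back. A short sketch of how `index=False` avoids this, using an in-memory buffer in place of Google Drive and made-up sample rows:

```python
import io

import pandas as pd

df = pd.DataFrame({"track.name": ["Song A", "Song B"],
                   "track.id": ["id_a", "id_b"]})

# Default behaviour: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# With index=False, the round trip returns exactly the original columns.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Alternatively, the extra column can be absorbed on load with `pd.read_csv(path, index_col=0)`.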

# Getting Audio Features for Liked Tracks

Next, Kelly used the Spotipy API to extract audio features for each track in her "liked" playlist and placed them into a single dataframe.  She made sure to save this dataframe to her Google Drive so that it will be accessible in other Jupyter Notebooks.

In [None]:
import pandas as pd

In [None]:
path = "drive/My Drive/Colab Notebooks/my_playlist_tracks.csv"
df_tracks = pd.read_csv(path)

In [None]:
df_tracks

Unnamed: 0.1,Unnamed: 0,track.name,track.album.uri,track.id,track.uri
0,0,It Goes In Waves,spotify:album:3woooDflvrTEmLXHuERWBs,6vNUlpx3Lxy3Ilr61kFkC8,spotify:track:6vNUlpx3Lxy3Ilr61kFkC8
1,1,Sunburn - Reimagined,spotify:album:4B8VCnt9cXMBzQctzzEYDW,0i27kJRbxmdzQzhVDJVgzO,spotify:track:0i27kJRbxmdzQzhVDJVgzO
2,2,Sweet,spotify:album:39bAAoJ347tffgS7788a0N,3vA6H5yARRohQkpcHKjZN9,spotify:track:3vA6H5yARRohQkpcHKjZN9
3,3,Figure A (NASAYA Remix),spotify:album:1q2qDZKyv6XnKsBhnFNbnT,5COquaK9Wx28EMPLydTVPI,spotify:track:5COquaK9Wx28EMPLydTVPI
4,4,Only One,spotify:album:3D0PZPpzrwEtzkJzXmWSVl,6ZILYi8SaRLbGIdgej1WIA,spotify:track:6ZILYi8SaRLbGIdgej1WIA
...,...,...,...,...,...
1177,1177,Dynasty,spotify:album:0tLiYqolgGEXIe2pIOrDT9,3vVXzKIlFLYERxMaVFukyr,spotify:track:3vVXzKIlFLYERxMaVFukyr
1178,1178,Slow Motion,spotify:album:3pKTKC0AAe3yTcXQLzvpSW,4NYwy0R3NdvORX2B6OZXBT,spotify:track:4NYwy0R3NdvORX2B6OZXBT
1179,1179,Daisy,spotify:album:4jKdXIJckKh7la6xHuKwRT,4ccQmBycgXDYtIA7Z1i32V,spotify:track:4ccQmBycgXDYtIA7Z1i32V
1180,1180,Beautiful Now,spotify:album:4jKdXIJckKh7la6xHuKwRT,2ISSQPb9LHHiV6ng2NXosL,spotify:track:2ISSQPb9LHHiV6ng2NXosL


In [None]:
count = 0
tracks_features = []
for i in range(len(df_tracks)):
  results = sp.audio_features(df_tracks.iloc[i]["track.id"])
  tracks_features.extend(results)
  count += 1
  if count == 100:
    print("100 songs later ...")
    count = 0
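
The loop above makes one API request per track. Spotipy's `audio_features` also accepts a list of up to 100 track IDs, so the same data could be fetched in far fewer requests. The helper below is a hypothetical sketch, not part of the original notebook; it assumes only that the client exposes `audio_features(list_of_ids)` and that a track without features may come back as `None`.

```python
def fetch_audio_features(client, track_ids, batch_size=100):
    """Request audio features in batches instead of one track per call.

    `client` only needs an `audio_features(list_of_ids)` method.
    Entries returned as None (no features available) are dropped.
    """
    features = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        results = client.audio_features(batch)
        features.extend(item for item in results if item is not None)
    return features
```

A possible call is `tracks_features = fetch_audio_features(sp, df_tracks["track.id"].tolist())`. Because dropped entries shift row positions, the result is best matched back to track names by `id` rather than by position.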
  


In [None]:
df_features = pd.json_normalize(tracks_features)
df_features

In [None]:
#Adding Track Name and Id to Features Dataframe
df_features.insert(0, "track.id", df_tracks["track.id"])
df_features.insert(0, "track.name", df_tracks["track.name"])
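
Inserting a column this way pairs rows by index label, so it relies on both dataframes listing the tracks in the same order with none missing. A hedged alternative, assuming the `id` column of the features holds the same values as `track.id`, is to join on the ID itself. The sample data below is made up:

```python
import pandas as pd

df_names = pd.DataFrame({"track.id": ["id_a", "id_b", "id_c"],
                         "track.name": ["Song A", "Song B", "Song C"]})

# Features arrive in a different order and with one track missing.
df_feats = pd.DataFrame({"id": ["id_c", "id_a"], "tempo": [128.0, 94.0]})

# Join on the track ID so each name is paired with its own features.
df_joined = df_names.merge(df_feats, left_on="track.id", right_on="id", how="inner")
print(df_joined[["track.name", "tempo"]])
```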


In [None]:
df_features.to_csv("drive/My Drive/Colab Notebooks/playlist_song_features.csv")

In [None]:
df_features

Unnamed: 0,track.name,track.id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,It Goes In Waves,6vNUlpx3Lxy3Ilr61kFkC8,0.795,0.645,2,-7.589,0,0.0871,0.2390,0.043600,0.3400,0.353,129.971,audio_features,6vNUlpx3Lxy3Ilr61kFkC8,spotify:track:6vNUlpx3Lxy3Ilr61kFkC8,https://api.spotify.com/v1/tracks/6vNUlpx3Lxy3...,https://api.spotify.com/v1/audio-analysis/6vNU...,214154,4
1,Sunburn - Reimagined,0i27kJRbxmdzQzhVDJVgzO,0.828,0.690,8,-4.723,1,0.0338,0.0116,0.261000,0.1140,0.495,105.996,audio_features,0i27kJRbxmdzQzhVDJVgzO,spotify:track:0i27kJRbxmdzQzhVDJVgzO,https://api.spotify.com/v1/tracks/0i27kJRbxmdz...,https://api.spotify.com/v1/audio-analysis/0i27...,247530,4
2,Sweet,3vA6H5yARRohQkpcHKjZN9,0.662,0.766,9,-5.941,0,0.0448,0.0149,0.004010,0.1160,0.638,113.316,audio_features,3vA6H5yARRohQkpcHKjZN9,spotify:track:3vA6H5yARRohQkpcHKjZN9,https://api.spotify.com/v1/tracks/3vA6H5yARRoh...,https://api.spotify.com/v1/audio-analysis/3vA6...,237176,4
3,Figure A (NASAYA Remix),5COquaK9Wx28EMPLydTVPI,0.744,0.450,0,-6.522,1,0.0573,0.2850,0.000013,0.0908,0.651,94.038,audio_features,5COquaK9Wx28EMPLydTVPI,spotify:track:5COquaK9Wx28EMPLydTVPI,https://api.spotify.com/v1/tracks/5COquaK9Wx28...,https://api.spotify.com/v1/audio-analysis/5COq...,210798,4
4,Only One,6ZILYi8SaRLbGIdgej1WIA,0.393,0.542,1,-7.254,1,0.3070,0.2910,0.000000,0.0966,0.579,178.318,audio_features,6ZILYi8SaRLbGIdgej1WIA,spotify:track:6ZILYi8SaRLbGIdgej1WIA,https://api.spotify.com/v1/tracks/6ZILYi8SaRLb...,https://api.spotify.com/v1/audio-analysis/6ZIL...,208120,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1177,Dynasty,3vVXzKIlFLYERxMaVFukyr,0.493,0.594,8,-4.404,0,0.0378,0.2560,0.000000,0.0759,0.304,123.751,audio_features,3vVXzKIlFLYERxMaVFukyr,spotify:track:3vVXzKIlFLYERxMaVFukyr,https://api.spotify.com/v1/tracks/3vVXzKIlFLYE...,https://api.spotify.com/v1/audio-analysis/3vVX...,225515,4
1178,Slow Motion,4NYwy0R3NdvORX2B6OZXBT,0.733,0.408,7,-7.834,0,0.0388,0.6790,0.000039,0.1100,0.291,94.947,audio_features,4NYwy0R3NdvORX2B6OZXBT,spotify:track:4NYwy0R3NdvORX2B6OZXBT,https://api.spotify.com/v1/tracks/4NYwy0R3NdvO...,https://api.spotify.com/v1/audio-analysis/4NYw...,197854,4
1179,Daisy,4ccQmBycgXDYtIA7Z1i32V,0.586,0.598,2,-5.774,1,0.0359,0.1240,0.000000,0.1510,0.201,112.067,audio_features,4ccQmBycgXDYtIA7Z1i32V,spotify:track:4ccQmBycgXDYtIA7Z1i32V,https://api.spotify.com/v1/tracks/4ccQmBycgXDY...,https://api.spotify.com/v1/audio-analysis/4ccQ...,174413,4
1180,Beautiful Now,2ISSQPb9LHHiV6ng2NXosL,0.628,0.833,11,-4.126,0,0.0282,0.0079,0.000015,0.0740,0.560,128.003,audio_features,2ISSQPb9LHHiV6ng2NXosL,spotify:track:2ISSQPb9LHHiV6ng2NXosL,https://api.spotify.com/v1/tracks/2ISSQPb9LHHi...,https://api.spotify.com/v1/audio-analysis/2ISS...,218293,4


# Getting Audio Features for Disliked Tracks

Kelly repeated this process with her disliked tracks and saved the resulting dataframe to her Google Drive.

In [None]:
path = "drive/My Drive/Colab Notebooks/disliked_tracks.csv"
df_disliked_tracks = pd.read_csv(path)
df_disliked_tracks

In [None]:
count = 0
disliked_tracks_features = []
for i in range(len(df_disliked_tracks)):
  results = sp.audio_features(df_disliked_tracks.iloc[i]["track.id"])
  disliked_tracks_features.extend(results)
  count += 1
  if count == 100:
    print("100 songs later ...")
    count = 0

100 songs later ...
100 songs later ...
100 songs later ...


In [None]:
df_disliked_features = pd.json_normalize(disliked_tracks_features)
df_disliked_features

In [None]:
#Adding Track Name and Id to Features Dataframe
df_disliked_features.insert(0, "track.id", df_disliked_tracks["track.id"])
df_disliked_features.insert(0, "track.name", df_disliked_tracks["track.name"])


In [None]:
df_disliked_features.to_csv("drive/My Drive/Colab Notebooks/disliked_features.csv")

# Getting Audio Features for Test Tracks

Kelly repeated the process once more with the test playlist and saved the resulting features to her Google Drive.

In [None]:
path = "drive/My Drive/Colab Notebooks/test_tracks.csv"
df_test_tracks = pd.read_csv(path)

In [None]:
count = 0
test_tracks_features = []
for i in range(len(df_test_tracks)):
  results = sp.audio_features(df_test_tracks.iloc[i]["track.id"])
  test_tracks_features.extend(results)
  count += 1
  if count == 100:
    print("100 songs later ...")
    count = 0

100 songs later ...


In [None]:
df_test_features = pd.json_normalize(test_tracks_features)
df_test_features

In [None]:
#Adding Track Name and Id to Features Dataframe
df_test_features.insert(0, "track.id", df_test_tracks["track.id"])
df_test_features.insert(0, "track.name", df_test_tracks["track.name"])


In [None]:
df_test_features.to_csv("drive/My Drive/Colab Notebooks/test_features.csv")

In [None]:
df_test_features

Unnamed: 0,track.name,track.id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Beer Can’t Fix,7idmHTAQQPUFqdjXkoooXD,0.711,0.774,7,-4.068,1,0.0308,0.031700,0.000000,0.1240,0.939,111.016,audio_features,7idmHTAQQPUFqdjXkoooXD,spotify:track:7idmHTAQQPUFqdjXkoooXD,https://api.spotify.com/v1/tracks/7idmHTAQQPUF...,https://api.spotify.com/v1/audio-analysis/7idm...,209733,4
1,Cheatin’ Songs,01dBLHq3UWyDRWDZJXj235,0.723,0.723,4,-5.522,1,0.0269,0.043000,0.000040,0.3260,0.694,109.988,audio_features,01dBLHq3UWyDRWDZJXj235,spotify:track:01dBLHq3UWyDRWDZJXj235,https://api.spotify.com/v1/tracks/01dBLHq3UWyD...,https://api.spotify.com/v1/audio-analysis/01dB...,215033,4
2,Break Things,45hbxz8xCxQfa9vmnV187v,0.638,0.740,8,-5.071,1,0.0314,0.032900,0.000000,0.1340,0.673,105.006,audio_features,45hbxz8xCxQfa9vmnV187v,spotify:track:45hbxz8xCxQfa9vmnV187v,https://api.spotify.com/v1/tracks/45hbxz8xCxQf...,https://api.spotify.com/v1/audio-analysis/45hb...,172785,4
3,She's Mine,6NCbMyR7A8MjbX0UhaEgbd,0.621,0.858,7,-5.864,1,0.0353,0.000072,0.000006,0.1160,0.608,123.004,audio_features,6NCbMyR7A8MjbX0UhaEgbd,spotify:track:6NCbMyR7A8MjbX0UhaEgbd,https://api.spotify.com/v1/tracks/6NCbMyR7A8Mj...,https://api.spotify.com/v1/audio-analysis/6NCb...,229493,4
4,I Hope You’re Happy Now,1iiehnBysGi59zXYXIuKQD,0.591,0.815,10,-4.725,1,0.0366,0.168000,0.000009,0.1420,0.306,118.024,audio_features,1iiehnBysGi59zXYXIuKQD,spotify:track:1iiehnBysGi59zXYXIuKQD,https://api.spotify.com/v1/tracks/1iiehnBysGi5...,https://api.spotify.com/v1/audio-analysis/1iie...,198689,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
148,Like Mike,72pQVKLq8x5Zr2x7esliQz,0.696,0.706,2,-5.708,0,0.0707,0.010500,0.000000,0.3310,0.561,135.093,audio_features,72pQVKLq8x5Zr2x7esliQz,spotify:track:72pQVKLq8x5Zr2x7esliQz,https://api.spotify.com/v1/tracks/72pQVKLq8x5Z...,https://api.spotify.com/v1/audio-analysis/72pQ...,142272,4
149,Scares,6fJwXmAU1risouOng097pd,0.293,0.866,0,-2.822,1,0.0319,0.012000,0.000503,0.1050,0.209,159.399,audio_features,6fJwXmAU1risouOng097pd,spotify:track:6fJwXmAU1risouOng097pd,https://api.spotify.com/v1/tracks/6fJwXmAU1ris...,https://api.spotify.com/v1/audio-analysis/6fJw...,228000,4
150,I'm Bad,04DqF7MZwYURO4jchKsflH,0.615,0.651,8,-6.055,0,0.0408,0.363000,0.000000,0.0664,0.553,79.981,audio_features,04DqF7MZwYURO4jchKsflH,spotify:track:04DqF7MZwYURO4jchKsflH,https://api.spotify.com/v1/tracks/04DqF7MZwYUR...,https://api.spotify.com/v1/audio-analysis/04Dq...,193125,4
151,Believe I'm Leaving,6ACRvx5KyZARgR9jwbPhlh,0.467,0.796,7,-3.487,1,0.0480,0.144000,0.000000,0.1160,0.236,77.962,audio_features,6ACRvx5KyZARgR9jwbPhlh,spotify:track:6ACRvx5KyZARgR9jwbPhlh,https://api.spotify.com/v1/tracks/6ACRvx5KyZAR...,https://api.spotify.com/v1/audio-analysis/6ACR...,164712,4
