# Accessing the Spotify API

## Using Spotipy to create artist and track data

[Spotipy](https://spotipy.readthedocs.io/en/2.13.0/) is a lightweight Python library for the Spotify Web API. I'm using it to gather data on how my music listening habits have changed over time, measured by a number of audio features. I can also see the tracks and artists I've listened to most across three different time frames.

The Spotify API gives users access to their listening data for short-term (~4 weeks), medium-term (~6 months), and long-term (several years) time frames. The first section details how to access your data. I've omitted some personal information, however, and will be pulling in my data from CSV files. Those CSV files were created with the process described below.

In [97]:
# Import libraries
import numpy as np
import pandas as pd  # provides pd.json_normalize, which replaces the deprecated pandas.io.json import
import spotipy
import csv

To authenticate, you must have a Spotify Premium account and register an application with Spotify. After registering your application, you will have a personal client_id and client_secret. You must also set a redirect_uri. You can then enter this information as shown below.

In [98]:
# oAuth Information
username='xxxxxxx'
scope = 'user-library-read user-top-read'
client_id = 'xxxxxxx'
client_secret = 'xxxxxxx'
redirect_uri = 'xxxxxxx'

# access token
tk = spotipy.util.prompt_for_user_token(username, scope, client_id, client_secret, redirect_uri)

In [99]:
# Authorizing access to my data and instantiating Spotify Object
sp = spotipy.Spotify(auth = tk)
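
Since the credentials above are personal, one option is to keep them out of the notebook entirely. The following is a minimal sketch, assuming the values have been exported as the environment variables `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI` (the names Spotipy itself looks for). The helper `load_credentials` is a hypothetical name introduced here for illustration and is not part of Spotipy.

```python
import os

# Environment variable names that Spotipy also reads by default.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
)


def load_credentials(env=os.environ):
    """Return the OAuth settings from `env`, or raise if any are missing.

    Hypothetical helper for illustration; not part of Spotipy.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        "redirect_uri": env["SPOTIPY_REDIRECT_URI"],
    }
```

The returned dictionary uses the same keyword names as `prompt_for_user_token`, so it could be passed along as `spotipy.util.prompt_for_user_token(username, scope, **load_credentials())`.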

## Requesting Artist and Track Data
Here, I request my top **artists** and **tracks** for the short-term, medium-term, and long-term time frames. I then convert the JSON to a data frame, which is later written to a CSV file for analysis.

In [100]:
# Returning top 50 artists for each time range
A_SAMPLE_SIZE = 50
A_OFFSET = 0

st_artists = sp.current_user_top_artists(limit=A_SAMPLE_SIZE, offset=A_OFFSET, time_range='short_term')
mt_artists = sp.current_user_top_artists(limit=A_SAMPLE_SIZE, offset=A_OFFSET, time_range='medium_term')
lt_artists = sp.current_user_top_artists(limit=A_SAMPLE_SIZE, offset=A_OFFSET, time_range='long_term')

# Data frames
short_term_top = pd.json_normalize(st_artists["items"])
medium_term_top = pd.json_normalize(mt_artists["items"])
long_term_top = pd.json_normalize(lt_artists["items"])


In [101]:
# Returning top 50 tracks for each time range
T_SAMPLE_SIZE = 50
# Start at the first track
T_OFFSET = 0

# Data for Short, Medium, and Long Term
st_tracks = sp.current_user_top_tracks(limit=T_SAMPLE_SIZE, offset=T_OFFSET, time_range='short_term')
mt_tracks = sp.current_user_top_tracks(limit=T_SAMPLE_SIZE, offset=T_OFFSET, time_range='medium_term')
lt_tracks = sp.current_user_top_tracks(limit=T_SAMPLE_SIZE, offset=T_OFFSET, time_range='long_term')   

st_top_tracks_df = pd.json_normalize(st_tracks["items"])
mt_top_tracks_df = pd.json_normalize(mt_tracks["items"])
lt_top_tracks_df = pd.json_normalize(lt_tracks["items"])
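
The three near-identical requests per endpoint could also be written as a loop over the time ranges. This is only a sketch of that idea: `fetch_top_items` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `sp.current_user_top_tracks` so the example runs without an access token.

```python
TIME_RANGES = ("short_term", "medium_term", "long_term")


def fetch_top_items(fetch, limit=50, offset=0):
    """Call `fetch` once per time range and return {time_range: list of items}."""
    results = {}
    for time_range in TIME_RANGES:
        response = fetch(limit=limit, offset=offset, time_range=time_range)
        results[time_range] = response["items"]
    return results


def fake_fetch(limit, offset, time_range):
    """Stand-in for a Spotipy call; returns a response shaped like the real one."""
    return {"items": [{"name": f"{time_range} track {i}"} for i in range(limit)]}


top = fetch_top_items(fake_fetch, limit=2)
print(sorted(top))
```

With a real client, passing `sp.current_user_top_tracks` or `sp.current_user_top_artists` as `fetch` should work the same way, since both accept `limit`, `offset`, and `time_range` as keyword arguments.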

In [102]:
# Subset down to important fields
st_top_tracks_df = st_top_tracks_df[['artists', 'name','popularity', 'id', 'duration_ms', 'album.name', 
                                     'album.release_date']].sort_values(by=['popularity'], 
                                                                        ascending=False)
mt_top_tracks_df = mt_top_tracks_df[['artists', 'name','popularity', 'id', 'duration_ms', 'album.name', 
                                     'album.release_date']].sort_values(by=['popularity'], 
                                                                        ascending=False)
lt_top_tracks_df = lt_top_tracks_df[['artists', 'name','popularity', 'id', 'duration_ms', 'album.name', 
                                     'album.release_date']].sort_values(by=['popularity'], 
                                                                        ascending=False)
# Lambda function to pull out Artist name
artist_name = lambda col_value : col_value[0]['name']

st_top_tracks_df['artists'] = st_top_tracks_df['artists'].apply(artist_name)
mt_top_tracks_df['artists'] = mt_top_tracks_df['artists'].apply(artist_name)
lt_top_tracks_df['artists'] = lt_top_tracks_df['artists'].apply(artist_name)
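
The lambda above keeps only the first credited artist, which drops featured artists on collaborations. As a hedged alternative, the sketch below shows two small helpers: one that joins every credited name and one that tolerates an empty list. Both function names and the sample data are made up for illustration.

```python
def all_artist_names(artists, sep=", "):
    """Join the names of every artist credited on a track."""
    return sep.join(artist["name"] for artist in artists)


def first_artist_name(artists):
    """Return the first credited artist, or None when the list is empty."""
    return artists[0]["name"] if artists else None


# Made-up value shaped like one cell of the 'artists' column.
sample = [{"name": "Artist A"}, {"name": "Artist B"}]
print(all_artist_names(sample))
print(first_artist_name(sample))
```

Either helper can be passed to `.apply` in place of the lambda.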


### Extending Genres Column

I noticed that the 'genres' column contains lists of the genres each artist belongs to. To get a better look at the overall number and kinds of genres, we need to split the list items into separate rows. I show two ways to do this: the first does not use pandas' built-in 'explode' function, and the second does.

In [103]:
st_df = short_term_top[['name', 'popularity', 'genres', 'followers.total']]
mt_df = medium_term_top[['name', 'popularity', 'genres', 'followers.total']]
lt_df = long_term_top[['name', 'popularity', 'genres', 'followers.total']]

In [None]:
# Unpack the 'genre' variable to create a row for each genre

# st_df_genres = pd.DataFrame({
#       col:np.repeat(st_df[col].values, st_df['genres'].str.len())
#       for col in st_df.columns.drop('genres')}
#     ).assign(**{'genres':np.concatenate(st_df['genres'].values)})[st_df.columns]

# mt_df_genres = pd.DataFrame({
#       col:np.repeat(mt_df[col].values, mt_df['genres'].str.len())
#       for col in mt_df.columns.drop('genres')}
#     ).assign(**{'genres':np.concatenate(mt_df['genres'].values)})[mt_df.columns]

# lt_df_genres = pd.DataFrame({
#       col:np.repeat(lt_df[col].values, lt_df['genres'].str.len())
#       for col in lt_df.columns.drop('genres')}
#     ).assign(**{'genres':np.concatenate(lt_df['genres'].values)})[lt_df.columns]

In [104]:
# Splitting genre list items into separate rows
st_df_genres = st_df.explode('genres')
mt_df_genres = mt_df.explode('genres')
lt_df_genres = lt_df.explode('genres')

st_df_genres['term'] = "Short Term"
mt_df_genres['term'] = "Medium Term"
lt_df_genres['term'] = "Long Term"

top_artists = pd.concat([st_df_genres, mt_df_genres, lt_df_genres], ignore_index = True)

# Create one csv file for all time frames 
top_artists.to_csv('spotify_data/top_artists_by_genre.csv')
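
To make the effect of 'explode' concrete, here is a small sketch on a made-up data frame (the artist and genre names are invented). It also shows one detail worth knowing: an artist with an empty genre list still produces one row, with a missing value in 'genres'.

```python
import pandas as pd

# Made-up data shaped like the artist frames above.
toy = pd.DataFrame({
    "name": ["Artist A", "Artist B", "Artist C"],
    "genres": [["indie pop", "dream pop"], [], ["jazz"]],
})

# One row per (artist, genre) pair; the empty list becomes a single NaN row.
exploded = toy.explode("genres").reset_index(drop=True)

# Drop the artists without any genre before counting.
tagged = exploded.dropna(subset=["genres"])
genre_counts = tagged["genres"].value_counts()

print(exploded)
print(genre_counts)
```

Whether to keep or drop the rows without a genre depends on the analysis; for genre counts they can usually be dropped.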

## Writing CSV files for Artist and Track data

In [105]:
### For Artist Data

short_term_top.to_csv('spotify_data/st_artists.csv')
medium_term_top.to_csv('spotify_data/mt_artists.csv')
long_term_top.to_csv('spotify_data/lt_artists.csv')

### For Track Data

st_top_tracks_df.to_csv('spotify_data/st_tracks.csv')
mt_top_tracks_df.to_csv('spotify_data/mt_tracks.csv')
lt_top_tracks_df.to_csv('spotify_data/lt_tracks.csv')


### Combined Top Tracks and Artists

st_top_tracks_df2 = st_top_tracks_df.drop(['id'], axis = 1)
st_top_tracks_df2['term'] = 'Short Term'
mt_top_tracks_df2 = mt_top_tracks_df.drop(['id'], axis = 1)
mt_top_tracks_df2['term'] = 'Medium Term'
lt_top_tracks_df2 = lt_top_tracks_df.drop(['id'], axis = 1)
lt_top_tracks_df2['term'] = 'Long Term'

top_songs = pd.concat([st_top_tracks_df2, mt_top_tracks_df2, lt_top_tracks_df2], ignore_index = True)
top_songs.to_csv('spotify_data/top_songs.csv')


## Returning Top Tracks' Audio Features for 3 Time Frames

In [106]:
# Audio statistic data

trends_set = [
    { 'ref': lt_top_tracks_df, 'term': 'Long'},
    { 'ref': mt_top_tracks_df, 'term': 'Medium'},
    { 'ref': st_top_tracks_df, 'term': 'Short'}
]

# Create empty dictionaries for features
loudness = {}
tempo = {}
mode = {}
energy = {}
danceability = {}
speechiness = {}
acousticness = {}
instrumentalness = {}
liveness = {}
valence = {}
popularity = {}

# Loop through each time frame
for df_item in trends_set:
    dict_data = df_item['ref'].to_dict(orient = 'index')
    track_popularity = {v['id']: v['popularity'] for k, v in dict_data.items()}

    try:
        # Return audio features based on track id
        track_features = sp.audio_features(list(track_popularity.keys()))
    except Exception as err:
        print('No Track Audio Features:', err)
        continue

    # Drop tracks for which no audio features were returned
    track_features = [feature for feature in track_features if feature is not None]
    # Adding the popularity of a track
    for feature in track_features:
        feature['popularity'] = track_popularity[feature['id']]

    # Build the data frame in one step (DataFrame.append was removed in pandas 2.0)
    track_analytics = pd.DataFrame(track_features)

    loudness[df_item['term']] = np.average(track_analytics['loudness'])
    tempo[df_item['term']] = np.average(track_analytics['tempo'])
    mode[df_item['term']] = np.average(pd.to_numeric(track_analytics['mode']))
    danceability[df_item['term']] = np.average(track_analytics['danceability'])
    energy[df_item['term']] = np.average(track_analytics['energy'])
    speechiness[df_item['term']] = np.average(track_analytics['speechiness'])
    acousticness[df_item['term']] = np.average(track_analytics['acousticness'])
    instrumentalness[df_item['term']] = np.average(pd.to_numeric(track_analytics['instrumentalness']))
    liveness[df_item['term']] = np.average(track_analytics['liveness'])
    valence[df_item['term']] = np.average(track_analytics['valence'])
    popularity[df_item['term']] = np.average(pd.to_numeric(track_analytics['popularity']))
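
The per-feature averaging can also be expressed without one dictionary per feature. The sketch below uses only the standard library and assumes the feature data is a list of dictionaries in which some entries may be `None` (tracks without feature data). `average_features` is a hypothetical helper, and the numbers are made up.

```python
from statistics import mean


def average_features(track_features, names):
    """Average each named audio feature, skipping tracks with no feature data."""
    valid = [feature for feature in track_features if feature is not None]
    if not valid:
        return {}
    return {name: mean(feature[name] for feature in valid) for name in names}


# Made-up feature data for three tracks, one of which has no features.
sample_features = [
    {"tempo": 100.0, "energy": 0.25},
    None,
    {"tempo": 120.0, "energy": 0.75},
]
averages = average_features(sample_features, ["tempo", "energy"])
print(averages)  # → {'tempo': 110.0, 'energy': 0.5}
```

Calling this once per time frame yields one dictionary per term, which can then be assembled into a single data frame.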
    

In [65]:
# Return User's Current Track
# current_track = sp.currently_playing()
# current_track = pd.json_normalize(current_track['item'])
# print(current_track['name'])

### Combining Audio Features of 3 Time Frames into one Data Frame

In [111]:
features = [loudness, tempo, mode, danceability, energy, speechiness, acousticness, instrumentalness, 
            liveness, valence, popularity]

# One row per feature, in the order listed above (DataFrame.append was removed in pandas 2.0)
results_analytics = pd.DataFrame(features, columns=['Long', 'Medium', 'Short'])

# results_analytics.index = ['Loudness', 'Tempo', 'Modality', 'Danceability', 'Energy', 'Speechiness', 
#                            'Acousticness', 'Instrumentalness', 'Liveness', 'Valence', 'Popularity']
#results_analytics

|   | Long | Medium | Short |
|---|---|---|---|
| 0 | -8.26472 | -7.76274 | -7.71648 |
| 1 | 119.2791 | 114.84904 | 113.06658 |
| 2 | 0.7 | 0.42 | 0.6 |
| 3 | 0.58136 | 0.6114 | 0.56322 |
| 4 | 0.6372 | 0.58352 | 0.56294 |
| 5 | 0.074208 | 0.091748 | 0.07873 |
| 6 | 0.273138 | 0.344791 | 0.380407 |
| 7 | 0.299963 | 0.264205 | 0.125538 |
| 8 | 0.141076 | 0.168566 | 0.218516 |
| 9 | 0.402772 | 0.387228 | 0.399944 |


In [112]:
results_analytics.to_csv('spotify_data/results_analytics.csv')
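
A numeric row index makes the saved table hard to read, since each row's feature has to be inferred from its position. One way to label the rows is to build the frame from a mapping of label to per-term averages and transpose it. This is a sketch with made-up numbers and only two features, not the values from my data.

```python
import pandas as pd

# Made-up averages, only to show the shape of the result.
loudness = {"Long": -8.0, "Medium": -7.5, "Short": -7.0}
tempo = {"Long": 120.0, "Medium": 115.0, "Short": 110.0}

# Outer keys become columns, so transpose to get one row per feature.
labelled = pd.DataFrame({"Loudness": loudness, "Tempo": tempo}).T

# Fix the column order explicitly rather than relying on dictionary order.
labelled = labelled[["Long", "Medium", "Short"]]
print(labelled)
```

Written to CSV, this produces a first column of feature names instead of 0, 1, 2 and so on.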

## Accessing Spotify Data through their API
This approach requires you to set up a local Node.js server in order to obtain the **access token**.

I followed this tutorial: [Analyzing Spotify Music](https://vsupalov.com/analyze-spotify-music-library-with-jupyter-pandas/)

In [None]:
TOKEN = "token"


In [None]:
import json
import requests
from furl import furl
from math import ceil

# to save some typing
import pandas as pd
import matplotlib

# to display plots in the notebook
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
url = "https://api.spotify.com/v1/me/tracks"
# Pass the access token defined above as a Bearer token
headers = {'Authorization': "Bearer {}".format(TOKEN)}
r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an expired or invalid token
parsed = r.json()

count_songs = parsed["total"]
print("Total number of songs: {}".format(count_songs))
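
The saved-tracks endpoint returns results one page at a time, so the total count is mainly useful for working out how many further requests are needed. The sketch below computes the `offset` value for each page, assuming a page size of 50 (the maximum `limit` I'm aware of for this endpoint). `page_offsets` is a hypothetical helper name.

```python
from math import ceil


def page_offsets(total, page_size=50):
    """Return the `offset` value for each page needed to cover `total` items."""
    pages = ceil(total / page_size)
    return [page * page_size for page in range(pages)]


print(page_offsets(120))  # → [0, 50, 100]
```

Each offset can then be sent along with `limit` as query parameters, for example `requests.get(url, headers=headers, params={"limit": 50, "offset": offset})`.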