# Final Project: Data in Hand Checkpoint

## Serena Gestring

### December 2, 2022

I found two playlists on Spotify: [Today's Top Hits](https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M) for current popular music and [Baroque Classics](https://open.spotify.com/playlist/4DvteColbVCrs7iIgc4r6x) for popular music from the Baroque period. A larger sample would support a more thorough analysis, but given the timeline of this report I decided to look at only 50 songs from each time period. The Top Hits playlist conveniently has 50 songs. The Baroque Classics playlist has 145 songs, so I used a random number generator to give me 50 random track numbers and used those tracks as my sample from that playlist.

First I imported all of the packages I would need to perform this analysis in Jupyter Notebook.

In [1]:
import pandas as pd
import json
import requests
import base64
from matplotlib import pyplot as plt
from scipy import stats

To retrieve the data from my chosen playlists, I needed to access the Spotify API, which requires developer keys to prove I am authorized to access that information. I uploaded my keys to the Jupyter Notebook as a separate file and read them in from there, so they are never displayed in the notebook and will not be available on the internet.

In [2]:
keys = pd.read_csv("Spotify_Keys_10-18-22.txt", header = 0, sep = ",")
appid = keys['Client_ID'][0]
appsecret = keys['Client_Secret'][0]

In [3]:
client_cred = base64.b64encode(str(appid + ":" + appsecret).encode('ascii'))
header = {'Authorization': 'Basic {}'.format(client_cred.decode('ascii'))}
payload = {'grant_type' : 'client_credentials'}
access_token_url = "https://accounts.spotify.com/api/token"

In [4]:
response = requests.post(access_token_url, headers = header, data = payload)
session_token = response.json()['access_token']
# Reuse the token extracted above instead of parsing the response a second time
session_header = {'Authorization': 'Bearer {}'.format(session_token)}
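
The token response can also be checked before it is used. The following is a minimal sketch and was not part of my original notebook: `bearer_header_from_token_response` is a hypothetical helper name, and the only thing it assumes about the response body is the `access_token` field that the cell above already relies on.

```python
def bearer_header_from_token_response(token_json):
    """Build the Bearer header from the parsed JSON of the token request.

    Hypothetical helper: raises a descriptive error instead of a bare
    KeyError when the body has no access token (e.g. bad credentials).
    """
    token = token_json.get('access_token')
    if not token:
        raise ValueError('No access_token in response: {}'.format(token_json))
    return {'Authorization': 'Bearer {}'.format(token)}


# Example with a made-up response body (not a real token).
example_header = bearer_header_from_token_response({'access_token': 'abc123'})
print(example_header)
```

In the notebook this could be called as `session_header = bearer_header_from_token_response(response.json())`, ideally after `response.raise_for_status()` so that a failed request stops the cell right away.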

Once I had access to the Spotify API, I used the Get Playlist Items endpoint to retrieve the track information for the Top Hits playlist. I printed the status code of the request to confirm that it was successful.

In [5]:
top_hits_id = '37i9dQZF1DXcBWIGoYBM5M'
playlist_endpoint = "https://api.spotify.com/v1/playlists/{}/tracks".format(top_hits_id)
top_hits_tracks = requests.get(playlist_endpoint, headers = session_header)
print(top_hits_tracks.status_code)

200


To get the audio features for each track, I needed the track ids. I loaded the data I retrieved into a pandas data frame, then created a second data frame from just the 'track' key so that the track ids were easier to look at.

In [6]:
top_hits_dict = json.loads(top_hits_tracks.text)
top_hits_df = pd.DataFrame(top_hits_dict['items'])
tracks_df = pd.DataFrame(list(top_hits_df['track']))

I tried other ways of combining the track ids into a string, but the approach I had used in a previous assignment was not working, so I listed the ids out in groups of ten and joined the groups together. While listing out all 50 ids is not the most efficient way of doing this, it is still valid code that does what I need it to do.

Once all of the ids were joined into a single comma-separated string, that string was passed to the audio features endpoint to retrieve the audio features for the tracks. I printed the status code to make sure the request was successful.

In [7]:
track_ids1 = ['0V3wPSX9ygBnCm8psDIegu', '3nqQXoyQOWXiESFLlDF1hG', '1xzi1Jcr7mEi9K2RfzLOqS', '4LRPiXqCikLlN15c3yImP7', '0WtM2NBVQNNJLh6scP13H8', '1bDbXMyjaUIooNwFE9wn0N', '73vIOb4Q7YN6HeJTbscRx5', '4uUG5RXrOk84mYEfFvj3cK', '35ovElsgyAtQwYPYnZJECg', '0QHEIqNKsMoOY5urbzN48u']
track_ids2 = ['4h9wh7iOZ0GGn8QVp4RAOB', '5ww2BF9slyYgNOk37BlC4u', '5IgjP7X4th6nMNDh4akUHb', '0O6u0VJ46W86TxN9wgyqDj', '5odlY52u43F5BjByhxg7wg', '5jQI2r1RdgtuT8S3iG8zFC', '1qEmFfgcLObUfQm0j1W2CK', '34ZAzO78a5DAVNrYIGWcPm', '38T0tPVZHcPZyhtOcCP7pF', '26hOm7dTtBi0TdpDGl141t']
track_ids3 = ['1IHWl5LamUGEuP4ozKQSXZ', '4C6Uex2ILwJi9sZXRdmqXp', '76OGwb5RA9h4FxQPT33ekc', '1RDvyOk4WtPCtoqciJwVn8', '0HqZX76SFLDz2aW8aiqi7G', '3WMj8moIAXJhHsyLaqIIHI', '5CM4UuQ9Gnd6K2YyKGPMoK', '0hquQWY3xvYqN4qtiquniF', '4FyesJzVpA39hbYvcseO2d', '1PckUlxKqWQs3RlWXVBLw3']
track_ids4 = ['0mBP9X2gPCuapvpZ7TGDk3', '6G12ZafqofSq7YtrMqUm76', '39JofJHEtg8I4fSyo7Imft', '72yP0DUlWPyH8P7IoxskwN', '2TktkzfozZifbQhXjT6I33', '5unjCay0kUjuej5ebn4nS4', '5HCyWlXZPP0y6Gqq8TgA20', '0T5iIrXA4p5GsubkhuBIKV', '1Ame8XTX6QHY0l0ahqUhgv', '2tTmW7RDtMQtBk7m2rYeSw']
track_ids5 = ['59nOXPmaKlBfGMDeOVGrIK', '37vVp2sWHuuIBOSl1NswP6', '0XER3HPMx223xWaAgNKp4Y', '2rmwqU7yzTvzkiaRV53DpT', '1xK59OXxi2TAAAbmZK0kBL', '5ildQOEKmJuWGl2vRkFdYc', '5hnGrTBaEsdukpDF6aZg8a', '5uSFGgIfHMT3osrAd9n9ym', '3LtpKP5abr2qqjunvjlX5i', '0ARKW62l9uWIDYMZTUmJHF']
top_hits_track_strings = ','.join(str(x) for x in track_ids1+track_ids2+track_ids3+track_ids4+track_ids5)
track_features_endpoint = "https://api.spotify.com/v1/audio-features?ids={}".format(top_hits_track_strings)
top_hits_features = requests.get(track_features_endpoint, headers = session_header)
print(top_hits_features.status_code)

200
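
For reference, the id string could probably also be built directly from the playlist response instead of being typed out. This is only a sketch under assumptions: `track_ids_from_items` is a hypothetical helper, and skipping entries whose `track` value is empty is my guess at how unavailable tracks might show up, not something I verified against these playlists.

```python
def track_ids_from_items(items):
    """Collect track ids from the 'items' list of a playlist response.

    Entries whose 'track' value is missing or None are skipped
    (an assumption about how unavailable tracks might appear).
    """
    ids = []
    for item in items:
        track = item.get('track')
        if track and track.get('id'):
            ids.append(track['id'])
    return ids


# Tiny stand-in for the real response, reusing two ids from the lists above.
example_items = [
    {'track': {'id': '0V3wPSX9ygBnCm8psDIegu'}},
    {'track': None},
    {'track': {'id': '3nqQXoyQOWXiESFLlDF1hG'}},
]
example_string = ','.join(track_ids_from_items(example_items))
print(example_string)
```

With the real data this would be `','.join(track_ids_from_items(top_hits_dict['items']))`, which should produce the same kind of comma-separated string as the hard-coded lists.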


Then I loaded the data into a pandas data frame that contains only the audio features.

In [8]:
top_hits_tracks_dict = json.loads(top_hits_features.text)
top_hits_df = pd.DataFrame(top_hits_tracks_dict['audio_features'])

I repeated this process with the Baroque Classics playlist. 

In [9]:
classic_id = '4DvteColbVCrs7iIgc4r6x'
playlist_endpoint = "https://api.spotify.com/v1/playlists/{}/tracks".format(classic_id)
classic_tracks = requests.get(playlist_endpoint, headers = session_header)
print(classic_tracks.status_code)

200


In [10]:
classic_dict = json.loads(classic_tracks.text)
classic_df = pd.DataFrame(classic_dict['items'])
c_tracks_df = pd.DataFrame(list(classic_df['track']))
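
Because Baroque Classics has 145 tracks, it is worth confirming that a single request really returned all of them. As far as I understand, this endpoint returns its results in pages, and each page's JSON carries a `next` URL that is null on the last page. The sketch below follows those links; `fetch_all_items` and its `get_json` argument are hypothetical names, and the fake pages stand in for the API so the loop can be tried without a network connection.

```python
def fetch_all_items(first_url, get_json):
    """Follow the 'next' links of a paged response and collect every item.

    get_json is any callable that takes a URL and returns the parsed JSON
    page (a dict with 'items' and 'next'); it is passed in so the loop can
    be exercised without network access.
    """
    items = []
    url = first_url
    while url:
        page = get_json(url)
        items.extend(page['items'])
        url = page.get('next')  # None on the last page ends the loop
    return items


# Fake two-page response standing in for the API (URLs are placeholders).
fake_pages = {
    'page1': {'items': ['a', 'b'], 'next': 'page2'},
    'page2': {'items': ['c'], 'next': None},
}
all_items = fetch_all_items('page1', fake_pages.get)
print(all_items)
```

In the notebook, the call might look like `fetch_all_items(playlist_endpoint, lambda url: requests.get(url, headers=session_header).json())`. A quick check of `len(classic_df)` against the 145 tracks in the playlist would show whether this is needed at all.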

As mentioned previously, I used a random number generator to pick the 50 tracks to look at from this playlist. Because I am using only a random subset of the tracks in the data frame, I think the ids would have to be listed out this way regardless of how I handle the Top Hits data frame; but again, it gets me the information I need.

In [11]:
ctrack_ids1 = ['68pcPR8jd2djrHAAKYNsw4', '6srU3wlimYXpxBNoCabQGi', '3i8zRjiO3MNaEkrCsp5Ioh', '5S94PIQplSfBHZXsZowyGY', '6P7Ktg9c8YJbgyCqgukEmr', '5FNxBADhe8SM9Aejtw1AfR', '6OWBcTGl5cVn3xBHKrP3fi', '0ka8jmHCpHOiIJm4dgj6MF', '2f3gjXOD5ZdtyjskiT7vdS', '64ZxBE1ZgK4C0lGljXzDcG']
ctrack_ids2 = ['5w7zbR1ZV0Ee7tNFgVROwz', '18JxYMTSFKZlxqdNFrOs1r', '2bahH18wWfRwc073dKyfkT', '5dkKWRisTmeD85NuvNOQHH', '1OjmXxAaUYisbr6wNhyhiP', '267jqHim1E5DzjDfXjM90X', '1SBPeZr2Jy6mwX8X0SE3Cy', '3HEGe9L8HE5r0MQ751BicP', '1Gd0GJF72EJiPFFJYSeDO9', '5bu9A6uphPWg39RC3ZKeku']
ctrack_ids3 = ['5gKa9ap9gLwcCqYSZXkB7x', '2XkKjX6CcG7oQSaID54vjD', '6XgeIl03iqC7W89VmexJ5t', '3613rpwb4iF0gyWFwLWeDC', '5dSsnewB866BdN1aOE64Jk', '0mD1a7haZKdX9I0oPywrMb', '3Bp8T6l1Hc5OlM87U9jtB5', '5mHo72ntBoYSBjfiLiBLUM', '5TyDo6Ay5MN5VPlsPTBMl2', '1Q0YVDfALJ8xRXPsP0nlKg']
ctrack_ids4 = ['3dcKH2hiRBEARnijIF7rFm', '21rKK1lBgAMGqCShuqUyOf', '6jYl96tQY0lsG1tZtACDZx', '57EbM9h0XQG4qiiFUNgZ4W', '5NCVRnVeZk0nvqimk8D1Pt', '1QPRW7vWciCl9lUViiXJPv', '4JncPOVVxWr8Bjkz32bRFu', '6uTCvFRE00RsW3AUf39Evn', '0KHG44mT1UQBsmkZVYCpxp', '2VFWZbQk5XmTtkCkeOFgVo']
ctrack_ids5 = ['3sCGNW7o2uBpzbuUZInsbt', '4wRxPnMfvD1YqbOp9tDO6l', '2NpeOffr2aIpaoXNaXgdsV', '3doapWnT6l06sus63ctRtZ', '2uGkH3hFOydm8C20DoQ0HX', '0TNpryuAtSnygC3hosDGST', '47xdo9qYjAbOIVVBAqom1b', '4KuGb0cwL8KKSySLhS5F3H', '6glOeA1zQhc1plugv0NutP', '5s58LV3A5ytDqGd6Mii2Rp']
classic_track_strings = ','.join(str(x) for x in ctrack_ids1+ctrack_ids2+ctrack_ids3+ctrack_ids4+ctrack_ids5)
track_features_endpoint = "https://api.spotify.com/v1/audio-features?ids={}".format(classic_track_strings)
classic_features = requests.get(track_features_endpoint, headers = session_header)
print(classic_features.status_code)

200
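
The random draw itself could also be done in code so that it is repeatable. This is a sketch only and does not reproduce the track numbers I actually drew: `sample_track_numbers` is a hypothetical helper, and the seed value is an arbitrary placeholder.

```python
import random


def sample_track_numbers(n_tracks, n_sample, seed):
    """Return n_sample distinct 1-based track numbers out of n_tracks.

    A fixed seed makes the draw repeatable; the seed value is arbitrary.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_tracks + 1), n_sample))


# 50 track numbers out of the 145 in the playlist; seed 2022 is a placeholder.
picked = sample_track_numbers(145, 50, seed=2022)
print(len(picked), picked[:5])
```

Sampling without replacement guarantees 50 different tracks, and rerunning the cell with the same seed gives the same selection, which would make the sample easier to document.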


In [12]:
classic_tracks_dict = json.loads(classic_features.text)
classic_df = pd.DataFrame(classic_tracks_dict['audio_features'])

Finally, I joined the two data frames together using the pandas concat() function. The first five and last five rows of the combined data frame are displayed.

In [13]:
final_df = pd.concat([top_hits_df, classic_df])
final_df.head()

index,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.637,0.643,4,-6.571,1,0.0519,0.13,2e-06,0.142,0.533,97.008,audio_features,0V3wPSX9ygBnCm8psDIegu,spotify:track:0V3wPSX9ygBnCm8psDIegu,https://api.spotify.com/v1/tracks/0V3wPSX9ygBn...,https://api.spotify.com/v1/audio-analysis/0V3w...,200690,4
1,0.714,0.472,2,-7.375,1,0.0864,0.013,5e-06,0.266,0.238,131.121,audio_features,3nqQXoyQOWXiESFLlDF1hG,spotify:track:3nqQXoyQOWXiESFLlDF1hG,https://api.spotify.com/v1/tracks/3nqQXoyQOWXi...,https://api.spotify.com/v1/audio-analysis/3nqQ...,156943,4
2,0.78,0.689,7,-5.668,1,0.141,0.0368,1e-05,0.0698,0.642,115.042,audio_features,1xzi1Jcr7mEi9K2RfzLOqS,spotify:track:1xzi1Jcr7mEi9K2RfzLOqS,https://api.spotify.com/v1/tracks/1xzi1Jcr7mEi...,https://api.spotify.com/v1/audio-analysis/1xzi...,225389,4
3,0.52,0.731,6,-5.338,0,0.0557,0.342,0.00101,0.311,0.662,173.93,audio_features,4LRPiXqCikLlN15c3yImP7,spotify:track:4LRPiXqCikLlN15c3yImP7,https://api.spotify.com/v1/tracks/4LRPiXqCikLl...,https://api.spotify.com/v1/audio-analysis/4LRP...,167303,4
4,0.801,0.806,11,-5.206,1,0.0381,0.382,0.000669,0.114,0.802,106.999,audio_features,0WtM2NBVQNNJLh6scP13H8,spotify:track:0WtM2NBVQNNJLh6scP13H8,https://api.spotify.com/v1/tracks/0WtM2NBVQNNJ...,https://api.spotify.com/v1/audio-analysis/0WtM...,239318,4


In [14]:
final_df.tail()

index,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
45,0.103,0.123,3,-16.964,0,0.0519,0.875,8e-06,0.11,0.0983,48.742,audio_features,0TNpryuAtSnygC3hosDGST,spotify:track:0TNpryuAtSnygC3hosDGST,https://api.spotify.com/v1/tracks/0TNpryuAtSny...,https://api.spotify.com/v1/audio-analysis/0TNp...,154267,4
46,0.139,0.11,0,-16.155,0,0.0362,0.453,0.902,0.107,0.0377,174.865,audio_features,47xdo9qYjAbOIVVBAqom1b,spotify:track:47xdo9qYjAbOIVVBAqom1b,https://api.spotify.com/v1/tracks/47xdo9qYjAbO...,https://api.spotify.com/v1/audio-analysis/47xd...,300213,3
47,0.38,0.0842,4,-30.289,0,0.0402,0.994,0.841,0.129,0.532,144.768,audio_features,4KuGb0cwL8KKSySLhS5F3H,spotify:track:4KuGb0cwL8KKSySLhS5F3H,https://api.spotify.com/v1/tracks/4KuGb0cwL8KK...,https://api.spotify.com/v1/audio-analysis/4KuG...,109720,4
48,0.439,0.134,5,-21.685,0,0.0508,0.756,0.28,0.235,0.377,154.098,audio_features,6glOeA1zQhc1plugv0NutP,spotify:track:6glOeA1zQhc1plugv0NutP,https://api.spotify.com/v1/tracks/6glOeA1zQhc1...,https://api.spotify.com/v1/audio-analysis/6glO...,209333,4
49,0.0983,0.0953,2,-22.25,1,0.0503,0.971,0.773,0.108,0.114,171.034,audio_features,5s58LV3A5ytDqGd6Mii2Rp,spotify:track:5s58LV3A5ytDqGd6Mii2Rp,https://api.spotify.com/v1/tracks/5s58LV3A5ytD...,https://api.spotify.com/v1/audio-analysis/5s58...,269347,1
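
The last rows are numbered 45 to 49 rather than 95 to 99 because concat() keeps each data frame's own index, and nothing in the combined frame records which playlist a row came from. One possible way to handle both is sketched below; `combine_with_labels` and the `playlist` column name are hypothetical choices of mine, not something the analysis above depends on.

```python
import pandas as pd


def combine_with_labels(frames_by_label, label_column='playlist'):
    """Stack several data frames, tagging each row with its source label.

    frames_by_label maps a label (e.g. 'top_hits') to a data frame.
    ignore_index=True renumbers the rows 0..n-1 instead of repeating
    each frame's own index.
    """
    labeled = [df.assign(**{label_column: label})
               for label, df in frames_by_label.items()]
    return pd.concat(labeled, ignore_index=True)


# Tiny stand-in frames using danceability values shown in the output above.
example_hits = pd.DataFrame({'danceability': [0.637, 0.714]})
example_classic = pd.DataFrame({'danceability': [0.103, 0.139]})
combined = combine_with_labels({'top_hits': example_hits,
                                'classic': example_classic})
print(combined)
```

With the real data this would be `combine_with_labels({'top_hits': top_hits_df, 'classic': classic_df})`, and the label column could then be used to split the rows back into the two groups when comparing audio features.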
