### Spotify API Report ###
#### Melaina Herbst ####
##### 10.15.2023 #####


***Hypothesis:*** The danceability level influences popularity ranking in Chris Stapleton's top tracks.

One theoretical application of my hypothesis is improving music. If danceability and popularity are related in Chris Stapleton's top tracks, artists could use that relationship as a tool for shaping songs that captivate their audience. Knowledge of musical trends helps artists advance in the industry: if the hypothesis holds, it would show music creators how much danceability matters to a song's popularity, which could both satisfy their audience and increase their chances of success. Analyzing how danceability influences popularity ranking also allows for deeper exploration of trends within music and the industry. The statistical application of my hypothesis is analyzing the correlation between danceability and popularity. Both variables matter a great deal in music, and the trend between them can be examined to see whether a correlation exists. Statistical tools could then be used to estimate a song's popularity from its danceability level.

The endpoints used in this report are Artist's Top Tracks and Audio Features. Each endpoint is suitable for testing my hypothesis because it provides data that is essential to the analysis. The Top Tracks endpoint returned Chris Stapleton's most sought-after songs, along with the `popularity` field, which is a key data component in this report. The Audio Features endpoint was just as important: it returns the audio attributes of each of the top tracks. These attributes are reported on numerical scales, which makes it possible to examine a song's danceability directly. `danceability` was the most important audio feature because of its central role in this analysis.

This data can be considered reliable because it comes directly from Spotify's records, so it reflects real user activity. It is also updated frequently, which increases its reliability. It can be unreliable, however, because of potential inaccuracies in the data itself or in the algorithm behind it.

The source of this data is Spotify. It has two parts: artist data, which includes information about Chris Stapleton, and audio features, which describe the audio elements of each track.


In [65]:
import requests 
import pandas as pd
import base64
import json
import urllib 

In [66]:
Client_ID = pd.read_csv('spotify_keys_9.19.txt')['Client_ID'].iloc[0]

In [67]:
Client_Secret = pd.read_csv('spotify_keys_9.19.txt')['Client_Secret'].iloc[0]

In [68]:
#Client_ID

In [69]:
#Client_Secret

In [70]:
client_cred = base64.b64encode(str(Client_ID + ":" + Client_Secret).encode("ascii"))

In [71]:
#client_cred

In [72]:
headers = {"Authorization": "Basic {}".format(client_cred.decode("ascii"))}

In [73]:
#headers

In [74]:
payload = {'grant_type' : 'client_credentials'}
url = 'https://accounts.spotify.com/api/token'

In [75]:
response = requests.post(url = url, data = payload, headers = headers )

In [76]:
response.status_code

200

In [77]:
#response.json()['access_token']

In [78]:
header_key = response.json()

In [79]:
#header_key['access_token']

In [80]:
key = header_key['access_token']

In [81]:
session_headers = {"Authorization": "Bearer {}".format(key)}

In [82]:
#session_headers
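
The authentication cells above follow Spotify's client credentials flow: the client ID and secret are Base64-encoded into a `Basic` header, that header is exchanged for an access token, and the token is then sent as a `Bearer` header on every data request. As a minimal sketch, the two header-building steps could be wrapped in small helpers. The names `basic_auth_header` and `bearer_header` are hypothetical (they are not part of `requests` or of Spotify's API), and the credentials and token shown are placeholders.

```python
import base64


def basic_auth_header(client_id: str, client_secret: str) -> dict:
    """Build the Basic header sent to the token endpoint (hypothetical helper)."""
    raw = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_header(access_token: str) -> dict:
    """Build the Bearer header sent with each data request (hypothetical helper)."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder credentials and token, for illustration only.
token_request_headers = basic_auth_header("abc", "123")
session_headers_example = bearer_header("example-token")
```

Keeping the two headers in separate functions makes it harder to send the client secret to a data endpoint by mistake.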

In [83]:
chris_stap_id = '4YLtscXsxbVgi031ovDDdh'

Below is the Spotify endpoint URL for Chris Stapleton's top tracks. The URL includes the artist ID.

In [84]:
top_tracks_url  = 'https://api.spotify.com/v1/artists/{}/top-tracks?market=US'.format(chris_stap_id)

In [85]:
top_tracks_url 

'https://api.spotify.com/v1/artists/4YLtscXsxbVgi031ovDDdh/top-tracks?market=US'

In [86]:
top_tracks_response = requests.get(url = top_tracks_url, headers = session_headers)

In [87]:
top_tracks_response.status_code

200
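
Every request in this report returned 200, so no error handling was needed. If a request were to fail, the status code indicates why: Spotify's Web API uses 401 when the access token is missing or expired and 429 when the client is rate limited, with a `Retry-After` header giving the wait in seconds. The sketch below is one hypothetical way to act on those codes. `next_action` is an illustrative helper, not something provided by `requests` or Spotify, and the one-second fallback is an assumption.

```python
def next_action(status_code: int, headers: dict) -> str:
    """Suggest what to do with a response, based only on its status code."""
    if status_code == 200:
        return "ok"
    if status_code == 401:
        # The token is missing or expired: request a new one and try again.
        return "refresh token"
    if status_code == 429:
        # Rate limited: wait the number of seconds in the Retry-After header.
        # Falling back to 1 second when the header is absent is an assumption.
        wait_seconds = int(headers.get("Retry-After", 1))
        return f"retry after {wait_seconds}s"
    return "fail"


example_ok = next_action(200, {})
example_limited = next_action(429, {"Retry-After": "5"})
```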

The response data for the top tracks is parsed from JSON below.

In [88]:
top_tracks_data = top_tracks_response.json()

In [89]:
top_tracks_data.keys()

dict_keys(['tracks'])

In [90]:
tracks = top_tracks_data['tracks']

Below is the DataFrame for Chris Stapleton's top tracks.

In [91]:
top_tracks_df = pd.DataFrame(top_tracks_data['tracks'])

In [92]:
top_tracks_df.head()

Unnamed: 0,album,artists,disc_number,duration_ms,explicit,external_ids,external_urls,href,id,is_local,is_playable,name,popularity,preview_url,track_number,type,uri
0,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,293293,False,{'isrc': 'USUM71418088'},{'spotify': 'https://open.spotify.com/track/3f...,https://api.spotify.com/v1/tracks/3fqwjXwUGN6v...,3fqwjXwUGN6vbzIwvyFMhx,False,True,Tennessee Whiskey,84,,3,track,spotify:track:3fqwjXwUGN6vbzIwvyFMhx
1,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,213493,False,{'isrc': 'USUM72013814'},{'spotify': 'https://open.spotify.com/track/2U...,https://api.spotify.com/v1/tracks/2UikqkwBv7aI...,2UikqkwBv7aIvlixeVXHWt,False,True,You Should Probably Leave,83,,13,track,spotify:track:2UikqkwBv7aIvlixeVXHWt
2,"{'album_type': 'single', 'artists': [{'externa...",[{'external_urls': {'spotify': 'https://open.s...,1,267893,False,{'isrc': 'USUG12300484'},{'spotify': 'https://open.spotify.com/track/7M...,https://api.spotify.com/v1/tracks/7MSWxMumjz6l...,7MSWxMumjz6lHj7oRApNbg,False,True,White Horse,81,,1,track,spotify:track:7MSWxMumjz6lHj7oRApNbg
3,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,240413,False,{'isrc': 'USUM72013812'},{'spotify': 'https://open.spotify.com/track/3K...,https://api.spotify.com/v1/tracks/3K07bGe8iljQ...,3K07bGe8iljQ3mOKArHLDo,False,True,Starting Over,78,,1,track,spotify:track:3K07bGe8iljQ3mOKArHLDo
4,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,180906,False,{'isrc': 'USUM71701961'},{'spotify': 'https://open.spotify.com/track/06...,https://api.spotify.com/v1/tracks/06gD2ZtK3Dzc...,06gD2ZtK3Dzc1BYqWExQJJ,False,True,Broken Halos,80,,1,track,spotify:track:06gD2ZtK3Dzc1BYqWExQJJ
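
Several columns in this table hold nested dictionaries or lists (`album`, `artists`, `external_urls`) that the analysis never uses. One alternative, sketched below, is to pick out only the needed fields from each track before building a DataFrame. `sample_tracks` is a hand-made stand-in for `top_tracks_data['tracks']`, trimmed to two of the rows shown above, and `track_summary` is a hypothetical helper name.

```python
def track_summary(tracks: list) -> list:
    """Keep only the id, name, and popularity of each track dictionary."""
    return [
        {"id": t["id"], "name": t["name"], "popularity": t["popularity"]}
        for t in tracks
    ]


# Trimmed stand-in for top_tracks_data['tracks'] (values taken from the table above).
sample_tracks = [
    {
        "id": "3fqwjXwUGN6vbzIwvyFMhx",
        "name": "Tennessee Whiskey",
        "popularity": 84,
        "album": {"album_type": "album"},
    },
    {
        "id": "7MSWxMumjz6lHj7oRApNbg",
        "name": "White Horse",
        "popularity": 81,
        "album": {"album_type": "single"},
    },
]

rows = track_summary(sample_tracks)
```

Passing `rows` to `pd.DataFrame` would give a three-column frame that still carries the `id` needed for the later merge.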


In [93]:
top_tracks_data['tracks'][0]['id']

'3fqwjXwUGN6vbzIwvyFMhx'

The cell below collects the ID of each track into a single comma-separated string.

In [94]:
track_id = ','.join(x['id'] for x in top_tracks_data['tracks'])

In [95]:
audio_features_url = 'https://api.spotify.com/v1/audio-features?ids={}'.format(track_id)

In [96]:
audio_features_url

'https://api.spotify.com/v1/audio-features?ids=3fqwjXwUGN6vbzIwvyFMhx,2UikqkwBv7aIvlixeVXHWt,7MSWxMumjz6lHj7oRApNbg,3K07bGe8iljQ3mOKArHLDo,06gD2ZtK3Dzc1BYqWExQJJ,4CkgMiMqZ5JzW9iYXSTMTL,5jROdl6MhcmP3O7h2sVgtw,65M92JpTbAdHmTQm4jGaDa,68JS5SFTnW5Yv9Vzw81Jf0,178OI1A3qjROeFeh8lmNwW'
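
The ten track IDs fit into one request here. Spotify's documentation for this endpoint lists a maximum of 100 IDs per call, so a longer track list would need to be split into batches. The following is a minimal sketch under that assumption; `audio_features_urls` and its `batch_size` parameter are hypothetical names, and the example uses a batch size of 2 only to make the split visible.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.spotify.com/v1/audio-features"


def audio_features_urls(track_ids: list, batch_size: int = 100) -> list:
    """Build one audio-features URL per batch of track IDs."""
    urls = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # safe="," keeps the commas readable instead of encoding them as %2C.
        query = urlencode({"ids": ",".join(batch)}, safe=",")
        urls.append(f"{BASE_URL}?{query}")
    return urls


example_urls = audio_features_urls(
    ["3fqwjXwUGN6vbzIwvyFMhx", "2UikqkwBv7aIvlixeVXHWt", "7MSWxMumjz6lHj7oRApNbg"],
    batch_size=2,
)
```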

Below is the request for the audio features data.

In [97]:
audio_features_response = requests.get(url = audio_features_url, headers = session_headers)

In [98]:
audio_features_response.status_code

200

In [99]:
features = audio_features_response.json()

In [100]:
features.keys()

dict_keys(['audio_features'])

In [101]:
features['audio_features'][0]['danceability']

0.392

The DataFrame below includes all of the data and elements for the audio features.  

In [102]:
audio_features_df = pd.DataFrame(features['audio_features'])

In [103]:
audio_features_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.392,0.37,9,-10.888,1,0.0298,0.205,0.0096,0.0821,0.512,48.718,audio_features,3fqwjXwUGN6vbzIwvyFMhx,spotify:track:3fqwjXwUGN6vbzIwvyFMhx,https://api.spotify.com/v1/tracks/3fqwjXwUGN6v...,https://api.spotify.com/v1/audio-analysis/3fqw...,293293,4
1,0.602,0.477,9,-8.425,1,0.032,0.268,3.1e-05,0.173,0.552,183.89,audio_features,2UikqkwBv7aIvlixeVXHWt,spotify:track:2UikqkwBv7aIvlixeVXHWt,https://api.spotify.com/v1/tracks/2UikqkwBv7aI...,https://api.spotify.com/v1/audio-analysis/2Uik...,213493,4
2,0.353,0.77,1,-11.066,0,0.0493,0.0016,0.598,0.109,0.224,145.225,audio_features,7MSWxMumjz6lHj7oRApNbg,spotify:track:7MSWxMumjz6lHj7oRApNbg,https://api.spotify.com/v1/tracks/7MSWxMumjz6l...,https://api.spotify.com/v1/audio-analysis/7MSW...,267893,4
3,0.638,0.538,10,-8.445,1,0.0315,0.452,0.000415,0.0695,0.318,89.124,audio_features,3K07bGe8iljQ3mOKArHLDo,spotify:track:3K07bGe8iljQ3mOKArHLDo,https://api.spotify.com/v1/tracks/3K07bGe8iljQ...,https://api.spotify.com/v1/audio-analysis/3K07...,240413,4
4,0.613,0.564,8,-7.085,1,0.0251,0.134,2e-06,0.0585,0.538,78.388,audio_features,06gD2ZtK3Dzc1BYqWExQJJ,spotify:track:06gD2ZtK3Dzc1BYqWExQJJ,https://api.spotify.com/v1/tracks/06gD2ZtK3Dzc...,https://api.spotify.com/v1/audio-analysis/06gD...,180907,4


Below is the joined DataFrame of the top tracks and audio features. The two DataFrames were merged on each song's ID.

In [104]:
joined_df = pd.merge(top_tracks_df, audio_features_df, how = 'inner', on = 'id')

In [105]:
joined_df.head()

Unnamed: 0,album,artists,disc_number,duration_ms_x,explicit,external_ids,external_urls,href,id,is_local,...,instrumentalness,liveness,valence,tempo,type_y,uri_y,track_href,analysis_url,duration_ms_y,time_signature
0,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,293293,False,{'isrc': 'USUM71418088'},{'spotify': 'https://open.spotify.com/track/3f...,https://api.spotify.com/v1/tracks/3fqwjXwUGN6v...,3fqwjXwUGN6vbzIwvyFMhx,False,...,0.0096,0.0821,0.512,48.718,audio_features,spotify:track:3fqwjXwUGN6vbzIwvyFMhx,https://api.spotify.com/v1/tracks/3fqwjXwUGN6v...,https://api.spotify.com/v1/audio-analysis/3fqw...,293293,4
1,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,213493,False,{'isrc': 'USUM72013814'},{'spotify': 'https://open.spotify.com/track/2U...,https://api.spotify.com/v1/tracks/2UikqkwBv7aI...,2UikqkwBv7aIvlixeVXHWt,False,...,3.1e-05,0.173,0.552,183.89,audio_features,spotify:track:2UikqkwBv7aIvlixeVXHWt,https://api.spotify.com/v1/tracks/2UikqkwBv7aI...,https://api.spotify.com/v1/audio-analysis/2Uik...,213493,4
2,"{'album_type': 'single', 'artists': [{'externa...",[{'external_urls': {'spotify': 'https://open.s...,1,267893,False,{'isrc': 'USUG12300484'},{'spotify': 'https://open.spotify.com/track/7M...,https://api.spotify.com/v1/tracks/7MSWxMumjz6l...,7MSWxMumjz6lHj7oRApNbg,False,...,0.598,0.109,0.224,145.225,audio_features,spotify:track:7MSWxMumjz6lHj7oRApNbg,https://api.spotify.com/v1/tracks/7MSWxMumjz6l...,https://api.spotify.com/v1/audio-analysis/7MSW...,267893,4
3,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,240413,False,{'isrc': 'USUM72013812'},{'spotify': 'https://open.spotify.com/track/3K...,https://api.spotify.com/v1/tracks/3K07bGe8iljQ...,3K07bGe8iljQ3mOKArHLDo,False,...,0.000415,0.0695,0.318,89.124,audio_features,spotify:track:3K07bGe8iljQ3mOKArHLDo,https://api.spotify.com/v1/tracks/3K07bGe8iljQ...,https://api.spotify.com/v1/audio-analysis/3K07...,240413,4
4,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,1,180906,False,{'isrc': 'USUM71701961'},{'spotify': 'https://open.spotify.com/track/06...,https://api.spotify.com/v1/tracks/06gD2ZtK3Dzc...,06gD2ZtK3Dzc1BYqWExQJJ,False,...,2e-06,0.0585,0.538,78.388,audio_features,spotify:track:06gD2ZtK3Dzc1BYqWExQJJ,https://api.spotify.com/v1/tracks/06gD2ZtK3Dzc...,https://api.spotify.com/v1/audio-analysis/06gD...,180907,4


In [106]:
# Keep only the columns needed for the analysis (same result as dropping all the others)
clean_df = joined_df[['name', 'popularity', 'danceability']]

Cleaning up the DataFrame leaves only the columns needed to test the hypothesis.

In [107]:
name_group = clean_df.groupby("name").sum()

In [108]:
final_df = name_group.sort_values(by="popularity", ascending=False)

In [109]:
final_df

| name | popularity | danceability |
|---|---|---|
| Tennessee Whiskey | 84 | 0.392 |
| You Should Probably Leave | 83 | 0.602 |
| White Horse | 81 | 0.353 |
| Broken Halos | 80 | 0.613 |
| Starting Over | 78 | 0.638 |
| I Bet You Think About Me (feat. Chris Stapleton) (Taylor’s Version) (From The Vault) | 77 | 0.391 |
| Parachute | 76 | 0.642 |
| Think I'm In Love With You | 76 | 0.671 |
| Millionaire | 75 | 0.616 |
| Traveller | 74 | 0.543 |


The final DataFrame above shows the two variables needed to test my hypothesis. It is sorted from the highest popularity level to the lowest. Popularity is measured on a scale of 0-100, and danceability on a scale of 0-1, with 1 being the highest. To test my hypothesis and analyze the data properly, I cleaned up the data set and removed unnecessary columns. The final DataFrame does not support my hypothesis: in these ten tracks, higher popularity does not come with higher danceability. If my hypothesis were correct, the more popular songs would also have higher danceability, which is not the case here. In fact, the data shows almost the exact opposite of what I predicted: the less popular tracks tend to have higher danceability. Could this be a trend, or is it just a coincidence?
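
One way to put a number on the pattern described above is the Pearson correlation coefficient, which ranges from -1 (perfect inverse relationship) to 1 (perfect direct relationship). The sketch below computes it by hand from the ten (popularity, danceability) pairs in the final DataFrame. `pearson_r` is a hypothetical helper written for this illustration; with pandas, `final_df['popularity'].corr(final_df['danceability'])` should give the same value.

```python
import math

# (popularity, danceability) pairs copied from the final DataFrame above.
pairs = [
    (84, 0.392), (83, 0.602), (81, 0.353), (80, 0.613), (78, 0.638),
    (77, 0.391), (76, 0.642), (76, 0.671), (75, 0.616), (74, 0.543),
]


def pearson_r(points: list) -> float:
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(pairs)
print(round(r, 2))
```

A negative coefficient would agree with the inverse pattern noted above. With only ten tracks from a single artist, though, a moderate correlation in either direction can easily arise by chance, so the result is better read as a hint than as evidence of a trend.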

This analysis leaves me with a question: which factors do influence each other in music? In the future, I would like to take the year a song was released into account when doing an analysis like this. Release year is another factor that could influence popularity in music, or even influence other factors.
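
As a starting point for that follow-up, each track returned by the top tracks endpoint carries an `album` object with a `release_date` string whose first four characters are the year. The sketch below shows one way to pull the year out. `release_year` is a hypothetical helper, and `sample_track` is a made-up dictionary with a placeholder date, not data from this report.

```python
def release_year(track: dict) -> int:
    """Return the release year from a track's album release_date string."""
    # release_date may be "YYYY", "YYYY-MM", or "YYYY-MM-DD"; the year always comes first.
    return int(track["album"]["release_date"][:4])


# Made-up example track; the date is a placeholder, not a value from the API response.
sample_track = {
    "name": "Example Song",
    "popularity": 50,
    "album": {"release_date": "2015-05-04", "release_date_precision": "day"},
}

year = release_year(sample_track)
```

Adding this value as a column would allow popularity to be compared against release year in the same way it was compared against danceability.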