# API Data Report by Larissa Kidd 10/15/2023
### In this report, I parse data available through a Spotify Developer account to test my hypothesis:
### Noah Kahan is pretty popular right now. I hypothesize that this is because his music has above-average tempo and scores high on energy and valence (even though the lyrics are not cheerful), with middle-ground danceability

I believe the way he sings is very upbeat, so he will have a high tempo and a good valence score too. I want to compare Noah Kahan's Stick Season (the song that put him on everyone's radar) to Dua Lipa's Dance the Night, because it has a very high tempo, so I assume it will also have high valence and danceability. For the opposite comparison, I want to use SZA's Snooze, because "snooze" is synonymous with sleeping in, so it shouldn't be very lively. I chose popular songs based on their tempo because tempo is a universal measurement, whereas danceability, energy, and valence are probably more subjective.

##### First things first - importing the libraries that I will need to access and parse the data on Spotify

In [1]:
import requests
import pandas as pd
import base64
import json
import urllib

##### Now, to access the data, I have to provide and apply the access keys that grant me access!

In [2]:
# Read the key file once and pull both credentials from its first row
keys_df = pd.read_csv('Spotify_Keys_9-19-23.txt')
Client_ID = keys_df['Client_ID'].iloc[0]
Client_Secret = keys_df['Client_Secret'].iloc[0]

In [3]:
client_cred = base64.b64encode(str(Client_ID + ":" + Client_Secret).encode("ascii"))

In [4]:
headers = {"Authorization": "Basic {}".format(client_cred.decode("ascii"))}

In [5]:
payload = {'grant_type' : 'client_credentials'}
url = 'https://accounts.spotify.com/api/token'

In [6]:
session_key_response = requests.post(url = url, data = payload, headers = headers)
session_key_response.status_code

200

##### A 200 status code tells me the token request succeeded, so my credentials were accepted
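
The token steps above could be wrapped into two small helpers. This is only a sketch under assumptions: the names `basic_auth_header` and `request_token` are hypothetical (they are not part of `requests` or of Spotify's API), the 10-second timeout is an arbitrary choice, and `raise_for_status()` stands in for checking `status_code` by eye.

```python
import base64

import requests

TOKEN_URL = "https://accounts.spotify.com/api/token"  # same URL as in the cells above


def basic_auth_header(client_id, client_secret):
    """Build the HTTP Basic header from 'client_id:client_secret' (hypothetical helper)."""
    raw = "{}:{}".format(client_id, client_secret).encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic {}".format(encoded)}


def request_token(client_id, client_secret):
    """Request a client-credentials token; raises if the response is an error status."""
    response = requests.post(
        TOKEN_URL,
        data={"grant_type": "client_credentials"},
        headers=basic_auth_header(client_id, client_secret),
        timeout=10,  # assumed value, not from the original notebook
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()["access_token"]
```

With these in place, `key = request_token(Client_ID, Client_Secret)` would replace the separate header, payload, and post cells.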

#### To understand more, I want to look at the info connected to Noah Kahan's Spotify profile

In [7]:
session_header_key = session_key_response.json()  # call the method to get the parsed response body

In [8]:
key = session_key_response.json()["access_token"]

In [9]:
session_headers = {"Authorization": "Bearer {}".format(key)}

In [10]:
noah_kahan_id = '2RQXRUsr4IW1f3mKyKsy4B'

In [11]:
# This is the artist endpoint, so the variable is named artist_url
artist_url = 'https://api.spotify.com/v1/artists/{}'.format(noah_kahan_id)

In [12]:
response = requests.get(url = artist_url, headers = session_headers)

In [16]:
response_data = response.json()
print(response_data['genres'])
print(response_data['popularity'])

['pov: indie']
83


###### Spotify lists Noah Kahan under the 'pov: indie' genre, and he rates an 83 on the popularity scale, which is out of 100!

Now, pulling the data for his number one song, Stick Season

In [17]:
stick_season_id = '0mflMxspEfB0VbI1kyLiAv'

In [18]:
stick_audio_feature_url = 'https://api.spotify.com/v1/audio-features/{}'.format(stick_season_id)

In [19]:
stick_features = requests.get(url = stick_audio_feature_url,headers = session_headers)

In [25]:
stick_features_data = stick_features.json()
stick_features_data

{'danceability': 0.662,
 'energy': 0.488,
 'key': 9,
 'loudness': -6.894,
 'mode': 1,
 'speechiness': 0.0682,
 'acousticness': 0.782,
 'instrumentalness': 0,
 'liveness': 0.102,
 'valence': 0.817,
 'tempo': 117.913,
 'type': 'audio_features',
 'id': '0mflMxspEfB0VbI1kyLiAv',
 'uri': 'spotify:track:0mflMxspEfB0VbI1kyLiAv',
 'track_href': 'https://api.spotify.com/v1/tracks/0mflMxspEfB0VbI1kyLiAv',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0mflMxspEfB0VbI1kyLiAv',
 'duration_ms': 182347,
 'time_signature': 4}

###### For the low-tempo comparison, I chose a low-tempo song that is very popular this year: Snooze by SZA

In [26]:
snooze_id = '1Qrg8KqiBpW07V7PNxwwwL'
snooze_audio_feature_url = 'https://api.spotify.com/v1/audio-features/{}'.format(snooze_id)
snooze_features = requests.get(url = snooze_audio_feature_url,headers = session_headers)
snooze_features_data = snooze_features.json()
#snooze_features_data

###### On the other side of the spectrum, the popular high-tempo song I chose is Dance the Night by Dua Lipa

In [27]:
dance_night_id = '11C4y2Yz1XbHmaQwO06s9f'
dance_night_audio_feature_url = 'https://api.spotify.com/v1/audio-features/{}'.format(dance_night_id)
dance_night_features = requests.get(url = dance_night_audio_feature_url,headers = session_headers)
dance_night_features_data = dance_night_features.json()
#dance_night_features_data
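
The three audio-feature requests above repeat the same pattern, so they could be folded into one loop. A minimal sketch, assuming the `session_headers` dict from the earlier cell; `audio_features_url` and `fetch_audio_features` are hypothetical helper names, and the timeout value is an assumption.

```python
import requests

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features/{}"  # endpoint used above

# Track IDs copied from the cells above.
TRACK_IDS = {
    "Stick Season": "0mflMxspEfB0VbI1kyLiAv",
    "Snooze": "1Qrg8KqiBpW07V7PNxwwwL",
    "Dance the Night": "11C4y2Yz1XbHmaQwO06s9f",
}


def audio_features_url(track_id):
    """Fill a track ID into the audio-features endpoint (hypothetical helper)."""
    return AUDIO_FEATURES_URL.format(track_id)


def fetch_audio_features(track_ids, session_headers):
    """Return {song name: audio-features dict}, one request per track (hypothetical helper)."""
    features = {}
    for name, track_id in track_ids.items():
        response = requests.get(
            audio_features_url(track_id),
            headers=session_headers,
            timeout=10,  # assumed value
        )
        response.raise_for_status()
        features[name] = response.json()
    return features
```

Calling `fetch_audio_features(TRACK_IDS, session_headers)` would then give one dictionary per song, keyed by song name.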

##### I want to put all three songs into a data frame and only include the 4 features I am most curious about so I need to first create dataframes

In [28]:
stick_features_df = pd.DataFrame([stick_features_data])
stick_features_df

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.662 | 0.488 | 9 | -6.894 | 1 | 0.0682 | 0.782 | 0 | 0.102 | 0.817 | 117.913 | audio_features | 0mflMxspEfB0VbI1kyLiAv | spotify:track:0mflMxspEfB0VbI1kyLiAv | https://api.spotify.com/v1/tracks/0mflMxspEfB0... | https://api.spotify.com/v1/audio-analysis/0mfl... | 182347 | 4 |


In [29]:
snooze_features_df = pd.DataFrame([snooze_features_data])
#snooze_features_df

In [30]:
dance_night_features_df = pd.DataFrame([dance_night_features_data])
#dance_night_features_df

##### With the 3 different dataframes, I want to be able to glance at one dataframe and easily compare the things I am curious about. So, to 'stack' the dataframes on top of each other, I am using the `concat` function in pandas

In [33]:
joined_df = pd.concat([stick_features_df, snooze_features_df, dance_night_features_df], keys=['Stick Season','Snooze','Dance the Night'])
joined_df

| | | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stick Season | 0 | 0.662 | 0.488 | 9 | -6.894 | 1 | 0.0682 | 0.782 | 0.0 | 0.102 | 0.817 | 117.913 | audio_features | 0mflMxspEfB0VbI1kyLiAv | spotify:track:0mflMxspEfB0VbI1kyLiAv | https://api.spotify.com/v1/tracks/0mflMxspEfB0... | https://api.spotify.com/v1/audio-analysis/0mfl... | 182347 | 4 |
| Snooze | 0 | 0.644 | 0.735 | 8 | -5.747 | 1 | 0.0391 | 0.0521 | 0.144 | 0.161 | 0.418 | 88.98 | audio_features | 1Qrg8KqiBpW07V7PNxwwwL | spotify:track:1Qrg8KqiBpW07V7PNxwwwL | https://api.spotify.com/v1/tracks/1Qrg8KqiBpW0... | https://api.spotify.com/v1/audio-analysis/1Qrg... | 153947 | 4 |
| Dance the Night | 0 | 0.671 | 0.845 | 11 | -4.93 | 0 | 0.048 | 0.0207 | 0.0 | 0.329 | 0.775 | 110.056 | audio_features | 11C4y2Yz1XbHmaQwO06s9f | spotify:track:11C4y2Yz1XbHmaQwO06s9f | https://api.spotify.com/v1/tracks/11C4y2Yz1XbH... | https://api.spotify.com/v1/audio-analysis/11C4... | 176579 | 4 |
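
Passing `keys=` to `pd.concat` produces a two-level index (the song name plus each frame's original row label), which is why the output carries an extra column of zeros. If a single-level index is preferred, one alternative is to build a single frame straight from the list of dictionaries. A minimal sketch, using only four of the values reported above in place of the full API responses; `records`, `songs`, and `flat_df` are illustrative names.

```python
import pandas as pd

# A few of the values reported above; the full dicts would come from the API responses.
records = [
    {"danceability": 0.662, "energy": 0.488, "valence": 0.817, "tempo": 117.913},
    {"danceability": 0.644, "energy": 0.735, "valence": 0.418, "tempo": 88.98},
    {"danceability": 0.671, "energy": 0.845, "valence": 0.775, "tempo": 110.056},
]
songs = ["Stick Season", "Snooze", "Dance the Night"]

# One row per song, labelled directly by song name (no second index level).
flat_df = pd.DataFrame(records, index=songs)
flat_df.index.name = "song"
print(flat_df)
```

Rows can then be looked up with a single label, for example `flat_df.loc["Snooze"]`.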


##### Because I am only curious about the 4 features I mentioned in my hypothesis, I am going to drop the other columns that are less relevant to my project

In [34]:
final_df = joined_df.drop(columns=['key', 'loudness','mode','speechiness','instrumentalness','liveness','acousticness','type','id','uri','track_href','analysis_url','time_signature','duration_ms'])
final_df

| | | danceability | energy | valence | tempo |
|---|---|---|---|---|---|
| Stick Season | 0 | 0.662 | 0.488 | 0.817 | 117.913 |
| Snooze | 0 | 0.644 | 0.735 | 0.418 | 88.98 |
| Dance the Night | 0 | 0.671 | 0.845 | 0.775 | 110.056 |
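
Selecting the wanted columns by name is a shorter alternative to listing every column to drop, and ranking each feature makes the comparison explicit. A minimal sketch, assuming a frame rebuilt by hand from the values in the joined table above (with two extra columns kept to show the selection); `joined`, `features`, `selected`, and `ranks` are illustrative names.

```python
import pandas as pd

# Values copied from the joined table above, including two columns that are not needed.
joined = pd.DataFrame(
    {
        "danceability": [0.662, 0.644, 0.671],
        "energy": [0.488, 0.735, 0.845],
        "key": [9, 8, 11],
        "valence": [0.817, 0.418, 0.775],
        "tempo": [117.913, 88.98, 110.056],
        "duration_ms": [182347, 153947, 176579],
    },
    index=["Stick Season", "Snooze", "Dance the Night"],
)

# Keep only the four features from the hypothesis.
features = ["danceability", "energy", "valence", "tempo"]
selected = joined[features]

# Rank each feature across the three songs: 1 = highest value, 3 = lowest.
ranks = selected.rank(ascending=False).astype(int)
print(ranks)
```

In `ranks`, a 2 for Stick Season's danceability would mean it sits in the middle of the three songs for that feature.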


### Just as I hypothesized...
The danceability for Stick Season (0.662) falls in the middle of the three songs, between Snooze (0.644) and Dance the Night (0.671).
### Surprisingly, the energy for Stick Season is the lowest of the three
This is weird because I perceive Snooze as very low energy, and because the tempo of Stick Season is higher than that of both other songs.
### My hypothesis was correct in assuming valence would be high even though the words are not entirely cheery
I credit this score to the high tempo.

### Why I used these endpoints
My first point of curiosity was how Spotify classifies Noah Kahan by genre and how popular he is, as background info, so for that I pulled the artist endpoint.
For my hypothesis, I wanted to look at specific songs' audio features, so I used the audio features endpoint for each track to get exactly the data I wanted.
### Reliable or unreliable?
I think this data is unreliable as far as validity goes, because audio features like danceability and valence are not measured entirely objectively. That might explain why some of the danceability or valence scores don't match my original guesses (I could simply have a different opinion), but they are fun for curiosity's sake!