# Mariam Sedrakyan: API data Report 
### Hypothesis: Does a song's popularity depend on its duration and danceability in a given country?

#### Theoretical:
For my Spotify API data report, I aimed to find the featured playlists for Armenia and pick one based on its description. I then wanted to learn more about that playlist by comparing its five most popular tracks to understand whether their popularity is connected to their duration and danceability.


I chose the featured playlists for Armenia because it is my motherland, and I was excited to find out which playlists are trending in my country. After some research, I chose a playlist close to my own music taste.

This data can be reliable because it is collected from individual users and organized around their preferences, which helps us find information that reflects what users like. Its weakness comes from the same breadth: it combines a very large number of individuals with different opinions and tastes, so the data can be unbalanced and highly varied, which limits how well it serves a specific hypothesis. For example, the ten featured playlists I retrieved were all very different, and it was hard to pick the one I wanted to explore. __The data was very diverse because it was collected from hundreds to millions of people with different tastes.__

### Endpoints 
To test my hypotheses, I used three endpoints.

The first one was __featured-playlists__. This endpoint returns the featured playlists and takes a country parameter, so it returns the list for the chosen country.
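As a minimal sketch of how the country parameter ends up in the request URL, the query string can be built with the standard library instead of being typed by hand. The helper name `build_featured_playlists_url` and the `limit` argument are assumptions added for illustration; they are not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.spotify.com/v1/"


def build_featured_playlists_url(country, limit=10):
    """Return the featured-playlists URL for a two-letter country code.

    Hypothetical helper: `limit` is an assumed optional parameter.
    """
    # urlencode escapes the values and joins the pairs with '&'.
    query = urlencode({"country": country, "limit": limit})
    return "{}browse/featured-playlists?{}".format(BASE_URL, query)


featured_url = build_featured_playlists_url("AM")
print(featured_url)
```

Building the query from a dictionary makes it easy to change the country code without editing the endpoint string.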

The second endpoint was __get playlist__, which suits my hypothesis because it takes a playlist ID and returns the playlist's tracks, which I could then sort by popularity and duration. This helped me understand whether a track's popularity is related to its length.

Third, I used __audio-features__ to get more specific information about each track. More specifically, I was looking for the danceability of the track.

In [1]:
import requests
import pandas as pd
import base64
import json
import urllib

Below are the steps I used to get an access token from Spotify's API.
An access token is a string representing the authorization granted to the client. It is sent with each HTTP request so that the API can verify the client is allowed to make that request.
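The two headers used in this notebook can be sketched as small helper functions. This is only an illustration: the function names are hypothetical and the credentials are placeholders, not real keys.

```python
import base64


def build_basic_auth_header(client_id, client_secret):
    """Header for the token request: base64 of 'client_id:client_secret'."""
    raw = "{}:{}".format(client_id, client_secret).encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic {}".format(encoded)}


def build_bearer_header(access_token):
    """Header for later API calls, carrying the access token."""
    return {"Authorization": "Bearer {}".format(access_token)}


# Placeholder credentials, for illustration only.
example_header = build_basic_auth_header("my-id", "my-secret")
example_session_header = build_bearer_header("example-token")
```

The Basic header is used once to request the token, and the Bearer header is then attached to every data request.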

In [2]:
Client_ID = pd.read_csv("Spotify-Keys_9-23-21.txt")['Client_ID'].iloc[0]

In [3]:
Client_Secret = pd.read_csv("Spotify-Keys_secret.txt")['Client_Secret'].iloc[0]

Some of the information needs to be hidden for confidentiality, so the cells that would display it are commented out.

In [4]:
# Client_ID

In [5]:
# Client_Secret

In [6]:
client_cred = base64.b64encode(str(Client_ID + ":" + Client_Secret).encode("ascii"))

In [7]:
# client_cred

In [8]:
headers = {"Authorization": "Basic {}".format(client_cred.decode("ascii"))}


In [9]:
# headers

In [10]:
payload = {'grant_type' : 'client_credentials'}
url = 'https://accounts.spotify.com/api/token'

In [11]:
session_key_response = requests.post(url = url, data = payload, headers = headers)

In [12]:
session_key_response.status_code

200

In [13]:
# session_key_response.json()['access_token']

In [14]:
session_header_key = session_key_response.json()

In [15]:
key = session_header_key['access_token']

With the access token, I can build the authorization header for my API requests.

In [16]:
session_headers = {"Authorization": "Bearer {}".format(key)}

In [17]:
# session_headers


In [18]:
base_url = 'https://api.spotify.com/v1/'

I use my first endpoint to get the featured playlists for Armenia.

In [19]:
featured_playlists_endpoint = 'browse/featured-playlists/?country=AM'
featured_playlists_url = ''.join([base_url,featured_playlists_endpoint])

In [20]:
response = requests.get(url = featured_playlists_url, headers = session_headers)
response.status_code

200

In [21]:
playlists_data = response.json()

After getting the response, I explore its structure with .keys() to find the information I need.


In [22]:
playlists_data.keys()


dict_keys(['message', 'playlists'])

In [23]:
playlists_data['playlists'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [24]:
playlists_df= pd.DataFrame(playlists_data['playlists']['items'])
playlists_df.head(2)

Unnamed: 0,collaborative,description,external_urls,href,id,images,name,owner,primary_color,public,snapshot_id,tracks,type,uri
0,False,Bad Bunny on top of the Hottest 50!,{'spotify': 'https://open.spotify.com/playlist...,https://api.spotify.com/v1/playlists/37i9dQZF1...,37i9dQZF1DXcBWIGoYBM5M,"[{'height': None, 'url': 'https://i.scdn.co/im...",Today's Top Hits,"{'display_name': 'Spotify', 'external_urls': {...",,,MTY5NzE2OTYwMCwwMDAwMDAwMDg2YjA3ZDhjZTYwMWE0Mm...,{'href': 'https://api.spotify.com/v1/playlists...,playlist,spotify:playlist:37i9dQZF1DXcBWIGoYBM5M
1,False,the beat of your drift,{'spotify': 'https://open.spotify.com/playlist...,https://api.spotify.com/v1/playlists/37i9dQZF1...,37i9dQZF1DWWY64wDtewQt,"[{'height': None, 'url': 'https://i.scdn.co/im...",phonk,"{'display_name': 'Spotify', 'external_urls': {...",,,MTY5NzIxMDY4NCwwMDAwMDAwMDFmMWVhZWMzZTk0MTRiMG...,{'href': 'https://api.spotify.com/v1/playlists...,playlist,spotify:playlist:37i9dQZF1DWWY64wDtewQt


To support my hypothesis, I narrow the table to only the description, name, and ID. I will use the name and description to pick the playlist that best matches my taste in music and my personality.

In [25]:
playlists_narrowed = playlists_df[['description', 'name', 'id']]
playlists_narrowed.head(10)

Unnamed: 0,description,name,id
0,Bad Bunny on top of the Hottest 50!,Today's Top Hits,37i9dQZF1DXcBWIGoYBM5M
1,the beat of your drift,phonk,37i9dQZF1DWWY64wDtewQt
2,"Peaceful piano to help you slow down, breathe,...",Peaceful Piano,37i9dQZF1DX4sWSpwq3LiO
3,iykyk,big on the internet,37i9dQZF1DX5Vy6DFOcx00
4,The biggest songs of the 2010s.,All Out 2010s,37i9dQZF1DX5Ejj0EkURtP
5,"New music from Kodak Black, Key Glock and Offs...",RapCaviar,37i9dQZF1DX0XUsuxWHRQd
6,Kick back to the best new and recent chill hits.,Chill Hits,37i9dQZF1DX4WYpdgoIcn6
7,150%,sped up songs,37i9dQZF1DX0mWZXNs0ArA
8,Rock legends & epic songs that continue to ins...,Rock Classics,37i9dQZF1DWXRqgorJj26U
9,Get your beast mode on!,Beast Mode,37i9dQZF1DX76Wlfdnj7AP


After listening and doing some research on the playlists, I decided to pick the playlist named __"I Love My '90s Hip-Hop."__

In [26]:
playlist_id = "37i9dQZF1DX186v583rmzp"

Using the second endpoint, I retrieved the playlist's tracks so that I could sort them by popularity and duration.

In [27]:
playlist_url = 'https://api.spotify.com/v1/playlists/{}'.format(playlist_id)

In [28]:
playlist_url_response = requests.get(url = playlist_url, headers = session_headers)


In [29]:
playlist_url_data=playlist_url_response.json()

In [30]:
playlist_url_data.keys()

dict_keys(['collaborative', 'description', 'external_urls', 'followers', 'href', 'id', 'images', 'name', 'owner', 'primary_color', 'public', 'snapshot_id', 'tracks', 'type', 'uri'])

In [31]:
playlist_url_data['tracks'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

Inside each item's "track" entry, I can find the popularity and duration.

In [32]:
playlist_df = pd.DataFrame([x['track'] for x in playlist_url_data['tracks']['items']])
playlist_df.head(2)

Unnamed: 0,album,artists,available_markets,disc_number,duration_ms,episode,explicit,external_ids,external_urls,href,id,is_local,name,popularity,preview_url,track,track_number,type,uri
0,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,251666,False,True,{'isrc': 'USLF29600148'},{'spotify': 'https://open.spotify.com/track/6w...,https://api.spotify.com/v1/tracks/6w4Mlvpuxtbj...,6w4MlvpuxtbjGdL9YeTx89,False,Jazzy Belle,67,https://p.scdn.co/mp3-preview/dfee0ea5a130f79a...,True,5,track,spotify:track:6w4MlvpuxtbjGdL9YeTx89
1,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,240693,False,False,{'isrc': 'USTB10400128'},{'spotify': 'https://open.spotify.com/track/1D...,https://api.spotify.com/v1/tracks/1DIXPcTDzTj8...,1DIXPcTDzTj8ZMHt3PDt8p,False,Gangsta's Paradise,87,https://p.scdn.co/mp3-preview/1454c63a66c27ac7...,True,2,track,spotify:track:1DIXPcTDzTj8ZMHt3PDt8p
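The list comprehension above assumes every playlist item has a usable "track" entry. As a hedged sketch, the extraction could skip items whose track is missing or null (an assumption about how unavailable tracks may appear) and keep only the fields this report needs. The function name and the sample payload are hypothetical.

```python
def extract_track_rows(playlist_payload):
    """Collect id, name, popularity and duration for each usable track."""
    rows = []
    for item in playlist_payload.get("tracks", {}).get("items", []):
        track = item.get("track")
        if track is None:
            # Assumed case: the item no longer points at a track.
            continue
        rows.append({
            "id": track.get("id"),
            "name": track.get("name"),
            "popularity": track.get("popularity"),
            "duration_ms": track.get("duration_ms"),
        })
    return rows


# Tiny made-up payload with the same nesting as the playlist response.
sample_payload = {"tracks": {"items": [
    {"track": {"id": "a1", "name": "Song A",
               "popularity": 70, "duration_ms": 200000}},
    {"track": None},
]}}
rows = extract_track_rows(sample_payload)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`.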


In [33]:
playlist_url_data.keys()

dict_keys(['collaborative', 'description', 'external_urls', 'followers', 'href', 'id', 'images', 'name', 'owner', 'primary_color', 'public', 'snapshot_id', 'tracks', 'type', 'uri'])

I reduced the table to the name, popularity, duration, and ID columns to make it easier to read, and sorted the rows with .sort_values().

In [34]:
# Top 10 tracks by popularity, and top 10 tracks by duration.
playlist_poplarity = playlist_df[['name', 'popularity', 'duration_ms', 'id']].sort_values('popularity', ascending=False).head(10)
playlist_duration_ms = playlist_df[['name', 'duration_ms', 'popularity', 'id']].sort_values('duration_ms', ascending=False).head(10)

Below, I used pd.concat to place the two tables side by side for easier comparison.  
Popularity is scored from 0 (least popular) to 100 (most popular).  
The most popular track, "Gangsta's Paradise" (popularity 87, 240,693 ms), is much shorter than "The Humpty Dance" (391,693 ms), the longest track on the playlist, which has a popularity of 64.  
Within this playlist, this suggests that shorter songs tend to be more popular than longer ones.

In [35]:
# Reset both indexes so the two top-10 tables line up row by row,
# then place them side by side.
playlist_poplarity.reset_index(drop=True, inplace=True)
playlist_duration_ms.reset_index(drop=True, inplace=True)
joined_df = pd.concat([playlist_poplarity, playlist_duration_ms], axis=1)
joined_df.head(5)

Unnamed: 0,name,popularity,duration_ms,id,name.1,duration_ms.1,popularity.1,id.1
0,Gangsta's Paradise,87,240693,1DIXPcTDzTj8ZMHt3PDt8p,The Humpty Dance,391693,64,0YEC3pNS0uIKvZb5u4lgsu
1,Still D.R.E.,84,270586,503OTo2dSqe7qk76rgsbep,"Triumph (feat. Ol' Dirty Bastard, Inspectah De...",338186,65,3glqlA5qY3bbo8UaUDGPnw
2,It Was A Good Day,84,260000,2qOm7ukLyHUXWyR4ZWLwxA,Lost Ones,333906,66,71XhXay6rKPZCVAaDtFlSR
3,Hypnotize - 2014 Remaster,82,229826,7KwZNVEaqikRSBSpyhXK2j,Real Muthaphuckkin G's,332626,64,7q6lr1W6cmGWijl5kCv5RG
4,The Next Episode,82,161506,4LwU4Vp6od3Sb08CsP99GC,"Shook Ones, Pt. II",325506,78,33ZXjLCpiINn8eQIDYEPTD
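Comparing two top-10 lists only looks at the extremes. One way to check the relationship across all tracks, sketched here under the assumption that a Pearson correlation is an acceptable summary, is to correlate duration with popularity. The function name is hypothetical, and the six rows are copied from the tables above for illustration only, so the resulting value says nothing reliable about the full playlist.

```python
import pandas as pd


def duration_popularity_correlation(tracks_df):
    """Pearson correlation between track length (minutes) and popularity."""
    duration_min = tracks_df["duration_ms"] / 60000
    return duration_min.corr(tracks_df["popularity"])


# Six rows taken from the tables above (three popular, three long tracks).
sample_df = pd.DataFrame({
    "name": ["Gangsta's Paradise", "Still D.R.E.", "The Next Episode",
             "The Humpty Dance", "Triumph", "Shook Ones, Pt. II"],
    "popularity": [87, 84, 82, 64, 65, 78],
    "duration_ms": [240693, 270586, 161506, 391693, 338186, 325506],
})
r = duration_popularity_correlation(sample_df)
print(round(r, 2))
```

A value near -1 would support the idea that longer tracks are less popular, while a value near 0 would mean no clear linear relationship. In the notebook, the same function could be called on the full `playlist_df`.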


I wanted to compare the joined_df table with the top 5 most danceable tracks.  
By joining the track IDs into a comma-separated list, I can pass them to my third endpoint, __"audio-features"__, to get the danceability of the songs.

In [36]:
tracks_url = 'https://api.spotify.com/v1/audio-features?ids={}'\
.format(','.join(list(playlist_df['id'])))
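Putting every ID into one URL works for this playlist, but a longer playlist may exceed what the endpoint accepts in a single request. The sketch below assumes a limit of 100 IDs per request (worth checking against the current Spotify documentation) and splits the IDs into batches. Both function names are hypothetical.

```python
def chunk_ids(track_ids, chunk_size=100):
    """Split a list of IDs into consecutive batches of at most chunk_size."""
    return [track_ids[i:i + chunk_size]
            for i in range(0, len(track_ids), chunk_size)]


def build_audio_features_urls(track_ids, chunk_size=100):
    """Return one audio-features URL per batch of track IDs."""
    base = "https://api.spotify.com/v1/audio-features?ids={}"
    # Drop missing IDs, which would break the comma join.
    clean_ids = [track_id for track_id in track_ids if track_id]
    return [base.format(",".join(chunk))
            for chunk in chunk_ids(clean_ids, chunk_size)]


# 250 made-up IDs produce three URLs: 100 + 100 + 50 IDs.
example_ids = ["id{}".format(n) for n in range(250)]
urls = build_audio_features_urls(example_ids)
print(len(urls))
```

Each URL can then be requested in turn, and the `audio_features` lists from the responses concatenated before building the DataFrame.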


After getting the desired response, I dived into audio_features to get more detailed information about the tracks.

In [37]:
tracks_url_response = requests.get(url = tracks_url , headers = session_headers)
tracks_url_response.status_code

200

In [38]:
features=tracks_url_response.json()

In [39]:
features.keys()

dict_keys(['audio_features'])

In [40]:
features['audio_features'][0].keys()

dict_keys(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'])

I pulled the danceability out of the audio features, merged it with my playlist table, and sorted the tracks by danceability.  
Danceability ranges from 0 (least danceable) to 1 (most danceable).  
The table shows that the most danceable songs had short durations.

In [41]:
features_df = pd.DataFrame(features['audio_features'])
final_df = pd.merge(playlist_df, features_df, how = 'inner', on = 'id')
final_df_orgnized = final_df[['name', 'popularity','duration_ms_x','danceability']].sort_values('danceability', ascending = False).head(10)
final_df_orgnized.head(5)

Unnamed: 0,name,popularity,duration_ms_x,danceability
37,Shoop,72,248573,0.939
39,Check Yo Self - Remix,72,273106,0.934
51,Got Your Money (feat. Kelis),70,239546,0.934
21,Forgot About Dre,81,222293,0.924
12,The Next Episode,82,161506,0.922
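The `duration_ms_x` column name appears because both tables contain a `duration_ms` column, so pandas adds suffixes during the merge. A small sketch of one way to avoid this is to merge only the feature columns that are needed. The three rows reuse values from the table above, but the IDs `t1` to `t3` are placeholders.

```python
import pandas as pd

tracks = pd.DataFrame({
    "id": ["t1", "t2", "t3"],  # placeholder IDs
    "name": ["Forgot About Dre", "Shoop", "The Next Episode"],
    "popularity": [81, 72, 82],
    "duration_ms": [222293, 248573, 161506],
})
audio = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "danceability": [0.924, 0.939, 0.922],
    "duration_ms": [222293, 248573, 161506],
})

# Keep only 'id' and 'danceability' from the features table,
# so 'duration_ms' is not duplicated and no suffix is added.
merged = pd.merge(tracks, audio[["id", "danceability"]], how="inner", on="id")
ranked = merged.sort_values("danceability", ascending=False).reset_index(drop=True)
print(ranked[["name", "popularity", "duration_ms", "danceability"]])
```

With this approach the duration column keeps its original name, which makes later column selections easier to read.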


My final table shows that the most popular songs have a short duration,  
while songs with a long duration are not as popular.  
Finally, songs with high danceability are popular and have a short duration.

In [43]:
# Reset both indexes so the rows line up, then combine all three rankings.
joined_df.reset_index(drop=True, inplace=True)
final_df_orgnized.reset_index(drop=True, inplace=True)
final_graphs = pd.concat([joined_df, final_df_orgnized], axis=1)
final_graphs.head(5)

Unnamed: 0,name,popularity,duration_ms,id,name.1,duration_ms.1,popularity.1,id.1,name.2,popularity.2,duration_ms_x,danceability
0,Gangsta's Paradise,87,240693,1DIXPcTDzTj8ZMHt3PDt8p,The Humpty Dance,391693,64,0YEC3pNS0uIKvZb5u4lgsu,Shoop,72,248573,0.939
1,Still D.R.E.,84,270586,503OTo2dSqe7qk76rgsbep,"Triumph (feat. Ol' Dirty Bastard, Inspectah De...",338186,65,3glqlA5qY3bbo8UaUDGPnw,Check Yo Self - Remix,72,273106,0.934
2,It Was A Good Day,84,260000,2qOm7ukLyHUXWyR4ZWLwxA,Lost Ones,333906,66,71XhXay6rKPZCVAaDtFlSR,Got Your Money (feat. Kelis),70,239546,0.934
3,Hypnotize - 2014 Remaster,82,229826,7KwZNVEaqikRSBSpyhXK2j,Real Muthaphuckkin G's,332626,64,7q6lr1W6cmGWijl5kCv5RG,Forgot About Dre,81,222293,0.924
4,The Next Episode,82,161506,4LwU4Vp6od3Sb08CsP99GC,"Shook Ones, Pt. II",325506,78,33ZXjLCpiINn8eQIDYEPTD,The Next Episode,82,161506,0.922
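Reading three rankings side by side can be hard to turn into a single comparison. As a rough sketch, tracks could instead be split into "short" and "long" groups at the median duration and their average popularity and danceability compared. The function name is hypothetical, and the five rows are the most danceable tracks from the table above, used only as sample input.

```python
import pandas as pd


def compare_short_vs_long(df, duration_col="duration_ms"):
    """Mean popularity and danceability for tracks below/above the median length."""
    median_ms = df[duration_col].median()
    # Tracks strictly longer than the median are labeled 'long'.
    group = df[duration_col].gt(median_ms).map({False: "short", True: "long"})
    return df.groupby(group)[["popularity", "danceability"]].mean()


sample = pd.DataFrame({
    "name": ["Shoop", "Check Yo Self - Remix", "Got Your Money (feat. Kelis)",
             "Forgot About Dre", "The Next Episode"],
    "popularity": [72, 72, 70, 81, 82],
    "duration_ms": [248573, 273106, 239546, 222293, 161506],
    "danceability": [0.939, 0.934, 0.934, 0.924, 0.922],
})
summary = compare_short_vs_long(sample)
print(summary)
```

For the merged table in this notebook, the call would be `compare_short_vs_long(final_df, duration_col="duration_ms_x")`. Five rows are far too few to draw conclusions, so this is better run on the full playlist.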


## Conclusion
My goal was to understand what made these songs popular. Based on the data I collected, songs with a short duration and high danceability are more popular than songs with a long duration. I chose a playlist featured in one specific country, which can limit my data. For my next step, I want to look at the bigger picture while narrowing my selection to songs or playlists released during the past 3 to 5 years. This way, my analysis will be based on recent data, making it more representative of what people like to listen to now and why this generation prefers certain types of music.
