# API DATA REPORT

Melyssa Rein

10/15/2023

# Analysis of Brazilian Funk's danceability

![](https://volumemorto.com.br/wp-content/uploads/2020/01/CAPA-FUNK-DECADA-PORTAL-2.jpg) 
*Photo of some funk singers*

Brazilian Funk is extremely popular in Brazil, easy to recognize, and always playing at parties. In the beginning, Brazilian Funk was generally listened to by people living in the suburbs and was more closely connected to North American music, but it then spread all over the country and slowly changed into what we know today. In Brazil, people usually call it simply "Funk", and it has more than one style; "Mega Funk", for example, is more electronic. For the most part, however, Funk is a danceable genre whose beats make it hard to stay still while listening.

Based on my experience at Brazilian parties, as someone who spent almost 20 years living in Brazil and has listened to Funk all my life, my hypothesis is that Brazilian Funk will have a high danceability score, as it has a contagious rhythm with beats that make people feel like dancing. The Spotify API measures danceability according to a song's musical elements (such as tempo) on a scale from 0 to 1, where the closer the score is to 1, the more danceable the song is. I believe this range can be divided into four levels of danceability: not danceable (0-0.25), not as danceable (0.251-0.5), average danceability (0.501-0.75), and very danceable (0.751-1). I arrived at these numbers simply by dividing the range into quarters, which makes the 0 to 1 scale easier to grasp. Hence, my hypothesis is that Brazilian Funk will score close to 1, around 0.75 to 0.9. As I believe there might be more danceable music genres, I do not think Brazilian Funk will hit 1.
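The four levels described above can be written as a small function. This is only a sketch of that quarter-based split: the function name `danceability_level` and the example scores are assumptions made for illustration, not part of the Spotify API.

```python
def danceability_level(score: float) -> str:
    """Map a danceability score (0 to 1) to one of four quarter-wide levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"danceability must be between 0 and 1, got {score}")
    if score <= 0.25:
        return "not danceable"
    if score <= 0.50:
        return "not as danceable"
    if score <= 0.75:
        return "average danceability"
    return "very danceable"


# Hypothetical example scores, chosen only to exercise each level.
for example_score in (0.10, 0.40, 0.60, 0.90):
    print(example_score, "->", danceability_level(example_score))
```

Under this split, the hypothesized range of 0.75 to 0.9 sits almost entirely in the "very danceable" level.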

To test my hypothesis, I will analyze a playlist called "Melhores funks de todos os tempos" (translation: *Best funk of all time*), created by Victor Pelizaro de Souza. This playlist contains famous Brazilian Funk songs from many Funk styles, so it is a good sample for analyzing danceability: its songs are among the best known in Brazil, and it lets the results reflect both the different Funk styles Brazil has and different release years (people's tastes change over the years, which can affect the types of music released). By making requests to the Spotify API, it is possible to retrieve information on the tracks in this playlist and get their danceability. Two Spotify endpoints will be used in this analysis: the playlist endpoint, to get the tracks of the playlist and their IDs, and the tracks' audio features endpoint, to get their respective danceability. These are the most suitable endpoints because a request for the audio features of songs requires their IDs, which must first be obtained by requesting the playlist's information.

The data acquired from the Spotify API is reliable in the sense that Spotify is a very well-known music streaming platform and part of the information requested is factual: when requesting the content of a playlist, Spotify gives the name of each track, its duration, the artist, whether the song is explicit, etc. What can be considered a weakness is the amount of information that Spotify lets clients have. For example, there is no way to find out through the Spotify API how many times a song was put on replay; hence, there is information they could probably provide but do not. It is also worth noting that the Spotify API's description of the danceability measurement only says "Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity", so there might be other elements included that are not disclosed. In addition, the danceability of a song can be arbitrary.

Looking at the API, the only crucial limitation for this hypothesis and analysis is that a request for a playlist's information returns at most 100 tracks, so the 415-track playlist sample is reduced to 100. However, that is still a good sample of songs for getting an idea of the average danceability of Brazilian Funk.

## Initial Process

It is necessary to import several packages to be able to make the requests to Spotify API and analyze the given information. 

In [42]:
import requests
import pandas as pd
import base64
import json
import urllib

Previously, a CSV file was created with the client ID and client secret. As this information cannot be displayed or shared, it is read from the file here so that it does not have to be shown.

In [43]:
Client_ID = pd.read_csv('Spotify_keys.txt')['Client_ID'].iloc[0]

In [44]:
Client_Secret = pd.read_csv('Spotify_keys.txt')['Client_Secret'].iloc[0]

Then, both of them were encoded together using base64 and sent to the Spotify API to request a session key. This key is what allows me to actually get information from the Spotify API; without it, I would not be able to get any information on the playlist. If the status code of the request for a session key is 200, the request was successful, which means I can proceed to request information from the API.

In [45]:
client_cred = base64.b64encode(str(Client_ID + ":" + Client_Secret).encode('ascii'))

In [46]:
headers = {"Authorization": "Basic {}".format(client_cred.decode('ascii'))}

In [47]:
payload = {'grant_type' : 'client_credentials'}
url = 'https://accounts.spotify.com/api/token'

In [48]:
session_key_response = requests.post(url = url,data = payload, headers = headers)

In [49]:
session_key_response.status_code

200

In [50]:
session_headers = {"Authorization": "Bearer {}".format(session_key_response.json()['access_token'])}
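The header construction in the cells above can be gathered into two small helpers. This is a minimal sketch, not a replacement for the cells: the function names `build_basic_auth_header` and `build_bearer_header` are hypothetical, and the credentials and token below are placeholders.

```python
import base64


def build_basic_auth_header(client_id: str, client_secret: str) -> dict:
    """Return the Basic authorization header used to request a session key."""
    raw = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def build_bearer_header(token_response: dict) -> dict:
    """Return the Bearer header from the parsed JSON of a token response."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}


# Placeholder credentials and token, for illustration only.
print(build_basic_auth_header("my_id", "my_secret"))
print(build_bearer_header({"access_token": "example-token"}))
```

Keeping the two steps separate mirrors the flow above: the Basic header is only used once, to obtain the session key, and every later request uses the Bearer header.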

## Analysis

Now, using the playlist endpoint of the Spotify API and the playlist ID taken from the playlist URL, I was able to request information on the content of the "Melhores funks de todos os tempos" playlist, more specifically on its first 100 tracks.

In [51]:
playlist_id = '1rcA66nSyWnPAebLekg3XF'

In [52]:
playlist_url = 'https://api.spotify.com/v1/playlists/{}'.format(playlist_id)

In [53]:
response = requests.get(url = playlist_url, headers = session_headers)

With the response parsed as JSON, I was able to look at the keys of the resulting dictionary in order to create a data frame with the given information. That way, I could have a look at everything and choose the crucial data for the analysis. I decided to show only the first 5 rows, as what mattered at that point was getting a clearer look at the information.

In [54]:
data_playlist = response.json() 

In [55]:
data_playlist.keys()

dict_keys(['collaborative', 'description', 'external_urls', 'followers', 'href', 'id', 'images', 'name', 'owner', 'primary_color', 'public', 'snapshot_id', 'tracks', 'type', 'uri'])

In [56]:
data_playlist['tracks'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])
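The `next` key listed above suggests that the track list is returned in pages. Assuming `next` holds the URL of the following page and is `None` on the last one, the remaining tracks could be gathered with a loop like the sketch below. The helper name `collect_all_items` and its `get_json` parameter are hypothetical, and the demonstration uses made-up pages rather than real API calls.

```python
def collect_all_items(first_page: dict, get_json) -> list:
    """Follow the 'next' links of a paging object and gather every item.

    first_page: a dict with 'items' and 'next' keys, like data_playlist['tracks'].
    get_json:   a callable that takes a URL and returns that page's parsed JSON.
    """
    items = list(first_page["items"])
    next_url = first_page["next"]
    while next_url:  # assumed to be None once the last page is reached
        page = get_json(next_url)
        items.extend(page["items"])
        next_url = page["next"]
    return items


# Offline demonstration with two made-up pages instead of real API calls.
fake_pages = {
    "page-2": {"items": ["c", "d"], "next": "page-3"},
    "page-3": {"items": ["e"], "next": None},
}
first = {"items": ["a", "b"], "next": "page-2"}
print(collect_all_items(first, fake_pages.__getitem__))
```

In this notebook, `get_json` could be something like `lambda url: requests.get(url, headers=session_headers).json()`, called with `data_playlist['tracks']` as the first page; that usage is untested here.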

In [58]:
playlist_df = pd.DataFrame([x['track'] for x in data_playlist['tracks']['items']])
playlist_df.head() 

Unnamed: 0,album,artists,available_markets,disc_number,duration_ms,episode,explicit,external_ids,external_urls,href,id,is_local,name,popularity,preview_url,track,track_number,type,uri
0,"{'album_type': 'single', 'artists': [{'externa...",[{'external_urls': {'spotify': 'https://open.s...,[BR],1,155550,False,True,{'isrc': 'US7VG1719220'},{'spotify': 'https://open.spotify.com/track/7K...,https://api.spotify.com/v1/tracks/7KBfjlQ85Fqs...,7KBfjlQ85Fqs2MCCvXSW0c,False,Bum Bum Tam Tam,57,,True,1,track,spotify:track:7KBfjlQ85Fqs2MCCvXSW0c
1,"{'album_type': 'compilation', 'artists': [{'ex...",[{'external_urls': {'spotify': 'https://open.s...,[],1,209013,False,False,{'isrc': 'FR0Z50003566'},{'spotify': 'https://open.spotify.com/track/2P...,https://api.spotify.com/v1/tracks/2PSsjdKe42nE...,2PSsjdKe42nE7gZYA1c6fu,False,Rap glamurosa,0,,True,15,track,spotify:track:2PSsjdKe42nE7gZYA1c6fu
2,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,154026,False,False,{'isrc': 'BXMOP1400007'},{'spotify': 'https://open.spotify.com/track/1O...,https://api.spotify.com/v1/tracks/1OtZIaJBXo3P...,1OtZIaJBXo3Pe6nfaMsjKT,False,Na Ponta Ela Fica,63,https://p.scdn.co/mp3-preview/a2cec7d61d8692ff...,True,7,track,spotify:track:1OtZIaJBXo3Pe6nfaMsjKT
3,"{'album_type': 'single', 'artists': [{'externa...",[{'external_urls': {'spotify': 'https://open.s...,"[AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...",1,162053,False,False,{'isrc': 'BXG6R1700002'},{'spotify': 'https://open.spotify.com/track/6Q...,https://api.spotify.com/v1/tracks/6Q4foUefrAyC...,6Q4foUefrAyCt8VxJFML56,False,Olha A Explosão,74,,True,1,track,spotify:track:6Q4foUefrAyCt8VxJFML56
4,"{'album_type': 'album', 'artists': [{'external...",[{'external_urls': {'spotify': 'https://open.s...,[],1,164048,False,True,{'isrc': 'QM6P41550310'},{'spotify': 'https://open.spotify.com/track/3D...,https://api.spotify.com/v1/tracks/3DMnCzBn8jiz...,3DMnCzBn8jizeatq0ll5Bm,False,Vou Passar Cerol Na Mào,0,,True,11,track,spotify:track:3DMnCzBn8jizeatq0ll5Bm


As the only important information was the name of each song and its ID, I decided to create a summarized data frame. I did not include the artists because they are not necessary for the analysis: the focus is the danceability of the songs, and who a song belongs to does not affect it. I again display only the first 5 rows of this table, as I am not analyzing it yet.

In [59]:
playlist_summ = playlist_df[['name','id']]
playlist_summ.head()

| | name | id |
|---|---|---|
| 0 | Bum Bum Tam Tam | 7KBfjlQ85Fqs2MCCvXSW0c |
| 1 | Rap glamurosa | 2PSsjdKe42nE7gZYA1c6fu |
| 2 | Na Ponta Ela Fica | 1OtZIaJBXo3Pe6nfaMsjKT |
| 3 | Olha A Explosão | 6Q4foUefrAyCt8VxJFML56 |
| 4 | Vou Passar Cerol Na Mào | 3DMnCzBn8jizeatq0ll5Bm |


Then, in order to request the audio features of the tracks, it is necessary to join the IDs with commas. After that, the request to the tracks' audio features endpoint can be made. Just like before, a 200 status code means the request was successful.

In [60]:
','.join(list(playlist_summ['id']))

'7KBfjlQ85Fqs2MCCvXSW0c,2PSsjdKe42nE7gZYA1c6fu,1OtZIaJBXo3Pe6nfaMsjKT,6Q4foUefrAyCt8VxJFML56,3DMnCzBn8jizeatq0ll5Bm,0EPxmvsG1BY5td4aTOkWBF,2V60nTDW7OtktRlqBfjFj1,6AaoEX192rJ6o3UFwG43sV,4FzmvHmoFXV4EFctf0o5YX,5oNdTrafZ0joNCikz1Vdjd,6cEJheMNRySY0kBB31jtdk,7FiIQXpIeLo3ITfGGpb8PB,3vSp8nnsu8xgMMjtzrtBi3,3Hb9kUdm4yf839Fle4RIdT,3ao5U0ksMLvgSoJqOpsgmv,2ZCLbGsDB89u7DsxQsVt9m,0WnbxhaRe5fg012TcDxwWj,6tuzPD1zPPvwMbGiD0TBay,3AZYWk2RaXnYBMbWvmInnc,66f34yT1LgVr5JhBF7c5Ck,7DRP2VOMpy1rrk3iYFLCW9,43t1aOJd9FtIV3hB0QkPi3,4aAq1afusFG6jRiOSHQZvN,4BjPsq3MXBNo4Qxg40igEr,0dCwmf59BqqIAXvw9PRsGG,1xFtF3Zo6d7CJQfcgsPbQZ,3x7oHnRfCLUUNftSowFOqo,4eZg3l8x5aMvgwsRHcZuIC,6OmKF6WW3eQtZzGQ4hDSOX,3xRaXAOFz03X86oWrDrW9Q,6Y0Lah5ZRbCZzNFcOrTN1o,480o64VmNvWcwzCg74VUxW,1NPWiFacYL1Db5j86Sxu6n,3zd8GrqInpxh3CWsEUAUqO,0Qjv8p8vPKDBJKuB9aBmpN,4SCH5CuivFcShLpTg8lYOf,3TUiw7KuX3j4VTvbS8qKe5,0P3k6Jipqv1UWX0a4GmUuj,1zhfU6YXa2pXUAQdG1NvBZ,2neRyPOj9hTCsl79GxFb2D,0lNSjlkUnSjuXOcOn2BNun,3pM81OKNrV6d2zJb76nQ6M,5HB5EU2P1caxud7rixXKCl,7lzp2TKcCL

In [61]:
audio_features_url = 'https://api.spotify.com/v1/audio-features?ids={}'.format(','.join(list(playlist_summ['id'])))

In [62]:
response_audio = requests.get(url = audio_features_url, headers = session_headers)

In [63]:
response_audio.status_code

200
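If more than 100 tracks were ever collected, the comma-joined ID string would probably have to be split across several requests. The sketch below assumes the audio features endpoint accepts at most 100 IDs per request, which is why `batch_size` defaults to 100; the helper name `batch_ids` is hypothetical.

```python
def batch_ids(track_ids: list, batch_size: int = 100) -> list:
    """Split track IDs into comma-joined strings of at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        ",".join(track_ids[start:start + batch_size])
        for start in range(0, len(track_ids), batch_size)
    ]


# Three IDs taken from the playlist table above, batched two at a time.
sample_ids = [
    "7KBfjlQ85Fqs2MCCvXSW0c",
    "2PSsjdKe42nE7gZYA1c6fu",
    "1OtZIaJBXo3Pe6nfaMsjKT",
]
for ids_param in batch_ids(sample_ids, batch_size=2):
    print("https://api.spotify.com/v1/audio-features?ids=" + ids_param)
```

Each printed URL has the same shape as `audio_features_url` above, so each batch could be requested with the same `requests.get` call and the resulting data frames concatenated.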

Just as I did previously, I parsed the response as JSON. Then, I created a data frame from it and displayed only the first 5 rows in order to have a better look at the information it contains.

In [64]:
audio = response_audio.json()

In [65]:
audio_df = pd.DataFrame(audio['audio_features'])
audio_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.957,0.667,10,-5.353,0,0.0856,0.826,0.0205,0.179,0.556,132.01,audio_features,7KBfjlQ85Fqs2MCCvXSW0c,spotify:track:7KBfjlQ85Fqs2MCCvXSW0c,https://api.spotify.com/v1/tracks/7KBfjlQ85Fqs...,https://api.spotify.com/v1/audio-analysis/7KBf...,155550,4
1,0.812,0.796,7,-6.322,1,0.038,0.0933,0.000603,0.122,0.96,130.061,audio_features,2PSsjdKe42nE7gZYA1c6fu,spotify:track:2PSsjdKe42nE7gZYA1c6fu,https://api.spotify.com/v1/tracks/2PSsjdKe42nE...,https://api.spotify.com/v1/audio-analysis/2PSs...,209013,4
2,0.843,0.569,7,-10.713,1,0.278,0.288,8.8e-05,0.0643,0.947,95.075,audio_features,1OtZIaJBXo3Pe6nfaMsjKT,spotify:track:1OtZIaJBXo3Pe6nfaMsjKT,https://api.spotify.com/v1/tracks/1OtZIaJBXo3P...,https://api.spotify.com/v1/audio-analysis/1OtZ...,154027,4
3,0.778,0.635,6,-2.892,1,0.112,0.107,0.0,0.0608,0.379,90.008,audio_features,6Q4foUefrAyCt8VxJFML56,spotify:track:6Q4foUefrAyCt8VxJFML56,https://api.spotify.com/v1/tracks/6Q4foUefrAyC...,https://api.spotify.com/v1/audio-analysis/6Q4f...,162053,4
4,0.959,0.789,6,-5.762,0,0.269,0.0849,0.00012,0.0888,0.675,126.873,audio_features,3DMnCzBn8jizeatq0ll5Bm,spotify:track:3DMnCzBn8jizeatq0ll5Bm,https://api.spotify.com/v1/tracks/3DMnCzBn8jiz...,https://api.spotify.com/v1/audio-analysis/3DMn...,164049,4


Due to the large number of columns, I decided to create a summary of this table using only the most important ones: danceability, tempo, and ID. I kept tempo in the table simply because the Spotify API documentation mentions that the tempo of a song is analyzed, along with other features, to define its danceability.

Both summaries I created, one for the playlist and one for the audio features, include the ID because the two tables need a column in common to be merged later on, which makes the acquired information easier to understand.

In [66]:
audio_summ = audio_df[['danceability','tempo','id']]

In [67]:
combined_df = pd.merge(playlist_summ,audio_summ, how = 'inner', on = 'id')

To make the table clearer, I decided to drop the ID column when displaying it. I decided to show only the top 15 and the bottom 15 tracks ranked by danceability, as at this point the objective is to look at the highest and lowest danceability scores, as well as at the relation between danceability and tempo (which can be done with only 30 tracks).

After merging the tables, it is already noticeable that many of the tracks have a high danceability. The lowest score is 0.566, which is more than half of the maximum possible score. If we place the scores in the four quarters of the danceability range I mentioned before, they all fit as either "average danceability" or "very danceable."

In [76]:
combined_df.sort_values('danceability',ascending=False).drop(columns='id').head(15)

| | name | danceability | tempo |
|---|---|---|---|
| 15 | Um Morto Muito Louco | 0.975 | 128.04 |
| 24 | Dança da Motinha | 0.973 | 126.973 |
| 35 | Bumbum Granada | 0.965 | 130.015 |
| 4 | Vou Passar Cerol Na Mào | 0.959 | 126.873 |
| 16 | Fica Caladinha | 0.959 | 129.961 |
| 0 | Bum Bum Tam Tam | 0.957 | 132.01 |
| 36 | Eu se quero ser feliz | 0.954 | 129.03 |
| 18 | Cerol na Mão | 0.954 | 126.913 |
| 19 | Sequencia do Pente | 0.947 | 130.093 |
| 34 | Já É Sensação | 0.941 | 134.989 |


In [77]:
combined_df.sort_values('danceability',ascending=False).drop(columns='id').tail(15)

| | name | danceability | tempo |
|---|---|---|---|
| 79 | Aproveita Que a Mamadeira Ta Cheia | 0.683 | 129.753 |
| 98 | Veracruz | 0.677 | 76.32 |
| 62 | Segue O Fluxo - Live | 0.674 | 130.057 |
| 85 | Cheguei no Pistão | 0.657 | 84.016 |
| 32 | Gata Demais | 0.652 | 129.488 |
| 37 | Vou Desafiar Você | 0.644 | 132.028 |
| 21 | Angra dos Reis | 0.641 | 86.611 |
| 44 | Amor de Verdade | 0.638 | 120.235 |
| 71 | Tá Tranquilo, Tá Favorável | 0.621 | 172.397 |
| 96 | As Mina do Kit | 0.616 | 80.772 |


To find the average danceability of the 100 tracks, I calculated the mean. The result is clearly high: it falls in the top quarter of the danceability range.

In [78]:
combined_df["danceability"].mean()

0.8218300000000002

And because Spotify says that the tempo of a song affects its danceability, I calculated the mean of the tracks' tempo as well. However, looking at the table, some of the tracks have a high tempo and a lower danceability, which is understandable, as tempo is only one of the factors related to danceability.

In [79]:
combined_df["tempo"].mean()

124.02296
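One way to put a number on the relation between tempo and danceability is a Pearson correlation coefficient. The sketch below implements it by hand and applies it only to the rows displayed in the two tables above; since those rows are the extremes of the ranking, the result is not representative of the whole sample. The function name `pearson_r` is an assumption for illustration.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / spread


# (danceability, tempo) pairs copied from the rows displayed in the tables above.
rows = [
    (0.975, 128.04), (0.973, 126.973), (0.965, 130.015), (0.959, 126.873),
    (0.959, 129.961), (0.957, 132.01), (0.954, 129.03), (0.954, 126.913),
    (0.947, 130.093), (0.941, 134.989), (0.683, 129.753), (0.677, 76.32),
    (0.674, 130.057), (0.657, 84.016), (0.652, 129.488), (0.644, 132.028),
    (0.641, 86.611), (0.638, 120.235), (0.621, 172.397), (0.616, 80.772),
]
danceability = [d for d, _ in rows]
tempo = [t for _, t in rows]
print(round(pearson_r(tempo, danceability), 3))
```

On the full data frame, `combined_df['tempo'].corr(combined_df['danceability'])` should give the same kind of coefficient for all 100 tracks. A value near 0 would indicate little linear relation, while values near 1 or -1 would indicate a strong one.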

## Conclusion

With this analysis, I was able to confirm that my hypothesis of Brazilian Funk having a high danceability was correct. The danceability mean (0.82) was inside the range I expected, which was between 0.75 and 0.9. Not only that, but some of the tracks have a danceability higher than I was expecting, reaching 0.975 and almost hitting the maximum value. Plus, as stated before, the lowest danceability was higher than half of the maximum possible score.

As mentioned previously, the tempo does not seem to be directly related to the danceability value: a high tempo does not mean that the danceability will also be high, even though Spotify says tempo is taken into consideration when determining a track's danceability. The mean tempo (124 BPM) points the same way, as it is around the average BPM for songs in general. Hence, if tempo were directly connected to a song's danceability, many songs with an average to high BPM would be considered danceable, but as we can see with the song "Tá Tranquilo, Tá Favorável", a high BPM (172.4) does not imply a high danceability (0.62).

The limitation of this analysis is that I was not able to get the whole playlist and, even if I had been, it would not represent the entirety of Funk. However, as I mentioned before, the whole playlist would be a good sample, as it would give 415 songs to look into. Also, considering that these are the songs most listened to at parties, it would provide a sample that captures the big picture of what Brazilians normally enjoy regarding the danceability of songs, as Brazilians generally love to dance to Funk at parties. A further limitation is that the danceability of a song is subjective to the person listening to it.

In a future analysis, to dive further into the results of this hypothesis, it would be interesting to compare them with other danceable music genres (for example, K-pop, which is known for its dances, or Samba, another Brazilian genre that people dance to) in order to get a better idea of how danceable Brazilian Funk actually is. This would be a good addition to the analysis, as comparing the results would make it possible to determine whether other danceable genres surpass Funk's danceability. Another way to improve this report would be to add a graph that presents the table's information so that the results can be easily read and understood. It was my intent to do so for this version; however, I did not have the right package, which can also be considered a limitation.