# **Spotify Data Analysis**
---
I want to accomplish two things with this project. First, I want to learn how to use the Spotify API, which serves as a great gateway into the world of APIs. 
The documentation is excellent, the number of API calls you can make per day is more than enough for almost any kind of project, and the information you can get from it is really interesting.

I also want to do some research on the song profiles that different countries consume and predict whether new releases could be successful in different regions. To accomplish this, I'm going to create a dataframe with all the songs from some of the most popular playlists per country. Once I have these songs, I'm going to request the [**Spotify Audio Features**](https://developer.spotify.com/documentation/web-api/reference/#endpoint-get-several-audio-features) for each song and, as the last step, I'm going to use a Machine Learning model with these features as the independent variables in a prediction exercise. 

In addition to that, I wanted to do some research on the characteristics of music that each country consumes and how this consumption has changed over the last decade. To do this I want to create a dataset with all the songs from the most important albums of the last 20 years, analyze their characteristics, and see if there's a particular change in the consumption of certain types of artists or genres.

## 1. Required Libraries
---

In [2]:
import requests
import base64
import datetime
import pandas as pd
from urllib.parse import urlencode

In [6]:
# Both classes live in IPython.display; importing HTML from IPython.core.display is deprecated
from IPython.display import Image, HTML

## 2. Helper Functions
---
I created several functions inside the `helper_func.py` Python file. I use them in this notebook to gather all the data I need from Spotify with as few manual steps as possible.

In [1]:
%load_ext autoreload
%autoreload 2
import helper_func as hf

### 2.1 Description
---

* `auth()`: This function generates the `access_token` after prompting for the Client_ID and the Client_Secret. This token is important because it's the key to using all the functionality of the API.
* `search_spotify()`: The purpose of this function is to do searches using the Spotify API and get a JSON file with the information that was requested. Using this function, we can search for information about albums, artists, playlists, tracks, shows, and episodes.
* `get_list_of_albums()`: This function returns a list with the IDs of all the albums from a single artist. 
* `album_information()`: This function returns key information about a list of album ids.
* `get_multiple_artists_from_albums()`: Some albums have more than 1 artist. This function creates a dataframe that creates new columns for each of the artists that collaborated on the album.
* `songs_information()`: This function returns a dataframe with all the songs from an artist, along with additional data about those songs. It also returns a list with the unique IDs of those songs.
* `artists_from_songs()`: Some songs have more than 1 performer. This function creates a dataframe that adds new columns for each artist that was involved with the song.
* `multiple_artists_songs()`: This function can return a dataframe with detailed information about an artist.
* `song_features()`: This function returns a dataframe with the features that Spotify assigns to each song.
* `playlist_data()`: This function returns a dataframe with key data about a particular playlist.

## 3. Set-Up
---

### 3.1 Access Token
---
Depending on the level of access you want, there are several ways of interacting with the Spotify API.
For my personal workflow I created a function called `auth()`. This function handles all the steps that need to be fulfilled to get the access token. When called, the function asks for a `Client_ID` and a `Client_Secret`. You can learn how to get those credentials in this [**article**](https://developer.spotify.com/documentation/general/guides/app-settings/).


<ins>**Notes:**</ins>
* [**Here**](https://developer.spotify.com/documentation/general/guides/authorization-guide/#client-credentials-flow) you can find the article in which Spotify explains how the `access token` works. In addition, it also mentions different workflows that can be followed in order to get the token. 

In [46]:
access_token = hf.auth()

client_id:  <redacted>
client_secret:  <redacted>
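
For readers who want to see what a helper like `auth()` might do internally, here is a minimal sketch of the Client Credentials flow described in the article linked above. It is not the code from `helper_func.py`: the name `build_token_request` is hypothetical, and the sketch only prepares the request instead of sending it. The token endpoint, the `grant_type=client_credentials` body and the Base64-encoded `client_id:client_secret` header follow Spotify's authorization guide.

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return the headers and form data for a Client Credentials token request."""
    # The guide asks for "client_id:client_secret", Base64-encoded, in a Basic header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    headers = {"Authorization": f"Basic {encoded}"}
    data = {"grant_type": "client_credentials"}
    return headers, data


# Placeholder credentials; never paste real ones into a shared notebook.
headers, data = build_token_request("my_client_id", "my_client_secret")
```

Sending the request is then a single call such as `requests.post(TOKEN_URL, headers=headers, data=data)`. The JSON response carries the token under `access_token` and its lifetime in seconds under `expires_in`.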


## 4. Getting Data From Spotify
---
The data from this project comes from the Spotify API. To learn more about the different calls you can make to the API check their documentation by clicking [**here**](https://developer.spotify.com/documentation/web-api/reference/).

I'll be using the helper functions that I previously imported as `hf`.

### 4.1 `Search` function.
---
There's a function inside the `helper_func` module called `search_spotify()`. Its purpose is to run searches against the Spotify API and return a JSON response with the information that was requested. 
Using this function, we can search for information about albums, artists, playlists, tracks, shows, and episodes.

This function accepts 3 parameters:
* `access_token`: The access token we get after running the `hf.auth()` function.
* `query`: The term we want to look for.
* `search_type`: One of the strings 'album', 'artist', 'playlist', 'track', 'show' or 'episode', depending on the kind of result you're looking for.

<ins>**Notes:**</ins>
* Click [**here**](https://developer.spotify.com/documentation/web-api/reference/#category-search) to learn more about the "Search" module of the Spotify API.
* If you want to test the "Search" API call in a Spotify Console click [**here**](https://developer.spotify.com/console/get-search-item/).

In [7]:
# Search_Type Options: album , artist, playlist, track, show and episode
albums = hf.search_spotify(access_token, query="Fine Line", search_type='album')
album_cover = albums["albums"]["items"][0]["images"][1]["url"]
album_id = albums["albums"]["items"][0]["id"]
Image(url= f"{album_cover}", width=200, height=200)
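
To make the three parameters more concrete, here is a rough sketch of how a search request could be assembled. The function name `build_search_request` and its `limit` argument are hypothetical and not taken from `helper_func.py`; the endpoint URL, the `q` and `type` query parameters and the Bearer header come from the Search reference linked above. The allowed types mirror the six options listed in this notebook.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.spotify.com/v1/search"
ALLOWED_TYPES = {"album", "artist", "playlist", "track", "show", "episode"}


def build_search_request(access_token, query, search_type, limit=20):
    """Return the full URL and the headers for a Spotify search request."""
    if search_type not in ALLOWED_TYPES:
        raise ValueError(f"search_type must be one of {sorted(ALLOWED_TYPES)}")
    # urlencode escapes spaces and special characters in the query string.
    params = urlencode({"q": query, "type": search_type, "limit": limit})
    headers = {"Authorization": f"Bearer {access_token}"}
    return f"{SEARCH_URL}?{params}", headers


url, headers = build_search_request("TOKEN", "Fine Line", "album")
print(url)  # https://api.spotify.com/v1/search?q=Fine+Line&type=album&limit=20
```

Passing the result to `requests.get(url, headers=headers).json()` would return the kind of JSON that the cell above indexes with `albums["albums"]["items"]`.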

### 4.2 Albums Function
---
There's a function inside the `helper_func` module called `get_list_of_albums()`. Its purpose is to return a list with the IDs of all the albums from a single artist.

The parameters that this function accepts are: 
>* `at`: The Access_Token. REQUIRED
>* `artist`: String with the name of a single artist. OPTIONAL
>* `lookup_id`: ID of a single Artist. OPTIONAL.
>* `market`: Choose the country you would like to get the information from. The default is "US". OPTIONAL

<ins>**Notes:**</ins>
* You must pass either `artist` or `lookup_id`, but not both at the same time. 
* Click [**here**](https://developer.spotify.com/documentation/web-api/reference/#category-albums) to learn more about the "Albums" module on the Spotify API.
* If you want to test the "Albums" API call in a Spotify Console click [**here**](https://developer.spotify.com/console/albums/).


In [12]:
albums_ids= hf.get_list_of_albums(at=access_token, artist="Ed Sheeran", lookup_id=None)
albums_ids[0:5]

['3oIFxDIo2fwuk4lwCmFZCx',
 '5oUZ9TEZR3wOdvqzowuNwl',
 '3T4tUhGYeRNVUGevb0wThu',
 '2hyDesSAYNefikDJXlqhPE',
 '1xn54DMo2qIqBuMqHtUsFd']
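
The "either `artist` or `lookup_id`" rule from the notes can be enforced with a small check before any request is made. The sketch below is only an illustration of that rule: `pick_artist_lookup` is a hypothetical name, and the real helper may handle the two arguments differently.

```python
def pick_artist_lookup(artist=None, lookup_id=None):
    """Return ("name", artist) or ("id", lookup_id); reject zero or two arguments."""
    # Both None or both set means the caller broke the "exactly one" rule.
    if (artist is None) == (lookup_id is None):
        raise ValueError("Pass exactly one of `artist` or `lookup_id`.")
    if artist is not None:
        return "name", artist
    return "id", lookup_id


print(pick_artist_lookup(artist="Ed Sheeran"))  # ('name', 'Ed Sheeran')
```

A name still has to be resolved to an ID with a search, while an ID can be used directly, which is why the two cases are kept apart.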

### 4.2.1 Information About the Albums
---
There's a function inside the `helper_func` module called `album_information()`. Its purpose is to return a dataframe with key information about the albums that are passed to it.

This function also returns a JSON object that can be used by other functions inside the `hf` library.

This function accepts the following parameters:
>* `list_of_albums`: A Python list with the IDs of all the albums that we want to transform into a dataset. REQUIRED.
>* `at`: The Access_Token. REQUIRED
>* `market`: Choose the country you would like to get the information from. The default is "US". OPTIONAL


In [15]:
album_info_list, albums_json = hf.album_information(list_of_albums = albums_ids, at=access_token) 
album_info_list.head()

Unnamed: 0,name_of_album,album_id,album_url,album_genres,album_cover,album_popularity,release_date
0,No.6 Collaborations Project,3oIFxDIo2fwuk4lwCmFZCx,https://open.spotify.com/album/3oIFxDIo2fwuk4l...,[],https://i.scdn.co/image/ab67616d00001e0273304c...,85,2019-07-12
1,No.6 Collaborations Project,5oUZ9TEZR3wOdvqzowuNwl,https://open.spotify.com/album/5oUZ9TEZR3wOdvq...,[],https://i.scdn.co/image/ab67616d00001e027ed2a6...,55,2019-07-12
2,÷ (Deluxe),3T4tUhGYeRNVUGevb0wThu,https://open.spotify.com/album/3T4tUhGYeRNVUGe...,[],https://i.scdn.co/image/ab67616d00001e02ba5db4...,92,2017-03-03
3,5,2hyDesSAYNefikDJXlqhPE,https://open.spotify.com/album/2hyDesSAYNefikD...,[],https://i.scdn.co/image/ab67616d00001e022fec3a...,63,2014-06-23
4,x (Deluxe Edition),1xn54DMo2qIqBuMqHtUsFd,https://open.spotify.com/album/1xn54DMo2qIqBuM...,[],https://i.scdn.co/image/ab67616d00001e0213b3e3...,87,2014-06-21
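
Endpoints that accept several IDs at once cap how many can be sent per request, so a long list of albums has to be split into batches. The sketch below shows one way to do that. The `chunked` helper is hypothetical, and the limit of 20 album IDs per request is my reading of the "Get Several Albums" reference, so please verify it there before relying on it.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Assumed limit for the "Get Several Albums" endpoint; check the API reference.
MAX_ALBUMS_PER_REQUEST = 20

album_ids = [f"album_{n}" for n in range(45)]  # placeholder ids
# Each batch becomes the comma-separated `ids` parameter of one request.
batches = [",".join(batch) for batch in chunked(album_ids, MAX_ALBUMS_PER_REQUEST)]
print(len(batches))  # 3 requests: 20 + 20 + 5 ids
```

The same helper works for tracks, artists and audio features; only the batch size changes.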


### 4.2.2 Multiple Artists on a single Album
---
Some albums have more than 1 artist. This function creates a dataframe that creates new columns for each of the artists that collaborated on the album.

The only parameter this function accepts is:
>* `albums_json`: A json file previously generated when the `album_information()` function is called.  REQUIRED.

In [20]:
# Assumption: the helper returns a single dataframe, as described in section 2.1
artists_in_albums = hf.get_multiple_artists_from_albums(albums_json=albums_json)
artists_in_albums.head()


### 4.3 Get all the `Tracks` from a single Artist.
---
There's a function inside the `helper_func` module called `songs_information()`. Its purpose is to get all the tracks from an artist.

The only parameter this function accepts is:
>* `albums_json`: A json file previously generated when the `album_information()` function is called.  REQUIRED.


<ins>**Notes:**</ins>
* Click [**here**](https://developer.spotify.com/documentation/web-api/reference/#category-tracks) to learn more about the "Tracks" module on the Spotify API.
* If you want to test the "Tracks" API call in a Spotify Console click [**here**](https://developer.spotify.com/console/get-several-tracks/).

In [31]:
list_of_songs_, list_of_songs_tolist = hf.songs_information(albums_json= albums_json)
list_of_songs_.head()

Unnamed: 0,album_id,song_id,name_of_song,duration,song_url,song_preview
0,3oIFxDIo2fwuk4lwCmFZCx,70eFcWOvlMObDhURTqT4Fv,Beautiful People (feat. Khalid),197866,https://open.spotify.com/track/70eFcWOvlMObDhU...,https://p.scdn.co/mp3-preview/3ad904af9567a7c7...
1,3oIFxDIo2fwuk4lwCmFZCx,4vUmTMuQqjdnvlZmAH61Qk,South of the Border (feat. Camila Cabello & Ca...,204466,https://open.spotify.com/track/4vUmTMuQqjdnvlZ...,https://p.scdn.co/mp3-preview/686f1dc5c92030c8...
2,3oIFxDIo2fwuk4lwCmFZCx,4wuCQX7JvAZLlrcmH4AeZF,Cross Me (feat. Chance the Rapper & PnB Rock),206186,https://open.spotify.com/track/4wuCQX7JvAZLlrc...,https://p.scdn.co/mp3-preview/b79732826d505f62...
3,3oIFxDIo2fwuk4lwCmFZCx,1AI7UPw3fgwAFkvAlZWhE0,Take Me Back to London (feat. Stormzy),189733,https://open.spotify.com/track/1AI7UPw3fgwAFkv...,https://p.scdn.co/mp3-preview/0184ae44a09f81e1...
4,3oIFxDIo2fwuk4lwCmFZCx,0VsGaRXR5WAzpu51unJTis,Best Part of Me (feat. YEBBA),243266,https://open.spotify.com/track/0VsGaRXR5WAzpu5...,https://p.scdn.co/mp3-preview/e283388f212d47cf...


### 4.4 Get all the artists that collaborate on the Tracks we're exploring
---
There's a function inside the `helper_func` module called `artists_from_songs()`. It creates a dataframe that adds new columns for each artist involved in a song.

This function accepts the following parameters:
>* `list_of_songs_ids`: A Python list with the unique IDs of the songs. A list like this is generated by the `songs_information()` function. However, any Python list works as long as it contains the unique IDs that Spotify assigns to each song.  REQUIRED.
>* `at`: The Access_Token. REQUIRED


<ins>**Notes:**</ins>
* To get a list with all the albums from a single artist, I recommend using the `get_list_of_albums()` function from the `hf` library.


In [24]:
artists_in_albums_, songs_json, artists_id_, songs_id_ = hf.artists_from_songs(list_of_songs_ids= list_of_songs_tolist,at=access_token)
artists_in_albums_

Unnamed: 0,song_id,song_popularity,song_image,name_artist_1,id_artist_1,name_artist_2,id_artist_2,name_artist_3,id_artist_3,name_artist_4,id_artist_4,name_artist_5,id_artist_5
0,0A2J5TumCpT4aJVvQHNEQW,46,https://i.scdn.co/image/ab67616d00001e02d08209...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
1,0AtP8EkGPn6SwxKDaUuXec,67,https://i.scdn.co/image/ab67616d00001e0273304c...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,Eminem,7dGJo4pcD2V6oG8kP0tJRR,50 Cent,3q7HBObVc0L8jNeTe5Gofh,,,,
2,0CNrpbpJ9HsFffF9hqWIIe,43,https://i.scdn.co/image/ab67616d00001e02bc17a9...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
3,0E4Y1XIbs8GrAT1YqVy6dq,89,https://i.scdn.co/image/ab67616d00001e0288e170...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
4,0N5zjRnf8AreOm95iSBXF4,54,https://i.scdn.co/image/ab67616d00001e0213b3e3...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
172,7iBSkXB0pTvZasOLf0Qxk9,56,https://i.scdn.co/image/ab67616d00001e02ed139c...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
173,7oolFzHipTMg2nL7shhdz2,61,https://i.scdn.co/image/ab67616d00001e02ba5db4...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
174,7qiZfU4dY1lWllzX7mPBI3,83,https://i.scdn.co/image/ab67616d00001e02ba5db4...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,
175,7sGK98k8lv3hzRznoN909t,43,https://i.scdn.co/image/ab67616d00001e02bc17a9...,Ed Sheeran,6eUKZXaKkcviH0Ku9w2n3V,,,,,,,,


The function also returns lists with the artist IDs and the song IDs. In my example I stored them in the variables `artists_id_` and `songs_id_`. Here's what the first ten entries of `songs_id_` look like:

In [25]:
songs_id_[0:10]

['0A2J5TumCpT4aJVvQHNEQW',
 '0AtP8EkGPn6SwxKDaUuXec',
 '0CNrpbpJ9HsFffF9hqWIIe',
 '0E4Y1XIbs8GrAT1YqVy6dq',
 '0N5zjRnf8AreOm95iSBXF4',
 '0NJDrEZdTbVPMKC0Zbglgn',
 '0SuG9kyzGRpDqrCWtgD6Lq',
 '0T0wYEzpL46dCaTL1DLCyD',
 '0Tdh0VTkzmPGeAQIXHYhVJ',
 '0Tel1fmuCxEFV6wBLXsEdk']
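
The wide `name_artist_N` / `id_artist_N` layout shown in the dataframe above can be produced roughly as follows. This is a sketch under assumptions, not the code of `artists_from_songs()`: `widen_artists` is a hypothetical name, and the sample tracks are trimmed-down stand-ins that keep only the `id` and `artists` fields of a track object.

```python
def widen_artists(tracks):
    """Turn track objects into flat rows with numbered artist columns."""
    rows = []
    for track in tracks:
        row = {"song_id": track["id"]}
        # Number the artists from 1 so the columns match name_artist_1, id_artist_1, ...
        for position, artist in enumerate(track["artists"], start=1):
            row[f"name_artist_{position}"] = artist["name"]
            row[f"id_artist_{position}"] = artist["id"]
        rows.append(row)
    return rows


# Placeholder tracks; the real API response has many more fields.
sample_tracks = [
    {"id": "track_a", "artists": [{"name": "Artist One", "id": "a1"}]},
    {"id": "track_b", "artists": [{"name": "Artist One", "id": "a1"},
                                  {"name": "Artist Two", "id": "a2"}]},
]
rows = widen_artists(sample_tracks)
```

Calling `pd.DataFrame(rows)` on the result fills the missing artist slots with `NaN`, which is where the empty cells in the table come from.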

### 4.5 Get data from the artists
---
There's a function inside the `helper_func` module called `multiple_artists_songs()`. It creates a dataframe with key information about the artists we pass to it.

The parameters that this function accepts are:
>* `at`: The Access Token
>* `list_of_artists_ids`: A Python list with the IDs that Spotify assigns to each artist. The `artists_from_songs()` function returns a list with these characteristics (stored above as `artists_id_`).

In [30]:
artist_list_df= hf.multiple_artists_songs(list_of_artists_ids=artists_id_,at=access_token)
artist_list_df.head()

Unnamed: 0,id_artist,name_artist,url,followers,image,artist_popluarity,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,genre_7,genre_8,genre_9
0,0T2sGLJKge2eaFmZJxX7sq,Wretch 32,https://open.spotify.com/artist/0T2sGLJKge2eaF...,262669,https://i.scdn.co/image/d4c5e537525f312ab1f9c7...,59,grime,uk hip hop,,,,,,,,
1,0Y5tJX1MQlPlqiwlOH1tJY,Travis Scott,https://open.spotify.com/artist/0Y5tJX1MQlPlqi...,15099972,https://i.scdn.co/image/5801b0d47fbf34b228a1f8...,95,rap,slap house,,,,,,,,
2,0du5cEVh5yTK9QJze8zA0C,Bruno Mars,https://open.spotify.com/artist/0du5cEVh5yTK9Q...,28718094,https://i.scdn.co/image/aba91de7087e3b657cf11e...,90,pop,,,,,,,,,
3,1anyVhU62p31KFi8MEzkbf,Chance the Rapper,https://open.spotify.com/artist/1anyVhU62p31KF...,5179714,https://i.scdn.co/image/091f88e2ae654cd16458aa...,85,chicago rap,conscious hip hop,hip hop,pop rap,rap,,,,,
4,1ooV8YZC1KbpEcrmI8WH0F,Yebba,https://open.spotify.com/artist/1ooV8YZC1KbpEc...,137588,https://i.scdn.co/image/bacf2ca5ee47adafd59291...,67,pop soul,,,,,,,,,


### 4.6 Get Features from each song
---
This function returns a dataframe with the features that Spotify assigns to each song.
The parameters that this function accepts are:
>* `at`: The Access Token 
>* `list_of_songs_ids`: A Python list with the unique IDs that Spotify assigns to each track. The functions `songs_information()` and `artists_from_songs()` return a list with these characteristics.


In [34]:
song_features, songs_features_json= hf.song_features(list_of_songs_ids=list_of_songs_tolist, at=access_token)
song_features

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,5TvFfDlVoUWZvfqrhTJzD7,0.464,0.321,2,-11.120,1,0.0418,0.8770,0.000000,0.0789,0.306,93.528
1,0xCA70t1ZA4fa9UOE0lIJm,0.697,0.377,3,-7.755,1,0.0397,0.5560,0.000000,0.0999,0.336,138.754
2,6K8qKeWo5MsFED7wCR6Kop,0.818,0.670,8,-4.451,0,0.0472,0.3040,0.000001,0.0601,0.939,119.988
3,1huvTbEYtgltjQRXzrNKGi,0.806,0.608,1,-7.008,1,0.0659,0.0113,0.000000,0.6350,0.849,95.049
4,3GKSfF6c3Rv3a1knSjxnXa,0.791,0.669,6,-7.189,0,0.1460,0.1750,0.000000,0.3300,0.809,92.013
...,...,...,...,...,...,...,...,...,...,...,...,...
173,3MUKlVjVJBvPf4flrDq8KB,0.693,0.833,6,-5.408,0,0.2630,0.2140,0.000000,0.1170,0.759,99.901
174,7bXTXa2zmMuRZPCBEWcjHw,0.437,0.500,9,-7.103,1,0.1050,0.7310,0.000000,0.1570,0.437,85.293
175,0dNt7lZks5TVxiGaCIY1yZ,0.511,0.910,11,-6.186,0,0.1160,0.0282,0.000000,0.1270,0.426,140.147
176,4SVST4FR6Gj74UJCzCiOSX,0.855,0.678,9,-6.864,1,0.0520,0.1010,0.002320,0.0964,0.429,120.012
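
As an illustration of how a features table like the one above could be built, the sketch below flattens an audio-features response into one row per song. It rests on two assumptions that should be checked against the API reference: that the response wraps the results in an `audio_features` list, and that the list may contain `null` entries for tracks without features. `features_to_rows` is a hypothetical name, and the sample values are copied from the first row of the table above.

```python
FEATURE_COLUMNS = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
                   "acousticness", "instrumentalness", "liveness", "valence", "tempo"]


def features_to_rows(audio_features_json):
    """Flatten an audio-features response into one dict per song."""
    rows = []
    for item in audio_features_json.get("audio_features", []):
        if item is None:  # assumed: null entry for a track without features
            continue
        row = {"song_id": item["id"]}
        # Keep only the columns shown in the dataframe, in the same order.
        row.update({name: item[name] for name in FEATURE_COLUMNS})
        rows.append(row)
    return rows


sample_response = {"audio_features": [
    {"id": "5TvFfDlVoUWZvfqrhTJzD7", "danceability": 0.464, "energy": 0.321, "key": 2,
     "loudness": -11.120, "mode": 1, "speechiness": 0.0418, "acousticness": 0.8770,
     "instrumentalness": 0.0, "liveness": 0.0789, "valence": 0.306, "tempo": 93.528,
     "type": "audio_features"},
    None,
]}
rows = features_to_rows(sample_response)
print(len(rows))  # 1, because the null entry is skipped
```

Skipping the empty entries keeps a later `pd.DataFrame(rows)` call from failing on `None` values.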


### 4.7 Information from Playlists
---
This function returns a dataframe with key data about a particular playlist.
The parameters that this function accepts are:
>* `at`: The Access Token
>* `playlist_id`: The unique id that Spotify assigns to each playlist.

<ins>**Notes:**</ins>
* Click [**here**](https://developer.spotify.com/documentation/web-api/reference/#category-playlists) to learn more about the information you can get from the "Playlist" API call.



In [40]:
play_list_json_V2, empty_list_one_V2= hf.playlist_data(at=access_token, playlist_id="37i9dQZF1DWWZJHBoz7SEG")
empty_list_one_V2.head()

Unnamed: 0,song_id,playlist_id,playlist_name,playlist_owner,owner_url,playlist_url,playlist_followers,playlist_cover_art,song_added_at,song_name,...,song_popularity,song_url,name_artist_1,id_artist_1,name_artist_2,id_artist_2,name_artist_3,id_artist_3,name_artist_4,id_artist_4
0,0J0Hci2kFZEZjEKZ7Z17zv,37i9dQZF1DWWZJHBoz7SEG,Novedades Viernes MX,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DWW...,625498,https://i.scdn.co/image/ab67706f00000003ca70d1...,2021-02-12T06:00:00Z,Steak,...,0,https://open.spotify.com/track/0J0Hci2kFZEZjEK...,Adan Cruz,645xd9cHiiLqqehoLzLMDR,,,,,,
1,0UT1rpFwDw5GLM7jDVs9pN,37i9dQZF1DWWZJHBoz7SEG,Novedades Viernes MX,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DWW...,625498,https://i.scdn.co/image/ab67706f00000003ca70d1...,2021-02-12T06:00:00Z,Quedate Sola,...,0,https://open.spotify.com/track/0UT1rpFwDw5GLM7...,Jon Z,5bWUlnPx9OYKsLiUJrhCA1,Myke Towers,7iK8PXO48WeuP03g8YR51W,Eladio Carrion,5XJDexmWFLWOkjOEjOVX3e,,
2,0yGJT29zL4YZnzXAJq6WMR,37i9dQZF1DWWZJHBoz7SEG,Novedades Viernes MX,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DWW...,625498,https://i.scdn.co/image/ab67706f00000003ca70d1...,2021-02-12T06:00:00Z,EL SANTO,...,0,https://open.spotify.com/track/0yGJT29zL4YZnzX...,VICE MENTA,5Aw0EHnWZ9YBfsYN3bjZJH,,,,,,
3,12RrfFKWOOzbhjt1LtQgxj,37i9dQZF1DWWZJHBoz7SEG,Novedades Viernes MX,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DWW...,625498,https://i.scdn.co/image/ab67706f00000003ca70d1...,2021-02-12T06:00:00Z,Change,...,59,https://open.spotify.com/track/12RrfFKWOOzbhjt...,Pale Waves,0wOej91SVqB1zcYkW6xUtA,,,,,,
4,1CWzBVoGYCiO8x3L1UKH2R,37i9dQZF1DWWZJHBoz7SEG,Novedades Viernes MX,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DWW...,625498,https://i.scdn.co/image/ab67706f00000003ca70d1...,2021-02-12T06:00:00Z,Santa Paloma,...,0,https://open.spotify.com/track/1CWzBVoGYCiO8x3...,Jd Pantoja,7yjRUA0Iz3VI4Kqa5oPJZK,,,,,,
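
Long playlists do not fit in a single response: Spotify returns paging objects with an `items` list and a `next` URL that is `null` on the last page. The sketch below shows the general pattern for walking those pages. It is not the implementation of `playlist_data()`; `collect_items` and `fetch_page` are hypothetical names, and a fake two-page response stands in for the network so that the control flow can be followed.

```python
def collect_items(first_url, fetch_page):
    """Follow `next` links from page to page and return all `items` in order."""
    items = []
    url = first_url
    while url:
        page = fetch_page(url)
        items.extend(page["items"])
        url = page.get("next")  # None on the last page ends the loop
    return items


# Fake paging objects keyed by URL, used here instead of real HTTP requests.
fake_pages = {
    "page-1": {"items": ["song_a", "song_b"], "next": "page-2"},
    "page-2": {"items": ["song_c"], "next": None},
}
all_items = collect_items("page-1", fake_pages.__getitem__)
print(all_items)  # ['song_a', 'song_b', 'song_c']
```

In the notebook, `fetch_page` could be something like `lambda url: requests.get(url, headers=headers).json()`, with `headers` carrying the Bearer token.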


## 5. Data Analysis
---
Now we have the necessary functions to start analyzing our data. As mentioned at the beginning of this notebook, we want to get a sample of songs from different countries and analyze the characteristics of those songs. Creating this dataframe is a three-step process:
1. Get 10 playlists from each country. This gives us a reasonably broad sample for each country.
2. Get all the songs from those playlists and create a dataframe with them.
3. Once we have the dataframe with all of our songs, add the audio features to them.

### 5.1. Getting the playlists
---
Spotify has an API call that gathers the top playlists per country. There's a function inside our `hf` library called `top_playlists()`, which accepts a list of country codes and returns a dataframe with the top playlists from those countries. 

In [50]:
top_playlists_per_country = hf.top_playlists(country= ["CA","GB"], at=access_token)
top_playlists_per_country.iloc[[8,9,10,11],:]

Unnamed: 0,name_of_playlist,playlist_id,owner,playlist_cover,country
8,Folk & Friends,37i9dQZF1DWWv6MSZULLBi,Spotify,https://i.scdn.co/image/ab67706f00000003873d8d...,CA
9,indie pop & chill,37i9dQZF1DX5y8xoSWyhcz,Spotify,https://i.scdn.co/image/ab67706f00000003a7aa2e...,CA
10,Peaceful Piano,37i9dQZF1DX4sWSpwq3LiO,Spotify,https://i.scdn.co/image/ab67706f00000003ca5a75...,GB
11,Sleep,37i9dQZF1DWZd79rJ6a7lp,Spotify,https://i.scdn.co/image/ab67706f00000003b70e02...,GB


### 5.2 Songs from the playlists
---
We can use the `playlist_data()` function to get the songs from the playlists we got in the last step.

In [53]:
def get_songs_from_recommended_playlists(playlists):
    """Return the song ids and the songs dataframe for the given playlists."""
    # Playlist ids come from the dataframe returned by `hf.top_playlists()`
    playlist_ids = playlists["playlist_id"].tolist()
    df_countries = playlists[["playlist_id", "country"]]

    # Download the songs of every playlist (uses the global `access_token`)
    songs_per_playlist = []
    for playlist_id in playlist_ids:
        _, songs_from_playlist = hf.playlist_data(at=access_token, playlist_id=playlist_id)
        songs_per_playlist.append(songs_from_playlist)

    # Stack the playlists, attach each playlist's country and keep one row per song.
    # A song that appears in several playlists keeps the country of its first occurrence.
    df_songs_many_features = (
        pd.concat(songs_per_playlist)
        .merge(df_countries, on="playlist_id", how="inner")
        .drop_duplicates(subset="song_id")
    )
    list_df_songs_many_features = df_songs_many_features["song_id"].tolist()

    return list_df_songs_many_features, df_songs_many_features

list_df_songs_many_features, df_songs_many_features = get_songs_from_recommended_playlists(playlists=top_playlists_per_country)

df_songs_many_features.head()

Unnamed: 0,song_id,playlist_id,playlist_name,playlist_owner,owner_url,playlist_url,playlist_followers,playlist_cover_art,song_added_at,song_name,...,id_artist_12,name_artist_13,id_artist_13,name_artist_14,id_artist_14,name_artist_15,id_artist_15,name_artist_16,id_artist_16,country
0,01TfjDpIKnUXpJBjzv4j1i,37i9dQZF1DX59ogDi1Z2XL,Northern Bars,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DX5...,128127,https://i.scdn.co/image/ab67706f000000032f10a1...,2021-02-12T20:46:48Z,SouthWay,...,,,,,,,,,,CA
1,04upOOfD7Ma3nCUbXAbcWR,37i9dQZF1DX59ogDi1Z2XL,Northern Bars,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DX5...,128127,https://i.scdn.co/image/ab67706f000000032f10a1...,2021-02-12T20:46:48Z,Lowkey,...,,,,,,,,,,CA
2,0SK6p0iQBwpWNwKVI4iLwq,37i9dQZF1DX59ogDi1Z2XL,Northern Bars,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DX5...,128127,https://i.scdn.co/image/ab67706f000000032f10a1...,2021-02-12T20:46:48Z,MURDA,...,,,,,,,,,,CA
3,0d2CMyvh7aI8nvBdygh8NY,37i9dQZF1DX59ogDi1Z2XL,Northern Bars,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DX5...,128127,https://i.scdn.co/image/ab67706f000000032f10a1...,2021-02-12T20:46:48Z,Tha Great,...,,,,,,,,,,CA
4,0fMVy8dyJaTe2q8HQRPSwk,37i9dQZF1DX59ogDi1Z2XL,Northern Bars,Spotify,https://open.spotify.com/user/spotify,https://open.spotify.com/playlist/37i9dQZF1DX5...,128127,https://i.scdn.co/image/ab67706f000000032f10a1...,2021-02-12T20:46:48Z,Talk,...,,,,,,,,,,CA


### 5.3 Adding features to each song.
---
Now that we have all the songs from each playlist, we can use the `song_features()` function to add their features.

In [54]:
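# song_features() returns the dataframe and the raw JSON (see section 4.6), so unpack both
song_features, songs_features_json = hf.song_features(list_of_songs_ids=list_df_songs_many_features, at=access_token)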


In [55]:
song_features.head()

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,2mjkEoiMDNarJvDQnOvStg,0.812,0.607,1,-11.695,1,0.0623,0.000319,0.879,0.0776,0.537,121.002
1,2vvSCgK0NMYiOImew2oYbJ,0.584,0.852,7,-7.117,0,0.129,0.036,0.0923,0.125,0.229,159.887
2,39vf62opOHTgnMh0bWSwju,0.803,0.771,11,-6.353,0,0.0499,9e-05,0.874,0.244,0.184,129.015
3,3HG0bL6apxDTfjijRapnI2,0.724,0.698,2,-11.161,1,0.0541,0.0251,0.833,0.0968,0.238,124.013
4,3OrMbFUgXjchOYTU8TeLO7,0.678,0.868,8,-7.694,1,0.0362,0.00131,0.779,0.335,0.152,124.01


### 5.3.1 Merging the features with the playlist information
---

In [56]:
df_p = song_features.merge(df_songs_many_features, on="song_id", how="inner")
df_p.head()

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,id_artist_12,name_artist_13,id_artist_13,name_artist_14,id_artist_14,name_artist_15,id_artist_15,name_artist_16,id_artist_16,country
0,2mjkEoiMDNarJvDQnOvStg,0.812,0.607,1,-11.695,1,0.0623,0.000319,0.879,0.0776,...,,,,,,,,,,GB
1,2vvSCgK0NMYiOImew2oYbJ,0.584,0.852,7,-7.117,0,0.129,0.036,0.0923,0.125,...,,,,,,,,,,GB
2,39vf62opOHTgnMh0bWSwju,0.803,0.771,11,-6.353,0,0.0499,9e-05,0.874,0.244,...,,,,,,,,,,GB
3,3HG0bL6apxDTfjijRapnI2,0.724,0.698,2,-11.161,1,0.0541,0.0251,0.833,0.0968,...,,,,,,,,,,GB
4,3OrMbFUgXjchOYTU8TeLO7,0.678,0.868,8,-7.694,1,0.0362,0.00131,0.779,0.335,...,,,,,,,,,,GB
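
An inner merge silently drops songs that are missing on either side, so it can be worth checking how many rows match before trusting the result. The sketch below uses two toy dataframes with made-up IDs and values (not data from this project) together with the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for `song_features` and `df_songs_many_features`.
features = pd.DataFrame({"song_id": ["s1", "s2", "s3"], "energy": [0.61, 0.85, 0.77]})
playlist_songs = pd.DataFrame({"song_id": ["s1", "s2", "s4"], "country": ["GB", "GB", "CA"]})

# `validate` raises if song_id is duplicated on either side;
# `indicator` adds a `_merge` column telling where each row came from.
check = features.merge(playlist_songs, on="song_id", how="outer",
                       validate="one_to_one", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "song_id"].tolist()
print(sorted(unmatched))  # ['s3', 's4']
```

If `unmatched` is empty, the inner merge used in the notebook keeps every song.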


### 5.4 Getting the preliminary data analysis
---

In [57]:
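# numeric_only=True keeps non-numeric columns (e.g. song_id) out of the mean on recent pandas versions
comparing_countries = df_p.iloc[:,[0,20,24,1,2,3,4,5,6,7,8,9,10,11,-1]].groupby("country").mean(numeric_only=True).transpose()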
comparing_countries

country,CA,GB
danceability,0.640919,0.556795
energy,0.550668,0.520504
key,5.381008,5.403068
loudness,-8.371981,-14.078446
mode,0.642438,0.548117
speechiness,0.08959,0.067323
acousticness,0.353031,0.372359
instrumentalness,0.080197,0.560196
liveness,0.164009,0.145328
valence,0.463984,0.338476
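
Selecting columns by position is easy to break when the dataframe layout changes, so a name-based variant of the comparison may be more robust. The sketch below shows the idea on a toy dataframe with made-up values (not the project data) and adds a difference column to make the gap between the two countries explicit. The column name `GB_minus_CA` is my own choice.

```python
import pandas as pd

# Toy data with the same shape as `df_p`, reduced to two features.
toy = pd.DataFrame({
    "song_id": ["s1", "s2", "s3", "s4"],
    "country": ["CA", "CA", "GB", "GB"],
    "danceability": [0.70, 0.60, 0.50, 0.60],
    "instrumentalness": [0.00, 0.20, 0.90, 0.50],
})
feature_columns = ["danceability", "instrumentalness"]

# Select columns by name, average per country, then put the countries side by side.
comparison = toy.groupby("country")[feature_columns].mean().transpose()
comparison["GB_minus_CA"] = comparison["GB"] - comparison["CA"]
print(comparison.round(2))
```

Features on different scales, such as `loudness` in decibels and `key` as an integer, are easier to compare through such differences than through the raw means alone.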
