# Spotify 

Your task is to analyse the data provided and explore predictive tasks that could give the company insight into burning questions such as:

1. What drives the cross-regional popularity of music: the artist, or something about the song? (Explore whether artists from certain parts of the world tend to be more popular worldwide.)
2. Can we predict which artists or genres will be popular in 2024, given the historical data since 2017?
3. Does a track's popularity in one region predict its (upcoming) popularity in other regions?
4. Are there patterns in which days of the week and/or months see the most streams?
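A first pass at question 1 could total the chart points earned by artists from each continent. This is a minimal sketch, assuming the column names in the `data.head()` output (`Continent`, `Points (Ind for each Artist/Nat)`) and assuming chart points are an acceptable proxy for worldwide popularity; the function name `points_by_continent` is hypothetical.

```python
import pandas as pd


def points_by_continent(df: pd.DataFrame) -> pd.DataFrame:
    """Total and share of individual chart points earned by artists from each continent.

    Uses 'Points (Ind for each Artist/Nat)', which divides a song's points
    (approximately, after rounding) among its credited artists, so a
    collaboration is not counted in full for every artist.
    """
    totals = (
        df.groupby("Continent")["Points (Ind for each Artist/Nat)"]
        .sum()
        .sort_values(ascending=False)
    )
    return pd.DataFrame({"points": totals, "share": totals / totals.sum()})


# Usage on the notebook's DataFrame:
# points_by_continent(data)
```

A continent's share mixes two effects, how many of its artists chart and how high they rank, so it is a starting point rather than an answer to "artist or song".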

1. Who are the most popular artists (say, the top 10)?
2. Who was the most popular artist each month?
3. Which songs do people dance to the most?
4. Is there a relationship between danceability and the energy or loudness of the music?
5. Does the number of artists on a song affect its popularity?
6. If it does, can we predict whether more artists would make a song more popular?
7. How does valence affect a song's popularity? (A high-valence song sounds happy, excited or joyful, whereas a low-valence song sounds sad, angry or depressed.)
8. Check the mean value of each audio feature for the top 10 songs. Can we predict a song's popularity from whether its features exceed those means?
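Questions 1 and 2 of this list can be approached by summing individual chart points per artist. A minimal sketch, assuming "popular" means the highest total of `Points (Ind for each Artist/Nat)` and that `Date` uses the dd/mm/yyyy format seen in the output; the helper names are hypothetical.

```python
import pandas as pd

ARTIST = "Artist (Ind.)"
POINTS = "Points (Ind for each Artist/Nat)"


def top_artists(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Artists ranked by their total individual chart points."""
    return df.groupby(ARTIST)[POINTS].sum().nlargest(n)


def top_artist_per_month(df: pd.DataFrame) -> pd.DataFrame:
    """The artist with the most individual chart points in each calendar month."""
    # Dates are stored as dd/mm/yyyy strings; convert them to monthly periods
    month = pd.to_datetime(df["Date"], format="%d/%m/%Y").dt.to_period("M")
    monthly = (
        df.assign(Month=month)
        .groupby(["Month", ARTIST], as_index=False)[POINTS]
        .sum()
    )
    # Keep the highest-scoring artist per month (ties are broken arbitrarily)
    return (
        monthly.sort_values(POINTS, ascending=False)
        .drop_duplicates("Month")
        .sort_values("Month")
        .reset_index(drop=True)
    )


# Usage on the notebook's DataFrame:
# top_artists(data)
# top_artist_per_month(data)
```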


1. EDA for each feature
2. Release date and popularity date
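For the per-feature EDA, note that the raw rows repeat a track's audio features on every chart day and once per credited artist (rows 0 and 1 of `data.head()` are the same song). A minimal sketch that describes each feature once per track, assuming `id` identifies a track; `feature_summary` is a hypothetical helper.

```python
import pandas as pd

AUDIO_FEATURES = ["Danceability", "Energy", "Loudness", "Speechiness",
                  "Acousticness", "Instrumentalness", "Valence"]


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics of each audio feature, counting every track once.

    Without deduplication, songs that chart for many days or credit several
    artists would dominate the statistics.
    """
    tracks = df.drop_duplicates(subset="id")
    # One row per feature: count, mean, std, min, quartiles, max
    return tracks[AUDIO_FEATURES].describe().T


# Usage on the notebook's DataFrame:
# feature_summary(data)
```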

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv(r'~/anaconda3/Anjali-Spotify/Spotify_Dataset_V3.csv',sep=';')

In [3]:
data.head()

|   | Rank | Title | Artists | Date | Danceability | Energy | Loudness | Speechiness | Acousticness | Instrumentalness | Valence | # of Artist | Artist (Ind.) | # of Nationality | Nationality | Continent | Points (Total) | Points (Ind for each Artist/Nat) | id | Song URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ella Baila Sola | Eslabon Armado, Peso Pluma | 29/05/2023 | 0.668 | 0.758 | -5176.0 | 0.033 | 0.483 | 0.0 | 0.834 | Artist 1 | Eslabon Armado | Nationality 1 | Mexico | Latin-America | 200 | 100.0 | 3qQbCzHBycnDpGskqOWY0E | https://open.spotify.com/track/3qQbCzHBycnDpGs... |
| 1 | 1 | Ella Baila Sola | Eslabon Armado, Peso Pluma | 29/05/2023 | 0.668 | 0.758 | -5176.0 | 0.033 | 0.483 | 0.0 | 0.834 | Artist 2 | Peso Pluma | Nationality 2 | Mexico | Latin-America | 200 | 100.0 | 3qQbCzHBycnDpGskqOWY0E | https://open.spotify.com/track/3qQbCzHBycnDpGs... |
| 2 | 2 | WHERE SHE GOES | Bad Bunny | 29/05/2023 | 0.652 | 0.8 | -4019.0 | 0.061 | 0.143 | 0.629 | 0.234 | Artist 1 | Bad Bunny | Nationality 1 | Puerto Rico | Latin-America | 199 | 199.0 | 7ro0hRteUMfnOioTFI5TG1 | https://open.spotify.com/track/7ro0hRteUMfnOio... |
| 3 | 3 | La Bebe - Remix | Yng Lvcas, Peso Pluma | 29/05/2023 | 0.812 | 0.479 | -5678.0 | 0.333 | 0.213 | 0.0 | 0.559 | Artist 1 | Yng Lvcas | Nationality 1 | Mexico | Latin-America | 198 | 99.0 | 2UW7JaomAMuX9pZrjVpHAU | https://open.spotify.com/track/2UW7JaomAMuX9pZ... |
| 4 | 3 | La Bebe - Remix | Yng Lvcas, Peso Pluma | 29/05/2023 | 0.812 | 0.479 | -5678.0 | 0.333 | 0.213 | 0.0 | 0.559 | Artist 2 | Peso Pluma | Nationality 2 | Mexico | Latin-America | 198 | 99.0 | 2UW7JaomAMuX9pZrjVpHAU | https://open.spotify.com/track/2UW7JaomAMuX9pZ... |
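Spotify describes loudness in decibels, with values typically between -60 and 0 dB, yet the Loudness values above (-5176.0, -4019.0, -5678.0) are three orders of magnitude larger. One plausible reading is that the decimal point was lost (-5176.0 for -5.176), while a few rows elsewhere in the data, such as -7.18, look unaffected. If that reading is right, a heuristic like the following could restore the scale. The threshold of 60 and divisor of 1000 are assumptions, not something the dataset documents, and `rescale_loudness` is a hypothetical helper.

```python
import pandas as pd


def rescale_loudness(loudness: pd.Series, limit: float = 60.0,
                     divisor: float = 1000.0) -> pd.Series:
    """Heuristically restore a lost decimal point in loudness values.

    Assumption (not documented by the dataset): any value whose magnitude
    exceeds `limit` dB was stored in thousandths and is divided by `divisor`;
    values already within range are left unchanged.
    """
    fixed = loudness.astype(float)
    too_large = fixed.abs() > limit
    # Keep in-range values, replace out-of-range ones with the rescaled value
    return fixed.where(~too_large, fixed / divisor)


# Usage on the notebook's DataFrame:
# data["Loudness"] = rescale_loudness(data["Loudness"])
```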


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651936 entries, 0 to 651935
Data columns (total 20 columns):
 #   Column                            Non-Null Count   Dtype  
---  ------                            --------------   -----  
 0   Rank                              651936 non-null  int64  
 1   Title                             651936 non-null  object 
 2   Artists                           651936 non-null  object 
 3   Date                              651936 non-null  object 
 4   Danceability                      651936 non-null  float64
 5   Energy                            651936 non-null  float64
 6   Loudness                          651936 non-null  float64
 7   Speechiness                       651936 non-null  float64
 8   Acousticness                      651936 non-null  float64
 9   Instrumentalness                  651936 non-null  float64
 10  Valence                           651936 non-null  float64
 11  # of Artist                       651936 non-null  o

Note: some songs share the same Title but have a different id/Song URL, and these URLs lead to the same song.
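To list these cases, one can count distinct `id` values per song. A minimal sketch, assuming a song is identified by its `Title` together with its `Artists` string (so different songs that merely share a title are not conflated); `titles_with_multiple_ids` is a hypothetical helper.

```python
import pandas as pd


def titles_with_multiple_ids(df: pd.DataFrame) -> pd.DataFrame:
    """(Title, Artists) pairs that appear under more than one track id."""
    n_ids = df.groupby(["Title", "Artists"])["id"].nunique()
    dupes = n_ids[n_ids > 1].rename("n_ids").reset_index()
    return dupes.sort_values("n_ids", ascending=False, ignore_index=True)


# Usage on the notebook's DataFrame:
# titles_with_multiple_ids(data)
```

If these ids really point to the same recording, mapping them to a single id before grouping avoids splitting one song's popularity across several entries.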

In [5]:
data['Continent'].unique()

array(['Latin-America', 'Asia', 'Anglo-America', 'Europe', 'Africa',
       'Oceania', 'Unknown'], dtype=object)

In [6]:
# Nationality and Continent are unknown for 465 of the 651,936 rows
data[data['Continent']=="Unknown"]

|   | Rank | Title | Artists | Date | Danceability | Energy | Loudness | Speechiness | Acousticness | Instrumentalness | Valence | # of Artist | Artist (Ind.) | # of Nationality | Nationality | Continent | Points (Total) | Points (Ind for each Artist/Nat) | id | Song URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2610 | 182 | Cupid – Twin Ver. (FIFTY FIFTY) – Sped Up Version | sped up 8282 | 21/05/2023 | 0.737 | 0.725 | -7989.00 | 0.041 | 0.417 | 0.001 | 0.746 | Artist 1 | sped up 8282 | Nationality 1 | Unknown | Unknown | 19 | 19.000000 | 3B228N0GxfUCwPyfNcJxps | https://open.spotify.com/track/3B228N0GxfUCwPy... |
| 2909 | 189 | Cupid – Twin Ver. (FIFTY FIFTY) – Sped Up Version | sped up 8282 | 20/05/2023 | 0.737 | 0.725 | -7989.00 | 0.041 | 0.417 | 0.001 | 0.746 | Artist 1 | sped up 8282 | Nationality 1 | Unknown | Unknown | 12 | 12.000000 | 3B228N0GxfUCwPyfNcJxps | https://open.spotify.com/track/3B228N0GxfUCwPy... |
| 3213 | 196 | Cupid – Twin Ver. (FIFTY FIFTY) – Sped Up Version | sped up 8282 | 19/05/2023 | 0.737 | 0.725 | -7989.00 | 0.041 | 0.417 | 0.001 | 0.746 | Artist 1 | sped up 8282 | Nationality 1 | Unknown | Unknown | 5 | 5.000000 | 3B228N0GxfUCwPyfNcJxps | https://open.spotify.com/track/3B228N0GxfUCwPy... |
| 3465 | 168 | Cupid – Twin Ver. (FIFTY FIFTY) – Sped Up Version | sped up 8282 | 18/05/2023 | 0.737 | 0.725 | -7989.00 | 0.041 | 0.417 | 0.001 | 0.746 | Artist 1 | sped up 8282 | Nationality 1 | Unknown | Unknown | 33 | 33.000000 | 3B228N0GxfUCwPyfNcJxps | https://open.spotify.com/track/3B228N0GxfUCwPy... |
| 3501 | 191 | Watch This - ARIZONATEARS Pluggnb Remix | Lil Uzi Vert, sped up nightcore, ARIZONATEARS | 18/05/2023 | 0.686 | 0.897 | -7.18 | 0.039 | 0.010 | 0.103 | 0.355 | Artist 2 | sped up nightcore | Nationality 2 | Unknown | Unknown | 10 | 3.333333 | 0FA4wrjDJvJTTU8AepZTup | https://open.spotify.com/track/0FA4wrjDJvJTTU8... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 518508 | 66 | Genius | Sia, Diplo, Labrinth, LSD | 08/05/2018 | 0.622 | 0.621 | -4924.00 | 0.129 | 0.086 | 0.000 | 0.567 | Artist 4 | LSD | Nationality 4 | Unknown | Unknown | 135 | 34.000000 | 73F87Sqh6jQWucOOvz1WFx | https://open.spotify.com/track/73F87Sqh6jQWucO... |
| 518792 | 72 | Genius | Sia, Diplo, Labrinth, LSD | 07/05/2018 | 0.622 | 0.621 | -4924.00 | 0.129 | 0.086 | 0.000 | 0.567 | Artist 4 | LSD | Nationality 4 | Unknown | Unknown | 129 | 32.000000 | 73F87Sqh6jQWucOOvz1WFx | https://open.spotify.com/track/73F87Sqh6jQWucO... |
| 519074 | 101 | Genius | Sia, Diplo, Labrinth, LSD | 06/05/2018 | 0.622 | 0.621 | -4924.00 | 0.129 | 0.086 | 0.000 | 0.567 | Artist 4 | LSD | Nationality 4 | Unknown | Unknown | 100 | 25.000000 | 73F87Sqh6jQWucOOvz1WFx | https://open.spotify.com/track/73F87Sqh6jQWucO... |
| 519360 | 88 | Genius | Sia, Diplo, Labrinth, LSD | 05/05/2018 | 0.622 | 0.621 | -4924.00 | 0.129 | 0.086 | 0.000 | 0.567 | Artist 4 | LSD | Nationality 4 | Unknown | Unknown | 113 | 28.000000 | 73F87Sqh6jQWucOOvz1WFx | https://open.spotify.com/track/73F87Sqh6jQWucO... |
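To see which credited artists the unknown labels come from (in the sample above they include "sped up 8282", "sped up nightcore" and "LSD"), one can count rows and distinct tracks per artist. A minimal sketch; `unknown_nationality_summary` is a hypothetical helper.

```python
import pandas as pd


def unknown_nationality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Rows and distinct tracks per credited artist whose continent is 'Unknown'."""
    unknown = df[df["Continent"] == "Unknown"]
    return (
        unknown.groupby("Artist (Ind.)")
        .agg(rows=("id", "count"), tracks=("id", "nunique"))
        .sort_values("rows", ascending=False)
    )


# Usage on the notebook's DataFrame:
# unknown_nationality_summary(data)
# (data["Continent"] == "Unknown").sum()  # total row count quoted in the comment
```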


To explore next:
1. Check whether a single artist drives a song's popularity.
2. Look for a correlation between nationality and another feature, such as danceability.
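Nationality is categorical, so a Pearson correlation with danceability is not directly defined. One option, swapped in here rather than taken from the notebook, is the correlation ratio $\eta$, where $\eta^2 = \sum_g n_g(\bar{y}_g - \bar{y})^2 / \sum_i (y_i - \bar{y})^2$ is the share of the feature's variance explained by the group means. It ranges from 0 (all group means equal) to 1 (values fully determined by group). A minimal sketch; `correlation_ratio` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio eta (0..1) between a categorical and a numeric variable."""
    y = values.astype(float)
    grand_mean = y.mean()
    groups = y.groupby(categories)
    # Between-group sum of squares: n_g * (group mean - grand mean)^2, summed over groups
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    # Total sum of squares around the grand mean
    ss_total = ((y - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0


# Usage on the notebook's DataFrame (one row per track and nationality, an assumption):
# tracks = data.drop_duplicates(subset=["id", "Nationality"])
# correlation_ratio(tracks["Nationality"], tracks["Danceability"])
```

With many small nationality groups, $\eta$ is biased upwards, so comparing it against the value for `Continent` (fewer, larger groups) gives a useful sanity check.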

In [16]:
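# Summarise every column: its maximum, minimum and number of distinct values.
# Note: 'Date' is still a dd/mm/yyyy string here, so its max/min are alphabetical, not chronological.
Col_Name = []
Max = []
Min = []
Unique = []
for col_name in data.columns:
    req_data = data[col_name]
    Col_Name.append(col_name)
    Max.append(req_data.max())
    Min.append(req_data.min())
    # dropna=False matches len(req_data.unique()), which also counts NaN as a value
    Unique.append(req_data.nunique(dropna=False))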

In [19]:
req_df = pd.DataFrame(list(zip(Col_Name,Max,Min,Unique)),columns=["Col_Name","Max","Min","Unique"])

In [20]:
req_df

|   | Col_Name | Max | Min | Unique |
|---|---|---|---|---|
| 0 | Rank | 200 | 1 | 200 |
| 1 | Title | 美女と野獣 | '98 Braves | 7457 |
| 2 | Artists | Ñengo Flow, Bad Bunny | $NOT, A$AP Rocky | 2928 |
| 3 | Date | 31/12/2022 | 01/01/2017 | 2336 |
| 4 | Danceability | 0.985 | 0.073 | 739 |
| 5 | Energy | 0.996 | 0.005 | 860 |
| 6 | Loudness | 1509.0 | -34475.0 | 5331 |
| 7 | Speechiness | 0.966 | 0.022 | 532 |
| 8 | Acousticness | 0.994 | 0.0 | 952 |
| 9 | Instrumentalness | 0.956 | 0.0 | 305 |
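Because `Date` is still a dd/mm/yyyy string, the Max and Min reported for it above are alphabetical rather than chronological: "31/12/2022" sorts after "29/05/2023", even though the `data.head()` output shows charts from May 2023. Parsing the column first gives the real range. A minimal sketch, assuming every value follows the dd/mm/yyyy pattern seen in the output; `date_range` is a hypothetical helper.

```python
import pandas as pd


def date_range(dates: pd.Series) -> tuple[pd.Timestamp, pd.Timestamp]:
    """Chronological first and last chart date from dd/mm/yyyy strings."""
    # An explicit format avoids month/day ambiguity (e.g. 05/06/2018)
    parsed = pd.to_datetime(dates, format="%d/%m/%Y")
    return parsed.min(), parsed.max()


# Usage on the notebook's DataFrame:
# date_range(data["Date"])
# or convert once, before rerunning the summary loop:
# data["Date"] = pd.to_datetime(data["Date"], format="%d/%m/%Y")
```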
