# Spotify dataset analytics
### What questions are we answering?
    - Which music genres are the most successful?
    - Who are the top artists in each genre?
    - Are there any patterns in plays by quarter and by year?
    - Which features of a song have the biggest impact on its success?
    - How have musical patterns evolved over time?

## Metrics by year


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px


In [2]:
metrics_by_year = pd.read_csv('data_by_year.csv')

## 1 - Beginning to explore the data
Let's see how music features have evolved over time.

In [3]:
metrics_by_year.head(5)

| Unnamed: 0 | year | acousticness | danceability | duration_ms | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | popularity | key | mode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1920 | 0.631242 | 0.51575 | 238092.997135 | 0.4187 | 0.354219 | 0.216049 | -12.65402 | 0.082984 | 113.2269 | 0.49821 | 0.610315 | 2 | 1 |
| 1 | 1921 | 0.862105 | 0.432171 | 257891.762821 | 0.241136 | 0.337158 | 0.205219 | -16.81166 | 0.078952 | 102.425397 | 0.378276 | 0.391026 | 2 | 1 |
| 2 | 1922 | 0.828934 | 0.57562 | 140135.140496 | 0.226173 | 0.254776 | 0.256662 | -20.840083 | 0.464368 | 100.033149 | 0.57119 | 0.090909 | 5 | 1 |
| 3 | 1923 | 0.957247 | 0.577341 | 177942.362162 | 0.262406 | 0.371733 | 0.227462 | -14.129211 | 0.093949 | 114.01073 | 0.625492 | 5.205405 | 0 | 1 |
| 4 | 1924 | 0.9402 | 0.549894 | 191046.707627 | 0.344347 | 0.581701 | 0.235219 | -14.231343 | 0.092089 | 120.689572 | 0.663725 | 0.661017 | 10 | 1 |


In [4]:
# Min-max scale duration_ms to [0, 1] so it is comparable with the other features
duration_ms = metrics_by_year['duration_ms']
print(duration_ms.max())
metrics_by_year['duration'] = (duration_ms - duration_ms.min()) / (duration_ms.max() - duration_ms.min())
metrics_by_year = metrics_by_year.drop('duration_ms', axis=1)
metrics_by_year.head(5)


284759.93363844394


| Unnamed: 0 | year | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | popularity | key | mode | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1920 | 0.631242 | 0.51575 | 0.4187 | 0.354219 | 0.216049 | -12.65402 | 0.082984 | 113.2269 | 0.49821 | 0.610315 | 2 | 1 | 0.677324 |
| 1 | 1921 | 0.862105 | 0.432171 | 0.241136 | 0.337158 | 0.205219 | -16.81166 | 0.078952 | 102.425397 | 0.378276 | 0.391026 | 2 | 1 | 0.814222 |
| 2 | 1922 | 0.828934 | 0.57562 | 0.226173 | 0.254776 | 0.256662 | -20.840083 | 0.464368 | 100.033149 | 0.57119 | 0.090909 | 5 | 1 | 0.0 |
| 3 | 1923 | 0.957247 | 0.577341 | 0.262406 | 0.371733 | 0.227462 | -14.129211 | 0.093949 | 114.01073 | 0.625492 | 5.205405 | 0 | 1 | 0.261416 |
| 4 | 1924 | 0.9402 | 0.549894 | 0.344347 | 0.581701 | 0.235219 | -14.231343 | 0.092089 | 120.689572 | 0.663725 | 0.661017 | 10 | 1 | 0.352025 |
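
The scaling applied to `duration_ms` above is min-max normalization, x' = (x - min) / (max - min), and the same formula is applied to `popularity` in the next section. A minimal sketch of a reusable helper follows. The name `min_max_normalize` is hypothetical, and returning zeros for a constant column is an assumption rather than something the notebook does. The sample values are the five `duration_ms` entries shown in the first `head(5)` output.

```python
import pandas as pd


def min_max_normalize(series: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly so its values lie in [0, 1]."""
    value_range = series.max() - series.min()
    if value_range == 0:
        # Assumption: a constant column has no spread, so map it to 0.0
        # instead of dividing by zero.
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - series.min()) / value_range


# The five duration_ms values from the head(5) output, used only as a sample.
sample = pd.Series(
    [238092.997135, 257891.762821, 140135.140496, 177942.362162, 191046.707627],
    name="duration_ms",
)
scaled = min_max_normalize(sample)
print(scaled)
```

Because the minimum and maximum here are taken over five rows only, the printed values differ from the `duration` column above, which is scaled against the full dataset.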


## 2 - Plotting to observe music evolution over years

In [5]:
metrics_by_year['popularity'].describe()
# Min-max scale popularity to [0, 1] so it can share an axis with danceability
popularity = metrics_by_year['popularity']
metrics_by_year['norm_popularity'] = (popularity - popularity.min()) / (popularity.max() - popularity.min())

In [6]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=metrics_by_year['year'], y=metrics_by_year['danceability'], mode='lines+markers', name='danceability', showlegend=True))
fig.add_trace(go.Scatter(x=metrics_by_year['year'], y=metrics_by_year['norm_popularity'], mode='lines+markers', name='normalized popularity', showlegend=True))
fig.update_layout(title='Danceability and normalized popularity over time', height=600)
fig.show()

In [7]:
fig = px.scatter(metrics_by_year, x='year', y='energy', trendline='lowess', trendline_color_override='red', title='Increase of energy in songs over time')
fig.show()





In [8]:
fig = px.scatter(metrics_by_year, x='year', y='tempo', trendline='ols', trendline_color_override='red', title='Increase of mean tempo over time')
fig.show()

In [9]:
fig = px.scatter(metrics_by_year, x='year', y='acousticness', trendline='ols', trendline_color_override='red', title='Decrease of acousticness in songs over time')
fig.show()

In [10]:
fig = go.Figure()
# The trace shows speechiness, so label it accordingly in the legend
fig.add_trace(go.Scatter(x=metrics_by_year['year'], y=metrics_by_year['speechiness'], mode='lines+markers', name='speechiness', showlegend=True))
fig.update_layout(title='Evolution of speechiness in songs over time', height=600)
fig.show()

## 3 - Observations
We can see a few patterns in the data.
    - The mean energy of songs has been increasing over time. The same holds for tempo, which makes sense: although it is not a strict rule, tempo and energy tend to be related in music.
    - Speechiness and acousticness, on the other hand, have been decreasing over time.
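
One hedged way to check the suggested link between tempo and energy is the Pearson correlation coefficient. The sketch below uses only the five rows visible in the `head(5)` output as a stand-in, so its result says nothing about the full dataset; the variable names `sample` and `energy_tempo_r` are illustrative.

```python
import pandas as pd

# First five rows of the head(5) output (years 1920-1924), used only as a
# small stand-in; in the notebook you would use metrics_by_year instead.
sample = pd.DataFrame(
    {
        "year": [1920, 1921, 1922, 1923, 1924],
        "energy": [0.4187, 0.241136, 0.226173, 0.262406, 0.344347],
        "tempo": [113.2269, 102.425397, 100.033149, 114.01073, 120.689572],
    }
)

# Pearson correlation: +1 is a perfect positive linear relation, 0 is none,
# -1 is a perfect negative linear relation.
energy_tempo_r = sample["energy"].corr(sample["tempo"], method="pearson")
print(f"Pearson r between energy and tempo: {energy_tempo_r:.3f}")
```

On the full data, `metrics_by_year['energy'].corr(metrics_by_year['tempo'])` would give the corresponding figure. A high value would not by itself show that tempo drives energy, since both series also rise with the year.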
One possible explanation comes from the social context. World War I lasted from 1914 to 1918 and World War II from 1939 to 1945. It seems plausible that music from those periods tended to reflect the feelings of society, telling stories of war or survival, which could explain why speechiness was generally much higher at the time.
After World War II we can observe an increase in the energy and tempo of songs. New genres such as rock, metal, disco in the 1970s, and pop may have contributed to this.
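
The wartime hypothesis could be given a first, rough check by comparing the mean of a feature inside and outside a range of years. The sketch below is one way to do that, not the notebook's method: `mean_inside_vs_outside` is a hypothetical helper, and the `toy` frame holds made-up numbers that only exercise the function.

```python
import pandas as pd


def mean_inside_vs_outside(df, column, start_year, end_year):
    """Return the mean of `column` inside and outside [start_year, end_year]."""
    # Series.between is inclusive on both ends by default.
    in_period = df["year"].between(start_year, end_year)
    return pd.Series(
        {
            "inside": df.loc[in_period, column].mean(),
            "outside": df.loc[~in_period, column].mean(),
        }
    )


# Made-up values that only exercise the function; they are NOT from the dataset.
toy = pd.DataFrame(
    {
        "year": [1938, 1939, 1942, 1945, 1946],
        "speechiness": [0.10, 0.30, 0.50, 0.40, 0.20],
    }
)
result = mean_inside_vs_outside(toy, "speechiness", 1939, 1945)
print(result)
```

In the notebook, `mean_inside_vs_outside(metrics_by_year, 'speechiness', 1939, 1945)` would compare the World War II years with the rest. The rows shown earlier start in 1920, so World War I itself probably cannot be checked this way with this file.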