# Compare the top songs of 2017, 2018 and 2019

### Description of song features
- **duration_ms** -> The duration of the track in milliseconds.

- **tempo** -> The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

- **energy** -> Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

- **danceability** -> Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

- **loudness** -> The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

- **liveness** -> Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

- **acousticness** -> A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

- **speechiness** -> Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

- **key** -> The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

- **valence** -> A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

- **mode** -> Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor by 0.

- **time_signature** -> An estimated overall time signature of a track. The time signature (meter) is a notational convention that specifies how many beats are in each bar (or measure).
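The integer encodings of `key` and `mode`, and the speechiness thresholds, can be decoded with a few small helpers. This is a minimal, illustrative sketch: `key_name`, `mode_name` and `speechiness_label` are hypothetical names, and putting the boundary values 0.33 and 0.66 into the middle bucket is an assumption, since the descriptions above do not say which side they fall on.

```python
# Hypothetical helpers that decode the integer encodings described above.
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def key_name(key) -> str:
    """Map a key integer (Pitch Class notation, 0 = C ... 11 = B) to a pitch name."""
    key = int(key)  # the CSVs store key as a float, e.g. 1.0
    if key == -1:
        return "no key detected"
    return PITCH_CLASSES[key]


def mode_name(mode) -> str:
    """Mode is 1 for major and 0 for minor."""
    return "major" if int(mode) == 1 else "minor"


def speechiness_label(value: float) -> str:
    """Bucket speechiness with the 0.33 / 0.66 thresholds (boundary placement is assumed)."""
    if value > 0.66:
        return "probably entirely spoken words"
    if value >= 0.33:
        return "music and speech"
    return "music / non-speech"


print(key_name(1.0), mode_name(0.0), speechiness_label(0.46))
```

Applied row-wise (for example `top2017["key"].map(key_name)`), these turn the numeric codes into readable labels for plots.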

```python
# data processing
import numpy as np
import pandas as pd
# visualisation
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
```

## 1. Loading data

```python
top2017 = pd.read_csv("../Data/df_2017_new.csv")
top2017.head()
```

|  | id | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | dancebility_new | tempo_rate | popularity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 7qiZfU4dY1lWllzX7mPBI | Shape of You | Ed Sheeran | 0.825 | 0.652 | 1.0 | -3.183 | 0.0 | 0.0802 | 0.581 | 0.0 | 0.0931 | 0.931 | 95.977 | 233713.0 | 4.0 | 1 | moderate | 1 |
| 1 | 5CtI0qwDJkDQGwXD1H1cL | Despacito - Remix | Luis Fonsi | 0.694 | 0.815 | 2.0 | -4.328 | 1.0 | 0.12 | 0.229 | 0.0 | 0.0924 | 0.813 | 88.931 | 228827.0 | 4.0 | 1 | moderate | 1 |
| 2 | 4aWmUDTfIPGksMNLV2rQP | Despacito (Featuring Daddy Yankee) | Luis Fonsi | 0.66 | 0.786 | 2.0 | -4.757 | 1.0 | 0.17 | 0.209 | 0.0 | 0.112 | 0.846 | 177.833 | 228200.0 | 4.0 | 1 | very fast | 1 |
| 3 | 6RUKPb4LETWmmr3iAEQkt | Something Just Like This | The Chainsmokers | 0.617 | 0.635 | 11.0 | -6.769 | 0.0 | 0.0317 | 0.0498 | 1.4e-05 | 0.164 | 0.446 | 103.019 | 247160.0 | 4.0 | 1 | moderate | 0 |
| 4 | 3DXncPQOG4VBw3QHh3S81 | I'm the One | DJ Khaled | 0.609 | 0.668 | 7.0 | -4.284 | 1.0 | 0.0367 | 0.0552 | 0.0 | 0.167 | 0.811 | 80.924 | 288600.0 | 4.0 | 1 | moderate | 1 |


```python
top2018 = pd.read_csv("../Data/top2018.csv")
top2018.head()
```

|  | id | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6DCZcSspjsKoFjzjrWoCd | God's Plan | Drake | 0.754 | 0.449 | 7.0 | -9.211 | 1.0 | 0.109 | 0.0332 | 8.3e-05 | 0.552 | 0.357 | 77.169 | 198973.0 | 4.0 |
| 1 | 3ee8Jmje8o58CHK66QrVC | SAD! | XXXTENTACION | 0.74 | 0.613 | 8.0 | -4.88 | 1.0 | 0.145 | 0.258 | 0.00372 | 0.123 | 0.473 | 75.023 | 166606.0 | 4.0 |
| 2 | 0e7ipj03S05BNilyu5bRz | rockstar (feat. 21 Savage) | Post Malone | 0.587 | 0.535 | 5.0 | -6.09 | 0.0 | 0.0898 | 0.117 | 6.6e-05 | 0.131 | 0.14 | 159.847 | 218147.0 | 4.0 |
| 3 | 3swc6WTsr7rl9DqQKQA55 | Psycho (feat. Ty Dolla $ign) | Post Malone | 0.739 | 0.559 | 8.0 | -8.011 | 1.0 | 0.117 | 0.58 | 0.0 | 0.112 | 0.439 | 140.124 | 221440.0 | 4.0 |
| 4 | 2G7V7zsVDxg1yRsu7Ew9R | In My Feelings | Drake | 0.835 | 0.626 | 1.0 | -5.833 | 1.0 | 0.125 | 0.0589 | 6e-05 | 0.396 | 0.35 | 91.03 | 217925.0 | 4.0 |


```python
top2019 = pd.read_csv("../Data/datasets_top50-2019.csv", encoding="ISO-8859-1")
top2019.head()
```

|  | Unnamed: 0 | Track.Name | Artist.Name | Genre | Beats.Per.Minute | Energy | Danceability | Loudness..dB.. | Liveness | Valence. | Length. | Acousticness.. | Speechiness. | Popularity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Señorita | Shawn Mendes | canadian pop | 117 | 55 | 76 | -6 | 8 | 75 | 191 | 4 | 3 | 79 |
| 1 | 2 | China | Anuel AA | reggaeton flow | 105 | 81 | 79 | -4 | 8 | 61 | 302 | 8 | 9 | 92 |
| 2 | 3 | boyfriend (with Social House) | Ariana Grande | dance pop | 190 | 80 | 40 | -4 | 16 | 70 | 186 | 12 | 46 | 85 |
| 3 | 4 | Beautiful People (feat. Khalid) | Ed Sheeran | pop | 93 | 65 | 64 | -8 | 8 | 55 | 198 | 12 | 19 | 86 |
| 4 | 5 | Goodbyes (Feat. Young Thug) | Post Malone | dfw rap | 150 | 65 | 58 | -4 | 11 | 18 | 175 | 45 | 7 | 94 |
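The 2019 file is read with `encoding="ISO-8859-1"`, presumably because it is Latin-1 encoded rather than UTF-8 (an assumption based on accented titles such as "Señorita"). The sketch below reproduces the effect on a one-row, in-memory stand-in for the file (`raw` is illustrative, not the real CSV): decoding Latin-1 bytes with pandas' default UTF-8 codec fails, while passing the matching encoding returns the title intact.

```python
import io

import pandas as pd

# One row shaped like the 2019 file, encoded as ISO-8859-1 (Latin-1) bytes.
# This is a stand-in for the real CSV, which is not loaded here.
raw = "Track.Name,Artist.Name\nSeñorita,Shawn Mendes\n".encode("ISO-8859-1")

# pandas decodes as UTF-8 by default; the Latin-1 byte for "ñ" (0xF1) is invalid UTF-8 here.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Passing the matching encoding restores the accented title.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(utf8_ok)                  # False
print(df.loc[0, "Track.Name"])  # Señorita
```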


```python
# drop the "Unnamed: 0" column, and "Popularity", since that feature was created by Spotify based on daily analysis
top2019 = top2019.drop(["Unnamed: 0", "Popularity"], axis=1)
top2019.head()
```

|  | Track.Name | Artist.Name | Genre | Beats.Per.Minute | Energy | Danceability | Loudness..dB.. | Liveness | Valence. | Length. | Acousticness.. | Speechiness. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Señorita | Shawn Mendes | canadian pop | 117 | 55 | 76 | -6 | 8 | 75 | 191 | 4 | 3 |
| 1 | China | Anuel AA | reggaeton flow | 105 | 81 | 79 | -4 | 8 | 61 | 302 | 8 | 9 |
| 2 | boyfriend (with Social House) | Ariana Grande | dance pop | 190 | 80 | 40 | -4 | 16 | 70 | 186 | 12 | 46 |
| 3 | Beautiful People (feat. Khalid) | Ed Sheeran | pop | 93 | 65 | 64 | -8 | 8 | 55 | 198 | 12 | 19 |
| 4 | Goodbyes (Feat. Young Thug) | Post Malone | dfw rap | 150 | 65 | 58 | -4 | 11 | 18 | 175 | 45 | 7 |


```python
# rename the top2019 columns to match the names used in the 2017 and 2018 datasets
top2019 = top2019.rename(columns={"Beats.Per.Minute": "tempo",
                                  "Track.Name": "name",
                                  "Artist.Name": "artists",
                                  "Danceability": "danceability",
                                  "Energy": "energy",
                                  "Loudness..dB..": "loudness",
                                  "Liveness": "liveness",
                                  "Valence.": "valence",
                                  "Length.": "duration_ms",  # still in seconds at this point
                                  "Acousticness..": "acousticness",
                                  "Speechiness.": "speechiness"})
top2019.head()
```

|  | name | artists | Genre | tempo | energy | danceability | loudness | liveness | valence | duration_ms | acousticness | speechiness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Señorita | Shawn Mendes | canadian pop | 117 | 55 | 76 | -6 | 8 | 75 | 191 | 4 | 3 |
| 1 | China | Anuel AA | reggaeton flow | 105 | 81 | 79 | -4 | 8 | 61 | 302 | 8 | 9 |
| 2 | boyfriend (with Social House) | Ariana Grande | dance pop | 190 | 80 | 40 | -4 | 16 | 70 | 186 | 12 | 46 |
| 3 | Beautiful People (feat. Khalid) | Ed Sheeran | pop | 93 | 65 | 64 | -8 | 8 | 55 | 198 | 12 | 19 |
| 4 | Goodbyes (Feat. Young Thug) | Post Malone | dfw rap | 150 | 65 | 58 | -4 | 11 | 18 | 175 | 45 | 7 |
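After the rename it is worth confirming which features the yearly datasets actually share. A minimal sketch, using a hypothetical `compare_columns` helper on empty stand-in frames that carry only the column names printed above for 2018 and 2019; in the notebook you would pass `top2018` and `top2019` directly.

```python
import pandas as pd


def compare_columns(a: pd.DataFrame, b: pd.DataFrame) -> dict:
    """Split column names into those shared by both frames and those unique to each (sorted)."""
    cols_a, cols_b = set(a.columns), set(b.columns)
    return {
        "shared": sorted(cols_a & cols_b),
        "only_first": sorted(cols_a - cols_b),
        "only_second": sorted(cols_b - cols_a),
    }


# Empty stand-ins holding only the column names shown in the head() outputs above.
top2018_cols = pd.DataFrame(columns=["id", "name", "artists", "danceability", "energy", "key",
                                     "loudness", "mode", "speechiness", "acousticness",
                                     "instrumentalness", "liveness", "valence", "tempo",
                                     "duration_ms", "time_signature"])
top2019_cols = pd.DataFrame(columns=["name", "artists", "Genre", "tempo", "energy",
                                     "danceability", "loudness", "liveness", "valence",
                                     "duration_ms", "acousticness", "speechiness"])
result = compare_columns(top2018_cols, top2019_cols)
print(result["only_first"])   # ['id', 'instrumentalness', 'key', 'mode', 'time_signature']
print(result["only_second"])  # ['Genre']
```

With these names, the 2019 data lacks `id`, `key`, `mode`, `instrumentalness` and `time_signature` and adds `Genre`, so only the shared features can be compared across all three years.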


```python
# set all numeric columns to float
num_cols = ["tempo", "danceability", "energy", "loudness", "liveness", "valence",
            "duration_ms", "acousticness", "speechiness"]
top2019[num_cols] = top2019[num_cols].astype(float)
# convert the track length from seconds to milliseconds so duration_ms matches 2017 and 2018
top2019["duration_ms"] = top2019["duration_ms"] * 1000
top2019.head()
```

|  | name | artists | Genre | tempo | energy | danceability | loudness | liveness | valence | duration_ms | acousticness | speechiness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Señorita | Shawn Mendes | canadian pop | 117.0 | 55.0 | 76.0 | -6.0 | 8.0 | 75.0 | 191000.0 | 4.0 | 3.0 |
| 1 | China | Anuel AA | reggaeton flow | 105.0 | 81.0 | 79.0 | -4.0 | 8.0 | 61.0 | 302000.0 | 8.0 | 9.0 |
| 2 | boyfriend (with Social House) | Ariana Grande | dance pop | 190.0 | 80.0 | 40.0 | -4.0 | 16.0 | 70.0 | 186000.0 | 12.0 | 46.0 |
| 3 | Beautiful People (feat. Khalid) | Ed Sheeran | pop | 93.0 | 65.0 | 64.0 | -8.0 | 8.0 | 55.0 | 198000.0 | 12.0 | 19.0 |
| 4 | Goodbyes (Feat. Young Thug) | Post Malone | dfw rap | 150.0 | 65.0 | 58.0 | -4.0 | 11.0 | 18.0 | 175000.0 | 45.0 | 7.0 |


```python
# rescale the percentage features (0-100) to the 0.0-1.0 range used in the 2017 and 2018 data
pct_cols = ["energy", "speechiness", "danceability", "acousticness", "liveness", "valence"]
top2019[pct_cols] = top2019[pct_cols] / 100
top2019.head()
```

|  | name | artists | Genre | tempo | energy | danceability | loudness | liveness | valence | duration_ms | acousticness | speechiness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Señorita | Shawn Mendes | canadian pop | 117.0 | 0.55 | 0.76 | -6.0 | 0.08 | 0.75 | 191000.0 | 0.04 | 0.03 |
| 1 | China | Anuel AA | reggaeton flow | 105.0 | 0.81 | 0.79 | -4.0 | 0.08 | 0.61 | 302000.0 | 0.08 | 0.09 |
| 2 | boyfriend (with Social House) | Ariana Grande | dance pop | 190.0 | 0.8 | 0.4 | -4.0 | 0.16 | 0.7 | 186000.0 | 0.12 | 0.46 |
| 3 | Beautiful People (feat. Khalid) | Ed Sheeran | pop | 93.0 | 0.65 | 0.64 | -8.0 | 0.08 | 0.55 | 198000.0 | 0.12 | 0.19 |
| 4 | Goodbyes (Feat. Young Thug) | Post Malone | dfw rap | 150.0 | 0.65 | 0.58 | -4.0 | 0.11 | 0.18 | 175000.0 | 0.45 | 0.07 |
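The rename, float conversion, seconds-to-milliseconds and /100 steps can also be bundled into one function with a range check, which makes the 2019 cleaning repeatable. This is a sketch under the assumption that the raw columns are exactly those shown above; `normalise_2019` and the constant names are hypothetical, and the sample rows are copied from the head() output after "Unnamed: 0" and "Popularity" were dropped.

```python
import pandas as pd

# Mapping taken from the rename cell above (raw 2019 name -> 2017/2018 name).
RENAME_2019 = {
    "Beats.Per.Minute": "tempo", "Track.Name": "name", "Artist.Name": "artists",
    "Danceability": "danceability", "Energy": "energy", "Loudness..dB..": "loudness",
    "Liveness": "liveness", "Valence.": "valence", "Length.": "duration_ms",
    "Acousticness..": "acousticness", "Speechiness.": "speechiness",
}
NUMERIC_COLS = ["tempo", "danceability", "energy", "loudness", "liveness", "valence",
                "duration_ms", "acousticness", "speechiness"]
PERCENT_COLS = ["energy", "speechiness", "danceability", "acousticness", "liveness", "valence"]


def normalise_2019(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the rename, float, seconds->ms and /100 steps in one call."""
    out = df.rename(columns=RENAME_2019)
    out[NUMERIC_COLS] = out[NUMERIC_COLS].astype(float)
    out["duration_ms"] = out["duration_ms"] * 1000  # seconds -> milliseconds
    out[PERCENT_COLS] = out[PERCENT_COLS] / 100     # 0-100 scale -> 0.0-1.0 scale
    # Sanity check: after rescaling, every percentage feature should lie in [0, 1].
    in_range = ((out[PERCENT_COLS] >= 0) & (out[PERCENT_COLS] <= 1)).all().all()
    if not in_range:
        raise ValueError("a percentage feature was outside 0-100 before rescaling")
    return out


# Two raw rows copied from the 2019 head() output above.
sample = pd.DataFrame(
    [["Señorita", "Shawn Mendes", "canadian pop", 117, 55, 76, -6, 8, 75, 191, 4, 3],
     ["China", "Anuel AA", "reggaeton flow", 105, 81, 79, -4, 8, 61, 302, 8, 9]],
    columns=["Track.Name", "Artist.Name", "Genre", "Beats.Per.Minute", "Energy",
             "Danceability", "Loudness..dB..", "Liveness", "Valence.", "Length.",
             "Acousticness..", "Speechiness."],
)
clean = normalise_2019(sample)
print(clean[["name", "energy", "duration_ms"]])
```

Raising on out-of-range values catches the case where the cell is accidentally run twice on a frame that was already rescaled, or where the raw scale differs from what is assumed here.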


## 2. A simple overview of the datasets

```python
top2017.describe()
```

|  | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | dancebility_new | popularity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.69682 | 0.66069 | 5.57 | -5.65265 | 0.58 | 0.103969 | 0.166306 | 0.004796 | 0.150607 | 0.517049 | 119.20246 | 218387.28 | 3.99 | 0.83 | 0.85 |
| std | 0.12508 | 0.139207 | 3.731534 | 1.802067 | 0.496045 | 0.095115 | 0.16673 | 0.026038 | 0.079011 | 0.216436 | 27.952928 | 32851.07772 | 0.1 | 0.377525 | 0.35887 |
| min | 0.258 | 0.346 | 0.0 | -11.462 | 0.0 | 0.0232 | 0.000259 | 0.0 | 0.0424 | 0.0862 | 75.016 | 165387.0 | 3.0 | 0.0 | 0.0 |
| 25% | 0.635 | 0.5565 | 2.0 | -6.5945 | 0.0 | 0.043125 | 0.0391 | 0.0 | 0.098275 | 0.3755 | 99.91175 | 198490.5 | 4.0 | 1.0 | 1.0 |
| 50% | 0.714 | 0.6675 | 6.0 | -5.437 | 1.0 | 0.06265 | 0.1065 | 0.0 | 0.125 | 0.5025 | 112.468 | 214106.0 | 4.0 | 1.0 | 1.0 |
| 75% | 0.77025 | 0.7875 | 9.0 | -4.32675 | 1.0 | 0.123 | 0.23125 | 1.3e-05 | 0.17925 | 0.679 | 137.166 | 230543.0 | 4.0 | 1.0 | 1.0 |
| max | 0.927 | 0.932 | 11.0 | -2.396 | 1.0 | 0.431 | 0.695 | 0.21 | 0.44 | 0.966 | 199.864 | 343150.0 | 4.0 | 1.0 | 1.0 |


```python
top2017.isna().sum()
```

```
id                  0
name                0
artists             0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
duration_ms         0
time_signature      0
dancebility_new     0
tempo_rate          0
popularity          0
dtype: int64
```

```python
top2018.describe()
```

|  | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.71646 | 0.65906 | 5.33 | -5.67764 | 0.59 | 0.115569 | 0.195701 | 0.001584 | 0.158302 | 0.484443 | 119.90418 | 205206.78 | 3.98 |
| std | 0.13107 | 0.145067 | 3.676447 | 1.777577 | 0.494311 | 0.104527 | 0.220946 | 0.013449 | 0.111662 | 0.206145 | 28.795984 | 40007.893404 | 0.2 |
| min | 0.258 | 0.296 | 0.0 | -10.109 | 0.0 | 0.0232 | 0.000282 | 0.0 | 0.0215 | 0.0796 | 64.934 | 95467.0 | 3.0 |
| 25% | 0.6355 | 0.562 | 1.75 | -6.6505 | 0.0 | 0.04535 | 0.040225 | 0.0 | 0.094675 | 0.341 | 95.73075 | 184680.0 | 4.0 |
| 50% | 0.733 | 0.678 | 5.0 | -5.5665 | 1.0 | 0.07495 | 0.109 | 0.0 | 0.1185 | 0.4705 | 120.116 | 205047.5 | 4.0 |
| 75% | 0.79825 | 0.77225 | 8.25 | -4.36375 | 1.0 | 0.137 | 0.24775 | 3.1e-05 | 0.17075 | 0.6415 | 140.02275 | 221493.25 | 4.0 |
| max | 0.964 | 0.909 | 11.0 | -2.384 | 1.0 | 0.53 | 0.934 | 0.134 | 0.636 | 0.931 | 198.075 | 417920.0 | 5.0 |
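Because the goal is a year-by-year comparison, the per-feature means from the describe() tables can be condensed into one side-by-side view. A minimal sketch, assuming the 2019 columns have already been renamed and rescaled as above; `compare_means` is a hypothetical helper, and the toy frames hold only the first two rows of each head() output, so their means are not the full-dataset means reported by describe().

```python
import pandas as pd


def compare_means(frames, features):
    """Return one row per feature and one column per dataset, holding the column means."""
    return pd.concat({label: df[features].mean() for label, df in frames.items()}, axis=1)


# Toy frames: two rows per year copied from the head() outputs (2019 already rescaled).
features = ["danceability", "energy", "valence", "tempo"]
frames = {
    "2017": pd.DataFrame({"danceability": [0.825, 0.694], "energy": [0.652, 0.815],
                          "valence": [0.931, 0.813], "tempo": [95.977, 88.931]}),
    "2018": pd.DataFrame({"danceability": [0.754, 0.74], "energy": [0.449, 0.613],
                          "valence": [0.357, 0.473], "tempo": [77.169, 75.023]}),
    "2019": pd.DataFrame({"danceability": [0.76, 0.79], "energy": [0.55, 0.81],
                          "valence": [0.75, 0.61], "tempo": [117.0, 105.0]}),
}
table = compare_means(frames, features)
print(table.round(3))
```

In the notebook the call would be `compare_means({"2017": top2017, "2018": top2018, "2019": top2019}, features)`, restricted to features that all three datasets share.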


```python
# check the 2018 data for missing values
# (describe() above already reports count = 100 for every numeric column)
top2018.isna().sum()
```
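Rather than checking each dataset in a separate cell, the missing-value counts can be collected into a single table. A minimal sketch; `missing_report` is a hypothetical helper, and the toy frames (with one `NaN` injected purely to produce a non-zero count) stand in for `top2017`, `top2018` and `top2019`.

```python
import numpy as np
import pandas as pd


def missing_report(frames):
    """Count missing values per column for each dataset.

    Columns that a dataset does not have at all show up as NaN rather than 0.
    """
    return pd.concat({label: df.isna().sum() for label, df in frames.items()}, axis=1)


# Toy frames; the NaN in top2018 is injected only to produce a non-zero count.
frames = {
    "top2017": pd.DataFrame({"name": ["Shape of You"], "energy": [0.652], "popularity": [1]}),
    "top2018": pd.DataFrame({"name": ["God's Plan"], "energy": [np.nan]}),
}
report = missing_report(frames)
print(report)
```

A NaN cell in the report therefore means "column absent from this dataset" (for example `popularity` in 2018), which keeps it distinct from a genuine count of zero.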