<h1 style="font-family: 'Times New Roman'; color: #ff0000; text-align: left;">Spotify Top Tracks Analysis</h1>


In [1]:
import numpy as np
import pandas as pd

In [2]:
#load the data using pandas
data = pd.read_csv('data/spotifytoptracks.csv', index_col=0)#index_col=0 to get rid of Unnamed:0
data.head()

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


In [3]:
data.index = data.index + 1

In [4]:
#with the easy to read indexing
data.head()

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
1,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
2,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
3,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
4,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
5,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


I wanted to start indexing from 1 instead of 0 because it is much easier for me to work with. I also got rid of the 'Unnamed: 0' column, which was quite annoying at first, by passing index_col=0 when loading the file.

In [5]:
#How many features this dataset has?
print(f'This dataset has {data.shape[1]} features')

This dataset has 16 features


In [6]:
#How many observations are there in this dataset?
print(f'This dataset has {data.shape[0]} observations')

This dataset has 50 observations


In total this gives us a 50x16 table (50 rows by 16 columns), which makes a dataset of 800 values.
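As a quick check of that arithmetic, `DataFrame.size` returns the number of rows times the number of columns directly. The sketch below runs on a small stand-in frame (three columns from the five rows shown by `data.head()`), not on the full file, so the numbers it prints are for the sample only.

```python
import pandas as pd

# Stand-in sample: three columns from the first five rows shown above
sample = pd.DataFrame(
    {
        "track_name": [
            "Blinding Lights",
            "Dance Monkey",
            "The Box",
            "Roses - Imanbek Remix",
            "Don't Start Now",
        ],
        "danceability": [0.514, 0.825, 0.896, 0.785, 0.793],
        "loudness": [-5.934, -6.401, -6.687, -5.457, -4.521],
    },
    index=range(1, 6),
)

# shape is (rows, columns); size is their product
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns = {sample.size} values")
```

On the full dataset the same call would be `data.size`.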

In [37]:
#Identify the categorical columns and set the data type as 'category'
categorical_columns = ['energy', 'danceability', 'loudness', 'acousticness','speechiness','instrumentalness', 'liveness', 'valence', 'duration_ms']
for i in categorical_columns:
    data[i] = data[i].astype('category')

I had first done it one by one but realized it was extremely repetitive, so I converted it into a for loop to avoid the repetition and make the code easier to read.
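A possible alternative to the loop, shown here only as a sketch: `DataFrame.astype` also accepts a dict that maps column names to dtypes, so several columns can be converted in one call. The stand-in frame and the choice of `label_columns` below are assumptions made for illustration, not the columns converted above.

```python
import pandas as pd

# Stand-in sample built from the first three rows shown by data.head()
sample = pd.DataFrame(
    {
        "artist": ["The Weeknd", "Tones And I", "Roddy Ricch"],
        "genre": ["R&B/Soul", "Alternative/Indie", "Hip-Hop/Rap"],
        "tempo": [171.005, 98.078, 116.971],
    }
)

# Hypothetical choice of columns, picked only for this example
label_columns = ["artist", "genre"]

# One astype call with a {column: dtype} mapping replaces the loop
sample = sample.astype({column: "category" for column in label_columns})
print(sample.dtypes)
```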

In [38]:
#Which of the features are categorical?
for column in data.columns:
    if data[column].dtype.name == 'category':
        print(f"The column '{column}' is categorical")
    else:
        print(f"The column '{column}' is not categorical")

The column 'artist' is not categorical
The column 'album' is not categorical
The column 'track_name' is not categorical
The column 'track_id' is not categorical
The column 'energy' is categorical
The column 'danceability' is categorical
The column 'key' is not categorical
The column 'loudness' is categorical
The column 'acousticness' is categorical
The column 'speechiness' is categorical
The column 'instrumentalness' is categorical
The column 'liveness' is categorical
The column 'valence' is categorical
The column 'tempo' is not categorical
The column 'duration_ms' is categorical
The column 'genre' is not categorical


The line between numerical and categorical data is thin in this dataset, but as long as you are familiar with the context of the data you can easily tell them apart.
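One way to tell the two kinds apart programmatically is to look at the dtypes pandas infers on load. The sketch below assumes the scores (such as `energy`) are left as numbers rather than converted to `category`, and it runs on a stand-in frame built from the rows shown by `data.head()`. Under that assumption, text columns such as `artist` and `genre` fall out as the non-numeric ones.

```python
import pandas as pd

# Stand-in sample built from the first three rows shown by data.head()
sample = pd.DataFrame(
    {
        "artist": ["The Weeknd", "Tones And I", "Roddy Ricch"],
        "energy": [0.73, 0.593, 0.586],
        "key": [1, 6, 10],
        "genre": ["R&B/Soul", "Alternative/Indie", "Hip-Hop/Rap"],
    }
)

# "number" selects every integer and float column
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else (here: text labels) is a candidate categorical column
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)
```

Note that `key` is stored as an integer but encodes a musical key, so context still matters when deciding how to treat it.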

In [9]:
#Which of the features are numeric?
numeric_columns = []
for column in data.columns:
    if (pd.api.types.is_numeric_dtype(data[column].values)) or (data[column].dtype.name) == 'category':
        numeric_columns.append(column)
print("Numeric features:")
for column in numeric_columns:
    print(column)
    

Numeric features:
energy
danceability
key
loudness
acousticness
speechiness
instrumentalness
liveness
valence
tempo
duration_ms


In [10]:
#Are there any artists that have more than 1 popular track? If yes, which and how many?
value_counting = data['artist'].value_counts()
print(value_counting[value_counting > 1])

Billie Eilish    3
Dua Lipa         3
Travis Scott     3
Justin Bieber    2
Harry Styles     2
Lewis Capaldi    2
Post Malone      2
Name: artist, dtype: int64


It is also worth mentioning that 4 of the 7 artists with more than one popular track are in the Pop genre.

In [11]:
#Who was the most popular artist? (assumes the rows are ordered by rank, so index 1 is the top track)
print(f'Most popular artist was {data.loc[1].artist}')

Most popular artist was The Weeknd


In [68]:
mask = data.genre == 'R&B/Soul'
values = data.loc[mask, :]
values

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
1,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
18,Doja Cat,Hot Pink,Say So,3Dv1eDb0MEgF93GpLXlucZ,0.673,0.787,11,-4.577,0.256,0.158,4e-06,0.0904,0.786,110.962,237893,R&B/Soul


It is interesting that although these two tracks are the only examples of the R&B/Soul genre, they are technically very different according to the features defined in this dataset.

In [12]:
#How many artists in total have their songs in the top 50?
unique_artists = data['artist'].unique()
print(f'{len(unique_artists)} different artists have their song in the top 50')

40 different artists have their song in the top 50


In [13]:
#Are there any albums that have more than 1 popular track? If yes, which and how many?
album_number = data['album'].value_counts()
print(album_number[album_number > 1])

Future Nostalgia        3
Hollywood's Bleeding    2
Fine Line               2
Changes                 2
Name: album, dtype: int64


In [14]:
#How many albums in total have their songs in the top 50?
unique_album= data['album'].unique()
print(f'{len(unique_album)} different albums in total have their song in the top 50')

45 different albums in total have their song in the top 50


There are more unique albums (45) than unique artists (40), which shows that some artists made it to the top 50 with tracks from more than one album.
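To see which artists account for that gap, one option is to count the distinct albums per artist. This is only a sketch: the rows below are placeholders (`Artist A`, `Album 1`, and so on), not values from the dataset.

```python
import pandas as pd

# Placeholder rows, not taken from the dataset
sample = pd.DataFrame(
    {
        "artist": ["Artist A", "Artist A", "Artist B", "Artist B", "Artist C"],
        "album": ["Album 1", "Album 2", "Album 3", "Album 3", "Album 4"],
    }
)

# nunique counts distinct albums, so two tracks from one album count once
albums_per_artist = sample.groupby("artist")["album"].nunique()
print(albums_per_artist[albums_per_artist > 1])
```

On the real data the same two lines would run against `data` instead of `sample`.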

In [76]:
#Which tracks have a danceability score above 0.7?

data['danceability']= data['danceability'].astype('float')
mask = data.danceability > 0.7
values = data.loc[mask, 'track_name']
print('Track number-----------------------------------Name')
print(values)

Track number-----------------------------------Name
2                                      Dance Monkey
3                                           The Box
4                             Roses - Imanbek Remix
5                                   Don't Start Now
6                      ROCKSTAR (feat. Roddy Ricch)
8                  death bed (coffee for your head)
9                                           Falling
11                                             Tusa
14                                  Blueberry Faygo
15                         Intentions (feat. Quavo)
16                                     Toosie Slide
18                                           Say So
19                                         Memories
20                       Life Is Good (feat. Drake)
21                 Savage Love (Laxed - Siren Beat)
23                                      Breaking Me
25                              everything i wanted
26                                         Señorita
27          

In [78]:
#Which tracks have a danceability score below 0.4?
mask = data.danceability < 0.4
values = data.loc[mask, 'track_name']
print('Track number--------Name')
print(values)

Track number--------Name
45    lovely (with Khalid)
Name: track_name, dtype: object


In [83]:
#Which tracks have their loudness above -5?
data['loudness']= data['loudness'].astype('float')
mask = data.loudness > -5
values = data.loc[mask, 'track_name']
print('Track number-----------------------------------Name')
print(values)

Track number-----------------------------------Name
5                                   Don't Start Now
7                                  Watermelon Sugar
11                                             Tusa
13                                          Circles
17                                    Before You Go
18                                           Say So
22                                        Adore You
24                           Mood (feat. iann dior)
32                                   Break My Heart
33                                         Dynamite
34                 Supalonely (feat. Gus Dapperton)
36                  Rain On Me (with Ariana Grande)
38    Sunflower - Spider-Man: Into the Spider-Verse
39                                            Hawái
40                                          Ride It
41                                       goosebumps
44                                          Safaera
49                                         Physical
50          

In [84]:
#Which tracks have their loudness below -8?
mask = data.loudness < -8
values = data.loc[mask, 'track_name']
print('Track number-----------------------------------Name')
print(values)

Track number-----------------------------------Name
8                   death bed (coffee for your head)
9                                            Falling
16                                      Toosie Slide
21                  Savage Love (Laxed - Siren Beat)
25                               everything i wanted
27                                           bad guy
37                               HIGHEST IN THE ROOM
45                              lovely (with Khalid)
48    If the World Was Ending - feat. Julia Michaels
Name: track_name, dtype: object


For the last 4 questions I initially used for loops but then realized it could be done more easily and efficiently with boolean masks.
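For comparison, here is a sketch of both approaches side by side on a stand-in frame built from the five rows shown by `data.head()`. The loop version is a reconstruction of the kind of loop described, not the original code.

```python
import pandas as pd

# Stand-in sample built from the first five rows shown by data.head()
sample = pd.DataFrame(
    {
        "track_name": [
            "Blinding Lights",
            "Dance Monkey",
            "The Box",
            "Roses - Imanbek Remix",
            "Don't Start Now",
        ],
        "danceability": [0.514, 0.825, 0.896, 0.785, 0.793],
    },
    index=range(1, 6),
)

# Loop version: check every row by hand
loop_result = []
for index, score in sample["danceability"].items():
    if score > 0.7:
        loop_result.append(sample.loc[index, "track_name"])

# Mask version: one vectorized comparison, then select with .loc
mask = sample["danceability"] > 0.7
mask_result = sample.loc[mask, "track_name"].tolist()

print(loop_result == mask_result)
print(mask_result)
```

Both give the same tracks; the mask version is shorter and lets pandas do the comparison for the whole column at once.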

In [99]:
#Which track is the longest?
data['duration_ms']= data['duration_ms'].astype('int')
max_index = data.duration_ms.idxmax()
corresponding_value = data.loc[max_index, 'track_name']
print(f'the longest track is {corresponding_value}')

the longest track is SICKO MODE


In [100]:
#Which track is the shortest?
min_index = data.duration_ms.idxmin()
corresponding_value = data.loc[min_index, 'track_name']
print(f'the shortest track is {corresponding_value}')

the shortest track is Mood (feat. iann dior)


In [21]:
#Which genre is the most popular?
genre_number = data['genre'].value_counts()
print(genre_number[genre_number > 1].head(1))

Pop    14
Name: genre, dtype: int64


I could also have answered this question with the idxmax function that I used above for the longest track question.
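A minimal sketch of that idxmax variant: `value_counts()` returns a Series indexed by genre, so `idxmax()` returns the label with the highest count. The genre labels below come from the dataset, but the short sequence itself is made up for illustration.

```python
import pandas as pd

# Made-up sequence of genre labels, used only to illustrate the call
genres = pd.Series(
    ["Pop", "Hip-Hop/Rap", "Pop", "Dance/Electronic", "Pop", "Hip-Hop/Rap"],
    name="genre",
)

counts = genres.value_counts()
top_genre = counts.idxmax()  # index label of the largest count
print(f"{top_genre}: {counts.max()} tracks")
```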

In [22]:
#Which genres have just one song on the top 50?
genre_number = data['genre'].value_counts()
print(genre_number[genre_number == 1])

Nu-disco                              1
R&B/Hip-Hop alternative               1
Pop/Soft Rock                         1
Pop rap                               1
Hip-Hop/Trap                          1
Dance-pop/Disco                       1
Disco-pop                             1
Dreampop/Hip-Hop/R&B                  1
Alternative/reggaeton/experimental    1
Chamber pop                           1
Name: genre, dtype: int64


In [23]:
#How many genres in total are represented in the top 50?
print(f'{len(data.genre.unique())} genres in total are represented in the top 50')

16 genres in total are represented in the top 50


This highlights how dominant the Pop genre and its subgenres are in the music industry.

In [102]:
float_columns = ['energy', 'acousticness','speechiness','instrumentalness', 'liveness', 'valence', 'tempo']
for i in float_columns:
    data[i] = data[i].astype('float')

In [103]:
#Which features are strongly positively correlated?
correlation_matrix = data.corr(method='pearson',min_periods=50, numeric_only=True)
for i in range(len(correlation_matrix.columns)):
    for j in range(i + 1):
        if correlation_matrix.iloc[i, j] > 0.5:
            column1 = correlation_matrix.columns[i]
            column2 = correlation_matrix.columns[j]
            if correlation_matrix.columns[i] != correlation_matrix.columns[j]:
                print(f"{column1} and {column2} are strongly positively correlated")

loudness and energy are strongly positively correlated


In [26]:
#Which features are strongly negatively correlated?
correlation_matrix = data.corr(method='pearson',min_periods=50, numeric_only=True)
for i in range(len(correlation_matrix.columns)):
    for j in range(i + 1):
        if correlation_matrix.iloc[i, j] < -0.5:
            column1 = correlation_matrix.columns[i]
            column2 = correlation_matrix.columns[j]
            if correlation_matrix.columns[i] != correlation_matrix.columns[j]:
                print(f"{column1} and {column2} are strongly negatively correlated")

acousticness and energy are strongly negatively correlated
instrumentalness and loudness are strongly negatively correlated


According to the resources I found, a strong correlation usually means a coefficient with an absolute value above 0.8. Since this dataset did not have any pairs that reach that level, I lowered the threshold as far as I could without changing the result, ending at 0.5 in absolute value, to show the level of confidence I have in these pairs.
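The three correlation cells differ only in their thresholds, so the pair search could be wrapped in one helper. The sketch below is an assumption about how that might look: `correlated_pairs` is a hypothetical name, and the `toy` frame holds made-up numbers chosen so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd


def correlated_pairs(frame, low, high):
    """Return (col_a, col_b, r) for each column pair with low < r < high."""
    corr = frame.corr(method="pearson", numeric_only=True)
    # Keep only entries strictly above the diagonal so each pair appears once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # Masked-out (NaN) entries, if stack keeps them, fail both comparisons
    selected = pairs[(pairs > low) & (pairs < high)]
    return [(a, b, r) for (a, b), r in selected.items()]


# Made-up numbers: b rises with a, c falls as a rises
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 5, 4, 5],
        "c": [5, 4, 3, 2, 1],
    }
)

positive = correlated_pairs(toy, 0.5, np.inf)
negative = correlated_pairs(toy, -np.inf, -0.5)
print([(a, b) for a, b, _ in positive])
print([(a, b) for a, b, _ in negative])
```

With this helper the "not correlated" question becomes `correlated_pairs(data, -0.1, 0.1)`, and changing a threshold means changing one argument.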

In [105]:
#Which features are not correlated?
correlation_matrix = data.corr(method='pearson',min_periods=50, numeric_only=True)
for i in range(len(correlation_matrix.columns)):
    for j in range(i + 1):
        if correlation_matrix.iloc[i, j] > -0.1 and correlation_matrix.iloc[i, j] < 0.1:
            column1 = correlation_matrix.columns[i]
            column2 = correlation_matrix.columns[j]
            if correlation_matrix.columns[i] != correlation_matrix.columns[j]:
                print(f"{column1} and {column2} are not correlated")

key and energy are not correlated
loudness and key are not correlated
speechiness and energy are not correlated
speechiness and key are not correlated
speechiness and loudness are not correlated
instrumentalness and danceability are not correlated
instrumentalness and key are not correlated
instrumentalness and speechiness are not correlated
liveness and energy are not correlated
liveness and danceability are not correlated
liveness and loudness are not correlated
liveness and instrumentalness are not correlated
valence and speechiness are not correlated
valence and liveness are not correlated
tempo and energy are not correlated
tempo and key are not correlated
tempo and instrumentalness are not correlated
tempo and liveness are not correlated
tempo and valence are not correlated
duration_ms and energy are not correlated
duration_ms and danceability are not correlated
duration_ms and key are not correlated
duration_ms and loudness are not correlated
duration_ms and acousticness are not

In [28]:
#How does the danceability score compare between Pop, Hip-Hop/Rap, Dance/Electronic, and Alternative/Indie genres?
pop_indices = []
pop_danceability = []
for index, value in data.genre.items():
    if value == 'Pop':
        pop_indices.append(index)
        
for index, value in data.danceability.items():
    for j in pop_indices:
        if j == index:
            pop_danceability.append(value)
            
hiphoprap_indices = []
hiphoprap_danceability = []
for index, value in data.genre.items():
    if value == 'Hip-Hop/Rap':
        hiphoprap_indices.append(index)
        
for index, value in data.danceability.items():
    for j in hiphoprap_indices:
        if j == index:
            hiphoprap_danceability.append(value)

danceElectronic_indices = []
danceElectronic_danceability = []
for index, value in data.genre.items():
    if value == 'Dance/Electronic':
        danceElectronic_indices.append(index)
        
for index, value in data.danceability.items():
    for j in danceElectronic_indices:
        if j == index:
            danceElectronic_danceability.append(value)

alternativeindie_indices = []
alternativeindie_danceability = []
for index, value in data.genre.items():
    if value == 'Alternative/Indie':
        alternativeindie_indices.append(index)
        
for index, value in data.danceability.items():
    for j in alternativeindie_indices:
        if j == index:
            alternativeindie_danceability.append(value)
            
print(f'the average danceability for pop genre is {np.mean(pop_danceability)}')
print(f'the average danceability for Hip-Hop/Rap genre is {np.mean(hiphoprap_danceability)}')
print(f'the average danceability for Dance/Electronic is {np.mean(danceElectronic_danceability)}')
print(f'the average danceability for Alternative/Indie is {np.mean(alternativeindie_danceability)}')

the average danceability for pop genre is 0.6775714285714286
the average danceability for Hip-Hop/Rap genre is 0.7655384615384614
the average danceability for Dance/Electronic is 0.7550000000000001
the average danceability for Alternative/Indie is 0.6617500000000001


In [29]:
#by using groupby
specific_names = ['Pop','Hip-Hop/Rap','Dance/Electronic', 'Alternative/Indie']
grouped = data[data['genre'].isin(specific_names)].groupby('genre')
print('This is the average values')
print(grouped.danceability.mean())
print()
print('This is median values')
print(grouped.danceability.median())
print()
print('This is the minimum values')
print(grouped.danceability.min())
print()
print('This is the maximum values')
print(grouped.danceability.max())
print()
print('This is the standard deviation')
print(grouped.danceability.std())

This is the average values
genre
Alternative/Indie    0.661750
Dance/Electronic     0.755000
Hip-Hop/Rap          0.765538
Pop                  0.677571
Name: danceability, dtype: float64

This is median values
genre
Alternative/Indie    0.663
Dance/Electronic     0.785
Hip-Hop/Rap          0.774
Pop                  0.690
Name: danceability, dtype: float64

This is the minimum values
genre
Alternative/Indie    0.459
Dance/Electronic     0.647
Hip-Hop/Rap          0.598
Pop                  0.464
Name: danceability, dtype: float64

This is the maximum values
genre
Alternative/Indie    0.862
Dance/Electronic     0.880
Hip-Hop/Rap          0.896
Pop                  0.806
Name: danceability, dtype: float64

This is the standard deviation
genre
Alternative/Indie    0.211107
Dance/Electronic     0.094744
Hip-Hop/Rap          0.085470
Pop                  0.109853
Name: danceability, dtype: float64


In [30]:
#How does the loudness score compare between Pop, Hip-Hop/Rap, Dance/Electronic, and Alternative/Indie genres?
pop_indices = []
pop_loudness = []
for index, value in data.genre.items():
    if value == 'Pop':
        pop_indices.append(index)
        
for index, value in data.loudness.items():
    for j in pop_indices:
        if j == index:
            pop_loudness.append(value)
            
hiphoprap_indices = []
hiphoprap_loudness = []
for index, value in data.genre.items():
    if value == 'Hip-Hop/Rap':
        hiphoprap_indices.append(index)
        
for index, value in data.loudness.items():
    for j in hiphoprap_indices:
        if j == index:
            hiphoprap_loudness.append(value)

danceElectronic_indices = []
danceElectronic_loudness = []
for index, value in data.genre.items():
    if value == 'Dance/Electronic':
        danceElectronic_indices.append(index)
        
for index, value in data.loudness.items():
    for j in danceElectronic_indices:
        if j == index:
            danceElectronic_loudness.append(value)

alternativeindie_indices = []
alternativeindie_loudness = []
for index, value in data.genre.items():
    if value == 'Alternative/Indie':
        alternativeindie_indices.append(index)
        
for index, value in data.loudness.items():
    for j in alternativeindie_indices:
        if j == index:
            alternativeindie_loudness.append(value)
            
print(f'the average loudness for pop genre is {np.mean(pop_loudness)}')
print(f'the average loudness for Hip-Hop/Rap genre is {np.mean(hiphoprap_loudness)}')
print(f'the average loudness for Dance/Electronic is {np.mean(danceElectronic_loudness)}')
print(f'the average loudness for Alternative/Indie is {np.mean(alternativeindie_loudness)}')

the average loudness for pop genre is -6.460357142857143
the average loudness for Hip-Hop/Rap genre is -6.917846153846154
the average loudness for Dance/Electronic is -5.338
the average loudness for Alternative/Indie is -5.420999999999999


In [31]:
#by using groupby
specific_names = ['Pop','Hip-Hop/Rap','Dance/Electronic', 'Alternative/Indie']
grouped = data[data['genre'].isin(specific_names)].groupby('genre')
print('This is the average values')
print(grouped.loudness.mean())
print()
print('This is median values')
print(grouped.loudness.median())
print()
print('This is the minimum values')
print(grouped.loudness.min())
print()
print('This is the maximum values')
print(grouped.loudness.max())
print()
print('This is the standard deviation')
print(grouped.loudness.std())

This is the average values
genre
Alternative/Indie   -5.421000
Dance/Electronic    -5.338000
Hip-Hop/Rap         -6.917846
Pop                 -6.460357
Name: loudness, dtype: float64

This is median values
genre
Alternative/Indie   -5.2685
Dance/Electronic    -5.4570
Hip-Hop/Rap         -7.6480
Pop                 -6.6445
Name: loudness, dtype: float64

This is the minimum values
genre
Alternative/Indie    -6.401
Dance/Electronic     -7.567
Hip-Hop/Rap          -8.820
Pop                 -14.454
Name: loudness, dtype: float64

This is the maximum values
genre
Alternative/Indie   -4.746
Dance/Electronic    -3.756
Hip-Hop/Rap         -3.370
Pop                 -3.280
Name: loudness, dtype: float64

This is the standard deviation
genre
Alternative/Indie    0.774502
Dance/Electronic     1.479047
Hip-Hop/Rap          1.891808
Pop                  3.014281
Name: loudness, dtype: float64


In [32]:
#How does the acousticness score compare between Pop, Hip-Hop/Rap, Dance/Electronic, and Alternative/Indie genres?
pop_indices = []
pop_acousticness = []
for index, value in data.genre.items():
    if value == 'Pop':
        pop_indices.append(index)
        
for index, value in data.acousticness.items():
    for j in pop_indices:
        if j == index:
            pop_acousticness.append(value)
            
hiphoprap_indices = []
hiphoprap_acousticness = []
for index, value in data.genre.items():
    if value == 'Hip-Hop/Rap':
        hiphoprap_indices.append(index)
        
for index, value in data.acousticness.items():
    for j in hiphoprap_indices:
        if j == index:
            hiphoprap_acousticness.append(value)

danceElectronic_indices = []
danceElectronic_acousticness = []
for index, value in data.genre.items():
    if value == 'Dance/Electronic':
        danceElectronic_indices.append(index)
        
for index, value in data.acousticness.items():
    for j in danceElectronic_indices:
        if j == index:
            danceElectronic_acousticness.append(value)

alternativeindie_indices = []
alternativeindie_acousticness = []
for index, value in data.genre.items():
    if value == 'Alternative/Indie':
        alternativeindie_indices.append(index)
        
for index, value in data.acousticness.items():
    for j in alternativeindie_indices:
        if j == index:
            alternativeindie_acousticness.append(value)
            
print(f'the average acousticness for pop genre is {np.mean(pop_acousticness)}')
print(f'the average acousticness for Hip-Hop/Rap genre is {np.mean(hiphoprap_acousticness)}')
print(f'the average acousticness for Dance/Electronic is {np.mean(danceElectronic_acousticness)}')
print(f'the average acousticness for Alternative/Indie is {np.mean(alternativeindie_acousticness)}')

the average acousticness for pop genre is 0.3238428571428571
the average acousticness for Hip-Hop/Rap genre is 0.18874076923076927
the average acousticness for Dance/Electronic is 0.09944
the average acousticness for Alternative/Indie is 0.5835


In [33]:
#by using groupby
specific_names = ['Pop','Hip-Hop/Rap','Dance/Electronic', 'Alternative/Indie']
grouped = data[data['genre'].isin(specific_names)].groupby('genre')
print('This is the average values')
print(grouped.acousticness.mean())
print()
print('This is median values')
print(grouped.acousticness.median())
print()
print('This is the minimum values')
print(grouped.acousticness.min())
print()
print('This is the maximum values')
print(grouped.acousticness.max())
print()
print('This is the standard deviation')
print(grouped.acousticness.std())

This is the average values
genre
Alternative/Indie    0.583500
Dance/Electronic     0.099440
Hip-Hop/Rap          0.188741
Pop                  0.323843
Name: acousticness, dtype: float64

This is median values
genre
Alternative/Indie    0.6460
Dance/Electronic     0.0686
Hip-Hop/Rap          0.1450
Pop                  0.2590
Name: acousticness, dtype: float64

This is the minimum values
genre
Alternative/Indie    0.29100
Dance/Electronic     0.01370
Hip-Hop/Rap          0.00513
Pop                  0.02100
Name: acousticness, dtype: float64

This is the maximum values
genre
Alternative/Indie    0.751
Dance/Electronic     0.223
Hip-Hop/Rap          0.731
Pop                  0.902
Name: acousticness, dtype: float64

This is the standard deviation
genre
Alternative/Indie    0.204086
Dance/Electronic     0.095828
Hip-Hop/Rap          0.186396
Pop                  0.318142
Name: acousticness, dtype: float64


<h1 style="font-family: 'Times New Roman'; color: #ff0000; text-align: left;">Conclusion</h1>
With more time, I would extend this analysis to other features such as tempo, instrumentalness, liveness, and valence, because I feel there is a lot more to learn by comparing the other aspects of this dataset. The main takeaways from this analysis are:

1) The Pop genre and its subgenres heavily dominate the industry, at least as measured by this top 50.
2) Songs perceived as energetic tend to be louder, as the positive correlation between energy and loudness shows.
3) For all 3 features we looked at (danceability, loudness, and acousticness), the median and average values within each genre were close and the standard deviation was relatively low. An artist trying to get into the market could therefore take those ranges for their genre into account, which statistically should give them a better shot at a successful song.
4) There are fewer songs with low loudness (below -8) than with high loudness (above -5), which shows that the trend is toward loud music. A loud song may therefore be more likely to succeed.
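
The mean, median, and standard deviation behind point 3 can also be gathered in a single table with `groupby(...).agg(...)`, which makes the mean versus median comparison easier to read. This is a sketch on stand-in rows: the genre labels come from the dataset, but the danceability values are made up for illustration.

```python
import pandas as pd

# Made-up danceability values for two genres, for illustration only
sample = pd.DataFrame(
    {
        "genre": ["Pop", "Pop", "Pop", "Hip-Hop/Rap", "Hip-Hop/Rap", "Hip-Hop/Rap"],
        "danceability": [0.60, 0.70, 0.80, 0.70, 0.75, 0.95],
    }
)

# One call returns all three statistics as columns, one row per genre
summary = sample.groupby("genre")["danceability"].agg(["mean", "median", "std"])

# A small gap between mean and median suggests little skew within the genre
summary["mean_minus_median"] = summary["mean"] - summary["median"]
print(summary.round(3))
```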