# Data Analysis of the Music Industry

Today’s music landscape is constantly evolving, and understanding listener preferences is crucial. At Spotify, we analyzed top-performing tracks from 1998 to 2020 to show which attributes resonate with audiences and drive engagement. These findings give C-level executives evidence to inform strategic decisions and sharpen their approach in a competitive market.

Each row in the dataset represents one song entry, with attributes such as artist name, song title, and a set of audio features. We collected this data through our platform's analytics tools and APIs, which track listener interactions and analyze track characteristics. Audio features such as danceability and energy are derived from analysis of the track itself, while popularity reflects listener engagement.

The dataset contains 18 columns describing each song, plus an unnamed index column that pandas labels `Unnamed: 0`. Artist, song title, year, and genre identify and categorize each entry, but none of them is unique on its own: Rihanna, for example, appears 25 times. The remaining columns, such as popularity, duration, explicit status, danceability, and energy, are measured attributes, and many songs share similar values for them.

This dataset is a sample of Spotify’s music catalog, filtered to popular, top-trending tracks from a specific timeframe (the `year` column spans 1998 to 2020) and a selection of genres. This focus enables a targeted analysis of trends in listener preferences and engagement.

In [2]:
import pandas as pd

df = pd.read_csv('/Users/eleanizam/Desktop/Git Projets/Music/Music/songs_normalize.csv')

df.head()


Unnamed: 0,artist,song,duration_ms,explicit,year,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,genre
0,Britney Spears,Oops!...I Did It Again,211160,False,2000,77,0.751,0.834,1,-5.444,0,0.0437,0.3,1.8e-05,0.355,0.894,95.053,pop
1,blink-182,All The Small Things,167066,False,1999,79,0.434,0.897,0,-4.918,1,0.0488,0.0103,0.0,0.612,0.684,148.726,"rock, pop"
2,Faith Hill,Breathe,250546,False,1999,66,0.529,0.496,7,-9.007,1,0.029,0.173,0.0,0.251,0.278,136.859,"pop, country"
3,Bon Jovi,It's My Life,224493,False,2000,78,0.551,0.913,0,-4.063,0,0.0466,0.0263,1.3e-05,0.347,0.544,119.992,"rock, metal"
4,*NSYNC,Bye Bye Bye,200560,False,2000,65,0.614,0.928,8,-4.806,0,0.0516,0.0408,0.00104,0.0845,0.879,172.656,pop
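The `Unnamed: 0` column in the `head()` output suggests the CSV was written together with its pandas index, which is why `df` holds 19 columns rather than the 18 data columns described above. A minimal sketch, assuming the file's first header field is empty and using the first two rows above as an inline stand-in for the file, of reading that column back as the index with `index_col=0`:

```python
import io

import pandas as pd

# First two rows of df.head() above, reproduced as CSV text. The empty first
# header field (an assumption about the file) is the saved index, which
# pandas names "Unnamed: 0" by default.
sample_csv = """,artist,song,duration_ms,explicit,year,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,genre
0,Britney Spears,Oops!...I Did It Again,211160,False,2000,77,0.751,0.834,1,-5.444,0,0.0437,0.3,1.8e-05,0.355,0.894,95.053,pop
1,blink-182,All The Small Things,167066,False,1999,79,0.434,0.897,0,-4.918,1,0.0488,0.0103,0.0,0.612,0.684,148.726,"rock, pop"
"""

# Default read: the saved index becomes an extra data column.
raw = pd.read_csv(io.StringIO(sample_csv))
print(raw.shape)  # (2, 19)

# index_col=0 turns that column back into the row index.
clean = pd.read_csv(io.StringIO(sample_csv), index_col=0)
print(clean.shape)  # (2, 18)
```

In the notebook, the same `index_col=0` argument would go on the existing `pd.read_csv` call.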


In [4]:
songs_per_genre = df['genre'].value_counts()
print("Count of Songs per Genre:\n", songs_per_genre)

Count of Songs per Genre:
 genre
pop                                      428
hip hop, pop                             277
hip hop, pop, R&B                        244
pop, Dance/Electronic                    221
pop, R&B                                 178
hip hop                                  124
hip hop, pop, Dance/Electronic            78
rock                                      58
rock, pop                                 43
Dance/Electronic                          41
rock, metal                               38
pop, latin                                28
pop, rock                                 26
set()                                     22
hip hop, Dance/Electronic                 16
latin                                     15
pop, rock, metal                          14
hip hop, pop, latin                       14
R&B                                       13
pop, rock, Dance/Electronic               13
country                                   10
metal                 
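Because `genre` holds comma-separated tag lists, `value_counts()` treats `"hip hop, pop"` and `"pop"` as unrelated categories, and the 22 `set()` entries appear to be a placeholder for songs with no genre assigned. A sketch of counting individual genre tags instead, assuming `set()` means "missing"; the sample is the five genres from `df.head()` plus one `set()` value added for illustration:

```python
import pandas as pd

# Genres of the five rows in df.head(), plus one "set()" placeholder (illustrative).
genres = pd.Series(["pop", "rock, pop", "pop, country", "rock, metal", "pop", "set()"])

# Assumption: "set()" marks a song with no genre, so treat it as missing.
genres = genres.where(genres != "set()")

# Split each list into tags, give every tag its own row, and trim whitespace.
tags = genres.str.split(",").explode().str.strip()

# value_counts() drops missing values by default.
tag_counts = tags.value_counts()
print(tag_counts)
```

A song tagged `rock, pop` counts once for each tag, so tag counts sum to more than the number of songs. On the full data the same chain would start from `df["genre"]`.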

In [5]:
avg_popularity_per_genre = df.groupby('genre')['popularity'].mean()
print("\nAverage Popularity per Genre:\n", avg_popularity_per_genre)


Average Popularity per Genre:
 genre
Dance/Electronic                         51.756098
Folk/Acoustic, pop                       78.000000
Folk/Acoustic, rock                       0.000000
Folk/Acoustic, rock, pop                 68.000000
R&B                                      60.461538
World/Traditional, Folk/Acoustic         69.000000
World/Traditional, hip hop               59.000000
World/Traditional, pop                   61.000000
World/Traditional, pop, Folk/Acoustic    67.500000
World/Traditional, rock                  60.000000
World/Traditional, rock, pop             17.500000
country                                  53.000000
country, latin                            0.000000
easy listening                           72.000000
hip hop                                  64.346774
hip hop, Dance/Electronic                60.625000
hip hop, R&B                             63.666667
hip hop, country                         69.000000
hip hop, latin, Dance/Electronic         72.
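Several of these averages rest on very few songs. Rare combinations such as `Folk/Acoustic, rock` and `country, latin`, both averaging 0.0, do not appear among the visible rows of the count table, which is sorted from most to least frequent and already down to 10 songs at `country`, so each covers at most 10 songs. A sketch of reporting group size next to the mean and keeping only groups above a cutoff; `MIN_SONGS` is a hypothetical threshold, and the sample is the five rows from `df.head()`:

```python
import pandas as pd

# genre and popularity for the five rows in df.head() above
sample = pd.DataFrame({
    "genre": ["pop", "rock, pop", "pop, country", "rock, metal", "pop"],
    "popularity": [77, 79, 66, 78, 65],
})

MIN_SONGS = 2  # hypothetical cutoff; choose one that suits the full dataset

# Mean popularity and number of songs behind each mean
stats = sample.groupby("genre")["popularity"].agg(["mean", "count"])

# Keep only genres with enough songs, highest average first
reliable = stats[stats["count"] >= MIN_SONGS].sort_values("mean", ascending=False)
print(reliable)
```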

In [6]:
songs_per_artist = df['artist'].value_counts()
print("\nTop Artists by Song Count:\n", songs_per_artist)


Top Artists by Song Count:
 artist
Rihanna           25
Drake             23
Eminem            21
Calvin Harris     20
Britney Spears    19
                  ..
Sidney Samson      1
Cam’ron            1
Elvis Presley      1
Lucenzo            1
Blanco Brown       1
Name: count, Length: 835, dtype: int64
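The printed Series elides its middle rows (`..`), and `Length: 835` is the number of distinct artists. A short sketch of printing only the leaders with `head(n)` and getting the distinct-artist count with `nunique()`; the sample rebuilds just the five top counts shown above:

```python
import pandas as pd

# Rebuild the five highest counts from the output above (one entry per song).
artists = pd.Series(
    ["Rihanna"] * 25 + ["Drake"] * 23 + ["Eminem"] * 21
    + ["Calvin Harris"] * 20 + ["Britney Spears"] * 19
)

top_artists = artists.value_counts().head(3)  # largest counts first
print(top_artists)
print(artists.nunique())  # number of distinct artists in the sample
```

On the full `df`, `df["artist"].nunique()` returns the same figure as the `Length` line above, since both ignore missing values.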


In [7]:
avg_audio_features_per_genre = df.groupby('genre')[['danceability', 'energy', 'valence']].mean()
print("\nAverage Audio Features per Genre:\n", avg_audio_features_per_genre)


Average Audio Features per Genre:
                                        danceability    energy   valence
genre                                                                  
Dance/Electronic                           0.684610  0.775829  0.523585
Folk/Acoustic, pop                         0.563000  0.744500  0.352500
Folk/Acoustic, rock                        0.408000  0.849000  0.628000
Folk/Acoustic, rock, pop                   0.517000  0.492000  0.455000
R&B                                        0.663462  0.688077  0.633308
World/Traditional, Folk/Acoustic           0.418000  0.249000  0.213000
World/Traditional, hip hop                 0.807000  0.750000  0.852000
World/Traditional, pop                     0.700000  0.465000  0.719000
World/Traditional, pop, Folk/Acoustic      0.667500  0.776500  0.790500
World/Traditional, rock                    0.473000  0.708000  0.489000
World/Traditional, rock, pop               0.508500  0.837000  0.671000
country                     
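Grouping on the raw `genre` string computes separate averages for `pop` and `rock, pop`. A sketch of per-tag feature means instead, exploding each genre list so that a multi-genre song contributes to every tag it carries; the sample holds the five rows from `df.head()`:

```python
import pandas as pd

# Genre and three audio features for the five rows in df.head() above
sample = pd.DataFrame({
    "genre": ["pop", "rock, pop", "pop, country", "rock, metal", "pop"],
    "danceability": [0.751, 0.434, 0.529, 0.551, 0.614],
    "energy": [0.834, 0.897, 0.496, 0.913, 0.928],
    "valence": [0.894, 0.684, 0.278, 0.544, 0.879],
})

# One row per (song, genre tag), with surrounding whitespace removed
per_tag = (
    sample.assign(genre=sample["genre"].str.split(","))
    .explode("genre")
    .assign(genre=lambda d: d["genre"].str.strip())
)

features_by_tag = per_tag.groupby("genre")[["danceability", "energy", "valence"]].mean()
print(features_by_tag)
```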

In [8]:
songs_by_year = df['year'].value_counts().sort_index()
print("\nSongs by Year:\n", songs_by_year)


Songs by Year:
 year
1998      1
1999     38
2000     74
2001    108
2002     90
2003     97
2004     96
2005    104
2006     95
2007     94
2008     97
2009     84
2010    107
2011     99
2012    115
2013     89
2014    104
2015     99
2016     99
2017    111
2018    107
2019     89
2020      3
Name: count, dtype: int64
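The edge years are thin: 1998 contributes 1 song and 2020 only 3, against 38 to 115 for every year in between, so year-level comparisons involving them rest on almost no data. A sketch that rebuilds the counts above and keeps only years meeting a minimum; `MIN_SONGS` is a hypothetical cutoff:

```python
import pandas as pd

# Song counts per year, copied from the output above
songs_by_year = pd.Series({
    1998: 1, 1999: 38, 2000: 74, 2001: 108, 2002: 90, 2003: 97,
    2004: 96, 2005: 104, 2006: 95, 2007: 94, 2008: 97, 2009: 84,
    2010: 107, 2011: 99, 2012: 115, 2013: 89, 2014: 104, 2015: 99,
    2016: 99, 2017: 111, 2018: 107, 2019: 89, 2020: 3,
})

MIN_SONGS = 30  # hypothetical cutoff

well_covered = songs_by_year[songs_by_year >= MIN_SONGS]
print(well_covered.index.min(), well_covered.index.max())  # 1999 2019
print(songs_by_year.sum(), well_covered.sum())  # 2000 1996
```

On the full data, `df[df["year"].between(1999, 2019)]` applies the same restriction row by row.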


In [9]:
explicit_count = df['explicit'].value_counts()
print("\nExplicit vs. Non-explicit Content:\n", explicit_count)


Explicit vs. Non-explicit Content:
 explicit
False    1449
True      551
Name: count, dtype: int64
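Raw counts are easier to compare as shares: 551 of 2,000 songs, or 27.55%, are flagged explicit. `value_counts(normalize=True)` returns those proportions directly; the sketch below rebuilds the column from the counts above:

```python
import pandas as pd

# Rebuild the explicit flag from the counts above: 1449 False, 551 True
explicit = pd.Series([False] * 1449 + [True] * 551)

# Proportions instead of counts, shown as percentages
shares = explicit.value_counts(normalize=True)
print((shares * 100).round(2))
```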


In [11]:
avg_duration_per_genre = df.groupby('genre')['duration_ms'].mean()
print("\nAverage Song Duration per Genre (ms):\n", avg_duration_per_genre)


Average Song Duration per Genre (ms):
 genre
Dance/Electronic                         241092.390244
Folk/Acoustic, pop                       239590.500000
Folk/Acoustic, rock                      278666.000000
Folk/Acoustic, rock, pop                 245173.000000
R&B                                      227837.076923
World/Traditional, Folk/Acoustic         218546.000000
World/Traditional, hip hop               242373.500000
World/Traditional, pop                   151640.000000
World/Traditional, pop, Folk/Acoustic    187039.500000
World/Traditional, rock                  249300.000000
World/Traditional, rock, pop             231513.000000
country                                  223327.800000
country, latin                           166866.000000
easy listening                           302146.000000
hip hop                                  229705.725806
hip hop, Dance/Electronic                240801.375000
hip hop, R&B                             242790.666667
hip hop, country   
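Durations in milliseconds are hard to read at a glance. A sketch converting a few of the averages above into decimal minutes and `m:ss` strings; `ms_to_mmss` is a hypothetical helper that rounds to the nearest second:

```python
import pandas as pd

# A few average durations (ms) copied from the output above
avg_duration_ms = pd.Series({
    "Dance/Electronic": 241092.390244,
    "easy listening": 302146.0,
    "World/Traditional, pop": 151640.0,
})

# 60,000 ms per minute
avg_duration_min = (avg_duration_ms / 60_000).round(2)


def ms_to_mmss(ms):
    """Format milliseconds as m:ss, rounded to the nearest second."""
    total_seconds = int(round(ms / 1000))
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


avg_duration_mmss = avg_duration_ms.map(ms_to_mmss)
print(pd.DataFrame({"minutes": avg_duration_min, "m:ss": avg_duration_mmss}))
```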

In [12]:
tempo_valence_per_genre = df.groupby('genre')[['tempo', 'valence']].mean()
print("\nAverage Tempo and Valence per Genre:\n", tempo_valence_per_genre)


Average Tempo and Valence per Genre:
                                             tempo   valence
genre                                                      
Dance/Electronic                       125.507537  0.523585
Folk/Acoustic, pop                     111.938000  0.352500
Folk/Acoustic, rock                     84.192000  0.628000
Folk/Acoustic, rock, pop               138.585000  0.455000
R&B                                    106.924846  0.633308
World/Traditional, Folk/Acoustic        82.803000  0.213000
World/Traditional, hip hop             100.035000  0.852000
World/Traditional, pop                 108.102000  0.719000
World/Traditional, pop, Folk/Acoustic  102.606500  0.790500
World/Traditional, rock                118.041500  0.489000
World/Traditional, rock, pop           135.530500  0.671000
country                                137.409600  0.571500
country, latin                          96.055000  0.850000
easy listening                         157.920000  0.400000
h