![Spotify Banner](https://raw.githubusercontent.com/leonardovaloppi/Spotify-Top-50-Songs/main/Spotify_Banner.jpg)

# **What Makes A Song Become A Hit?**

### Spotify Content Analysis - Top 50 Tracks 2020
by Leonardo Valoppi
_____________________________________________________________________________________________________________________________________________

# **Index**

### **0 &nbsp; Introduction**
- 0.1 &nbsp; Methodology
- 0.2 &nbsp; Understanding the Dataset

### **1 &nbsp; Data Preparation and Cleaning**
- 1.1 &nbsp; Enhancing Readability
- 1.2 &nbsp; Dataset Validation

### **2 &nbsp; First Steps Into the Data**
- 2.1 &nbsp; Unique Values in Key Columns
- 2.2 &nbsp; Identifying the Most Popular Artists
- 2.3 &nbsp; Identifying Albums With Multiple Appearances
- 2.4 &nbsp; Summary Statistics for Numerical Features

### **3 &nbsp; Investigating Extremes in Song Characteristics**
- 3.1 &nbsp; Identifying Highly Danceable Tracks
- 3.2 &nbsp; Identifying Low Danceability Tracks
- 3.3 &nbsp; Identifying High-Loudness Tracks
- 3.4 &nbsp; Identifying Low-Loudness Tracks
- 3.5 &nbsp; Identifying the Longest and Shortest Songs

### **4 &nbsp; Examining Genre Popularity and Diversity**
- 4.1 &nbsp; Identifying the Most Common Genres
- 4.2 &nbsp; Identifying Unique Genres

### **5 &nbsp; Examining Correlations Between Musical Attributes**
- 5.1 &nbsp; Positive Correlations
- 5.2 &nbsp; Negative Correlations
- 5.3 &nbsp; Neutral or Weak Correlations

### **6 &nbsp; How Musical Features Vary Across Genres**
- 6.1 &nbsp; Comparing Danceability
- 6.2 &nbsp; Comparing Loudness
- 6.3 &nbsp; Comparing Acousticness
- 6.4 &nbsp; Comparing Energy
- 6.5 &nbsp; Comparing Valence

### **7 &nbsp; Final Conclusion: Insights from the Spotify Top 50 Tracks 2020 Dataset**
- 7.1 &nbsp; Main Findings From the Analysis
- 7.2 &nbsp; Next Steps: Further Areas for Exploration

&nbsp;

---

&nbsp;

# **| &nbsp; 0 &nbsp; Introduction**

Understanding what makes a song a hit is crucial for platforms like Spotify, where content performance directly impacts user engagement and business decisions. 

**This project aims to analyze Spotify’s Top 50 Tracks of 2020 dataset to uncover patterns and insights that define a successful song.** By leveraging data analysis techniques in Python with Pandas, we will explore key factors that contribute to a track’s popularity, including artist influence, genre distribution, and audio features such as danceability, loudness, and acousticness.

## **0.1 &nbsp; Methodology**

The analysis will begin with data cleaning to ensure accuracy, handling missing values, eliminating duplicates, and treating potential outliers. <br>
Then, through **[exploratory data analysis (EDA)](https://en.wikipedia.org/wiki/Exploratory_data_analysis)**, we will assess the dataset’s structure, identifying the number of observations, features, and categorical or numerical variables. Key business questions will be addressed, such as which artists and albums dominate the charts, what characteristics distinguish high-ranking tracks, and how different genres perform in terms of danceability, loudness, and acoustic properties. Additionally, correlation analysis will reveal which audio features are most strongly related to hit status, providing valuable insights for Spotify’s content curation strategies.

The outcomes of this study will help refine **music recommendation algorithms**, **optimize playlist curation**, and **guide marketing** and promotional efforts by identifying trends in consumer music preferences. Further improvements could involve expanding the dataset to multiple years, incorporating listener engagement metrics, and applying machine learning techniques to predict future hits.

## **0.2 &nbsp; Understanding the Dataset**

The dataset used in this analysis was sourced from the **[Spotify Top 50 Tracks of 2020 dataset](https://www.kaggle.com/datasets/atillacolak/top-50-spotify-tracks-2020)**, originally obtained from Kaggle. It contains structured information about the most popular songs on Spotify during that year, allowing for an in-depth analysis of the key attributes that contribute to a track’s success. 

Originally, the dataset was extracted using **[Spotipy](https://spotipy.readthedocs.io/en/2.16.1/)**, a Python library for accessing the Spotify **[Web API](https://developer.spotify.com/documentation/web-api)**.

*Let's see what the table looks like:*

In [94]:
import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/leonardovaloppi/"
    "Spotify-Top-50-Songs/refs/heads/main/spotifytoptracks.csv",
    index_col=0
)

initialize = df.copy()  # keep an untouched copy of the raw data

df.head()

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


### **Columns Identification**

The dataset is organized into **16 columns**, each capturing different aspects of a song’s metadata and audio features:

- **4 metadata-related columns:**
Includes ***artist***, ***album***, ***track_name***, and ***track_id***, providing basic identification details about each song and its creator.

- **5 classifications:**
Categorizes tracks based on their ***duration_ms** (milliseconds)*, ***[tempo](https://en.wikipedia.org/wiki/Tempo)** (beats per minute, or BPM)*, ***[key](https://www.classicfm.com/discover-music/music-theory/what-are-musical-keys/)***, ***genre***, and ***[loudness](https://aes2.org/resources/audio-topics/loudness-project/loudness-basics/#:~:text=Loudness%20is%20the%20perceived%20%E2%80%9Cstrength,acoustic%20or%20electronically%20reproduced%20sounds.)** (here reported in [dB](https://geoffthegreygeek.com/understanding-decibels/), although loudness is typically measured in [LUFS](https://www.izotope.com/en/learn/what-are-lufs.html?srsltid=AfmBOoobu2X4VVTar-1hBNUgT8Ux9sFgiPi04kddkv8b9Mz3qq3bzzm_))*.

- **7 musical and audio features:**
These are numerical attributes that describe different characteristics of a song’s composition and production:
  - ***energy:*** Measures the intensity and activity level of a song.
  - ***danceability:*** Quantifies how suitable a track is for dancing based on tempo, rhythm stability, and beat strength.
  - ***acousticness:*** Predicts the likelihood of a song being acoustic.
  - ***speechiness:*** Evaluates the presence of spoken words in the track.
  - ***instrumentalness:*** Measures the extent to which a track is purely instrumental.
  - ***liveness:*** Detects whether the song was recorded in a live setting.
  - ***valence:*** Describes the musical positivity of a track.

&nbsp;

---

# **| &nbsp; 1 &nbsp; Data Preparation and Cleaning**

This section details the preprocessing steps applied to the dataset, ensuring consistency, clarity, and usability for further analysis. The dataset has been refined through various transformations that enhance its structure and readability.  

## **1.1 &nbsp; Enhancing Readability**

The first step in the cleaning process involves resetting the **[DataFrame](https://www.geeksforgeeks.org/python-pandas-dataframe/)** to its initial state. The index is removed and replaced with a new sequential order starting from 1. This ensures that each track is properly indexed, making the dataset easier to navigate.  

To improve readability, specific columns have been selected and reordered in a way that facilitates quick access to relevant information.  

Several numerical features have been rounded to maintain consistency and enhance interpretability. The **loudness** values are rounded to 2 decimal places, while **acousticness** is kept at 4 decimal places to capture finer variations. **Speechiness**, **liveness**, and **valence** are rounded to 3 decimal places, ensuring that they remain precise while being easy to read.  

In addition to rounding, some data type conversions have been applied to standardize the dataset. The **tempo** values, originally in floating-point format, are converted into integers for clarity. The **duration** of each track, initially stored in milliseconds, is transformed into **seconds** and renamed as `duration_sec`. This modification enhances comprehension, making the dataset more intuitive for users analyzing track lengths.  

In [95]:
df = initialize.copy()  # start again from the raw data

df = df.reset_index(drop=True)

df.index = range(1, len(df) + 1)

df = df[
    [
        "track_name",
        "artist",
        "album",
        "track_id",
        "duration_ms",
        "tempo",
        "key",
        "genre",
        "loudness",
        "energy",
        "danceability",
        "valence",
        "speechiness",
        "instrumentalness",
        "liveness",
        "acousticness",
    ]
]

df = df.round(
    {
        "loudness": 2,    
        "acousticness": 4,
        "speechiness": 3, 
        "liveness": 3,   
        "valence": 3,  
    }
)

df["tempo"] = df["tempo"].round(0).astype(int)

df["duration_ms"] = (df["duration_ms"] / 1000).round(0).astype(int)

df = df.rename(columns={"duration_ms": "duration_sec"})

df.head()

Unnamed: 0,track_name,artist,album,track_id,duration_sec,tempo,key,genre,loudness,energy,danceability,valence,speechiness,instrumentalness,liveness,acousticness
1,Blinding Lights,The Weeknd,After Hours,0VjIjW4GlUZAMYd2vXMi3b,200,171,1,R&B/Soul,-5.93,0.73,0.514,0.334,0.06,9.5e-05,0.09,0.0015
2,Dance Monkey,Tones And I,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,210,98,6,Alternative/Indie,-6.4,0.593,0.825,0.54,0.099,0.000161,0.17,0.688
3,The Box,Roddy Ricch,Please Excuse Me For Being Antisocial,0nbXyq5TXYPCO7pr3N8S4I,197,117,10,Hip-Hop/Rap,-6.69,0.586,0.896,0.642,0.056,0.0,0.79,0.104
4,Roses - Imanbek Remix,SAINt JHN,Roses (Imanbek Remix),2Wo6QQD1KMDWeFkkjLqwx5,176,122,8,Dance/Electronic,-5.46,0.721,0.785,0.894,0.051,0.00432,0.285,0.0149
5,Don't Start Now,Dua Lipa,Future Nostalgia,3PfIrDoz19wz7qK7tYeu62,183,124,11,Nu-disco,-4.52,0.793,0.793,0.679,0.083,0.0,0.095,0.0123


&nbsp;

By implementing these transformations, the dataset is now well-structured and optimized for further exploration. The refined format facilitates **trend analysis**, **correlation studies**, and potential **machine learning applications** focused on music characteristics and popularity trends. This ensures that the data is both reliable and easy to work with in subsequent analyses.  

## **1.2 &nbsp; Dataset Validation**

Before conducting any meaningful analysis, it is essential to verify the dataset’s **completeness** and **consistency**. This section focuses on assessing the presence of missing values, ensuring that each column maintains a uniform data type, and confirming that the data is correctly structured for further exploration. By performing these checks, we can validate the dataset’s reliability and prevent potential issues arising from inconsistent or incomplete data.

In [96]:
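# Number of non-null values in each column.
a = df.count()

# True where every value in a column has the same Python type.
# DataFrame.map requires pandas >= 2.1 (older versions use applymap).
b = df.map(type).nunique() == 1

# Type name of each value in the first row (index label 1).
c = df.loc[1].map(lambda x: type(x).__name__)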

df_summary = pd.concat(
    [a, b, c], 
    axis=1,
    keys=["non_nulls", "single_dtype", "dtype"] 
)

df_summary

Unnamed: 0,non_nulls,single_dtype,dtype
track_name,50,True,str
artist,50,True,str
album,50,True,str
track_id,50,True,str
duration_sec,50,True,int64
tempo,50,True,int64
key,50,True,int64
genre,50,True,str
loudness,50,True,float64
energy,50,True,float64


### **Data Type Consistency**

The dataset summary confirms that all columns analyzed contain **50 non-null values**, indicating that there are no missing data points in these fields. This ensures data integrity and eliminates the need for further handling of missing values.  

Each column maintains a **consistent data type**, as indicated by the `single_dtype` column returning `True` for all entries. This guarantees that the dataset does not contain mixed data types, which could lead to inconsistencies or processing errors.  

Regarding data types, **track_name**, **artist**, **album**, and **track_id** are correctly stored as strings, which aligns with their purpose as text-based identifiers. The **duration_sec** column is appropriately stored as an integer (`int64`), confirming that the duration conversion from milliseconds to seconds was successful.  
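
The methodology also mentions duplicates, which the summary above does not test directly. The following is a minimal sketch of how missing values, duplicate rows, and column dtypes could be checked with standard pandas calls. It runs on a small illustrative frame (`sample`, a hypothetical name) built from three rows of the preview table rather than on the full `df`.

```python
import pandas as pd

# Illustrative sample: three rows from the preview table, a few columns only.
# On the real data the same calls would be applied to the full `df`.
sample = pd.DataFrame(
    {
        "track_name": ["Blinding Lights", "Dance Monkey", "The Box"],
        "artist": ["The Weeknd", "Tones And I", "Roddy Ricch"],
        "duration_sec": [200, 210, 197],
        "loudness": [-5.93, -6.40, -6.69],
    }
)

# Missing values per column (all zeros means nothing to impute or drop).
missing_per_column = sample.isna().sum()

# Fully duplicated rows, and rows that repeat an earlier track name.
duplicate_rows = int(sample.duplicated().sum())
duplicate_names = int(sample.duplicated(subset=["track_name"]).sum())

# pandas' own dtype per column, as a cross-check of the element-wise test.
column_dtypes = sample.dtypes

print(missing_per_column)
print("duplicate rows:", duplicate_rows, "| duplicate names:", duplicate_names)
print(column_dtypes)
```

Note that pandas reports text columns with its own dtype label (for example `object`), whereas the summary table above shows the Python type of the individual values (`str`).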

&nbsp;

___

# **| &nbsp; 2 &nbsp; First Steps Into the Data**

Exploratory Data Analysis (EDA) is a crucial step in understanding the structure and characteristics of a dataset. This phase involves **summarizing key attributes**, identifying **patterns**, detecting **anomalies**, and uncovering **relationships between variables**. In this section, we begin by examining the uniqueness of key categorical features to better understand the dataset’s composition.  

## **2.1 &nbsp; Unique Values in Key Columns** 

A fundamental aspect of data exploration is determining the number of unique values in key categorical columns. This helps us assess the diversity of entries in the dataset, such as the number of distinct tracks, artists, albums, and genres. By counting the unique values in columns like `track_id`, `track_name`, `artist`, `album`, and `genre`, we can evaluate the **dataset’s variability** and identify **potential redundancies** or patterns.  

In [97]:
df[[
    "track_id",
    "track_name",
    "artist",
    "album",
    "genre",
    "key"
]].nunique()

track_id      50
track_name    50
artist        40
album         45
genre         16
key           12
dtype: int64

### **Key Insights**

The results indicate that the dataset contains **50 unique track IDs** and **track names**, confirming that each song in the dataset is distinct. This suggests that no duplicate songs are present, ensuring data integrity.  

The number of **unique artists** is **40**, meaning that some artists have multiple songs included in the dataset. This implies that the dataset does not strictly contain one song per artist but rather includes multiple entries from certain musicians.  

Regarding albums, there are **45 unique albums** for 50 songs, indicating that some albums contain more than one track featured in this dataset. This is common in streaming charts, where popular albums often have multiple hit songs.  

The **genre column** has **16 unique values**, which suggests a diverse representation of musical styles. While not overly broad, this variety allows for an analysis of trends across different genres and how they contribute to the popularity of songs.   

Finally, the **key** column contains **12 unique values**. This makes sense: mainstream Western music is built on twelve-tone **[equal temperament](https://en.wikipedia.org/wiki/12_equal_temperament)**, which divides the octave into 12 pitch classes, so every possible key appears at least once in the Top 50.
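
To make the `key` integers easier to read, they can be translated into note names. The sketch below assumes the column follows standard pitch-class notation (0 = C, 1 = C♯/D♭, and so on up to 11 = B), which is how Spotify's audio features describe it. `PITCH_CLASSES` and `key_to_name` are illustrative names, not part of the original notebook.

```python
# Pitch-class names for the integers 0-11 (assumed mapping: 0 = C ... 11 = B).
PITCH_CLASSES = [
    "C", "C#/Db", "D", "D#/Eb", "E", "F",
    "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B",
]


def key_to_name(key: int) -> str:
    """Translate a pitch-class integer into a note name."""
    if 0 <= key < len(PITCH_CLASSES):
        return PITCH_CLASSES[key]
    return "unknown"  # any value outside 0-11, e.g. when no key was detected


# Keys of the five tracks shown in the preview table.
preview_keys = {
    "Blinding Lights": 1,
    "Dance Monkey": 6,
    "The Box": 10,
    "Roses - Imanbek Remix": 8,
    "Don't Start Now": 11,
}

for track, key in preview_keys.items():
    print(f"{track}: {key} -> {key_to_name(key)}")
```

On the full dataset, the same helper could be applied with `df["key"].map(key_to_name)`.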

## **2.2 &nbsp; Identifying the Most Popular Artists** 

In the music industry, some artists consistently dominate the charts, securing multiple spots in top playlists. Whether due to **strong fan engagement**, **mainstream appeal**, or **high productivity**, these artists manage to maintain a significant presence in the industry.  

In this section, we analyze the artists who appear more than once in the dataset, identifying those who have multiple top tracks. By counting the number of appearances, we gain insights into **which artists have the highest hit frequency**, shedding light on industry trends, artist influence, and potential market patterns.  

In [98]:
count = df["artist"].value_counts()

repeated_artists = count[count > 1].to_frame(name="appearances")

repeated_artists = repeated_artists.reset_index()

repeated_artists.columns = ["artist", "appearances"]

repeated_artists.index += 1

repeated_artists

Unnamed: 0,artist,appearances
1,Billie Eilish,3
2,Dua Lipa,3
3,Travis Scott,3
4,Justin Bieber,2
5,Harry Styles,2
6,Lewis Capaldi,2
7,Post Malone,2



### **Key Insights**  

- **Billie Eilish**, **Dua Lipa**, and **Travis Scott** each have three songs in the top charts, making them the most dominant artists in this dataset.  
  - **Billie Eilish** is known for her **unique vocal style** and **genre-blending production**, appealing to both mainstream and alternative audiences.  
  - **Dua Lipa** has solidified her position as a pop and dance icon, with songs characterized by high energy and **rhythmic appeal**.  
  - **Travis Scott** represents Hip-Hop/Rap’s strong influence on the charts, with his moody production and **immersive soundscapes** resonating with a wide audience.  

- **Justin Bieber**, **Harry Styles**, **Lewis Capaldi**, and **Post Malone** appear twice, showing **consistent hit-making ability** across different styles.  
  - **Justin Bieber** and **Harry Styles** maintain strong Pop influences, though Styles leans toward **rock and indie elements**, differentiating his sound.  
  - **Lewis Capaldi**, known for his emotionally charged ballads, stands out in a dataset otherwise dominated by upbeat pop and hip-hop tracks.  
  - **Post Malone**’s mix of rap, rock, and melodic elements reflects his versatility and wide audience appeal.  

### **Industry and Market Insights**  

- The presence of **pop**, **hip-hop**, and **crossover artists** in this list reflects the **genre diversity** that dominates today’s charts.  
- Artists with multiple hits have mastered the balance between **mainstream appeal and artistic identity**, allowing them to consistently rank among top tracks.
- The inclusion of **Lewis Capaldi**, an artist known for **emotional ballads** rather than high-energy dance tracks, suggests that audiences still embrace **slower**, **heartfelt music** alongside more rhythm-driven hits.  
- **Collaboration trends and streaming influence** could be explored further to understand how these artists maintain sustained popularity.  

### **Final Thoughts**  

This analysis confirms that certain artists dominate the charts through consistent hit-making strategies, strong fan engagement, and cross-genre appeal. As the music industry evolves, it will be interesting to see whether these artists continue to dominate or if emerging artists disrupt their hold on the charts.  

## **2.3 &nbsp; Identifying Albums With Multiple Appearances** 

Analyzing the frequency of album appearances allows us to determine which records have more than one track featured in the dataset, highlighting their impact on the music charts. 
It provides insights into whether the dataset is dominated by a few highly successful albums or if it includes a more diverse selection of tracks.

In [99]:
count = df["album"].value_counts()

repeated_albums = count[count > 1].index

album_artists = df[df["album"].isin(repeated_albums)][["album", "artist"]].drop_duplicates()

combined_df = album_artists.merge(
    count[count > 1].to_frame(name="album_appearances"),
    left_on="album",
    right_index=True
)

combined_df = combined_df.reset_index(drop=True)

combined_df.index += 1

combined_df

Unnamed: 0,album,artist,album_appearances
1,Future Nostalgia,Dua Lipa,3
2,Fine Line,Harry Styles,2
3,Hollywood's Bleeding,Post Malone,2
4,Changes,Justin Bieber,2


### **Key Insights**
The results reveal that **4 albums** have multiple tracks featured in the dataset, **covering 18% of the Top 50**.
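
The 18% figure can be reproduced from the table above. This is a small arithmetic sketch; the dictionary simply restates the four `album_appearances` values, and the variable names are illustrative.

```python
# Track counts per repeated album, copied from the table above.
album_appearances = {
    "Future Nostalgia": 3,
    "Fine Line": 2,
    "Hollywood's Bleeding": 2,
    "Changes": 2,
}
total_tracks = 50  # size of the Top 50 list

# Tracks that belong to an album appearing more than once.
tracks_from_repeated_albums = sum(album_appearances.values())
share = tracks_from_repeated_albums / total_tracks

print(f"{tracks_from_repeated_albums} of {total_tracks} tracks = {share:.0%}")
```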

**"Future Nostalgia" by Dua Lipa** stands out with three songs, making it the most frequently appearing album. This suggests that the album had widespread commercial success, with multiple tracks resonating with listeners.  

Three other albums, **"Fine Line" by Harry Styles, "Hollywood's Bleeding" by Post Malone, and "Changes" by Justin Bieber**, each have 2 songs in the dataset. These albums were likely among the most influential releases, producing multiple chart-topping hits.

### **Final Thoughts**
The presence of multiple songs from the same album implies that certain records dominated streaming platforms and audience preferences. Further analysis could explore how these albums compare in terms of streaming numbers, listener demographics, and overall impact on the music industry.  

## **2.4 &nbsp; Summary Statistics for Numerical Features**

To better understand the dataset's numerical attributes, it is essential to compute summary statistics that provide insights into their **distribution**, **central tendency**, and **variability**.

In [100]:
df_no_key = df.drop(columns=["key"])

df_summary = df_no_key.describe().round({
    "duration_sec": 1,
    "tempo": 1,
    "loudness": 2,
    "energy": 3,
    "danceability": 3,
    "valence": 3,
    "speechiness": 3,
    "instrumentalness": 5,
    "liveness": 3,
    "acousticness": 4,
})

df_cleaned = df_summary.drop(index="count")

df_cleaned

Unnamed: 0,duration_sec,tempo,loudness,energy,danceability,valence,speechiness,instrumentalness,liveness,acousticness
mean,200.0,119.7,-6.23,0.609,0.717,0.556,0.124,0.01596,0.197,0.2562
std,33.9,25.4,2.35,0.154,0.125,0.216,0.117,0.09431,0.177,0.2653
min,141.0,76.0,-14.45,0.225,0.351,0.06,0.029,0.0,0.057,0.0015
25%,176.0,99.5,-7.56,0.494,0.673,0.434,0.048,0.0,0.094,0.0528
50%,198.0,117.0,-5.99,0.597,0.746,0.56,0.07,0.0,0.111,0.1885
75%,215.0,132.2,-4.29,0.73,0.794,0.726,0.156,2e-05,0.271,0.2987
max,313.0,180.0,-3.28,0.855,0.935,0.925,0.487,0.657,0.792,0.934


The summary statistics reveal key trends in the dataset, highlighting both expected distributions and some intriguing deviations. By comparing means and medians, as well as looking at standard deviations and percentiles, we can better understand the structure of these songs and detect potential outliers.  
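
As a rough illustration of the mean-versus-median comparison, the sketch below reads the direction of each feature's tail from the two values in the summary table. This is only a heuristic, not a formal skewness measure, and `mean_median` and `skew_hint` are illustrative names.

```python
# (mean, median) pairs copied from the summary table above.
mean_median = {
    "duration_sec": (200.0, 198.0),
    "tempo": (119.7, 117.0),
    "loudness": (-6.23, -5.99),
    "energy": (0.609, 0.597),
    "danceability": (0.717, 0.746),
    "valence": (0.556, 0.560),
    "speechiness": (0.124, 0.070),
    "instrumentalness": (0.01596, 0.0),
    "liveness": (0.197, 0.111),
    "acousticness": (0.2562, 0.1885),
}


def skew_hint(mean: float, median: float) -> str:
    """Heuristic: a mean above the median hints at a longer right tail."""
    if mean > median:
        return "right tail"
    if mean < median:
        return "left tail"
    return "symmetric"


hints = {name: skew_hint(*pair) for name, pair in mean_median.items()}

for name, hint in hints.items():
    mean, median = mean_median[name]
    print(f"{name:>16}: mean={mean:<8} median={median:<8} -> {hint}")
```

With the full data available, `df.skew(numeric_only=True)` would give a proper sample skewness for each column instead of this shortcut.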

### **Duration and Tempo: A Standardized Pop Formula?**

The **average** song duration is **200 seconds** (3 minutes and 20 seconds), with a median of 198 seconds, suggesting a **relatively balanced distribution**. However, the standard deviation of 33.9 seconds indicates a **moderate spread**, with some noticeably longer tracks. The **maximum duration** of **313 seconds** (over 5 minutes) is an **outlier** compared to the majority of songs, considering that even the 75th percentile is just 215 seconds. This suggests that while most songs fit within a conventional **3-4 minute structure**, a few much longer tracks stretch the upper tail of the distribution.

**Tempo** has a mean of 119.7 BPM and a median of 117 BPM. The overall range is **quite broad**, from a low of 76 BPM to a high of 180 BPM, but the middle half of the tracks sits between 99.5 and 132.2 BPM, so most songs cluster around a **mid-tempo**, which is common in pop music. With the 75th percentile at 132.2 BPM, **faster songs are somewhat limited**, with only a few breaking into high-energy dance or electronic tempo ranges.

### **Loudness and Energy: A Pattern of High-Impact Tracks**

**Loudness** has a mean of -6.23 dB and a median of -5.99 dB. Its relatively **small standard deviation** shows that most tracks have similar loudness levels, sticking to a **compressed**, **radio-friendly** volume range. However, the minimum value of -14.45 dB stands out since it is significantly lower than the 25th percentile, suggesting that a handful of **quieter tracks exist** in an otherwise consistently loud dataset.  

**Energy**, in contrast, shows **more balance** and no comparable outlier, with a mean of 0.609 and a median of 0.597. The values range from 0.225 to 0.855, but since the 75th percentile is already at 0.730, **the highest energy songs do not seem like extreme outliers**. This suggests that the dataset leans towards energetic tracks but **still includes some low-energy songs**, possibly ballads or acoustic performances.

### **Danceability and Valence: What Defines a Hit?**

**Danceability** exhibits an interesting skew: the mean is 0.717, the median is 0.746, and the 25th percentile is already at 0.673. This suggests that most tracks are **highly danceable**, with only a few pulling the average down. With three quarters of the tracks scoring above 0.673 and a maximum of 0.935, nearly all of them appear designed to be **rhythmically engaging**.

**Valence**, which measures **musical positivity**, is **fairly balanced** with a mean of 0.556 and a median of 0.560. The interquartile range (0.434 to 0.726) suggests a mix of both **melancholic** and **upbeat** songs, without extreme biases. However, the minimum (0.060) and maximum (0.925) values suggest that there are both deeply emotional and extremely cheerful tracks in the dataset.  

### **Speechiness, Instrumentalness, and Liveness: Are There Any Outliers?**

**Speechiness** has a mean of 0.124 and a standard deviation of 0.117, meaning most tracks contain **typical vocal structures** rather than spoken-word sections. However, the maximum of 0.487 suggests that at least one song is either **heavily spoken** or features a **rap-heavy structure**, making it an **outlier** in this distribution.  

**Instrumentalness** is overwhelmingly low, with a median and 25th percentile at 0, meaning most songs have **clear vocal content**. The maximum value of 0.657 is a stark contrast, suggesting the presence of at least one track that is significantly more **instrumental** than the rest, likely a **remix** or an ambient track that made it into the dataset.

**Liveness**, which detects **live recordings**, has a mean of 0.197 and a median of 0.111, but its standard deviation is relatively large. The 75th percentile (0.271) and a maximum of 0.792 suggest that while most songs are **studio recordings**, there are a **few clear outliers** that could be live performances or have strong **crowd noise effects**.  

### **Final Thoughts**

The dataset follows many expected trends, with most songs being **radio-friendly** in duration, **mid-to-high energy**, and **highly danceable**. However, we observe some **clear outliers** in duration, loudness, speechiness, instrumentalness, and liveness, suggesting a **few unique tracks** that do not fit the standard mold.
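
The text above does not state which criterion defines an outlier. As one common convention, the sketch below applies Tukey's 1.5 × IQR rule to the minimum and maximum of each feature, using only the values published in the summary table. The rule, the dictionary `five_number`, and the function `tukey_fences` are assumptions introduced for illustration.

```python
# (min, 25%, 75%, max) copied from the summary table above.
five_number = {
    "duration_sec": (141.0, 176.0, 215.0, 313.0),
    "tempo": (76.0, 99.5, 132.2, 180.0),
    "loudness": (-14.45, -7.56, -4.29, -3.28),
    "energy": (0.225, 0.494, 0.730, 0.855),
    "danceability": (0.351, 0.673, 0.794, 0.935),
    "valence": (0.060, 0.434, 0.726, 0.925),
    "speechiness": (0.029, 0.048, 0.156, 0.487),
    "instrumentalness": (0.0, 0.0, 0.00002, 0.657),
    "liveness": (0.057, 0.094, 0.271, 0.792),
    "acousticness": (0.0015, 0.0528, 0.2987, 0.934),
}


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Return the (lower, upper) fences of Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Record which extreme (min and/or max) falls outside the fences.
flagged = {}
for feature, (low, q1, q3, high) in five_number.items():
    lower, upper = tukey_fences(q1, q3)
    sides = []
    if low < lower:
        sides.append("min")
    if high > upper:
        sides.append("max")
    if sides:
        flagged[feature] = sides

for feature, sides in flagged.items():
    print(f"{feature}: {', '.join(sides)} outside the fences")
```

Under this rule the extremes of duration, loudness, speechiness, instrumentalness, and liveness are flagged, in line with the list above. The minimum of danceability and the maximum of acousticness are flagged as well, while tempo, energy, and valence stay inside their fences.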

&nbsp;

---

# **| &nbsp; 3 &nbsp; Investigating Extremes in Song Characteristics**  

Building upon the first phase of our exploratory data analysis, which focused on categorical distribution and overall trends, this section delves deeper into **extreme values** within key musical attributes. By identifying the tracks that exhibit the **highest** and **lowest danceability** and **loudness**, as well as the **longest** and **shortest** durations, we can gain insights into how different musical characteristics influence commercial success and listener engagement.  

Songs that reach the upper or lower bounds of a particular feature often stand out in terms of production, audience reception, or **cultural impact**. By examining these extremes, we can assess whether these unconventional songs still find mainstream success or if they cater to more niche audiences.  

## **3.1 &nbsp; Identifying Highly Danceable Tracks**  

As the previous statistics demonstrated, danceability is a crucial feature in popular music, measuring how suitable a song is for dancing based on **rhythm stability**, **beat strength**, and **overall tempo**. Given that many hit songs are designed to be engaging on the **dance floor**, analyzing tracks with high danceability can provide insights into trends in music production and audience preferences.

In this section, we filter the dataset to include only tracks with a **danceability score above 0.7**, ensuring that we focus on songs that are highly rhythm-driven and likely intended for club or party settings.
By isolating these songs, we can explore common characteristics among them, such as tempo, energy, and valence, to determine what makes them particularly suited for dancing.

In [101]:
high_danceability_tracks = df[df["danceability"] > 0.7].copy()

high_danceability_tracks.insert(0, "position", high_danceability_tracks.index)

high_danceability_tracks = high_danceability_tracks.reset_index(drop=True)

high_danceability_tracks.index += 1

high_danceability_tracks = high_danceability_tracks.drop(
    columns=[
        "album", "track_id", "key", "loudness", "speechiness",
        "instrumentalness", "liveness", "acousticness"
    ]
)

high_danceability_tracks

Unnamed: 0,position,track_name,artist,duration_sec,tempo,genre,energy,danceability,valence
1,2,Dance Monkey,Tones And I,210,98,Alternative/Indie,0.593,0.825,0.54
2,3,The Box,Roddy Ricch,197,117,Hip-Hop/Rap,0.586,0.896,0.642
3,4,Roses - Imanbek Remix,SAINt JHN,176,122,Dance/Electronic,0.721,0.785,0.894
4,5,Don't Start Now,Dua Lipa,183,124,Nu-disco,0.793,0.793,0.679
5,6,ROCKSTAR (feat. Roddy Ricch),DaBaby,182,90,Hip-Hop/Rap,0.69,0.746,0.497
6,8,death bed (coffee for your head),Powfu,173,144,Hip-Hop/Rap,0.431,0.726,0.348
7,9,Falling,Trevor Daniel,159,127,R&B/Hip-Hop alternative,0.43,0.784,0.236
8,11,Tusa,KAROL G,201,101,Pop,0.715,0.803,0.574
9,14,Blueberry Faygo,Lil Mosey,163,99,Hip-Hop/Rap,0.554,0.774,0.349
10,15,Intentions (feat. Quavo),Justin Bieber,213,148,Pop,0.546,0.806,0.874


### **Key Insights**

The results show that **32 out of 50 tracks** in Spotify’s Top 50 have a danceability score above 0.7, representing **64%** of all records in the dataset.

One of the most striking observations is that these highly danceable tracks **tend to cluster** around a specific tempo range, likely between **120** and **130 BPM**. This range is commonly associated with pop, electronic, and dance music, genres that heavily rely on steady, engaging rhythms. The fact that few songs in this subset fall below 100 BPM suggests that slower, more **groove-based rhythms** are less dominant among the most danceable hits. Instead, producers and artists seem to prioritize beats that feel natural for movement, reinforcing how critical tempo selection is in crafting commercially successful music.   
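
One way to check the tempo claim is to count how many tracks fall inside the 120-130 BPM band and how many fall below 100 BPM. The sketch below does this for the ten rows displayed in the table above only, since the full 32-track subset is not shown here. `displayed_tempos` is an illustrative name.

```python
# Tempo (BPM) of the ten high-danceability rows displayed above.
displayed_tempos = [98, 117, 122, 124, 90, 144, 127, 101, 99, 148]

# Tracks inside the 120-130 BPM band (bounds included) and below 100 BPM.
in_band = [t for t in displayed_tempos if 120 <= t <= 130]
below_100 = [t for t in displayed_tempos if t < 100]

total = len(displayed_tempos)
print(f"120-130 BPM: {len(in_band)} of {total}")
print(f"below 100 BPM: {len(below_100)} of {total}")
```

On these ten rows, three tracks sit in the band and three fall below 100 BPM, so the clustering is worth confirming on the full subset, for example with `high_danceability_tracks["tempo"].between(120, 130).mean()`.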

Interestingly, while most of these tracks exhibit **high energy** and **positive valence**, there are some notable **outliers** that are highly danceable but have a **lower valence score**. This suggests that even **darker**, **moodier** tracks can succeed in dance environments, provided they maintain a compelling rhythm and beat structure. This opens up an interesting space for market differentiation, where artists and producers can explore unconventional dance tracks that break away from the typical **"feel-good" formula** yet still engage listeners physically. 

## **3.2 &nbsp; Identifying Low Danceability Tracks**

While highly danceable tracks dominate the charts, there is another category of songs that succeeds despite having a much lower danceability score. These tracks, with a **danceability score below 0.4**, prioritize other musical elements such as lyrical depth, complex instrumentation, or emotional weight over rhythmic engagement.  

In this section, we isolate these low-danceability tracks to better understand their role in the music industry. By filtering the dataset, we can examine which artists and genres contribute to this category and whether these songs share common characteristics. Unlike rhythm-heavy tracks designed for movement, these songs may be more introspective, cinematic, or structured around free-form musical elements rather than predictable beats.  

In [102]:
low_danceability_tracks = df[df["danceability"] < 0.4].copy()

low_danceability_tracks.insert(0, "position", low_danceability_tracks.index)

low_danceability_tracks = low_danceability_tracks.reset_index(drop=True)

low_danceability_tracks.index += 1

low_danceability_tracks = low_danceability_tracks.drop(
    columns=[
        "album", "track_id", "key", "speechiness", 
        "instrumentalness", "liveness"
    ]
)

low_danceability_tracks

| | position | track_name | artist | duration_sec | tempo | genre | loudness | energy | danceability | valence | acousticness |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 45 | lovely (with Khalid) | Billie Eilish | 200 | 115 | Chamber pop | -10.11 | 0.296 | 0.351 | 0.12 | 0.934 |


### **Key Insights**

Filtering the dataset for tracks with a danceability score **below 0.4** returns only **one track**: *lovely (with Khalid)* by Billie Eilish, at position 45 with a score of 0.351. This rarity suggests that nearly all successful songs in the dataset have at least some **rhythmic engagement**.  

### **Market Implications**  

The presence of only a single low-danceability track raises important questions about how commercial success is **shaped by rhythm** and **accessibility**. While music varies widely in structure and purpose, mainstream hits tend to maintain some degree of rhythmic consistency, likely to increase their suitability for **passive listening**, **playlist curation**, and engagement across different contexts.

The dominance of higher-danceability songs suggests that **mainstream music** tends to **avoid** compositions that lack rhythmic flow. With only one track below the 0.4 threshold, and that track still scoring 0.351, the chart contains no truly non-rhythmic or highly experimental music, which is consistent with such music **struggling to achieve mass success** on streaming platforms.

## **3.3 &nbsp; Identifying High-Loudness Tracks**

Loudness plays a crucial role in shaping a song’s impact, influencing how dynamic and powerful it feels to listeners. In modern music production, **tracks are often [mastered](https://www.sageaudio.com/articles/what-is-mastering)** to achieve high loudness levels, making them more attention-grabbing and competitive in streaming environments.  

In this section, we filter the dataset to focus on **tracks with a loudness greater than -5 dB**, highlighting songs that are mastered to be particularly strong and present in a mix. These high-loudness tracks typically belong to genres that emphasize **energetic** and **immersive soundscapes**, such as pop, electronic, and hip-hop.  

By analyzing this subset, we can explore how loudness correlates with other musical features like **energy**, **danceability**, and **tempo**. Additionally, identifying which artists and genres dominate this category provides insights into current industry trends in audio mastering and production techniques.  

In [103]:
high_loudness_tracks = df[df["loudness"] > -5].copy()

high_loudness_tracks.insert(0, "position", high_loudness_tracks.index)

high_loudness_tracks = high_loudness_tracks.reset_index(drop=True)

high_loudness_tracks.index += 1

high_loudness_tracks = high_loudness_tracks.drop(
    columns=[
        "album", "track_id", "key", "speechiness", 
        "instrumentalness", "liveness", "acousticness"
    ]
)

high_loudness_tracks

| | position | track_name | artist | duration_sec | tempo | genre | loudness | energy | danceability | valence |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | Don't Start Now | Dua Lipa | 183 | 124 | Nu-disco | -4.52 | 0.793 | 0.793 | 0.679 |
| 2 | 7 | Watermelon Sugar | Harry Styles | 174 | 95 | Pop | -4.21 | 0.816 | 0.548 | 0.557 |
| 3 | 11 | Tusa | KAROL G | 201 | 101 | Pop | -3.28 | 0.715 | 0.803 | 0.574 |
| 4 | 13 | Circles | Post Malone | 215 | 120 | Pop/Soft Rock | -3.5 | 0.762 | 0.695 | 0.553 |
| 5 | 17 | Before You Go | Lewis Capaldi | 215 | 112 | Alternative/Indie | -4.86 | 0.575 | 0.459 | 0.183 |
| 6 | 18 | Say So | Doja Cat | 238 | 111 | R&B/Soul | -4.58 | 0.673 | 0.787 | 0.786 |
| 7 | 22 | Adore You | Harry Styles | 207 | 99 | Pop | -3.68 | 0.771 | 0.676 | 0.569 |
| 8 | 24 | Mood (feat. iann dior) | 24kGoldn | 141 | 91 | Pop rap | -3.56 | 0.722 | 0.7 | 0.756 |
| 9 | 32 | Break My Heart | Dua Lipa | 222 | 113 | Dance-pop/Disco | -3.43 | 0.729 | 0.73 | 0.467 |
| 10 | 33 | Dynamite | BTS | 199 | 114 | Disco-pop | -4.41 | 0.765 | 0.746 | 0.737 |


The results show that **19 out of 50 tracks** in Spotify’s Top 50 have a loudness greater than -5 dB, representing **38%** of all records in the dataset. The table above shows ten of them.

### **Key Insights**  

By construction, loudness in this subset is higher than in the dataset as a whole, with a median value close to the **-4 dB mark**. That 38% of the chart clears the -5 dB threshold points to an industry-wide preference for high loudness levels. This is in line with the so-called **[“Loudness War”](https://en.wikipedia.org/wiki/Loudness_war)**, where modern music production trends push for **maximizing perceived volume** to capture listener attention instantly.  

These high-loudness tracks go hand in hand with high **energy**. In the rows shown, all but one track score above 0.67, which fits the strong loudness-energy correlation (0.792) reported in Section 5.1 and reinforces the idea that louder songs are produced to feel more dynamic and engaging. **Tempo** is a different story: the displayed tracks range from 91 to 124 BPM, and the loudness-tempo correlation in Section 5.3 is only 0.102, so loud masters are **not tied to fast tempos**. The displayed tracks come mostly from **pop** and **dance-oriented genres**.  

Interestingly, **danceability** in this subset is **mostly high**, though not all high-loudness tracks are dance-oriented. *Before You Go* (0.459) and *Watermelon Sugar* (0.548) are mastered loud without scoring high on danceability, which shows that loudness can serve **power** and **emotional intensity** as well as rhythm.

**Valence**, which measures emotional positivity, shows a more **varied distribution**: it ranges from 0.183 to 0.786 in the rows shown, meaning that both uplifting and intense, dramatic songs can be mastered at high loudness levels.  
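
As a rough check on the figures discussed above, the sketch below summarizes the ten rows displayed in this section's table. It uses only Python's standard library, and the values are copied by hand from that table, so it covers the displayed rows rather than all 19 tracks in the subset. The variable names are illustrative and not part of the original notebook.

```python
from statistics import mean, median

# (loudness dB, energy, tempo BPM) copied from the ten displayed rows.
# The full subset has 19 tracks, so these figures are indicative only.
displayed_rows = [
    (-4.52, 0.793, 124),  # Don't Start Now
    (-4.21, 0.816, 95),   # Watermelon Sugar
    (-3.28, 0.715, 101),  # Tusa
    (-3.50, 0.762, 120),  # Circles
    (-4.86, 0.575, 112),  # Before You Go
    (-4.58, 0.673, 111),  # Say So
    (-3.68, 0.771, 99),   # Adore You
    (-3.56, 0.722, 91),   # Mood (feat. iann dior)
    (-3.43, 0.729, 113),  # Break My Heart
    (-4.41, 0.765, 114),  # Dynamite
]

loudness = [row[0] for row in displayed_rows]
energy = [row[1] for row in displayed_rows]
tempo = [row[2] for row in displayed_rows]

median_loudness = median(loudness)
mean_energy = mean(energy)
tempo_range = (min(tempo), max(tempo))

print(f"median loudness: {median_loudness:.3f} dB")
print(f"mean energy:     {mean_energy:.3f}")
print(f"tempo range:     {tempo_range[0]}-{tempo_range[1]} BPM")
```

For these ten rows the median loudness sits just above -4 dB and the mean energy is about 0.73, while tempo stays between 91 and 124 BPM.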

### **Market Implications**  

From an industry perspective, these findings suggest that high loudness is still a **dominant factor** in commercial music production. The fact that a significant number of tracks fall into this category confirms that, despite some pushback from audio purists, loudness remains a **key ingredient** in competitive music markets.  

For producers and artists, this reinforces the importance of mastering techniques that enhance perceived energy **without sacrificing audio quality**. The continued prevalence of highly compressed, loud tracks suggests that streaming platforms still favor music that maintains a consistent, **punchy sound** across different listening environments. However, as platforms like Spotify introduce **[loudness normalization](https://support.spotify.com/us/artists/article/loudness-normalization/)** features, it will be interesting to see whether these trends shift in the future.  

## **3.4 &nbsp; Identifying Low-Loudness Tracks**

Some songs intentionally maintain lower loudness levels, preserving a more **natural dynamic range** and focusing on **subtlety** rather than sheer intensity.  

In this section, we filter the dataset to include only tracks with a **loudness lower than -8 dB**, highlighting songs that deviate from the typical loudness-maximization trend. These tracks may belong to genres that emphasize softer dynamics, such as **acoustic**, **jazz**, **ambient**, or **indie** music, where preserving detail and contrast between quiet and loud moments enhances emotional depth.  

By analyzing this subset, we can explore how low-loudness tracks differ in terms of **energy**, **tempo**, and **danceability**, and whether they maintain commercial success despite their lower volume levels. Understanding these trends can offer insights into how dynamic range is being used artistically and how it affects a song’s reception in a streaming-dominated market.  

In [104]:
low_loudness_tracks = df[df["loudness"] < -8].copy()

low_loudness_tracks.insert(0, "position", low_loudness_tracks.index)

low_loudness_tracks = low_loudness_tracks.reset_index(drop=True)

low_loudness_tracks.index += 1

low_loudness_tracks = low_loudness_tracks.drop(
    columns=[
        "album", "track_id", "key", "valence", "speechiness", 
        "instrumentalness", "liveness", "acousticness"
    ]
)

low_loudness_tracks

| | position | track_name | artist | duration_sec | tempo | genre | loudness | energy | danceability |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 8 | death bed (coffee for your head) | Powfu | 173 | 144 | Hip-Hop/Rap | -8.76 | 0.431 | 0.726 |
| 2 | 9 | Falling | Trevor Daniel | 159 | 127 | R&B/Hip-Hop alternative | -8.76 | 0.43 | 0.784 |
| 3 | 16 | Toosie Slide | Drake | 247 | 82 | Hip-Hop/Rap | -8.82 | 0.49 | 0.83 |
| 4 | 21 | Savage Love (Laxed - Siren Beat) | Jawsh 685 | 171 | 150 | Hip-Hop/Rap | -8.52 | 0.481 | 0.767 |
| 5 | 25 | everything i wanted | Billie Eilish | 245 | 120 | Pop | -14.45 | 0.225 | 0.704 |
| 6 | 27 | bad guy | Billie Eilish | 194 | 135 | Electro-pop | -10.96 | 0.425 | 0.701 |
| 7 | 37 | HIGHEST IN THE ROOM | Travis Scott | 176 | 76 | Hip-Hop/Rap | -8.76 | 0.427 | 0.598 |
| 8 | 45 | lovely (with Khalid) | Billie Eilish | 200 | 115 | Chamber pop | -10.11 | 0.296 | 0.351 |
| 9 | 48 | If the World Was Ending - feat. Julia Michaels | JP Saxe | 209 | 76 | Pop | -10.09 | 0.473 | 0.464 |


The results show that **9 out of 50 tracks** in Spotify’s Top 50 have a loudness lower than -8 dB, representing **18%** of all records in the dataset.

### **Key Insights**  

By construction, loudness in this subset is **notably lower** than the dataset’s overall mean: the median is -8.82 dB and the mean is about -9.9 dB, pulled down by *everything i wanted* at -14.45 dB. These songs **resist the industry** trend of extreme compression and loudness maximization. Contrary to what one might expect, the subset is not made up of **acoustic**, **jazz**, **classical**, or **ambient music**: four of the nine tracks are **Hip-Hop/Rap** and three are by **Billie Eilish**, which suggests that a quieter master is a stylistic choice within mainstream genres rather than a marker of niche ones.  

Another striking feature of this subset is its **correlation** with **lower energy levels**: every track has an energy score below 0.5, and the lowest is 0.225. **Tempo**, however, does **not** trend lower. The nine tracks range from 76 to 150 BPM, with a mean of roughly 114 BPM, so quiet masters span both slow and fast songs.  

**Danceability** is another key metric to consider. One might expect low-loudness songs to be less danceable, but the data says otherwise: six of the nine tracks score 0.70 or higher, and only *lovely (with Khalid)* (0.351) and *If the World Was Ending* (0.464) fall below 0.5. This is consistent with the weak loudness-danceability correlation (0.167) in Section 5.3: softer tracks can remain rhythmically engaging despite their lower volume levels.  
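
The subset is small enough to summarize directly. The sketch below is a hand-copied check on the nine rows in this section's table, using only the standard library; it is not part of the original notebook, and the names are illustrative.

```python
from statistics import mean, median

# (loudness dB, energy, danceability, tempo BPM) for the nine low-loudness tracks.
low_loudness_rows = [
    (-8.76, 0.431, 0.726, 144),   # death bed (coffee for your head)
    (-8.76, 0.430, 0.784, 127),   # Falling
    (-8.82, 0.490, 0.830, 82),    # Toosie Slide
    (-8.52, 0.481, 0.767, 150),   # Savage Love (Laxed - Siren Beat)
    (-14.45, 0.225, 0.704, 120),  # everything i wanted
    (-10.96, 0.425, 0.701, 135),  # bad guy
    (-8.76, 0.427, 0.598, 76),    # HIGHEST IN THE ROOM
    (-10.11, 0.296, 0.351, 115),  # lovely (with Khalid)
    (-10.09, 0.473, 0.464, 76),   # If the World Was Ending
]

loudness = [row[0] for row in low_loudness_rows]
energy = [row[1] for row in low_loudness_rows]
danceability = [row[2] for row in low_loudness_rows]
tempo = [row[3] for row in low_loudness_rows]

median_loudness = median(loudness)
mean_loudness = mean(loudness)
max_energy = max(energy)
mean_tempo = mean(tempo)
# Number of tracks with danceability of 0.70 or higher.
danceable_count = sum(1 for d in danceability if d >= 0.70)

print(f"median loudness: {median_loudness} dB")
print(f"mean loudness:   {mean_loudness:.2f} dB")
print(f"highest energy:  {max_energy}")
print(f"mean tempo:      {mean_tempo:.1f} BPM")
print(f"danceability >= 0.70: {danceable_count} of {len(danceability)}")
```

The output shows a subset that is quiet and low in energy but neither slow nor undanceable.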

### **Market Implications**  

As streaming platforms introduce **loudness normalization**, the traditional advantage of mastering songs at higher volumes is **diminishing**. This could open more opportunities for **dynamically rich**, lower-loudness tracks to thrive, shifting the focus back to **composition**, **arrangement**, and emotional connection rather than pure sonic intensity.  

These findings indicate that while high loudness remains a dominant factor in mainstream hits, there is still **room** for softer, more dynamically rich tracks to find success, particularly in **niche markets** and curated playlists. Future research could explore how listener engagement differs between high and low loudness tracks across different contexts.  

## **3.5 &nbsp; Identifying the Longest and Shortest Songs**

Song duration plays a significant role in shaping listening habits, streaming success, and radio playability. While most mainstream tracks follow a **conventional length**, some outliers break the norm, either by being exceptionally short or significantly longer than the average hit song.  

In this section, we identify the **longest** and **shortest songs** in the dataset and analyze their characteristics. Examining these extremes provides valuable insights into how song length influences market performance and listener engagement.  

In [105]:
longest_song = df[df["duration_sec"] == df["duration_sec"].max()].copy()
shortest_song = df[df["duration_sec"] == df["duration_sec"].min()].copy()

# Keep the chart position (the original index) as the first column.
longest_song.insert(0, "position", longest_song.index)
shortest_song.insert(0, "position", shortest_song.index)

# Label the rows; this assumes exactly one longest and one shortest track.
longest_song.index = ["longest"]
shortest_song.index = ["shortest"]

duration_extremes = pd.concat([shortest_song, longest_song])

duration_extremes = duration_extremes.drop(
    columns=[
        "album", "track_id", "key", "valence", 
        "instrumentalness", "liveness", "acousticness"
    ]
)

duration_extremes

| | position | track_name | artist | duration_sec | tempo | genre | loudness | energy | danceability | speechiness |
|---|---|---|---|---|---|---|---|---|---|---|
| shortest | 24 | Mood (feat. iann dior) | 24kGoldn | 141 | 91 | Pop rap | -3.56 | 0.722 | 0.7 | 0.037 |
| longest | 50 | SICKO MODE | Travis Scott | 313 | 155 | Hip-Hop/Rap | -3.71 | 0.73 | 0.834 | 0.222 |


The dataset reveals two extreme cases in terms of song duration:  
- The **shortest song**, *Mood (feat. iann dior)* by **24kGoldn**, has a length of **141 seconds** (2 minutes and 21 seconds).  
- The **longest song**, *SICKO MODE* by **Travis Scott**, runs for **313 seconds** (5 minutes and 13 seconds).  
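
The minutes-and-seconds figures above follow from integer division of `duration_sec` by 60. A minimal sketch of that conversion (`format_duration` is a hypothetical helper, not a function from the notebook):

```python
def format_duration(seconds: int) -> str:
    """Convert a duration in whole seconds to an 'M:SS' string."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}:{remainder:02d}"


print(format_duration(141))  # 2:21 (Mood)
print(format_duration(313))  # 5:13 (SICKO MODE)
```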

#### **The Shortest Song – “Mood” (24kGoldn)**  
***Mood*** stands out as a compact, high-energy **Pop rap** track with a tempo of **91 BPM** and a high valence score of 0.756, indicating an overall **positive** and **engaging** vibe. Despite its short length, it maintains a **high danceability score** (0.7) and an energy level of 0.722, making it both rhythmically engaging and sonically powerful.  

A key factor contributing to its success could be its **short duration**, which aligns with modern streaming trends. Many artists **intentionally** keep songs under 3 minutes to encourage **repeat plays**, **boosting streaming numbers** on platforms like Spotify. The track also has a **low speechiness score**, meaning it is more melodic than spoken-word-heavy, distinguishing it from traditional rap tracks.  

#### **The Longest Song – “SICKO MODE” (Travis Scott)**  
***SICKO MODE*** follows a very different structure. As a **Hip-Hop/Rap** track with a length of **over 5 minutes**, it breaks away from the conventional length of mainstream hits. It has a **high tempo (155 BPM)**, indicating an intense, fast-paced energy that keeps listeners engaged throughout its multiple **beat-switches** and **structural shifts**.  

Despite its extended length, *SICKO MODE* maintains **high danceability** and **energy**, showing that even longer songs can be engaging and commercially viable. Its **valence score** is significantly **lower** than *Mood*, suggesting a more neutral or **darker** emotional tone. Notably, its **speechiness** is **much higher**, reflecting the rap-heavy nature of the song, which incorporates multiple verses and changes in flow.  

### **Market Implications**  
The stark contrast between these two songs reflects **two different strategies** in modern music production. Shorter songs like *Mood* maximize **replay value** and fit seamlessly into playlist culture, making them more digestible for streaming audiences. On the other hand, longer tracks like *SICKO MODE* prioritize **storytelling**, **structural complexity**, and **artistic experimentation**.  

Despite their differences in duration, both songs maintain **high loudness** (-3.56 dB and -3.71 dB, respectively), ensuring that they remain sonically impactful.

The presence of an unusually long rap song in the dataset indicates that length alone is **not** a limiting factor for mainstream success, provided the song is **dynamic**, **engaging**, and **structured** in a way that retains listener attention. Meanwhile, the trend of shorter songs dominating streaming platforms suggests that **brevity can be a strategic advantage** in today’s music landscape.  

&nbsp;

---

# **| &nbsp; 4 &nbsp; Examining Genre Popularity and Diversity**  

## **4.1 &nbsp; Identifying the Most Common Genres**

Genre plays a fundamental role in shaping **musical identity** and listener preferences. Some genres dominate the charts, appearing frequently across hit songs, while others remain niche, with fewer mainstream representations.  

In this section, we analyze the frequency of different genres within the dataset, identifying which musical styles **appear more than once**. By counting occurrences and isolating the most represented genres, we can gain insights into which styles are the most commercially successful and prevalent in the top-ranked tracks.  

Understanding genre distribution offers valuable insights into how **industry focus shifts over time**, as certain genres become more dominant due to streaming trends, **cultural influence**, or evolving production techniques.  

In [106]:
count = df["genre"].value_counts()

repeated_genres = count[count > 1].to_frame(name="appearances").reset_index()

repeated_genres.columns = ["genre", "appearances"]

repeated_genres.index += 1

repeated_genres

| | genre | appearances |
|---|---|---|
| 1 | Pop | 14 |
| 2 | Hip-Hop/Rap | 13 |
| 3 | Dance/Electronic | 5 |
| 4 | Alternative/Indie | 4 |
| 5 | R&B/Soul | 2 |
| 6 | Electro-pop | 2 |


### **Key Insights** 

The dataset reveals a clear dominance of **Pop** (14 appearances, **28%**) and **Hip-Hop/Rap** (13 appearances, **26%**), highlighting their strong influence on contemporary mainstream music.

**Pop** leads with **14 appearances**, reinforcing its status as the most accessible and globally popular genre. Pop music is known for its **catchy melodies**, **structured songwriting**, and **broad audience reach**, making it a dominant force in streaming and radio play. The fact that Pop holds the top position suggests that the genre continues to evolve while maintaining its mass appeal.  

**Hip-Hop/Rap** closely follows with **13 appearances**, reflecting its continued **growth** and cultural influence. Over the past decade, Hip-Hop has solidified itself as a global powerhouse, driven by streaming dominance, viral trends, and its ability to **blend with other genres**. Its near-parity with Pop in this dataset suggests that rap music is **no longer** a niche genre but rather a mainstream pillar of the industry.  

Other genres such as **Dance/Electronic**, **Alternative/Indie**, **R&B/Soul**, and **Electro-pop** appear less frequently, though they still contribute to the diversity of the dataset. The presence of Dance/Electronic reflects the demand for **high-energy**, **club-friendly music**, while Alternative/Indie maintains a steady presence despite being less commercially dominant. R&B/Soul and Electro-pop, with only **two appearances each**, suggest that while these genres still produce successful hits, they **do not** dominate the charts to the same extent as Pop and Hip-Hop/Rap.  
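
The percentages quoted above are each genre's count divided by the 50 tracks in the dataset. The sketch below reproduces that arithmetic from the counts in this section's table; the dictionary is typed in by hand and the variable names are illustrative.

```python
# Counts copied from the table of repeated genres; 50 is the dataset size.
TOTAL_TRACKS = 50
repeated_genre_counts = {
    "Pop": 14,
    "Hip-Hop/Rap": 13,
    "Dance/Electronic": 5,
    "Alternative/Indie": 4,
    "R&B/Soul": 2,
    "Electro-pop": 2,
}

# Share of the chart held by each repeated genre, in percent.
shares = {
    genre: 100 * count / TOTAL_TRACKS
    for genre, count in repeated_genre_counts.items()
}
top_two_share = shares["Pop"] + shares["Hip-Hop/Rap"]
tracks_in_repeated_genres = sum(repeated_genre_counts.values())

for genre, share in shares.items():
    print(f"{genre:<18} {share:4.0f}%")
print(f"Pop + Hip-Hop/Rap: {top_two_share:.0f}%")
print(f"Tracks in repeated genres: {tracks_in_repeated_genres} of {TOTAL_TRACKS}")
```

Pop and Hip-Hop/Rap together cover 54% of the chart, and the six repeated genres account for 40 of the 50 tracks.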

### **Market Implications**  

This genre distribution underscores a few key industry trends. Together, **Pop** and **Hip-Hop/Rap** account for 27 of the 50 tracks (54%), which indicates that these genres have **adapted** well to streaming culture, playlist curation, and **viral marketing**. Meanwhile, the lower frequency of Alternative/Indie, R&B/Soul, and Electro-pop suggests that these genres may find success through **more dedicated fan bases** rather than mainstream saturation.  

The presence of **Dance/Electronic** (five tracks) suggests that high-energy music still holds a place in mainstream culture, particularly in club and festival settings. However, its far lower count than Pop or Hip-Hop/Rap suggests that pure electronic music may be **less dominant** in top streaming charts than hybrid genres that incorporate pop or rap elements.   

## **4.2 &nbsp; Identifying Unique Genres**

While some genres dominate the music charts, others appear less frequently, representing more niche or specialized musical styles. These unique genres contribute to the **diversity** of the dataset, showcasing the presence of less mainstream but still commercially successful songs.  

In this section, we filter the dataset to identify **genres that appear only once**. By analyzing these one-off genres, we can gain insights into how less common styles break into the mainstream, whether through **viral success**, **cross-genre appeal**, or **dedicated fanbases**. 

In [107]:
count = df["genre"].value_counts()

# Keep only the genres that occur exactly once.
unique_genres = count[count == 1].to_frame(name="appearances").reset_index()

unique_genres.columns = ["genre", "appearances"]

unique_genres.index += 1

unique_genres

| | genre | appearances |
|---|---|---|
| 1 | Nu-disco | 1 |
| 2 | R&B/Hip-Hop alternative | 1 |
| 3 | Pop/Soft Rock | 1 |
| 4 | Pop rap | 1 |
| 5 | Hip-Hop/Trap | 1 |
| 6 | Dance-pop/Disco | 1 |
| 7 | Disco-pop | 1 |
| 8 | Dreampop/Hip-Hop/R&B | 1 |
| 9 | Alternative/reggaeton/experimental | 1 |
| 10 | Chamber pop | 1 |


The dataset includes **10 unique genres**, each appearing only once.

### **Key Insights**  

A significant trend in this list is the presence of **genre hybrids**, where multiple styles merge to create distinct sonic identities. Examples include **R&B/Hip-Hop Alternative**, **Dreampop/Hip-Hop/R&B**, and **Alternative/Reggaeton/Experimental**, all of which indicate that genre boundaries are increasingly **fluid** in modern music. This trend aligns with **streaming culture**, where listeners explore broader influences rather than adhering strictly to traditional categories.  

There is also a notable presence of **retro-inspired genres**, such as **Nu-disco**, **Dance-pop/Disco**, and **Disco-pop**. These genres draw inspiration from past musical movements while incorporating modern production techniques, reflecting a resurgence of **disco** and **funk** influences in contemporary pop and electronic music. This suggests that **nostalgia-driven** sounds remain commercially viable, particularly as artists continue to **reinterpret** and **modernize** older styles.  

Additionally, some genres emphasize **softer**, more **intricate** compositions, such as **Chamber Pop** and **Pop/Soft Rock**. These styles tend to focus on **orchestral elements**, **layered instrumentation**, and **atmospheric production**, differentiating them from the high-energy, beat-driven nature of mainstream hits.
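
The groupings described above can be approximated with simple string checks on the genre labels. The sketch below treats a `/` in a label as a sign of a hybrid genre and the substring "disco" as a sign of a retro-inspired one. Both rules are assumptions made for illustration, not classifications provided by the dataset.

```python
# The ten single-appearance genre labels, copied from the table in this section.
unique_genre_labels = [
    "Nu-disco",
    "R&B/Hip-Hop alternative",
    "Pop/Soft Rock",
    "Pop rap",
    "Hip-Hop/Trap",
    "Dance-pop/Disco",
    "Disco-pop",
    "Dreampop/Hip-Hop/R&B",
    "Alternative/reggaeton/experimental",
    "Chamber pop",
]

# Assumption: a slash in the label means the genre combines several styles.
hybrid_labels = [label for label in unique_genre_labels if "/" in label]

# Assumption: any label mentioning "disco" counts as retro-inspired.
disco_labels = [label for label in unique_genre_labels if "disco" in label.lower()]

print(f"hybrid labels ({len(hybrid_labels)}): {hybrid_labels}")
print(f"disco labels ({len(disco_labels)}): {disco_labels}")
```

Under these rules, six of the ten labels are hybrids and three are disco-related, with *Dance-pop/Disco* falling into both groups.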

### **Market Implications**   

The presence of these unique genres in the dataset highlights how niche and hybrid styles can still find mainstream **recognition**. The increasing prevalence of genre fusion reflects **changing listener preferences**, where audiences engage with music based on mood and sound rather than rigid genre classifications.  

For industry professionals, this underscores the importance of **adaptability** and **experimentation** in music production. Artists who blend elements from multiple genres can **expand their reach** and appeal to diverse audiences, while retro-inspired sounds continue to capture listener interest. Streaming platforms further support this trend, as curated playlists and **algorithm-driven recommendations** expose listeners to a wider range of genres than traditional radio ever could.   

&nbsp;

---

# **| &nbsp; 5 &nbsp; Examining Correlations Between Musical Attributes**   

Musical attributes do not exist in isolation; they interact in ways that shape the overall sound, feel, and impact of a song. Some features exhibit **strong positive correlations**, reinforcing each other, while others show **negative relationships**, indicating trade-offs in production and composition. At the same time, certain attributes remain largely **independent**, highlighting the flexibility and diversity of modern music.  

In this section, we explore the **three** key types of **correlations** among musical features:  

- **Strong Positive Correlations** – Identifying attributes that frequently appear together, which help define the characteristics of commercially successful songs.  
- **Strong Negative Correlations** – Highlighting **inverse relationships**, which indicate that certain production choices limit the presence of other features.  
- **Neutral or Weak Correlations** – Investigating **independent attributes** that do not strongly influence each other, allowing for greater creative freedom in songwriting and production.
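
The three categories above can be written as a small rule, using the ±0.4 cutoff applied in the subsections that follow. `classify` is a hypothetical helper for illustration; the notebook itself applies the cutoff by masking the correlation matrix. The example values are taken from the correlation tables later in this section.

```python
THRESHOLD = 0.4  # cutoff used by the three filters in this section


def classify(r: float, threshold: float = THRESHOLD) -> str:
    """Label a correlation coefficient as strong positive, strong negative, or weak."""
    if r >= threshold:
        return "strong positive"
    if r <= -threshold:
        return "strong negative"
    return "weak"


# Example coefficients copied from the tables in Sections 5.1 to 5.3.
examples = {
    ("loudness", "energy"): 0.792,
    ("energy", "acousticness"): -0.682,
    ("tempo", "danceability"): 0.169,
}

labels = {pair: classify(r) for pair, r in examples.items()}
for (a, b), label in labels.items():
    print(f"{a} vs {b}: {label}")
```

Values exactly at ±0.4 count as strong here, which matches the `< 0.4` and `> -0.4` comparisons used in the masking code.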

## **5.1 &nbsp; Positive Correlations**  

Understanding how different musical features relate to one another provides valuable insights into the structure of popular songs. Certain attributes, such as **energy**, **loudness**, and **danceability**, are often closely linked, influencing the way a track is perceived by listeners. By analyzing correlations between numerical features, we can identify patterns that define commercially successful music.  
 
Strong correlations indicate that certain musical characteristics **tend to appear together**. Identifying these relationships helps us understand how song attributes **contribute** to a track’s overall appeal and how different features interact to shape the listening experience.  
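
By default, pandas' `DataFrame.corr()` computes Pearson's correlation coefficient, which measures how closely two features follow a straight-line relationship on a scale from -1 to 1. The sketch below implements that coefficient by hand on toy values, which are invented for illustration and not taken from the dataset, to show what a single cell of a correlation matrix represents.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Toy values (not from the dataset): energy rises with loudness,
# acousticness falls with it.
loudness_toy = [-9.0, -7.0, -5.0, -3.0]
energy_toy = [0.3, 0.5, 0.7, 0.9]
acousticness_toy = [0.9, 0.7, 0.5, 0.3]

r_energy = pearson(loudness_toy, energy_toy)
r_acoustic = pearson(loudness_toy, acousticness_toy)

print(f"loudness vs energy:       {r_energy:.3f}")
print(f"loudness vs acousticness: {r_acoustic:.3f}")
```

Because the toy values lie on perfect straight lines, the two coefficients come out at the extremes of the scale; real features, such as those analyzed here, fall somewhere in between.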

In this section, we calculate the **correlation matrix** for all numerical attributes, excluding the musical key. To enhance readability, we focus only on **strong positive correlations** (**0.4** or higher), highlighting the most significant relationships between features. The diagonal, where each feature correlates perfectly with itself (1.0), is replaced with a `---` placeholder to avoid redundancy, allowing us to focus on meaningful connections between attributes.    

In [108]:
numeric_df = df.select_dtypes(include=["number"]).drop(columns=["key"])

corr_matrix = numeric_df.corr().round(3)

corr_matrix_positive = corr_matrix.mask(corr_matrix < 0.4, "").copy()

corr_matrix_positive = corr_matrix_positive.replace(1.0, "---")

corr_matrix_positive

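| | duration_sec | tempo | loudness | energy | danceability | valence | speechiness | instrumentalness | liveness | acousticness |
|---|---|---|---|---|---|---|---|---|---|---|
| duration_sec | --- | | | | | | | | | |
| tempo | | --- | | | | | | | | |
| loudness | | | --- | 0.792 | | 0.407 | | | | |
| energy | | | 0.792 | --- | | | | | | |
| danceability | | | | | --- | 0.48 | | | | |
| valence | | | 0.407 | | 0.48 | --- | | | | |
| speechiness | | | | | | | --- | | | |
| instrumentalness | | | | | | | | --- | | |
| liveness | | | | | | | | | --- | |
| acousticness | | | | | | | | | | --- |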


### **Key Insights**  

One of the **strongest** correlations in the dataset is between **loudness** and **energy** (0.792). This suggests that louder songs tend to have higher energy levels, which aligns with modern music production techniques where **[compression](https://www.izotope.com/en/learn/what-is-audio-compression.html?srsltid=AfmBOopG7anG3l3fD-syZ55H2P5fSb_S-92FHUj-yHyJVyIxh65qmj37)** and **[limiting](https://www.armadamusic.com/university/music-production-articles/how-to-use-limiters-limiting-explained)** are used to create more intense, impactful tracks. This correlation confirms that energetic songs are often mastered at higher loudness levels to maintain their intensity across different listening environments.  

There is also a **moderate correlation** between **loudness** and **valence** (0.407), indicating that brighter, more positive songs tend to be louder. This may be due to production choices, where uplifting tracks are often mixed to sound **fuller** and more dynamic. However, the correlation is not as strong as the one with energy, suggesting that while positivity is associated with loudness to some extent, it is **not** a defining factor.  

Another important relationship is the **correlation** between **danceability** and **valence** (0.48). This suggests that songs that are rhythmically engaging tend to have a more positive emotional tone. This makes sense, as danceable tracks are often designed for **enjoyment** and **social settings**, reinforcing the idea that happier songs are **more likely to encourage movement**.

### **What’s Missing?**  

Interestingly, **tempo** does not show strong correlations with any features in this filtered matrix, which suggests that a song's speed is **not necessarily** tied to its loudness, energy, or danceability in a linear way. While faster tempos may be common in dance tracks, the variability in other genres could explain why tempo does not strongly correlate with specific musical attributes in this dataset.  

Additionally, **speechiness**, **instrumentalness**, **liveness**, and **acousticness** do not appear in the filtered results, meaning they do not have strong positive correlations with other features above the 0.4 threshold. This suggests that these attributes **may function independently** or have more nuanced relationships that do not directly align with traditional energy, danceability, or loudness metrics.  
 
Future analysis could explore whether these trends hold across different genres or how correlations shift over time in evolving music trends.  

## **5.2 &nbsp; Negative Correlations**  

While strong positive correlations reveal how musical attributes reinforce each other, negative correlations are equally important in understanding which features tend to move in **opposite directions**.

In this section, we focus on **strong negative correlations** (below **-0.4**) while filtering out weaker relationships. By highlighting only the most significant inverse connections, we can better understand how certain musical features compete or **counterbalance** each other.  

In [109]:
corr_matrix_negative = corr_matrix.mask(
    (corr_matrix > -0.4) & (corr_matrix != 1.0), ""
).copy()

corr_matrix_negative = corr_matrix_negative.replace(1.0, "---")

corr_matrix_negative

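| | duration_sec | tempo | loudness | energy | danceability | valence | speechiness | instrumentalness | liveness | acousticness |
|---|---|---|---|---|---|---|---|---|---|---|
| duration_sec | --- | | | | | | | | | |
| tempo | | --- | | | | | | | | |
| loudness | | | --- | | | | | -0.553 | | -0.499 |
| energy | | | | --- | | | | | | -0.682 |
| danceability | | | | | --- | | | | | |
| valence | | | | | | --- | | | | |
| speechiness | | | | | | | --- | | | |
| instrumentalness | | | -0.553 | | | | | --- | | |
| liveness | | | | | | | | | --- | |
| acousticness | | | -0.499 | -0.682 | | | | | | --- |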


### **Key Insights**  

One of the most significant, yet predictable, negative correlations is between **loudness** and **acousticness** (-0.499). This suggests that louder tracks **tend to be less acoustic**, which aligns with modern production techniques where heavily processed and electronically produced music is mastered at high volume levels. Conversely, acoustic-heavy tracks often maintain a more **natural** dynamic range, avoiding the extreme compression used to maximize loudness.   

A somewhat stronger negative correlation is found between **loudness** and **instrumentalness** (-0.553), indicating that tracks with high loudness levels are **less likely** to be purely instrumental. This suggests that vocal-driven songs tend to be mastered louder than instrumental compositions, which may be due to commercial radio and streaming trends where high loudness **enhances vocal presence**. On the other hand, instrumental music often retains more dynamic range, meaning it does **not require** the same level of compression and loudness maximization as mainstream vocal tracks. This is common in **classical**, **ambient**, and **jazz music**, where a wider dynamic spectrum is preferred over sheer volume. 

The strongest inverse relationship in the table is between **energy** and **acousticness** (-0.682), which suggests that high-energy tracks tend to rely on **synthetic** or **heavily processed** sounds rather than natural acoustic elements. This is common in genres like **EDM** and **pop**, where high-energy production is achieved through digital instrumentation rather than live recordings.  
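
Because a correlation matrix is symmetric, every relationship appears twice in the table above. The sketch below stores each pair once and ranks the strong negative ones by strength. The coefficients are copied by hand from the tables in Sections 5.1 to 5.3 for four of the features, and the structure is illustrative rather than taken from the notebook.

```python
# One entry per feature pair (the matrix is symmetric, so order does not matter).
pair_correlations = {
    ("loudness", "energy"): 0.792,
    ("loudness", "instrumentalness"): -0.553,
    ("loudness", "acousticness"): -0.499,
    ("energy", "instrumentalness"): -0.386,
    ("energy", "acousticness"): -0.682,
    ("instrumentalness", "acousticness"): 0.352,
}

# Keep pairs at or below -0.4 and sort from most to least negative.
strong_negative = sorted(
    ((pair, r) for pair, r in pair_correlations.items() if r <= -0.4),
    key=lambda item: item[1],
)

for (a, b), r in strong_negative:
    print(f"{a} vs {b}: {r}")
```

Ranked this way, energy and acousticness form the strongest inverse pair, followed by loudness with instrumentalness and loudness with acousticness.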

### **What’s Missing?**  

Surprisingly, the table **doesn't** show an inverse relationship between **speechiness** and **instrumentalness**, even though one would make sense; the matrix in Section 5.3 puts their correlation at just 0.028. Songs with more **spoken elements** (such as rap and hip-hop) tend to have less instrumentation, often relying on **minimal beats** or **repetitive loops**, whereas instrumental-focused compositions prioritize complex arrangements over vocals.

## **5.3 &nbsp; Neutral or Weak Correlations**  

While strong positive and negative correlations highlight clear relationships between musical attributes, some features exhibit **little to no correlation** with each other. These weak or neutral correlations suggest that certain musical characteristics **vary independently**, meaning that changes in one attribute do not consistently influence another.  

For example, if tempo shows no strong correlation with danceability, it suggests that a song’s speed **alone** does not determine how rhythmically engaging it is. Similarly, if tempo and valence have a weak relationship, it may indicate that songs can be **both** fast and melancholic or slow and upbeat **without a clear pattern**.  

In this section, we **mask** all correlations **stronger than ±0.4**, allowing us to focus only on **features that do not exhibit a strong linear relationship**. By isolating these weak or neutral correlations, we can better understand the **independent nature of certain musical elements** and how they contribute uniquely to a song’s identity.  

In [110]:
corr_matrix_neutral = corr_matrix.mask(
    ((corr_matrix <= -0.4) | (corr_matrix >= 0.4)) & (corr_matrix != 1.0), ""
).copy()

corr_matrix_neutral = corr_matrix_neutral.replace(1.0, "---")

corr_matrix_neutral

| | duration_sec | tempo | loudness | energy | danceability | valence | speechiness | instrumentalness | liveness | acousticness |
|---|---|---|---|---|---|---|---|---|---|---|
| duration_sec | --- | 0.129 | 0.066 | 0.084 | -0.032 | -0.039 | 0.368 | 0.183 | -0.09 | -0.013 |
| tempo | 0.129 | --- | 0.102 | 0.075 | 0.169 | 0.047 | 0.215 | 0.019 | 0.026 | -0.24 |
| loudness | 0.066 | 0.102 | --- | | 0.167 | | -0.021 | | -0.07 | |
| energy | 0.084 | 0.075 | | --- | 0.153 | 0.393 | 0.074 | -0.386 | 0.07 | |
| danceability | -0.032 | 0.169 | 0.167 | 0.153 | --- | | 0.226 | -0.018 | -0.007 | -0.359 |
| valence | -0.039 | 0.047 | | 0.393 | | --- | 0.055 | -0.203 | -0.033 | -0.243 |
| speechiness | 0.368 | 0.215 | -0.021 | 0.074 | 0.226 | 0.055 | --- | 0.028 | -0.143 | -0.135 |
| instrumentalness | 0.183 | 0.019 | | -0.386 | -0.018 | -0.203 | 0.028 | --- | -0.087 | 0.352 |
| liveness | -0.09 | 0.026 | -0.07 | 0.07 | -0.007 | -0.033 | -0.143 | -0.087 | --- | -0.129 |
| acousticness | -0.013 | -0.24 | | | -0.359 | -0.243 | -0.135 | 0.352 | -0.129 | --- |
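
Because a correlation matrix is symmetric, every weak pair appears twice in the table. As a reading aid, the sketch below lists each pair once and sorts the result by absolute strength. The helper name `weak_pairs` is hypothetical and not part of the original notebook. The small matrix reuses four features and their values from the table above, so the example runs on its own. It keeps masked cells as `NaN` rather than empty strings so the values stay numeric.

```python
import numpy as np
import pandas as pd


def weak_pairs(corr: pd.DataFrame, threshold: float = 0.4) -> pd.Series:
    """Return each feature pair once, keeping only |r| < threshold."""
    # Keep the strict upper triangle so (a, b) and (b, a) are not both listed
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    weak = pairs[pairs.abs() < threshold]
    # Order from the strongest of the weak pairs down to the weakest
    return weak.sort_values(key=lambda s: s.abs(), ascending=False)


# Subset of the matrix above (values copied from the table)
features = ["tempo", "energy", "valence", "instrumentalness"]
values = [
    [1.0, 0.075, 0.047, 0.019],
    [0.075, 1.0, 0.393, -0.386],
    [0.047, 0.393, 1.0, -0.203],
    [0.019, -0.386, -0.203, 1.0],
]
corr_subset = pd.DataFrame(values, index=features, columns=features)

weak = weak_pairs(corr_subset)
print(weak)
```

Calling `weak_pairs(corr_matrix)` on the full matrix should work the same way, assuming `corr_matrix` is the numeric DataFrame used in the cell above.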


### **Key Insights**  

A significant takeaway is that **tempo** does **not** exhibit strong correlations with other features. While there is a slight positive relationship with **danceability** (0.169) and **speechiness** (0.215), the weak connection suggests that a song's speed alone is not the defining factor in making it more danceable or speech-heavy. This **contradicts** the assumption that faster songs are inherently more rhythmic or engaging, reinforcing the idea that **groove**, beat structure, and instrumental arrangement play a more crucial role than tempo alone.  

Another notable finding is that **loudness** shows only weak connections to **tempo** (0.102) and **danceability** (0.167). Its cells for energy, valence, instrumentalness, and acousticness are blank in this table because those correlations exceed the ±0.4 threshold, so loudness and energy remain strongly linked. The weak link with danceability suggests that danceable songs are **not necessarily** mastered at high volumes, and that production choices beyond loudness, such as **rhythm**, **bassline**, and instrumentation, shape a song's dance appeal.  

A particularly interesting observation is that **acousticness** does **not** strongly correlate with most features. It has weak-to-moderate negative correlations with **danceability** (-0.359), **valence** (-0.243), and **tempo** (-0.24), and a moderate positive one with **instrumentalness** (0.352), while its relationships with speechiness (-0.135), liveness (-0.129), and duration (-0.013) remain weak. The exceptions are loudness and energy, whose cells are blank because they exceed the ±0.4 threshold. This suggests that, apart from how loud and energetic a track is, acoustic elements can be incorporated across a **wide spectrum** of musical styles without following a strict pattern.  

Additionally, **instrumentalness** and **danceability** show no significant relationship (-0.018). This finding **challenges** the assumption that instrumental tracks are inherently **less danceable**, indicating that instrumental songs can still possess rhythmic structures that make them suitable for dancing.  
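
To put "weak" into numbers, one common reading is the squared correlation, which for a simple straight-line relationship gives the share of variance in one feature that the other accounts for. The sketch below applies it to correlations quoted in this section. It is only an illustration of scale, not a significance test, and the dictionary names are hypothetical.

```python
# Correlations quoted in this section (copied from the neutral matrix)
weak_correlations = {
    "tempo vs danceability": 0.169,
    "tempo vs speechiness": 0.215,
    "instrumentalness vs danceability": -0.018,
    "acousticness vs danceability": -0.359,
}

# r squared: share of variance a straight-line fit between the two features accounts for
shared_variance = {pair: r ** 2 for pair, r in weak_correlations.items()}

for pair, r2 in shared_variance.items():
    print(f"{pair}: r = {weak_correlations[pair]:+.3f}, r^2 = {r2:.1%}")
```

Even the largest value here, acousticness vs danceability, corresponds to roughly 13% of shared variance, which is why these pairs are treated as largely independent.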

### **Implications for Music Production**  

The lack of strong correlations in this matrix underscores the **versatility and diversity of modern music production**. Unlike loudness, energy, or danceability, which often show clear relationships, these weak correlations indicate that **some musical elements function independently, allowing for greater creative freedom**.  

For example, an artist could create a **highly danceable slow-tempo** track or a **danceable** song that is **not mastered especially loud** without contradicting the patterns in this dataset. Similarly, acoustic elements can appear in **both upbeat** and **melancholic compositions**, demonstrating the genre-blending tendencies in contemporary music.  

&nbsp;

---

# **| &nbsp; 6 &nbsp; How Musical Features Vary Across Genres**

Music genres are more than just stylistic labels: they influence how songs are structured, produced, and experienced by listeners. Different genres emphasize distinct musical characteristics, shaping their **rhythmic patterns**, **emotional tone**, and **production techniques**. 

In this section, we examine how key musical attributes vary across four major genres: **Pop**, **Hip-Hop/Rap**, **Dance/Electronic**, and **Alternative/Indie**. By analyzing **danceability**, **loudness**, **acousticness**, **energy**, and **valence**, we gain insights into how each genre prioritizes movement, sonic intensity, organic instrumentation, emotional tone, and overall impact.  

### **Why This Analysis Matters** 

Understanding genre-based differences in musical features provides a **data-driven perspective** on:  
- How production choices define a genre's sonic identity (e.g., the high-energy nature of Dance/Electronic vs. the dynamic range of Alternative/Indie).  
- Which genres are more rhythmically engaging, emotionally uplifting, or acoustically rich.   
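
The five comparisons in this section all follow the same pattern: filter to the four genres, group by genre, and describe one feature. As a minimal sketch, that pattern could be wrapped in a single helper. The function name `describe_by_genre` and the sample rows below are hypothetical and exist only to make the example runnable. The column names `genre` and `danceability` match those used in the notebook.

```python
import pandas as pd

MAJOR_GENRES = ["Pop", "Hip-Hop/Rap", "Dance/Electronic", "Alternative/Indie"]


def describe_by_genre(df: pd.DataFrame, feature: str, decimals: int = 3) -> pd.DataFrame:
    """Summary statistics of one feature for each of the four major genres."""
    subset = df[df["genre"].isin(MAJOR_GENRES)]
    summary = subset.groupby("genre")[feature].describe().round(decimals)
    summary.index.name = None  # drop the index label for a cleaner display
    return summary


# Made-up rows for illustration only; they are not taken from the dataset
sample = pd.DataFrame(
    {
        "genre": ["Pop", "Pop", "Hip-Hop/Rap", "Hip-Hop/Rap", "R&B/Soul"],
        "danceability": [0.60, 0.80, 0.70, 0.90, 0.50],
    }
)

summary = describe_by_genre(sample, "danceability")
print(summary)
```

With the real data, `describe_by_genre(df, "loudness", decimals=2)` would be the equivalent of the loudness cell further down, assuming `df` is the cleaned DataFrame from earlier sections.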

## **6.1 &nbsp; Comparing Danceability**  

In this section, we analyze **danceability** statistics to identify whether certain genres consistently score higher or if there is significant variation within each category.  

This analysis provides insights into how different musical styles balance rhythm and movement, helping us understand **which genres** are more structured for **dancing** and which prioritize other musical elements.

In [111]:
filtered_df = df[df["genre"].isin([
    "Pop",
    "Hip-Hop/Rap",
    "Dance/Electronic",
    "Alternative/Indie"
])]

genre_danceability = filtered_df.groupby("genre")["danceability"].describe().round(3)

genre_danceability.index.name = None

genre_danceability

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Alternative/Indie | 4.0 | 0.662 | 0.211 | 0.459 | 0.49 | 0.663 | 0.834 | 0.862 |
| Dance/Electronic | 5.0 | 0.755 | 0.095 | 0.647 | 0.674 | 0.785 | 0.789 | 0.88 |
| Hip-Hop/Rap | 13.0 | 0.766 | 0.085 | 0.598 | 0.726 | 0.774 | 0.83 | 0.896 |
| Pop | 14.0 | 0.678 | 0.11 | 0.464 | 0.616 | 0.69 | 0.763 | 0.806 |


### **Key Insights**  

- **Hip-Hop/Rap** and **Dance/Electronic** have the **most consistently high** danceability scores with relatively low standard deviation, reinforcing their rhythmic and beat-driven nature. These genres are commonly designed for clubs, parties, and active listening, explaining their emphasis on strong, consistent beats.
  
- **Pop music** exhibits a slightly lower but still **relatively high** danceability score. With a median of 0.690 and a minimum score of 0.464, it suggests that while most of the pop songs are dance-friendly, the genre also includes ballads and mid-tempo tracks that reduce overall rhythmic consistency.

- **Alternative/Indie** has the **lowest** danceability score, but the **highest** standard deviation and a maximum score of 0.862, showing the **most variation** within the genre. This suggests that some Alternative/Indie tracks are highly danceable, while others focus on complex arrangements, unconventional rhythms, or introspective compositions that are less suited for dancing.    
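
Since the genre groups are small (4 to 14 tracks), it may help to gauge how precisely each mean is estimated. One rough way is the standard error of the mean, `std / sqrt(count)`, computed here from the figures in the table above. This is a sketch added for context, not part of the original analysis, and the variable names are hypothetical.

```python
import math

# (count, std) per genre, copied from the danceability table above
danceability_stats = {
    "Alternative/Indie": (4, 0.211),
    "Dance/Electronic": (5, 0.095),
    "Hip-Hop/Rap": (13, 0.085),
    "Pop": (14, 0.110),
}

# Standard error of the mean: a larger value means a less certain genre average
standard_error = {
    genre: std / math.sqrt(count)
    for genre, (count, std) in danceability_stats.items()
}

for genre, se in sorted(standard_error.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: standard error = {se:.3f}")
```

The Alternative/Indie average carries the largest uncertainty by a wide margin, so its ranking should be read with more caution than those of Pop or Hip-Hop/Rap.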

## **6.2 &nbsp; Comparing Loudness**  

In this section, we analyze **loudness** statistics across the same major genres to understand how different musical styles balance volume and dynamics, aiming to identify whether certain genres tend to be **consistently** louder or if they exhibit greater **variation** in loudness levels. 

In [112]:
genre_loudness = filtered_df.groupby("genre")["loudness"].describe().round(2)

genre_loudness.index.name = None

genre_loudness

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Alternative/Indie | 4.0 | -5.42 | 0.77 | -6.4 | -5.86 | -5.27 | -4.83 | -4.75 |
| Dance/Electronic | 5.0 | -5.34 | 1.48 | -7.57 | -5.65 | -5.46 | -4.26 | -3.76 |
| Hip-Hop/Rap | 13.0 | -6.92 | 1.89 | -8.82 | -8.52 | -7.65 | -5.62 | -3.37 |
| Pop | 14.0 | -6.46 | 3.01 | -14.45 | -7.18 | -6.64 | -3.87 | -3.28 |


### **Key Insights**  
- **Dance/Electronic** and **Alternative/Indie** are the **loudest** genres on average. This suggests that both genres favor **strong, present soundscapes**, though for different reasons:  
   - **Dance/Electronic** music is designed for **clubs** and **festivals**, where loudness enhances rhythmic intensity.  
   - **Alternative/Indie** often includes **rock-influenced production**, where punchy sound is favored while still maintaining some dynamic range.  
&nbsp;

- **Pop** music shows the **highest variability** in loudness (std = 3.01), with a minimum of -14.45 dB and a maximum of -3.28 dB. This suggests that Pop is the most sonically diverse genre, encompassing both soft ballads with lower loudness and radio-friendly hits that are **heavily compressed** for loudness consistency.  

- **Hip-Hop/Rap** has the **lowest** average loudness, indicating that rap tracks may allow for more dynamic range compared to highly compressed pop or electronic tracks. However, the maximum value of -3.37 dB suggests that some tracks are mastered **much louder** than others, likely depending on **subgenres** and production styles.  
 
From a music production perspective, while high loudness enhances **energy** and **presence**, it can also reduce **dynamic range**, making the balance between volume and musical detail an important factor to consider.
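
Decibels are logarithmic, so gaps between genre averages are easier to judge once converted to a ratio. The sketch below assumes the loudness column behaves like a standard decibel measure, where a difference of 20 dB corresponds to a tenfold change in signal amplitude. It compares the loudest and quietest genre means from the table above. The helper name `db_to_amplitude_ratio` is hypothetical.

```python
# Mean loudness in dB, copied from the table above
mean_loudness_db = {
    "Dance/Electronic": -5.34,
    "Hip-Hop/Rap": -6.92,
}


def db_to_amplitude_ratio(delta_db: float) -> float:
    """Convert a decibel difference into an amplitude ratio (20 dB = 10x)."""
    return 10 ** (delta_db / 20)


delta_db = mean_loudness_db["Dance/Electronic"] - mean_loudness_db["Hip-Hop/Rap"]
ratio = db_to_amplitude_ratio(delta_db)

print(f"Difference: {delta_db:.2f} dB, amplitude ratio: {ratio:.2f}")
```

Under that assumption, the 1.58 dB gap between the two averages works out to an amplitude ratio of about 1.2, a modest difference compared with the spread inside Pop alone.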

## **6.3 &nbsp; Comparing Acousticness**   

Now it's time to analyze **acousticness** statistics across the four genres, to understand how different musical styles balance live instrumentation, acoustic arrangements and artificial sound elements. 

In [113]:
genre_acousticness = filtered_df.groupby("genre")["acousticness"].describe().round(3)

genre_acousticness.index.name = None

genre_acousticness

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Alternative/Indie | 4.0 | 0.584 | 0.204 | 0.291 | 0.526 | 0.646 | 0.704 | 0.751 |
| Dance/Electronic | 5.0 | 0.099 | 0.096 | 0.014 | 0.015 | 0.069 | 0.177 | 0.223 |
| Hip-Hop/Rap | 13.0 | 0.189 | 0.186 | 0.005 | 0.067 | 0.145 | 0.234 | 0.731 |
| Pop | 14.0 | 0.324 | 0.318 | 0.021 | 0.06 | 0.259 | 0.348 | 0.902 |


### **Key Insights**  
- **Alternative/Indie** has the **highest** average acousticness (0.584), confirming that this genre leans towards **organic instrumentation**, **live recordings**, and minimal electronic manipulation. The interquartile range (0.526 - 0.704) suggests that most tracks in this category maintain a moderate-to-high level of acoustic elements.

- **Dance/Electronic** has the **lowest** acousticness, reinforcing its **heavily synthesized**, **digitally-produced** nature. The extremely low interquartile range (0.015 - 0.177) confirms that **nearly all** tracks in this genre are electronic, with little to no acoustic instrumentation.  

- **Hip-Hop/Rap** also scores **low** on acousticness, but it has a slightly wider range than Dance/Electronic. While most tracks in this genre rely on **beat-driven production** and **digital elements**, some outliers (Max: 0.731) incorporate live instruments, **[sampling](https://www.centralmusicinstitute.com/blog/what-is-sampling-in-music#:~:text=Sampling%20in%20music%20involves%20taking,creation%20of%20a%20new%20track.)**, or acoustic elements. This suggests that while Hip-Hop/Rap is generally electronic-heavy, it allows for more **diversity** in production styles than pure Dance music.  

- **Pop** sits between these extremes, showcasing a **broad mix** of acoustic and electronic production. The **high standard deviation** (0.318) and wide range (Min: 0.021, Max: 0.902) confirm that Pop music is the most **sonically versatile** genre.
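
The quartile pairs quoted above describe where the middle half of each genre sits. How wide that middle half is comes from the difference between the 75% and 25% values. The sketch below computes that width from the table figures. The DataFrame name and the `iqr_width` column are hypothetical.

```python
import pandas as pd

# Quartiles copied from the acousticness table above
acousticness_quartiles = pd.DataFrame(
    {
        "25%": [0.526, 0.015, 0.067, 0.060],
        "50%": [0.646, 0.069, 0.145, 0.259],
        "75%": [0.704, 0.177, 0.234, 0.348],
    },
    index=["Alternative/Indie", "Dance/Electronic", "Hip-Hop/Rap", "Pop"],
)

# Width of the interquartile range: spread of the middle 50% of tracks
acousticness_quartiles["iqr_width"] = (
    acousticness_quartiles["75%"] - acousticness_quartiles["25%"]
).round(3)

print(acousticness_quartiles.sort_values("iqr_width", ascending=False))
```

Read this way, Alternative/Indie and Dance/Electronic have middle halves of similar width but sit at opposite ends of the scale, while Pop has the widest middle half of the four.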

As the music industry evolves, it will be interesting to see whether technology-driven production continues to dominate or if there is a resurgence of acoustic, live-recorded music in mainstream hits.  

## **6.4 &nbsp; Comparing Energy**    

Now it is the turn of **energy**, across the same genres. The goal is to understand how different musical styles balance **high-impact**, **adrenaline-driven** production and more **restrained**, **mellow** compositions.  

In [114]:
genre_energy = filtered_df.groupby("genre")["energy"].describe().round(3)

genre_energy.index.name = None

genre_energy

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Alternative/Indie | 4.0 | 0.551 | 0.1 | 0.405 | 0.532 | 0.584 | 0.602 | 0.631 |
| Dance/Electronic | 5.0 | 0.762 | 0.051 | 0.72 | 0.721 | 0.751 | 0.774 | 0.844 |
| Hip-Hop/Rap | 13.0 | 0.583 | 0.112 | 0.427 | 0.49 | 0.574 | 0.69 | 0.745 |
| Pop | 14.0 | 0.587 | 0.186 | 0.225 | 0.481 | 0.557 | 0.724 | 0.855 |


### **Key Insights**  

- **Dance/Electronic** has the **highest** average energy, confirming its focus on **high-impact, rhythmic intensity**. With a minimum energy score of 0.720, this genre maintains a consistently high level of energy, which is essential for club, festival, and **high-tempo** listening experiences.
  
- **Hip-Hop/Rap** and **Pop** show similar energy levels (means of 0.583 and 0.587), suggesting that both genres balance high-energy production with more mellow, dynamic variations. However, Pop exhibits the highest standard deviation (0.186), meaning that it covers a **broader range** of energy levels. This reflects Pop’s **genre-blending** tendencies, incorporating elements from both acoustic and electronic production styles. On the Hip-Hop/Rap side, the spread likely reflects differences in subgenres such as **trap, boom-bap, and melodic rap**.

- **Alternative/Indie** has the **lowest** average energy (0.551), reinforcing its tendency towards more **laid-back**, **introspective**, and dynamically diverse compositions. The **narrow range** suggests that most Alternative/Indie tracks maintain moderate energy levels, avoiding extreme highs or lows.  

## **6.5 &nbsp; Comparing Valence**    

Finally, we analyze **valence** across the four major genres to understand how different musical styles convey emotions. This analysis aims to determine whether certain genres tend to be **consistently upbeat** or if they exhibit a **wider emotional spectrum**, revealing which ones prioritize **feel-good**, **energetic vibes** and which lean toward **introspection** and emotional depth.   

In [115]:
genre_valence = filtered_df.groupby("genre")["valence"].describe().round(3)

genre_valence.index.name = None

genre_valence

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Alternative/Indie | 4.0 | 0.502 | 0.272 | 0.183 | 0.38 | 0.493 | 0.615 | 0.841 |
| Dance/Electronic | 5.0 | 0.704 | 0.23 | 0.33 | 0.664 | 0.746 | 0.884 | 0.894 |
| Hip-Hop/Rap | 13.0 | 0.499 | 0.226 | 0.06 | 0.349 | 0.457 | 0.642 | 0.845 |
| Pop | 14.0 | 0.563 | 0.171 | 0.218 | 0.542 | 0.571 | 0.636 | 0.874 |


### **Key Insights**  

- **Dance/Electronic** has the highest average valence (0.704), suggesting that this genre is generally the **most upbeat** and **feel-good**. The interquartile range (0.664 - 0.884) confirms that most Dance/Electronic tracks maintain a high level of positivity, aligning with their purpose as energetic, uplifting music for clubs and social settings.  

- **Pop music** also leans towards higher valence (mean of 0.563) and shows the **least variation** of the four genres (Std: 0.171). This suggests that most Pop songs cluster around a moderately positive mood, although the overall range (0.218 to 0.874) shows that the genre still spans **cheerful anthems** as well as **mid-tempo** and introspective compositions.  

- **Alternative/Indie** and **Hip-Hop/Rap** exhibit **lower** valence levels, meaning they feature a more **balanced mix of happy and melancholic songs**. Alternative/Indie, in particular, has the **widest spread (Std: 0.272)**, reflecting its reputation for emotional depth and **storytelling**.  

- **Hip-Hop/Rap** shows the **lowest minimum** valence (0.060), indicating that it contains the most emotionally intense and **darker** tracks. However, with a maximum valence of 0.845, the genre also includes upbeat, **celebratory songs**, reinforcing the diverse emotional range within Hip-Hop/Rap.   


# **| &nbsp; 7 &nbsp; Final Conclusion: Insights from the Spotify Top 50 Tracks 2020 Dataset**

This analysis has provided a comprehensive, data-driven exploration of the key attributes that define the most popular songs on Spotify. From dataset cleaning and structuring to deep dives into correlations, genre-based differences, and extreme values, we have uncovered key trends in music production, genre preferences, and listener engagement.  

## **7.1 &nbsp; Main Findings From the Analysis**  

#### **1. Extreme Values in Musical Features**  

- Songs with the **highest** and **lowest** danceability confirmed that rhythmic appeal is a key factor in hit songs, though a few highly successful tracks defied this trend.  
- **Loudness analysis** showed that most hit songs are heavily compressed for loudness consistency, reinforcing the modern mastering trend of high-volume production.  
- The **longest** and **shortest tracks** indicated that while concise, radio-friendly formats dominate, some longer compositions still achieve success.  

#### **2. Correlation Analysis: How Musical Features Interact**  

- **Loudness** and **energy** had the strongest correlation, emphasizing that high-energy tracks tend to be louder and more dynamically intense.  
- **Danceability** and **valence** showed a moderate correlation, suggesting that happier songs are often more danceable, though exceptions exist.  
- **Tempo** had little correlation with **danceability**, challenging the assumption that faster songs are inherently more engaging for movement.  
- Negative correlations such as **loudness** vs. **instrumentalness** confirmed that highly produced, loud tracks tend to be more vocal-heavy, while softer, instrumental-based compositions retain more dynamic range.  

#### **3. Genre-Based Musical Feature Analysis**  

- **Dance/Electronic** leads in energy and valence and, together with **Hip-Hop/Rap**, in danceability, reinforcing its role as a club-oriented genre.  
- **Alternative/Indie** has the highest acousticness, relying heavily on live instrumentation and raw production.  
- **Hip-Hop/Rap** shows the widest emotional contrast, ranging from high-energy, celebratory anthems to introspective, moody tracks.  
- **Pop music** remains the most sonically diverse genre, incorporating elements from multiple styles, allowing it to balance high-energy hits with softer, acoustic-driven songs.  
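
As a quick cross-check of the genre findings, the per-genre means reported in Section 6 can be collected into one table and the top genre per feature read off with `idxmax`. The sketch below copies the means from the five summary tables. The names `genre_means` and `leaders` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

# Mean of each feature per genre, copied from the tables in Section 6
genre_means = pd.DataFrame(
    {
        "danceability": [0.662, 0.755, 0.766, 0.678],
        "loudness": [-5.42, -5.34, -6.92, -6.46],
        "acousticness": [0.584, 0.099, 0.189, 0.324],
        "energy": [0.551, 0.762, 0.583, 0.587],
        "valence": [0.502, 0.704, 0.499, 0.563],
    },
    index=["Alternative/Indie", "Dance/Electronic", "Hip-Hop/Rap", "Pop"],
)

# Genre with the highest mean for each feature
leaders = genre_means.idxmax()
print(leaders)
```

Because the gaps between some means are small and the groups hold only 4 to 14 tracks, these rankings are best treated as indicative rather than conclusive.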



## **7.2 &nbsp; Next Steps: Further Areas for Exploration**  

While this analysis uncovered meaningful insights into the structure and composition of hit songs, several areas remain open for deeper exploration:  

#### **1. Temporal Trends in Music Evolution**  

- How have **danceability**, **loudness**, **energy**, and **track length** evolved over time?  
- Has the dominance of certain **genres** changed in response to listener preferences?  
- Are songs becoming **shorter**, **louder**, or more **rhythmically engaging** over the years?  

#### **2. Listener Engagement and Streaming Performance**  

- How do features like **speechiness**, **valence**, and **instrumentalness** affect playlist placement and viral success?  
- Are certain musical characteristics more likely to succeed on **social media** platforms like *TikTok*?  

#### **3. Regional and Cultural Differences**  

- Do **different regions** favor certain musical characteristics over others?  
- Are some **genres** more dominant in specific **markets**, and how does this influence **feature distributions**?  
- How do **collaborations** between artists from different genres or regions impact a song’s musical attributes?  

#### **4. Predictive Modeling for Hit Songs**  

- Can a **machine learning model** predict a song’s success based on its musical features?  
- What are the most significant **predictors** of chart performance?  
- Can we build a model that suggests **optimal feature combinations** for maximizing listener engagement?  

&nbsp;

---

&nbsp;

As the music industry continues evolving, future research into **listener behavior**, **AI-driven recommendations**, and **cross-genre experimentation** will further shape how music is created, discovered, and consumed in the streaming era.  


&nbsp;
