🎧 Spotify Tracks EDA — Exploratory Data Analysis

A detailed exploratory data analysis on Spotify track data to uncover musical trends, popularity patterns, and the characteristics that make songs successful.

📘 Project Overview

This project focuses on Exploratory Data Analysis (EDA) of Spotify track data. It aims to identify relationships between audio features (like energy, valence, danceability) and a song’s popularity, uncovering trends across decades and genres.

The study shows how modern music differs from older eras: tracks have become shorter, more energetic, less acoustic, and more dance-focused. These findings may be useful to music producers, analysts, and data scientists.

🎯 Objectives

The primary goals of this analysis were:

To understand the distribution of key musical features (popularity, energy, danceability, valence, etc.)
To explore relationships between popularity and musical characteristics.
To examine temporal trends (year-wise analysis).
To identify top-performing tracks and artists.
To derive recommendations for model design or music success strategies.

📊 Dataset Summary

| Feature | Description |
|---|---|
| `track_id` | Unique Spotify track identifier |
| `track_name` | Song title |
| `artist_name` | Artist name |
| `year` | Year of release |
| `popularity` | Popularity score (0–100) |
| `danceability` | Suitability for dancing |
| `energy` | Intensity of the song |
| `valence` | Positiveness or musical mood |
| `acousticness` | Degree of acoustic elements |
| `instrumentalness` | Likelihood of instrumental-only |
| `speechiness` | Spoken word presence |
| `tempo` | Beats per minute (BPM) |
| `duration_ms` | Song length in milliseconds |
| `language` | Detected lyric language |
| `loudness`, `mode`, `key`, `time_signature` | Musical composition features |

🔍 Univariate Analysis

Popularity → Highly right-skewed; most songs have low scores, and only a few are global hits.
Danceability → Left-skewed; majority are rhythmic and danceable.
Energy → Concentrated near 1.0; most tracks are high-energy.
Valence → Bimodal; both happy and neutral/sad songs exist widely.
Tempo → Bimodal peaks around 120 and 140 BPM.
Duration → Mostly 3–5 mins; extreme outliers exist.
Loudness → Unreliable feature due to data errors.
Instrumentalness → Dominated by vocal tracks (≈0).
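
The skew descriptions above can be checked numerically. The following is a minimal sketch, assuming a pandas DataFrame with the column names from the dataset summary; the helper name `summarize_skew` and the toy values are hypothetical and are not taken from the notebook or the dataset. A positive skew value indicates a long right tail (many low values, few high ones), a negative value a long left tail.

```python
import pandas as pd


def summarize_skew(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return mean, median, and sample skewness for each listed column."""
    return pd.DataFrame({
        "mean": df[columns].mean(),
        "median": df[columns].median(),
        "skew": df[columns].skew(),
    })


# Toy values, made up for illustration only (not from spotify_tracks.csv).
toy = pd.DataFrame({
    "popularity": [0, 1, 2, 3, 5, 8, 60, 85],
    "danceability": [0.2, 0.6, 0.65, 0.7, 0.72, 0.75, 0.8, 0.82],
})

summary = summarize_skew(toy, ["popularity", "danceability"])
print(summary)
```

On the real data, the same call could be run over all numeric audio features to confirm which distributions are skewed before choosing transformations.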

🔗 Bivariate Analysis

Popularity vs Danceability → Popular songs have Danceability > 0.4.
Popularity vs Energy → Low Energy = low popularity; high Energy ≠ guaranteed success.
Popularity vs Duration → Negative correlation; short songs perform better.
Popularity vs Valence → Mood doesn’t predict hits.
Energy vs Loudness → Highly correlated; redundant.
Danceability vs Tempo → Weak relationship; tempo not a predictor.
Energy vs Popularity Category → High/Medium hits have higher energy.
Danceability vs Popularity Category → Hits more danceable and consistent.
Valence vs Popularity Category → Mood uniform across categories.
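
The popularity-category comparisons above could be reproduced along these lines. This is a sketch under stated assumptions: the cut points in `BINS` are hypothetical (the thresholds used in the notebook are not given here), and the toy rows are invented for illustration.

```python
import pandas as pd

# Hypothetical cut points: (-1, 30] = Low, (30, 60] = Medium, (60, 100] = High.
BINS = [-1, 30, 60, 100]
LABELS = ["Low", "Medium", "High"]


def add_popularity_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a categorical popularity_category column."""
    out = df.copy()
    out["popularity_category"] = pd.cut(out["popularity"], bins=BINS, labels=LABELS)
    return out


# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "popularity": [5, 20, 45, 55, 70, 90],
    "energy": [0.3, 0.4, 0.6, 0.7, 0.8, 0.75],
    "danceability": [0.35, 0.5, 0.6, 0.65, 0.7, 0.72],
    "valence": [0.5, 0.4, 0.6, 0.3, 0.5, 0.45],
})

labeled = add_popularity_category(toy)

# Median of each audio feature within each popularity category.
by_category = labeled.groupby("popularity_category", observed=True)[
    ["energy", "danceability", "valence"]
].median()
print(by_category)
```

Medians are used here rather than means so that a few extreme tracks do not dominate the per-category comparison.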

🧠 Correlation Insights

| Feature | Correlation with Popularity |
|---|---|
| Energy | 0.15 |
| Valence | 0.15 |
| Danceability | 0.14 |
| Acousticness | -0.14 |
| Instrumentalness | -0.13 |

✅ Recommendation: Drop one feature from each highly correlated pair: Energy-Loudness (0.90), Valence-Danceability (0.64).
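
One way to find correlated pairs like those named in the recommendation is to scan the upper triangle of the correlation matrix. The sketch below assumes Pearson correlation and a hypothetical helper `highly_correlated_pairs`; the threshold and the toy values are illustrative choices, not figures from the analysis.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation of 1.0) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy values, made up for illustration only.
toy = pd.DataFrame({
    "energy": [0.2, 0.4, 0.6, 0.8, 0.9],
    "loudness": [-20.0, -14.0, -9.0, -6.0, -4.0],
    "valence": [0.5, 0.1, 0.9, 0.3, 0.6],
})

pairs = highly_correlated_pairs(toy, threshold=0.8)
print(pairs)
```

For each flagged pair, only one of the two features would typically be kept as a model input.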

📅 Temporal Trends

Energy ↑ — Music became more intense since 1970s.
Valence ↓ — Songs became less positive post-1995.
Danceability ↑ — Spike after Disco era (1980s).
Speechiness ↑ (2020s) — Rise in rap/spoken content.
Acousticness ↓ — Shift to electronic music post-1990s.
Duration ↓ — Songs shorter in streaming age.
Popularity Distribution — >80% of tracks are low popularity.
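
Trends like these can be summarized by grouping tracks by decade and averaging each feature. The following is a minimal sketch; the helper name `decade_means`, the derived `duration_min` column, and the toy rows are assumptions made for illustration and do not reproduce the notebook's figures.

```python
import pandas as pd


def decade_means(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Average the given features per decade of release."""
    out = df.copy()
    out["decade"] = (out["year"] // 10) * 10          # 1978 -> 1970
    out["duration_min"] = out["duration_ms"] / 60_000  # milliseconds -> minutes
    return out.groupby("decade")[features].mean()


# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "year": [1975, 1978, 1995, 1999, 2021, 2023],
    "energy": [0.4, 0.5, 0.6, 0.7, 0.8, 0.7],
    "duration_ms": [300_000, 240_000, 240_000, 210_000, 180_000, 150_000],
})

trends = decade_means(toy, ["energy", "duration_min"])
print(trends)
```

Plotting each column of `trends` against its index gives one line per feature, which is a convenient way to compare directions of change across decades.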

🌍 Language & Artist Analysis

Korean tracks have the highest average popularity (~27.5).
Hindi ranks next (~18.5); Malayalam ranks lowest (~7).
Artists by Volume: Shankar Mahadevan, Ramin Djawadi, Alan Silvestri (film composers).
Artists by Popularity: Rihanna, Taylor Swift, Kendrick Lamar, LE SSERAFIM (mainstream hits).
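
Per-language averages like those above can be computed with a grouped aggregation. This sketch assumes a hypothetical helper `popularity_by_language` and a `min_tracks` filter (an added assumption, so that languages with very few tracks do not top the ranking by chance); the language labels and values are placeholders, not data from the project.

```python
import pandas as pd


def popularity_by_language(df: pd.DataFrame, min_tracks: int = 2) -> pd.DataFrame:
    """Mean popularity and track count per language, highest mean first."""
    grouped = df.groupby("language")["popularity"].agg(
        mean_popularity="mean",
        n_tracks="count",
    )
    # Ignore languages with too few tracks to give a stable average.
    enough = grouped[grouped["n_tracks"] >= min_tracks]
    return enough.sort_values("mean_popularity", ascending=False)


# Placeholder labels and values, made up for illustration only.
toy = pd.DataFrame({
    "language": ["lang_a", "lang_a", "lang_b", "lang_b", "lang_c"],
    "popularity": [30, 20, 10, 14, 90],
})

ranking = popularity_by_language(toy, min_tracks=2)
print(ranking)
```

The same pattern, grouped by `artist_name` instead of `language`, would give the volume and popularity rankings for artists.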

💡 Key Takeaways

✅ Use Classification Models: Predict “Hit” or “Miss” instead of regression on scores.
🧹 Data Cleaning: Remove tracks by soundtrack composers and extreme outliers (for example, in duration).
⚡ Apply Feature Filters: Exclude songs with very low energy/danceability.
🎭 Combine Mood Features: Use interaction of energy × valence.
🔁 Feature Reduction: Drop redundant variables (e.g., Loudness).
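
Several of these takeaways can be combined into one feature-preparation step. The sketch below is illustrative only: the function name `build_features` and the `hit_threshold` of 60 are hypothetical choices, since no hit cutoff is specified here, and the toy rows are invented.

```python
import pandas as pd


def build_features(df: pd.DataFrame, hit_threshold: int = 60) -> pd.DataFrame:
    """Add a binary hit label and an energy x valence interaction; drop loudness."""
    out = df.copy()
    # Binary target for classification: 1 = "Hit", 0 = "Miss" (assumed cutoff).
    out["is_hit"] = (out["popularity"] >= hit_threshold).astype(int)
    # Interaction term combining intensity and mood.
    out["energy_x_valence"] = out["energy"] * out["valence"]
    # Loudness is treated as redundant with energy; ignore if already absent.
    return out.drop(columns=["loudness"], errors="ignore")


# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "popularity": [10, 75],
    "energy": [0.5, 0.8],
    "valence": [0.4, 0.5],
    "loudness": [-12.0, -5.0],
})

features = build_features(toy)
print(features)
```

The input DataFrame is copied rather than modified in place, so the raw data stays available for further exploration.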

⚙️ Tech Stack

Language: Python 3.11
Libraries: Pandas, NumPy, Matplotlib, Seaborn, Plotly
Environment: Jupyter Notebook

🚀 Run Instructions

git clone https://github.com/yourusername/spotify-eda.git
cd spotify-eda
pip install -r requirements.txt
jupyter notebook

📈 Future Work

Build a classification model for hit prediction.
Add time-series trend forecasting.
Add sentiment analysis using lyrics.
Create an interactive dashboard (Streamlit/Dash).

👤 Author

MANISH KUMAR ROY
