# Decoding the Beat
## A Data-Driven Insight into Spotify's Recommendation System

### Objective
- Analyze Spotify's track features
- Clean and preprocess the data
- Explore patterns in genres and audio features
- Generate visual insights for recommendation logic

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

### Load Dataset

In [None]:
df = pd.read_csv("SpotifyFeatures.csv")
df.head()

### Data Cleaning
- Drop unnecessary columns
- Remove missing values and duplicates

In [None]:
# Drop identifier columns that carry no audio-feature information.
df = df.drop(columns=['track_id', 'track_name', 'artist_name'])
# Remove rows with missing values, then exact duplicate rows.
df = df.dropna().drop_duplicates()
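
To see how much each cleaning step removes, it can help to track row counts along the way. The following is a minimal sketch on a made-up toy frame (the `toy` data and its values are assumptions for illustration, not rows from `SpotifyFeatures.csv`):

```python
import numpy as np
import pandas as pd

# Toy stand-in data: one exact duplicate row and one row with a missing tempo.
toy = pd.DataFrame({
    "tempo": [120.0, 120.0, np.nan, 95.0],
    "energy": [0.8, 0.8, 0.5, 0.3],
})

n_start = len(toy)
no_missing = toy.dropna()                                    # drops the NaN row
cleaned = no_missing.drop_duplicates().reset_index(drop=True)  # drops the repeated row

print(f"{n_start} rows -> {len(no_missing)} after dropna -> {len(cleaned)} after drop_duplicates")
```

The same pattern applied to `df` would show whether missing values or duplicates account for most of the removed rows.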

### Feature Engineering
- One-hot encode the `genre` column

In [None]:
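# Keep every genre column (drop_first=False) so that the genre counts plotted later
# include all genres; drop_first=True would silently omit the first one.
df = pd.get_dummies(df, columns=['genre'], drop_first=False)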
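
A small sketch of how the `drop_first` argument changes the columns that `pd.get_dummies` produces. The toy frame and its genre labels are illustrative assumptions, not values taken from the dataset:

```python
import pandas as pd

# Toy data; the genre labels are made up for illustration.
toy = pd.DataFrame({
    "genre": ["Pop", "Rock", "Jazz", "Pop"],
    "tempo": [120.0, 140.0, 90.0, 100.0],
})

full = pd.get_dummies(toy, columns=["genre"])                      # one column per genre
reduced = pd.get_dummies(toy, columns=["genre"], drop_first=True)  # first category removed

full_cols = [c for c in full.columns if c.startswith("genre_")]
reduced_cols = [c for c in reduced.columns if c.startswith("genre_")]

print(full_cols)
print(reduced_cols)
```

Dropping the first category avoids perfectly collinear columns, which matters for some linear models, but any per-genre count built from the dummy columns will then be missing that category.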

### Handle Outliers using IQR for 'tempo'

In [None]:
Q1 = df['tempo'].quantile(0.25)
Q3 = df['tempo'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['tempo'] >= Q1 - 1.5 * IQR) & (df['tempo'] <= Q3 + 1.5 * IQR)]
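
The IQR rule above can be wrapped in a small helper and checked on a toy series. This is a sketch under stated assumptions: the function name `iqr_bounds` and the sample tempo values are hypothetical, chosen only to make the arithmetic easy to follow:

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up tempo values with one obvious outlier (400 BPM).
tempo = pd.Series([90.0, 100.0, 110.0, 120.0, 130.0, 400.0])

lower, upper = iqr_bounds(tempo)
kept = tempo[tempo.between(lower, upper)]  # inclusive on both ends, like >= and <=

print(lower, upper)
print(kept.tolist())
```

Here Q1 is 102.5 and Q3 is 127.5, so the IQR is 25 and the fences fall at 65 and 165; only the 400 BPM value is removed.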

### Summary Statistics

In [None]:
df.describe()

### Visualizations

In [None]:
genre_cols = [col for col in df.columns if col.startswith('genre_')]
genre_sums = df[genre_cols].sum().sort_values(ascending=False)
plt.figure(figsize=(10, 6))
genre_sums.plot(kind='bar')
plt.title('Genre Distribution')
plt.xlabel('Genre')
plt.ylabel('Number of Tracks')
plt.show()

In [None]:
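plt.figure(figsize=(14, 10))
# numeric_only=True skips any remaining non-numeric columns, which would otherwise
# raise an error in pandas 2.0 and later.
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()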
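
A large heatmap can be hard to read, so one option is to list the most strongly correlated feature pairs as a table. The sketch below uses synthetic data; the column names resemble common audio features, but the values and the relationships between them are invented for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: loudness tracks energy closely, acousticness opposes it.
rng = np.random.default_rng(0)
energy = rng.normal(size=200)
toy = pd.DataFrame({
    "energy": energy,
    "loudness": 2.0 * energy + rng.normal(scale=0.1, size=200),
    "acousticness": -energy + rng.normal(scale=0.5, size=200),
    "valence": rng.normal(size=200),
})

corr = toy.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked.head(3))
```

Applied to the real data, the same steps would start from `df.corr(numeric_only=True)` instead of `toy.corr()`.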

### Conclusion
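- Cleaned the dataset, one-hot encoded genres, and removed tempo outliers
- Examined the genre distribution and correlations between audio features
- Produced visualizations as a starting point for reasoning about Spotify's recommendation logic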