# 🎵 Music Genre Classification using Neural Networks

## Project Overview
This project focuses on classifying music tracks into different genres using
machine learning techniques. By extracting meaningful audio features from music
files and training a neural network model, we aim to predict the genre of a given
audio sample accurately.

### Objectives
- Extract relevant audio features from music files
- Preprocess and normalize the dataset
- Train a neural network for multi-class classification
- Evaluate model performance using accuracy and a confusion matrix
- Predict the genre of unseen audio samples

## Import Required Libraries

This cell imports all necessary Python libraries used throughout the project:

- **NumPy & Pandas**: data manipulation and numerical operations
- **Librosa**: audio processing and feature extraction
- **Scikit-learn**: preprocessing, encoding, and evaluation
- **TensorFlow / Keras**: building and training the neural network
- **Matplotlib & Seaborn**: data visualization

These libraries together form a standard stack for audio-based machine learning.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

import librosa  # audio loading and feature extraction (used by the prediction cells)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import confusion_matrix, classification_report

import tensorflow as tf
from tensorflow.keras import models, layers

print("Libraries Loaded Successfully!")

## Load the Dataset

The dataset contains audio features extracted from 3-second segments of the
30-second GTZAN music clips (hence the file name `features_3_sec.csv`).
Each row represents one segment, and each column corresponds to a specific
audio feature or a piece of metadata.

We load the dataset using Pandas to enable easy data exploration and preprocessing.

In [None]:
#2: Load and Clean the Data
# Path to the dataset on Kaggle
data_path = '/kaggle/input/gtzan-dataset-music-genre-classification/Data/features_3_sec.csv'
df = pd.read_csv(data_path)

# Drop columns that aren't 'features'
# We drop 'filename' and 'length' (they don't help identify genre)
data = df.drop(columns=['filename', 'length'])

print(f"Dataset Shape: {data.shape}")
data.head() # Shows the first 5 rows

## Separate Features (X) and Labels (y)

- **X** contains all numerical audio features
- **y** contains the target variable: music genre

Separating features and labels is a standard practice in supervised learning
and prepares the data for encoding and scaling.

In [None]:
#3: Feature Scaling and Encoding

# 1. Encode the labels (genres); the label is the last column
genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)

# 2. Collect the numeric features (every column except the label)
X = np.array(data.iloc[:, :-1], dtype=float)

# 3. Split into Training (80%) and Testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Scale the features: fit the scaler on the training split only,
#    so that test-set statistics do not leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
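
To make the preprocessing concrete, the sketch below reproduces in plain NumPy what `LabelEncoder` and `StandardScaler` compute. The feature values and genre names are made up for illustration and are not taken from the dataset; the sketch assumes the scaling statistics come from the training rows only.

```python
import numpy as np

# Toy data (hypothetical values): 4 training rows, 2 test rows, 2 features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])
genres = np.array(["rock", "blues", "rock", "jazz"])

# Label encoding: sorted unique class names map to 0..n_classes-1
classes, encoded = np.unique(genres, return_inverse=True)

# Standardization: z = (x - mean) / std, with statistics from the training rows
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), which is what StandardScaler uses
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(classes, encoded)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After scaling, every training column has mean 0 and standard deviation 1, while the test rows are only approximately centered because they reuse the training statistics.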

## Neural Network Architecture

We construct a **fully connected neural network** with the following structure:

- Input layer: audio feature vector
- Hidden layers with ReLU activation for non-linearity
- Dropout layers to reduce overfitting
- Output layer with Softmax activation for multi-class classification

This architecture allows the model to learn complex, non-linear patterns
present in music audio features.
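
As a rough illustration of what the Softmax output layer does, the NumPy sketch below turns a vector of raw scores into class probabilities and picks the most likely class. The scores are hypothetical and the helper `softmax` is written here for illustration; Keras applies the equivalent operation internally.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for three genres
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))

print(probs, probs.sum(), predicted_class)
```

The largest score receives the largest probability, which is why `np.argmax` over the model output is used later to recover the predicted genre index.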

In [None]:
#4: Building the Neural Network

model = models.Sequential([
    # Input Layer
    layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dropout(0.3),
    
    # Hidden Layers
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    
    # Output Layer (one neuron per genre; 10 for this dataset)
    layers.Dense(len(encoder.classes_), activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary() # Prints each layer's output shape and parameter count
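
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output, and dropout layers add no parameters. The sketch below assumes 57 input features (the count mentioned for the GTZAN CSV later in this notebook) and the layer widths defined above.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed sizes: 57 input features, then the dense layer widths from the model
layer_sizes = [57, 512, 256, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [29696, 131328, 32896, 8256, 650]
print(total)      # 202826
```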

In [None]:
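# 5: Training the Model
# Note: the test split doubles as validation data here, so the validation
# curves are not an independent estimate if they are used to tune the model.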

history = model.fit(X_train, y_train, 
                    epochs=50, 
                    batch_size=32, 
                    validation_data=(X_test, y_test))

## Training and Validation Performance

This cell visualizes:
- Training accuracy vs. validation accuracy
- Training loss vs. validation loss

These plots help diagnose:
- Overfitting
- Underfitting
- Training stability

In [None]:
# 6: Visualizing Results

plt.figure(figsize=(12, 5))

# Plot Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy', color='blue')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy', color='orange')
plt.title('Accuracy Evolution')
plt.legend()

# Plot Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss', color='blue')
plt.plot(history.history['val_loss'], label='Validation Loss', color='orange')
plt.title('Loss Evolution')
plt.legend()

plt.show()
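
Alongside the plots, the same diagnosis can be summarized numerically. The helper below is a small sketch (the name `summarize_history` and the toy curves are made up for illustration, not results from this notebook): it finds the epoch with the best validation accuracy and the gap between training and validation accuracy at that epoch. A large positive gap suggests overfitting.

```python
def summarize_history(history_dict):
    """Return (best epoch, best validation accuracy, train/validation gap)."""
    train_acc = history_dict["accuracy"]
    val_acc = history_dict["val_accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    gap = train_acc[best] - val_acc[best]
    return best + 1, val_acc[best], gap  # epochs are reported 1-based

# Made-up curves for illustration only
toy_history = {
    "accuracy":     [0.50, 0.70, 0.85, 0.95],
    "val_accuracy": [0.48, 0.66, 0.74, 0.73],
}
epoch, best_val, gap = summarize_history(toy_history)
print(f"Best epoch: {epoch}, val accuracy: {best_val:.2f}, gap: {gap:.2f}")
```

In the notebook, the same function could be called as `summarize_history(history.history)`.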

## Confusion Matrix Analysis

The confusion matrix shows how well the model predicts each genre.
It highlights:
- Correct classifications (diagonal)
- Misclassifications between similar genres

This analysis helps identify which genres are most difficult
for the model to distinguish.

In [None]:
# 7: The Confusion Matrix

y_pred = np.argmax(model.predict(X_test), axis=1)
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=encoder.classes_, 
            yticklabels=encoder.classes_)
plt.title('Confusion Matrix: Predicted vs Actual Genres')
plt.xlabel('Predicted Genre')
plt.ylabel('Actual Genre')
plt.show()
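
The heatmap can be complemented with per-genre numbers derived from the same matrix. The sketch below uses a small hypothetical 3-class matrix in the layout scikit-learn produces (rows are actual classes, columns are predicted classes) and computes recall, precision, and overall accuracy.

```python
import numpy as np

# Toy confusion matrix (hypothetical counts): rows = actual, columns = predicted
cm_toy = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

correct = np.diag(cm_toy)                 # correctly classified samples per class
recall = correct / cm_toy.sum(axis=1)     # share of each actual class that was found
precision = correct / cm_toy.sum(axis=0)  # share of each predicted class that was right
accuracy = correct.sum() / cm_toy.sum()

print(recall)     # [0.8 0.6 0.9]
print(precision)  # [0.8  0.75 0.75]
print(round(accuracy, 4))
```

Genres with low recall are the ones the model most often mistakes for something else; applying the same arithmetic to `cm` gives the values for the real model.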

In [None]:
def predict_genre(audio_path):
    # Placeholder only: this extracts 20 MFCC means, which is NOT the full
    # 57-feature vector the model was trained on, so the result cannot be
    # passed to model.predict as-is.

    # 1. Load the audio (first 3 seconds to match training)
    audio, sr = librosa.load(audio_path, duration=3)

    # 2. Extract MFCCs (a subset of the features the model expects)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)

    # 3. Average each coefficient across time
    mfcc_mean = np.mean(mfcc.T, axis=0)

    # 4. The model needs exactly the same features as the training CSV.
    # The GTZAN CSV has 57 features (Chroma, RMS, Spectral Centroid, etc.)
    # For a quick test, we will grab a random sample from our Test Set instead:
    print("Processing audio features...")
    return mfcc_mean

# Let's pick a random song from your test set to see it in action!
import random
random_index = random.randint(0, len(X_test)-1)
sample_to_test = X_test[random_index].reshape(1, -1)

# Predict
prediction = model.predict(sample_to_test)
predicted_index = np.argmax(prediction)
actual_index = y_test[random_index]

print("\n--- TEST RESULT ---")
print(f"Model Prediction: {encoder.classes_[predicted_index]}")
print(f"Actual Genre: {encoder.classes_[actual_index]}")

In [None]:
import IPython.display as ipd

# Path to an uploaded sample song
sample_song = "/kaggle/input/samples/test_dj.mp3"

# Play it!
print("Playing Sample...")
ipd.display(ipd.Audio(sample_song))

# Note: In a real presentation, you should run your prediction code right after this!

## Predict Genre for New Audio Samples

This section demonstrates how to:
1. Load a new audio file
2. Extract the same audio features used during training
3. Apply feature scaling
4. Predict the music genre using the trained model

This step simulates a real-world application of the system.
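
Step 2 only works if the extracted features arrive in the same order as the training columns. The sketch below builds the column order that the extraction code in this notebook implies. The column names are an assumption about how `features_3_sec.csv` is laid out after dropping `filename`, `length`, and the label, so they should be verified against `data.columns` before relying on them.

```python
# Assumed feature layout (hypothetical names; verify against data.columns)
base = [
    "chroma_stft", "rms", "spectral_centroid", "spectral_bandwidth",
    "rolloff", "zero_crossing_rate", "harmony", "perceptr",
]

# Mean and variance for each base feature: 8 * 2 = 16 columns
expected_columns = [f"{name}_{stat}" for name in base for stat in ("mean", "var")]

# A single tempo column
expected_columns.append("tempo")

# Mean and variance for each of the 20 MFCCs: 20 * 2 = 40 columns
expected_columns += [f"mfcc{i}_{stat}" for i in range(1, 21) for stat in ("mean", "var")]

print(len(expected_columns))  # 57
```

In the notebook, a check such as `list(data.columns[:-1]) == expected_columns` would confirm whether this assumed order matches the CSV.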

In [None]:
import librosa
import numpy as np
import IPython.display as ipd
import matplotlib.pyplot as plt

def final_system_predict(file_path):
    # 1. Load 3 seconds of audio starting at the 10-second mark,
    #    matching the 3-second segments used for training
    y, sr = librosa.load(file_path, offset=10, duration=3)
    
    # 2. Extract Basic Features
    chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
    rms = librosa.feature.rms(y=y)
    spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
    spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    
    # Use np.mean().item() to ensure we get a single float, not a numpy array
    features = [
        np.mean(chroma_stft).item(), np.var(chroma_stft).item(), 
        np.mean(rms).item(), np.var(rms).item(),
        np.mean(spec_cent).item(), np.var(spec_cent).item(), 
        np.mean(spec_bw).item(), np.var(spec_bw).item(),
        np.mean(rolloff).item(), np.var(rolloff).item(), 
        np.mean(zcr).item(), np.var(zcr).item()
    ]
    
    # 3. Harmonic, Percussive, and Tempo
    harmony, perceptr = librosa.effects.hpss(y)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    
    # Ensure tempo is a single float (some librosa versions return an array)
    tempo = float(np.atleast_1d(tempo)[0])

    features.extend([
        np.mean(harmony).item(), np.var(harmony).item(),
        np.mean(perceptr).item(), np.var(perceptr).item(),
        float(tempo) # Force tempo to be a single number
    ])
    
    # 4. MFCCs (20 MFCCs * 2 = 40 features)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    for m in mfcc:
        features.append(np.mean(m).item())
        features.append(np.var(m).item())

    # Check for consistency: 12 + 5 + 40 = 57 features, matching the training CSV
    print(f"Features extracted: {len(features)}")

    # 5. Convert to clean Numpy array and Scale
    features_np = np.array(features, dtype=float).reshape(1, -1)
    features_scaled = scaler.transform(features_np) 
    
    # 6. Predict
    prediction = model.predict(features_scaled, verbose=0)
    predicted_genre = encoder.classes_[np.argmax(prediction)]
    confidence = np.max(prediction) * 100
    
    
    print("-" * 30)
    print("SYSTEM ANALYSIS COMPLETE")
    print(f"PREDICTED GENRE: {predicted_genre.upper()}")
    print(f"CONFIDENCE: {confidence:.2f}%")
    print("-" * 30)
    
    # Play the audio so the professor can hear it
    ipd.display(ipd.Audio(file_path))

# TEST IT LIVE
# You can use any file path from the dataset or your own uploaded .wav file
my_song = "/kaggle/input/samples/hiphop.mp3"
final_system_predict(my_song)
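
A single confidence value can hide a close runner-up. As a possible extension, the sketch below lists the top-k genres from a softmax output. The helper name `top_k_genres`, the probabilities, and the class names are hypothetical and only for illustration.

```python
import numpy as np

def top_k_genres(probabilities, class_names, k=3):
    """Return the k most probable (genre, probability) pairs, highest first."""
    probs = np.asarray(probabilities).ravel()  # flatten a (1, n_classes) array
    order = np.argsort(probs)[::-1][:k]        # indices sorted by descending probability
    return [(class_names[i], float(probs[i])) for i in order]

# Hypothetical softmax output for four genres
toy_probs = np.array([[0.05, 0.60, 0.25, 0.10]])
toy_classes = ["blues", "hiphop", "pop", "rock"]

top = top_k_genres(toy_probs, toy_classes, k=2)
for genre, p in top:
    print(f"{genre}: {p * 100:.2f}%")
```

Inside `final_system_predict`, the equivalent call would be `top_k_genres(prediction, encoder.classes_)`.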