# **Music Genre Classification**

## **Introduction**

**_This project builds a music genre classifier that assigns audio tracks to genres such as Classical, Metal, Pop, Hip-Hop, and Rock. It compares four classifiers (K-Nearest Neighbors, Random Forest, CatBoost, and XGBoost) with a simple neural network. Each model is evaluated by its accuracy on a held-out test set, and the neural network's training history is plotted._**

## **Libraries and Modules**

In [None]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objs as go
import librosa
import librosa.display
import catboost as cb
import tensorflow.keras as keras

from IPython.display import Audio
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization, Dropout

# Print library versions for reproducibility
print("NumPy Version:", np.__version__)
print("Pandas Version:", pd.__version__)
print("Matplotlib Version:", matplotlib.__version__)
print("Seaborn Version:", sns.__version__)
print("Plotly Version:", plotly.__version__)
print("Librosa Version:", librosa.__version__)
print("Scikit-learn Version:", sklearn.__version__)

# Silence library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings("ignore")

## **Importing Dataset**

In [None]:
data = pd.read_csv('Data/file.csv') 
data.head(5)

## **Exploratory Data Analysis**

In [None]:
data['label'].value_counts()

### **Classical**

In [None]:
path = 'Data/genres_original/classical/classical.00000.wav'
x, sr = librosa.load(path)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x = np.arange(len(x)) / sr,
    y = x,
    mode = 'lines',
    line = dict(color = 'blue'),
    name = 'Waveform'
))

fig.update_layout(
    title = 'Waveform of classical.00000.wav',
    xaxis_title = 'Time (s)',
    yaxis_title = 'Amplitude',
    width = 1000,
    height = 400
)

fig.show()
display(Audio(path))
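
The time axis in the plot above comes from dividing each sample index by the sampling rate `sr`. The sketch below reproduces that conversion on a synthetic sine wave rather than a real track. The 22,050 Hz rate matches the default that `librosa.load` resamples to; the 440 Hz tone and the 2-second duration are arbitrary choices made for illustration.

```python
import numpy as np

sr = 22050          # default sampling rate used by librosa.load (Hz)
duration_s = 2.0    # arbitrary length of the synthetic signal

# Synthetic stand-in for the sample array returned by librosa.load
n_samples = int(sr * duration_s)
t = np.arange(n_samples) / sr            # sample index -> time in seconds
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 440 Hz tone with amplitude 0.5

print(f"samples: {len(x)}, duration: {len(x) / sr:.2f} s")
print(f"first/last timestamps: {t[0]:.5f} s, {t[-1]:.5f} s")
```

The last timestamp is one sample short of the full duration because indexing starts at zero.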

### **Hip-Hop**

In [None]:
path = 'Data/genres_original/hiphop/hiphop.00000.wav'
x, sr = librosa.load(path)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x = np.arange(len(x)) / sr,
    y = x,
    mode = 'lines',
    line = dict(color = 'green'),
    name = 'Waveform'
))

fig.update_layout(
    title = 'Waveform of hiphop.00000.wav',
    xaxis_title = 'Time (s)',
    yaxis_title = 'Amplitude',
    width = 1000,
    height = 400
)

fig.show()
display(Audio(path))

### **Metal**

In [None]:
path = 'Data/genres_original/metal/metal.00000.wav'
x, sr = librosa.load(path)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x = np.arange(len(x)) / sr,
    y = x,
    mode = 'lines',
    line = dict(color = 'orange'),
    name = 'Waveform'
))

fig.update_layout(
    title = 'Waveform of metal.00000.wav',
    xaxis_title = 'Time (s)',
    yaxis_title = 'Amplitude',
    width = 1000,
    height = 400
)

fig.show()
display(Audio(path))

### **Pop**

In [None]:
path = 'Data/genres_original/pop/pop.00000.wav'
x, sr = librosa.load(path)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x = np.arange(len(x)) / sr,
    y = x,
    mode = 'lines',
    line = dict(color = 'purple'),
    name = 'Waveform'
))

fig.update_layout(
    title = 'Waveform of pop.00000.wav',
    xaxis_title = 'Time (s)',
    yaxis_title = 'Amplitude',
    width = 1000,
    height = 400
)

fig.show()
display(Audio(path))


### **Rock**

In [None]:
path = 'Data/genres_original/rock/rock.00000.wav'
x, sr = librosa.load(path)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x = np.arange(len(x)) / sr,
    y = x,
    mode = 'lines',
    line = dict(color = 'red'),
    name = 'Waveform'
))

fig.update_layout(
    title = 'Waveform of rock.00000.wav',
    xaxis_title = 'Time (s)',
    yaxis_title = 'Amplitude',
    width = 1000,
    height = 400
)

fig.show()
display(Audio(path))
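
The five waveform cells above differ only in the file path and the line color, so they could be folded into one helper. The following is a sketch of that refactoring, not part of the original notebook: `GENRE_COLORS`, `sample_path`, and `plot_waveform` are hypothetical names, and the directory layout is assumed to match the paths used above.

```python
from pathlib import Path

import numpy as np

# Colors taken from the individual cells above
GENRE_COLORS = {
    "classical": "blue",
    "hiphop": "green",
    "metal": "orange",
    "pop": "purple",
    "rock": "red",
}


def sample_path(genre, index=0, root="Data/genres_original"):
    """Build the path of one track, e.g. .../rock/rock.00000.wav."""
    return Path(root) / genre / f"{genre}.{index:05d}.wav"


def plot_waveform(genre, index=0):
    """Load one track and return a Plotly figure of its waveform."""
    # Imported here so sample_path can be used without these libraries
    import librosa
    import plotly.graph_objs as go

    path = sample_path(genre, index)
    x, sr = librosa.load(str(path))

    fig = go.Figure(go.Scatter(
        x=np.arange(len(x)) / sr,   # sample index -> seconds
        y=x,
        mode="lines",
        line=dict(color=GENRE_COLORS[genre]),
        name="Waveform",
    ))
    fig.update_layout(
        title=f"Waveform of {path.name}",
        xaxis_title="Time (s)",
        yaxis_title="Amplitude",
        width=1000,
        height=400,
    )
    return fig
```

With this in place, a loop such as `for genre in GENRE_COLORS: plot_waveform(genre).show()` would produce the same five plots.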

### **Heatmap**

In [None]:
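# Keep only the columns that hold per-track mean values
spike_cols = [col for col in data.columns if 'mean' in col]
f, ax = plt.subplots(figsize=(16, 11))

# Pairwise correlation between the mean features
sns.heatmap(data[spike_cols].corr(), cmap = 'YlGn')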
plt.title('Heatmap for MEAN variables', fontdict = {'fontname': 'Arial', 'color': 'black', 'fontsize': 20, 'fontweight': 'bold'})
plt.xticks(fontsize = 10) 
plt.yticks(fontsize = 10)
plt.show()
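
The heatmap is drawn from the matrix returned by `DataFrame.corr()`, which computes pairwise Pearson correlation by default. The sketch below shows what that matrix looks like on a small synthetic table. The column names are placeholders, not columns of the real dataset, and the values are random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Hypothetical feature table; names and values are made up for illustration
df = pd.DataFrame({
    "feat_a_mean": base,
    "feat_b_mean": 2.0 * base + rng.normal(scale=0.1, size=200),  # closely tied to feat_a_mean
    "feat_c_mean": rng.normal(size=200),                          # unrelated noise
    "feat_a_var": rng.normal(size=200),                           # dropped by the 'mean' filter
})

mean_cols = [col for col in df.columns if "mean" in col]
corr = df[mean_cols].corr()   # Pearson correlation, one row and column per feature
print(corr.round(2))
```

The diagonal is always 1, and strongly related features such as `feat_a_mean` and `feat_b_mean` show up as values close to 1, which is what the darker cells in the heatmap indicate.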

## **Data Preprocessing**

### **Encoding**

**_The genre labels are converted to integers using scikit-learn's LabelEncoder._**

In [None]:
label_encoder = preprocessing.LabelEncoder() 
data['label'] = label_encoder.fit_transform(data['label'])
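
To make the mapping concrete, here is a small sketch of what `LabelEncoder` does on a made-up list of genre names (the list is illustrative, not taken from the dataset). The encoder stores the class names in sorted order and assigns each one its position in that order.

```python
from sklearn.preprocessing import LabelEncoder

genres = ["rock", "classical", "pop", "rock", "metal"]  # made-up sample labels

encoder = LabelEncoder()
encoded = encoder.fit_transform(genres)

print(list(encoder.classes_))                    # class names in sorted order
print(list(encoded))                             # integer id for each input label
print(list(encoder.inverse_transform(encoded)))  # back to the original names
```

Keeping the fitted encoder around is useful later, since `inverse_transform` turns predicted integers back into genre names.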

In [None]:
x = data.drop(['label', 'filename'],axis = 1) 
y = data['label']

### **Scaling**

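**_The features are rescaled to the [0, 1] range using MinMaxScaler. This keeps large-valued features from dominating distance-based models such as K-Nearest Neighbors and helps the neural network train faster and more stably._**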

In [None]:
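cols = x.columns
minmax = preprocessing.MinMaxScaler()

# Note: the scaler is fitted on the full dataset, before the train/test split,
# so the test rows also influence the min/max values used for scaling.
np_scaled = minmax.fit_transform(x)

# Rebuild a DataFrame so the column names are preserved
x = pd.DataFrame(np_scaled, columns = cols)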
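
Min-max scaling applies the formula `(x - min) / (max - min)` to each column independently. The sketch below checks that formula against `MinMaxScaler` on a tiny made-up matrix; the numbers are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 1000.0]])

# Column-wise formula: (x - min) / (max - min)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled = MinMaxScaler().fit_transform(X)

print(scaled)
print("matches manual formula:", np.allclose(manual, scaled))
```

After scaling, both columns span the same range, so neither dominates a distance computation.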

## **Model Training**

#### **_Classifiers:_**
* **_K-Nearest Neighbors Classifier_**
* **_Random Forest Classifier_**
* **_CatBoost Classifier_**
* **_XGBoost Classifier_**

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 111) 
x_train.shape, x_test.shape, y_train.shape, y_test.shape
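
One common variation on the split above, shown here as a sketch on synthetic data rather than as the notebook's method, is to split first and fit the scaler on the training rows only, so that no test-set statistics reach the model. The `stratify` argument is also an addition; it keeps the class proportions equal in both splits. The array shapes and class counts below are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(111)
X = rng.normal(size=(100, 4))      # synthetic features
y = np.repeat(np.arange(5), 20)    # 5 balanced synthetic classes

# Split first, keeping class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=111, stratify=y
)

# Fit the scaler on training rows only, then apply it to both splits
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # values may fall slightly outside [0, 1]

print(X_train_scaled.shape, X_test_scaled.shape)
print("test samples per class:", np.bincount(y_test))
```

Test values outside [0, 1] are expected with this approach and are harmless; they simply reflect rows that lie beyond the range seen during training.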

### **Accuracy Calculation**

In [None]:
knn = KNeighborsClassifier(n_neighbors = 3)
rf = RandomForestClassifier(n_estimators = 1000, max_depth = 10, random_state = 0)
cbc = cb.CatBoostClassifier(verbose = 0, eval_metric = 'Accuracy', loss_function = 'MultiClass')
xgb = XGBClassifier(n_estimators = 1000, learning_rate = 0.05)

# Fit each classifier on the training split and report its test-set accuracy
for clf in (knn, rf, cbc, xgb):
    clf.fit(x_train, y_train)
    preds = clf.predict(x_test)
    print(clf.__class__.__name__, accuracy_score(y_test, preds))

**_Based on these accuracy scores, CatBoost is the most accurate of the four classifiers. The ensemble methods (Random Forest, CatBoost, and XGBoost) outperform the single-model K-Nearest Neighbors classifier. All four are supervised learning techniques._**
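
When several models are compared, it can help to collect the scores in a dictionary and print them in ranked order instead of reading them off one by one. The sketch below does this on a synthetic dataset with two of the scikit-learn classifiers. The data, the smaller `n_estimators` value, and the dictionary keys are assumptions made to keep the example quick and self-contained, so the printed scores say nothing about the music dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class problem standing in for the real feature table
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=111
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit every model and store its test accuracy under its name
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Print the models from most to least accurate
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14}{acc:.3f}")
```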

### **Neural Networks**

**_A simple feed-forward neural network is trained on the same features for comparison._**

In [None]:
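model = Sequential()

model.add(Flatten(input_shape=(58,)))       # 58 input features per track
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())             # normalizes activations of the previous layer
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))                     # randomly drops 30% of units during training
model.add(Dense(10, activation='softmax'))  # one probability per output class
model.summary()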
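
The parameter counts reported by `model.summary()` can be checked by hand. A dense layer has one weight per input-output pair plus one bias per output, and a batch normalization layer keeps four values per feature (scale and shift, which are trained, plus a moving mean and variance, which are not). `Flatten` and `Dropout` have no parameters. The helper names below are hypothetical; the layer sizes are the ones from the model above.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Scale, shift, moving mean, and moving variance per feature."""
    return 4 * n_features


layers = {
    "dense_256": dense_params(58, 256),
    "batch_norm": batchnorm_params(256),
    "dense_128": dense_params(256, 128),
    "dense_10": dense_params(128, 10),
}

total = sum(layers.values())
non_trainable = 2 * 256   # moving mean and variance of the batch norm layer

for name, count in layers.items():
    print(f"{name:<12}{count}")
print("total:", total, "trainable:", total - non_trainable)
```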

In [None]:
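adam = keras.optimizers.Adam(learning_rate = 1e-4)
model.compile(optimizer = adam,
              loss = "sparse_categorical_crossentropy",  # labels are integer class ids
              metrics = ["accuracy"])

# Note: the test split is reused as validation data, so the validation
# curves and the final test accuracy are computed on the same samples.
hist = model.fit(x_train, y_train,
                 validation_data = (x_test, y_test),
                 epochs = 100,
                 batch_size = 32)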
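
The `sparse_categorical_crossentropy` loss takes integer class ids rather than one-hot vectors. For each sample it is the negative log of the probability the model assigned to the true class, and the reported loss is the average over samples. The sketch below computes it with NumPy on made-up softmax outputs; the probabilities and labels are illustrative only.

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 4 classes (each row sums to 1)
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.80, 0.10],
])
labels = np.array([0, 3, 2])   # integer class ids, as LabelEncoder produces

# Select the probability assigned to each true class, then take -log
true_class_probs = probs[np.arange(len(labels)), labels]
per_sample_loss = -np.log(true_class_probs)

print("per-sample loss:", per_sample_loss.round(4))
print("mean loss:", per_sample_loss.mean().round(4))
```

The second sample, where the model is undecided between all four classes, receives the largest loss; confident correct predictions are penalized least.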

In [None]:
test_error, test_accuracy = model.evaluate(x_test, y_test, verbose = 1) 
print(f"Test Accuracy: {test_accuracy}")

In [None]:
fig, axs = plt.subplots(2, figsize = (10, 10))

# Shared font settings for axis labels and titles
label_font = {'fontname': 'Arial', 'color': 'black', 'fontsize': 10, 'fontweight': 'bold'}
title_font = {'fontname': 'Arial', 'color': 'black', 'fontsize': 15, 'fontweight': 'bold'}

# Accuracy per epoch
axs[0].plot(hist.history["accuracy"], label = "Train")
axs[0].plot(hist.history["val_accuracy"], label = "Test")
axs[0].set_ylabel("Accuracy", fontdict = label_font)
axs[0].legend()
axs[0].set_title("Accuracy", fontdict = title_font)

# Error (loss) per epoch
axs[1].plot(hist.history["loss"], label = "Train")
axs[1].plot(hist.history["val_loss"], label = "Test")
axs[1].set_ylabel("Error", fontdict = label_font)
axs[1].legend()
axs[1].set_title("Error", fontdict = title_font)

plt.show()

**_With test accuracy above 80%, the ensemble methods and the neural network are the most effective approaches for genre classification in this project._**

## **Conclusion**

**_To sum up, this project shows how several machine learning approaches can be applied to classify music tracks into distinct genres. It compares K-Nearest Neighbors, Random Forest, CatBoost, and XGBoost classifiers with a simple neural network, and assesses each model by its accuracy on a held-out test set._**