# Music Genre Classification Model
Done by Low Zhe Kai and Marc Chern as part of the Mini-Project for the SC1015 module in NTU. <hr>
# Introduction and Goals of this model
Music spans a seemingly limitless number of genres, from Heavy Metal to Reggae to Jazz. Over the years, new genres and sub-genres have emerged, and the lines between them are often blurred. <br><br>
For a long time, experts have tried to quantify these differences in sound and to pin down what distinguishes one genre from another. However, because music is perceived so subjectively, such differences are hard to quantify. <br><br>
Using the dataset below, we aim to carry out an in-depth exploratory analysis of the sounds of different genres. We will extract useful features, visualise them, classify the tracks, and ultimately understand how the genres differ. Do follow along with this well-documented Jupyter Notebook to understand our thought process and the insights we derived.

# Part 0: Importing the necessary libraries

In [1]:
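import pandas as pd
import librosa as lr
import librosa.display  # makes lr.display.specshow / lr.display.waveshow available
import numpy as np
import matplotlib.pyplot as plt
import math
import sklearn
import sklearn.preprocessing  # needed for minmax_scale and scale used below
import seaborn as sns
from IPython.display import Audio, display  # IPython.core.display.display is deprecated
import os
from glob import glob
import scipy.io.wavfile as siw
from scipy.signal import stft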



<hr>

# Part 1: Importing the dataset
### About the dataset
<a href="https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification">Dataset Used: GTZAN Dataset - Music Genre Classification</a> <br><br>
This dataset contains:
- 1000 .wav files: A collection of 10 genres with 100 audio files each, all 30 seconds long.
- 999 .png files: Visual representations of the audio files in the form of Mel spectrograms.
- 2 .csv files: Pre-extracted features of the audio files (e.g. `features_30_sec.csv`, used below).

### Exploring the CSV

In [None]:
pd.set_option('display.max_columns', None)
df_30s = pd.read_csv('Data/features_30_sec.csv')

In [None]:
print(np.shape(df_30s))
df_30s.head()

In [None]:
df_30s.describe()

### Exploring the audio files

In [11]:
# genre_list holds the music genres in the dataset (one sub-folder per genre)
genre_list = sorted(d for d in os.listdir("Data/genres_original/")
                    if os.path.isdir(os.path.join("Data/genres_original/", d)))

audioDir = []  # list of (genre, file path) tuples, one per audio file
for genre in genre_list:
    files = glob(f"Data/genres_original/{genre}/*.wav")
    for address in files:
        audioDir.append((genre, address))
audioDir.sort()
print(audioDir)

[('blues', 'Data/genres_original/blues/blues.00000.wav'), ('blues', 'Data/genres_original/blues/blues.00001.wav'), ('blues', 'Data/genres_original/blues/blues.00002.wav'), ('blues', 'Data/genres_original/blues/blues.00003.wav'), ('blues', 'Data/genres_original/blues/blues.00004.wav'), ('blues', 'Data/genres_original/blues/blues.00005.wav'), ('blues', 'Data/genres_original/blues/blues.00006.wav'), ('blues', 'Data/genres_original/blues/blues.00007.wav'), ('blues', 'Data/genres_original/blues/blues.00008.wav'), ('blues', 'Data/genres_original/blues/blues.00009.wav'), ('blues', 'Data/genres_original/blues/blues.00010.wav'), ('blues', 'Data/genres_original/blues/blues.00011.wav'), ('blues', 'Data/genres_original/blues/blues.00012.wav'), ('blues', 'Data/genres_original/blues/blues.00013.wav'), ('blues', 'Data/genres_original/blues/blues.00014.wav'), ('blues', 'Data/genres_original/blues/blues.00015.wav'), ('blues', 'Data/genres_original/blues/blues.00016.wav'), ('blues', 'Data/genres_origina

In [None]:
fig, ax = plt.subplots(nrows=10, figsize=(20, 35))
for i in range(0, 1000, 100):  # audioDir is sorted by genre, so every 100th file is the first song of a genre
    y, sr = lr.load(audioDir[i][1])  # librosa resamples to 22050 Hz by default
    display(Audio(audioDir[i][1]))
    print(audioDir[i])
    y, _ = lr.effects.trim(y)  # trim leading and trailing silence
    time_array = np.arange(len(y)) / sr  # sample index -> time in seconds
    print(np.shape(y))
    print("Sampling rate (Hz):", sr)
    print("Audio file length:", "%.2f" % (len(y) / sr) + "s")
    ax[i // 100].plot(time_array, y, lw=0.3)
    ax[i // 100].set_title(audioDir[i][0])
plt.tight_layout()

<hr>

# Part 2: Exploratory data analysis

### What is the Fourier Transform?
The Fourier Transform (FT) is a mathematical tool that decomposes a signal into its frequency components. It takes a time-domain signal, such as a sound wave or a voltage signal, and transforms it into a frequency-domain representation, where the amplitude and phase of each component are shown. <br><br>
However, one main disadvantage of the FT is that it tells us which frequencies are present over the whole signal, but not when they occur. This is only a faithful description if the signal is stationary, meaning its statistical properties (such as mean and variance) do not change over time. That is not the case for many real-world signals (music, speech, etc.), where notes and sounds start and stop.
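
As a small illustration on a synthetic signal (not taken from the dataset), the sketch below builds one second of audio containing a 440 Hz tone and a quieter 1000 Hz tone, then uses NumPy's real FFT (`np.fft.rfft`) to recover those two frequencies from the magnitude spectrum. The 22050 Hz sample rate matches librosa's default loading rate; the tone frequencies and amplitudes are arbitrary choices for the demo. Note that the spectrum alone would look the same whether the tones played together or one after the other.

```python
import numpy as np

sr = 22050                      # sample rate in Hz (librosa's default loading rate)
t = np.arange(sr) / sr          # time stamps for 1 second of audio
# Time-domain signal: a 440 Hz tone plus a quieter 1000 Hz tone
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(signal)                    # one complex value per frequency bin
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)    # centre frequency (Hz) of each bin
magnitude = np.abs(spectrum)                      # amplitude of each component
phase = np.angle(spectrum)                        # phase of each component

# The two strongest bins sit at the frequencies of the two tones
top_two = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(top_two)  # approximately 440 and 1000 Hz
```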
### How does the Short-Time Fourier Transform overcome this?
The Short-Time Fourier Transform (STFT) is a modified version of the FT designed to analyse how the frequency content of a signal changes over time. It breaks the signal into short (usually overlapping) segments by sliding a window along it, and applies the FT to each segment. This yields a time-frequency representation of the signal, which is useful for analysing signals that change over time. The window length controls a trade-off: a longer window gives finer frequency resolution but coarser time resolution, and a shorter window does the opposite.
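
To make the sliding-window idea concrete, here is a minimal NumPy STFT sketch. It is not librosa's implementation (which, for example, pads and centres frames by default); it simply frames the signal with a Hann window of `n_fft` samples, moves the window `hop_length` samples at a time, and takes the FFT of each frame. The test signal is synthetic: its pitch jumps from 440 Hz to 880 Hz halfway through, which the STFT can localise in time. The `n_fft=2048` and `hop_length=512` values mirror those used later in this notebook; the function name `simple_stft` is hypothetical.

```python
import numpy as np

def simple_stft(y, n_fft=2048, hop_length=512):
    """Return |STFT| with shape (1 + n_fft // 2, n_frames); no padding or centring."""
    window = np.hanning(n_fft)                       # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(y) - n_fft) // hop_length    # number of full frames that fit in y
    frames = np.stack([y[i * hop_length : i * hop_length + n_fft] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))       # FFT of every column (frame)

sr = 22050
t = np.arange(sr) / sr
# First half-second at 440 Hz, second half-second at 880 Hz
y = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

S = simple_stft(y)
freqs = np.fft.rfftfreq(2048, d=1 / sr)
dominant = freqs[S.argmax(axis=0)]   # strongest frequency in each frame
print(dominant[0], dominant[-1])     # early frames near 440 Hz, late frames near 880 Hz
```

Each frequency bin here is `sr / n_fft` (about 10.8 Hz) wide; doubling `n_fft` would halve that width but make each frame cover twice as much time.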

In [None]:
# Helper function to plot a spectrogram that is already in dB
def plot_spectrogram(Y, sr, hop_length, y_axis="log"):
    """Plot dB-scaled spectrogram Y with a time x-axis and the given frequency y-axis ('log' or 'mel')."""
    plt.figure(figsize=(20, 6))
    if y_axis == "log":
        plt.title("Power Spectrogram")
    elif y_axis == "mel":
        plt.title("Mel Spectrogram")
    lr.display.specshow(Y, sr=sr, hop_length=hop_length, x_axis="time", y_axis=y_axis)
    plt.colorbar(format="%+2.f dB")

### STFT on the audio file and the resultant spectrogram

In [None]:
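y, sr = lr.load(audioDir[0][1])  # first blues track
audio_stft = lr.stft(y, hop_length=512, n_fft=2048)  # complex STFT, shape (1 + n_fft // 2, n_frames)
# Plot the raw waveform for reference
fig, ax = plt.subplots(figsize=(20, 4))
time_array = np.arange(len(y)) / sr
ax.plot(time_array, y, lw=0.3)
plt.show()
# Convert the magnitude to decibels relative to the loudest bin, then plot
audio_stft = np.abs(audio_stft)
audio_stft = lr.amplitude_to_db(audio_stft, ref=np.max)
plot_spectrogram(audio_stft, sr, 512)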

### Using the Mel scale
The mel scale (named after the word *melody*) is a perceptual scale of pitches that listeners judge to be equally spaced from one another. <br><br>
It is commonly computed with the formula $m = 2595 \log_{10}\left(1 + \frac{f_{\text{Hz}}}{700}\right)$.
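
The formula above, together with its inverse f = 700 * (10^(m/2595) - 1), can be written as a pair of small NumPy functions. This is a sketch of the HTK-style formula quoted here; librosa's own `lr.hz_to_mel` uses the Slaney variant (linear below 1 kHz) unless called with `htk=True`, so its values differ somewhat. The check shows the scale's anchor point (1000 Hz is roughly 1000 mels) and that equal steps in mels correspond to ever wider steps in Hz.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mels):
    """Inverse of hz_to_mel: f = 700 * (10 ** (m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(mels, dtype=float) / 2595.0) - 1.0)

print(hz_to_mel(1000))  # close to 1000 mels
# Equally spaced mel values map to increasingly spaced frequencies in Hz
print(mel_to_hz([500, 1000, 1500, 2000]).round(1))
```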

In [None]:
y, sr = lr.load(audioDir[0][1])
tempo, _ = lr.beat.beat_track(y=y, sr=sr)  # estimated tempo in beats per minute
print("Tempo:", tempo)
display(Audio(audioDir[0][1]))
# melspectrogram returns a power (magnitude squared) spectrogram by default,
# so convert it to dB with power_to_db rather than amplitude_to_db
mel_audio = lr.feature.melspectrogram(y=y, sr=sr, hop_length=1024, n_fft=2048)
mel_audio = lr.power_to_db(mel_audio, ref=np.max)
plot_spectrogram(mel_audio, sr, 1024, y_axis='mel')

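# Mel spectrogram of a C major scale
As a sanity check on a simpler input, the cell below repeats the mel spectrogram on a recording of a C major scale (`Data/CMajScale.mp3`). Assuming the scale is played in ascending order, the energy in the spectrogram should step upward in pitch, one note at a time.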

In [None]:
d = 'Data/CMajScale.mp3'
y, sr = lr.load(d)
display(Audio(d))
# melspectrogram returns a power spectrogram by default, so convert with power_to_db
mel_audio = lr.feature.melspectrogram(y=y, sr=sr, hop_length=1024, n_fft=2048)
mel_audio = lr.power_to_db(mel_audio, ref=np.max)
plot_spectrogram(mel_audio, sr, 1024, y_axis='mel')


# Boxplot of BPM across genres

In [None]:
""" 
1. Compute the spectrograms of all 1000 audio files
2. Find an appropriate model for image classification (Find and experiment)
3. Feed data in and test results
4. Free A
"""
pd.read_csv('Data/features_30_sec.csv')
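
A boxplot of BPM across genres summarises each genre's tempo by its quartiles. As a starting point, the sketch below computes those per-genre numbers with pandas. It assumes the tempo is stored in a column named `tempo` and the genre in a column named `label` (check `df_30s.columns` before relying on this), and it is demonstrated on a tiny made-up DataFrame rather than on the real data. The helper name `tempo_quartiles` is hypothetical.

```python
import pandas as pd

def tempo_quartiles(df, tempo_col="tempo", genre_col="label"):
    """Per-genre min, quartiles and max of tempo (the numbers behind a boxplot).

    Column names are assumptions; pass the ones your DataFrame actually uses.
    """
    grouped = df.groupby(genre_col)[tempo_col]
    return pd.DataFrame({
        "min": grouped.min(),
        "q1": grouped.quantile(0.25),
        "median": grouped.median(),
        "q3": grouped.quantile(0.75),
        "max": grouped.max(),
    })

# Toy data for illustration only (not values from GTZAN)
toy = pd.DataFrame({
    "label": ["blues"] * 4 + ["metal"] * 4,
    "tempo": [90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0],
})
summary = tempo_quartiles(toy)
print(summary)
```

With the real data, the plot itself could then be drawn with something like `sns.boxplot(data=df_30s, x='label', y='tempo')`, again assuming those column names. Note that matplotlib and seaborn boxplots extend the whiskers at most 1.5 IQR beyond the box by default, so the whiskers need not reach the min and max.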

In [None]:
# Larger hop_length = fewer, coarser time frames; smaller hop_length = finer time resolution
hop_length = 2000

audio_file, sr = lr.load('Data/CMajScale.mp3')
# Chromagram: energy in each of the 12 pitch classes over time
chromagram = lr.feature.chroma_stft(y=audio_file, sr=sr, hop_length=hop_length)
print('Chromagram shape:', chromagram.shape)

plt.figure(figsize=(16, 6))
lr.display.specshow(chromagram, x_axis='time', y_axis='chroma', hop_length=hop_length, cmap='coolwarm')

# TODO: try extracting summary statistics per pitch class (mean, variance, median, mode)
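
A chromagram folds every frequency into one of the 12 pitch classes (C, C#, ..., B) regardless of octave, so a C major scale should mostly light up the C, D, E, F, G, A and B rows in turn. The sketch below illustrates only that folding step with equal-temperament arithmetic (A4 = 440 Hz is MIDI note 69); it is not how `lr.feature.chroma_stft` works internally, which maps STFT bins onto chroma with a filter bank. The names `PITCH_CLASSES` and `hz_to_pitch_class` are hypothetical.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_pitch_class(f_hz):
    """Map a frequency in Hz to its nearest equal-tempered pitch class name."""
    midi = 69 + 12 * np.log2(f_hz / 440.0)   # A4 = 440 Hz is MIDI note 69
    return PITCH_CLASSES[int(round(midi)) % 12]

# Octaves fold onto the same chroma bin
for f in [261.63, 523.25, 440.0, 880.0, 392.0]:
    print(f, "Hz ->", hz_to_pitch_class(f))
```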

In [None]:
# Add wavelet coefficients

# Spectral Centroids

In [None]:

audio_file, sr = lr.load('Data/genres_original/blues/blues.00000.wav')
# Spectral centroid of every frame; [0] drops the leading channel axis
spectral_centroids = lr.feature.spectral_centroid(y=audio_file, sr=sr)[0]

# One value per frame, so the result is a 1-D vector
print('Shape of Spectral Centroids:', spectral_centroids.shape, '\n')
print('Centroids mean:', np.mean(spectral_centroids), '\n')
print('Centroids variance:', np.var(spectral_centroids), '\n')
print('Centroids:', spectral_centroids, '\n')

# Frame indices, used as the time axis for the plots below
frames = range(len(spectral_centroids))

# Convert frame indices to time in seconds (spectral_centroid's default hop_length is 512)
t = lr.frames_to_time(frames, sr=sr, hop_length=512)

print('frames:', frames, '\n')
print('t:', t)

# Min-max scale values to [0, 1] so they can be overlaid on the waveform
def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)
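
The spectral centroid is the magnitude-weighted mean frequency of each frame, a rough measure of how "bright" a sound is. Below is a minimal NumPy version of that definition, given a magnitude spectrogram `S` and the centre frequency of each bin, checked on a hand-made spectrum rather than on the audio; librosa's `spectral_centroid` computes this weighted mean from its own magnitude STFT. The function name and the toy spectrum are hypothetical.

```python
import numpy as np

def spectral_centroid(S, freqs):
    """Magnitude-weighted mean frequency per frame.

    S: magnitude spectrogram, shape (n_bins, n_frames); freqs: bin centres in Hz, shape (n_bins,)
    """
    weights = S / np.maximum(S.sum(axis=0, keepdims=True), 1e-12)  # each column sums to 1
    return (freqs[:, None] * weights).sum(axis=0)

freqs = np.linspace(0, 11025, 1025)  # bin centres for n_fft=2048 at sr=22050
S = np.zeros((1025, 2))
S[100, 0] = 1.0                      # frame 0: a single low component
S[100, 1] = 1.0                      # frame 1: the same component...
S[400, 1] = 0.5                      # ...plus a weaker, higher component
centroids = spectral_centroid(S, freqs)
print(centroids)                     # frame 1's centroid is pulled upward
```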


# Spectral Rolloff

In [None]:


# Spectral rolloff of every frame (frequency below which 85% of the energy lies, by default)
spectral_rolloff = lr.feature.spectral_rolloff(y=audio_file, sr=sr)[0]

print("Spectral Rolloff mean: ", np.mean(spectral_rolloff))
print("Spectral Rolloff variance: ", np.var(spectral_rolloff))

# Plot the normalised rolloff curve over the waveform
plt.figure(figsize=(16, 6))
plt.plot(t, normalize(spectral_rolloff), color='#FFB100')
lr.display.waveshow(audio_file, sr=sr, alpha=0.4, color='#A300F9')
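
Spectral rolloff is the frequency below which a fixed fraction of a frame's total spectral magnitude lies; librosa's `spectral_rolloff` uses `roll_percent=0.85` by default. The NumPy sketch below follows that definition (a cumulative sum over frequency bins, then the first bin that reaches 85% of the total) on a one-frame toy spectrum. It illustrates the idea rather than copying librosa's code, and the toy values are made up.

```python
import numpy as np

def spectral_rolloff(S, freqs, roll_percent=0.85):
    """Frequency below which `roll_percent` of each frame's total magnitude lies."""
    cumulative = np.cumsum(S, axis=0)                        # running total over frequency bins
    threshold = roll_percent * cumulative[-1]                # per-frame target
    first_bin = np.argmax(cumulative >= threshold, axis=0)   # first bin reaching the target
    return freqs[first_bin]

freqs = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
S = np.array([[4.0], [3.0], [2.0], [1.0], [0.0]])  # one frame, energy concentrated at low frequencies
# Running totals are 4, 7, 9, 10, 10: 85% of 10 (8.5) is first reached in the 2000 Hz bin
print(spectral_rolloff(S, freqs))
```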



# MFCC

In [None]:
mfccs = lr.feature.mfcc(y=audio_file, sr=sr)  # 20 coefficients per frame by default
print('MFCC shape: ', mfccs.shape)
for i in range(len(mfccs)):
    print('MFCC', i + 1, 'mean: ', mfccs[i].mean())
    print('MFCC', i + 1, 'variance: ', mfccs[i].var())
# Perform feature scaling: standardise each coefficient to zero mean and unit variance across time
mfccs = sklearn.preprocessing.scale(mfccs, axis=1)
plt.figure(figsize=(16, 6))
lr.display.specshow(mfccs, sr=sr, x_axis='time', cmap='cool')
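
Conceptually, MFCCs come from taking a log (dB) mel spectrogram and applying a type-II discrete cosine transform (DCT) along the mel-band axis, keeping the first few coefficients; librosa keeps `n_mfcc=20` by default, which is why 20 means and variances are printed above. The sketch below writes out an orthonormal DCT-II in NumPy and applies it to a random stand-in for a log-mel matrix. It illustrates the idea rather than reproducing librosa's exact pipeline, and the helper names are hypothetical.

```python
import numpy as np

def dct_ii_ortho(x, axis=0):
    """Orthonormal DCT-II along `axis` (the norm='ortho' convention)."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    n = x.shape[0]
    k = np.arange(n)[:, None]            # output coefficient index
    m = np.arange(n)[None, :]            # input sample index
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)         # scale rows so the basis is orthonormal
    basis[1:] *= np.sqrt(2.0 / n)
    return np.moveaxis(np.tensordot(basis, x, axes=1), 0, axis)

def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Keep the first n_mfcc cepstral coefficients of a (n_mels, n_frames) log-mel matrix."""
    return dct_ii_ortho(log_mel, axis=0)[:n_mfcc]

rng = np.random.default_rng(0)
log_mel = rng.normal(size=(128, 50))     # toy stand-in for a dB mel spectrogram
mfcc = mfcc_from_log_mel(log_mel)
print(mfcc.shape)                        # (20, 50)
# Per-coefficient summaries, as used for the feature table
print(mfcc.mean(axis=1).shape, mfcc.var(axis=1).shape)   # (20,) (20,)
```

Because the DCT concentrates a smooth spectral envelope into the low-order coefficients, keeping only the first 20 gives a compact description of each frame's timbre.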

In [14]:
# Putting it all together: one row of summary features per audio file

data = []
start = 0
end = len(audioDir)
for i in range(start, end):  # process every audio file
    genre, path = audioDir[i]
    filename = os.path.basename(path)
    try:
        y, sr = lr.load(path)
    except Exception as err:  # skip unreadable files (e.g. a corrupt .wav) instead of aborting
        print("Skipping", path, "-", err)
        continue
    y, _ = lr.effects.trim(y)  # trim leading and trailing silence
    record = [filename, genre]
    spectral_centroids = lr.feature.spectral_centroid(y=y, sr=sr)[0]
    record += [np.mean(spectral_centroids), np.var(spectral_centroids)]
    spectral_rolloff = lr.feature.spectral_rolloff(y=y, sr=sr)[0]
    record += [np.mean(spectral_rolloff), np.var(spectral_rolloff)]
    mfccs = lr.feature.mfcc(y=y, sr=sr)  # shape (20, n_frames)
    for j in range(20):
        record += [np.mean(mfccs[j]), np.var(mfccs[j])]
    data.append(record)

# Column names: 2 identifiers, 4 spectral features, then mean/variance for MFCC1..MFCC20
columns = ['Filename', 'Genre', 'Spec_Centroid_Mean', 'Spec_Centroid_Var',
           'Spec_Rolloff_Mean', 'Spec_Rolloff_var']
columns += [f'MFCC{j}_{stat}' for j in range(1, 21) for stat in ('Mean', 'Var')]
df = pd.DataFrame(data=data, columns=columns)
df.head()


| | Filename | Genre | Spec_Centroid_Mean | Spec_Centroid_Var | Spec_Rolloff_Mean | Spec_Rolloff_var | MFCC1_Mean | MFCC1_Var | MFCC2_Mean | MFCC2_Var | ... | MFCC16_Mean | MFCC16_Var | MFCC17_Mean | MFCC17_Var | MFCC18_Mean | MFCC18_Var | MFCC19_Mean | MFCC19_Var | MFCC20_Mean | MFCC20_Var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | blues.00000.wav | blues | 1784.122641 | 129745.484539 | 3805.72303 | 901252.9 | -113.598824 | 2569.369385 | 121.570671 | 295.847107 | ... | 0.751706 | 52.424534 | -1.687854 | 36.535866 | -0.40873 | 41.603168 | -2.302677 | 55.053654 | 1.222467 | 46.941349 |
| 1 | blues.00001.wav | blues | 1530.261767 | 375915.508522 | 3550.713616 | 2978311.0 | -207.523834 | 7769.104492 | 123.985138 | 559.913391 | ... | 0.929294 | 55.337963 | -0.728403 | 60.231407 | 0.296872 | 48.133213 | -0.28243 | 51.106014 | 0.530645 | 45.7887 |
| 2 | blues.00002.wav | blues | 1552.832481 | 156471.011012 | 3042.410115 | 784130.9 | -90.757164 | 3317.886963 | 140.440872 | 508.85614 | ... | 2.448305 | 40.641678 | -7.72484 | 47.629646 | -1.819024 | 52.393597 | -3.440458 | 46.643398 | -2.238127 | 30.653151 |
| 3 | blues.00003.wav | blues | 1070.153418 | 184366.009385 | 2184.879029 | 1493078.0 | -199.575134 | 5508.266602 | 150.086105 | 456.309082 | ... | 0.773994 | 44.432896 | -3.324069 | 50.218452 | 0.636311 | 37.325726 | -0.615968 | 37.257774 | -3.405046 | 31.965254 |
| 4 | blues.00004.wav | blues | 1835.128513 | 343249.495747 | 3579.957471 | 1572336.0 | -160.354172 | 5199.103516 | 126.20948 | 853.27124 | ... | -4.515863 | 85.995201 | -5.451786 | 75.276741 | -0.915952 | 53.633236 | -4.408018 | 62.882492 | -11.704386 | 55.190254 |


In [13]:
df.describe()

| | Spec_Centroid_Mean | Spec_Centroid_Var | Spec_Rolloff_Mean | Spec_Rolloff_var | MFCC1_Mean | MFCC1_Var | MFCC2_Mean | MFCC2_Var | MFCC3_Mean | MFCC3_Var | ... | MFCC16_Mean | MFCC16_Var | MFCC17_Mean | MFCC17_Var | MFCC18_Mean | MFCC18_Var | MFCC19_Mean | MFCC19_Var | MFCC20_Mean | MFCC20_Var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | ... | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 |
| mean | 2202.598387 | 469294.6 | 4573.289295 | 1841687.0 | -144.486847 | 3733.131836 | 99.539177 | 706.134644 | -8.929626 | 468.363831 | ... | 1.150557 | 60.736191 | -3.966004 | 62.626392 | 0.509648 | 63.704361 | -2.329462 | 66.230988 | -1.095519 | 70.085648 |
| std | 716.110283 | 400630.5 | 1575.093587 | 1423638.0 | 100.298195 | 2718.427734 | 31.326109 | 438.844208 | 21.699053 | 287.120758 | ... | 4.581091 | 33.797421 | 4.551957 | 33.484776 | 3.871117 | 34.404305 | 3.757688 | 37.182396 | 3.839315 | 45.246468 |
| min | 570.349904 | 7996.725 | 749.740169 | 14951.35 | -552.15863 | 175.288483 | -1.471578 | 93.098694 | -89.865089 | 35.535614 | ... | -15.69388 | 9.200758 | -17.237364 | 13.870571 | -11.981186 | 15.401896 | -18.505384 | 13.417371 | -19.928354 | 7.871336 |
| 25% | 1626.527014 | 184463.9 | 3380.209117 | 772649.5 | -200.912971 | 1848.784119 | 76.776798 | 397.96843 | -24.217862 | 270.534164 | ... | -1.86345 | 40.324924 | -7.207529 | 40.807056 | -2.011057 | 41.857904 | -4.67242 | 41.704102 | -3.372875 | 42.296545 |
| 50% | 2215.267219 | 338143.2 | 4663.012639 | 1474650.0 | -120.147491 | 3135.946045 | 98.435829 | 607.153809 | -10.736386 | 405.279449 | ... | 1.218398 | 52.412281 | -4.065216 | 54.734497 | 0.672431 | 54.718277 | -2.392914 | 57.473801 | -1.16373 | 59.11227 |
| 75% | 2691.695237 | 610114.5 | 5534.116472 | 2547859.0 | -73.818314 | 4971.186523 | 119.790993 | 882.556549 | 5.540233 | 596.32309 | ... | 4.370312 | 71.494682 | -0.844253 | 75.047356 | 3.120285 | 75.4431 | 0.154474 | 78.577366 | 1.307755 | 85.170296 |
| max | 4435.732059 | 3033959.0 | 8677.730976 | 8656689.0 | 42.09145 | 28254.175781 | 193.074463 | 4028.914062 | 56.64645 | 2924.110596 | ... | 13.46546 | 393.005737 | 11.470961 | 405.928711 | 15.394898 | 332.782043 | 14.700517 | 393.373779 | 15.36167 | 506.306793 |
