# Music Genre Classification

This notebook sketches a proposed solution for classifying musical genres.

It covers the pre-processing routines, the feature extraction techniques and the models used for classification.

Authors:
* Suryank Tiwari - MT19019
* Rose Verma - MT19052

### Library Imports

In [1]:
import os
import csv
import librosa
import threading 
import numpy as np
import pandas as pd
from pydub import AudioSegment 
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold



## Part 1: Dataset Processing

The [GTZAN Dataset](http://marsyas.info/downloads/datasets.html) has been used for this project.

This dataset contains <b>10 genres</b> in total and each genre has 100 tracks in it, for a <b>total of 1000 tracks.</b>

The tracks are all 22050 Hz mono 16-bit audio files in .wav format.

This uniform nature of the dataset makes it a suitable choice for the problem, but <font color='red'>the small size of the dataset restricts proper learning.</font>

<b>This is resolved by creating non-overlapping samples of shorter length from this dataset.</b> Doing so increases the number of samples, and the shorter clips are better represented when each feature is averaged over a clip.

<b><i>path</i></b> contains the path to the original GTZAN dataset, and <b><i>out_path</i></b> should contain the path to an existing directory where the new oversampled dataset will be generated.

In [2]:
# Backslashes are escaped so that sequences such as '\S' are not read as escape codes
path = 'E:\\IIITD\\Semester 2\\SML\\Project\\genres\\'
out_path = 'E:\\IIITD\\Semester 2\\SML\\Project\\go\\'

GTZAN dataset has the following structure.

Main Dataset Directory: Genres

>Genres<br>
>>Blues<br>
>>Classical<br>
>>Country<br>
>>Disco<br>
>>Hiphop<br>
>>Jazz<br>
>>Metal<br>
>>Pop<br>
>>Reggae<br>
>>Rock<br>

The following snippet loads all the genres into the <b><i>genres</i></b> list.

In [3]:
genres = []
for folder in os.listdir(path):
    if os.path.isdir(path+folder):
        genres.append(folder)
print('Genre:', genres)

Genre: ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']


The following snippet splits each 30-second track into multiple samples of <b><i>sub_sample_length</i></b> seconds.

If <i> sub_sample_length = 5 </i> then each 30-second track is split into 6 non-overlapping sub-tracks. The dataset then holds 6000 tracks in total, with 600 tracks in each genre.
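The arithmetic behind the split can be checked with a small sketch. The helper `chunk_bounds` is hypothetical (it is not part of the notebook) and assumes a nominal track length of 30 seconds; it only computes millisecond boundaries and touches no audio.

```python
def chunk_bounds(track_ms, chunk_s, nominal_s=30):
    """Return (start, end) millisecond pairs for non-overlapping chunks.

    Hypothetical helper for illustration. The last chunk is clipped to the
    real track length in case a track is slightly shorter than nominal_s.
    """
    interval = chunk_s * 1000
    return [(start, min(start + interval, track_ms))
            for start in range(0, nominal_s * 1000, interval)]


bounds = chunk_bounds(track_ms=30000, chunk_s=5)
print(len(bounds))          # chunks per track
print(len(bounds) * 1000)   # clips in the oversampled dataset for 1000 tracks
```

A track that is a little shorter than 30 seconds still yields six chunks under this scheme; only the last one is shorter.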

In [4]:
sub_sample_length = 5    # in seconds

# Don't alter the following parameters
sample_length = 30      # Length of each track in GTZAN
total_samples = 1000    # Number of tracks

The following snippet creates the oversampled dataset:

In [5]:
for g in genres:
    genrepath = os.path.join(path, g)
    genre_out = os.path.join(out_path, g)
    # Process a genre only if its output folder does not exist yet
    if not os.path.isdir(genre_out):
        os.mkdir(genre_out)
        print('\n', g, 'folder created')
        for filename in os.listdir(genrepath):
            print('.', end='')
            audio = AudioSegment.from_wav(os.path.join(genrepath, filename))
            n = len(audio)                        # track length in milliseconds
            interval = sub_sample_length * 1000   # chunk length in milliseconds

            partition = 1
            start = 0
            # One chunk per interval of the nominal 30-second track
            while start < sample_length * 1000:
                end = min(start + interval, n)    # clip the last chunk to the track length
                chunk = audio[start:end]

                # 'blues.00042.wav' -> '42_1.wav', '42_2.wav', ...
                fln = filename.split('.')[1][-2:] + '_' + str(partition) + '.wav'
                chunk.export(os.path.join(genre_out, fln), format='wav')

                partition += 1
                start += interval
print('Oversampled Dataset Processing Complete')

Oversampled Dataset Processing Complete


The oversampled dataset has been created at <i>out_path</i>. 
This dataset will now be used to extract features.

## Part 2: Feature Extraction

The following features have been explored using the <b>Librosa</b> library:
* mfcc
* chroma_stft
* chroma_cens
* chroma_cq
* melspec
* flatness
* tempogram
* poly_features order 0
* poly_features order 1
* poly_features order 2
* spec_cent
* spectral_contrast
* spec_bw
* rmse
* rolloff
* zcr
* tonnetz


For each feature listed above, we take the <b>mean</b>, <b>variance</b> and <b>standard deviation</b> of the feature values and write them to a CSV file. This is our feature extraction policy.
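As a minimal sketch of this policy, the snippet below reduces a small made-up matrix (a stand-in for a librosa feature, not real audio data) to three numbers. Calling the NumPy functions without an `axis` argument collapses the whole matrix, so a multi-row feature such as a chromagram still contributes only three values.

```python
import numpy as np

# Stand-in for a librosa feature matrix: rows are coefficients, columns are frames
feature = np.array([[1.0, 2.0],
                    [3.0, 4.0]])

mean = np.mean(feature)   # averaged over every coefficient and frame
var = np.var(feature)     # population variance (ddof=0)
std = np.std(feature)     # square root of the variance above

print(mean, var, std)
```

Because the standard deviation is the square root of the variance, the two columns carry closely related information for every feature.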

This section of the code can be slow because many features are calculated, so the code has been <font color='red'><b>multi-threaded</b></font> with <b>one thread per genre</b> to compute the features faster. Once computed, the features can be loaded directly from the generated CSV.

<b> Give the path to the feature CSV below. If the CSV has already been generated, it will simply be loaded and not computed again. </b>

In [6]:
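# The 'Features' directory must already exist; os.path.join avoids the invalid '\s' escape
csv_file = os.path.join('Features', 'songdata5.csv')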

Processing features with mean, variance and standard deviation.

In [7]:
def process_features(features):
    res = ''
    for feature in features:
        res+=str(np.mean(feature))+' '+str(np.var(feature))+' '+str(np.std(feature))+' '
    return res

Each thread runs the following function concurrently, computing the features for one genre:

In [8]:
def genrepution(g, lock):
    genrepath = out_path+'\\'+g
    print(g)
    for filename in os.listdir(genrepath):
        print('.', end='')
        songname = genrepath+'\\'+filename
        y, sr = librosa.load(songname, mono=True)
        
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        chroma_cens = librosa.feature.chroma_cens(y=y, sr=sr)
        chroma_cq = librosa.feature.chroma_cqt(y=y, sr=sr)
        melspec = librosa.feature.melspectrogram(y=y, sr=sr)
        flatness = librosa.feature.spectral_flatness(y=y)
        hop_length=512
        oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
        tempogram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr,hop_length=hop_length)
        S = np.abs(librosa.stft(y))
        p0 = librosa.feature.poly_features(S=S, order=0)
        p1 = librosa.feature.poly_features(S=S, order=1)
        p2 = librosa.feature.poly_features(S=S, order=2)
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        rmse = librosa.feature.rms(y=y)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        y = librosa.effects.harmonic(y)
        tonnetz = librosa.feature.tonnetz(y=y, sr=sr)

        to_append = filename+' '
        features = [chroma_stft, chroma_cens, chroma_cq, melspec, flatness, tempogram, p0, p1, p2, spec_cent, spectral_contrast, spec_bw, rmse, rolloff, zcr, tonnetz]
        to_append += process_features(features)
        for e in mfcc:
            to_append += str(np.mean(e))+' '+str(np.var(e))+' '+str(np.std(e))+' '
        to_append += g
        # Serialize CSV writes so that rows from different threads do not interleave
        with lock:
            with open(csv_file, 'a', newline='') as file:
                writer = csv.writer(file)
                writer.writerow(to_append.split())
    print(g, 'finished')
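Each row written by the function above has a fixed layout: the filename, three statistics for each of the 16 pooled features, three statistics for each MFCC coefficient, and the genre label. The sketch below rebuilds that header with list comprehensions; it assumes `librosa.feature.mfcc` is left at its default of 20 coefficients, as in the call above.

```python
features = ['chroma_stft', 'chroma_cens', 'chroma_cq', 'melspec', 'flatness',
            'tempogram', 'p0', 'p1', 'p2', 'spec_cent', 'spectral_contrast',
            'spec_bw', 'rmse', 'rolloff', 'zcr', 'tonnetz']
stats = ['mean', 'var', 'std']
n_mfcc = 20   # assumed default number of MFCC coefficients

col_names = (['filename']
             + [f'{f}_{s}' for f in features for s in stats]
             + [f'mfcc_{j}_{s}' for j in range(n_mfcc) for s in stats]
             + ['label'])
print(len(col_names))
```

The count works out to 1 + 16×3 + 20×3 + 1 columns.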

This is the driver function that fills the dataframe with features using threads. The features are either computed or loaded from CSV.

In [9]:
def fill():
    if not os.path.isfile(csv_file):
        
        # Inserting column names into the CSV
    
        features = ['chroma_stft', 'chroma_cens', 'chroma_cq', 'melspec', 'flatness', 'tempogram', 'p0', 'p1', 'p2', 'spec_cent', 'spectral_contrast', 'spec_bw', 'rmse', 'rolloff', 'zcr', 'tonnetz']
        col_names=[]
        
        total = len(features)*3+20*3+2
        for i in range(total):
            col_names.append(str(i))
    
        i=1
        for f in features:
            col_names[i]=f+'_mean'
            col_names[i+1]=f+'_var'
            col_names[i+2]=f+'_std'
            i+=3
        i=len(features)*3+1
        j=0
        while i+3<total:
            col_names[i]='mfcc_'+str(j)+'_mean'
            col_names[i+1]='mfcc_'+str(j)+'_var'
            col_names[i+2]='mfcc_'+str(j)+'_std'
            i+=3
            j+=1
        col_names[0] = 'filename'
        col_names[-1] = 'label'
        with open(csv_file, 'a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(col_names)
            
        # Starting Feature Extraction Process
        threads = []
        lock = threading.Lock() 
        for g in genres:
            t = threading.Thread(target=genrepution, args=(g, lock,)) 
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
            
    # Loading the feature data file
    df = pd.read_csv(csv_file)
    #df = df.sample(frac=1).reset_index(drop=True)   #Shuffling the dataset improves performance for RF
    return df
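The thread-per-genre pattern used by the driver can be seen in isolation in the sketch below. It is a simplified stand-in, not the notebook's code: the `worker` function, the three demo genres and the in-memory buffer are all assumptions made for illustration, and each row is a placeholder rather than a real feature vector.

```python
import csv
import io
import threading


def worker(genre, n_items, writer, lock):
    """Write n_items placeholder rows for one genre to the shared CSV writer."""
    for i in range(n_items):
        row = [f'{genre}_{i}.wav', genre]   # stand-in for a real feature row
        with lock:                           # only one thread writes at a time
            writer.writerow(row)


buffer = io.StringIO()                       # in-memory file instead of a CSV on disk
writer = csv.writer(buffer)
lock = threading.Lock()
demo_genres = ['blues', 'jazz', 'rock']

threads = [threading.Thread(target=worker, args=(g, 4, writer, lock))
           for g in demo_genres]
for t in threads:
    t.start()
for t in threads:
    t.join()                                 # wait until every genre is finished

rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows))
```

The order of rows across genres is not deterministic, which is why each row carries its own label.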

## Part 3: Classification

This section deals with classification and feature selection.

In [10]:
data = fill()
print(data.shape)

(6000, 110)


### Feature Selection

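Three feature selection techniques are applied in sequence. The CSV has 110 columns: 108 features plus <i>filename</i> and <i>label</i>.

1. Variance thresholding with threshold value 0.1: Feature Set Size = 84
2. Removal of correlated features (absolute correlation above 0.95): 56 columns remain, a count that includes the <i>label</i> column
3. F-ANOVA test using the <i>f_classif</i> score function in <i>SelectKBest</i>: Feature Set Size = 40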
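The F-ANOVA step in the list above scores each feature by its one-way ANOVA F-statistic: the variance between class means divided by the variance within classes. The sketch below computes that statistic by hand for a single made-up feature with three classes; `anova_f` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import numpy as np


def anova_f(groups):
    """One-way ANOVA F-statistic for one feature, given one array per class."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]
f_stat = anova_f(groups)
print(f_stat)
```

A larger value means the class means are further apart relative to the spread inside each class, so the feature separates the genres better.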


In [11]:
label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(data['label'])
X = data.drop(columns=["filename", "label"])

p_train, p_test, q_train, q_test = train_test_split(X, Y, shuffle=True, test_size=0.3, random_state=42)

train = p_train.copy()
#train['label'] = y_train 

test = p_test.copy()
#test['label'] = y_test

print(train.shape, test.shape)

zero_filter = VarianceThreshold(threshold=0.1)
zero_filter.fit(train)

features_left = train.columns[zero_filter.get_support()]
print("Non Constant Features:",len(features_left))
train_filtered = pd.DataFrame(zero_filter.transform(train))
test_filtered = pd.DataFrame(zero_filter.transform(test))

train_filtered.columns = features_left
test_filtered.columns = features_left

(4200, 108) (1800, 108)
Non Constant Features: 84
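What the variance filter does can be sketched with plain NumPy on a made-up matrix. The snippet assumes the same rule as the cell above: a column is kept only when its variance on the training data exceeds the threshold.

```python
import numpy as np

# Made-up data: three features, four samples
X = np.array([[0.0, 1.0, 10.0],
              [0.1, 1.0, 20.0],
              [0.2, 1.0, 30.0],
              [0.3, 1.0, 40.0]])
threshold = 0.1

variances = X.var(axis=0)      # per-column population variance
keep = variances > threshold   # boolean mask, analogous to get_support()
X_filtered = X[:, keep]

print(variances, keep)
```

Because the threshold applies to raw variances, the outcome depends on the scale of each feature: the first column varies but is still removed.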


Remove correlated features and apply the F-ANOVA test:

In [12]:
train = train_filtered.copy()
train['label'] = q_train 

test = test_filtered.copy()
test['label'] = q_test

# Absolute pairwise correlations; the upper triangle checks each pair only once
corr_matrix = train.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
# Drop every column that correlates above 0.95 with an earlier column
to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]

train = train.drop(columns=to_drop)
test = test.drop(columns=to_drop)

print("Features after removal of correlated features: ",len(train.columns))

train_labels = train['label']
test_labels = test['label']
train = train.drop(columns= ['label'])
test = test.drop(columns= ['label'])
fvalue_selector = SelectKBest(f_classif, k=40).fit(train, train_labels)
remaining_features = train.columns[fvalue_selector.get_support()]
print("Final Feature Set: ",remaining_features)
x_train = pd.DataFrame(fvalue_selector.transform(train))
x_test = pd.DataFrame(fvalue_selector.transform(test))

x_train.columns = remaining_features
x_test.columns = remaining_features

y_train = train_labels
#x_train = Train_.drop(columns=['label'])

y_test = test_labels
#x_test = Test_.drop(columns=['label'])

full_dataset= pd.concat([x_train, x_test], sort=False)
print(len(full_dataset.columns))
full_labels = pd.concat([y_train, y_test], sort=False)

Features after removal of correlated features:  56
Final Feature Set:  Index(['melspec_mean', 'melspec_var', 'melspec_std', 'p0_mean', 'p0_var',
       'p0_std', 'p1_var', 'spec_cent_mean', 'spec_cent_var',
       'spectral_contrast_mean', 'spectral_contrast_var', 'spec_bw_mean',
       'spec_bw_var', 'rolloff_var', 'mfcc_0_mean', 'mfcc_0_std',
       'mfcc_1_mean', 'mfcc_1_var', 'mfcc_2_mean', 'mfcc_2_var', 'mfcc_3_mean',
       'mfcc_3_var', 'mfcc_4_mean', 'mfcc_4_var', 'mfcc_5_mean', 'mfcc_5_var',
       'mfcc_6_mean', 'mfcc_6_var', 'mfcc_7_mean', 'mfcc_7_var', 'mfcc_8_mean',
       'mfcc_8_var', 'mfcc_9_mean', 'mfcc_9_var', 'mfcc_10_mean',
       'mfcc_11_mean', 'mfcc_12_mean', 'mfcc_13_mean', 'mfcc_14_mean',
       'mfcc_16_mean'],
      dtype='object')
40
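The correlation filter used above can be illustrated on synthetic data. In this sketch the column names and the random data are made up; the second column is a scaled copy of the first, so it should be the only one dropped at the 0.95 cut-off.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 1, b])   # column 1 is a linear copy of column 0
names = ['a', 'a_scaled', 'b']

corr = np.abs(np.corrcoef(X, rowvar=False))
upper = np.triu(corr, k=1)               # each pair once, diagonal excluded
to_drop = [names[j] for j in range(len(names)) if np.any(upper[:, j] > 0.95)]
kept = [n for n in names if n not in to_drop]

print(to_drop, kept)
```

Only the later column of a correlated pair is dropped, so one representative of each correlated group survives.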


In [13]:
import matplotlib.pyplot as plt
import seaborn as sns
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn import svm
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, GaussianNB
# Needed only on older scikit-learn releases, where this estimator is experimental
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              HistGradientBoostingClassifier)

def draw_conf_mat(cm):
    # confusion_matrix orders rows and columns by sorted encoded label,
    # which is the order of label_encoder.classes_
    labels = label_encoder.classes_
    plt.figure()   # new figure per classifier so the heatmaps do not overlap
    sns.heatmap(cm, xticklabels=labels, yticklabels=labels, cmap="RdYlGn", annot=True)
    plt.show()
    


<b>clfs</b> is a list of classifiers to be applied to the problem. For each classifier, the accuracy on the train/test split and the 3-fold cross-validation score are computed and displayed.

In [14]:
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import precision_recall_fscore_support

num_trees = 4000
max_features='sqrt'
max_depth=12
criterion='entropy'
seed      = 9

clfs = [#XGBClassifier(), 
        #GaussianNB(), 
        #BernoulliNB(), 
        #LGBMClassifier(objective='multiclass', random_state=9),
        #LogisticRegression(max_iter=1000),  
        #AdaBoostClassifier(n_estimators=1000),
        #svm.SVC(decision_function_shape='ovo') ,
        #svm.LinearSVC(),
        #GradientBoostingClassifier(n_estimators=1000, random_state=0),
        HistGradientBoostingClassifier(l2_regularization=0.1, learning_rate=0.1,
                               loss='auto', max_bins=255, max_depth=12,
                               max_iter=1000, max_leaf_nodes=31,
                               min_samples_leaf=20, n_iter_no_change=None,
                               random_state=42, scoring=None, tol=1e-07,
                               validation_fraction=0.1, verbose=0,
                               warm_start=False),
        #BaggingClassifier(XGBClassifier(), max_samples=0.5), 
        RandomForestClassifier(n_estimators=num_trees, random_state=seed, criterion=criterion, max_features=max_features, max_depth=max_depth)
        ]

for clf in clfs:
    clf.fit(x_train, y_train)
    prediction = clf.predict(x_test)

    print("\n", clf)
    print('Testing Accuracy :%.3f' % accuracy_score(y_test, prediction))
    cm = confusion_matrix(y_test, prediction)
    draw_conf_mat(cm)
    # 3-fold cross-validation on the combined train and test sets
    scores = cross_val_score(clf, full_dataset, full_labels, cv=3)
    print('Cross Validation Accuracy :%.3f' % np.mean(scores))
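How the confusion matrix relates to the reported accuracy can be seen in a small NumPy sketch. The `confusion` helper and the six labels are made up for illustration; the layout assumed here, with rows for the true class and columns for the predicted class, follows the scikit-learn convention.

```python
import numpy as np


def confusion(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal

print(cm)
print(accuracy)
```

Off-diagonal cells show which genres are confused with each other, which a single accuracy figure hides.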

