# Music Genre Classification With the GTZAN dataset
CS345 Fall 2024 Project   
Wade McCaulley  
Jacob Ingraham  

The topic we wish to address with this project is Music Genre Classification. Music genre classification has applications ranging from improved music recommendations to audio tagging for large audio libraries, which can help create more personalized music choices on platforms like Spotify. Because machine learning models specialize in extracting information from high-dimensional data, we believe that music data presents a strong opportunity. A genre is determined by many aspects of a song, such as its tempo, rhythm, instrumentation, and overall tone. Because genre is one of a song's most defining characteristics, this classification task is well suited to machine learning.

This topic is interesting because music is an expression of emotion and a work of art. Many people prefer certain music genres because of how the songs impact their emotions. The idea of transforming songs into a collection of data points and accurately classifying them into genres is intriguing. This topic will also be challenging because audio data is very complex. Our models will have to find patterns hidden within the noise and complexity.

By comparing many different machine learning approaches we hope to learn more about this topic. The models we are comparing are K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, and Convolutional Neural Network (CNN).

In [None]:
# All necessary imports for the following code
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
import time
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorboard.plugins.hparams import api as hp
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

In [None]:
# Place these files in the appropriate directory: Data
# Otherwise, change these filepaths
# The files can be downloaded from https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data
features_30_seconds_filepath = "../Data/features_30_sec.csv"
features_3_seconds_filepath = "../Data/features_3_sec.csv"
mel_spectrograms_filepath = "../Data/images_original"

genres = ["blues", "classical", "country", "disco", "hiphop", "jazz", "metal", "pop", "reggae", "rock"]

### Datasets

The dataset consists of 10 genres, each containing 100 audio files, with each file having a duration of 30 seconds. This dataset is often referred to as "The MNIST of sounds," drawing a comparison to the well-known MNIST database of handwritten digits frequently used in CS345.

We did not manually extract features from the audio files. Instead, we utilized pre-processed data.

Music Genres:
* Blues
* Classical
* Country
* Disco
* Hiphop
* Jazz
* Metal
* Pop
* Reggae
* Rock

### Images

The images are visual representations of each audio file in the form of Mel spectrograms. A Mel spectrogram converts an audio signal into an image that shows how the energy (amplitude) at each frequency changes over the 30-second clip. In a spectrogram, time runs along the horizontal axis, frequency runs along the vertical axis, and color encodes intensity. The frequency axis is transformed into the Mel scale, which spaces frequencies according to how humans perceive pitch: low frequencies are spread out and high frequencies are compressed, because people distinguish small pitch differences more easily at low frequencies. Representing each clip as an image rather than as a raw waveform allows this visual format to be used for further analysis or processing.
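The Hz-to-Mel mapping is commonly given by the HTK-style formula $m = 2595 \log_{10}(1 + f/700)$. The sketch below assumes that formula. Other variants exist, such as the Slaney Mel scale, and we do not know which variant was used to produce the GTZAN images. The helper names `hz_to_mel` and `mel_to_hz` are our own. Equal 1000 Hz steps get smaller and smaller on the Mel axis, and that compression at high frequencies is what the spectrogram's vertical axis reflects.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Equal 1000 Hz steps shrink on the mel axis as frequency rises
freqs = np.array([0, 1000, 2000, 3000, 4000, 8000])
mels = hz_to_mel(freqs)
for f, m in zip(freqs, mels):
    print(f"{int(f):>5} Hz -> {float(m):7.1f} mel")
```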

### Reading data
The following cells read and format the data. They also split the data into train/validate/test sets to be used when choosing hyperparameters and evaluating results.

In [None]:
# Loads a CSV. Features are every column except the first two (filename and length) and the last one, which holds the labels
def loadCSVs(filepath):
    data = pd.read_csv(filepath, dtype = object, delimiter = ',').values
    X = data[:,2:-1]
    y = data[:,-1:]
    return X, y

In [None]:
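# Turns the genre labels into an np.array of ints (the index of each genre in `genres`)
def lable_to_int(lables, genres):
    lable_int = np.array(lables, dtype=object)
    for i in range(len(genres)):
        lable_int[lable_int == genres[i]] = i
    # Cast to int: a string array would otherwise store the indices as '0', '1', ...
    return lable_int.astype(int)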


In [None]:
# Loads the mel spectrograms into an np.array of images. Each image is 288 x 432 pixels, and each pixel is represented by four values (RGBA)
def load_mel_spectrograms():
    image_features = []
    image_lables = []
    for genre in genres:
        print("Loading", genre)
        images_file_path = mel_spectrograms_filepath + "/" + genre
        png_files = [f for f in os.listdir(images_file_path) if f.endswith('.png')]

        for file in png_files:
            file_path = images_file_path +"/"+ file
            image = plt.imread(file_path)  # Load the image
            image_features.append(image)
            image_lables.append(genre)

    return np.array(image_features), np.array(image_lables)

In [None]:
# Read from data file as string, then convert to a usable datatype (float)
string_X_30sec, y_30sec = loadCSVs(features_30_seconds_filepath)
X_30sec = string_X_30sec.astype(np.float64)
string_X_3sec, y_3sec = loadCSVs(features_3_seconds_filepath)
X_3sec = string_X_3sec.astype(np.float64)

# reshape y to a 1d array
y_30sec = y_30sec.ravel()
y_3sec = y_3sec.ravel()

print(X_30sec.shape, y_30sec.shape)
print(X_3sec.shape, y_3sec.shape)


In [None]:
# Read the spectrogram data
X_images, y_images = load_mel_spectrograms()
y_images = lable_to_int(y_images,genres)
X_images.shape, y_images.shape

In [None]:
# Split spectrogram data into train/validate/test - fulltrain will be used in the last cell to train best classifiers
#    with train and validate sets
X_images_fulltrain, X_images_test, y_images_fulltrain, y_images_test = train_test_split(X_images, y_images, test_size=0.1, shuffle=True, random_state=7)
X_images_train, X_images_val, y_images_train, y_images_val = train_test_split(X_images_fulltrain, y_images_fulltrain, test_size=0.2, shuffle=True, random_state=7)

In [None]:
# Convert label vectors to usable datatype (int)
y_30sec_int = lable_to_int(y_30sec, genres)
y_3sec_int = lable_to_int(y_3sec, genres)
y_images_int = lable_to_int(y_images, genres)
y_30sec_int.shape, y_3sec_int.shape, y_images_int.shape

In [None]:
# Create normalized and standardized versions of data
X_30sec_norm = (X_30sec-np.min(X_30sec, axis=0))/(np.max(X_30sec,axis=0)-np.min(X_30sec,axis=0))
X_3sec_norm = (X_3sec-np.min(X_3sec, axis=0))/(np.max(X_3sec,axis=0)-np.min(X_3sec,axis=0))
X_30sec_std = (X_30sec-np.mean(X_30sec, axis=0))/(np.std(X_30sec, axis=0))
X_3sec_std = (X_3sec-np.mean(X_3sec, axis=0))/(np.std(X_3sec, axis=0))

In [None]:
# Check normalized and standardized data
print(np.max(X_30sec_norm)==1,np.min(X_30sec_norm)==0)
print(np.max(X_3sec_norm)==1,np.min(X_3sec_norm)==0)
print(np.mean(X_30sec_std), np.std(X_30sec_std))
print(np.mean(X_3sec_std), np.std(X_3sec_std))

In [None]:
# Split csv data into train/validate/test - fulltrain will be used in the last cell to train best classifiers
#    with train and validate sets
X_30sec_fulltrain, X_30sec_test, y_30sec_fulltrain, y_30sec_test = train_test_split(X_30sec, y_30sec, test_size=0.1, shuffle=True, random_state=7)
X_30sec_train, X_30sec_val, y_30sec_train, y_30sec_val = train_test_split(X_30sec_fulltrain, y_30sec_fulltrain, test_size=0.2, shuffle=True, random_state=7)

X_30sec_norm_fulltrain, X_30sec_norm_test = train_test_split(X_30sec_norm, test_size=0.1, shuffle=True, random_state=7)
X_30sec_norm_train, X_30sec_norm_val = train_test_split(X_30sec_norm_fulltrain, test_size=0.2, shuffle=True, random_state=7)

X_30sec_std_fulltrain, X_30sec_std_test = train_test_split(X_30sec_std, test_size=0.1, shuffle=True, random_state=7)
X_30sec_std_train, X_30sec_std_val = train_test_split(X_30sec_std_fulltrain, test_size=0.2, shuffle=True, random_state=7)

X_3sec_fulltrain, X_3sec_test, y_3sec_fulltrain, y_3sec_test = train_test_split(X_3sec, y_3sec, test_size=0.1, shuffle=True, random_state=7)
X_3sec_train, X_3sec_val, y_3sec_train, y_3sec_val = train_test_split(X_3sec_fulltrain, y_3sec_fulltrain, test_size=0.2, shuffle=True, random_state=7)

X_3sec_norm_fulltrain, X_3sec_norm_test = train_test_split(X_3sec_norm, test_size=0.1, shuffle=True, random_state=7)
X_3sec_norm_train, X_3sec_norm_val = train_test_split(X_3sec_norm_fulltrain, test_size=0.2, shuffle=True, random_state=7)

X_3sec_std_fulltrain, X_3sec_std_test = train_test_split(X_3sec_std, test_size=0.1, shuffle=True, random_state=7)
X_3sec_std_train, X_3sec_std_val = train_test_split(X_3sec_std_fulltrain, test_size=0.2, shuffle=True, random_state=7)

## Model Selection

For Model Selection we used stratified K-Fold cross-validation with shuffling to ensure that each fold is not composed of consecutive samples. This approach helps maintain approximately the same number of samples from each genre in each fold, preventing some classes from being underrepresented.
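The per-genre balance described above can be checked directly. The minimal sketch below uses hypothetical toy labels: 10 genres with 20 examples each, sorted by genre the way a file listing would be. It counts how many examples of each genre land in every validation fold of a shuffled `StratifiedKFold`. With 20 examples per genre and 5 folds, every fold should receive exactly 4 examples of each genre.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 10 genres x 20 examples, sorted so consecutive samples share a genre
y = np.repeat(np.arange(10), 20)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are assigned

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_counts = []
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y)):
    counts = np.bincount(y[val_idx], minlength=10)  # validation examples per genre
    fold_counts.append(counts)
    print(f"fold {fold}: {counts.tolist()}")
```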

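We used accuracy (scoring='accuracy') as the metric for cross-validation. Accuracy is the fraction of held-out examples whose predicted genre matches the true genre. Because the genres are (nearly) equally represented in the data, accuracy is not distorted by class imbalance.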

To automate the search for the best hyperparameters, we utilized GridSearchCV. This step was extremely slow, as we aimed to set hyperparameters for each model in a consistent manner.

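K-fold cross-validation trains the model $k$ times, each time on $(k-1)/k$ of the data. Its runtime is therefore roughly $k$ times the cost of a single training run. For a model whose training cost is linear in the sample size $n$, this gives $O(n \times k)$. We selected 5 folds, which keeps the cost lower than a larger number of folds (such as 10) while still training each model on 80% of the data. Fewer folds may result in less reliable estimates. If one fold scores significantly better than the others, the averaged score may be unreliable.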

The results were output to a DataFrame for easy analysis. We chose the hyperparameters from the dataset variant with the best accuracy among all datasets.

GridSearchCV and cross-validation are computationally intensive because they systematically test multiple parameter combinations and validate performance across data subsets. During this process, the model is trained multiple times—once for each fold of cross-validation for each parameter combination.

A high-dimensional parameter grid increases the number of combinations exponentially. For example, testing 5 candidate values for hyperparameter A and 5 candidate values for hyperparameter B requires 25 (5x5) different combinations. Adding a third hyperparameter C with 5 values increases this to 125 (5x5x5) combinations, and each combination is trained once per fold.
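The arithmetic above can be reproduced with scikit-learn's `ParameterGrid`, which enumerates the same combinations that `GridSearchCV` tries. The grids for A, B and C are hypothetical. The last grid copies the C and gamma values of the RBF SVM search used later in this notebook. The sketch assumes the default `refit=True`, which adds one final fit of the best combination on all the data.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grids: 5 candidate values for A and for B, then a third hyperparameter C
grid_ab = {"A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 4, 5]}
grid_abc = {**grid_ab, "C": [1, 2, 3, 4, 5]}
# The RBF SVM grid from this notebook: 5 values of C x 7 values of gamma
grid_rbf_svm = {"C": [.01, .1, 1, 10, 100],
                "gamma": [.001, .01, 1, 10, 100, "auto", "scale"]}

n_folds = 5
for name, grid in [("A, B", grid_ab), ("A, B, C", grid_abc), ("RBF SVM", grid_rbf_svm)]:
    n_combos = len(ParameterGrid(grid))
    # One fit per combination per fold, plus one refit of the best combination
    n_fits = n_combos * n_folds + 1
    print(f"{name}: {n_combos} combinations -> {n_fits} model fits")
```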

In [None]:
# Define gridsearch function which will be used by knn, svm, random forest to choose hyperparameters
def girdSearchClassifier(model, features, labels, paramgrid):
    start_time = time.time()
    cv = StratifiedKFold(n_splits=5, random_state=0, shuffle=True)
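    # Use the same shuffled, stratified folds for the grid search as for the final accuracy estimate
    classifier = GridSearchCV(model, paramgrid, cv=cv, scoring='accuracy')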
    classifier.fit(features, labels)
    accuracies = cross_val_score(classifier.best_estimator_, features, labels, cv=cv, 
                           scoring='accuracy')
    accuracy = np.mean(accuracies)
    run_time = time.time() - start_time


    return classifier.best_estimator_, accuracy, run_time

In [None]:
features_datasets_30sec = [
    (X_30sec, y_30sec, "Features 30 sec"),
    (X_30sec_norm, y_30sec, "Features 30 sec norm"),
    (X_30sec_std, y_30sec, "Features 30 sec std")
]

features_datasets_3sec = [
    (X_3sec, y_3sec, "Features 3 sec"),
    (X_3sec_norm, y_3sec, "Features 3 sec norm"),
    (X_3sec_std, y_3sec, "Features 3 sec std")
]

features_datasets = [
    (X_30sec, y_30sec, "Features 30 sec"),
    (X_30sec_norm, y_30sec, "Features 30 sec norm"),
    (X_30sec_std, y_30sec, "Features 30 sec std"),
    (X_3sec, y_3sec, "Features 3 sec"),
    (X_3sec_norm, y_3sec, "Features 3 sec norm"),
    (X_3sec_std, y_3sec, "Features 3 sec std")
]

## Nearest Neighbor 
The Nearest Neighbor classifier finds the training example whose features are closest to those of the data point being classified (for example, by Euclidean distance). It then assigns the label of this closest example to the new data point.

#### Nearest Neighbor Hyperparameters

The number of neighbors, denoted as n_neighbors, is the most critical hyperparameter in the Nearest Neighbor classifier. It determines how many neighbors are considered when making a prediction.

The performance of the nearest neighbor classifier can be improved by basing the classification on multiple neighbors. For this testing, we used the k-Nearest Neighbors (k-NN) classifier. It finds the $k$ nearest training examples and predicts the genre that occurs most often among them (a majority vote).


#### Nearest Neighbor Running Time
With a brute-force search, the nearest neighbor classifier takes $O(N \cdot d)$ time per prediction, where $N$ is the number of training examples and $d$ is the number of dimensions (features) in the dataset. Training only stores the data.
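The majority vote and the $O(N \cdot d)$ cost can both be seen in a from-scratch sketch. This brute-force version is for illustration only. The helper `knn_predict` and the toy two-feature data are hypothetical. scikit-learn's `KNeighborsClassifier` may use tree-based search instead of brute force, so it is only used here to check the predictions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k=1):
    """Brute-force k-NN: majority vote among the k closest training examples."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in X_query:
        # Euclidean distance to all N training points: O(N * d) work per query
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

# Hypothetical toy data with two features
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y_train = np.array(["blues", "blues", "metal", "metal", "metal"])
X_query = np.array([[0.05, 0.1], [1.0, 0.9], [0.45, 0.45]])

for k in (1, 3):
    ours = knn_predict(X_train, y_train, X_query, k=k)
    ref = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).predict(X_query)
    print(k, ours, ref)
```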

In [None]:
# Define function for choosing KNN hyperparameters on various datasets
def testKNN(features, labels, paramgrid, valFeatures, valLabels):

    model = KNeighborsClassifier()
    best_estimator, accuracy, run_time = girdSearchClassifier(model, features, labels, paramgrid)
    y_pred = best_estimator.predict(valFeatures)
    val_accuracy = np.mean(y_pred == valLabels)
    
    return best_estimator.get_params()['n_neighbors'], accuracy, val_accuracy, run_time
    

In [None]:
# Create grid and choose hyperparameters for each dataset
knn_param_grid = {
    'n_neighbors': [1,2,4,8,16,32,64,128,256,512]
}

print("knn run csv 30sec")
knn_csv30sec_best_estimator, knn_csv30sec_accuracy, knn_csv30sec_val_accuracy, knn_csv30sec_time = testKNN(X_30sec_train, y_30sec_train, knn_param_grid, X_30sec_val, y_30sec_val)

print("knn run csv 30sec norm")
knn_csv30sec_norm_best_estimator, knn_csv30sec_norm_accuracy, knn_csv30sec_norm_val_accuracy, knn_csv30sec_norm_time = testKNN(X_30sec_norm_train, y_30sec_train, knn_param_grid, X_30sec_norm_val, y_30sec_val)

print("knn run csv 30sec std")
knn_csv30sec_std_best_estimator, knn_csv30sec_std_accuracy, knn_csv30sec_std_val_accuracy, knn_csv30sec_std_time = testKNN(X_30sec_std_train, y_30sec_train, knn_param_grid, X_30sec_std_val, y_30sec_val)

print("knn run csv 3sec")
knn_csv3sec_best_estimator, knn_csv3sec_accuracy, knn_csv3sec_val_accuracy, knn_csv3sec_time = testKNN(X_3sec_train, y_3sec_train, knn_param_grid, X_3sec_val, y_3sec_val)

print("knn run csv 3sec norm")
knn_csv3sec_norm_best_estimator, knn_csv3sec_norm_accuracy, knn_csv3sec_norm_val_accuracy, knn_csv3sec_norm_time = testKNN(X_3sec_norm_train, y_3sec_train, knn_param_grid, X_3sec_norm_val, y_3sec_val)

print("knn run csv 3sec std")
knn_csv3sec_std_best_estimator, knn_csv3sec_std_accuracy, knn_csv3sec_std_val_accuracy, knn_csv3sec_std_time = testKNN(X_3sec_std_train, y_3sec_train, knn_param_grid, X_3sec_std_val, y_3sec_val) 

In [None]:
# Display hyperparameter choice results
knndf = pd.DataFrame({
    "Dataset": ["Features 30 sec", "Features 30 sec norm", "Features 30 sec std", "Features 3 sec", "Features 3 sec norm", "Features 3 sec std"],
    "Best n_neighbors": [
        knn_csv30sec_best_estimator,
        knn_csv30sec_norm_best_estimator,
        knn_csv30sec_std_best_estimator,
        knn_csv3sec_best_estimator,
        knn_csv3sec_norm_best_estimator,
        knn_csv3sec_std_best_estimator
    ],
    "Accuracy": [
        knn_csv30sec_accuracy,
        knn_csv30sec_norm_accuracy,
        knn_csv30sec_std_accuracy,
        knn_csv3sec_accuracy,
        knn_csv3sec_norm_accuracy,
        knn_csv3sec_std_accuracy
    ],
    "Validation Set Accuracy": [
        knn_csv30sec_val_accuracy,
        knn_csv30sec_norm_val_accuracy,
        knn_csv30sec_std_val_accuracy,
        knn_csv3sec_val_accuracy,
        knn_csv3sec_norm_val_accuracy,
        knn_csv3sec_std_val_accuracy
    ],
    "Run Time": [
        knn_csv30sec_time,
        knn_csv30sec_norm_time,
        knn_csv30sec_std_time,
        knn_csv3sec_time,
        knn_csv3sec_norm_time,
        knn_csv3sec_std_time
    ]
})
knndf

In [None]:
# Calculate mean accuracy and mean runtime for all runs
knn_hp_avg_accuracy = knndf["Accuracy"].mean()
knn_avg_runtime = knndf["Run Time"].mean()

knn_n_hyperparameters = 10

The results indicate that using 1 neighbor in the k-Nearest Neighbors (k-NN) classifier yielded the best performance, achieving the highest accuracy. A small $k$ can make the model sensitive to noise in the data and cause it to overfit. A large $k$ can smooth out the decision boundary too much and miss important patterns, which may result in underfitting. The validation set accuracy is even higher than the cross-validation accuracy, which suggests that the chosen $k$ is not overfitting the training folds.

The performance of k-NN improved when it was run on normalized and standardized data. This is likely because the algorithm relies on distance calculations. Scaling puts all features on a comparable scale, so no single feature with large raw values disproportionately influences the distances. This leads to more accurate identification of nearest neighbors and better overall model performance.
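A tiny hypothetical example shows why scaling matters for distances. Feature 0 is a tempo-like value in BPM and feature 1 is a small-valued statistic between 0 and 1. On the raw scale, a 4 BPM tempo difference outweighs almost the entire range of the second feature, so the nearest neighbor changes after min-max normalization. The data and the helper `nearest_index` are illustrative assumptions, not GTZAN values.

```python
import numpy as np

# Hypothetical features: column 0 ~ tempo in BPM, column 1 ~ a statistic in [0, 1]
X_train = np.array([[120.0, 0.10],
                    [125.0, 0.85],
                    [160.0, 0.12]])
query = np.array([121.0, 0.84])

def nearest_index(X, q):
    """Index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Raw scale: the BPM column dominates the distance
raw_nn = nearest_index(X_train, query)

# Min-max normalize each column using the training data's range
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
scaled_nn = nearest_index((X_train - lo) / (hi - lo), (query - lo) / (hi - lo))

print("nearest row on raw data:   ", raw_nn)
print("nearest row on scaled data:", scaled_nn)
```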


## SVM

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. At its core it is a binary classifier: it finds the hyperplane that separates the data points of two classes with the largest possible margin. In a two-dimensional space this hyperplane is a line, in three dimensions it is a plane, and in higher dimensions it is a hyperplane. For our 10-genre problem, scikit-learn's SVC trains one binary SVM for each pair of genres (a one-vs-one scheme) and combines their votes.
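The one-vs-one scheme can be made visible with SVC's `decision_function_shape="ovo"` option, which returns one score per pair of classes. For 10 classes that is $10 \cdot 9 / 2 = 45$ binary classifiers. The sketch uses hypothetical data: 10 well-separated Gaussian blobs in 5 dimensions, not the GTZAN features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes = 10
# Hypothetical data: 10 Gaussian blobs in 5 dimensions, 20 points each
centers = rng.normal(scale=10.0, size=(n_classes, 5))
X = np.vstack([c + rng.normal(size=(20, 5)) for c in centers])
y = np.repeat(np.arange(n_classes), 20)

clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
scores = clf.decision_function(X[:3])
# One column per pair of classes: 10 * 9 / 2 = 45 binary SVMs
print(scores.shape)
```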

For this experiment we ran two different SVM models. The first used a linear kernel, for which we searched for the optimal C parameter. The second used the RBF kernel, for which we searched for the optimal C and gamma parameters.

- C Parameter: The C parameter is a regularization parameter that controls the trade-off between a wide margin and a low training error. A large value of C penalizes misclassified training examples heavily and tries to classify all of them correctly, which might lead to overfitting. Conversely, a small value of C allows a larger margin at the cost of some training misclassifications, which may result in underfitting.
- Gamma Parameter: In the RBF kernel, the gamma parameter defines how far the influence of a single training example reaches. A low gamma value means a far-reaching influence, resulting in a smoother decision boundary. A high gamma value means each example only influences points close to it. This produces a more complex decision boundary that can adapt closely to the training data but may result in overfitting.

Linear Kernel: With a linear kernel, the decision boundary is a hyperplane in the feature space, so data points from different classes are separated in a linear fashion. Linear kernels are most effective on data that can be separated by a straight line in 2D or by a plane (hyperplane) in higher dimensions.

RBF Kernel: The RBF kernel, also known as the Gaussian kernel, allows for more complex, nonlinear decision boundaries. These can be curves, circles, or more intricate shapes.
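The RBF kernel measures the similarity of two points as $k(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$, whereas the linear kernel is simply the dot product $x \cdot z$. The sketch below uses a hand-written `rbf` helper (our own name) and checks it against scikit-learn's `rbf_kernel`. For two points at squared distance 2, raising gamma makes the similarity fall off quickly. This is the "close" reach of a high gamma described above.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared distance = 2

for gamma in (0.01, 0.1, 1.0, 10.0):
    print(f"gamma={gamma:<5} similarity={rbf(x, z, gamma):.4f}")

# Sanity check against scikit-learn's pairwise implementation
assert np.isclose(rbf(x, z, 0.5), rbf_kernel(x[None, :], z[None, :], gamma=0.5)[0, 0])
```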

In [None]:
# Define function for choosing SVM hyperparameters on various datasets
def testSVM(features, labels, paramgrid, valFeatures, valLabels):
    X_standard_scaler = StandardScaler().fit(features)
    features = X_standard_scaler.transform(features)
    model = svm.SVC()
    best_estimator, accuracy, run_time = girdSearchClassifier(model, features, labels, paramgrid)
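    # Scale the validation set with the scaler fitted on the training data
    y_pred = best_estimator.predict(X_standard_scaler.transform(valFeatures))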
    val_accuracy = np.mean(y_pred == valLabels)
    
    return best_estimator.get_params()['C'], best_estimator.get_params()['gamma'], accuracy, val_accuracy, run_time

### The 3-second feature set is too large for SVM: training time grows quadratically or faster with dataset size




The computational complexity of SVM training is between $O(n^2)$ and $O(n^3)$, which made it computationally expensive for this experiment. Combined with k-fold cross-validation and GridSearchCV, training became extremely slow on the 10,000-example dataset of 3-second clips. Because of this, we only examined the 1,000-example 30-second dataset. Below is an example of how quickly training time grows with the number of examples.

In [None]:
# Evaluate training time for SVM using data subsets of varying size
sizes = [100, 250, 500, 1000, 2500, 5000, 9000]

linear_runtimes = []
rbf_runtimes = []

print("testing SVM run time on large datasets")
X_train, X_test, y_train, y_test = train_test_split(X_3sec, y_3sec, test_size = 0.01, random_state = 1)

X_standard_scaler = StandardScaler().fit(X_train)
X_train = X_standard_scaler.transform(X_train)
for s in sizes:
    start_time = time.time()
    classifier = svm.SVC(C = 10, kernel='linear')
    classifier.fit(X_train[:s], y_train[:s])
    run_time = time.time() - start_time
    linear_runtimes.append(run_time)

for s in sizes:
    start_time = time.time()
    classifier = svm.SVC(C = 10, gamma = .1, kernel='rbf')
    classifier.fit(X_train[:s], y_train[:s])
    run_time = time.time() - start_time
    rbf_runtimes.append(run_time)
print(linear_runtimes)
print(rbf_runtimes)
    
np_linear_runtimes = np.array(linear_runtimes)
np_rbf_runtimes = np.array(rbf_runtimes)

plt.figure(figsize=(10, 6))
plt.scatter(sizes, np_linear_runtimes, label = 'linear SVM')
plt.scatter(sizes, np_rbf_runtimes, label = 'rbf SVM')
plt.xlabel('training set size')
plt.ylabel('training time for one SVC fit (seconds)')
plt.legend()
plt.show()
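One way to quantify how fast training time grows is to fit a straight line to the timings on log-log axes. The slope estimates the exponent $p$ in $\text{time} \approx a \cdot n^p$: about 2 for quadratic growth and about 3 for cubic. The helper `estimate_growth_exponent` is hypothetical, and the demo uses synthetic quadratic timings rather than measured ones. Very small training sets can be dominated by fixed overhead, and every runtime must be positive for the logarithm to work.

```python
import numpy as np

def estimate_growth_exponent(sizes, runtimes):
    """Fit runtime ~ a * n**p by least squares on log-log axes and return p."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_t = np.log(np.asarray(runtimes, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_t, 1)
    return slope

# Synthetic check (not measured data): runtimes that grow exactly quadratically
demo_sizes = [100, 250, 500, 1000, 2500, 5000, 9000]
demo_runtimes = [1e-7 * n ** 2 for n in demo_sizes]
print(round(estimate_growth_exponent(demo_sizes, demo_runtimes), 3))

# With the timing cell's lists, one could call e.g.
# estimate_growth_exponent(sizes, rbf_runtimes)
```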

Gammas Tested: With a small gamma, each data point's influence reaches far. With a large gamma, each data point's influence stays close to it. If gamma='scale' (the default) is passed, SVC uses 1 / (n_features * X.var()) as the value of gamma. If gamma='auto' is passed, it uses 1 / n_features.

C: We tested high C values, which push the model to classify every training example correctly and may lead to overfitting. We also tested low C values, which allow some misclassifications but reduce the risk of overfitting.
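The gamma='scale' and gamma='auto' formulas above can be evaluated directly. The sketch uses a hypothetical `svc_gamma_values` helper and random data with 57 columns. Because testSVM standardizes the features first, the overall variance becomes 1. In that case the two settings give the same gamma.

```python
import numpy as np

def svc_gamma_values(X):
    """gamma for SVC's 'scale' and 'auto' settings, per the formulas in the text."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    return {"scale": 1.0 / (n_features * X.var()), "auto": 1.0 / n_features}

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=0.0, scale=3.0, size=(500, 57))    # hypothetical unscaled features
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)  # standardized copy

print(svc_gamma_values(X_raw))  # 'scale' is smaller than 'auto' when the variance > 1
print(svc_gamma_values(X_std))  # variance is 1, so 'scale' equals 'auto'
```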

In [None]:
''' 
    THIS CELL HAS A LONG RUNTIME - several minutes
    Choosing hyperparameters for a 3 sec dataset takes upwards of 5 minutes, and there are 6 such calls in this cell
    The output of this cell usually shows the best accuracy with an rbf kernel using C=100 and gamma='scale'
'''
# Create grid and choose hyperparameters for each dataset
gammas = [.001, .01, 1, 10, 100, 'auto', 'scale']
Cs = [ .01, .1, 1, 10, 100]

rbf_svm_param_grid = [
  {'C': Cs, 
   'gamma': gammas, 
   'kernel': ['rbf']},
 ]
linear_svm_param_grid = [
  {'C': Cs, 
   'kernel': ['linear']},
 ]

print("run csv 30sec")
rbf_svm_csv30sec_best_estimator_c, rbf_svm_csv30sec_best_estimator_gamma, rbf_svm_csv30sec_accuracy, rbf_svm_csv30sec_val_accuracy, rbf_svm_csv30sec_time = testSVM(
    X_30sec_train, y_30sec_train, rbf_svm_param_grid, X_30sec_val, y_30sec_val)

print("run csv 30sec norm")
rbf_svm_csv30sec_norm_best_estimator_c, rbf_svm_csv30sec_norm_best_estimator_gamma, rbf_svm_csv30sec_norm_accuracy, rbf_svm_csv30sec_norm_val_accuracy, rbf_svm_csv30sec_norm_time = testSVM(
    X_30sec_norm_train, y_30sec_train, rbf_svm_param_grid, X_30sec_norm_val, y_30sec_val)

print("run csv 30sec std")
rbf_svm_csv30sec_std_best_estimator_c, rbf_svm_csv30sec_std_best_estimator_gamma, rbf_svm_csv30sec_std_accuracy, rbf_svm_csv30sec_std_val_accuracy, rbf_svm_csv30sec_std_time = testSVM(
    X_30sec_std_train, y_30sec_train, rbf_svm_param_grid, X_30sec_std_val, y_30sec_val)

print("run csv 3sec")
rbf_svm_csv3sec_best_estimator_c, rbf_svm_csv3sec_best_estimator_gamma, rbf_svm_csv3sec_accuracy, rbf_svm_csv3sec_val_accuracy, rbf_svm_csv3sec_time = testSVM(
    X_3sec_train, y_3sec_train, rbf_svm_param_grid, X_3sec_val, y_3sec_val)

print("run csv 3sec norm")
rbf_svm_csv3sec_norm_best_estimator_c, rbf_svm_csv3sec_norm_best_estimator_gamma, rbf_svm_csv3sec_norm_accuracy, rbf_svm_csv3sec_norm_val_accuracy, rbf_svm_csv3sec_norm_time = testSVM(
    X_3sec_norm_train, y_3sec_train, rbf_svm_param_grid, X_3sec_norm_val, y_3sec_val)

print("run csv 3sec std")
rbf_svm_csv3sec_std_best_estimator_c, rbf_svm_csv3sec_std_best_estimator_gamma, rbf_svm_csv3sec_std_accuracy, rbf_svm_csv3sec_std_val_accuracy, rbf_svm_csv3sec_std_time = testSVM(
    X_3sec_std_train, y_3sec_train, rbf_svm_param_grid, X_3sec_std_val, y_3sec_val)

print("run csv 30sec linear")
svm_csv30sec_best_estimator_c, svm_csv30sec_best_estimator_gamma, svm_csv30sec_accuracy, svm_csv30sec_val_accuracy, svm_csv30sec_time = testSVM(
    X_30sec_train, y_30sec_train, linear_svm_param_grid, X_30sec_val, y_30sec_val)

print("run csv 30sec norm linear")
svm_csv30sec_norm_best_estimator_c, svm_csv30sec_norm_best_estimator_gamma, svm_csv30sec_norm_accuracy, svm_csv30sec_norm_val_accuracy, svm_csv30sec_norm_time = testSVM(
    X_30sec_norm_train, y_30sec_train, linear_svm_param_grid, X_30sec_norm_val, y_30sec_val)

print("run csv 30sec std linear")
svm_csv30sec_std_best_estimator_c, svm_csv30sec_std_best_estimator_gamma, svm_csv30sec_std_accuracy, svm_csv30sec_std_val_accuracy, svm_csv30sec_std_time = testSVM(
    X_30sec_std_train, y_30sec_train, linear_svm_param_grid, X_30sec_std_val, y_30sec_val)

print("run csv 3sec linear")
svm_csv3sec_best_estimator_c, svm_csv3sec_best_estimator_gamma, svm_csv3sec_accuracy, svm_csv3sec_val_accuracy, svm_csv3sec_time = testSVM(
    X_3sec_train, y_3sec_train, linear_svm_param_grid, X_3sec_val, y_3sec_val)

print("run csv 3sec norm linear")
svm_csv3sec_norm_best_estimator_c, svm_csv3sec_norm_best_estimator_gamma, svm_csv3sec_norm_accuracy, svm_csv3sec_norm_val_accuracy, svm_csv3sec_norm_time = testSVM(
    X_3sec_norm_train, y_3sec_train, linear_svm_param_grid, X_3sec_norm_val, y_3sec_val)

print("run csv 3sec std linear")
svm_csv3sec_std_best_estimator_c, svm_csv3sec_std_best_estimator_gamma, svm_csv3sec_std_accuracy, svm_csv3sec_std_val_accuracy, svm_csv3sec_std_time = testSVM(
    X_3sec_std_train, y_3sec_train, linear_svm_param_grid, X_3sec_std_val, y_3sec_val)

In [None]:
# Display hyperparameter choice results
svmdf = pd.DataFrame({
    "Dataset": ["Features 30 sec", "Features 30 sec norm", "Features 30 sec std", "Features 3 sec", "Features 3 sec norm", "Features 3 sec std", "Features 30 sec", "Features 30 sec norm", "Features 30 sec std", "Features 3 sec", "Features 3 sec norm", "Features 3 sec std",],
    "Kernel": ["rbf", "rbf", "rbf", "rbf", "rbf", "rbf", "linear", "linear", "linear", "linear", "linear", "linear"],
    "Best C": [
        rbf_svm_csv30sec_best_estimator_c,
        rbf_svm_csv30sec_norm_best_estimator_c,
        rbf_svm_csv30sec_std_best_estimator_c,
        rbf_svm_csv3sec_best_estimator_c,
        rbf_svm_csv3sec_norm_best_estimator_c,
        rbf_svm_csv3sec_std_best_estimator_c,
        svm_csv30sec_best_estimator_c,
        svm_csv30sec_norm_best_estimator_c,
        svm_csv30sec_std_best_estimator_c,
        svm_csv3sec_best_estimator_c,
        svm_csv3sec_norm_best_estimator_c,
        svm_csv3sec_std_best_estimator_c
    ],
    "Best gamma": [
        rbf_svm_csv30sec_best_estimator_gamma,
        rbf_svm_csv30sec_norm_best_estimator_gamma,
        rbf_svm_csv30sec_std_best_estimator_gamma,
        rbf_svm_csv3sec_best_estimator_gamma,
        rbf_svm_csv3sec_norm_best_estimator_gamma,
        rbf_svm_csv3sec_std_best_estimator_gamma,
        'na',
        'na',
        'na',
        'na',
        'na',
        'na'
    ],
    "Accuracy": [
        rbf_svm_csv30sec_accuracy,
        rbf_svm_csv30sec_norm_accuracy,
        rbf_svm_csv30sec_std_accuracy,
        rbf_svm_csv3sec_accuracy,
        rbf_svm_csv3sec_norm_accuracy,
        rbf_svm_csv3sec_std_accuracy,
        svm_csv30sec_accuracy,
        svm_csv30sec_norm_accuracy,
        svm_csv30sec_std_accuracy,
        svm_csv3sec_accuracy,
        svm_csv3sec_norm_accuracy,
        svm_csv3sec_std_accuracy
    ],
    "Run Time": [
        rbf_svm_csv30sec_time,
        rbf_svm_csv30sec_norm_time,
        rbf_svm_csv30sec_std_time,
        rbf_svm_csv3sec_time,
        rbf_svm_csv3sec_norm_time,
        rbf_svm_csv3sec_std_time,
        svm_csv30sec_time,
        svm_csv30sec_norm_time,
        svm_csv30sec_std_time,
        svm_csv3sec_time,
        svm_csv3sec_norm_time,
        svm_csv3sec_std_time
    ]
})
svmdf

In [None]:
# Calculate mean accuracy and mean runtime for all runs
svm_hp_avg_accuracy = svmdf["Accuracy"].mean()
svm_hp_avg_run_time = svmdf["Run Time"].mean()

svm_n_hyperparameters_c = 5
svm_n_hyperparameters_gamma = 7

The choice of $C = 10$ and $\gamma = 0.01$ for our SVM model with the RBF kernel provided the best results by effectively balancing the complexity of the decision boundary with the need to generalize well to new data. It is important to note that for this model we only used the 30-second features and not the 3-second features, because training time increases steeply (superlinearly) on the larger 3-second set.

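The accuracy was the same across all datasets when using the same kernel. This was due to the use of StandardScaler in testSVM, which always standardizes features by removing the mean and scaling to unit variance, so the transformed data has a mean of 0 and a standard deviation of 1. Min-max normalization and standardization only shift and rescale each feature. After StandardScaler, the raw, normalized, and standardized versions therefore become the same matrix, and the SVM sees identical inputs. StandardScaler is particularly useful when the features have different units or variances.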
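The identical results can be checked numerically. The sketch uses hypothetical raw features with very different column scales. It builds the notebook's two preprocessing variants (min-max normalization and z-scoring) and applies StandardScaler to all three versions. Every per-feature shift and positive rescale is undone by StandardScaler, so the outputs should match up to floating-point error.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical raw features with very different scales per column
X_raw = rng.normal(size=(200, 4)) * np.array([1.0, 50.0, 0.01, 1000.0]) + 10.0

# The two preprocessing variants used in this notebook
X_norm = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Applying StandardScaler to each version, as testSVM does
Z_raw, Z_norm, Z_std = (StandardScaler().fit_transform(X) for X in (X_raw, X_norm, X_std))
print(np.allclose(Z_raw, Z_norm), np.allclose(Z_raw, Z_std))
```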

## Random Forest 

A Random Forest Classifier is a supervised machine learning algorithm that is used for both classification and regression tasks. For this experiment we used it as a classifier. It operates by constructing many decision trees during training and outputting the mode of the trees' predicted classes (classification) or the mean of their predictions (regression). scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities, which is a soft version of this vote. Random Forest is known for its high accuracy and robustness, especially in classification tasks. It reduces the risk of overfitting by averaging the results of many trees, which lowers the variance and prediction error.

While Random Forest is generally efficient, it can be computationally intensive with a large number of trees, which may slow down training and prediction. The algorithm can also require significant memory, especially with large datasets and many trees.
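A stripped-down random forest can be built from scikit-learn's DecisionTreeClassifier, which shows the "many trees plus a vote" idea. This is a minimal sketch, not scikit-learn's implementation. The helpers `fit_simple_forest` and `predict_majority` are hypothetical, the data is synthetic from `make_classification`, and the sketch uses a hard majority vote where RandomForestClassifier averages probabilities. Each tree is trained on a bootstrap sample and considers a random subset of features (`max_features="sqrt"`) at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_simple_forest(X, y, n_trees=25, seed=0):
    """Bagged decision trees with a random feature subset at each split."""
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample: n draws with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X):
    """Hard majority vote over the trees (labels must be non-negative ints)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Synthetic 3-class data (not GTZAN)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

forest = fit_simple_forest(X_tr, y_tr)
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
tree_acc = np.mean(single.predict(X_te) == y_te)
forest_acc = np.mean(predict_majority(forest, X_te) == y_te)
print("single tree accuracy:", tree_acc)
print("forest accuracy:     ", forest_acc)
```

On data like this the voted forest typically scores above the single unpruned tree, though the exact numbers depend on the random seeds.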

In [None]:
# Define function for choosing Random Forest hyperparameters on various datasets
def testRandomForest(features, labels, paramgrid, valFeatures, valLabels):
    
    model = RandomForestClassifier()
    best_estimator, accuracy, run_time = girdSearchClassifier(model, features, labels, paramgrid)
    y_pred = best_estimator.predict(valFeatures)
    val_accuracy = np.mean(y_pred == valLabels)
    
    return best_estimator.get_params()['n_estimators'], best_estimator.get_params()['max_depth'], best_estimator.get_params()['bootstrap'], accuracy, val_accuracy, run_time

#### Random Forest Parameters 

n_estimators: specifies the number of decision trees in the random forest. Increasing the number of trees generally improves the model's performance because it reduces variance and helps the model generalize better. However, after a certain point, adding more trees yields diminishing returns in accuracy while increasing computational cost.

max_depth: determines the maximum depth of each tree in the forest. A deeper tree can capture more information about the data, but it can also lead to overfitting if the depth is too high. Conversely, a shallow tree might underfit the data. A max_depth of None allows every tree to grow until all leaves are pure or contain fewer than min_samples_split samples. We used the default min_samples_split of 2, primarily to keep the hyperparameter search to a reasonable speed for this experiment.

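bootstrap: determines whether bootstrap samples are used when building trees. With bootstrapping, each tree is trained on a random sample of the training set drawn with replacement, of the same size as the original set, so every tree sees a slightly different dataset. This ensures diversity among the trees, which helps reduce overfitting. With bootstrap=False, every tree is trained on the full training set, and diversity comes only from the random subset of features considered at each split.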
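A quick simulation shows what a bootstrap sample looks like. When $n$ rows are drawn with replacement from $n$, only about $1 - 1/e \approx 63.2\%$ of the original rows appear in the sample. The remaining rows are left out of that tree ("out-of-bag"), and that gives each tree its different view of the data. The training-set size here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical training-set size
fractions = []
for _ in range(20):
    sample = rng.integers(0, n, size=n)           # one bootstrap sample (with replacement)
    fractions.append(len(np.unique(sample)) / n)  # share of distinct original rows

print(f"mean distinct fraction: {np.mean(fractions):.3f}")
print(f"theory, 1 - 1/e:        {1 - np.exp(-1):.3f}")
```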

In [None]:
# Create grid and choose hyperparameters for each dataset
random_forest_param_grid = {
    'n_estimators': [1, 10, 50, 100, 200],
    'max_depth': [10, 20, 30, None],
    'bootstrap': [True, False],
    'n_jobs': [-1]
    
}

print("rf run csv 30sec")
rf_csv30sec_best_estimator_nfeatures, rf_csv30sec_best_estimator_maxdepth, rf_csv30sec_best_estimator_bootstrap, rf_csv30sec_accuracy, rf_csv30sec_val_accuracy, rf_csv30sec_time = testRandomForest(
    X_30sec_train, y_30sec_train, random_forest_param_grid, X_30sec_val, y_30sec_val)

print("rf run csv 30sec norm")
rf_csv30sec_norm_best_estimator_nfeatures, rf_csv30sec_norm_best_estimator_maxdepth, rf_csv30sec_norm_best_estimator_bootstrap, rf_csv30sec_norm_accuracy, rf_csv30sec_norm_val_accuracy, rf_csv30sec_norm_time = testRandomForest(
    X_30sec_norm_train, y_30sec_train, random_forest_param_grid, X_30sec_norm_val, y_30sec_val)

print("rf run csv 30sec std")
rf_csv30sec_std_best_estimator_nfeatures, rf_csv30sec_std_best_estimator_maxdepth, rf_csv30sec_std_best_estimator_bootstrap, rf_csv30sec_std_accuracy, rf_csv30sec_std_val_accuracy, rf_csv30sec_std_time = testRandomForest(
    X_30sec_std_train, y_30sec_train, random_forest_param_grid, X_30sec_std_val, y_30sec_val)

print("rf run csv 3sec ")
rf_csv3sec_best_estimator_nfeatures, rf_csv3sec_best_estimator_maxdepth, rf_csv3sec_best_estimator_bootstrap, rf_csv3sec_accuracy, rf_csv3sec_val_accuracy, rf_csv3sec_time = testRandomForest(
    X_3sec_train, y_3sec_train, random_forest_param_grid, X_3sec_val, y_3sec_val)

print("rf run csv 3sec norm ")
rf_csv3sec_norm_best_estimator_nfeatures, rf_csv3sec_norm_best_estimator_maxdepth, rf_csv3sec_norm_best_estimator_bootstrap, rf_csv3sec_norm_accuracy, rf_csv3sec_norm_val_accuracy, rf_csv3sec_norm_time = testRandomForest(
    X_3sec_norm_train, y_3sec_train, random_forest_param_grid, X_3sec_norm_val, y_3sec_val)

print("rf run csv 3sec std")
rf_csv3sec_std_best_estimator_nfeatures, rf_csv3sec_std_best_estimator_maxdepth, rf_csv3sec_std_best_estimator_bootstrap, rf_csv3sec_std_accuracy, rf_csv3sec_std_val_accuracy, rf_csv3sec_std_time = testRandomForest(
    X_3sec_std_train, y_3sec_train, random_forest_param_grid, X_3sec_std_val, y_3sec_val)

In [None]:
# Display hyperparameter choice results
rfdf = pd.DataFrame({
    "Dataset": ["Features 30 sec", "Features 30 sec norm", "Features 30 sec std", "Features 3 sec", "Features 3 sec norm", "Features 3 sec std"],
    "Best n estimators": [
        rf_csv30sec_best_estimator_nfeatures,
        rf_csv30sec_norm_best_estimator_nfeatures,
        rf_csv30sec_std_best_estimator_nfeatures,
        rf_csv3sec_best_estimator_nfeatures,
        rf_csv3sec_norm_best_estimator_nfeatures,
        rf_csv3sec_std_best_estimator_nfeatures
    ],
    "Best Max Depth": [
        rf_csv30sec_best_estimator_maxdepth,
        rf_csv30sec_norm_best_estimator_maxdepth,
        rf_csv30sec_std_best_estimator_maxdepth,
        rf_csv3sec_best_estimator_maxdepth ,
        rf_csv3sec_norm_best_estimator_maxdepth,
        rf_csv3sec_std_best_estimator_maxdepth
    ],
    "Bootstrap": [
        rf_csv30sec_best_estimator_bootstrap,
        rf_csv30sec_norm_best_estimator_bootstrap,
        rf_csv30sec_std_best_estimator_bootstrap,
        rf_csv3sec_best_estimator_bootstrap,
        rf_csv3sec_norm_best_estimator_bootstrap,
        rf_csv3sec_std_best_estimator_bootstrap
    ],
    "Accuracy": [
        rf_csv30sec_accuracy,
        rf_csv30sec_norm_accuracy,
        rf_csv30sec_std_accuracy,
        rf_csv3sec_accuracy,
        rf_csv3sec_norm_accuracy,
        rf_csv3sec_std_accuracy
    ],
    "Run Time": [
        rf_csv30sec_time,
        rf_csv30sec_norm_time,
        rf_csv30sec_std_time,
        rf_csv3sec_time,
        rf_csv3sec_norm_time,
        rf_csv3sec_std_time
    ]
})
rfdf

In [None]:
# Calculate mean accuracy and mean runtime for all runs
rfdf_hp_avg_accuracy = rfdf["Accuracy"].mean()
rfdf_hp_avg_run_time = rfdf["Run Time"].mean()

rf_n_hyperparameters_n_estimators = 5
rf_n_hyperparameters_max_depth = 4
rf_n_hyperparameters_bootstrap = 2

The results indicate that the best performance was achieved with bootstrap = False, max_depth = None, and n_estimators = 200. When bootstrap = False, the model does not use bootstrapped samples (sampling with replacement) to train each tree; instead, every tree is trained on the entire training set, and the trees still differ because each split considers only a random subset of the features. Setting max_depth = None lets the trees grow fully, which can capture complex patterns in the data. This is useful when the dataset is complex and requires deep trees to model intricate relationships, although it can increase the risk of overfitting.

We capped the grid at 200 trees to keep the runtime down. Exploring more trees could potentially improve performance further, especially if computational resources allow for it.
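Why extra trees give diminishing returns can be sketched with the standard idealized formula for the variance of an average of B equally noisy predictors with pairwise correlation ρ and individual variance σ²: ρσ² + (1 − ρ)σ²/B. The values ρ = 0.3 and σ² = 1 below are illustrative assumptions, not quantities measured on GTZAN, and real forests only approximate this idealization.

```python
def ensemble_variance(n_trees, rho, sigma2=1.0):
    """Variance of the average of n_trees equally noisy predictors with pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees

# rho = 0.3 and sigma2 = 1.0 are illustrative assumptions, not values measured on GTZAN
for n_trees in [1, 10, 50, 100, 200, 1000]:
    print(f"{n_trees:>5} trees -> variance {ensemble_variance(n_trees, rho=0.3):.4f}")
```

Under these assumptions, going from 1 to 50 trees removes most of the reducible variance, going from 200 to 1000 trees changes it by less than 0.003, and no number of trees pushes it below the ρσ² floor set by the correlation between trees.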

## Convolutional Neural Network

In [None]:
'''Check TensorFlow setup'''
import tensorflow as tf
from tensorflow.keras import layers, models

print("TensorFlow version:", tf.__version__)

In [None]:
'''
    THIS CELL HAS A LONG RUNTIME - upwards of several minutes
    Using this search we determined that the model seems to work best with 32 filters
    in convolutional layer 1, 16 filters in layer 2, and no layer 3 (every layer uses 3x3 kernels).
    Moving forward we will use this CNN configuration to choose the optimizer and number of epochs.
'''
cell_start = time.time()

def choose_CNN(l1, l2, l3):
    '''Build and compile a CNN with l1, l2, l3 filters per conv layer; None skips that layer.'''
    model = models.Sequential()
    model.add(layers.Conv2D(l1, (3, 3), activation='relu', input_shape=(288, 432, 4)))
    if l2 is not None:
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(l2, (3, 3), activation='relu'))
    if l3 is not None:
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(l3, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10))  # one output logit per genre
    model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
    return model

# Convert spectrogram images and labels to tensors
X_images_train_tens = tf.convert_to_tensor(X_images_train, dtype=float)
y_images_train_tens = tf.convert_to_tensor(y_images_train.astype(np.float32), dtype=float)
X_images_val_tens = tf.convert_to_tensor(X_images_val, dtype=float)
y_images_val_tens = tf.convert_to_tensor(y_images_val.astype(np.float32), dtype=float)

layer1 = [16, 32, 64]
layer2 = [None, 16, 32]
layer3 = [None, 16, 32]
cnn_layer_results = []

# Train every layer combination for one epoch, then score it on the validation set
for l1 in layer1:
    for l2 in layer2:
        for l3 in layer3:
            m = choose_CNN(l1, l2, l3)
            m.fit(X_images_train_tens, y_images_train_tens, epochs=1)
            _, val_accuracy = m.evaluate(X_images_val_tens, y_images_val_tens, verbose=2)
            cnn_layer_results.append((l1, l2, l3, val_accuracy))

for l1, l2, l3, val_accuracy in cnn_layer_results:
    print('Layer 1:', l1, '| Layer 2:', l2, '| Layer 3:', l3, '| Validation accuracy:', val_accuracy)

cell_stop = time.time()

cnn_run_time = cell_stop - cell_start

In [None]:
'''
    THIS CELL HAS A LONG RUNTIME - upwards of several minutes
    Using this search we determined that adamax was the best optimizer for nearly every number of epochs,
    and that adamax performed best at 10 epochs.
'''

def train_test_cnn(opt, epch):
    '''Train the chosen CNN architecture with optimizer opt for epch epochs and return its validation accuracy.'''
    model = tf.keras.models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(288, 432, 4)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(16, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10),
    ])
    model.compile(optimizer=opt,loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=['accuracy'])
    model.fit(X_images_train_tens, y_images_train_tens, epochs=epch)
    _, accuracy = model.evaluate(X_images_val_tens, y_images_val_tens)
    return accuracy

# Convert spectrograms to tensors
X_images_train_tens = tf.convert_to_tensor(X_images_train, dtype=float)
y_images_train_tens = tf.convert_to_tensor(y_images_train.astype(np.float32), dtype=float)
X_images_val_tens = tf.convert_to_tensor(X_images_val, dtype=float)
y_images_val_tens = tf.convert_to_tensor(y_images_val.astype(np.float32), dtype=float)

# Grid-search the optimizer and number of training epochs on the validation split
OPTIMIZER = ['adam','adamax','ftrl','rmsprop','sgd']
EPOCHS = [5,10,20,30,40]

session_num = 0

outputs = []
cnntime = 0
cnnstart = time.time()
for optimizer in OPTIMIZER:
    for epoch in EPOCHS:
        tempAcc = train_test_cnn(optimizer, epoch)
        outputs.append('Trial Number: ' + str(session_num) + 
                    '\nOptimizer: ' + optimizer + 
                    '\nEpochs: ' + str(epoch) +
                    '\nAccuracy: ' + str(tempAcc))
        session_num += 1

cnnstop = time.time()
cnntime = cnnstop-cnnstart
for result in outputs:
    print(result)
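A large part of the long runtimes comes from the Flatten → Dense(64) step. The sketch below works through the shape and parameter arithmetic for the chosen architecture (Conv2D with 32 filters → MaxPooling2D → Conv2D with 16 filters → Flatten → Dense(64) → Dense(10) on 288×432×4 inputs). It assumes the Keras defaults of 'valid' padding, stride 1, and 2×2 pooling that floors odd sizes; the helper functions are hypothetical, and `model.summary()` on the real Keras model is the authoritative check.

```python
# Shape and parameter arithmetic for the chosen CNN, assuming Keras defaults:
# 'valid' padding, stride 1, and 2x2 max pooling that floors odd sizes.

def conv2d(h, w, c_in, filters, k=3):
    """k x k 'valid' convolution with stride 1: output shape and parameter count."""
    params = k * k * c_in * filters + filters  # kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def maxpool2d(h, w, c, p=2):
    """p x p max pooling: output shape; pooling layers have no parameters."""
    return (h // p, w // p, c), 0

def dense(n_in, units):
    """Fully connected layer: output size and parameter count (weights + biases)."""
    return units, n_in * units + units

(h, w, c), p_conv1 = conv2d(288, 432, 4, 32)  # -> (286, 430, 32)
(h, w, c), _ = maxpool2d(h, w, c)              # -> (143, 215, 32)
(h, w, c), p_conv2 = conv2d(h, w, c, 16)       # -> (141, 213, 16)
flat = h * w * c                               # Flatten
_, p_dense64 = dense(flat, 64)
_, p_dense10 = dense(64, 10)

total = p_conv1 + p_conv2 + p_dense64 + p_dense10
print("flattened features:", flat)         # 480528
print("Dense(64) parameters:", p_dense64)  # 30753856
print("total parameters:", total)          # 30760314
```

Under these assumptions, more than 99.9% of the roughly 30.8 million parameters sit in the Dense(64) layer that follows Flatten.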

## Classifier Comparison

#### Baseline Classifier: the Random Classifier

The random classifier is a baseline model that predicts a class chosen uniformly at random from the classes seen during training, for every input and regardless of its features. Naive classifiers like this set a minimum performance threshold: if a more sophisticated model cannot outperform the random classifier, it is not learning anything useful from the features. That could mean the model is overfitting or underfitting, or that the features or the overall dataset are not informative enough.
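The expected accuracy of this baseline can be checked with a minimal pure-Python simulation, assuming 10 perfectly balanced classes. The labels are simulated, not GTZAN data, and `uniform_random_accuracy` is a hypothetical helper; scikit-learn's `DummyClassifier(strategy="uniform")` behaves the same way by drawing predictions uniformly from the classes observed in `fit`.

```python
import random

def uniform_random_accuracy(labels, classes, rng):
    """Accuracy of guessing a class uniformly at random for every sample (hypothetical helper)."""
    guesses = [rng.choice(classes) for _ in labels]
    return sum(g == y for g, y in zip(guesses, labels)) / len(labels)

rng = random.Random(42)
classes = list(range(10))                            # 10 genres, as in GTZAN
labels = [c for c in classes for _ in range(1000)]   # simulated: 1000 samples per class
acc = uniform_random_accuracy(labels, classes, rng)
print(f"simulated accuracy: {acc:.3f} (expected 1/10 = 0.100)")
```

With K balanced classes, each guess is correct with probability 1/K, so the accuracy should land close to 0.1 here.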

In [None]:
# Run a dummy classifier to generate a baseline for model performance
from sklearn.dummy import DummyClassifier

dummy_results = []

random_classifier = DummyClassifier(strategy="uniform", random_state=42)
for X, y, data_set in features_datasets:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    fit_start_time = time.time()
    random_classifier.fit(X_train, y_train)
    fit_stop_time = time.time()
    y_pred=random_classifier.predict(X_test)
    predict_stop_time = time.time()
    classifier_accuracy = np.mean(y_pred == y_test)
    fit_time = fit_stop_time - fit_start_time
    predict_time = predict_stop_time - fit_stop_time
    dummy_results.append([data_set, random_classifier.__class__.__name__, fit_time, predict_time, classifier_accuracy])

dummy_df = pd.DataFrame(dummy_results, columns=["Dataset", "Classifier", "Fit Time (s)", "Predict Time (s)", "Accuracy"])
dummy_df

The random classifier achieved an average accuracy of approximately 10% across all datasets. This outcome aligns with expectations: the dataset has 10 genres, each making up roughly 10% of the samples, so a uniform random guess is correct about 1 time in 10.

## Metric Comparison

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Create all best models
knn_classifier = KNeighborsClassifier(n_neighbors=1)
svm_classifier = svm.SVC(C=100, gamma = 'scale', kernel="rbf")
rf_classifier = RandomForestClassifier(n_estimators = 200, max_depth = 20, bootstrap = False, n_jobs = -1)
cnn_classifier = tf.keras.models.Sequential([
                    layers.Conv2D(32, (3, 3), activation='relu', input_shape = (288, 432, 4)),
                    layers.MaxPooling2D((2, 2)),
                    layers.Conv2D(16, (3, 3), activation='relu'),
                    layers.Flatten(),
                    layers.Dense(64, activation='relu'),
                    layers.Dense(10),
                ])
cnn_classifier.compile(optimizer='adamax',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=['accuracy'])

# Compare runtimes during hyperparameter selection
knn_hypertimes = [knn_csv30sec_time,
                knn_csv30sec_norm_time,
                knn_csv30sec_std_time,
                knn_csv3sec_time,
                knn_csv3sec_norm_time,
                knn_csv3sec_std_time]
svm_hypertimes = [svm_csv30sec_time,
                       svm_csv30sec_norm_time,
                       svm_csv30sec_std_time,
                       svm_csv3sec_time,
                       svm_csv3sec_norm_time,
                       svm_csv3sec_std_time]
rbf_hypertimes = [rbf_svm_csv30sec_time,
                    rbf_svm_csv30sec_norm_time,
                    rbf_svm_csv30sec_std_time,
                    rbf_svm_csv3sec_time,
                    rbf_svm_csv3sec_norm_time,
                    rbf_svm_csv3sec_std_time,]
rf_hypertimes = [rf_csv30sec_time,
               rf_csv30sec_norm_time,
               rf_csv30sec_std_time,
               rf_csv3sec_time,
               rf_csv3sec_norm_time,
               rf_csv3sec_std_time]

# Compare train times
knn_traintimes = []
svm_traintimes = []
rf_traintimes = []
for i in [(X_30sec_fulltrain, y_30sec_fulltrain),(X_30sec_norm_fulltrain, y_30sec_fulltrain),(X_30sec_std_fulltrain, y_30sec_fulltrain),(X_3sec_fulltrain, y_3sec_fulltrain),(X_3sec_norm_fulltrain, y_3sec_fulltrain),(X_3sec_std_fulltrain, y_3sec_fulltrain)]:
    # %timeit reports seconds; convert each average to milliseconds to match the plot label below
    temp = %timeit -o knn_classifier.fit(i[0],i[1])
    knn_traintimes.append(temp.average * 1e3)
    temp = %timeit -o svm_classifier.fit(i[0],i[1])
    svm_traintimes.append(temp.average * 1e3)
    temp = %timeit -o rf_classifier.fit(i[0],i[1])
    rf_traintimes.append(temp.average * 1e3)
cnn_traintimes = []
X_images_fulltrain_tens = tf.convert_to_tensor(X_images_fulltrain, dtype=float)
y_images_fulltrain_tens = tf.convert_to_tensor(y_images_fulltrain.astype(np.float32), dtype=float)
X_images_test_tens = tf.convert_to_tensor(X_images_test, dtype=float)
y_images_test_tens = tf.convert_to_tensor(y_images_test.astype(np.float32), dtype=float)
temp = %timeit -r 1 -n 1 -o cnn_classifier.fit(X_images_fulltrain_tens, y_images_fulltrain_tens, epochs=10)
cnn_traintimes.append(int(temp.average))

# Print class names
print('\nClass names: ')
print(genres)

# Compare confusion matrices and other metrics
knn_classifier = KNeighborsClassifier(n_neighbors=1)
svm_classifier = svm.SVC(C=100, gamma = .001, kernel="rbf")
rf_classifier = RandomForestClassifier(n_estimators = 200, max_depth = 20, bootstrap = False, n_jobs = -1)

knn_classifier.fit(X_3sec_norm_fulltrain, y_3sec_fulltrain)
y_pred = knn_classifier.predict(X_3sec_norm_test)
knn_confusion = confusion_matrix(y_3sec_test, y_pred)
FP = knn_confusion.sum(axis=0) - np.diag(knn_confusion)
FN = knn_confusion.sum(axis=1) - np.diag(knn_confusion)
TP = np.diag(knn_confusion)
knn_precision = np.mean(TP/(TP+FP))
knn_recall = np.mean(TP/(TP+FN))
knn_f1 = (2*knn_precision*knn_recall)/(knn_precision+knn_recall)
knn_acc = np.mean(y_3sec_test == y_pred)
print('\nKNN Confusion Matrix')
print(knn_confusion)
print('KNN F1 Score - ' + str(knn_f1))
print('KNN Precision Score - ' + str(knn_precision))
print('KNN Recall Score - ' + str(knn_recall))
print('KNN Accuracy - ' + str(knn_acc))

svm_classifier.fit(X_3sec_std_fulltrain, y_3sec_fulltrain)
y_pred = svm_classifier.predict(X_3sec_std_test)
svm_confusion = confusion_matrix(y_3sec_test, y_pred)
FP = svm_confusion.sum(axis=0) - np.diag(svm_confusion)
FN = svm_confusion.sum(axis=1) - np.diag(svm_confusion)
TP = np.diag(svm_confusion)
svm_precision = np.mean(TP/(TP+FP))
svm_recall = np.mean(TP/(TP+FN))
svm_f1 = (2*svm_precision*svm_recall)/(svm_precision+svm_recall)
svm_acc = np.mean(y_3sec_test == y_pred)
print('\nSVM Confusion Matrix')
print(svm_confusion)
print('SVM F1 Score - ' + str(svm_f1))
print('SVM Precision Score - ' + str(svm_precision))
print('SVM Recall Score - ' + str(svm_recall))
print('SVM Accuracy - ' + str(svm_acc))

rf_classifier.fit(X_3sec_std_fulltrain, y_3sec_fulltrain)
y_pred = rf_classifier.predict(X_3sec_std_test)
rf_confusion = confusion_matrix(y_3sec_test, y_pred)
FP = rf_confusion.sum(axis=0) - np.diag(rf_confusion)
FN = rf_confusion.sum(axis=1) - np.diag(rf_confusion)
TP = np.diag(rf_confusion)
rf_precision = np.mean(TP/(TP+FP))
rf_recall = np.mean(TP/(TP+FN))
rf_f1 = (2*rf_precision*rf_recall)/(rf_precision+rf_recall)
rf_acc = np.mean(y_3sec_test == y_pred)
print('\nRandom Forest Confusion Matrix')
print(rf_confusion)
print('Random Forest F1 Score - ' + str(rf_f1))
print('Random Forest Precision Score - ' + str(rf_precision))
print('Random Forest Recall Score - ' + str(rf_recall))
print('Random Forest Accuracy - ' + str(rf_acc))

y_pred = cnn_classifier.predict(X_images_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # index of the largest logit = predicted genre
cnn_confusion = tf.math.confusion_matrix(y_images_test.astype(int), y_pred_classes.astype(int)).numpy()
# Per-class counts, macro-averaged below to match the KNN, SVM, and Random Forest metrics
FP = cnn_confusion.sum(axis=0) - np.diag(cnn_confusion)
FN = cnn_confusion.sum(axis=1) - np.diag(cnn_confusion)
TP = np.diag(cnn_confusion)
cnn_precision = np.mean(TP/(TP+FP))
cnn_recall = np.mean(TP/(TP+FN))
cnn_f1 = (2*cnn_precision*cnn_recall)/(cnn_precision+cnn_recall)
cnn_acc = np.mean(y_images_test.astype(int) == y_pred_classes.astype(int))
print('\nConvolutional Neural Network Confusion Matrix')
print(cnn_confusion)
print('Convolutional Neural Network F1 Score - ' + str(cnn_f1))
print('Convolutional Neural Network Precision Score - ' + str(cnn_precision))
print('Convolutional Neural Network Recall Score - ' + str(cnn_recall))
print('Convolutional Neural Network Accuracy - ' + str(cnn_acc))

# Plot results
bar1 = np.arange(len(knn_traintimes))
bar2 = [x + 0.2 for x in bar1]
bar3 = [x + 0.2 for x in bar2]
bar4 = [x + 0.2 for x in bar3]

fig = plt.subplots(figsize = (12, 8))
plt.bar(bar1, knn_hypertimes, color = 'r', width = 0.2, edgecolor = 'black', label = 'knn')
plt.bar(bar2, svm_hypertimes, color = 'b', width = 0.2, edgecolor = 'black', label = 'linear svm')
plt.bar(bar3, rbf_hypertimes, color = 'g', width = 0.2, edgecolor = 'black', label = 'rbf svm')
plt.bar(bar4, rf_hypertimes, color = 'm', width = 0.2, edgecolor = 'black', label = 'rf')
plt.title('Hyperparameter selection times', fontsize = 15)
plt.ylabel('Time to Run (seconds)', fontsize = 18)
plt.xlabel('Dataset', fontsize = 18)
plt.xticks([x + 0.3 for x in range(len(bar1))], ['X_30sec', 'X_30sec_norm', 'X_30sec_std', 'X_3sec', 'X_3sec_norm', 'X_3sec_std'])
plt.legend()
plt.show()

fig1 = plt.subplots(figsize = (12, 8))
plt.bar(bar1, knn_traintimes, color = 'r', width = 0.2, edgecolor = 'black', label = 'knn')
plt.bar(bar2, svm_traintimes, color = 'b', width = 0.2, edgecolor = 'black', label = 'svm')
plt.bar(bar3, rf_traintimes, color = 'g', width = 0.2, edgecolor = 'black', label = 'rf')
plt.title('Training times', fontsize = 15)
plt.ylabel('Time to Train (milliseconds)', fontsize = 18)
plt.xlabel('Dataset', fontsize = 18)
plt.xticks(bar2, ['X_30sec', 'X_30sec_norm', 'X_30sec_std', 'X_3sec', 'X_3sec_norm', 'X_3sec_std'])
plt.legend()
plt.show()

print('Convolutional Neural Network Train Time: ' + str(cnn_traintimes[0]) + ' (seconds)')
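The KNN, SVM, and Random Forest metrics above are macro-averaged: precision and recall are computed per class from the confusion matrix and then averaged over classes. Pooling TP, FP, and FN over all classes first (micro-averaging) makes precision and recall both equal plain accuracy for single-label multiclass problems. Also, the F1 printed above is the harmonic mean of macro precision and macro recall, whereas scikit-learn's `f1_score(average='macro')` averages the per-class F1 scores, and the two can differ. The pure-Python sketch below illustrates all three points on a made-up, unbalanced 3-class confusion matrix (not GTZAN results); `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(cm):
    """Per-class precision and recall from a confusion matrix (rows = true class, columns = predicted)."""
    k = len(cm)
    tp = [cm[i][i] for i in range(k)]
    predicted = [sum(cm[r][i] for r in range(k)) for i in range(k)]  # column sums = TP + FP
    actual = [sum(cm[i]) for i in range(k)]                           # row sums = TP + FN
    precision = [tp[i] / predicted[i] for i in range(k)]
    recall = [tp[i] / actual[i] for i in range(k)]
    return tp, predicted, actual, precision, recall

# Made-up, unbalanced 3-class confusion matrix (not GTZAN results)
cm = [[9, 1, 0],
      [2, 5, 3],
      [1, 0, 4]]

tp, predicted, actual, precision, recall = per_class_metrics(cm)

# Macro averaging: compute the metric per class, then take the unweighted mean
macro_p = sum(precision) / len(precision)
macro_r = sum(recall) / len(recall)

# Micro averaging: pool TP, FP, and FN over all classes first
micro_p = sum(tp) / sum(predicted)
micro_r = sum(tp) / sum(actual)
accuracy = sum(tp) / sum(actual)  # correct predictions / all samples

# Two ways of turning per-class results into one F1 score
f1_of_macro = 2 * macro_p * macro_r / (macro_p + macro_r)  # harmonic mean of macro P and R
per_class_f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]
macro_f1 = sum(per_class_f1) / len(per_class_f1)          # mean of per-class F1 scores

print(f"macro P={macro_p:.3f} R={macro_r:.3f} | micro P={micro_p:.3f} R={micro_r:.3f} | accuracy={accuracy:.3f}")
print(f"F1 of macro P/R={f1_of_macro:.3f}, mean per-class F1={macro_f1:.3f}")
```

Because macro averaging weights every genre equally, it is the more informative choice when some genres are harder to recognize than others.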

In [None]:
for X, y, data_set in features_datasets:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    for classifier in features_classifiers:
        fit_start_time = time.time()
        classifier.fit(X_train, y_train)
        fit_stop_time = time.time()
        y_pred=classifier.predict(X_test)
        predict_stop_time = time.time()
        classifier_accuracy = np.mean(y_pred == y_test)
        fit_time = fit_stop_time - fit_start_time
        predict_time = predict_stop_time - fit_stop_time
        features_results.append([data_set, classifier.__class__.__name__, fit_time, predict_time, classifier_accuracy])


In [None]:
# Build the results table first, then shorten the classifier names
columns = ["Dataset", "Classifier", "Fit Time (s)", "Predict Time (s)", "Accuracy"]
results_df = pd.DataFrame(features_results, columns=columns)

name_mapping = {
    "RandomForestClassifier": "RF",
    "SVC": "SVM",
    "KNeighborsClassifier": "KNN",
    "DummyClassifier": "Random",
    "ConvolutionalNeuralNetwork": "CNN",
}
results_df["Classifier"] = results_df["Classifier"].replace(name_mapping)

results_df

In [None]:
dataset_averages = {}
for dataset_name in results_df["Dataset"].unique():
    filtered_df = results_df[results_df["Dataset"] == dataset_name]
    averages = filtered_df[["Fit Time (s)", "Predict Time (s)", "Accuracy"]].mean()
    dataset_averages[dataset_name] = averages
dataset_averages_df = pd.DataFrame.from_dict(dataset_averages, orient="index")
dataset_averages_df

In [None]:
classifier_averages = {}
for classifier_name in results_df["Classifier"].unique():
    filtered_df = results_df[results_df["Classifier"] == classifier_name]
    averages = filtered_df[["Fit Time (s)", "Predict Time (s)", "Accuracy"]].mean()
    classifier_averages[classifier_name] = averages.to_dict()  # Convert Series to dict
classifier_averages_df = pd.DataFrame.from_dict(classifier_averages, orient="index")
classifier_averages_df 