## Import Modules and Prerequisites
---
Use this section to import the modules required later in this notebook, as in the example below.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Additional libraries
import pickle  # used below to save the experiment results
from pathlib import Path

# Local project modules; presumably the source of create_features_dataset, load_data,
# build_model, plot_history and plot_confusion_matrix used below
from audioblock import *
from model import *
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping
%matplotlib inline

## Automatic Speech Recognition
---
#### Note: You are not expected to code a highly sophisticated solution in the limited time available. Each question can be answered either with a short code example accompanied by a written explanation of a more elaborate approach, or with models that are not highly tuned, given the limited time and resources.

A common task in acoustics is to identify the speaker from an audio signal (speaker identification). The provided corpus (see the project description) contains transcripts under various speech settings and speaking conditions.

### 1. Train a classifier on the Solo Speech condition dataset that will reach an acceptable accuracy score.
---
Feel free to follow whichever design choices you think fit the problem best. Briefly describe your approach in markdown cells, along with any necessary comments on your choices. Support your choices with appropriate evaluation plots and analysis.

In [2]:
#Path initialization
ROOT_PATH = Path.cwd()
ROOT_DATASET_PATH = ROOT_PATH.joinpath('dataset')
SOLO_DATASET_PATH = ROOT_DATASET_PATH.joinpath('data').joinpath('solo')
TRAIN_FEATURES_EXPORT_PATH = ROOT_DATASET_PATH.joinpath('train_ftrs.pickle')
OUTPUT_PATH = ROOT_PATH.joinpath('output')

### Audio Feature Extraction

In [None]:
# Collect the paths of all .wav files in the solo dataset (searched recursively)
audiofiles = list(SOLO_DATASET_PATH.glob('**/*.wav'))

# Extract audio features for SOLO dataset
create_features_dataset(filepaths=audiofiles, exportpath=TRAIN_FEATURES_EXPORT_PATH)
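
`create_features_dataset` comes from the local `audioblock` module and is not shown in this notebook. The `features=39` noted in a later cell is consistent with a common layout of 13 cepstral coefficients plus their first- and second-order deltas, but that is an assumption. A minimal NumPy sketch of such stacking follows; `add_deltas` is a hypothetical helper and random numbers stand in for real coefficients.

```python
import numpy as np

def add_deltas(base):
    """Stack base features with first- and second-order time differences.

    base: array of shape (timesteps, n_coeffs).
    Returns an array of shape (timesteps, 3 * n_coeffs).
    Hypothetical helper: np.gradient is a simple stand-in for the
    regression-based deltas that audio libraries usually compute.
    """
    delta = np.gradient(base, axis=0)    # first-order difference over time
    delta2 = np.gradient(delta, axis=0)  # second-order difference over time
    return np.concatenate([base, delta, delta2], axis=1)

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 13))  # placeholder: 16 frames of 13 coefficients
features = add_deltas(frames)
print(features.shape)  # (16, 39)
```

The resulting `(16, 39)` array matches the per-sample input shape the model cell expects, whatever the actual extraction pipeline is.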

### Setting up the experiment

In [None]:
# Split the dataset into train, validation and test sets for experimentation purposes
def prepare_dataset(X, y, test_size=0.2, validation_size=0.2):
    """Creates train, validation and test sets.

    test_size is a fraction of the full dataset; validation_size is a fraction
    of what remains after the test split (0.2 * 0.8 = 16% of the full dataset).
    """

    # hold out the test set first, then carve the validation set out of the rest
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)
    X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=validation_size)

    return X_train, y_train, X_validation, y_validation, X_test, y_test
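
`prepare_dataset` splits at random, so a speaker can end up under-represented in one of the subsets and the results change between runs. One possible variant, shown here as a sketch, passes `stratify` and `random_state` to `train_test_split`. The name `prepare_dataset_stratified`, the `seed` parameter and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def prepare_dataset_stratified(X, y, test_size=0.2, validation_size=0.2, seed=42):
    """Train/validation/test split that preserves class proportions."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=validation_size,
        stratify=y_train, random_state=seed)
    return X_train, y_train, X_val, y_val, X_test, y_test

# Toy data: 4 speakers with 25 samples each, shaped like (timesteps=16, features=39)
X_demo = np.zeros((100, 16, 39))
y_demo = np.repeat(np.arange(4), 25)

X_tr, y_tr, X_va, y_va, X_te, y_te = prepare_dataset_stratified(X_demo, y_demo)
print(len(y_tr), len(y_va), len(y_te))  # 64 16 20
print(np.bincount(y_te))                # [5 5 5 5]
```

With stratification every speaker contributes the same share to each subset, which makes the validation loss used for early stopping less noisy.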

In [None]:
MAX_EPOCHS = 100
BATCH_SIZE = 16

#Load the extracted feature set for the SOLO dataset
X, y = load_data(data_path = TRAIN_FEATURES_EXPORT_PATH)

#Label encoding
le = LabelEncoder()
y_enc = le.fit_transform(y)

#Split the dataset in train, val, test sets
X_train, y_train, X_val, y_val, X_test, y_test = prepare_dataset(X, y_enc)

# Calculate the class weights (not strictly necessary, since the dataset is roughly balanced)
classes = np.unique(y_train)
class_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weights = dict(zip(classes, class_weights))

# Create the deep neural network
input_shape = (X_train.shape[1], X_train.shape[2]) # timesteps=16, features=39
n_classes = len(le.classes_)
model = build_model(input_shape, n_classes)

# Train the model with early stopping on the validation loss
callback = EarlyStopping(monitor='val_loss', mode='min', patience=3)
history = model.fit(
    X_train, y_train, 
    validation_data=(X_val, y_val), 
    batch_size=BATCH_SIZE, 
    epochs=MAX_EPOCHS, 
    class_weight=class_weights,
    callbacks=[callback]
)

# plot accuracy/error for training and validation
plot_history(history, fullpath=OUTPUT_PATH.joinpath('train_history.jpg'))

# evaluate model on test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print('Test accuracy:', test_acc)

#Make predictions
y_prob = model.predict(X_test)
y_pred = np.argmax(y_prob, axis=1)

#Store results
y_pred = le.inverse_transform(y_pred)
y_test = le.inverse_transform(y_test)
results = {
    'label': y_test,
    'prediction': y_pred,
    'probabilities': y_prob.tolist()
}

# save results
filename = OUTPUT_PATH.joinpath('experimentation_results.pickle')
filename.parent.mkdir(parents=True, exist_ok=True)
with open(filename, 'wb') as f:
    pickle.dump(results, f)

plot_confusion_matrix(results['label'], results['prediction'], norm=False, fullpath=OUTPUT_PATH.joinpath('cm.jpg'))
plot_confusion_matrix(results['label'], results['prediction'], norm=True, fullpath=OUTPUT_PATH.joinpath('normalized_cm.jpg'))
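
The confusion matrices can be complemented with a per-speaker recall table, which makes it easier to see which speakers the model confuses most. The sketch below recomputes the matrix with NumPy from the stored labels and predictions; `per_class_recall` and the toy labels are hypothetical and independent of the `plot_confusion_matrix` helper used above.

```python
import numpy as np

def per_class_recall(labels, predictions):
    """Return {class: recall} computed from a confusion matrix."""
    classes = np.unique(np.concatenate([labels, predictions])).tolist()
    index = {c: i for i, c in enumerate(classes)}

    # rows are true classes, columns are predicted classes
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for true, pred in zip(labels, predictions):
        cm[index[true], index[pred]] += 1

    support = cm.sum(axis=1)  # number of true samples per class
    recall = np.divide(cm.diagonal(), support,
                       out=np.zeros(len(classes)), where=support > 0)
    return dict(zip(classes, recall.tolist()))

# Toy example with two speakers
demo_labels = ['spk1', 'spk1', 'spk2', 'spk2', 'spk2']
demo_preds = ['spk1', 'spk2', 'spk2', 'spk2', 'spk2']
recalls = per_class_recall(demo_labels, demo_preds)
print(recalls)  # {'spk1': 0.5, 'spk2': 1.0}
```

In the notebook this could be called as `per_class_recall(results['label'], results['prediction'])`.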

### 2. Suppose you needed to apply the learned rules / models to the Fast Speech condition dataset without having access to that (test) dataset beforehand. What would you do?
---
The goal is to bring the classification accuracy on the unseen test dataset as close as possible to that obtained on the training dataset, without using the test dataset for training. Describe any challenges (if they exist) and code your solution below, following the same guidelines.
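
One challenge is that fast speech differs in tempo from the solo recordings, so a model trained only on the latter may degrade. A possible mitigation that needs no Fast Speech data is to augment the training waveforms with speed perturbation and to normalise the features per utterance. The sketch below uses only NumPy; `speed_perturb` and `cmvn` are hypothetical helpers, the sampling rate is assumed, and resampling by linear interpolation shifts pitch together with tempo, which a dedicated time-stretch algorithm would avoid.

```python
import numpy as np

def speed_perturb(signal, rate):
    """Resample so the signal plays `rate` times faster (rate > 1 shortens it)."""
    n_out = int(round(len(signal) / rate))
    src_positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(src_positions, np.arange(len(signal)), signal)

def cmvn(features, eps=1e-8):
    """Per-utterance mean and variance normalisation over the time axis."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

sr = 16000                                # assumed sampling rate for the demo
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)        # 1 s synthetic stand-in for speech
fast = speed_perturb(tone, rate=1.25)
print(len(tone), len(fast))               # 16000 12800

rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(16, 39))  # placeholder features
normed = cmvn(feats)
print(normed.mean(axis=0).round(6)[:3], normed.std(axis=0).round(6)[:3])
```

Training on a mix of original and perturbed copies (for example rates between 0.9 and 1.3) exposes the model to faster tempi, while the normalisation removes per-recording offsets in the features.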

### 3. Another important task is gender classification on the same datasets, for which no labels are available. You can use all the data at your disposal. Describe possible approaches to this problem and code the most robust solution of your choice.
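
One label-free approach relies on the fundamental frequency (F0): typical adult male voices sit lower than typical adult female voices, so clustering a per-speaker F0 estimate into two groups gives a rough gender split. The sketch below estimates F0 by autocorrelation and runs a one-dimensional 2-means. All names, the 60-400 Hz search range and the synthetic tones are assumptions for illustration; real recordings would need voiced-frame selection and a more robust pitch tracker.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 as the lag of the autocorrelation peak within [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / lag

def two_means_1d(values, n_iter=20):
    """Cluster scalar values into a low group (0) and a high group (1)."""
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])
    for _ in range(n_iter):
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = values[assign == k].mean()
    return assign, centers

sr = 8000                                  # assumed sampling rate for the demo
t = np.arange(2048) / sr
true_f0 = [110.0, 125.0, 210.0, 230.0]     # synthetic "speakers"
f0s = [estimate_f0(np.sin(2 * np.pi * f * t), sr) for f in true_f0]

groups, centers = two_means_1d(f0s)
print(np.round(f0s, 1))
print(groups, np.round(centers, 1))
```

The cluster with the lower centre would be read as "male" and the other as "female"; since no labels exist, the result can only be validated by listening to a sample from each cluster.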

## Thank you in advance. Good luck!