# Training the model

We use Kaggle's free GPU to train a model with one convolutional layer, two GRU layers and a dense layer. Because there is so much laughter in the show, the positive class is not rare, so we can use accuracy as the metric to optimize. We also pay close attention to precision, recall and F1 score.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

### Checking our data sources

In [None]:
import os
print(os.listdir("../input/"))

### Loading X and y inputs (we created these in the friendsaudio notebook)

In [None]:
# Load X and Y files
prexfolder = '/kaggle/input/friendsaudio/prex/'
preyfolder = '/kaggle/input/friendsaudio/prey/'

loadedX = np.load(prexfolder + 'prex.txt.npy')
loadedY = np.load(preyfolder + 'prey.txt.npy')
print("Shape of X is " + str(loadedX.shape))
print("Shape of Y is " + str(loadedY.shape))



### Getting train, dev, test sets

We split X and y into 60% train, 20% dev and 20% test sets. 

In [None]:
# Getting train, dev, test sets

X = loadedX
y = loadedY

# splitting train (80%) and test (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Take 25% of the remaining 80% as the dev set (20% overall), which leaves train at 60% overall
X_train, X_dev, y_train, y_dev = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

print("Shape of X_train and y_train are " + str(X_train.shape) + ", " +  str(y_train.shape))
print("Shape of X_dev and y_dev are " + str(X_dev.shape) + ", " +  str(y_dev.shape))
print("Shape of X_test and y_test are " + str(X_test.shape) + ", " +  str(y_test.shape))
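
The 60/20/20 figures follow from simple arithmetic on the two `test_size` values. The sketch below works through that arithmetic in plain Python; the helper name `two_stage_split_fractions` is hypothetical and is not part of scikit-learn.

```python
def two_stage_split_fractions(test_size, dev_size_of_remainder):
    """Return the overall (train, dev, test) fractions of a two-stage split."""
    remainder = 1.0 - test_size                # share left after the first split
    dev = remainder * dev_size_of_remainder    # share carved out by the second split
    train = remainder - dev                    # whatever is left is the training set
    return train, dev, test_size


# Same values as the two train_test_split calls above
train_frac, dev_frac, test_frac = two_stage_split_fractions(0.2, 0.25)
print(round(train_frac, 2), round(dev_frac, 2), round(test_frac, 2))  # prints: 0.6 0.2 0.2
```

The second `test_size` is relative to what remains after the first split, which is why 0.25 (and not 0.2) produces a dev set that is 20% of the full data.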

In [None]:
from keras.callbacks import ModelCheckpoint
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking, TimeDistributed, LSTM, Conv1D
from keras.layers import GRU, Bidirectional, BatchNormalization, Reshape
from keras.optimizers import Adam
from keras.metrics import Precision, Recall

### Define model

Here we define our model and some hyperparameters. We did a lot of guess-and-check work with different layer orderings, hyperparameters and layer types. This model performed best out of all the models we tested.

In [None]:
def model(input_shape):
    
    X_input = Input(shape = input_shape)
    
    # Convolution layer
    X = Conv1D(filters=256,kernel_size=15,strides=1)(X_input)
    X = BatchNormalization()(X)
    X = Activation("relu")(X)
    X = Dropout(rate=0.8)(X)
    
    # GRU Layer 1
    X = GRU(units=256, return_sequences = True)(X)
    X = Dropout(rate=0.8)(X)
    X = BatchNormalization()(X)
    
    # GRU Layer 2
    X = GRU(units=256, return_sequences = True)(X)
    X = Dropout(rate=0.8)(X)
    X = BatchNormalization()(X)
    X = Dropout(rate=0.8)(X)
    
    # Time-Distributed Dense Layer with Sigmoid
    X = TimeDistributed(Dense(1, activation = "sigmoid"))(X)
    
    model = Model(inputs = X_input, outputs = X)
    
    return model
    

### Telling the model what the input shape of the data will be.

In [None]:
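# Note: this rebinds the name `model` from the builder function to the model instance,
# so re-running this cell without first re-running the definition cell will fail.
model = model(input_shape = (X_train.shape[1], X_train.shape[2]))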

### Making sure we are getting the right input/output shapes we expect in every layer.

In [None]:
model.summary()
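
One shape worth checking in the summary is the time dimension after the convolution. `Conv1D` uses "valid" padding unless told otherwise, so a kernel of size 15 with stride 1 shortens each sequence by 14 steps, while the GRU layers (with `return_sequences = True`) and the time-distributed dense layer keep the length unchanged. A minimal sketch of that calculation follows; the helper name and the 861-step input are hypothetical values chosen for illustration.

```python
def conv1d_valid_output_length(input_length, kernel_size, strides=1):
    """Output time steps of a 1D convolution with 'valid' (no) padding."""
    return (input_length - kernel_size) // strides + 1


time_steps = 861  # hypothetical number of spectrogram frames per clip
print(conv1d_valid_output_length(time_steps, kernel_size=15, strides=1))  # prints: 847
```

The labels passed to `fit` need the same number of time steps as the model output, so this is presumably the length of each row in `y`.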

### Using Adam for gradient descent optimization

In [None]:
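# `learning_rate` replaces the deprecated `lr` alias; `decay` is a legacy argument that newer Keras releases may reject
opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)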
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy", Precision(), Recall()])

### Training the model

For our final model weights, we trained for 100 epochs, which took around 5-6 hours. For testing different layers and hyperparameters, 15 epochs were enough to give a sense of how a model would perform.

In [None]:
# Train model
model.fit(X_train, y_train, batch_size = 32, epochs = 15)

### Dev set metrics

Accuracy was the metric we optimized for, but precision and recall helped us understand whether we were missing a lot of "true" laughter (low recall) or labeling things that weren't laughter as laughter (low precision).

In [None]:
# Test on dev set
loss, acc, prec, recall = model.evaluate(X_dev, y_dev)
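# F1 is the harmonic mean of precision and recall; guard against 0/0 when both are zero
F1 = 2 * ((prec * recall) / (prec + recall)) if (prec + recall) > 0 else 0.0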
print("Dev set accuracy = ", acc)
print("Dev set precision = ", prec)
print("Dev set recall = ", recall)
print("Dev set F1 = ", F1)
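
To make the relationship between these four numbers concrete, here is a small sketch that computes them by hand from per-frame labels. The toy labels and the helper name `binary_metrics` are made up for illustration; the real values come from `model.evaluate` above.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for two equal-length lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # laughter correctly found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-laughter labeled as laughter
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # laughter that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-laughter correctly ignored

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: every laughter frame is found, but two quiet frames are also flagged
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In this toy case recall is perfect while precision is only about two thirds, which is the over-labeling pattern that accuracy alone does not reveal.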


### Saving model weights for predicting in another notebook

In [None]:
savefolder = '/kaggle/working/'

model.save_weights(savefolder + 'rawaudiomodelweights.h5')

### Looking at specific dev set examples

It was helpful to look at specific 10-second clip examples to see exactly how the model was performing. Early on, it was getting all the laughter correct, but it was also labeling lots of things that weren't laughter as laughter.

In [None]:
numpick = 1
example = X_dev[numpick]
example = np.expand_dims(example, axis=0)
preds = model.predict(example)
probs = preds[0, :, 0]

# probabilities graph
plt.subplot(1, 1, 1)
plt.plot(probs)
plt.ylabel('probability')
plt.show()

binary = np.where(probs > 0.5, 1, 0)

# binary preds graph
plt.subplot(1, 1, 1)
plt.plot(binary)
plt.ylabel('binary preds')
plt.show()

actual = y_dev[numpick]

# actuals graph
plt.subplot(1, 1, 1)
plt.plot(actual)
plt.ylabel('actuals')
plt.show()
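
Besides plotting, it can help to turn the thresholded predictions into a list of contiguous laughter segments that can be compared against the actual labels. The sketch below is one possible way to do that with NumPy; the function name `laughter_segments` and the toy probabilities are hypothetical, and the 0.5 threshold mirrors the one used above.

```python
import numpy as np


def laughter_segments(probs, threshold=0.5):
    """Return half-open (start, end) index pairs where probs stays above threshold."""
    binary = (np.asarray(probs) > threshold).astype(int)
    # Pad with zeros so runs touching either edge still produce a start and an end
    padded = np.concatenate(([0], binary, [0]))
    changes = np.diff(padded)
    starts = np.where(changes == 1)[0]   # 0 -> 1 transitions
    ends = np.where(changes == -1)[0]    # 1 -> 0 transitions
    return list(zip(starts.tolist(), ends.tolist()))


toy_probs = [0.1, 0.7, 0.9, 0.4, 0.2, 0.8, 0.6, 0.95]
segments = laughter_segments(toy_probs)
print(segments)  # prints: [(1, 3), (5, 8)]
```

Each pair covers the frames from `start` up to but not including `end`, so `(1, 3)` means frames 1 and 2 were labeled as laughter.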

### Looking at specific examples in more depth

This code allowed us to see the spectrogram alongside the labeled laughter. It became clear that when we inverted the audio track, it was much easier for the model to correctly label laughter. It was also much easier for the human eye to see laughter instances with an inverted audio track.

In [None]:
from scipy.io import wavfile
import IPython

testclipfolder = '/kaggle/input/randomtestclips/'
cliplist = os.listdir(testclipfolder)

testnum = 7

for i, clip in enumerate(cliplist):
    if i == testnum:
        filepath = testclipfolder + clip
        IPython.display.display(IPython.display.Audio(filepath))
        
        FS, data = wavfile.read(filepath)
        pxx, freqs, bins, im = plt.specgram(data, Fs=FS, NFFT=512, noverlap=0)
        plt.show()
        pxxtransposed = pxx.T
        cliptopredict = np.expand_dims(pxxtransposed, axis=0)
        testpreds = model.predict(cliptopredict)
        testprobs = testpreds[0, :, 0]

        # probabilities graph
        plt.subplot(2, 1, 2)
        plt.plot(testprobs)
        plt.ylabel('test probabilities')
        plt.show()

        binary = np.where(testprobs > 0.5, 1, 0)

        # binary preds graph
        plt.subplot(2, 1, 2)
        plt.plot(binary)
        plt.ylabel('test binary preds')
        plt.show()
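
When reading these plots it is useful to know how spectrogram frames map back to time in the clip. With `NFFT=512` and `noverlap=0`, each frame covers 512 consecutive samples of a mono signal and there are `NFFT // 2 + 1` frequency rows. The sketch below works this out in plain Python; the helper names are hypothetical, and the 44100 Hz sample rate is an assumption for illustration only (the real rate is whatever `wavfile.read` returns as `FS`).

```python
def specgram_shape(num_samples, nfft=512, noverlap=0):
    """Expected (frequency rows, time frames) of a one-sided spectrogram of a mono signal."""
    step = nfft - noverlap                       # samples between frame starts
    num_frames = (num_samples - nfft) // step + 1
    num_freqs = nfft // 2 + 1
    return num_freqs, num_frames


def frame_start_seconds(frame_index, sample_rate, nfft=512, noverlap=0):
    """Time in seconds at which a given spectrogram frame begins."""
    return frame_index * (nfft - noverlap) / sample_rate


sample_rate = 44100                  # assumed sample rate, not taken from the clips
num_samples = 10 * sample_rate       # a 10-second clip
print(specgram_shape(num_samples))   # prints: (257, 861)
print(round(frame_start_seconds(100, sample_rate), 3))  # prints: 1.161
```

Because the convolution layer uses a 15-frame window without padding, each model output step summarizes a window of 15 consecutive input frames, so output index `i` starts at the time of input frame `i`.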
