# Neural networks for audio classification

## Part 1: Dataset inspection

The first step should always be to look at the data, something we have skipped so far to leave more time for audio processing. In this section we load a dataframe that contains the dataset's metadata and file paths, and investigate it.

**If a GPU is available, limit how much of its memory TensorFlow may use:**

In [None]:
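## Activate GPU usage if available
import tensorflow as tf  # needed here because this cell runs before the main import cell

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        ## limit TensorFlow to 4096 MB of memory on the first GPU
        tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])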
    except RuntimeError as e:
        print(e)
else:
    print('no gpus found!')

In [None]:
%matplotlib inline

In [None]:
from config import *
import pandas as pd
from utility import keep_only_n_unknowns, pad_signal, augment_audio, get_callbacks
import matplotlib.pyplot as plt

In [None]:
## Load the dataframe
df_all = pd.read_pickle(data_dir + 'df_all.pkl')
df_all.info()

## Exercise 1
1. Visualize the dataset. How are the recordings distributed in terms of **"keyword"** and **"speaker_id"**? Are there many different speakers?
2. Would you adjust the class distribution? Set the "balance_out" flag to **True** or **False**.

## Hints
- Useful commands: `df.describe()`, `df['column'].value_counts()`
- Single columns can be selected by passing their name as a string: `df['name']`
- Several columns of a dataframe can be selected by passing a list of strings: `df[name_list]`
- A pandas series object (a column of a dataframe) has a **"plot"** method that can be helpful, for example `.plot(kind='bar')`.

## Solution

### E1 
1. The **"describe"** command below shows that we have roughly $2500$ unique speakers, each with at most about $25$ utterances per keyword (the number of unique `speaker_ut` values). The **freq** row also tells us that the most frequent speaker has $232$ recordings across all keywords.

In [None]:
## select the columns of interest and print some statistics
df_all[['keyword', 'speaker_id', 'speaker_ut']].describe()

In [None]:
## for all speakers, plot the keyword distribution as a bar chart
df_all['keyword'].value_counts().plot(kind='bar');

### E2
We can see from the bar chart that we have many more examples of the **unknown** keyword than of any other class, while the other classes are well balanced. A well-balanced dataset is preferable, since an imbalanced class distribution can bias the trained model towards the over-represented class.
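
To put a number on the imbalance, one can compare the largest and smallest class counts. The sketch below does this on a made-up toy dataframe and then caps the over-represented class. `cap_class` is a hypothetical helper written for this illustration; it is not the implementation of `keep_only_n_unknowns` from `utility`, whose details are not shown here.

```python
import pandas as pd

# Toy stand-in for df_all['keyword']; the counts are made up for illustration.
toy = pd.DataFrame({"keyword": ["unknown"] * 60 + ["yes"] * 10 + ["no"] * 10 + ["up"] * 10})


def cap_class(df, column, label, n_max, seed=0):
    """Hypothetical helper: keep at most n_max random rows where df[column] == label."""
    is_label = df[column] == label
    n_keep = min(n_max, int(is_label.sum()))
    capped = df[is_label].sample(n=n_keep, random_state=seed)
    return pd.concat([df[~is_label], capped]).sort_index()


# Ratio of the largest to the smallest class count before balancing.
counts_before = toy["keyword"].value_counts()
imbalance_before = counts_before.max() / counts_before.min()

# Cap the 'unknown' class at 10 rows and recompute the ratio.
balanced = cap_class(toy, "keyword", "unknown", n_max=10)
counts_after = balanced["keyword"].value_counts()
imbalance_after = counts_after.max() / counts_after.min()

print("imbalance before:", imbalance_before, "after:", imbalance_after)
```

A ratio close to $1$ means the classes are roughly equally represented.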

In [None]:
balance_out = True

## Part 2: Data loading

We need to set up a pipeline that loads the data into memory and provides it to Keras's `model.fit()` function, which will later perform the training. But first we will split our dataset into $3$ distinct sets. This will be useful for training later.

## Train-test split
The dataset has already been split for us into training, validation and testing sets. We will train the model on the training set and evaluate its performance on the validation set later.

In [None]:
## Balance out the dataset
if balance_out:
    df_all = keep_only_n_unknowns(df_all, 10)
    ## print explicitly: an expression inside an if block is not displayed by the notebook
    print(df_all.keyword.value_counts())

In [None]:
df_all['keyword'].value_counts().plot(kind='bar');

In [None]:
## split the dataset
df_train = df_all[(df_all.dataset == 'training')]
df_val   = df_all[df_all.dataset == 'validation']
df_test  = df_all[df_all.dataset == 'testing']

## Prepare the data loader

In anticipation of what's coming later, we will use a data loader. It is an object that can be used by the `model.fit()` method and returns the dataset in batches. We will load the data in two stages. The first stage loads the audio signals (.wav files) from the hard drive. This is done for the complete dataset, which is kept in memory (inside the data loader). The second stage converts the signals into MFCC features. We will see later in detail why this makes sense.
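
The two-stage idea can be sketched without Keras: raw signals stay in memory, and a feature function is applied only when a batch is requested. `ToyLoader` and the doubling "feature function" below are illustrative assumptions, not the loader used in this notebook.

```python
import numpy as np


class ToyLoader:
    """Serve (features, labels) batches; features are computed on demand by f."""

    def __init__(self, x, y, batchsize, f):
        self.x = np.asarray(x)        # stage 1: raw signals held in memory
        self.y = np.asarray(y)
        self.batchsize = batchsize
        self.f = f                    # stage 2: feature function applied per sample
        self.indices = np.arange(len(self.x))

    def __len__(self):
        # number of full batches per epoch; a trailing partial batch is dropped
        return len(self.x) // self.batchsize

    def __getitem__(self, idx):
        inds = self.indices[idx * self.batchsize:(idx + 1) * self.batchsize]
        features = np.array([self.f(self.x[i]) for i in inds])
        return features, self.y[inds]


raw = np.arange(10, dtype=float).reshape(10, 1)   # ten fake one-sample "signals"
labels = np.arange(10)
toy_loader = ToyLoader(raw, labels, batchsize=4, f=lambda sig: sig * 2.0)

first_x, first_y = toy_loader[0]
print(len(toy_loader), first_x.shape, first_y)
```

Because the feature function runs at request time, it can be swapped for a different one without reloading the raw audio.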

In [None]:
import numpy as np
import tensorflow as tf
from datetime import datetime
from tensorflow import keras
from tqdm.auto import tqdm  
import librosa

## activate tqdm for pandas
tqdm.pandas()

## fix random seeds for tensorflow
tf.random.set_seed(0)

In [None]:
## training hyperparameters
batch_size= 32 # size of the batches for training

In [None]:
## load the raw audio data into memory
signals_train = df_train.file_path.progress_apply(lambda x: pad_signal(librosa.load(data_dir + x, sr=fs)[0],
                                                                    fs)).values
signals_val   = df_val.file_path.progress_apply(  lambda x: pad_signal(librosa.load(data_dir + x, sr=fs)[0],
                                                                    fs)).values
signals_test  = df_test.file_path.progress_apply( lambda x: pad_signal(librosa.load(data_dir + x, sr=fs)[0],
                                                                    fs)).values

In [None]:
## loading the labels
keywords_test  = df_test.label_one_hot.apply(lambda x: np.asarray(x).astype('float32')).values
keywords_val   = df_val.label_one_hot.apply(lambda x: np.asarray(x).astype('float32')).values
keywords_train = df_train.label_one_hot.apply(lambda x: np.asarray(x).astype('float32')).values

In [None]:
## we need to treat silence utterances differently, so we need to pass the silence label to the loader
silence_label = df_all[df_all.keyword == 'silence'].label_one_hot.iloc[0]

In [None]:
## create a loader that calculates mfccs and provides batches of data, especially important later
class GSCLoader(tf.keras.utils.Sequence):
    ''' Loader provides batches of size batchsize with features x' and labels y where x' = f(x) '''
    
    def __init__(self, batchsize, x, y, f=None, silence_label=None):
        
        self.x = np.stack(x)
        self.y = np.stack(y)
        self.batchsize = batchsize
        self.indices   = np.arange(self.x.shape[0])
        self.f         = f
        self.silence_label = np.argmax(silence_label)
    
    ## return the number of batches per epoch
    def __len__(self):
        return int(np.floor(len(self.x) / self.batchsize))

    ## return a batch of features, labels
    def __getitem__(self, idx):
        
        inds = self.indices[idx * self.batchsize:(idx + 1) * self.batchsize]
        ## a sample is silence if the index of its one-hot label equals the silence index
        features = np.array([self.f(silence=(np.argmax(self.y[i]) == self.silence_label), sig=self.x[i]) for i in inds])
        labels = np.array(self.y[inds])
        
        return features , labels

    ## shuffle the training data when done with one epoch
    def on_epoch_end(self):
        np.random.shuffle(self.indices)
        print('shuffling indices')

In [None]:
## Define the function to calculate mfccs from the audio signal
def f(silence, sig):
    return augment_audio(silence, mode = '', sig=sig, fs=fs, l=l, s=s, n_mfccs=n_mfccs, padd_audio_to_samples=fs)
    
    
## Create the loaders with a batchsize that returns the whole dataset when the loader is called
train_loader = GSCLoader(f=f, batchsize=len(keywords_train), y=keywords_train, x=signals_train,
                         silence_label=silence_label)
val_loader   = GSCLoader(f=f, batchsize=len(keywords_val),   y=keywords_val,   x=signals_val,
                         silence_label=silence_label)
test_loader  = GSCLoader(f=f, batchsize=len(keywords_test),  y=keywords_test,  x=signals_test,
                         silence_label=silence_label)

In [None]:
## Validation set
val_data = val_loader[0]

In [None]:
## Training set
train_data = train_loader[0]

Wait, what? We created this data loader only to end up with a NumPy array holding the training set? Couldn't we have arrived there without the loader? Yes, we could have. For now we will just use the `val_data` and `train_data` arrays. The reason for creating the loader will become clear after the next lecture, but it is sensible to have it prepared already.

# Neural networks

We will train a classifier (neural network) that predicts which keyword or class is present from the MFCC features of a one-second long audio clip.

## Part 3: Set up a model

In [None]:
from tensorflow.keras import layers

In [None]:
## Infer model size
n_max_frames     =  49  # leave this at 49 
n_output_neurons = len(df_all.keyword.unique())

print('features have the dimension:', n_max_frames, 'x', n_mfccs, 'and output neurons:', n_output_neurons)

## Exercise
We will use `tf.keras.models.Sequential()` and feed a list of layers to it to create our model. 

1. Create a feedforward network with $2$ hidden layers and **ReLU** activation functions that has a softmax output layer. You can use `tf.keras.Input()` as the input layer before the Dense hidden layers. Use $64$ and $128$ neurons for your **"Dense"** layers.

2. Check the dimensions and parameters of your model using `model.summary()` and try out the `model.predict` function on a training batch. 

## Hints:
- The input dimension is the dimension of the spectrogram image $(49 \times 40)$. A fully connected hidden layer should see each sample as a $1$-dimensional vector. You could use the **Reshape** layer to flatten the input to $1$ dimension.
- You can use `np.random.random()` and pass it a tuple `(batchsize, 49, 40)` to create a random batch for testing the model with the `model.predict()` function.
- The predictions for each sample should sum to $1$ because we have used a **Softmax** layer. You can check this with `np.sum(prediction_vector)`.
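
As a minimal sketch of why the outputs sum to $1$, softmax can be written in plain NumPy. The batch size of $10$ and the $12$ classes are assumptions chosen for illustration, and the logits are random numbers rather than real model outputs.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 12))       # fake pre-activation outputs
probs = softmax(logits)

row_sums = probs.sum(axis=-1)            # one sum per sample in the batch
predicted_class = probs.argmax(axis=-1)  # most probable class per sample

print(row_sums)
print(predicted_class)
```

Each row is divided by its own sum of exponentials, so every sample's probabilities add up to $1$ regardless of the input values.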

## Solution:

### E1
Model architecture

In [None]:
model = tf.keras.models.Sequential(
    [
        tf.keras.Input(name='input_layer', shape=(n_max_frames, n_mfccs)),
        layers.Reshape((n_max_frames * n_mfccs, ), input_shape=(n_max_frames, n_mfccs)),
        layers.Dense(64, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(n_output_neurons, activation='softmax'),
    ]
)

### E2
Model summary

In [None]:
model.summary()
model.input_shape
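
The parameter count reported by `model.summary()` can be cross-checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output. The sketch below assumes $49 \times 40$ input features and $12$ output classes; the actual value of `n_output_neurons` depends on the dataset.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


n_inputs = 49 * 40            # flattened MFCC features (assumed 49 frames x 40 MFCCs)
layer_sizes = [64, 128, 12]   # two hidden layers and an assumed 12-class output

total = 0
n_in = n_inputs
for n_out in layer_sizes:
    total += dense_params(n_in, n_out)
    n_in = n_out

print("total trainable parameters:", total)  # 135372 under these assumptions
```

The Reshape layer has no parameters, so it does not contribute to the total.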

Model predictions

In [None]:
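## random batch of 10 samples with the model's input shape; keep the prediction for the first sample
prediction = model.predict(np.random.random((10, n_max_frames, n_mfccs)))[0]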
print(prediction, '\n sum:', np.sum(prediction))

# Part 4: Train the model

In this part, we will compile the model by providing a loss, metrics and an optimizer. We will use one set of training parameters for all of the following training runs.
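
To make the loss and metric concrete, here is a minimal NumPy sketch of categorical cross-entropy and categorical accuracy on two made-up samples. It illustrates the definitions only and is not the Keras implementation.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the one-hot label."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))


# Two made-up samples with three classes each.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3]])

loss = categorical_crossentropy(y_true, y_pred)
acc = categorical_accuracy(y_true, y_pred)
print(loss, acc)
```

Both samples are classified correctly, yet the second one contributes a larger loss because the model is less confident about it. This is why loss and accuracy can move differently during training.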

In [None]:
## Number of epochs to run the training for
n_epochs= 30

## Early stopping setting
patience= 25    

## Logging/debugging 
debugging_mode = False

In [None]:
## Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

## Fit with training set unaugmented 1fold
Now we finally get to train a model. Keras implements the whole training loop inside the model class. While it is possible to write a custom training loop, we can simply pass all the options to the `model.fit()` function and it will do everything for us. We need to pass:
- The training set as `x` and `y`.
- `steps_per_epoch`, which is the number of batches inside the training set.
- `epochs`, which is the total number of epochs to train for (stored in `n_epochs` here).
- `shuffle`, which automatically shuffles the training data before each epoch (we set it to **False** for now for all our training runs).
- `validation_data`, which is the validation set. It is only used to calculate loss and accuracy on held-out data, not to update the weights.
- `callbacks`, which is a collection of objects whose methods are called throughout the training. We have provided a function for you that returns callbacks which write out certain metrics like the **confusion matrix**, **ROC curve** and so on. You should check if you can find those in your `output_dir`, sorted by the datetime when the training started.
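
The value passed as `steps_per_epoch` is simply the number of full batches in the training set. A small sketch with a made-up dataset size shows the arithmetic and how many samples are left over when the size is not a multiple of the batch size.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of full batches that fit into the dataset."""
    return math.floor(n_samples / batch_size)


n_samples = 1000   # made-up dataset size for illustration
batch_size = 32

steps = steps_per_epoch(n_samples, batch_size)
leftover = n_samples - steps * batch_size

print(steps, "full batches,", leftover, "samples not covered by full batches")
```

When the training set grows, for example through augmentation, this number has to be recomputed from the new length.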

In [None]:
import sys, importlib

importlib.reload(sys.modules['utility'])
from utility import get_callbacks

In [None]:
history = model.fit(x=train_data[0], y=train_data[1], 
                    steps_per_epoch=int(np.floor(len(train_data[0]) / batch_size)),
                    epochs=n_epochs, 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)

print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

## Exercise
1. We should see a significant difference between the training and validation accuracies in the plots above. Why is this the case?

2. How do you judge the overall accuracy? Play with the number of hidden layers and the number of neurons in the hidden layers and retrain. Do you get a better result?

## Hints
- What's the difference between validation and training data?
- Look at the confusion matrices that the callbacks have written to your output folder. What can you see?

## Solution

### E1
The validation data are not used for parameter optimization, so the model can fit the training data more closely than data it has never seen. The phenomenon we encountered is called overfitting, and it can have different causes such as **data sparsity**, **outliers**, too many **degrees of freedom** etc. We will learn more about it in the next lecture.
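
One common safeguard against overfitting is early stopping, which is what the `patience` setting refers to. The sketch below shows the usual logic on made-up validation losses. It assumes the validation loss is the monitored quantity; the callbacks returned by `get_callbacks` may be configured differently.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: they improve for three epochs and then stagnate.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]

stop_at = early_stopping_epoch(val_losses, patience=3)
never = early_stopping_epoch(val_losses, patience=25)
print(stop_at, never)
```

With a patience close to the total number of epochs, as in the settings used here, early stopping can only trigger near the very end of a run.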

### E2
The confusion matrices show that some keywords get mixed up a lot. The accuracy tells us what fraction of instances, i.e. keywords, is classified correctly. <br>
<br>
**Let's try a deeper architecture:**

In [None]:
model = tf.keras.models.Sequential(
    [
        tf.keras.Input(name='input_layer', shape=(n_max_frames, n_mfccs)),
        layers.Reshape((n_max_frames * n_mfccs, ), input_shape=(n_max_frames, n_mfccs)),
        layers.Dense(64, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),    
        layers.Dense(n_output_neurons, activation='softmax'),
    ]
)

In [None]:
model.summary()

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
history = model.fit(x=train_data[0], y=train_data[1], 
                    steps_per_epoch=int(np.floor(len(train_data[0]) / batch_size)),
                    epochs=n_epochs, 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)

print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

We can see that increasing the network's depth helped to raise the accuracy to $74\%$, so our previous model was not powerful enough. <br>
<br>
**Let's try an even deeper model:**

In [None]:
model = tf.keras.models.Sequential(
    [
        tf.keras.Input(name='input_layer', shape=(n_max_frames, n_mfccs)),
        layers.Reshape((n_max_frames * n_mfccs, ), input_shape=(n_max_frames, n_mfccs)),
        layers.Dense(64, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),    
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(256, activation='relu'),    
        layers.Dense(256, activation='relu'),
        layers.Dense(512, activation='relu'),
        layers.Dense(256, activation='relu'),
        layers.Dense(n_output_neurons, activation='softmax'),
    ]
)

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
model.summary()

In [None]:
history = model.fit(x=train_data[0], y=train_data[1], 
                    steps_per_epoch=int(np.floor(len(train_data[0]) / batch_size)),
                    epochs=n_epochs, 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)

print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

We see that a deeper model can help to increase the accuracy, however only up to a point. After that, adding parameters might not help anymore.

Ideally we would like an accuracy of over $90\%$, so we have some room for improvement :-)

# STOP!
The following section is reserved for after we are through with lecture 3. If you got here you are done for now :-)

# Part 5: Improve the model

Our goal in this section is to try out some of the techniques we have learned to improve the model quality.
Several things can be done, and it is up to you to find out what works. If you prefer a more structured approach, you can strictly follow the exercises.

In the following you are provided with a new training set in the form of **features** $X$ and **labels** $Y$. It is constructed by applying a random time shift and mixing in background noise. You can select how many times to repeat the augmentation process with the **Nfold** variable below. Keep in mind that we cannot choose too large a number here, since the dataset still needs to fit into memory. Otherwise we would need to train directly with the generator, which is possible but takes considerably longer.
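
The two augmentation steps can be sketched in NumPy. `augment_signal` below is a hypothetical stand-in for illustration, not the `augment_audio` function from `utility`. The sampling rate of $16$ kHz, the sine test tone and the random noise are assumptions.

```python
import numpy as np


def augment_signal(sig, noise, fs, max_shift_s, noise_prob, noise_amp, rng):
    """Hypothetical sketch: random time shift with zero padding, then optional noise mixing."""
    max_shift = int(max_shift_s * fs)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(sig)
    if shift > 0:
        shifted[shift:] = sig[:-shift]      # delay the signal, pad the start with zeros
    elif shift < 0:
        shifted[:shift] = sig[-shift:]      # advance the signal, pad the end with zeros
    else:
        shifted[:] = sig
    if rng.random() < noise_prob:
        # pick a random noise excerpt of the same length and add it at reduced amplitude
        start = int(rng.integers(0, len(noise) - len(sig) + 1))
        shifted = shifted + noise_amp * noise[start:start + len(sig)]
    return shifted


rng = np.random.default_rng(0)
fs_demo = 16000                                           # assumed sampling rate
clean = np.sin(2 * np.pi * 440 * np.arange(fs_demo) / fs_demo).astype(np.float32)
noise = rng.normal(size=3 * fs_demo).astype(np.float32)   # stand-in for a noise recording

augmented = augment_signal(clean, noise, fs_demo, max_shift_s=0.1,
                           noise_prob=0.8, noise_amp=0.1, rng=rng)
print(augmented.shape, augmented.dtype)
```

Because the shift and the noise excerpt are drawn at random, every call yields a slightly different version of the same utterance, which is what makes repeating the augmentation several times worthwhile.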

In [None]:
## Set augmentation criteria
time_shift_by_max = 0.1  # randomized time shift [s]
background_frequency = 0.8  # how often is background folded in? 1 = always, 0 = never
Ab= 0.1  # background amplitude

In [None]:
## Load the background noise recordings that are mixed into the training signals
from utility import load_all_wavs_in_dir
noise_data = load_all_wavs_in_dir(direc=brn_directory, sr=fs)

In [None]:
## For the training set signals will be augmented aka mode=='training'
def f_augment(silence, sig):
    mode = 'training'
    return augment_audio(silence, mode, sig, fs=fs, 
                              time_shift_by_max=time_shift_by_max,
                              background_frequency=background_frequency,
                              noise_data=noise_data,
                              Ab=Ab,
                              l=l, s=s, n_mfccs=n_mfccs, 
                              padd_audio_to_samples=fs)
    
## create a train loader that again returns the whole dataset in one batch, but applies f_augment this time
train_loader_augmented = GSCLoader(f = f_augment, batchsize = len(keywords_train), y = keywords_train, x = signals_train, 
                         silence_label=silence_label)

In [None]:
## create an Nfold augmented training set: X, Y hold one augmented fold (needs Nfold >= 2), X_train, Y_train hold all Nfold folds
Nfold = 3
X_train, Y_train = train_loader_augmented.__getitem__(0)

for i in tqdm(range(Nfold-1)):
    X,Y = train_loader_augmented.__getitem__(0)
    X_train = np.append(X_train, X, axis=0)
    Y_train = np.append(Y_train, Y, axis=0)

In [None]:
## keep X,Y as 1 fold augmented data and X_train, Y_train as Nfold augmented data
print(X_train.shape[0] / X.shape[0], 'fold')

#### TC-ResNet Architecture

In [None]:
## setup a new model architecture
from utility import ResBlock

def get_tc_resnet(n_max_frames, n_mfccs, n_output_neurons=12, dropout_rate=0.):

    T = n_max_frames 
    F = n_mfccs 

    n_channels = [16, 24, 32, 48]

    model = tf.keras.models.Sequential(
        [
            tf.keras.Input(name='input_layer', shape=(T, F, 1)),
            layers.Reshape((T, 1, F), input_shape=(T, F, 1,)),
            layers.Conv2D(filters=n_channels[0], kernel_size=[3, 1], activation=None, use_bias=False,
                          padding='same'),
            layers.Dropout(dropout_rate),
            ResBlock(n=n_channels[1], s=2),
            layers.Dropout(dropout_rate),
            ResBlock(n=n_channels[1], s=1),
            layers.Dropout(dropout_rate),
            layers.GlobalAveragePooling2D(),
            layers.Dropout(dropout_rate),
            layers.Dense(n_output_neurons, activation='softmax'),
        ]
    )
    
    return model

## Exercise
1. Take some time to review the network structure of the **TC resnet** above. How many parameters does the model have?

2. Do a baseline run with the train_data set. You can copy the important parts from above. What is the accuracy?

3. Let's try to further improve the accuracy by training with the new dataset. Be careful to adjust the `steps_per_epoch` argument to the new length of the dataset. You might also want to reduce the number of epochs, since one pass over the dataset now effectively contains **Nfold epochs**. Possible things to try out are:
    - Use the augmented data `X_train`, `Y_train`.
    - Use the set $X$, $Y$, which is time-shifted and mixed with background noise but contains only one fold, like the baseline set.
    - Add dropout by passing the `dropout_rate` argument to the TC-ResNet.
4. Discuss your results. Which measures helped?
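
For reference, here is a minimal NumPy sketch of what a dropout layer does during training, using the common "inverted dropout" formulation. The array of ones is a made-up stand-in for layer activations.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Zero each element with probability `rate` and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x                                  # dropout is inactive at inference time
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((1000, 16))                 # made-up activations

dropped = dropout(activations, rate=0.2, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean after dropout:", dropped.mean())
```

The rescaling keeps the expected activation unchanged, so no correction is needed when dropout is switched off for validation and testing.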

<!-- 5. Bonus: download a model from [keras.applications](https://keras.io/api/applications/mobilenet/#mobilenetv2-function) and train it for some epochs. -->

## Solution

### E1
- The Reshape layer moves the MFCC axis into the channel dimension, so that the convolution kernel is applied over all frequencies simultaneously.
- Residual blocks $\rightarrow$ **resnet**-like structure.
- Dropout layers are added.
- Global pooling reduces the size before the softmax layer.
- The softmax layer returns probabilities for the $12$ classes.
- The number of parameters can be read from the output of `model.summary()`.
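
The effect of the Reshape layer, the size of the first convolution and the global pooling step can be checked with plain NumPy. The sketch assumes $T = 49$ frames and $F = 40$ MFCCs; the shape of the feature maps before pooling is made up for illustration.

```python
import numpy as np

T, F = 49, 40                                          # assumed frames and MFCCs
batch = np.random.default_rng(0).normal(size=(2, T, F, 1))

# Reshape (T, F, 1) -> (T, 1, F): the MFCC axis becomes the channel axis.
as_channels = batch.reshape(2, T, 1, F)

# First Conv2D: kernel [3, 1], F input channels, 16 filters, no bias.
kernel_h, kernel_w, n_filters = 3, 1, 16
first_conv_params = kernel_h * kernel_w * F * n_filters

# Global average pooling: average over the two spatial axes, keep the channels.
feature_maps = np.ones((2, 7, 1, 24))                  # made-up shape before pooling
pooled = feature_maps.mean(axis=(1, 2))

print(as_channels.shape, first_conv_params, pooled.shape)
```

With the MFCCs as channels, the $3 \times 1$ kernel slides only along the time axis, which keeps the first layer small compared with a dense layer on the flattened input.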

### E2 baseline: Fit with training set unaugmented 1fold

In [None]:
model = get_tc_resnet(n_max_frames, n_mfccs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
history = model.fit(x=train_data[0], y=train_data[1], 
                    steps_per_epoch=int(np.floor(len(train_data[0]) / batch_size)),
                    epochs=n_epochs, 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)

print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

### E3: Fit with training set augmented 1fold

In [None]:
model = get_tc_resnet(n_max_frames, n_mfccs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
history = model.fit(x=X, y=Y, 
                    steps_per_epoch=int(np.floor(len(X) / batch_size)),
                    epochs=n_epochs, 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)
print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

### E3: Fit with nfold data augmentation

In [None]:
model = get_tc_resnet(n_max_frames, n_mfccs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
n_epochs_corrected = int(n_epochs / 2)

history = model.fit(x=X_train, y=Y_train, 
                    steps_per_epoch=int(np.floor(len(X_train) / batch_size)),
                    epochs=n_epochs_corrected , 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)
print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

### E3: Fit with nfold augmented + dropout

In [None]:
model = get_tc_resnet(n_max_frames, n_mfccs, dropout_rate=0.2)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
n_epochs_corrected = int(n_epochs)

history = model.fit(x=X_train, y=Y_train, 
                    steps_per_epoch=int(np.floor(len(X_train) / batch_size)),
                    epochs=n_epochs_corrected , 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)
print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

### E4: Discussion
Sorted by accuracy:
- 1fold augmented set with timeshift and background noise $\rightarrow$ $91.8\%$ 
- Baseline with unaugmented training set $\rightarrow$ $92.4\%$ (Makes sense, since validation set is not augmented!)
- 3fold augmented $\rightarrow$ $93.8\%$
- 3fold augmented with dropout of $0.2$ $\rightarrow$ $94.5\%$

The difference in accuracy between the worst and the best run is roughly 3 percentage points, and that with a structure that is already optimized for keyword spotting. It is a sizeable effect, which could be even bigger when starting from a different model architecture.
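
The comparison can also be expressed as a reduction of the error rate, which makes the size of the effect easier to judge. The sketch below only uses the accuracies listed above; the dictionary keys are labels chosen for this illustration.

```python
# Validation accuracies in percent, taken from the list above.
accuracies = {
    "1fold augmented": 91.8,
    "baseline unaugmented": 92.4,
    "3fold augmented": 93.8,
    "3fold augmented + dropout 0.2": 94.5,
}

worst = min(accuracies.values())
best = max(accuracies.values())

gain_points = best - worst                      # difference in percentage points
error_worst = 100.0 - worst
error_best = 100.0 - best
relative_error_reduction = (error_worst - error_best) / error_worst

print(round(gain_points, 1), "percentage points")
print(round(100 * relative_error_reduction, 1), "% fewer errors")
```

Seen this way, the best configuration makes roughly a third fewer mistakes than the weakest one.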

# Part 6: Train using other predefined models - MobileNetV2

In this section we will use a predefined model **(MobileNetV2)** from [keras.applications](https://keras.io/api/applications/) that is meant for **image classification** and try it out for **keyword spotting**. The model can easily be instantiated via `tf.keras.applications`.

In [None]:
model = tf.keras.applications.MobileNetV2(
    input_shape=(49,40,1),
    alpha=1.0,
    include_top=True,
    weights=None,  # random initialisation; no pretrained weights are loaded
    input_tensor=None,
    pooling=None,
    classes=12,
    classifier_activation="softmax",
)

In [None]:
model.summary()

## Exercise
1. Compile and train the model as done above.
2. Compare the results to our previous ones in terms of parameters and accuracy. What can we learn?

## Solution

### E1

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[keras.metrics.CategoricalAccuracy()],
              run_eagerly=debugging_mode)

In [None]:
n_epochs_corrected = int(n_epochs / 2)

history = model.fit(x=X_train, y=Y_train, 
                    steps_per_epoch=int(np.floor(len(X_train) / batch_size)),
                    epochs=n_epochs_corrected , 
                    callbacks=get_callbacks(output_dir, val_data, model, patience=patience), 
                    validation_data=val_data, 
                    shuffle=False)
print('max val val_categorical_accuracy', np.max(history.history['val_categorical_accuracy']))

plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])

### E2

- Model has more parameters.
- Performance is similar.

What can we learn?
- Loading a bigger predefined convolutional model is a very good way to start your training and get a baseline accuracy. However, it might be necessary and rewarding to tailor the model architecture to your needs.