# Appendix

## Packages

<div style="display:flex;">
  <div style="flex:50%; padding:10px;">

For audio:
    <ul>
      <li>Librosa (loading audio, generating spectrograms)</li>
      <li>noisereduce (reducing noise)</li>
      <li>audiomentations (augmenting audio)</li>
      <li>PIL (handling spectrogram images)</li>
    </ul>

Machine Learning:
    <ul>
      <li>tensorflow v2 (building models)</li>
      <li>scikit-learn (evaluating models)</li>
    </ul>
      
  </div>
    
  <div style="flex:50%; padding:10px;">
Data:
    <ul>
      <li>numpy </li>
      <li>pandas</li>
      <li>scipy</li>
    </ul>

Visualization:
    <ul>
      <li>matplotlib</li>
      <li>seaborn</li>
    </ul>
      
Utility:
    <ul>
      <li>joblib</li>
    </ul>

  </div>
</div>

## Data

## Optimization

<div style="display:flex;">
  <div style="flex:50%; padding:10px;">
      <h3>Multi-processing</h3>

Loading all the audio files, preprocessing them, and saving the final mel-spectrograms takes a long time.

To speed this up, we used `joblib` to run the procedure across multiple processes. We chose this framework because Librosa also uses it internally, and it gave us the best results.

Using 6 cores as simultaneous workers, we reduced the running time by a factor of 3, from 12 hours to 4 hours.
  </div>

  <div style="flex:50%; padding:10px;">
      <h3>Hardware</h3>

To run our models more efficiently, we took advantage of Google Colab's GPU resources. This sped up training significantly, by a factor of 20, from 300 seconds per epoch to 15 seconds.

Additionally, we relied on Colab's large amount of RAM (25 GB) to train the mixed models (and even then training would sometimes crash).
  </div>
      
</div>
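
The multi-processing setup described above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: `preprocess_file` and `preprocess_all` are hypothetical names, and the per-file work is replaced by a trivial placeholder so the sketch runs without any audio files. Only `Parallel` and `delayed` come from `joblib`; `n_jobs=6` mirrors the six workers mentioned above.

```python
from joblib import Parallel, delayed


def preprocess_file(path):
    """Hypothetical stand-in for the per-file work
    (load audio, reduce noise, compute the mel-spectrogram, save it)."""
    # Placeholder computation: return the path and its length.
    return path, len(path)


def preprocess_all(paths, n_jobs=6):
    """Apply preprocess_file to every path using n_jobs worker processes.

    joblib returns the results in the same order as the input paths.
    """
    return Parallel(n_jobs=n_jobs)(delayed(preprocess_file)(p) for p in paths)


if __name__ == "__main__":
    files = [f"clip_{i:04d}.wav" for i in range(20)]  # made-up file names
    results = preprocess_all(files)
    print(results[:3])
```

Each file is handled independently of the others, which is what makes this kind of job easy to spread over several cores.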

## Model summaries

### CNN

<pre style="height:600px;overflow-y:scroll;">
Model: "model_19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_22 (InputLayer)        [(None, 48, 128, 1)]      0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 46, 126, 16)       160       
_________________________________________________________________
batch_normalization_8 (Batch (None, 46, 126, 16)       64        
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 23, 63, 16)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 21, 61, 32)        4640      
_________________________________________________________________
batch_normalization_9 (Batch (None, 21, 61, 32)        128       
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 10, 30, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 8, 28, 64)         18496     
_________________________________________________________________
batch_normalization_10 (Batc (None, 8, 28, 64)         256       
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 4, 14, 64)         0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 2, 12, 128)        73856     
_________________________________________________________________
batch_normalization_11 (Batc (None, 2, 12, 128)        512       
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 1, 6, 128)         0         
_________________________________________________________________
global_average_pooling2d_2 ( (None, 128)               0         
_________________________________________________________________
dense_54 (Dense)             (None, 512)               66048     
_________________________________________________________________
dropout_29 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_55 (Dense)             (None, 512)               262656    
_________________________________________________________________
dropout_30 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_56 (Dense)             (None, 128)               65664     
_________________________________________________________________
dropout_31 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_57 (Dense)             (None, 27)                3483      
=================================================================
Total params: 495,963
Trainable params: 495,483
Non-trainable params: 480
_________________________________________________________________
</pre>
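
As a sanity check, the parameter counts in the summary above can be reproduced by hand. The sketch below assumes 3x3 convolution kernels with a bias term, which is inferred from the summary (each convolution shrinks both spatial dimensions by 2, and the counts match) rather than taken from the model code. The helper names are illustrative only.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def batch_norm_params(channels):
    """Return (trainable, non_trainable) counts per batch-norm layer.

    gamma and beta are trainable; the moving mean and variance are not.
    """
    return 2 * channels, 2 * channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


def cnn_param_counts():
    """Return (trainable, non_trainable) for the CNN summarised above."""
    trainable = 0
    non_trainable = 0
    in_channels = 1  # single-channel spectrogram input
    for filters in (16, 32, 64, 128):
        # 3x3 kernel size is an assumption inferred from the summary.
        trainable += conv2d_params(3, 3, in_channels, filters)
        bn_trainable, bn_frozen = batch_norm_params(filters)
        trainable += bn_trainable
        non_trainable += bn_frozen
        in_channels = filters
    in_units = 128  # width after global average pooling
    for out_units in (512, 512, 128, 27):
        trainable += dense_params(in_units, out_units)
        in_units = out_units
    return trainable, non_trainable


if __name__ == "__main__":
    trainable, non_trainable = cnn_param_counts()
    print(trainable, non_trainable, trainable + non_trainable)
```

The 480 non-trainable parameters are the moving statistics of the four batch-normalization layers, 2 × (16 + 32 + 64 + 128).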

**Classification report**
<pre style="height:600px;overflow-y:scroll;">
              precision    recall  f1-score   support

           0       0.63      0.76      0.69        70
           1       0.74      0.68      0.71       101
           2       0.60      0.63      0.62        70
           3       0.76      0.60      0.67        53
           4       0.75      0.73      0.74        78
           5       0.74      0.68      0.71        93
           6       0.85      0.83      0.84        48
           7       0.77      0.80      0.78       127
           8       0.69      0.57      0.62        70
           9       0.56      0.50      0.53       132
          10       0.68      0.79      0.73       132
          11       0.81      0.68      0.74        69
          12       0.80      0.86      0.83       139
          13       0.67      0.68      0.68       149
          14       0.78      0.70      0.74        83
          15       0.71      0.66      0.68        82
          16       0.67      0.50      0.57        66
          17       0.66      0.83      0.74       105
          18       0.60      0.74      0.66        77
          19       0.64      0.58      0.61        71
          20       0.67      0.77      0.72        57
          21       0.74      0.79      0.76        76
          22       0.53      0.60      0.56       112
          23       0.74      0.66      0.70       118
          24       0.59      0.53      0.56        55
          25       0.59      0.60      0.59        67
          26       0.76      0.72      0.74        67

    accuracy                           0.69      2367
   macro avg       0.69      0.68      0.69      2367
weighted avg       0.69      0.69      0.69      2367
</pre>
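
The two summary rows at the bottom of the report differ only in how the per-class values are averaged. As a small illustration, the sketch below recomputes both averages for the recall column of the CNN report above. The inputs are the rounded values from the table, so the results only agree with the report to about two decimals. The function names are illustrative.

```python
# Per-class recall and support, copied from the CNN report above (classes 0-26).
RECALL = [0.76, 0.68, 0.63, 0.60, 0.73, 0.68, 0.83, 0.80, 0.57,
          0.50, 0.79, 0.68, 0.86, 0.68, 0.70, 0.66, 0.50, 0.83,
          0.74, 0.58, 0.77, 0.79, 0.60, 0.66, 0.53, 0.60, 0.72]
SUPPORT = [70, 101, 70, 53, 78, 93, 48, 127, 70,
           132, 132, 69, 139, 149, 83, 82, 66, 105,
           77, 71, 57, 76, 112, 118, 55, 67, 67]


def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by class support: larger classes count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    print(round(macro_average(RECALL), 2))
    print(round(weighted_average(RECALL, SUPPORT), 2))
```

For single-label classification, the support-weighted recall is the same quantity as overall accuracy, which is why those two entries coincide in every report in this appendix.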

### MLP

<pre style="height:600px;overflow-y:scroll">
Model: "model_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_23 (InputLayer)        [(None, 15)]              0         
_________________________________________________________________
dense_58 (Dense)             (None, 64)                1024      
_________________________________________________________________
dense_59 (Dense)             (None, 64)                4160      
_________________________________________________________________
dense_60 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_32 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_61 (Dense)             (None, 27)                1755      
=================================================================
Total params: 11,099
Trainable params: 11,099
Non-trainable params: 0
_________________________________________________________________
</pre>

**Classification report**

<pre style="height:600px;overflow-y:scroll">
              precision    recall  f1-score   support

           0       0.31      0.23      0.26        70
           1       0.54      0.19      0.28       101
           2       0.24      0.14      0.18        70
           3       0.18      0.09      0.12        53
           4       0.35      0.53      0.42        78
           5       0.32      0.78      0.46        93
           6       0.54      0.31      0.39        48
           7       0.43      0.34      0.38       127
           8       0.30      0.04      0.07        70
           9       0.43      0.55      0.49       132
          10       0.36      0.81      0.50       132
          11       0.44      0.17      0.25        69
          12       0.43      0.55      0.48       139
          13       0.27      0.13      0.17       149
          14       0.36      0.17      0.23        83
          15       0.23      0.17      0.19        82
          16       0.14      0.05      0.07        66
          17       0.46      0.30      0.37       105
          18       0.10      0.05      0.07        77
          19       0.22      0.07      0.11        71
          20       0.29      0.30      0.30        57
          21       0.38      0.33      0.35        76
          22       0.24      0.24      0.24       112
          23       0.28      0.74      0.41       118
          24       0.34      0.27      0.30        55
          25       0.38      0.22      0.28        67
          26       0.30      0.46      0.36        67

    accuracy                           0.34      2367
   macro avg       0.33      0.31      0.29      2367
weighted avg       0.34      0.34      0.31      2367
</pre>

### LSTM

<pre style="height:600px;overflow-y:scroll">
Model: "model_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_24 (InputLayer)        [(None, 48, 128)]         0         
_________________________________________________________________
lstm_61 (LSTM)               (None, 48, 36)            23760     
_________________________________________________________________
lstm_62 (LSTM)               (None, 48, 32)            8832      
_________________________________________________________________
lstm_63 (LSTM)               (None, 48, 28)            6832      
_________________________________________________________________
lstm_64 (LSTM)               (None, 48, 24)            5088      
_________________________________________________________________
lstm_65 (LSTM)               (None, 48, 20)            3600      
_________________________________________________________________
lstm_66 (LSTM)               (None, 16)                2368      
_________________________________________________________________
dense_62 (Dense)             (None, 64)                1088      
_________________________________________________________________
dropout_33 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_63 (Dense)             (None, 64)                4160      
_________________________________________________________________
dropout_34 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_64 (Dense)             (None, 32)                2080      
_________________________________________________________________
dropout_35 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_65 (Dense)             (None, 27)                891       
=================================================================
Total params: 58,699
Trainable params: 58,699
Non-trainable params: 0
_________________________________________________________________
</pre>
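
The LSTM parameter counts above follow from the usual four-gate formulation. The sketch below assumes a standard LSTM layer with bias terms and no extra options, which is consistent with the numbers in the summary but is not taken from the model code. Each of the four gates has a weight matrix over the concatenated input and hidden state, plus a bias vector.

```python
def lstm_params(in_features, units):
    """Four gates, each with (in_features + units) * units weights and units biases."""
    return 4 * (in_features + units + 1) * units


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


def lstm_model_params():
    """Total parameter count for the stacked LSTM summarised above."""
    total = 0
    in_features = 128  # features per time step, from the input shape (None, 48, 128)
    for units in (36, 32, 28, 24, 20, 16):
        total += lstm_params(in_features, units)
        in_features = units
    for out_units in (64, 64, 32, 27):
        total += dense_params(in_features, out_units)
        in_features = out_units
    return total


if __name__ == "__main__":
    print(lstm_params(128, 36))
    print(lstm_model_params())
```

Note that the count does not depend on the 48 time steps: the same weights are reused at every step.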

**Classification report**

<pre style="height:600px;overflow-y:scroll">
              precision    recall  f1-score   support

           0       0.48      0.29      0.36        70
           1       0.27      0.28      0.27       101
           2       0.46      0.17      0.25        70
           3       0.26      0.17      0.20        53
           4       0.34      0.54      0.42        78
           5       0.51      0.32      0.39        93
           6       0.64      0.73      0.68        48
           7       0.62      0.83      0.71       127
           8       0.54      0.27      0.36        70
           9       0.22      0.30      0.25       132
          10       0.35      0.35      0.35       132
          11       0.33      0.39      0.36        69
          12       0.40      0.56      0.47       139
          13       0.44      0.39      0.41       149
          14       0.65      0.57      0.61        83
          15       0.34      0.33      0.34        82
          16       0.00      0.00      0.00        66
          17       0.33      0.45      0.38       105
          18       0.20      0.16      0.18        77
          19       0.29      0.28      0.28        71
          20       0.53      0.60      0.56        57
          21       0.28      0.37      0.32        76
          22       0.24      0.33      0.28       112
          23       0.62      0.53      0.57       118
          24       0.48      0.22      0.30        55
          25       0.37      0.42      0.39        67
          26       0.40      0.34      0.37        67

    accuracy                           0.39      2367
   macro avg       0.39      0.38      0.37      2367
weighted avg       0.39      0.39      0.38      2367
</pre>

### DeepCNN

### GAN

### Merge layers

#### CNN & MLP
<pre>
Model: "model_27"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_27 (InputLayer)           [(None, 48, 128, 1)] 0                                            
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 46, 126, 16)  160         input_27[0][0]                   
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 46, 126, 16)  64          conv2d_16[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_16 (MaxPooling2D) (None, 23, 63, 16)   0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 21, 61, 32)   4640        max_pooling2d_16[0][0]           
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 21, 61, 32)   128         conv2d_17[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_17 (MaxPooling2D) (None, 10, 30, 32)   0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 8, 28, 64)    18496       max_pooling2d_17[0][0]           
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 8, 28, 64)    256         conv2d_18[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_18 (MaxPooling2D) (None, 4, 14, 64)    0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 2, 12, 128)   73856       max_pooling2d_18[0][0]           
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 2, 12, 128)   512         conv2d_19[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_19 (MaxPooling2D) (None, 1, 6, 128)    0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
global_average_pooling2d_4 (Glo (None, 128)          0           max_pooling2d_19[0][0]           
__________________________________________________________________________________________________
dense_74 (Dense)                (None, 512)          66048       global_average_pooling2d_4[0][0] 
__________________________________________________________________________________________________
dropout_41 (Dropout)            (None, 512)          0           dense_74[0][0]                   
__________________________________________________________________________________________________
input_28 (InputLayer)           [(None, 15)]         0                                            
__________________________________________________________________________________________________
dense_75 (Dense)                (None, 512)          262656      dropout_41[0][0]                 
__________________________________________________________________________________________________
dense_77 (Dense)                (None, 64)           1024        input_28[0][0]                   
__________________________________________________________________________________________________
dropout_42 (Dropout)            (None, 512)          0           dense_75[0][0]                   
__________________________________________________________________________________________________
dense_78 (Dense)                (None, 64)           4160        dense_77[0][0]                   
__________________________________________________________________________________________________
dense_76 (Dense)                (None, 128)          65664       dropout_42[0][0]                 
__________________________________________________________________________________________________
dense_79 (Dense)                (None, 64)           4160        dense_78[0][0]                   
__________________________________________________________________________________________________
dropout_43 (Dropout)            (None, 128)          0           dense_76[0][0]                   
__________________________________________________________________________________________________
dropout_44 (Dropout)            (None, 64)           0           dense_79[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 192)          0           dropout_43[0][0]                 
                                                                 dropout_44[0][0]                 
__________________________________________________________________________________________________
dense_80 (Dense)                (None, 32)           6176        concatenate_1[0][0]              
__________________________________________________________________________________________________
dropout_45 (Dropout)            (None, 32)           0           dense_80[0][0]                   
__________________________________________________________________________________________________
dense_81 (Dense)                (None, 27)           891         dropout_45[0][0]                 
==================================================================================================
Total params: 508,891
Trainable params: 508,411
Non-trainable params: 480
__________________________________________________________________________________________________
</pre>

**Classification report**
<pre style="height:550px;overflow-y:scroll">
              precision    recall  f1-score   support

           0       0.62      0.77      0.69        70
           1       0.82      0.78      0.80       101
           2       0.64      0.69      0.66        70
           3       0.80      0.68      0.73        53
           4       0.84      0.72      0.77        78
           5       0.77      0.81      0.79        93
           6       0.88      0.88      0.88        48
           7       0.84      0.81      0.83       127
           8       0.62      0.57      0.59        70
           9       0.74      0.73      0.74       132
          10       0.76      0.86      0.81       132
          11       0.81      0.70      0.75        69
          12       0.89      0.90      0.90       139
          13       0.73      0.72      0.72       149
          14       0.81      0.76      0.78        83
          15       0.71      0.67      0.69        82
          16       0.67      0.53      0.59        66
          17       0.73      0.78      0.76       105
          18       0.66      0.68      0.67        77
          19       0.72      0.66      0.69        71
          20       0.75      0.82      0.78        57
          21       0.71      0.80      0.75        76
          22       0.56      0.71      0.62       112
          23       0.75      0.67      0.71       118
          24       0.73      0.60      0.66        55
          25       0.66      0.67      0.67        67
          26       0.71      0.75      0.73        67

    accuracy                           0.74      2367
   macro avg       0.74      0.73      0.73      2367
weighted avg       0.74      0.74      0.74      2367
</pre>

#### CNN & LSTM
<pre style="height:600px;overflow-y:scroll">
Model: "model_33"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_31 (InputLayer)           [(None, 48, 128, 1)] 0                                            
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 46, 126, 16)  160         input_31[0][0]                   
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 46, 126, 16)  64          conv2d_24[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_24 (MaxPooling2D) (None, 23, 63, 16)   0           batch_normalization_24[0][0]     
__________________________________________________________________________________________________
conv2d_25 (Conv2D)              (None, 21, 61, 32)   4640        max_pooling2d_24[0][0]           
__________________________________________________________________________________________________
batch_normalization_25 (BatchNo (None, 21, 61, 32)   128         conv2d_25[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_25 (MaxPooling2D) (None, 10, 30, 32)   0           batch_normalization_25[0][0]     
__________________________________________________________________________________________________
conv2d_26 (Conv2D)              (None, 8, 28, 64)    18496       max_pooling2d_25[0][0]           
__________________________________________________________________________________________________
input_32 (InputLayer)           [(None, 48, 128)]    0                                            
__________________________________________________________________________________________________
batch_normalization_26 (BatchNo (None, 8, 28, 64)    256         conv2d_26[0][0]                  
__________________________________________________________________________________________________
lstm_73 (LSTM)                  (None, 48, 36)       23760       input_32[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_26 (MaxPooling2D) (None, 4, 14, 64)    0           batch_normalization_26[0][0]     
__________________________________________________________________________________________________
lstm_74 (LSTM)                  (None, 48, 32)       8832        lstm_73[0][0]                    
__________________________________________________________________________________________________
conv2d_27 (Conv2D)              (None, 2, 12, 128)   73856       max_pooling2d_26[0][0]           
__________________________________________________________________________________________________
lstm_75 (LSTM)                  (None, 48, 28)       6832        lstm_74[0][0]                    
__________________________________________________________________________________________________
batch_normalization_27 (BatchNo (None, 2, 12, 128)   512         conv2d_27[0][0]                  
__________________________________________________________________________________________________
lstm_76 (LSTM)                  (None, 48, 24)       5088        lstm_75[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_27 (MaxPooling2D) (None, 1, 6, 128)    0           batch_normalization_27[0][0]     
__________________________________________________________________________________________________
lstm_77 (LSTM)                  (None, 48, 20)       3600        lstm_76[0][0]                    
__________________________________________________________________________________________________
global_average_pooling2d_6 (Glo (None, 128)          0           max_pooling2d_27[0][0]           
__________________________________________________________________________________________________
lstm_78 (LSTM)                  (None, 16)           2368        lstm_77[0][0]                    
__________________________________________________________________________________________________
dense_90 (Dense)                (None, 512)          66048       global_average_pooling2d_6[0][0] 
__________________________________________________________________________________________________
dense_93 (Dense)                (None, 64)           1088        lstm_78[0][0]                    
__________________________________________________________________________________________________
dropout_53 (Dropout)            (None, 512)          0           dense_90[0][0]                   
__________________________________________________________________________________________________
dropout_56 (Dropout)            (None, 64)           0           dense_93[0][0]                   
__________________________________________________________________________________________________
dense_91 (Dense)                (None, 512)          262656      dropout_53[0][0]                 
__________________________________________________________________________________________________
dense_94 (Dense)                (None, 64)           4160        dropout_56[0][0]                 
__________________________________________________________________________________________________
dropout_54 (Dropout)            (None, 512)          0           dense_91[0][0]                   
__________________________________________________________________________________________________
dropout_57 (Dropout)            (None, 64)           0           dense_94[0][0]                   
__________________________________________________________________________________________________
dense_92 (Dense)                (None, 128)          65664       dropout_54[0][0]                 
__________________________________________________________________________________________________
dense_95 (Dense)                (None, 32)           2080        dropout_57[0][0]                 
__________________________________________________________________________________________________
dropout_55 (Dropout)            (None, 128)          0           dense_92[0][0]                   
__________________________________________________________________________________________________
dropout_58 (Dropout)            (None, 32)           0           dense_95[0][0]                   
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 160)          0           dropout_55[0][0]                 
                                                                 dropout_58[0][0]                 
__________________________________________________________________________________________________
dense_97 (Dense)                (None, 64)           10304       concatenate_3[0][0]              
__________________________________________________________________________________________________
dense_98 (Dense)                (None, 27)           1755        dense_97[0][0]                   
==================================================================================================
Total params: 562,347
Trainable params: 561,867
Non-trainable params: 480
__________________________________________________________________________________________________
</pre>
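
The totals of the two merged models can be cross-checked against the stand-alone summaries. Each merged model is a pair of branches with their 27-way output layers removed, followed by a small shared head on the concatenated features. The sketch below uses only numbers reported in the summaries in this appendix; the function and constant names are illustrative.

```python
# Totals and output-layer sizes taken from the stand-alone summaries.
CNN_TOTAL, CNN_OUTPUT_LAYER = 495_963, 3_483
MLP_TOTAL, MLP_OUTPUT_LAYER = 11_099, 1_755
LSTM_TOTAL, LSTM_OUTPUT_LAYER = 58_699, 891


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


def merged_total(branch_a, branch_b, concat_width, head_units):
    """Branch parameters plus the dense head applied to the concatenation."""
    total = branch_a + branch_b
    in_units = concat_width
    for out_units in head_units:
        total += dense_params(in_units, out_units)
        in_units = out_units
    return total


cnn_branch = CNN_TOTAL - CNN_OUTPUT_LAYER
mlp_branch = MLP_TOTAL - MLP_OUTPUT_LAYER
lstm_branch = LSTM_TOTAL - LSTM_OUTPUT_LAYER

# CNN & MLP: 128 + 64 concatenated features, head of 32 then 27 units.
cnn_mlp_total = merged_total(cnn_branch, mlp_branch, 128 + 64, (32, 27))
# CNN & LSTM: 128 + 32 concatenated features, head of 64 then 27 units.
cnn_lstm_total = merged_total(cnn_branch, lstm_branch, 128 + 32, (64, 27))

if __name__ == "__main__":
    print(cnn_mlp_total, cnn_lstm_total)  # 508891 562347
```

In both cases the CNN branch accounts for most of the parameters, so merging adds little to the model size.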

**Classification report**
<pre style="height:550px;overflow-y:scroll">
              precision    recall  f1-score   support

           0       0.66      0.80      0.72        70
           1       0.75      0.70      0.72       101
           2       0.63      0.64      0.64        70
           3       0.62      0.57      0.59        53
           4       0.79      0.76      0.77        78
           5       0.81      0.69      0.74        93
           6       0.86      0.90      0.88        48
           7       0.75      0.78      0.76       127
           8       0.64      0.59      0.61        70
           9       0.54      0.51      0.52       132
          10       0.75      0.79      0.77       132
          11       0.75      0.71      0.73        69
          12       0.84      0.83      0.84       139
          13       0.61      0.64      0.62       149
          14       0.72      0.70      0.71        83
          15       0.65      0.67      0.66        82
          16       0.53      0.45      0.49        66
          17       0.75      0.88      0.81       105
          18       0.56      0.66      0.61        77
          19       0.72      0.66      0.69        71
          20       0.67      0.81      0.73        57
          21       0.77      0.70      0.73        76
          22       0.48      0.60      0.53       112
          23       0.76      0.63      0.69       118
          24       0.69      0.53      0.60        55
          25       0.69      0.60      0.64        67
          26       0.68      0.72      0.70        67

    accuracy                           0.69      2367
   macro avg       0.69      0.68      0.69      2367
weighted avg       0.69      0.69      0.69      2367
</pre>