# Classifying Urban sounds using Deep Learning

## 3 Model Training and Evaluation 

### Load Preprocessed data 

In [1]:
# retrieve the preprocessed data from previous notebook

%store -r x_train 
%store -r x_test 
%store -r y_train 
%store -r y_test 
%store -r yy 
%store -r le

### Initial model architecture - MLP

We will start by constructing a Multilayer Perceptron (MLP) neural network using Keras with a TensorFlow backend. 

We use a `Sequential` model so that we can build the network layer by layer. 

We will begin with a simple model architecture consisting of three layers: an input layer, a hidden layer and an output layer. All three layers will be of the `Dense` type, the standard fully connected layer in which every node is connected to every node of the previous layer. 

The first layer will receive the input shape. As each sample contains 40 MFCCs (or columns), each sample has a shape of (1x40), so we will start with an input shape of 40. 

The first two layers will have 256 nodes each. The activation function we will be using for these two layers is `ReLU` (Rectified Linear Unit), a widely used default for the hidden layers of neural networks.

We will also apply a `Dropout` value of 50% on our first two layers. This randomly excludes nodes from each update cycle, which in turn results in a network that generalises better and is less likely to overfit the training data.
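To make the effect of `Dropout` more concrete, here is a minimal NumPy sketch of inverted dropout, the scheme Keras applies at training time: each activation is zeroed with probability `rate`, and the surviving activations are scaled by `1 / (1 - rate)` so that their expected value is unchanged. The helper name `dropout_forward` and the sample activations are illustrative assumptions and not part of this notebook.

```python
import numpy as np

def dropout_forward(activations, rate, rng):
    """Illustrative inverted dropout (hypothetical helper, not Keras code)."""
    # Keep each unit with probability (1 - rate).
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the kept units so the expected activation is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(8)
dropped = dropout_forward(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

At prediction time no units are dropped, which is why the rescaling is done during training.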

Our output layer will have 10 nodes (`num_labels`), which matches the number of possible classes. The activation for our output layer is `softmax`. Softmax makes the outputs sum to 1, so they can be interpreted as probabilities. The model then predicts the class with the highest probability.
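As a small illustration of how softmax turns raw scores into probabilities, the sketch below applies it to three made-up scores. The `softmax` function and the `logits` values are assumptions chosen for the example; the real model produces 10 scores, one per class.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # made-up raw scores for three classes
probs = softmax(logits)
print(probs, probs.sum())
print("predicted index:", int(np.argmax(probs)))
```

The largest raw score always receives the largest probability, so taking the `argmax` of the softmax output gives the predicted class.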

In [2]:
pip install keras


Note: you may need to restart the kernel to use updated packages.


In [None]:
pip install tensorflow

In [3]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

# Number of output classes, taken from the one-hot encoded labels
num_labels = yy.shape[1]

# Construct model 
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(num_labels))
model.add(Activation('softmax'))

Using TensorFlow backend.
W0726 15:11:53.761097 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\backend\tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0726 15:11:53.848887 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\backend\tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0726 15:11:53.875674 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\backend\tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0726 15:11:53.938665 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\backend\tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

W0726 15:11:53.974375 14

### Compiling the model 

For compiling our model, we will use the following three parameters: 

* Loss function - we will use `categorical_crossentropy`. This is the most common choice for multi-class classification with one-hot encoded labels. A lower score indicates that the model is performing better.

* Metrics - we will use the `accuracy` metric, which lets us view the accuracy score on the validation data as we train the model. 

* Optimizer - here we will use `adam`, which is a generally good optimizer for many use cases.
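To show why a lower loss means a better model, here is a minimal NumPy sketch of categorical cross-entropy on two made-up samples. The function below is a simplified stand-in written for this example (the `eps` clipping value and the sample predictions are assumptions), not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y_true * log(y_pred)) over samples.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]])  # one-hot labels
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.40, 0.30], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the correct class, but the confident one is rewarded with a lower loss, which is why the loss can keep improving even when accuracy does not.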


In [4]:
# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam') 

W0726 15:12:06.704394 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0726 15:12:06.754489 14244 deprecation_wrapper.py:119] From C:\Users\steph\Anaconda\lib\site-packages\keras\backend\tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.



In [5]:
# Display model architecture summary 
model.summary()

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=0)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 256)               10496     
_________________________________________________________________
activation_1 (Activation)    (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792     
_________________________________________________________________
activation_2 (Activation)    (None, 256)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570      
__________
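The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-node pair plus one bias per node. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

# (inputs, units) for dense_1, dense_2 and dense_3
layer_sizes = [(40, 256), (256, 256), (256, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
print(counts, sum(counts))
```

The three values match the `Param #` column for `dense_1`, `dense_2` and `dense_3`; the activation and dropout layers add no trainable parameters.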

### Training 

Here we will train the model. 

We will start with 100 epochs, where an epoch is one full pass of the model through the training data. The model typically improves with each epoch until the validation loss stops decreasing. 

We will also start with a low batch size, as having a large batch size can reduce the generalisation ability of the model. 
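To give a feel for what these settings mean in practice, the short calculation below works out how many weight updates the model performs, assuming the batch size of 32 used in this notebook and the 6985 training samples reported in the training log.

```python
import math

num_samples = 6985      # training set size reported in the training log
num_batch_size = 32
num_epochs = 100

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(num_samples / num_batch_size)
total_updates = steps_per_epoch * num_epochs
print(steps_per_epoch, total_updates)
```

A larger batch size would reduce the number of updates per epoch, which is one reason batch size and epoch count are usually tuned together.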

In [6]:
from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.basic_mlp.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)

W0726 15:12:31.569077 14244 deprecation.py:323] From C:\Users\steph\Anaconda\lib\site-packages\tensorflow\python\ops\math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Train on 6985 samples, validate on 1747 samples
Epoch 1/100

Epoch 00001: val_loss improved from inf to 7.57571, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 2/100

Epoch 00002: val_loss improved from 7.57571 to 1.64430, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 3/100

Epoch 00003: val_loss improved from 1.64430 to 1.56790, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 4/100

Epoch 00004: val_loss improved from 1.56790 to 1.36272, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 5/100

Epoch 00005: val_loss improved from 1.36272 to 1.21658, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 6/100

Epoch 00006: val_loss improved from 1.21658 to 1.18505, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 7/100

Epoch 00007: val_loss improved from 1.18505 to 1.13575, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 8/100

Epoch 00008: val_loss improved from 1.13575 to 1.03905, savin

Epoch 35/100

Epoch 00035: val_loss did not improve from 0.53865
Epoch 36/100

Epoch 00036: val_loss improved from 0.53865 to 0.52701, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 37/100

Epoch 00037: val_loss did not improve from 0.52701
Epoch 38/100

Epoch 00038: val_loss did not improve from 0.52701
Epoch 39/100

Epoch 00039: val_loss did not improve from 0.52701
Epoch 40/100

Epoch 00040: val_loss improved from 0.52701 to 0.51798, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 41/100

Epoch 00041: val_loss improved from 0.51798 to 0.51137, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 42/100

Epoch 00042: val_loss did not improve from 0.51137
Epoch 43/100

Epoch 00043: val_loss did not improve from 0.51137
Epoch 44/100

Epoch 00044: val_loss improved from 0.51137 to 0.50322, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 45/100

Epoch 00045: val_loss improved from 0.50322 to 0.48476, saving model to saved_models/wei

Epoch 75/100

Epoch 00075: val_loss did not improve from 0.43385
Epoch 76/100

Epoch 00076: val_loss did not improve from 0.43385
Epoch 77/100

Epoch 00077: val_loss did not improve from 0.43385
Epoch 78/100

Epoch 00078: val_loss did not improve from 0.43385
Epoch 79/100

Epoch 00079: val_loss did not improve from 0.43385
Epoch 80/100

Epoch 00080: val_loss did not improve from 0.43385
Epoch 81/100

Epoch 00081: val_loss improved from 0.43385 to 0.42935, saving model to saved_models/weights.best.basic_mlp.hdf5
Epoch 82/100

Epoch 00082: val_loss did not improve from 0.42935
Epoch 83/100

Epoch 00083: val_loss did not improve from 0.42935
Epoch 84/100

Epoch 00084: val_loss did not improve from 0.42935
Epoch 85/100

Epoch 00085: val_loss did not improve from 0.42935
Epoch 86/100

Epoch 00086: val_loss did not improve from 0.42935
Epoch 87/100

Epoch 00087: val_loss did not improve from 0.42935
Epoch 88/100

Epoch 00088: val_loss did not improve from 0.42935
Epoch 89/100

Epoch 00089: v

### Test the model 

Here we will review the accuracy of the model on both the training and test data sets. 

In [7]:
# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

Training Accuracy:  0.9224051539012169
Testing Accuracy:  0.8706353747578573


The initial training and testing accuracy scores are fairly high (about 92.2% and 87.1%). The gap between them is modest (about 5 percentage points), which suggests that the model has not suffered badly from overfitting. 
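The gap can be computed directly from the two scores printed above. This is only a rough indicator, and the variable names are chosen for this example.

```python
train_acc = 0.9224051539012169  # training accuracy printed above
test_acc = 0.8706353747578573   # testing accuracy printed above

gap_points = 100 * (train_acc - test_acc)
print("Generalisation gap: %.2f percentage points" % gap_points)
```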

### Predictions  

Here we will build a function that lets us test the model's predictions on a specified audio .wav file. 

In [8]:
import librosa 
import numpy as np 

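def extract_feature(file_name):
    """Return the time-averaged 40 MFCCs of an audio file with shape (1, 40), or None on failure."""
    try:
        audio_data, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
        mfccs = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)
        # Average each coefficient over time to get one fixed-length vector per file
        mfccsscaled = np.mean(mfccs.T,axis=0)
        
    except Exception as e:
        # Report the failing file together with the underlying error
        print("Error encountered while parsing file: ", file_name, e)
        return None

    return np.array([mfccsscaled])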
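The shape handling in `extract_feature` can be illustrated without any audio file. In the sketch below a random matrix stands in for the output of `librosa.feature.mfcc`; the random values and the frame count of 173 are made up for illustration.

```python
import numpy as np

# Stand-in for the MFCC matrix: 40 coefficients x 173 time frames.
rng = np.random.default_rng(42)
mfccs = rng.normal(size=(40, 173))

# Average each coefficient over time, as extract_feature does.
mfccs_scaled = np.mean(mfccs.T, axis=0)

# Add a leading batch dimension so the model sees one sample of 40 features.
features = np.array([mfccs_scaled])
print(mfccs_scaled.shape, features.shape)
```

Averaging over time means that clips of any length are reduced to the same 40 values, which is what lets the fixed-size input layer accept them.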


In [9]:
def print_prediction(file_name):
    prediction_feature = extract_feature(file_name) 

    # Class probabilities for the single sample. predict() is used because
    # predict_classes() and predict_proba() are not available in newer Keras versions.
    predicted_proba = model.predict(prediction_feature)[0]

    # The predicted class is the one with the highest probability
    predicted_vector = np.array([np.argmax(predicted_proba)])
    predicted_class = le.inverse_transform(predicted_vector) 
    print("The predicted class is:", predicted_class[0], '\n') 

    for i in range(len(predicted_proba)): 
        category = le.inverse_transform(np.array([i]))
        print(category[0], "\t\t : ", format(predicted_proba[i], '.32f') )

### Validation 

#### Test with sample data 

As an initial sanity check, we verify the predictions using a subset of the sample audio files we explored in the first notebook. We expect the bulk of these to be classified correctly. 

In [10]:
# Class: Air Conditioner

filename = '../UrbanSound Dataset sample/audio/100852-0-0-0.wav' 
print_prediction(filename) 

The predicted class is: air_conditioner 

air_conditioner 		 :  1.00000000000000000000000000000000
car_horn 		 :  0.00000000182134674009404307071236
children_playing 		 :  0.00000000041144126994296925659000
dog_bark 		 :  0.00000000061322580346967470177333
drilling 		 :  0.00000000941181177438465965678915
engine_idling 		 :  0.00000000507990716158701616222970
gun_shot 		 :  0.00000002270654420044593280181289
jackhammer 		 :  0.00000000630761709530247571819928
siren 		 :  0.00000000000407049394191005831090
street_music 		 :  0.00000000180346781952778201230103


In [11]:
# Class: Drilling

filename = '../UrbanSound Dataset sample/audio/103199-4-0-0.wav'
print_prediction(filename) 

The predicted class is: drilling 

air_conditioner 		 :  0.00000001935946869480176246725023
car_horn 		 :  0.00000218180412048241123557090759
children_playing 		 :  0.00019900461484212428331375122070
dog_bark 		 :  0.00013779639266431331634521484375
drilling 		 :  0.96273481845855712890625000000000
engine_idling 		 :  0.00000000027262900270663692481321
gun_shot 		 :  0.00000028573904842232877854257822
jackhammer 		 :  0.00000000101177721756329219715553
siren 		 :  0.00000013746632987476914422586560
street_music 		 :  0.03692576289176940917968750000000


In [12]:
# Class: Street music 

filename = '../UrbanSound Dataset sample/audio/101848-9-0-0.wav'
print_prediction(filename) 

The predicted class is: street_music 

air_conditioner 		 :  0.10584083944559097290039062500000
car_horn 		 :  0.00018762476975098252296447753906
children_playing 		 :  0.13765451312065124511718750000000
dog_bark 		 :  0.00338348606601357460021972656250
drilling 		 :  0.00505320308730006217956542968750
engine_idling 		 :  0.00413546431809663772583007812500
gun_shot 		 :  0.00065159739460796117782592773438
jackhammer 		 :  0.06000141054391860961914062500000
siren 		 :  0.00078313960693776607513427734375
street_music 		 :  0.68230873346328735351562500000000


In [13]:
# Class: Car Horn 

filename = '../UrbanSound Dataset sample/audio/100648-1-0-0.wav'
print_prediction(filename) 

The predicted class is: car_horn 

air_conditioner 		 :  0.00003582690260373055934906005859
car_horn 		 :  0.69806337356567382812500000000000
children_playing 		 :  0.00917035806924104690551757812500
dog_bark 		 :  0.10445753484964370727539062500000
drilling 		 :  0.04047197476029396057128906250000
engine_idling 		 :  0.00021504548203665763139724731445
gun_shot 		 :  0.00431029032915830612182617187500
jackhammer 		 :  0.00004849451215704903006553649902
siren 		 :  0.00646105315536260604858398437500
street_music 		 :  0.13676603138446807861328125000000


#### Observations 

From this brief sanity check the model seems to predict well. In the outputs shown above all four samples are classified correctly, although the street music and car horn samples receive comparatively low confidence (about 68% and 70%). 

For the car horn sample, much of the remaining probability goes to street music (about 14%) and dog bark (about 10%). This is consistent with our earlier observation that a dog bark and a car horn are similar in spectral shape. 
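When reading the per-class confidences it can help to rank them rather than scan all ten values. The sketch below does this for the car horn output shown earlier, with the probabilities rounded to four decimals; the `class_names` list is typed out by hand here and is assumed to follow the same order as the printed output.

```python
import numpy as np

class_names = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
               "drilling", "engine_idling", "gun_shot", "jackhammer",
               "siren", "street_music"]

# Probabilities from the car horn cell, rounded to four decimals.
probs = np.array([0.0000, 0.6981, 0.0092, 0.1045, 0.0405,
                  0.0002, 0.0043, 0.0000, 0.0065, 0.1368])

# Indices of the three largest probabilities, highest first.
top3 = np.argsort(probs)[::-1][:3]
ranked = [(class_names[i], float(probs[i])) for i in top3]
for name, p in ranked:
    print("%-16s %.4f" % (name, p))
```

The runner-up classes show which sounds the model finds hardest to tell apart from the predicted one.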

### Other audio

Here we will use a sample of various copyright-free sounds that were not part of either our test or training data to further validate our model. 

In [14]:
filename = '../Evaluation audio/dog_bark_1.wav'
print_prediction(filename) 

The predicted class is: street_music 

air_conditioner 		 :  0.05818969756364822387695312500000
car_horn 		 :  0.03274155780673027038574218750000
children_playing 		 :  0.05405837669968605041503906250000
dog_bark 		 :  0.20545622706413269042968750000000
drilling 		 :  0.07921151816844940185546875000000
engine_idling 		 :  0.04908233135938644409179687500000
gun_shot 		 :  0.11350615322589874267578125000000
jackhammer 		 :  0.00165486067999154329299926757812
siren 		 :  0.04985950142145156860351562500000
street_music 		 :  0.35623976588249206542968750000000


In [15]:
filename = '../Evaluation audio/drilling_1.wav'

print_prediction(filename) 

The predicted class is: jackhammer 

air_conditioner 		 :  0.07550714164972305297851562500000
car_horn 		 :  0.00001126918505178764462471008301
children_playing 		 :  0.00064183131325989961624145507812
dog_bark 		 :  0.00001621937917661853134632110596
drilling 		 :  0.06308306008577346801757812500000
engine_idling 		 :  0.00002552934893174096941947937012
gun_shot 		 :  0.00068845611531287431716918945312
jackhammer 		 :  0.85967862606048583984375000000000
siren 		 :  0.00001200279439217410981655120850
street_music 		 :  0.00033594679553061723709106445312


In [16]:
filename = '../Evaluation audio/gun_shot_1.wav'

print_prediction(filename) 

# sample data weighted towards gun shot - peak in the dog barking sample is similar in shape to the gun shot sample

The predicted class is: dog_bark 

air_conditioner 		 :  0.10839908570051193237304687500000
car_horn 		 :  0.00103239156305789947509765625000
children_playing 		 :  0.01060478202998638153076171875000
dog_bark 		 :  0.41838169097900390625000000000000
drilling 		 :  0.00799924135208129882812500000000
engine_idling 		 :  0.04500160366296768188476562500000
gun_shot 		 :  0.01085520256310701370239257812500
jackhammer 		 :  0.00033229021937586367130279541016
siren 		 :  0.01738170534372329711914062500000
street_music 		 :  0.38001206517219543457031250000000


In [17]:
filename = '../Evaluation audio/siren_1.wav'

print_prediction(filename) 

The predicted class is: siren 

air_conditioner 		 :  0.00000062581676729678292758762836
car_horn 		 :  0.00006905255577294155955314636230
children_playing 		 :  0.00083423301111906766891479492188
dog_bark 		 :  0.04389220848679542541503906250000
drilling 		 :  0.00003940669557778164744377136230
engine_idling 		 :  0.06394499540328979492187500000000
gun_shot 		 :  0.00084809947293251752853393554688
jackhammer 		 :  0.00214644591324031352996826171875
siren 		 :  0.87295448780059814453125000000000
street_music 		 :  0.01527053117752075195312500000000


#### Observations 

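The performance of our initial model on the test set is satisfactory (about 87% accuracy). On the four new recordings above, however, only the siren is classified correctly: the dog bark is predicted as street music, the drilling as jackhammer and the gun shot as dog bark. This small sample suggests that the model does not yet generalise as well to audio from outside the dataset. 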

### *In the next notebook we will refine our model*