# Audio classification through image classification:
## Densely Connected Convolutional Networks

This notebook shows how to implement and train [Densely Connected Convolutional Networks](https://arxiv.org/pdf/1608.06993.pdf) (DenseNet) on our audio dataset.

Advantage:

- This method requires no feature engineering on the audio files and does not rely on the text associated with each recording, so it can be faster and much cheaper.

Disadvantages:

- Accuracy on the current dataset is around 86% and the model overfits, which is not unexpected given how small the dataset is.
- Training takes a long time and needs many rounds of learning rate annealing.

In [1]:
%matplotlib inline
import importlib
import utils2; importlib.reload(utils2)
from utils2 import *

Using TensorFlow backend.


## Reading the images

In [2]:
current_dir = os.getcwd()

In [3]:
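batches = image.ImageDataGenerator().flow_from_directory(
    current_dir+'/Data/maestroqa/train',
    target_size=(128,128),
    class_mode=None,   # yield images only, no labels
    shuffle=False,     # keep a deterministic order
    batch_size=1)
# Pull one image per step and stack them into a single array.
trn_data = np.concatenate([next(batches) for _ in range(batches.samples)])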

Found 4870 images belonging to 2 classes.


In [4]:
val_batches = image.ImageDataGenerator().flow_from_directory(
    current_dir+'/Data/maestroqa/valid',
    target_size=(128,128),
    class_mode=None,   # yield images only, no labels
    shuffle=False,     # keep a deterministic order
    batch_size=1)
# Pull one image per step and stack them into a single array.
val_data = np.concatenate([next(val_batches) for _ in range(val_batches.samples)])

Found 800 images belonging to 2 classes.


In [5]:
# trn_labels_1 = load_array(current_dir+'/Data/crop_data/models/trn_labels.bc')
# val_labels_1 = load_array(current_dir+'/Data/crop_data/models/val_labels.bc')

In [6]:
# val_classes = val_batches.classes
# trn_classes = batches.classes

In [8]:
# val_classes.shape, val_data.shape, trn_classes.shape,trn_data.shape

In [9]:
# trn_data_1 = load_array(current_dir+'/Data/crop_data/models/train_data_uu.bc')
# val_data_1 = load_array(current_dir+'/Data/crop_data/models/valid_data_uu.bc')

trn_labels_1 = load_array(current_dir+'/Data/crop_data/models/trn_labels_uu.bc')
val_labels_1 = load_array(current_dir+'/Data/crop_data/models/val_labels_uu.bc')

# print (trn_data_1.shape,val_data_1.shape, trn_labels_1.shape, val_labels_1.shape )
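
The `categorical_crossentropy` loss used later expects one label row per image with one column per class. The saved label arrays are loaded as-is here; as a minimal sketch, assuming integer class indices like those in the commented-out `batches.classes`, such arrays could be built with NumPy as follows. The `to_one_hot` helper and the toy `classes` array are illustrative and not part of `utils2`.

```python
import numpy as np

def to_one_hot(classes, num_classes):
    """Turn integer class indices into one-hot rows (hypothetical helper)."""
    classes = np.asarray(classes, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[classes]

# Toy stand-in for `batches.classes`: one integer per image.
classes = np.array([0, 1, 1, 0])
labels = to_one_hot(classes, num_classes=2)
print(labels.shape)  # (4, 2)
```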

In [8]:
trn_data = trn_data/255.
val_data = val_data/255.

## Densenet

### The pieces

In [10]:
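# Small wrappers so layers can be applied as plain functions.
def relu(x): return Activation('relu')(x)
def dropout(x, p): return Dropout(p)(x) if p else x  # no-op when p is None or 0
def bn(x): return BatchNormalization(axis=-1)(x)     # channels-last; `mode` is not a Keras 2 argument
def relu_bn(x): return relu(bn(x))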

In [11]:
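def conv(x, nf, sz, wd, p):
    # sz x sz convolution with 'same' padding, L2 weight decay wd and optional dropout p
    # (written with Keras 2 argument names).
    x = Convolution2D(nf, (sz, sz), kernel_initializer='he_uniform', padding='same',
                      kernel_regularizer=l2(wd))(x)
    return dropout(x,p)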

In [12]:
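def conv_block(x, nf, bottleneck=False, p=None, wd=0):
    # BN -> ReLU -> 3x3 conv producing nf feature maps.
    x = relu_bn(x)
    # Optional bottleneck: a 1x1 conv to 4*nf channels before the 3x3 conv.
    if bottleneck: x = relu_bn(conv(x, nf * 4, 1, wd, p))
    return conv(x, nf, 3, wd, p)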

In [13]:
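from keras.layers import concatenate  # Keras 2 replacement for merge(..., mode='concat')

def dense_block(x, nb_layers, growth_rate, bottleneck=False, p=None, wd=0):
    if bottleneck: nb_layers //= 2  # a bottleneck layer counts as two convolutions
    for i in range(nb_layers):
        b = conv_block(x, growth_rate, bottleneck=bottleneck, p=p, wd=wd)
        # Append the new feature maps to everything produced so far.
        x = concatenate([x, b], axis=-1)
    return x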
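
To make the channel bookkeeping of a dense block concrete, here is a small NumPy sketch that mimics only the concatenation step: each layer contributes `growth_rate` new feature maps, which are appended to everything produced so far. The `fake_conv_block` function is a stand-in that returns zeros of the right shape; it is not the Keras layer stack defined above.

```python
import numpy as np

def fake_conv_block(x, growth_rate):
    """Stand-in for conv_block: returns `growth_rate` new feature maps."""
    h, w, _ = x.shape
    return np.zeros((h, w, growth_rate), dtype=x.dtype)

def fake_dense_block(x, nb_layers, growth_rate):
    """Mimic dense_block's concatenation along the channel axis."""
    for _ in range(nb_layers):
        b = fake_conv_block(x, growth_rate)
        x = np.concatenate([x, b], axis=-1)  # channels grow by growth_rate
    return x

x = np.zeros((8, 8, 16), dtype=np.float32)   # 16 input channels
out = fake_dense_block(x, nb_layers=4, growth_rate=12)
print(out.shape)  # (8, 8, 64): 16 + 4 * 12 channels
```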

In [14]:
def transition_block(x, compression=1.0, p=None, wd=0):
    # Shrink channels by `compression` with a 1x1 conv, then halve height and width.
    nf = int(x.get_shape().as_list()[-1] * compression)
    x = relu_bn(x)
    x = conv(x, nf, 1, wd, p)
    return AveragePooling2D((2, 2), strides=(2, 2))(x)

In [15]:
def create_dense_net(nb_classes, img_input, depth=40, nb_block=3, 
     growth_rate=12, nb_filter=16, bottleneck=False, compression=1.0, p=None, wd=0, activation='softmax'):
    
    assert activation in ('softmax', 'sigmoid')
    assert (depth - 4) % nb_block == 0
    nb_layers_per_block = (depth - 4) // nb_block
    nb_layers = [nb_layers_per_block] * nb_block

    x = conv(img_input, nb_filter, 3, wd, 0)
    for i,block in enumerate(nb_layers):
        x = dense_block(x, block, growth_rate, bottleneck=bottleneck, p=p, wd=wd)
        if i != len(nb_layers)-1:
            x = transition_block(x, compression=compression, p=p, wd=wd)

    x = relu_bn(x)
    x = GlobalAveragePooling2D()(x)
    return Dense(nb_classes, activation=activation, kernel_regularizer=l2(wd))(x)
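
As a sanity check on the architecture, the following plain-Python sketch traces the channel count and spatial size through `create_dense_net` using the same arithmetic as the functions above (no Keras needed). It assumes `'same'` padding and 2x2 average pooling as in the code; `trace_dense_net` is a hypothetical helper.

```python
def trace_dense_net(size, depth=40, nb_block=3, growth_rate=12,
                    nb_filter=16, bottleneck=False, compression=1.0):
    """Return (stage, spatial size, channels) tuples (hypothetical helper)."""
    assert (depth - 4) % nb_block == 0
    layers = (depth - 4) // nb_block
    if bottleneck:
        layers //= 2  # each bottleneck layer is a 1x1 conv plus a 3x3 conv
    channels = nb_filter
    stages = [("stem", size, channels)]
    for i in range(nb_block):
        channels += layers * growth_rate            # dense block concatenations
        stages.append(("dense%d" % (i + 1), size, channels))
        if i != nb_block - 1:
            channels = int(channels * compression)  # 1x1 conv in the transition
            size //= 2                              # 2x2 average pooling
            stages.append(("transition%d" % (i + 1), size, channels))
    return stages

stages = trace_dense_net(128, depth=100, bottleneck=True, compression=0.5)
for stage in stages:
    print(stage)
```

With `depth=100`, bottleneck layers and `compression=0.5` on a 128x128 input, this arithmetic gives 340 channels at 32x32 going into the global average pooling.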

In [25]:
input_shape = (128,128, 3)

In [26]:
img_input = Input(shape=input_shape)

In [27]:
x = create_dense_net(2, img_input, depth=100, nb_filter=16, compression=0.5, bottleneck=True, p=0.2, wd=1e-4)


In [28]:
model = Model(img_input, x)

In [29]:
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.SGD(lr=0.001, momentum=0.9, nesterov=True),
              metrics=["accuracy"])
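
For readers unfamiliar with the `nesterov=True` flag, here is a scalar sketch of one common formulation of SGD with Nesterov momentum. It is meant to illustrate the idea and does not claim to reproduce the optimizer's internals; `nesterov_step` and the numbers are made up for illustration.

```python
def nesterov_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD update with Nesterov momentum on a scalar (illustrative)."""
    # Update the running velocity with the current gradient.
    velocity = momentum * velocity - lr * grad
    # Look ahead: apply the momentum term again on top of the plain step.
    param = param + momentum * velocity - lr * grad
    return param, velocity

p, v = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * p          # gradient of f(p) = p**2
    p, v = nesterov_step(p, grad, v, lr=0.1)
print(p, v)
```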

In [30]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 128, 128, 3)  0                                            
__________________________________________________________________________________________________
conv2d_100 (Conv2D)             (None, 128, 128, 16) 448         input_2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_100 (BatchN (None, 128, 128, 16) 64          conv2d_100[0][0]                 
__________________________________________________________________________________________________
activation_100 (Activation)     (None, 128, 128, 16) 0           batch_normalization_100[0][0]    
__________________________________________________________________________________________________
conv2d_101
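
The parameter counts in the first rows of the summary can be reproduced by hand, which is a quick way to confirm the layer shapes. A minimal sketch follows; the helper names are hypothetical, and the batch-normalization count assumes four values per channel (scale, shift, moving mean, moving variance).

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def batchnorm_params(channels):
    """Scale, shift, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv2d_params(3, 3, 16))   # 448, matches conv2d_100
print(batchnorm_params(16))      # 64, matches batch_normalization_100
```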

In [31]:
parms = {'verbose': 2, 'callbacks': [TQDMNotebookCallback()]}

In [32]:
K.set_value(model.optimizer.lr, 0.00001)

In [33]:
model.fit(trn_data, 
          trn_labels_1, 
          batch_size = 8, 
          epochs = 20, 
          validation_data=(val_data, val_labels_1),
          **parms)

Train on 4870 samples, validate on 800 samples
Epoch 1/20
 - 306s - loss: 1.3510 - acc: 0.5012 - val_loss: 1.3300 - val_acc: 0.5050
Epoch 2/20
 - 296s - loss: 1.3199 - acc: 0.5097 - val_loss: 1.3209 - val_acc: 0.5112
Epoch 3/20
 - 296s - loss: 1.3181 - acc: 0.5544 - val_loss: 1.3193 - val_acc: 0.5138
Epoch 4/20
 - 296s - loss: 1.3178 - acc: 0.5733 - val_loss: 1.3186 - val_acc: 0.5150
Epoch 5/20
 - 297s - loss: 1.3174 - acc: 0.5729 - val_loss: 1.3173 - val_acc: 0.5212
Epoch 6/20
 - 297s - loss: 1.3168 - acc: 0.5809 - val_loss: 1.3168 - val_acc: 0.4925
Epoch 7/20
 - 297s - loss: 1.3171 - acc: 0.5811 - val_loss: 1.3163 - val_acc: 0.4925
Epoch 8/20
 - 296s - loss: 1.3166 - acc: 0.5776 - val_loss: 1.3154 - val_acc: 0.4938
Epoch 9/20
4864/|/[loss: 1.317, acc: 0.567] 100%|| 

<keras.callbacks.History at 0x7ef4cee75e10>
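
A note on reading these numbers: with two classes, a model that outputs 0.5 for each has a cross-entropy of ln 2 ≈ 0.693, yet the loss above plateaus near 1.32 while accuracy is close to chance. A plausible explanation, stated here as an assumption, is that the reported loss also includes the L2 weight penalty (`wd=1e-4`) added by the regularizers. The sketch below computes only the chance-level cross-entropy.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one sample given one-hot y_true and probabilities y_pred."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

chance = categorical_crossentropy([1.0, 0.0], [0.5, 0.5])
print(round(chance, 4))  # 0.6931
```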

In [34]:
K.set_value(model.optimizer.lr, 0.001)
model.fit(trn_data, 
          trn_labels_1, 
          batch_size = 8, 
          epochs = 20, 
          validation_data=(val_data, val_labels_1),
          **parms)

Train on 4870 samples, validate on 800 samples
Epoch 1/20
 - 296s - loss: 1.3130 - acc: 0.5612 - val_loss: 1.3186 - val_acc: 0.5713
Epoch 2/20
 - 297s - loss: 1.2978 - acc: 0.5745 - val_loss: 1.3149 - val_acc: 0.5787
Epoch 3/20
 - 297s - loss: 1.2852 - acc: 0.5916 - val_loss: 1.4422 - val_acc: 0.5325
Epoch 4/20
 - 297s - loss: 1.2805 - acc: 0.5914 - val_loss: 1.3629 - val_acc: 0.5625
Epoch 5/20
 - 297s - loss: 1.2712 - acc: 0.6025 - val_loss: 1.4226 - val_acc: 0.5475
Epoch 6/20
 - 297s - loss: 1.2664 - acc: 0.6160 - val_loss: 1.3119 - val_acc: 0.6162
Epoch 7/20
 - 296s - loss: 1.2537 - acc: 0.6236 - val_loss: 1.3710 - val_acc: 0.6000
Epoch 8/20
 - 296s - loss: 1.2416 - acc: 0.6431 - val_loss: 1.3501 - val_acc: 0.6175
Epoch 9/20
 - 297s - loss: 1.2365 - acc: 0.6452 - v

<keras.callbacks.History at 0x7ef4abb46828>

In [35]:
K.set_value(model.optimizer.lr, 0.0001)
model.fit(trn_data, 
          trn_labels_1, 
          batch_size = 8, 
          epochs = 20, 
          validation_data=(val_data, val_labels_1),
          **parms)

Train on 4870 samples, validate on 800 samples
Epoch 1/20
 - 297s - loss: 0.9262 - acc: 0.8678 - val_loss: 2.0112 - val_acc: 0.5188
Epoch 2/20
 - 297s - loss: 0.9148 - acc: 0.8791 - val_loss: 1.8506 - val_acc: 0.5275
Epoch 3/20
 - 297s - loss: 0.8978 - acc: 0.8940 - val_loss: 2.0132 - val_acc: 0.5100
Epoch 4/20
 - 297s - loss: 0.9083 - acc: 0.8797 - val_loss: 1.8734 - val_acc: 0.5262
Epoch 5/20
 - 297s - loss: 0.9024 - acc: 0.8805 - val_loss: 1.9498 - val_acc: 0.5200
Epoch 6/20
 - 297s - loss: 0.8822 - acc: 0.8947 - val_loss: 1.8458 - val_acc: 0.5225
Epoch 7/20
 - 297s - loss: 0.8981 - acc: 0.8828 - val_loss: 1.7298 - val_acc: 0.5525
Epoch 8/20
 - 297s - loss: 0.8948 - acc: 0.8864 - val_loss: 1.

<keras.callbacks.History at 0x7ef4ab9ecd30>
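
The gap between training and validation accuracy in this run is the overfitting mentioned in the introduction. `model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch lists; the sketch below works on such a dict and uses the first seven epochs of the log above as sample data. The key names `acc` and `val_acc` are assumed from the log labels, and `summarize_history` is a hypothetical helper.

```python
def summarize_history(history):
    """Return (best epoch, best val_acc, train/val gap at that epoch)."""
    val_acc = history["val_acc"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    gap = history["acc"][best] - val_acc[best]
    return best + 1, val_acc[best], gap  # epochs are reported 1-based

# Values copied from epochs 1-7 of the log above.
history = {
    "acc":     [0.8678, 0.8791, 0.8940, 0.8797, 0.8805, 0.8947, 0.8828],
    "val_acc": [0.5188, 0.5275, 0.5100, 0.5262, 0.5200, 0.5225, 0.5525],
}
epoch, best_val, gap = summarize_history(history)
print(epoch, best_val, round(gap, 4))
```

Tracking the best validation epoch this way is one simple basis for deciding when to stop a phase or which weights to keep.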

In [36]:
K.set_value(model.optimizer.lr, 0.001)
model.fit(trn_data, 
          trn_labels_1, 
          batch_size = 16, 
          epochs = 20, 
          validation_data=(val_data, val_labels_1),
          **parms)

Train on 4870 samples, validate on 800 samples
Epoch 1/20
 - 273s - loss: 0.8388 - acc: 0.9162 - val_loss: 1.8378 - val_acc: 0.5250
Epoch 2/20
 - 273s - loss: 0.8193 - acc: 0.9201 - val_loss: 2.3030 - val_acc: 0.5075
Epoch 3/20
 - 273s - loss: 0.8112 - acc: 0.9193 - val_loss: 1.3588 - val_acc: 0.6613
Epoch 4/20
 - 273s - loss: 0.7901 - acc: 0.9296 - val_loss: 1.2102 - val_acc: 0.6750
Epoch 5/20
 - 273s - loss: 0.7801 - acc: 0.9390 - val_loss: 1.6928 - val_acc: 0.5425
Epoch 6/20
 - 274s - loss: 0.7696 - acc: 0.9462 - val_loss: 3.2443 - val_acc: 0.5000
Epoch 7/20
 - 273s - loss: 0.7631 - acc: 0.9386 - val_loss: 1.6343 - val_acc: 0.5725
Epoch 8/20
 - 273s - loss: 0.7555 - acc: 0.9452 - val_loss: 1.3053 - val_acc: 0.6350
Epoch 9/20
 - 273s - loss: 0.7475 - acc: 0.9456 - v

<keras.callbacks.History at 0x7ef4ab94a7b8>

In [37]:
K.set_value(model.optimizer.lr, 0.0001)
model.fit(trn_data, 
          trn_labels_1, 
          batch_size = 8, 
          epochs = 20, 
          validation_data=(val_data, val_labels_1),
          **parms)

Train on 4870 samples, validate on 800 samples
Epoch 1/20
 - 297s - loss: 0.7189 - acc: 0.9577 - val_loss: 1.2898 - val_acc: 0.6462
Epoch 2/20
 - 296s - loss: 0.7031 - acc: 0.9639 - val_loss: 1.1475 - val_acc: 0.7175
Epoch 3/20
 - 297s - loss: 0.7077 - acc: 0.9618 - val_loss: 1.2388 - val_acc: 0.6813
Epoch 4/20
 - 297s - loss: 0.6875 - acc: 0.9674 - val_loss: 1.0301 - val_acc: 0.7850
Epoch 5/20
 - 297s - loss: 0.7002 - acc: 0.9595 - val_loss: 1.0654 - val_acc: 0.7600
Epoch 6/20
 - 297s - loss: 0.6952 - acc: 0.9624 - val_loss: 1.0085 - val_acc: 0.7863
Epoch 7/20
 - 297s - loss: 0.6846 - acc: 0.9713 - val_loss: 1.1108 - val_acc: 0.7350
Epoch 8/20
 - 297s - loss: 0.6875 - acc: 0.9641 - val_loss: 1.0658 - val_acc: 0.7550
Epoch 9/20
 - 297s - loss: 0.6873 - acc: 0.9667 - v

<keras.callbacks.History at 0x7ef4ab900cc0>
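
The five training cells above differ only in learning rate and batch size, so the same annealing sequence could be written as a loop. The sketch below keeps Keras out of the picture by taking two callables: in the notebook, `set_lr` could wrap `K.set_value(model.optimizer.lr, lr)` and `fit` could wrap `model.fit(...)`. `run_schedule` and `PHASES` are hypothetical names.

```python
# (learning rate, batch size) for each phase, in the order used above.
PHASES = [(1e-5, 8), (1e-3, 8), (1e-4, 8), (1e-3, 16), (1e-4, 8)]

def run_schedule(set_lr, fit, phases=PHASES, epochs=20):
    """Run one fit call per phase after setting its learning rate."""
    results = []
    for lr, batch_size in phases:
        set_lr(lr)
        results.append(fit(batch_size=batch_size, epochs=epochs))
    return results

# Dry run with stand-in callables that only record what they were given.
calls = []
run_schedule(set_lr=lambda lr: calls.append(("lr", lr)),
             fit=lambda **kw: calls.append(("fit", kw["batch_size"])))
print(calls[:2])
```

Keeping the phases in one list also makes it easier to see the schedule at a glance and to rerun it after changing a single value.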