# Audio classification through image classification:

This notebook shows how to use a pre-trained ResNet50 with an additional fully convolutional network on top to predict whether a call is good or bad.

Advantages:

- This method requires no feature engineering on the audio files, and it does not rely on the text associated with each audio recording. This can make it faster and much cheaper.


- This model can reach 98% accuracy.

In [1]:
# Rather than importing everything manually, we'll make things easy
#   and load them all in utils.py, and just import them from there.
%matplotlib inline
import importlib
import os

import bcolz
from PIL import ImageFile

import utils
importlib.reload(utils)
from utils import *

# Let PIL open image files that are truncated on disk instead of raising an error.
ImageFile.LOAD_TRUNCATED_IMAGES = True

def save_array(fname, arr):
    """Persist a NumPy array to disk as a compressed bcolz carray."""
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    """Load a bcolz carray from disk back into a NumPy array."""
    return bcolz.open(fname)[:]

current_dir = os.getcwd()
print('current folder: ', current_dir)
model_path = current_dir + '/Data/crop_data/models'
print('model path: ', model_path)

Using cuDNN version 6021 on context None
Mapped name None to device cuda: Quadro P6000 (0000:03:00.0)
Using Theano backend.


current folder:  /home/sohrab/MaestroQA_ffast_ai
model path:  /home/sohrab/MaestroQA_ffast_ai/Data/crop_data/models


In [2]:
import resnet50
from resnet50 import Resnet50

## Reading images into NumPy arrays:

In [25]:
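# Decode every training image, resized to 512x683, into one array.
# class_mode=None yields images only; shuffle=False keeps the file order fixed.
batches = image.ImageDataGenerator().flow_from_directory(current_dir+'/Data/maestroqa/train',
                                                         target_size=(512, 683),
                                                         class_mode=None,
                                                         shuffle=False,
                                                         batch_size=1)
trn_data = np.concatenate([batches.next() for i in range(batches.samples)])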

Found 4870 images belonging to 2 classes.
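Because the whole training set is decoded into a single in-memory array, it is worth estimating its size before running this cell. A back-of-the-envelope sketch, assuming the generator yields `float32` arrays (Keras' default float type) in the channels-first layout that the shapes printed after reloading show; `array_bytes` is a hypothetical helper:

```python
import numpy as np

def array_bytes(n_samples, shape, dtype=np.float32):
    """Bytes needed to hold n_samples arrays of the given shape and dtype."""
    return n_samples * int(np.prod(shape)) * np.dtype(dtype).itemsize

image_shape = (3, 512, 683)  # channels-first, as in the shapes printed after reloading
train_bytes = array_bytes(4870, image_shape)
valid_bytes = array_bytes(800, image_shape)

print(f"train: {train_bytes / 1e9:.1f} GB, valid: {valid_bytes / 1e9:.1f} GB")
# train: 20.4 GB, valid: 3.4 GB
```

Since `np.concatenate` runs over a list that already holds every image, the peak memory of this cell is roughly twice the final array size until the list is released.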


In [26]:
# Same for the validation images.
val_batches = image.ImageDataGenerator().flow_from_directory(current_dir+'/Data/maestroqa/valid',
                                                             target_size=(512, 683),
                                                             class_mode=None,
                                                             shuffle=False,
                                                             batch_size=1)
val_data = np.concatenate([val_batches.next() for i in range(val_batches.samples)])

Found 800 images belonging to 2 classes.
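One way to avoid holding two full copies of the images while building `trn_data` and `val_data` is to preallocate the output array and copy each batch into it. A sketch using a hypothetical `read_into_array` helper that accepts any iterator of batches; Keras directory iterators should support Python's `next()`, so `batches` could be passed directly, but the usage below uses a stand-in iterator of small arrays:

```python
import numpy as np

def read_into_array(batch_iter, n_samples, sample_shape, dtype=np.float32):
    """Hypothetical helper: copy batches from an iterator into one preallocated array."""
    out = np.empty((n_samples,) + tuple(sample_shape), dtype=dtype)
    filled = 0
    while filled < n_samples:
        batch = next(batch_iter)
        take = min(len(batch), n_samples - filled)  # the last batch may overshoot
        out[filled:filled + take] = batch[:take]
        filled += take
    return out

# Stand-in for `batches`: six single-image batches, each filled with its index.
fake_batches = iter([np.full((1, 3, 4, 5), i, dtype=np.float32) for i in range(6)])
arr = read_into_array(fake_batches, 6, (3, 4, 5))
```

In the notebook this would read as `trn_data = read_into_array(batches, batches.samples, (3, 512, 683))`, with peak memory close to the size of the final array.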


## Labels:

In [15]:
# Rebuild the iterators with class_mode='categorical' to get each file's class.
# shuffle=False keeps the same file order as the image arrays above.
batches = image.ImageDataGenerator().flow_from_directory(current_dir+'/Data/maestroqa/train',
                                                         target_size=(512, 683),
                                                         class_mode='categorical',
                                                         shuffle=False,
                                                         batch_size=1)

val_batches = image.ImageDataGenerator().flow_from_directory(current_dir+'/Data/maestroqa/valid',
                                                             target_size=(512, 683),
                                                             class_mode='categorical',
                                                             shuffle=False,
                                                             batch_size=1)

Found 4870 images belonging to 2 classes.
Found 800 images belonging to 2 classes.


In [16]:
trn_classes = batches.classes
val_classes = val_batches.classes

trn_labels = np.array(OneHotEncoder().fit_transform(trn_classes.reshape(-1,1)).todense())
val_labels = np.array(OneHotEncoder().fit_transform(val_classes.reshape(-1,1)).todense())
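The labels line up with `trn_data` and `val_data` only because every iterator above uses `shuffle=False`, so files are visited in the same order each time. For integer class ids, the `OneHotEncoder` round trip can also be written as indexing an identity matrix, which needs only NumPy and keeps the column count fixed at `n_classes` even if a split happened to lack one class. A sketch with a hypothetical `one_hot` helper and made-up class ids:

```python
import numpy as np

def one_hot(classes, n_classes):
    """Map integer class ids (shape (n,)) to one-hot rows (shape (n, n_classes))."""
    return np.eye(n_classes, dtype=np.float32)[np.asarray(classes)]

classes = np.array([0, 1, 1, 0])  # made-up ids; in the notebook, batches.classes
labels = one_hot(classes, 2)
print(labels)
```

Used here, `trn_labels = one_hot(batches.classes, 2)` should give the same matrix as the `OneHotEncoder` line, as long as both classes appear in the split.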

## Saving:

In [17]:
save_array(current_dir+'/Data/crop_data/models/train_data_uu.bc', trn_data)
save_array(current_dir+'/Data/crop_data/models/valid_data_uu.bc', val_data)

save_array(current_dir+'/Data/crop_data/models/trn_labels_uu.bc', trn_labels)
save_array(current_dir+'/Data/crop_data/models/val_labels_uu.bc', val_labels)
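`save_array` and `load_array` depend on bcolz, which may be hard to install on recent Python versions. A NumPy-only sketch of the same round trip, using hypothetical helpers `save_npy`/`load_npy` and plain `.npy` files in place of bcolz directories. Unlike bcolz, `.npy` files are uncompressed, but `mmap_mode='r'` lets a large array be read lazily from disk instead of all at once:

```python
import os
import tempfile

import numpy as np

def save_npy(fname, arr):
    """Hypothetical replacement for save_array: write one uncompressed .npy file."""
    np.save(fname, arr)

def load_npy(fname, lazy=False):
    """Hypothetical replacement for load_array; lazy=True memory-maps the file read-only."""
    return np.load(fname, mmap_mode='r' if lazy else None)

# Round trip on a small stand-in array
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'val_labels.npy')
original = np.eye(2, dtype=np.float32)[[0, 1, 1]]
save_npy(path, original)
restored = load_npy(path)
lazy = load_npy(path, lazy=True)
```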

## Reading back the arrays:

In [3]:
trn_data_1 = load_array(current_dir+'/Data/crop_data/models/train_data_uu.bc')
val_data_1 = load_array(current_dir+'/Data/crop_data/models/valid_data_uu.bc')

trn_labels_1 = load_array(current_dir+'/Data/crop_data/models/trn_labels_uu.bc')
val_labels_1 = load_array(current_dir+'/Data/crop_data/models/val_labels_uu.bc')

print (trn_data_1.shape,val_data_1.shape, trn_labels_1.shape, val_labels_1.shape )

(4870, 3, 512, 683) (800, 3, 512, 683) (4870, 2) (800, 2)
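Before training, a cheap consistency check on the reloaded arrays can catch mismatches early: sample counts must agree and every label row should be a valid one-hot vector. Returning the per-class counts also shows how balanced the two classes are, which matters when reading an accuracy figure. A sketch with a hypothetical `check_split` helper, run here on small stand-in arrays:

```python
import numpy as np

def check_split(data, labels, n_classes):
    """Hypothetical sanity check for one split; returns per-class sample counts."""
    if len(data) != len(labels):
        raise ValueError("data and labels differ in length")
    if labels.shape[1] != n_classes:
        raise ValueError("unexpected number of label columns")
    if not np.all(labels.sum(axis=1) == 1):
        raise ValueError("some label rows are not one-hot")
    return np.bincount(labels.argmax(axis=1), minlength=n_classes)

data = np.zeros((5, 3, 8, 8), dtype=np.float32)
labels = np.eye(2, dtype=np.float32)[[0, 1, 1, 0, 1]]
counts = check_split(data, labels, 2)
```

In the notebook the calls would be `check_split(trn_data_1, trn_labels_1, 2)` and `check_split(val_data_1, val_labels_1, 2)`.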


### Defining the Resnet50 model:

In [4]:
# ResNet50 without its classification top, so it outputs the last convolutional feature map.
rn0 = Resnet50(include_top=False, size=(512, 683)).model

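The head defined below expects inputs of shape `(2048, 16, 22)`, and those spatial dimensions follow from the 512×683 input size. A sketch of the arithmetic, assuming the standard Keras ResNet50 layout (3-pixel zero padding, a 7×7 stride-2 `conv1`, a 3×3 stride-2 max pool, and stride-2 1×1 convolutions at the start of stages 3, 4 and 5, with no other size changes). The local `resnet50.py` is not shown here, so treat this as a consistency check rather than a description of that file:

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of a 'valid' convolution or pooling along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def resnet50_feature_size(size):
    size = conv_out(size, kernel=7, stride=2, pad=3)  # ZeroPadding2D((3, 3)) + conv1
    size = conv_out(size, kernel=3, stride=2)         # 3x3 max pool, stride 2
    for _ in range(3):                                # first block of stages 3, 4 and 5
        size = conv_out(size, kernel=1, stride=2)
    return size

print(resnet50_feature_size(512), resnet50_feature_size(683))  # 16 22
```

With 2048 channels in the last stage this gives the `(2048, 16, 22)` feature map, an overall downsampling factor of about 32. The same function gives 7 for the usual 224-pixel input.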


In [5]:
# Extract features from the pre-trained model once; the head below is trained on these.
trn_features = rn0.predict(trn_data_1, batch_size=128, verbose=1)
val_features = rn0.predict(val_data_1, batch_size=128, verbose=1)
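Running the frozen backbone once and training the head on the cached `trn_features`/`val_features` avoids a forward pass through all of ResNet50 on every epoch. It does not save much memory, though. A per-sample comparison, assuming `float32` for both the images and the features:

```python
image_floats = 3 * 512 * 683      # one input image, channels-first
feature_floats = 2048 * 16 * 22   # one ResNet50 feature map
ratio = feature_floats / image_floats

print(f"{feature_floats * 4 / 2**20:.2f} MiB per sample, {ratio:.0%} of the image size")
# 2.75 MiB per sample, 69% of the image size
```

For the 4,870 training samples that is still about 14 GB of features, so the gain from caching is mainly compute time, not memory.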



### Adding the Fully Convolutional Net:

In [20]:
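nf = 128  # number of filters in each hidden conv layer
p = 0.2   # dropout rate

def get_lrg_layers():
    # Input is the ResNet50 feature map: 2048 channels x 16 x 22 (channels-first, Theano ordering).
    return [
        BatchNormalization(axis=1, input_shape=(2048, 16, 22)),
        Convolution2D(nf, 3, 3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),                              # 16x22 -> 8x11
        Convolution2D(nf, 3, 3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),                              # 8x11 -> 4x5
        Convolution2D(nf, 3, 3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D((1, 1)),                        # 1x1 pooling leaves the size unchanged
        Convolution2D(2, 3, 3, border_mode='same'),  # one output map per class
        Dropout(p),
        GlobalAveragePooling2D(),                    # average each class map to one score
        Activation('softmax')
    ]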

In [21]:
lrg_model = Sequential(get_lrg_layers())
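A sketch of what this head does to the `(2048, 16, 22)` input: it traces the spatial size through the two 2×2 max pools and counts parameters. The counts include the four per-channel values Keras stores for each `BatchNormalization` layer (gamma, beta, moving mean, moving variance). This is hand arithmetic under those assumptions, not output taken from the notebook:

```python
def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out  # kernel weights + biases

def bn_params(channels):
    return 4 * channels  # gamma, beta, moving mean, moving variance

nf, h, w = 128, 16, 22
h, w = h // 2, w // 2  # first MaxPooling2D: 16x22 -> 8x11
h, w = h // 2, w // 2  # second MaxPooling2D: 8x11 -> 4x5 (the 1x1 pool changes nothing)

total = (bn_params(2048)
         + conv_params(2048, nf) + bn_params(nf)
         + conv_params(nf, nf) + bn_params(nf)
         + conv_params(nf, nf) + bn_params(nf)
         + conv_params(nf, 2))
print((h, w), total)  # (4, 5) 2666626
```

About 88% of these parameters sit in the first 3×3 convolution, which maps 2048 channels down to 128. The final convolution produces one 4×5 map per class, and `GlobalAveragePooling2D` averages each map to a single score before the softmax.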



In [22]:
lrg_model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

In [23]:
K.set_value(lrg_model.optimizer.lr, 0.00001)
batch_size=32
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size,
              validation_data=(val_features, val_labels_1))

Train on 4870 samples, validate on 800 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f877ba67eb8>

In [24]:
K.set_value(lrg_model.optimizer.lr, 0.0001)
batch_size=64
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size,
              validation_data=(val_features, val_labels_1))

Train on 4870 samples, validate on 800 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f877b2e9ba8>

In [25]:
K.set_value(lrg_model.optimizer.lr, 0.001)
batch_size=64
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size,
              validation_data=(val_features, val_labels_1))

Train on 4870 samples, validate on 800 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f877b2e9c88>
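The three training cells form a stepped schedule: each `fit` call continues from the current weights, and `K.set_value` changes the Adam learning rate in place between calls, from 1e-5 up to 1e-3. A small sketch tabulating the gradient updates in each stage, assuming Keras counts the final partial batch of an epoch as one step:

```python
import math

n_train = 4870
schedule = [  # (learning rate, batch size, epochs) from the three training cells
    (1e-5, 32, 3),
    (1e-4, 64, 3),
    (1e-3, 64, 3),
]

total_updates = 0
for lr, batch_size, epochs in schedule:
    steps = math.ceil(n_train / batch_size)
    total_updates += steps * epochs
    print(f"lr={lr:g}  batch={batch_size}  steps/epoch={steps}  updates={steps * epochs}")
print("total updates:", total_updates)  # 921
```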