# Audio classification through image classification:

This notebook shows how to use a pre-trained ResNet50 with an additional fully convolutional network on top to predict whether a call is good or bad.

Advantage:

- The advantage of this method is that it requires no audio feature engineering and does not rely on the text associated with each recording. This can make it faster and much cheaper.

Disadvantage: 

- The accuracy on the current dataset is around 60%, which is lower than both the MFCC-based and NLP-based approaches. However, as shown in the following, this model (like many others I tested) starts overfitting as training continues, which suggests that more data would improve its accuracy.
 


In [1]:
# Rather than importing everything manually, we'll make things easy
#   and load them all in utils.py, and just import them from there.
%matplotlib inline
import importlib
import os

import bcolz
import utils

# Reload utils.py so edits to it are picked up without restarting the kernel.
importlib.reload(utils)

from PIL import ImageFile
# Let PIL open image files that were only partially written.
ImageFile.LOAD_TRUNCATED_IMAGES = True

def save_array(fname, arr):
    """Persist an array to disk as a bcolz carray."""
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    """Load a bcolz carray from disk back into memory."""
    return bcolz.open(fname)[:]

from utils import *
current_dir = os.getcwd()
print('current folder: ', current_dir)
model_path = current_dir + '/Data/crop_data/models'
print('model path: ', model_path)

current folder:  /home/sohrab/MaestroQA_ffast_ai
model path:  /home/sohrab/MaestroQA_ffast_ai/Data/crop_data/models


Using cuDNN version 6021 on context None
Mapped name None to device cuda: Quadro P6000 (0000:03:00.0)
Using Theano backend.


In [2]:
import resnet50
from resnet50 import Resnet50

## Reading the image arrays:

In [3]:
trn_data_1 = load_array(current_dir+'/Data/crop_data/models/train_data_big.bc')
val_data_1 = load_array(current_dir+'/Data/crop_data/models/valid_data_big.bc')

trn_labels_1 = load_array(current_dir+'/Data/crop_data/models/trn_labels_big.bc')
val_labels_1 = load_array(current_dir+'/Data/crop_data/models/val_labels_big.bc')

print (trn_data_1.shape,val_data_1.shape, trn_labels_1.shape, val_labels_1.shape )

(10326, 3, 430, 1246) (1999, 3, 430, 1246) (10326, 2) (1999, 2)
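The labels are stored one-hot (one column per class), which is the format `categorical_crossentropy` expects. Before interpreting the ~60% accuracy mentioned in the introduction, it helps to know the accuracy of always predicting the most frequent class. A minimal plain-Python sketch, assuming each label row contains exactly one 1; `class_counts` and `majority_baseline` are hypothetical helpers, not part of `utils.py`:

```python
def class_counts(one_hot_rows):
    """Count how many rows belong to each class in a one-hot label matrix."""
    counts = [0] * len(one_hot_rows[0])
    for row in one_hot_rows:
        # The position of the 1 in each row is the class index.
        counts[list(row).index(1)] += 1
    return counts

def majority_baseline(one_hot_rows):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = class_counts(one_hot_rows)
    return max(counts) / sum(counts)

# Toy labels, not the real data: three rows of class 0, one row of class 1.
toy = [[1, 0], [1, 0], [0, 1], [1, 0]]
print(class_counts(toy))       # [3, 1]
print(majority_baseline(toy))  # 0.75
```

On the real arrays, `majority_baseline(val_labels_1)` (or `val_labels_1.sum(axis=0)` for the raw counts) would show how far above a constant predictor the model actually is.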


### Defining the Resnet50 model:

In [4]:
# ResNet50 without its classification top; .model is the underlying Keras model.
rn0 = Resnet50(include_top=False, size=(430, 1246)).model

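The `input_shape=(2048, 14, 39)` used by the fully convolutional head comes from how ResNet50 shrinks a 430×1246 input. The sketch below traces one spatial axis through the strided layers, assuming the layout of the fast.ai `resnet50.py` (3-pixel zero padding, a 7×7 stride-2 `conv1`, a 3×3 stride-2 max-pool, then a stride-2 1×1 convolution opening each of stages 3, 4 and 5, all with 'valid' padding):

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of a 'valid' convolution or pooling along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def resnet50_feature_size(size):
    """Trace one spatial axis through the strided layers of ResNet50."""
    size = conv_out(size, kernel=7, stride=2, pad=3)  # ZeroPadding2D((3,3)) + conv1 7x7/2
    size = conv_out(size, kernel=3, stride=2)         # MaxPooling2D((3,3), strides=(2,2))
    # Stage 2 keeps the resolution; stages 3-5 each open with a 1x1 stride-2 conv.
    for _ in range(3):
        size = conv_out(size, kernel=1, stride=2)
    return size

print(resnet50_feature_size(430), resnet50_feature_size(1246))  # 14 39
print(resnet50_feature_size(224))                               # 7
```

The 2048 channels are the width of the last stage-5 block; the (2048, 14, 39) shape indicates that `include_top=False` keeps the full feature map rather than average-pooling it (the usual 224×224 input gives a 7×7 map).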


In [5]:
# Extract features with the pre-trained model once; the heads below are trained on these
trn_features = rn0.predict(trn_data_1, batch_size=128, verbose=1)
val_features = rn0.predict(val_data_1, batch_size=128, verbose=1)
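Running the frozen ResNet50 once and caching its outputs means the head can be retrained many times without repeating the expensive convolutions. The cached features are large, though. A back-of-the-envelope estimate, assuming the features come back as float32 (4 bytes per value) with shape (N, 2048, 14, 39):

```python
def array_bytes(shape, bytes_per_item=4):
    """Memory footprint of a dense array with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_item

trn_bytes = array_bytes((10326, 2048, 14, 39))
val_bytes = array_bytes((1999, 2048, 14, 39))
print(f"train features: {trn_bytes / 1e9:.1f} GB")  # train features: 46.2 GB
print(f"valid features: {val_bytes / 1e9:.1f} GB")  # valid features: 8.9 GB
```

At roughly 55 GB combined, the `save_array` helper from the first cell could be used to persist these arrays so that extraction only has to run once.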



### Adding the Fully Convolutional Net:

In [6]:
nf=128; p=0  # nf: filters per hidden conv layer; p: dropout rate (0 disables dropout)
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=(2048, 14, 39)),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
#         MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
#         MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
#         MaxPooling2D((1,1)),
        Convolution2D(2,3,3, border_mode='same'),
        Dropout(p),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]
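The head never flattens the feature map: the last convolution produces one 14×39 score map per class, `GlobalAveragePooling2D` averages each map to a single number, and the softmax turns the two averages into probabilities. A plain-Python sketch of that final step, for illustration only (`gap_softmax` is a hypothetical helper, not a Keras function):

```python
import math

def gap_softmax(score_maps):
    """Global-average-pool each class map, then apply a softmax across classes.

    score_maps: list of C maps, each a list of rows (H x W) of floats.
    """
    pooled = []
    for cmap in score_maps:
        values = [v for row in cmap for v in row]
        pooled.append(sum(values) / len(values))  # spatial mean of this class map
    m = max(pooled)                               # subtract max for numerical stability
    exps = [math.exp(v - m) for v in pooled]
    total = sum(exps)
    return [e / total for e in exps]

# Class 1's map averages to 1.0, class 0's to 0.0.
print(gap_softmax([[[0.0, 0.0], [0.0, 0.0]],
                   [[2.0, 0.0], [0.0, 2.0]]]))
```

Compared with `Flatten` followed by `Dense`, this design adds no parameters after the last convolution.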

In [7]:
lrg_model = Sequential(get_lrg_layers())
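The overfitting noted in the introduction is easier to judge with the head's size in mind. A worked count of the weights in `get_lrg_layers` with `nf=128`, assuming Keras's usual conventions (a k×k convolution has k·k·C_in·C_out weights plus C_out biases; `BatchNormalization` keeps 4 values per channel, 2 of them trainable). `lrg_model.summary()` should report the same total:

```python
def conv_params(c_in, c_out, k=3):
    """Weights + biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def bn_params(channels):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * channels

nf = 128
layers = [
    bn_params(2048),        # BatchNormalization on the ResNet features
    conv_params(2048, nf),  # first 3x3 conv: by far the largest layer
    bn_params(nf),
    conv_params(nf, nf),
    bn_params(nf),
    conv_params(nf, nf),
    bn_params(nf),
    conv_params(nf, 2),     # one output map per class
]
# Dropout, GlobalAveragePooling2D and Activation have no weights.
print(sum(layers))  # 2666626
```

That is about 2.7 million parameters against 10,326 training examples, which leaves plenty of capacity to memorise the training set. The second head defined later has the same count, since `MaxPooling2D` adds no weights.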



In [8]:
lrg_model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
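With one-hot labels and a softmax output, `categorical_crossentropy` reduces to the negative log of the probability assigned to the true class, averaged over the samples. A small sketch of that formula (not Keras's implementation, which additionally guards against taking the log of 0):

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(t * log(p)) for one-hot targets t and probabilities p."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        # Only the true-class term is non-zero for one-hot targets.
        losses.append(-sum(t * math.log(p) for t, p in zip(t_row, p_row) if t > 0))
    return sum(losses) / len(losses)

# A 50/50 prediction costs log(2) per sample.
print(categorical_crossentropy([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]]))  # 0.6931471805599453
```

A two-class validation loss that stays near 0.693 therefore means the model is doing little better than predicting 50/50.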

In [9]:
batch_size=32
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size, 
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7eeb8b185908>
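The following cells keep calling `fit` on the same `lrg_model`, so each call continues from the current weights rather than starting over. Changing `batch_size` between calls mainly changes how many gradient updates one epoch performs. Worked numbers for the 10,326 training samples (Keras also runs the final, smaller batch, hence the ceiling):

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Gradient steps in one epoch, counting the last partial batch."""
    return math.ceil(n_samples / batch_size)

for bs in (32, 16, 8):
    print(bs, updates_per_epoch(10326, bs))
# 32 323
# 16 646
# 8 1291
```

Going from a batch size of 32 to 8 thus quadruples the number of updates per epoch at the same learning rate of 0.0001.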

In [10]:
batch_size=8
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size,
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7eeb928bc898>

In [11]:
batch_size=16
lrg_model.fit(trn_features, trn_labels_1, epochs=5, batch_size=batch_size,
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7eeb8af27c18>

In [12]:
batch_size=8
lrg_model.fit(trn_features, trn_labels_1, epochs=5, batch_size=batch_size, 
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7eeb97a81048>

In [14]:
nf=128; p=0.2  # same head as before, now with 20% dropout and max-pooling enabled
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=(2048, 14, 39)),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D((1,1)),
        Convolution2D(2,3,3, border_mode='same'),
        Dropout(p),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]
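This second head is the same network with dropout raised to 0.2 and the three `MaxPooling2D` layers switched on. Pooling adds no weights, but it shrinks the score maps before global averaging. Assuming Keras's defaults (2×2 window, stride equal to the window, 'valid' padding), the spatial size evolves as follows; the final `MaxPooling2D((1,1))` leaves it unchanged:

```python
def pool_out(size, pool=2, stride=None):
    """Output length of a 'valid' max-pooling along one axis (stride defaults to pool)."""
    stride = stride or pool
    return (size - pool) // stride + 1

h, w = 14, 39               # ResNet50 feature map for a 430 x 1246 input
for pool in (2, 2, 1):      # MaxPooling2D(), MaxPooling2D(), MaxPooling2D((1,1))
    h, w = pool_out(h, pool), pool_out(w, pool)
    print(h, w)
# 7 19
# 3 9
# 3 9
```

The final 2-channel map therefore has 27 positions instead of 546, and each 3×3 convolution after a pooling step sees a wider region of the original feature map.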

In [15]:
lrg_model = Sequential(get_lrg_layers())



In [16]:
lrg_model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

In [17]:
batch_size=32
lrg_model.fit(trn_features, trn_labels_1, epochs=3, batch_size=batch_size, 
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7eeb85ef4278>

In [18]:
batch_size=8
lrg_model.fit(trn_features, trn_labels_1, epochs=20, batch_size=batch_size, 
       validation_data=(val_features, val_labels_1))

Train on 10326 samples, validate on 1999 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7eeb85bfdb38>
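Long runs like the 20-epoch one above are where overfitting shows up: training accuracy keeps rising while validation loss stalls or climbs. Keras provides `keras.callbacks.EarlyStopping` (e.g. `monitor='val_loss', patience=3`) and `keras.callbacks.ModelCheckpoint(..., save_best_only=True)` to stop training and keep the best weights automatically. The bookkeeping behind a patience rule of this kind can be sketched in plain Python; the loss values below are made up for illustration and are not results from this notebook:

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-based, for a patience-based rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Hypothetical validation losses, NOT taken from this notebook.
losses = [0.70, 0.66, 0.64, 0.65, 0.67, 0.69, 0.72]
print(early_stopping(losses, patience=3))  # (3, 6)
```

On a real run, `history = lrg_model.fit(...)` returns the `History` object printed above, and its `history.history['val_loss']` list could be passed to this function to see where validation loss bottomed out.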