# Local Feature Extractors

Train a set of CNNs that handle inputs (segments of mel-spectrograms) at different scales.

These CNN models will be used as local feature extractors in a music genre tagger.

We plan to train CNNs at 5 different scales. Their segment lengths are: 20, 30, 60, 120, 240.

All CNNs contain 5 conv layers. The "vertical" pooling sizes, one per layer, are: `[2, 3, 2, 2, 4]`.

The horizontal pooling sizes are set according to their segment length:

Segment Length | Horizontal pooling sizes
--- | ---
20 | `[2, 2, 2, 2, 1]`
30 | `[2, 2, 2, 2, 1]`
60 | `[3, 2, 2, 2, 2]`
120 | `[4, 3, 2, 2, 2]`
240 | `[4, 4, 3, 2, 2]`
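
The pooling sizes above can be sanity-checked with a little arithmetic. The sketch below assumes each pooling layer is non-overlapping and drops any remainder (the behaviour of `MaxPooling2D` when only `pool_size` is given), and takes the 96 mel bins from the input shape used later in this notebook. The helper name `pooled_size` is ours, not part of the original code.

```python
# Vertical pooling is shared by every scale; horizontal pooling depends on the segment length.
VERTICAL_POOLS = [2, 3, 2, 2, 4]
HORIZONTAL_POOLS = {
    20: [2, 2, 2, 2, 1],
    30: [2, 2, 2, 2, 1],
    60: [3, 2, 2, 2, 2],
    120: [4, 3, 2, 2, 2],
    240: [4, 4, 3, 2, 2],
}


def pooled_size(size, pools):
    """Size left after applying each non-overlapping pooling step in turn."""
    for p in pools:
        size //= p  # stride == pool size, so the remainder is discarded
    return size


n_mels = 96
final_height = pooled_size(n_mels, VERTICAL_POOLS)
final_widths = {seg: pooled_size(seg, pools) for seg, pools in HORIZONTAL_POOLS.items()}
```

Under these assumptions both `final_height` and every entry of `final_widths` come out as 1, so each scale ends with a 1 x 1 map per filter after the fifth pooling layer.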
    

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
import keras
from keras.layers import Input, Dense, merge, Flatten, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import GlobalAveragePooling2D
from keras.models import Model
from kapre.time_frequency import Melspectrogram
from keras.utils.visualize_util import plot

Using Theano backend.


## Load Dataset

We train these local feature extractors on the GTZAN dataset.

Audio files are first converted to 2D mel-spectrograms, then partitioned horizontally into segments of the different lengths. All data preprocessing is done in `Split Dataset.ipynb`.
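
As a rough illustration of the horizontal partitioning, here is a minimal sketch. It assumes non-overlapping segments and that frames left over after the last full segment are discarded; the actual logic lives in `Split Dataset.ipynb` and may differ. The function name `split_into_segments` and the stand-in spectrogram are hypothetical.

```python
import numpy as np


def split_into_segments(mel, seg_length):
    """Cut a (n_mels, n_frames) spectrogram into non-overlapping segments.

    Returns an array of shape (n_segments, 1, n_mels, seg_length).
    """
    n_mels, n_frames = mel.shape
    n_segments = n_frames // seg_length  # frames past the last full segment are dropped
    trimmed = mel[:, :n_segments * seg_length]
    # (n_mels, n_segments, seg_length) -> (n_segments, n_mels, seg_length)
    segments = trimmed.reshape(n_mels, n_segments, seg_length).transpose(1, 0, 2)
    # Add a channel axis so each segment matches the (1, n_mels, seg_length) model input
    return segments[:, np.newaxis, :, :]


# Stand-in spectrogram: 96 mel bins, 100 frames (not real audio)
mel = np.arange(96 * 100, dtype=np.float32).reshape(96, 100)
segments = split_into_segments(mel, 30)
```

With 100 frames and a segment length of 30, this yields three segments and leaves the last 10 frames unused.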

In [2]:
# Load Dataset
X = np.load('/Users/pengguo/Desktop/coms4995/Project/Multi_scale/dataset/X_train_seg30.npy')
Y_pre = np.load('/Users/pengguo/Desktop/coms4995/Project/Multi_scale/dataset/Y_train_seg30.npy').astype(int)

# Encode Y_pre to one-hot(Y)
Y = np.zeros((Y_pre.shape[0], 10))
Y[np.arange(Y_pre.shape[0]), Y_pre] = 1

In [3]:
print("Shape of X: {}".format(X.shape))
print("Shape of Y: {}".format(Y.shape))

Shape of X: (40500, 1, 96, 30)
Shape of Y: (40500, 10)
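
The one-hot encoding in the loading cell relies on NumPy integer-array indexing. The toy example below, with made-up labels, shows the same trick on four samples and how `argmax` recovers the integer labels; the variable names are illustrative only.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])  # toy integer genre labels in [0, 10)
n_classes = 10

one_hot = np.zeros((labels.shape[0], n_classes))
# Row i gets a 1 in column labels[i]; every other entry stays 0
one_hot[np.arange(labels.shape[0]), labels] = 1

# argmax along the class axis inverts the encoding
recovered = one_hot.argmax(axis=1)
```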


## Define Model

In [4]:
def gen_model(seg_length, pool_sizes_hori):
    '''
    Generate model with different scales.
    seg_length and pooling layer sizes are set adjusting to different scales.
    '''
    psh = [0] + pool_sizes_hori # padding at front for index alignment
    
    # input
    x = Input(shape=(1, 96, seg_length))

    # 1st conv layer
    conv1 = Convolution2D(32, 3, 3, border_mode='same', init='he_normal', name='conv1_{}'.format(seg_length))(x)
    conv1 = BatchNormalization(axis=1, mode=2, name='BN1_{}'.format(seg_length))(conv1)
    conv1 = keras.layers.advanced_activations.ELU(alpha=1.0)(conv1)
    conv1 = MaxPooling2D(pool_size=(2, psh[1]))(conv1)

    # 2nd conv layer
    conv2 = Convolution2D(32, 3, 3, border_mode='same', init='he_normal', name='conv2_{}'.format(seg_length))(conv1)
    conv2 = BatchNormalization(axis=1, mode=2, name='BN2_{}'.format(seg_length))(conv2)
    conv2 = keras.layers.advanced_activations.ELU(alpha=1.0)(conv2)
    conv2 = MaxPooling2D(pool_size=(3, psh[2]))(conv2)

    # 3rd conv layer
    conv3 = Convolution2D(32, 3, 3, border_mode='same', init='he_normal', name='conv3_{}'.format(seg_length))(conv2)
    conv3 = BatchNormalization(axis=1, mode=2, name='BN3_{}'.format(seg_length))(conv3)
    conv3 = keras.layers.advanced_activations.ELU(alpha=1.0)(conv3)
    conv3 = MaxPooling2D(pool_size=(2, psh[3]))(conv3)

    # 4th conv layer
    conv4 = Convolution2D(32, 3, 3, border_mode='same', init='he_normal', name='conv4_{}'.format(seg_length))(conv3)
    conv4 = BatchNormalization(axis=1, mode=2, name='BN4_{}'.format(seg_length))(conv4)
    conv4 = keras.layers.advanced_activations.ELU(alpha=1.0)(conv4)
    conv4 = MaxPooling2D(pool_size=(2, psh[4]))(conv4)

    # 5th conv layer
    conv5 = Convolution2D(32, 3, 3, border_mode='same', init='he_normal', name='conv5_{}'.format(seg_length))(conv4)
    conv5 = BatchNormalization(axis=1, mode=2, name='BN5_{}'.format(seg_length))(conv5)
    conv5 = keras.layers.advanced_activations.ELU(alpha=1.0)(conv5)
    conv5 = MaxPooling2D(pool_size=(4, psh[5]))(conv5)

    # Flatten the output of last conv layer (conv5)
    conv5 = Flatten()(conv5)
    
    # output layer
    out = Dense(10, input_shape=[32], activation='softmax')(conv5)
    
    # define model
    model = Model(input=x, output=out)
    
    return model

## Declare Model

In [5]:
# Model parameters
seg_length = 30
pool_sizes_hori = [2, 2, 2, 2, 1] # sizes of pooling layers (in horizontal direction)

# Generate model
model = gen_model(seg_length, pool_sizes_hori)

In [6]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 1, 96, 30)     0                                            
____________________________________________________________________________________________________
conv1_30 (Convolution2D)         (None, 32, 96, 30)    320         input_1[0][0]                    
____________________________________________________________________________________________________
BN1_30 (BatchNormalization)      (None, 32, 96, 30)    128         conv1_30[0][0]                   
____________________________________________________________________________________________________
elu_1 (ELU)                      (None, 32, 96, 30)    0           BN1_30[0][0]                     
___________________________________________________________________________________________
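
The parameter counts in the summary can be reproduced by hand. The sketch below assumes 3 x 3 kernels with one bias per filter, and reads the batch-normalization count of 128 as four values per channel (scale, shift, and two running statistics). The 32 inputs to the dense layer are taken from `input_shape=[32]` in `gen_model`. The helper names are ours.

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    """Kernel weights plus one bias per output channel."""
    return in_channels * kernel_h * kernel_w * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weight matrix plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


conv1 = conv2d_params(1, 32, 3, 3)        # 320, matches conv1_30 in the summary
later_conv = conv2d_params(32, 32, 3, 3)  # 9248 for each of conv2 to conv5
batch_norm = 4 * 32                       # 128, matches BN1_30 in the summary
dense = dense_params(32, 10)              # 330 for the softmax output layer
```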

In [6]:
# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

## Train Model

(Keep the "model history" returned by each training run in a list so the whole training process can be plotted.)
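
One possible way to combine several histories for plotting is sketched below. It assumes each training run returns a Keras `History` object whose `.history` attribute is a dict mapping metric names to per-epoch lists. The helper `merge_histories` is hypothetical, and the numbers are placeholders rather than results from this notebook.

```python
def merge_histories(histories):
    """Concatenate per-epoch metric lists from several training runs, in order."""
    merged = {}
    for history in histories:
        for metric, values in history.items():
            merged.setdefault(metric, []).extend(values)
    return merged


# Stand-in dictionaries shaped like `model_his.history`; the values are placeholders
run_1 = {'loss': [2.0, 1.5], 'val_loss': [2.1, 1.7]}
run_2 = {'loss': [1.2], 'val_loss': [1.6]}
all_runs = merge_histories([run_1, run_2])
```

In the notebook, the list passed in would hold `model_his.history` from each call to `model.fit`.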

In [7]:
# Split Dataset (90% train + 10% dev)
X_train, X_dev, Y_train, Y_dev = train_test_split(X, Y, test_size=0.1)

In [8]:
# Train Model
model_his = model.fit(X_train, Y_train, batch_size=256, validation_data=(X_dev, Y_dev), nb_epoch=2)

Train on 36450 samples, validate on 4050 samples
Epoch 1/2
Epoch 2/2


## Evaluate Model

In [None]:
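# Evaluate Model
# NOTE: X_test and Y_test are not defined in the cells above;
# load the held-out test split before running this cell.
model_eval = model.evaluate(X_test, Y_test)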

## Save Model Weights

In [10]:
# Save Model weights
import os
if not os.path.exists('./weights/'):
    os.mkdir('./weights/')
model.save_weights('./weights/cnn_{}.h5'.format(seg_length))

## Load Model Weights

In [None]:
# Load Model weights (use the same file name that the save cell writes)
weights_path = './weights/cnn_{}.h5'.format(seg_length)
model.load_weights(weights_path, by_name=True)