# Train model with Jupyter & Keras

This notebook builds a **1D convolutional model** in Keras and trains it on exactly the same voice data as the pure TensorFlow script.

The Jupyter + Keras model quickly reaches an *accuracy of over 0.6* on the cross-validation set.
<br>The TensorFlow model gets stuck at *0.35*.

**The goal is to find the reason for this discrepancy.**

#### Confirm environment
List the installed packages and check the Python version.

In [1]:
# list modules
!pip freeze

absl-py==0.1.13
adium-theme-ubuntu==0.3.4
alabaster==0.7.7
astor==0.6.2
Babel==1.3
backports-abc==0.5
backports.functools-lru-cache==1.4
backports.shutil-get-terminal-size==1.0.0
backports.weakref==1.0.post1
bcolz==1.2.0
bleach==1.5.0
boto==2.38.0
certifi==2018.1.18
chardet==2.3.0
configparser==3.5.0
croniter==0.3.8
cryptography==1.2.3
cycler==0.10.0
decorator==4.2.1
docutils==0.12
duplicity==0.7.6
entrypoints==0.2.3
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
futures==3.2.0
gast==0.2.0
grpcio==1.10.0
html5lib==0.9999999
idna==2.0
ipaddress==1.0.16
ipykernel==4.8.2
ipython==5.6.0
ipython-genutils==0.2.0
ipywidgets==7.2.0
isoweek==1.3.3
jedi==0.11.1
Jinja2==2.10
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
Keras==2.1.5
lockfile==0.12.2
Markdown==2.6.11
MarkupSafe==1.0
matplotlib==2.1.0
mistune==0.8.3
mock==2.0.0
msgpack-python==0.4.6
nbconvert==5.3.1
nbformat==4.4.0
ndg-httpsclient==0.4.0
nose==1.3.7
notebook==5.4.1
numpy==

In [2]:
# confirm python version
from platform import python_version
import sys
print(sys.executable)
print("Python version: ", python_version())

/home/paperspace/anaconda3/bin/python
Python version:  3.6.3
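
The full `pip freeze` listing is long. A shorter check that reports only the packages this notebook depends on could look like the sketch below. The helper name `report_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute, which is common but not guaranteed.

```python
import importlib

def report_versions(module_names):
    """Return {module name: version string} for the given modules."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # many packages expose __version__, but not all of them do
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

# add "tensorflow" to the list to include it; importing it can take a few seconds
for name, version in report_versions(["numpy", "bcolz"]).items():
    print(name, version)
```

Comparing this short report between the notebook kernel and the environment that runs the TensorFlow script is one quick way to rule out a version mismatch.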


## Import modules
We'll need a few additional libraries, so let's import them.

In [3]:
# filter out warnings
import warnings
warnings.filterwarnings('ignore') 

In [26]:
import bcolz
import numpy as np
import os
import tensorflow
import time

# use the Keras implementation bundled with TensorFlow
from tensorflow.python.keras.layers import Dense, BatchNormalization, Dropout, Conv1D
from tensorflow.python.keras.layers import Input, MaxPooling1D, GlobalMaxPool1D, Activation
from tensorflow.python.keras.optimizers import Adam
from tensorflow.python.keras.models import Model

In [27]:
# define the bcolz array saving and loading functions
def bcolz_save(fname, arr):
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()
def bcolz_load(fname):
    return bcolz.open(fname)[:]

## Prepare data
Define the path to the parent directory of the downloaded voice data. All the `.bc` files should be located there.

In [28]:
path_to_data = "/home/paperspace/tfvoice/tensorflow_speech_recognition/data/main/redownloaded/data_redownloaded"

#### Load the data

In [29]:
# reload the y
train_y = bcolz_load(path_to_data + os.path.sep + "train_y" + ".bc")
cv_y = bcolz_load(path_to_data + os.path.sep + "cv_y" + ".bc")
test_y = bcolz_load(path_to_data + os.path.sep + "test_y" + ".bc")

In [30]:
# reload the Test & CV X
# raw
cv_X = bcolz_load(path_to_data + os.path.sep + "cv_X" + ".bc")
test_X = bcolz_load(path_to_data + os.path.sep + "test_X" + ".bc")

In [31]:
# reload the Train X
# raw
train_Xs = []
for i in range(7):
    train_subset = bcolz_load(path_to_data + os.path.sep + "train_X" + str(i + 1) +".bc")
    train_Xs.append(train_subset)
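
The loading cells above build each path by string concatenation. A minimal sketch of the same naming scheme using `os.path.join` follows. `data_file` is a hypothetical helper, and the root directory in the example is a placeholder rather than the real data location.

```python
import os

def data_file(root, name):
    """Build the path of a bcolz directory such as <root>/train_y.bc."""
    return os.path.join(root, name + ".bc")

# placeholder root directory, for illustration only
root = os.path.join("data", "example")

# the seven training subsets are named train_X1 ... train_X7
train_paths = [data_file(root, "train_X" + str(i + 1)) for i in range(7)]
print(train_paths[0])
print(train_paths[-1])
```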

#### Split the Train y
Since Train X is stored as 7 subsets, we split Train y the same way so that each subset of examples can be passed to the model together with its labels.

In [32]:
# each of the 7 Train X subsets has exactly 3168 examples
train_ys = []
subset_size = 3168
for i in range(7):
    train_y_subset = train_y[subset_size * i : subset_size * (i + 1)]
    train_ys.append(train_y_subset)
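
The slicing above relies on `train_y` holding exactly 7 × 3168 rows, in the same order as the X subsets. A sketch of a more defensive version is shown below, run on synthetic labels. `split_labels` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def split_labels(labels, num_subsets, subset_size):
    """Split labels into equally sized consecutive subsets, checking the length first."""
    expected = num_subsets * subset_size
    if len(labels) != expected:
        raise ValueError("expected {} labels, got {}".format(expected, len(labels)))
    return [labels[subset_size * i : subset_size * (i + 1)] for i in range(num_subsets)]

# synthetic stand-in for train_y: 7 * 3168 rows of 12 one-hot columns
fake_y = np.zeros((7 * 3168, 12))
subsets = split_labels(fake_y, num_subsets=7, subset_size=3168)
print(len(subsets), subsets[0].shape)
```

A length check like this would fail loudly if the labels and the X subsets ever got out of step, which is one possible source of a silent accuracy gap between two pipelines.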

In [33]:
train_ys[0][0]

array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
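
The labels are one-hot vectors of length 12. As a small illustration, the integer class id of the label shown above can be recovered with `np.argmax`. Which word that id corresponds to is not stated in this notebook, so the sketch stops at the index.

```python
import numpy as np

# the first training label, copied from the output above
one_hot = np.array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

# position of the single 1 entry, i.e. the integer class id
class_index = int(np.argmax(one_hot))
print(class_index)  # 2
```

For a whole label matrix, `np.argmax(labels, axis=1)` gives one class id per row.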

#### Expand dimensions for convolutions
Conv1D layers expect each example to have shape (steps, channels), so we add a trailing channel dimension of size 1.

In [34]:
# we need to expand the dimensions for 1D convolutions
expanded_train_Xs = [np.expand_dims(train_X, axis=2) for train_X in train_Xs]
expanded_train_Xs[0].shape

(3168, 16000, 1)

In [35]:
# same for CV
expanded_cv_X = np.expand_dims(cv_X, axis=2)
expanded_cv_X.shape

(3051, 16000, 1)
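
To make the effect of `np.expand_dims(..., axis=2)` concrete, here is a sketch on a tiny made-up array in place of the real audio data.

```python
import numpy as np

# tiny stand-in for the raw audio: 4 clips of 6 samples each
clips = np.arange(24, dtype=np.float32).reshape(4, 6)

expanded = np.expand_dims(clips, axis=2)
print(expanded.shape)  # (4, 6, 1)

# indexing with np.newaxis gives the same result
same = clips[:, :, np.newaxis]
print(np.array_equal(expanded, same))  # True
```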

## Train Models
We're using a simple 1D-convolutional architecture.

In [36]:
# the output layer needs 12 units, one per category
num_categories = 12

Define the network and compile it with the Adam optimizer at a learning rate of 0.0001.

#### 1D Convolutional Community Model

In [37]:
# input layer & batch normalization
inputs = Input(shape = (16000,1))
x_1d = BatchNormalization(name = 'batchnormal_1d_in')(inputs)

# iteratively create 9 blocks of 2 convolutional layers with batchnorm and max-pooling
for i in range(9):
    
    name = 'step'+str(i)
    
    # first 1D convolutional block
    x_1d = Conv1D(8*(2 ** i), (3),padding = 'same', name = 'conv'+name+'_1')(x_1d)
    x_1d = BatchNormalization(name = 'batch'+name+'_1')(x_1d)
    x_1d = Activation('relu')(x_1d)
    
    # second 1D convolutional block
    x_1d = Conv1D(8*(2 ** i), (3),padding = 'same', name = 'conv'+name+'_2')(x_1d)
    x_1d = BatchNormalization(name = 'batch'+name+'_2')(x_1d)
    x_1d = Activation('relu')(x_1d)
    
    # max pooling
    x_1d = MaxPooling1D((2), padding='same')(x_1d)

# final convolution and dense layer
x_1d = Conv1D(1024, (1),name='last1024')(x_1d)
x_1d = GlobalMaxPool1D()(x_1d)
x_1d = Dense(1024, activation = 'relu', name= 'dense1024_onlygmax')(x_1d)
x_1d = Dropout(0.2)(x_1d)

# soft-maxed prediction layer
predictions = Dense(num_categories, activation = 'softmax',name='cls_1d')(x_1d)


model = Model(inputs=inputs, outputs=predictions)
model.compile(Adam(lr=0.0001),loss="categorical_crossentropy", metrics=["accuracy"])
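
To see how the tensor shape evolves through the nine blocks, the arithmetic can be traced without building the model. This sketch assumes the standard Keras behaviour that `Conv1D` with `padding='same'` and stride 1 preserves the sequence length, and that `MaxPooling1D(2, padding='same')` halves it, rounding up.

```python
import math

seq_len = 16000   # input length from Input(shape=(16000, 1))
num_blocks = 9
trace = []        # (filters, length after pooling) for each block

for i in range(num_blocks):
    filters = 8 * (2 ** i)            # same expression as in the Conv1D calls
    seq_len = math.ceil(seq_len / 2)  # effect of the max-pooling layer
    trace.append((filters, seq_len))
    print("block", i, "filters", filters, "length after pooling", seq_len)
```

Under these assumptions the last block outputs 32 steps with 2048 channels. The final 1×1 convolution then maps this to (32, 1024), and `GlobalMaxPool1D` reduces it to a single vector of 1024 values.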

Train for 1 epoch, i.e. one pass over each of the 7 training subsets.

In [38]:
# time it
start = time.time()

In [39]:
# keep track of epoch
cur_epoch_nr = 1

# fit iteratively
for i, expanded_train_X in enumerate(expanded_train_Xs):
    
    # pretty printing
    print(i + 1, "/", len(expanded_train_Xs))
    
    result = model.fit(expanded_train_X, train_ys[i], batch_size=32, epochs=1, 
             validation_data=(expanded_cv_X, cv_y))
    
    # elapsed time since `start`, cumulative across subsets
    duration = time.time() - start
    print("Took {:.2f} seconds\n".format(duration))
    
    # results
    cv_acc = "{:.4f}".format(result.history["val_acc"][0]).replace(".","")
    train_acc = "{:.4f}".format(result.history["acc"][0]).replace(".","")

1 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 29.46 seconds

2 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 47.54 seconds

3 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 65.33 seconds

4 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 83.22 seconds

5 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 101.10 seconds

6 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 119.09 seconds

7 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 137.15 seconds
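
The printed durations are measured from the single `start` timestamp, so they are cumulative. A short sketch of how to recover the time spent on each subset from the values above, along with the number of batches per subset:

```python
# cumulative "Took ..." values copied from the output above
cumulative = [29.46, 47.54, 65.33, 83.22, 101.10, 119.09, 137.15]

# time spent on each subset = difference between consecutive readings
per_subset = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]
print([round(t, 2) for t in per_subset])

# number of batches per subset at batch_size=32 (ceiling division)
steps_per_subset = -(-3168 // 32)
print(steps_per_subset)  # 99
```

The first reading also includes any time that passed between starting the timer and the first `fit` call, so it is not directly comparable to the later ones.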



Train for a second epoch.

In [40]:
# keep track of epoch
cur_epoch_nr = 2

# fit iteratively
for i, expanded_train_X in enumerate(expanded_train_Xs):
    
    # pretty printing
    print(i + 1, "/", len(expanded_train_Xs))
    
    result = model.fit(expanded_train_X, train_ys[i], batch_size=32, epochs=1, 
             validation_data=(expanded_cv_X, cv_y))
    
    # elapsed time since `start`, cumulative across both epochs
    duration = time.time() - start
    print("Took {:.2f} seconds\n".format(duration))
    
    # results
    cv_acc = "{:.4f}".format(result.history["val_acc"][0]).replace(".","")
    train_acc = "{:.4f}".format(result.history["acc"][0]).replace(".","")

1 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 154.82 seconds

2 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 172.61 seconds

3 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 190.42 seconds

4 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 208.25 seconds

5 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 226.45 seconds

6 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 244.75 seconds

7 / 7
Train on 3168 samples, validate on 3051 samples
Epoch 1/1
Took 262.88 seconds
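
The training loop formats the accuracies as strings with the decimal point removed, but never prints them. The sketch below shows what that formatting produces, using a made-up accuracy value rather than a result from this run.

```python
# hypothetical accuracy value, for illustration only
example_acc = 0.7312

formatted = "{:.4f}".format(example_acc)
print(formatted)                    # 0.7312

compact = formatted.replace(".", "")
print(compact)                      # 07312
```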



After just 2 epochs, the model reaches a **validation accuracy of 0.7 - 0.8** with relatively little overfitting (training accuracy of about 0.85).