# Final Model Training

Here, we will train the final model and export it so it can be used in the web application.

Most of these steps also appear in the other notebooks, so we describe them only briefly here.

In [12]:
# mount Google Drive so we can access its filesystem
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
# necessary imports
import os
import pandas as pd
import numpy as np

# so we can access local modules within Colab
os.chdir('/content/drive/My Drive/auto-age-detector-model')

# feature selection defined functions
from feature_selection import tree_based_feature_selection

# model creation
from models import create_model

# for feature scaling
from sklearn.preprocessing import StandardScaler

In [0]:
df_train = pd.read_csv('data/audio_training_data_cleaned.csv').drop(columns=['Unnamed: 0', 'filename'])
# drop any rows with null values that earlier cleaning may have missed
df_train = df_train.dropna(how='any', axis=0)
# separate the features from the target (age group)
X_train = df_train.drop(columns=['age'])
y_train = df_train['age']

In [0]:
X_train,data_transformer = tree_based_feature_selection(X_train,y_train,
                                                        n_estimators=75)

In [0]:
replaced = {'teens':0,'twenties':1,'thirties':2,'fourties':3,'fifties':4,
            'sixties':5,'seventies':6,'eighties':7}

# https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array

# the Keras model needs one-hot encoded targets
y_train_ohe = y_train.replace(replaced)
y_train_ohe = np.eye(np.max(y_train_ohe)+1)[y_train_ohe]
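
The indexing trick above works because row i of the identity matrix is the one-hot vector for class i. The sketch below shows the encoding on a toy label array, and how a web application might map a predicted class index back to an age group. The `labels` values and the `index_to_age` reverse lookup are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# same mapping as in the notebook (label spelling kept as-is)
replaced = {'teens': 0, 'twenties': 1, 'thirties': 2, 'fourties': 3,
            'fifties': 4, 'sixties': 5, 'seventies': 6, 'eighties': 7}

# toy integer labels standing in for y_train.replace(replaced)
labels = np.array([1, 0, 7, 3])

# row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(replaced))[labels]

# hypothetical reverse lookup for decoding predictions
index_to_age = {index: age for age, index in replaced.items()}
decoded = [index_to_age[int(i)] for i in one_hot.argmax(axis=1)]
```

This sketch sizes the identity matrix with `len(replaced)` rather than the largest label in the data, so the encoding keeps eight columns even if the oldest age group is missing from a particular sample.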

In [0]:
# standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
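
For reference, standardization subtracts each column's mean and divides by its standard deviation, with both statistics computed on the training data. A small NumPy sketch of the same arithmetic, using toy values rather than the notebook's data:

```python
import numpy as np

# toy feature matrix: 4 samples, 2 features (illustrative values only)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)  # population standard deviation (ddof=0)
X_scaled = (X - mean) / std
```

New inputs should be transformed with the stored training `mean` and `std`, not with statistics recomputed on the new data.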

In [0]:
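# seed NumPy's random number generator for reproducibility
# (the Keras backend keeps its own random state, so results may still vary between runs)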
from numpy.random import seed
seed(1)

In [19]:
X_train.shape

(73765, 85)

In [20]:
# get the model
model = create_model(dropout=0.1,learning_rate=1e-3)

model.fit(X_train,y_train_ohe,batch_size=32,validation_split=0.15,
          epochs=40)

Train on 62700 samples, validate on 11065 samples
Epoch 1/40
...
Epoch 40/40


<keras.callbacks.History at 0x7fc24316d5f8>
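
The sample counts in the training log follow from `validation_split=0.15`: Keras holds out the last fraction of the rows, before shuffling, as validation data. The short check below assumes the split index is obtained by truncation, which reproduces the logged counts.

```python
n_samples = 73765          # rows in X_train, from X_train.shape
validation_split = 0.15

# assumed rule: truncate the split index, train on the first part
n_train = int(n_samples * (1 - validation_split))
n_val = n_samples - n_train

print(n_train, n_val)      # prints: 62700 11065
```

Because the held-out rows are taken from the end of the array, a file with a meaningful row order would need shuffling beforehand for the validation set to be representative.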

We save the trained model to an HDF5 file and download it from Colab.

In [0]:
model.save('model1.h5')
from google.colab import files
files.download(os.path.join(os.getcwd(),'model1.h5'))

We also save the fitted feature selector and scaler so that the web application can apply exactly the same transformations to new inputs.

In [0]:
import pickle
with open('feature_selector.pkl','wb') as feat_sel:
  pickle.dump(data_transformer,feat_sel)

In [0]:
with open('standard_scaler.pkl','wb') as standard_scaler:
  pickle.dump(scaler,standard_scaler)
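
On the web application side, the saved objects would be read back with `pickle.load` and applied in the same order as during training: feature selection first, then scaling. The sketch below shows only the save and load round trip. The plain dict standing in for a fitted object and the temporary directory are assumptions for illustration.

```python
import os
import pickle
import tempfile

# stand-in for a fitted object such as `scaler` (illustrative values only)
fitted_params = {'mean': [2.5, 25.0], 'scale': [1.1, 11.2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'standard_scaler.pkl')

    # write in binary mode, as in the notebook
    with open(path, 'wb') as f:
        pickle.dump(fitted_params, f)

    # read back in binary mode; only unpickle files you trust
    with open(path, 'rb') as f:
        restored = pickle.load(f)
```

Pickled scikit-learn objects are generally tied to the library version that created them, so the loading environment should use a compatible scikit-learn version.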

In [36]:
test = np.array([1,3,4,3])
isinstance(test,np.ndarray)

True