# First Keras project

Keeping it simple. Probably a little too simple, but let's finish this, learn what we learn and move on to something more educational.

(I'm working with Kaggle's MNIST digits data. It's all preprocessed except for one little thing, which is why this is too easy.)

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

keras.__version__

'2.3.0-tf'

In [3]:
#tf.python.client.device_lib.list_local_devices() #verifies GPU type.
tf.config.experimental.list_physical_devices('GPU')#verifies GPU detected

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [4]:
try: ## this means that if I re-run all cells, I don't have to wait for pd.read_csv, which is a little slow.
    dataframe.head()
except NameError:
    dataframe = pd.read_csv('data/train.csv')
dev_df = dataframe.sample(n=3000, random_state=1)
train_df = dataframe.drop(dev_df.index)
assert train_df.shape[1] == 785 #should be 784 + 1

In [5]:
## I have some learning to do with datasets. 
## So the cells below don't get used as of 16/11/20.

def dataframe_to_dataset(dataframe, batch_size=64, label='label'):
    ds = tf.data.Dataset.from_tensor_slices((dataframe.drop(label,axis=1).to_numpy(), dataframe[label].to_numpy()))
    ## shuffle() and batch() return new datasets, so the results must be reassigned.
    ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

#dev_ds = dataframe_to_dataset(dev_df)
#train_ds = dataframe_to_dataset(train_df)
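
A minimal sketch of one thing that is easy to trip over with datasets, using made-up toy arrays rather than the MNIST data: `tf.data.Dataset` methods such as `shuffle` and `batch` return a new dataset instead of modifying the one they are called on, so the result has to be kept.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the pixel columns and labels (not the real data).
features = np.arange(20, dtype="float32").reshape(10, 2)
labels = np.arange(10)

ds = tf.data.Dataset.from_tensor_slices((features, labels))

# Calling batch() without keeping the result leaves ds unchanged.
ds.batch(4)
unbatched_shape = tuple(next(iter(ds))[0].shape)  # still one example at a time

# Keeping the returned datasets gives shuffled batches.
batched = ds.shuffle(buffer_size=len(features)).batch(4)
batch_shapes = [tuple(x.shape) for x, _ in batched]

print(unbatched_shape)  # (2,)
print(batch_shapes)     # [(4, 2), (4, 2), (2, 2)]
```

A batched dataset like this can be passed straight to `model.fit(train_ds, validation_data=dev_ds, epochs=...)`. In that case `batch_size` is left out of `fit`, since the dataset already yields batches.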

In [6]:
from tensorflow.keras import layers

##Build the model
inputs = keras.Input(shape=(784,))  ## trailing comma makes this a 1-tuple; (784) is just the int 784
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
x = layers.Dense(100, activation='relu')(x)
x = layers.Dense(100, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs,outputs=outputs, name='simple_model')
model.summary()

Model: "simple_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
rescaling (Rescaling)        (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 100)               78500     
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
Total params: 89,610
Trainable params: 89,610
Non-trainable params: 0
_________________________________________________________________
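
The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit. A small sketch of that arithmetic (the helper name `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width, followed by the widths of the three Dense layers.
layer_sizes = [784, 100, 100, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [78500, 10100, 1010]
print(sum(counts))  # 89610
```

The `Rescaling` layer contributes nothing because it only multiplies by a constant.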


In [7]:
## Cool. 
## Compile and train

model.compile(optimizer="adam", 
                loss="sparse_categorical_crossentropy",
                metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")],
            )
(X, y) = (train_df.drop('label',axis=1).to_numpy(), train_df['label'].to_numpy())
(X_val, y_val) = (dev_df.drop('label', axis = 1).to_numpy(), dev_df['label'].to_numpy())
assert X.shape[1] == 784   #makes sure shape is correct.
assert X.shape[0] == y.shape[0]

# run once to see if everything is in order, or comment out and go to next cell
# model.fit(X, y, epochs=1, batch_size=64, validation_data=(X_val, y_val))

In [8]:
## 25 epochs to get close to 0 loss.
batch_size = 256 ## Trial and error reveals this to be quick without stochastic bumps.
model.fit(X, y, epochs=25, batch_size=batch_size, validation_data=(X_val, y_val))

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x7f5990bb9f40>

In [11]:
## We can now babysit by re-running until there's no consistent improvement in 
## training loss.

model.fit(X, y, epochs=5, batch_size=batch_size, validation_data=(X_val, y_val))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f5990b92e80>
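
The re-run-and-watch loop can also be handed to Keras. A hedged sketch, assuming the built-in `keras.callbacks.EarlyStopping` callback is an acceptable stand-in for manual babysitting. The tiny model, the random data and the `patience` value are made up for illustration and are not from this notebook.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data so the sketch runs on its own (not MNIST).
rng = np.random.default_rng(0)
X_toy = rng.random((256, 20)).astype("float32")
y_toy = rng.integers(0, 3, size=256)

inputs = keras.Input(shape=(20,))
x = layers.Dense(16, activation="relu")(inputs)
outputs = layers.Dense(3, activation="softmax")(x)
toy_model = keras.Model(inputs=inputs, outputs=outputs)
toy_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Stop once training loss has failed to improve for 3 epochs in a row.
stopper = keras.callbacks.EarlyStopping(monitor="loss", patience=3)

max_epochs = 50  # upper bound; training may stop earlier
history = toy_model.fit(X_toy, y_toy, epochs=max_epochs, batch_size=64,
                        callbacks=[stopper], verbose=0)
epochs_run = len(history.history["loss"])
print(epochs_run)
```

Monitoring `"val_loss"` instead would stop on validation loss, which is a different criterion from the training-loss one described above.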

## First ever Keras model complete.

That model was doomed to overfit the data: after about 45 epochs, training acc = 100%. Validation loss has actually been increasing,
even though validation accuracy has stayed consistently between .972 and .974.
Next, I learn to add L2 regularization to reduce variance.
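
Loss can rise while accuracy stays flat because cross-entropy looks at how confident each prediction is, while accuracy only looks at which class wins. A toy illustration with made-up probabilities (not values from this notebook):

```python
import numpy as np

def sparse_cce(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

labels = np.array([0, 1, 2, 1])

# Earlier in training: the one mistake (last row) is a near miss.
early = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.5, 0.4, 0.1]])

# Later: same predicted classes, but the mistake is now made with high confidence.
late = np.array([[0.98, 0.01, 0.01],
                 [0.01, 0.98, 0.01],
                 [0.01, 0.01, 0.98],
                 [0.98, 0.01, 0.01]])

print(accuracy(early, labels), accuracy(late, labels))      # same accuracy
print(sparse_cce(early, labels), sparse_cce(late, labels))  # loss goes up
```

One confidently wrong example is enough to push the mean loss up, even though every correct prediction became more confident.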

We can [add regularizers to layers](https://keras.io/api/layers/regularizers/) by passing a regularizer as the layer's `kernel_regularizer` argument.
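
For reference, a small sketch of what an L2 kernel regularizer adds to the loss: the regularization factor times the sum of squared weights in that layer's kernel, which is how `keras.regularizers.l2` defines it. The kernel below is made up, and the helper name `l2_penalty` is hypothetical. The two factors are the ones this notebook tries next.

```python
import numpy as np

def l2_penalty(kernel, factor):
    """Extra loss from L2 regularization: factor * sum of squared weights."""
    return factor * np.sum(np.square(kernel))

# Made-up 2x2 kernel, only to make the arithmetic easy to follow.
toy_kernel = np.array([[1.0, -2.0],
                       [0.5, 0.0]])

weak = l2_penalty(toy_kernel, 0.0003)
strong = l2_penalty(toy_kernel, 0.01)
print(weak, strong, strong / weak)
```

The `loss` Keras reports during training includes this penalty, so loss values from a regularized model are not directly comparable with those from the unregularized one.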

In [14]:
##Build a model with regularization

## I ran this once with alpha = .01 and it made very slow progress. That might be necessary,
## but I am starting with a conservative .0003. We don't have that much variance in the
## simple model.

inputs = keras.Input(shape=(784,))  ## trailing comma makes this a 1-tuple; (784) is just the int 784
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
x = layers.Dense(100, activation='relu',
                    kernel_regularizer=keras.regularizers.l2(0.0003))(x)
x = layers.Dense(100, activation='relu',
                    kernel_regularizer=keras.regularizers.l2(0.0003))(x)
outputs = layers.Dense(10, activation='softmax')(x)
model_reg = keras.Model(inputs=inputs,outputs=outputs, name='L2_Regularized_Model')
model_reg.summary()


Model: "L2_Regularized_Model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
rescaling_2 (Rescaling)      (None, 784)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 100)               78500     
_________________________________________________________________
dense_7 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_8 (Dense)              (None, 10)                1010      
Total params: 89,610
Trainable params: 89,610
Non-trainable params: 0
_________________________________________________________________


In [15]:
## Compile and run once to see if everything looks right
model_reg.compile(optimizer="adam", 
                loss="sparse_categorical_crossentropy",
                metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")],
            )
model_reg.fit(X, y, epochs=1, batch_size=batch_size, validation_data=(X_val, y_val))



<tensorflow.python.keras.callbacks.History at 0x7f5990f6a400>

In [17]:
model_reg.fit(X, y, epochs=100, verbose = 2, batch_size=batch_size, validation_data=(X_val, y_val))

Epoch 1/100
153/153 - 1s - loss: 0.0484 - acc: 0.9936 - val_loss: 0.1415 - val_acc: 0.9730
Epoch 2/100
153/153 - 1s - loss: 0.0436 - acc: 0.9962 - val_loss: 0.1323 - val_acc: 0.9707
Epoch 3/100
153/153 - 1s - loss: 0.0420 - acc: 0.9965 - val_loss: 0.1290 - val_acc: 0.9747
Epoch 4/100
153/153 - 1s - loss: 0.0376 - acc: 0.9983 - val_loss: 0.1204 - val_acc: 0.9753
Epoch 5/100
153/153 - 1s - loss: 0.0332 - acc: 0.9993 - val_loss: 0.1088 - val_acc: 0.9807
Epoch 6/100
153/153 - 1s - loss: 0.0317 - acc: 0.9994 - val_loss: 0.1151 - val_acc: 0.9770
Epoch 7/100
153/153 - 1s - loss: 0.0304 - acc: 0.9996 - val_loss: 0.1091 - val_acc: 0.9757
Epoch 8/100
153/153 - 1s - loss: 0.0353 - acc: 0.9977 - val_loss: 0.1212 - val_acc: 0.9730
Epoch 9/100
153/153 - 1s - loss: 0.0381 - acc: 0.9967 - val_loss: 0.1138 - val_acc: 0.9770
Epoch 10/100
153/153 - 1s - loss: 0.0456 - acc: 0.9945 - val_loss: 0.1539 - val_acc: 0.9657
Epoch 11/100
153/153 - 1s - loss: 0.0480 - acc: 0.9943 - val_loss: 0.1367 - val_acc: 0.97

<tensorflow.python.keras.callbacks.History at 0x7f5990f16eb0>
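
The `153/153` in the log is the number of batches per epoch. A quick check of that arithmetic, assuming Kaggle's `train.csv` has 42,000 rows (the notebook never prints the row count, so treat that figure as an assumption). The 3,000-row dev split and the batch size of 256 come from the cells above.

```python
import math

total_rows = 42_000  # assumed size of Kaggle's train.csv; not shown in the notebook
dev_rows = 3_000     # dev_df = dataframe.sample(n=3000, ...)
batch_size = 256

train_rows = total_rows - dev_rows
steps_per_epoch = math.ceil(train_rows / batch_size)  # the last batch is partial
print(train_rows, steps_per_epoch)  # 39000 153
```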

## Punch list

Things to do:
- Write a simple function to plot the loss with matplotlib, given a history.
- Learn about the hyperparameter tuning process.
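
A possible starting point for the first item, not a finished utility. The function name `plot_loss` is made up. It assumes the object returned by `model.fit()` exposes a `.history` dict with a `"loss"` key and, when validation data was passed, a `"val_loss"` key.

```python
import matplotlib.pyplot as plt

def plot_loss(history, ax=None):
    """Plot training and (if present) validation loss per epoch.

    Only `history.history` is used, so any object with that attribute works.
    """
    if ax is None:
        _, ax = plt.subplots()
    epochs = range(1, len(history.history["loss"]) + 1)
    ax.plot(epochs, history.history["loss"], label="training loss")
    if "val_loss" in history.history:
        ax.plot(epochs, history.history["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    return ax
```

Usage would look like `history = model_reg.fit(...)` followed by `plot_loss(history)`. Each `fit` call returns its own history, so repeated babysitting runs produce separate curves unless the lists are concatenated first.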

But first, **dropout** regularization.

In [18]:
model

<tensorflow.python.keras.engine.training.Model at 0x7f5990d56880>