## Keras and MNIST ##

Let's have another look at the MNIST dataset and this time build some classifiers using Keras.

In [None]:
from tensorflow.keras.datasets import mnist
import random

# Load in the MNIST data
(train_X, train_Y), (test_X, test_Y) = mnist.load_data()

# Hold out the last 20% of the training images as a validation set
N = int(.8 * train_X.shape[0])
val_X = train_X[N:]
val_Y = train_Y[N:]
train_X = train_X[:N]
train_Y = train_Y[:N]

# import matplotlib for visualization
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

for image, label in zip(train_X[:3], train_Y[:3]):
    print(type(image), type(label))
    print('Label:', label)
    print('Digit in the image')
    plt.imshow(image.reshape(28,28),cmap='gray')
    plt.show()



We still have some work to do.  To set up a binary classification task, let's limit ourselves to 0's and 8's.





In [None]:

# Keep only the images labeled 0 or 8 (zeros first, then eights)
ze_train_X = np.concatenate((train_X[train_Y == 0,:], train_X[train_Y == 8,:]), axis = 0)
ze_train_Y = np.concatenate((train_Y[train_Y == 0], train_Y[train_Y == 8]), axis = 0)

ze_test_X = np.concatenate((test_X[test_Y == 0,:], test_X[test_Y == 8,:]), axis = 0)
ze_test_Y = np.concatenate((test_Y[test_Y == 0], test_Y[test_Y == 8]), axis = 0)

# And set all 8's to 1's for the binary classification task (i.e., predict 0 or 1)
ze_train_Y[ze_train_Y == 8] = 1
ze_test_Y[ze_test_Y == 8] = 1

# Let's shuffle the order of the training data
ze_train_indices = np.arange(ze_train_X.shape[0])
np.random.shuffle(ze_train_indices)

ze_train_X = ze_train_X[ze_train_indices,:]
ze_train_Y = ze_train_Y[ze_train_indices]

# We'll take another look at the selected training data
for image, label in zip(ze_train_X[:5], ze_train_Y[:5]):
    print(type(image), type(label))
    print('Label:', label)
    print('Digit in the image')
    plt.imshow(image.reshape(28,28),cmap='gray')
    plt.show()

print(np.sum(ze_train_Y == 0))
print(np.sum(ze_train_Y == 1))
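
The selection, relabeling, and shuffling steps above can be wrapped into a single helper. The following is a minimal sketch, not part of the original notebook: `select_binary` and its parameters are hypothetical names introduced for illustration, and the demo arrays are tiny synthetic stand-ins for MNIST. It uses `np.isin` to build one mask instead of concatenating the two classes, which yields the same set of examples in a different order.

```python
import numpy as np

def select_binary(X, Y, negative=0, positive=8, seed=0):
    """Keep only two digit classes, relabel them as 0/1, and shuffle.

    Hypothetical helper for illustration; not part of the notebook.
    """
    mask = np.isin(Y, [negative, positive])          # True where the label is one of the two digits
    X_sel = X[mask]
    Y_sel = (Y[mask] == positive).astype(np.uint8)   # positive digit -> 1, negative digit -> 0
    order = np.random.default_rng(seed).permutation(X_sel.shape[0])
    return X_sel[order], Y_sel[order]

# Synthetic stand-in for MNIST: six 2x2 "images" with assorted digit labels
X_demo = np.arange(6 * 4).reshape(6, 2, 2)
Y_demo = np.array([0, 8, 3, 8, 0, 5])

X_bin, Y_bin = select_binary(X_demo, Y_demo)
print(X_bin.shape)   # four of the six images are kept
print(Y_bin)         # two 0's and two 1's, in shuffled order
```

The same helper could be applied to `val_X` and `val_Y` from the first cell to obtain a 0-versus-8 validation set.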



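Now let's train a model.  First we flatten each 28x28 image into a 784-element vector and scale the pixel values to the range [0, 1].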

In [None]:
# Reshape and normalize input
ze_train_X = ze_train_X.reshape((ze_train_X.shape[0], -1)) / 255.
ze_test_X = ze_test_X.reshape((ze_test_X.shape[0], -1)) / 255.

print(ze_train_X.shape)
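
To see what the reshape and division do, here is a small sketch on synthetic data. The random `images` array is an assumption standing in for a batch of MNIST digits. It shows that `-1` lets NumPy infer the 784 columns and that pixels are laid out row by row in the flattened vector.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 3 images, 28x28, uint8 in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# -1 asks NumPy to infer the second dimension: 28 * 28 = 784 pixels per image
flat = images.reshape((images.shape[0], -1)) / 255.

print(flat.shape)                              # (3, 784)
print(flat.min() >= 0.0, flat.max() <= 1.0)    # values now lie in [0, 1]

# Row-major layout: pixel (row, col) of an image lands at index 28 * row + col
row, col = 2, 5
print(flat[0, 28 * row + col] == images[0, row, col] / 255.)
```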

We will build up a sequential model, layer by layer.  

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

model.add(Dense(units=64, activation='sigmoid', input_dim=28*28))
# Optional second hidden layer; uncomment to experiment with a larger model
#model.add(Dense(units=250, activation='sigmoid'))
model.add(Dense(units=1, activation='sigmoid'))

# Specify the loss, optimizer and any additional metrics to follow
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Train the model with the training data, set epochs and batch size
model.fit(ze_train_X, ze_train_Y, epochs=10, batch_size=32)
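
To make the architecture concrete, here is a rough NumPy sketch of the computation a 784-64-1 sigmoid network performs, along with the binary cross-entropy loss. It is an illustration under stated assumptions, not Keras's implementation: the weights are random and untrained, the initialization scale is arbitrary, the input batch is synthetic, and the `eps` clipping is just a guard against `log(0)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass: 784 inputs -> 64 sigmoid units -> 1 sigmoid output."""
    hidden = sigmoid(X @ W1 + b1)       # shape (batch, 64)
    return sigmoid(hidden @ W2 + b2)    # shape (batch, 1), values in (0, 1)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps keeps the logarithms finite."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.05, size=(784, 64))   # random, untrained weights
b1 = np.zeros(64)
W2 = rng.normal(scale=0.05, size=(64, 1))
b2 = np.zeros(1)

X = rng.random((5, 784))                      # synthetic batch of 5 flattened "images"
y = np.array([0, 1, 1, 0, 1], dtype=float).reshape(-1, 1)

probs = forward(X, W1, b1, W2, b2)
print(probs.shape)                            # (5, 1)
print(binary_crossentropy(y, probs))

# Trainable parameters: 784*64 + 64 + 64*1 + 1
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                               # 50305
```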


And then apply the model to the test data.

In [None]:
loss_and_metrics = model.evaluate(ze_test_X, ze_test_Y)
print(loss_and_metrics)

# Sometimes it is helpful to look at the raw predictions.  If the model is
# learning something, you should see some patterns.  All 0's or 1's or some fixed
# value are signs something is wrong.
raw_preds = model.predict(ze_test_X)
print(raw_preds)
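
The raw predictions are sigmoid outputs between 0 and 1, one per test image. One way to turn them into class labels is to threshold at 0.5, sketched below. The arrays here are made-up illustrative values, not outputs of the trained model, and the 0.5 cutoff is the conventional choice rather than something the notebook specifies.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n, 1), the same layout model.predict produces here
raw_preds_demo = np.array([[0.03], [0.91], [0.48], [0.77], [0.50]])
labels_demo = np.array([0, 1, 1, 1, 0])

# Threshold at 0.5: strictly greater counts as class 1
pred_labels = (raw_preds_demo.ravel() > 0.5).astype(int)
accuracy = np.mean(pred_labels == labels_demo)

print(pred_labels)   # [0 1 0 1 0]
print(accuracy)      # 0.8

# A quick sanity check on the spread of the raw outputs:
# a range close to zero would mean the model predicts nearly the same value everywhere
print(raw_preds_demo.min(), raw_preds_demo.max())
```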

Do you notice a dip in performance between the training data and the test data?  What do you notice about the raw outputs?


Consider these questions.  When we adjust the size of the model, should we consider the performance of the adjusted model on the training data or the test data?  Is there any reason why we might want to evaluate the model on a dataset other than the test or training dataset?

How do you know if you are using a GPU?  Once we get to more involved examples, you will be able to tell.  You don't have access to a GPU by default, but you can [select a GPU-enabled runtime](https://colab.research.google.com/notebooks/gpu.ipynb).

In [None]:
import tensorflow as tf
# Returns the name of a GPU device if one is available, otherwise an empty string
tf.test.gpu_device_name()


How robust do you think the network we trained will be to novel data?  Let's generate some by randomly rotating, shifting, and vertically flipping the test images.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    vertical_flip=True)


for X_batch, Y_batch in datagen.flow(ze_test_X.reshape((-1, 28,28, 1)), ze_test_Y, batch_size=5):
    for image, label in zip(X_batch, Y_batch):
        print(type(image), type(label))
        print('Label:', label)
        print('Digit in the image')
        plt.imshow(image.reshape(28,28),cmap='gray')
        plt.show()

    break




How do you think that our model will perform on this augmented data?


In [None]:
# Evaluate on a single augmented batch of 32 images (a small, noisy sample)
for X_batch, Y_batch in datagen.flow(ze_test_X.reshape((-1, 28,28, 1)), ze_test_Y, batch_size=32):

    X_batch = X_batch.reshape((-1,28*28))

    loss_and_metrics = model.evaluate(X_batch, Y_batch)
    print(loss_and_metrics)

    raw_preds = model.predict(X_batch)

    break

Ouch!
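
One plausible reason for the drop is that a dense network on flattened pixels learns a separate weight for each pixel position, so moving a digit changes which inputs are active. The sketch below illustrates this with a synthetic ring (a crude "0") rather than real MNIST data. `ring_image` and `cosine` are hypothetical helpers, and `np.roll` is a simplified stand-in for the generator's shift. The 5-pixel shift sits within the range that `width_shift_range=0.2` allows on a 28-pixel-wide image (0.2 * 28 = 5.6).

```python
import numpy as np

def ring_image(size=28, radius=7.0, thickness=1.5):
    """Draw a crude '0': a ring of ones on a black background."""
    rows, cols = np.mgrid[:size, :size]
    center = (size - 1) / 2
    dist = np.sqrt((rows - center) ** 2 + (cols - center) ** 2)
    return (np.abs(dist - radius) <= thickness).astype(float)

def cosine(a, b):
    """Cosine similarity between two images viewed as flattened vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

original = ring_image()
shifted = np.roll(original, shift=5, axis=1)   # move the ring 5 pixels to the right

same = cosine(original, original)
moved = cosine(original, shifted)
print(same)    # approximately 1: identical inputs
print(moved)   # much lower: the flattened vectors now overlap only partially
```

To a person the shifted ring is still clearly the same shape, but as a 784-element input vector it looks quite different from the original, which is the kind of change the model never saw during training.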