# Keras Tutorial

Remember that in class we talked about the pipeline of a real computer vision system, in which we:

1. First get the data into the format needed by the later steps. This includes data loading, data pre-processing, dataset splitting (we'll talk about this on Friday), data augmentation (which we're not gonna cover), etc.;

2. Then build the model for feature extraction as well as for the final regression / classification. Remember we have many choices, like linear models, fully connected neural nets, convolutional neural nets, etc. We can implement these models very easily in Keras, with roughly one line of code per layer;

3. Finally, once we have the data and the model, code up the optimization part (for which we'll use gradient descent).

In this tutorial, we'll go over these parts sequentially.

## Data Loading and Pre-processing

So in Keras we don't need anything specific for data; we just use NumPy and represent our data as NumPy arrays. Now we're gonna create some fake data to be used later.

In [2]:
# Import necessary packages
import numpy as np

In [3]:
# Create random NumPy arrays (data loading)
rand_data = np.random.random((1000, 32, 32, 3)) # We have 1000 fake images with spatial size 32 * 32 and 3 channels
rand_label = np.array([0]*500 + [1]*500)        # Create fake binary labels for these images  

print(rand_data.shape)

(1000, 32, 32, 3)


In [5]:
# Split data into train, validation and test sets (we'll talk more about this on Friday)
train_ratio, val_ratio = 0.9, 0.05

X_train = rand_data[:int(rand_data.shape[0]*train_ratio), ...] # ... means all the other axes
y_train = rand_label[:int(rand_data.shape[0]*train_ratio), ...]

X_val = rand_data[int(rand_data.shape[0]*train_ratio):int(rand_data.shape[0]*(train_ratio+val_ratio)), ...]
y_val = rand_label[int(rand_data.shape[0]*train_ratio):int(rand_data.shape[0]*(train_ratio+val_ratio)), ...]

X_test = rand_data[int(rand_data.shape[0]*(train_ratio+val_ratio)):, ...]
y_test = rand_label[int(rand_data.shape[0]*(train_ratio+val_ratio)):, ...]

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)

(900, 32, 32, 3)
(50, 32, 32, 3)
(50, 32, 32, 3)
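One caveat with slicing in order: `rand_label` is 500 zeros followed by 500 ones, so the last 10% of the samples (our validation and test sets) contains only label 1. A common remedy is to shuffle the sample indices before slicing. Below is a minimal sketch of that idea; the helper name `shuffled_split` and the fixed seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def shuffled_split(data, labels, train_ratio=0.9, val_ratio=0.05, seed=0):
    """Shuffle the samples, then slice them into train / val / test parts."""
    n = data.shape[0]
    rng = np.random.RandomState(seed)      # fixed seed only for reproducibility
    order = rng.permutation(n)             # random ordering of the sample indices
    n_train = int(n * train_ratio)
    n_val = int(n * (train_ratio + val_ratio))
    train_idx, val_idx, test_idx = order[:n_train], order[n_train:n_val], order[n_val:]
    return ((data[train_idx], labels[train_idx]),
            (data[val_idx], labels[val_idx]),
            (data[test_idx], labels[test_idx]))

# Same kind of fake data as above
rand_data = np.random.random((1000, 32, 32, 3))
rand_label = np.array([0]*500 + [1]*500)

(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = shuffled_split(rand_data, rand_label)
print(X_tr.shape, X_va.shape, X_te.shape)
```

Indexing data and labels with the same index array keeps each image paired with its label.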


## Model construction

Now that we have all the data, next we're gonna build our model for feature extraction as well as classification. In Keras, you can easily build many kinds of models, as shown below.

In [6]:
import keras
from keras.models import Sequential # Sequential is one of the main models in Keras, which is basically a sequentially stacked series of layers

model = Sequential() # Initialize a Sequential model instance

Using TensorFlow backend.



In [7]:
# First we'll use fully-connected neural nets
from keras.layers import Dense # Dense is Keras's name for fully connected layers

# We can stack layers like Lego blocks by simply using `add()`
# `units` is the number of neurons
# `activation` is the nonlinear function applied to the output of each layer
# We only need to specify `input_dim` (the input dimension) for the first layer, because for later layers the input is just the output of the previous layer
# Once again, the numbers of neurons in the hidden layers (e.g., 64 and 16 here) are design choices

model.add(Dense(units=64, activation='sigmoid', input_dim=32*32*3)) 
model.add(Dense(units=16, activation='sigmoid'))
model.add(Dense(units=1, activation='sigmoid'))
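To make the stacking concrete, here is a rough NumPy sketch of what these three `Dense` layers compute on a batch: each layer multiplies its input by a weight matrix, adds a bias and applies the sigmoid. The small random weights are an illustrative assumption (not Keras's default initializer), and `forward` is a made-up helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
layer_sizes = [32*32*3, 64, 16, 1]   # input_dim, then `units` of each Dense layer

# One (weights, bias) pair per Dense layer; small random values are an
# illustrative choice, not Keras's default initializer
params = [(rng.randn(n_in, n_out) * 0.01, np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    """Apply each layer in turn: output = sigmoid(input @ W + b)."""
    for W, b in params:
        x = sigmoid(x @ W + b)
    return x

x = rng.rand(5, 32*32*3)             # a batch of 5 flattened fake images
out = forward(x, params)
n_params = sum(W.size + b.size for W, b in params)
print(out.shape)                     # (5, 1)
print(n_params)                      # 197729
```

Almost all of the 197,729 parameters sit in the first layer (3072 * 64 weights plus 64 biases), which is one reason flattened images get expensive quickly.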




In [8]:
# Once the model is built, we then configure the learning process with `compile()`
# We need to specify the loss function, the optimizer and the metric we use to evaluate our model
# For the loss here we're using binary cross-entropy, which is specifically for binary classification
# For the optimizer we're using (mini-batch stochastic) gradient descent, which is written as 'sgd' in Keras
# Since we're doing classification, classification accuracy is normally how we evaluate the model

model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
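For intuition about what the loss measures, here is a small NumPy sketch of binary cross-entropy. It follows the textbook formula; the function name and the clipping constant `eps` are assumptions made for this example, and Keras's own implementation may differ in details.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

# A model that always answers 0.5 gets a loss of ln(2)
print(binary_crossentropy([0, 1], [0.5, 0.5]))   # about 0.693 (= ln 2)

# Confident and correct predictions give a much lower loss
print(binary_crossentropy([0, 1], [0.1, 0.9]))   # about 0.105
```

So a loss that stays around 0.69 is a hint that the model is doing no better than guessing.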



In [9]:
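# The above is actually a shortcut that Keras provides for easy implementation. If you want to have more control over the learning process (e.g., the learning rate), you can use the following
# (newer Keras versions name this argument `learning_rate` instead of `lr`)
# Note that this call re-compiles the model without `metrics`, so from here on `evaluate()` reports only the loss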

model.compile(loss=keras.losses.binary_crossentropy, optimizer=keras.optimizers.SGD(lr=0.001))
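To see what the learning rate controls, here is a tiny sketch of the plain gradient descent update, `w <- w - lr * gradient`, on a toy one-dimensional objective. The objective, the helper name `gradient_descent` and the step counts are made up for illustration; the real optimizer applies the same kind of update to every weight, using gradients estimated on mini-batches.

```python
def gradient_descent(grad_fn, w0, lr, steps):
    """Repeat the update w <- w - lr * grad(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3); minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)

w_small = gradient_descent(grad, w0=0.0, lr=0.001, steps=100)  # small steps, slow progress
w_large = gradient_descent(grad, w0=0.0, lr=0.1, steps=100)    # larger steps, ends very close to 3
print(w_small, w_large)
```

With the same number of steps, the smaller learning rate is still far from the minimum, so a small `lr` usually needs more epochs.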

In [10]:
# Up to this point we've only been doing configuration. Now everything is set up, so let's have the model do some real work!

# Since we're now using a fully-connected net, remember we need to flatten each image into a single long vector first
X_train_flat = X_train.reshape((-1, 32*32*3)) # -1 means letting NumPy figure this axis out automatically
X_val_flat = X_val.reshape((-1, 32*32*3))
X_test_flat = X_test.reshape((-1, 32*32*3))

print(X_train_flat.shape)
print(X_val_flat.shape)
print(X_test_flat.shape)

# Then use fit() to actually train our model
# epochs is how many full passes over the training set we make during the update process. The model needs some time to reach a good state!
# batch_size is how many images we use each time to estimate the gradient. Remember that the more we use, the more accurate each update will be, but the slower it will be

model.fit(X_train_flat, y_train, epochs=5, batch_size=32, validation_data=(X_val_flat, y_val))




(900, 3072)
(50, 3072)
(50, 3072)
Train on 900 samples, validate on 50 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0xb30045860>
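As a quick sanity check on `epochs` and `batch_size`, the arithmetic below works out how many gradient updates the `fit()` call above performs. It only uses the numbers from this notebook (900 training images, batches of 32, 5 epochs); the variable names are illustrative.

```python
import math

n_train, batch_size, epochs = 900, 32, 5

# Each epoch visits every training sample once, one batch at a time;
# the last batch is smaller when batch_size does not divide n_train
batches_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)  # 29 4 145
```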

In [11]:
# Now let's see how our model does
# The last compile() call did not set `metrics`, so evaluate() returns only the loss (binary cross-entropy), not the accuracy
loss = model.evaluate(X_test_flat, y_test)
print('The test loss is: {}'.format(loss))

# And make predictions
prob = model.predict(X_test_flat) # These are probabilities, and we want to convert them to class labels
label = np.array(prob > 0.5, dtype=int)

print('The predicted probabilities are: {}'.format(prob))
print('The predicted class labels are: {}'.format(label))


The test loss is: 0.7417427635192871
The predicted probabilities are: [[0.48281336]
 [0.47698927]
 [0.46662387]
 [0.48477855]
 [0.4843165 ]
 [0.47765195]
 [0.48685473]
 [0.47723022]
 [0.47426078]
 [0.49188405]
 [0.48913604]
 [0.4736628 ]
 [0.48325825]
 [0.48183548]
 [0.48358443]
 [0.46566638]
 [0.46648058]
 [0.48024574]
 [0.47309178]
 [0.47772107]
 [0.46777815]
 [0.48181674]
 [0.4698373 ]
 [0.47837853]
 [0.46743447]
 [0.48243085]
 [0.48156023]
 [0.47030056]
 [0.47579846]
 [0.46148956]
 [0.4782727 ]
 [0.47594103]
 [0.47401866]
 [0.4757933 ]
 [0.4740392 ]
 [0.46542567]
 [0.46474433]
 [0.48144367]
 [0.48078394]
 [0.48594508]
 [0.48337954]
 [0.4829551 ]
 [0.47512576]
 [0.45782456]
 [0.47999835]
 [0.4797726 ]
 [0.46572384]
 [0.47631648]
 [0.48253968]
 [0.4623026 ]]
The predicted class labels are: [[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0
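Since the model was last compiled without `metrics`, `evaluate()` does not report an accuracy here. It is easy to compute one by hand from the thresholded predictions, though. In the sketch below, `y_true` and `prob` are small made-up arrays standing in for `y_test` and the output of `model.predict`.

```python
import numpy as np

# Made-up stand-ins for y_test and the output of model.predict (shape (N, 1))
y_true = np.array([1, 1, 0, 1, 0])
prob = np.array([[0.48], [0.61], [0.47], [0.52], [0.55]])

pred = (prob > 0.5).astype(int).ravel()   # threshold, then flatten (N, 1) -> (N,)
accuracy = float(np.mean(pred == y_true)) # fraction of matching labels
print(pred)      # [0 1 0 1 1]
print(accuracy)  # 0.6
```

The `ravel()` matters: comparing an `(N, 1)` array with an `(N,)` array would broadcast to an `(N, N)` matrix and give a meaningless average.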

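The convnet in the next cell shrinks the image step by step, and it helps to track the feature-map sizes by hand. The sketch below assumes the usual size rules (for 'same' padding the output size is `ceil(input / stride)`; for max pooling without padding it is `(input - pool) // stride + 1`); the helper names are made up for this example.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)

def pool_out(size, pool, stride=None):
    """Spatial output size of max pooling with no padding ('valid')."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

size = 32                                   # input images are 32 x 32 x 3
size = conv_same_out(size, stride=2)        # conv, 16 filters  -> 16 x 16 x 16
size = pool_out(size, pool=2)               # maxpool 2 x 2     -> 8 x 8 x 16
size = conv_same_out(size, stride=1)        # conv, 32 filters  -> 8 x 8 x 32
size = pool_out(size, pool=2)               # maxpool 2 x 2     -> 4 x 4 x 32
flat = size * size * 32                     # Flatten           -> 512 features
print(size, flat)  # 4 512
```

So the classifier part only sees 512 numbers per image, compared with 3072 for the flattened input used by the fully-connected model.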
In [13]:
# As we'd expect for random data, the model hasn't learned anything useful: all the predicted probabilities sit close to 0.5
# You can also play with other models, e.g., convnets
# So let's go through the same procedure once more

model = Sequential() # Re-initialize the model

# Feature extractor
# We're using the following architecture: conv -> maxpool -> conv -> maxpool
# 'same' padding means we zero-pad the images so that, with stride 1, the output has the same spatial size as the input (with stride 2 it is halved)
model.add(keras.layers.Conv2D(filters=16, kernel_size=3, strides=(2, 2), padding='same'))
model.add(keras.layers.Activation('sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2))) # By default the stride is the same as the pooling size

model.add(keras.layers.Conv2D(filters=32, kernel_size=2, strides=(1, 1), padding='same'))
model.add(keras.layers.Activation('relu')) # ReLU is another kind of non-linear function
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

# Classifier
# We're using a 2-layer FC net for classification 
model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(32))
model.add(keras.layers.Activation('relu'))

model.add(keras.layers.Dense(1))
model.add(keras.layers.Activation('sigmoid'))

# Compilation
model.compile(loss=keras.losses.binary_crossentropy, optimizer=keras.optimizers.SGD(lr=0.001))

# Training
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_val, y_val))

# Evaluation
# The model was compiled without `metrics`, so evaluate() returns only the loss (binary cross-entropy), not the accuracy
loss = model.evaluate(X_test, y_test)
print('The test loss is: {}'.format(loss))

# And make predictions
prob = model.predict(X_test) # These are probabilities, and we want to convert them to class labels
label = np.array(prob > 0.5, dtype=int)

print('The predicted probabilities are: {}'.format(prob))
print('The predicted class labels are: {}'.format(label))

Train on 900 samples, validate on 50 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
The test loss is: 0.80976820230484
The predicted probabilities are: [[0.4396488 ]
 [0.44756165]
 [0.4440366 ]
 [0.4522219 ]
 [0.44278717]
 [0.44286335]
 [0.44352692]
 [0.4399996 ]
 [0.44521046]
 [0.43936926]
 [0.44289467]
 [0.4485495 ]
 [0.4394881 ]
 [0.43881896]
 [0.44759494]
 [0.44729146]
 [0.4495825 ]
 [0.44763592]
 [0.43686706]
 [0.44305956]
 [0.456174  ]
 [0.44370866]
 [0.45216244]
 [0.44362456]
 [0.44955513]
 [0.44751912]
 [0.4509665 ]
 [0.44346094]
 [0.4443113 ]
 [0.43750563]
 [0.44471875]
 [0.43879446]
 [0.45072603]
 [0.4459998 ]
 [0.45328486]
 [0.44212952]
 [0.44063485]
 [0.44992277]
 [0.44392672]
 [0.4435485 ]
 [0.4452948 ]
 [0.44324708]
 [0.43839777]
 [0.44514406]
 [0.45280415]
 [0.45127022]
 [0.4513