**Welcome to GeekHub 2018-2019 DL workshop**

First of all, check that your runtime has GPU acceleration.

We will play with image classification on a toy dataset.

First we need to make some imports. You can look through them now or later.

In [0]:
#@title Import numpy, keras submodules and some other modules

import numpy as np
from keras import layers, initializers, optimizers, regularizers
from keras import datasets, models, callbacks, applications
import matplotlib.pyplot as plt

And because of International Women's Day, we will use Fashion MNIST.

In [0]:
fashion_mnist = datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Let's check what's inside.

In [0]:
print("Train shape:", train_images.shape)
print("Unique values in labels:", np.unique(train_labels))
print("Test shape:", test_images.shape)
print("Train images min:", np.min(train_images), "max:", np.max(train_images))

print("First image:")
plt.figure()
plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.grid(False)
plt.show()

print("Class for this image:", class_names[train_labels[0]])

And let's see more examples.

In [0]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

Deep learning models like normalized values, so let's normalize the images.

For simplicity, we can divide by 255.
You can also use `StandardScaler`, but be careful with the shapes in that case: it expects a 2D array, so the images have to be flattened first.

Also be careful with the types: `train_images` and `test_images` are uint8.
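To make the two warnings above concrete, here is a small NumPy sketch on made-up data (the array `images` and the offset of 10 are illustrative assumptions, not part of the workshop data). It shows how uint8 arithmetic wraps around and how a flatten/restore round trip keeps the shapes under control.

```python
import numpy as np

# Two fake 28x28 "images" with the same dtype as the real data.
images = np.full((2, 28, 28), 250, dtype=np.uint8)

# uint8 arithmetic wraps around: 250 + 10 does not give 260.
wrapped = images + np.uint8(10)

# Casting to float first keeps the arithmetic exact.
as_float = images.astype(np.float32) + 10

# Scalers that work on (samples, features) need flattened images,
# which are then restored to their original shape.
flat = as_float.reshape(len(as_float), -1)   # (2, 784)
restored = flat.reshape(-1, 28, 28)          # (2, 28, 28)

print(wrapped[0, 0, 0], as_float[0, 0, 0], flat.shape, restored.shape)
```

The same cast-then-compute order applies to whatever scaling you choose in the exercise.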

Please try to do it yourself before looking at the solution.


In [0]:
#train_images_scaled = 
#test_images_scaled = 

In [0]:
#@title Solution

#solution #1
# train_images_scaled = train_images.astype(np.float32)/255.0
# test_images_scaled = test_images.astype(np.float32)/255.0

#solution #2
# from sklearn.preprocessing import StandardScaler

# scaler = StandardScaler()
# train_images_scaled = scaler.fit_transform(train_images.astype(np.float32).reshape(-1, 28*28)).reshape(-1, 28, 28)
# test_images_scaled = scaler.transform(test_images.astype(np.float32).reshape(-1, 28*28)).reshape(-1, 28, 28)

If you plan to use the classic `categorical_crossentropy`, you also need to one-hot encode the labels to calculate the loss.
But Keras now has a special loss that works with raw labels, and PyTorch works the same way.
So let's keep the labels as they are.
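As a rough illustration of why both options are equivalent, the NumPy sketch below computes cross-entropy once from one-hot labels and once from raw integer labels. The probabilities and labels are made up for the example, and the code mimics the math only, not the Keras implementation.

```python
import numpy as np

# Made-up predictions for 3 samples and 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
labels = np.array([0, 2, 1])  # raw integer labels

# "Categorical" variant: one-hot encode, then sum over classes.
one_hot = np.eye(3)[labels]
loss_categorical = -np.sum(one_hot * np.log(probs), axis=1)

# "Sparse" variant: pick the probability of the true class directly.
loss_sparse = -np.log(probs[np.arange(len(labels)), labels])

print(loss_categorical)
print(loss_sparse)
```

Both arrays hold the same per-sample losses; the sparse form simply skips building the one-hot matrix.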

**Finally** we can do deep learning :)
Let's try something super simple and create a "retro" model, a multilayer perceptron, and use it for our task.

We will start right away with the functional API of Keras. Sequential is OK for such simple examples, but the functional API is much better for complex models.

First of all, let's discuss how models and layers are created, and build our first model.
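Before touching Keras, it may help to see what a flatten layer followed by two dense layers computes. The sketch below is plain NumPy with randomly initialized weights; the helper names (`dense`, `relu`, `softmax`) and the initialization scale are assumptions made for illustration, not how Keras is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, w, b, activation):
    # A dense layer is a matrix product plus bias, followed by an activation.
    return activation(x @ w + b)

batch = rng.random((4, 28, 28))          # stand-in for 4 scaled images
flat = batch.reshape(len(batch), -1)     # flatten: (4, 28, 28) -> (4, 784)

w1, b1 = rng.normal(0, 0.05, (784, 100)), np.zeros(100)
w2, b2 = rng.normal(0, 0.05, (100, 10)), np.zeros(10)

hidden = dense(flat, w1, b1, relu)       # (4, 100)
probs = dense(hidden, w2, b2, softmax)   # (4, 10), rows sum to 1

n_params = w1.size + b1.size + w2.size + b2.size
print(probs.shape, n_params)  # (4, 10) 79510
```

In the functional API each layer plays the role of one of these function calls: it takes a tensor and returns a new tensor, and the model is defined by its input and output tensors.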

Can you make it deeper?

In [0]:
def getModel(input_shape):
  inp = layers.Input(input_shape)
  flatten = layers.Flatten()(inp) # (batch, height, width) -> (batch, height * width)
  fc = layers.Dense(100, activation='relu')(flatten)
  # fc = ?
  out = layers.Dense(10, activation='softmax')(fc)
  
  return models.Model(inp, out)
  
model = getModel((28, 28))

In [0]:
#@title Possible solution
# def getModel(input_shape):
#   inp = layers.Input(input_shape)
#   flatten = layers.Flatten()(inp) # (batch, height, width) -> (batch, height * width)
#   fc = layers.Dense(100, activation='relu')(flatten)
#   fc = layers.Dense(40, activation='relu')(fc)
#   out = layers.Dense(10, activation='softmax')(fc)
  
#   return models.Model(inp, out)
  
# model = getModel((28, 28))

To start training, we need to `compile` the model: ask Keras to prepare the graph, losses, metrics and optimizer.

Note that we use 'sparse_categorical_crossentropy' because we use raw labels.

For the first time, we will pass strings with the names to let Keras use everything with default parameters.
But you can also pass your own functions for the loss and metrics, and an optimizer class instance as the optimizer.

In [0]:
model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Now we are ready for fit-predict :)

But wait, this is not your first day in data science. Let's split the training data into train and validation sets first.

In [0]:
from sklearn.model_selection import train_test_split

train_X, val_X, train_Y, val_Y = train_test_split(train_images_scaled, train_labels, test_size=0.2, random_state=42, stratify=train_labels)
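To see what `stratify` buys us, here is a hand-rolled NumPy sketch of a stratified split. It is not how sklearn implements `train_test_split`; the function `stratified_split` is a hypothetical helper, and the labels are synthetic with 6000 samples per class, the same balance as the Fashion MNIST training set.

```python
import numpy as np

def stratified_split(labels, test_size, seed):
    """Split indices so every class keeps the same train/validation ratio."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(len(idx) * test_size))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

labels = np.repeat(np.arange(10), 6000)   # synthetic, perfectly balanced labels
train_idx, val_idx = stratified_split(labels, test_size=0.2, seed=42)

print(np.bincount(labels[train_idx]))  # 4800 samples of every class
print(np.bincount(labels[val_idx]))    # 1200 samples of every class
```

Without stratification the per-class counts would fluctuate from split to split, which makes validation metrics slightly noisier.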

OK, now we can call `fit`.
The interface is very close to sklearn's, with some modifications.

Here the default mode is verbose, as training usually takes a long time.
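The `fit` call below uses `batch_size=64` and `epochs=10`. As a quick back-of-the-envelope sketch, assuming the 80/20 split of the 60000 training images made earlier, this is how many weight updates that means:

```python
import math

n_total = 60000        # Fashion MNIST training images
val_fraction = 0.2     # test_size used in train_test_split
batch_size = 64
epochs = 10

n_val = round(n_total * val_fraction)
n_train = n_total - n_val

# One weight update per batch; the last batch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, steps_per_epoch, total_updates)  # 48000 750 7500
```

A larger batch size means fewer, smoother updates per epoch; a smaller one means more, noisier updates.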

In [0]:
model.fit(train_X, train_Y,
          validation_data=(val_X, val_Y),
          batch_size=64,
          epochs=10
         )

88% accuracy is not so bad. Maybe you already have more? :)

Remember that we also have a test set. Let's check the metrics on it.

In [0]:
model.evaluate(test_images_scaled, test_labels)

I got 87%. Nice.

But can we do better?
I hope your answer is "Sure!"

You should already know that for images it is better to use **convolutional networks**.
Let's start with the classic architectures and make something VGG16-like.

It should be easy for you to fill in the gaps.

You can also play with the number of filters or `kernel_size`.

Please note that an activation can be used as a separate layer. It is the same as using it inside the `Conv2D` layer, but it leaves you room for normalization layers.
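When playing with filters and kernel sizes, it helps to track tensor shapes and parameter counts by hand. The sketch below does that for the first convolutional block only; `conv2d_same` and `max_pool` are hypothetical helpers that mirror the usual formulas for `padding='same'` convolutions with stride 1 and for 2x2 max pooling.

```python
def conv2d_same(shape, filters, kernel_size=3):
    """Output shape and parameter count of a stride-1 conv with 'same' padding."""
    h, w, c_in = shape
    params = (kernel_size * kernel_size * c_in + 1) * filters  # +1 for the bias
    return (h, w, filters), params

def max_pool(shape, pool_size=2):
    """Output shape of non-overlapping max pooling (no parameters)."""
    h, w, c = shape
    return (h // pool_size, w // pool_size, c)

shape = (28, 28, 1)
shape, params_1 = conv2d_same(shape, 32)
shape, params_2 = conv2d_same(shape, 32)
shape = max_pool(shape)

print(shape, params_1, params_2)  # (14, 14, 32) 320 9248
```

You can compare such hand calculations with the output of `model.summary()` once your model is built.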

In [0]:
def getModel(input_shape):
  inp = layers.Input(input_shape)
  
  conv = layers.Conv2D(32, 3, padding='same')(inp) #filters=32, kernel size=3
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(32, 3, padding='same')(conv)
  conv = layers.Activation('relu')(conv)
  pool = layers.MaxPool2D()(conv) #default pool_size=2
  
  #?
  
  pool = layers.GlobalMaxPool2D()(conv)  
  # ?
  out = layers.Activation('softmax')(fc)    
  return models.Model(inp, out)
  
model = getModel((28, 28, 1))

model.compile(
    #?
              )

model.fit(train_X.reshape(-1, 28, 28, 1), train_Y,
          validation_data=(val_X.reshape(-1, 28, 28, 1), val_Y),
          batch_size=64,
          epochs=15
         )

In [0]:
#@title Possible solution

# def getModel(input_shape):
#   inp = layers.Input(input_shape)
  
#   conv = layers.Conv2D(32, 3, padding='same')(inp) #filters=32, kernel size=3
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(32, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
#   pool = layers.MaxPool2D()(conv) #default pool_size=2
  
#   conv = layers.Conv2D(64, 3, padding='same')(pool)
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(64, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
#   pool = layers.MaxPool2D()(conv) 
  
#   conv = layers.Conv2D(128, 3, padding='same')(pool) 
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(128, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
  
#   pool = layers.GlobalMaxPool2D()(conv)  
#   fc = layers.Dense(40)(pool)
#   fc = layers.Activation('relu')(fc)
#   fc = layers.Dense(10)(fc)
#   out = layers.Activation('softmax')(fc)
  
  
#   return models.Model(inp, out)
  
# model = getModel((28, 28, 1))

# model.compile(optimizer='adam', 
#               loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])

# model.fit(train_X.reshape(-1, 28, 28, 1), train_Y,
#           validation_data=(val_X.reshape(-1, 28, 28, 1), val_Y),
#           batch_size=64,
#           epochs=13
#          )

Hm... looks like overfitting.
Hope you got it too. If not, go back and do it :)

What we can do about it:
- reduce model size
- regularization
- dropout as regularization
- early stopping
- other models
- learning rate changes
- ...

Let's do the easiest points: dropout and early stopping + reduce learning rate on plateau.
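For intuition about the first of these points, here is a minimal NumPy sketch of "inverted" dropout, the variant commonly used in deep learning libraries: during training a random subset of activations is zeroed and the rest are scaled up, while at inference nothing changes. The function `dropout` and the input of ones are assumptions made for illustration.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero a `rate` fraction of activations in training, rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep   # rescaling keeps the expected value unchanged

rng = np.random.default_rng(0)
x = np.ones((1000, 40))

train_out = dropout(x, rate=0.2, training=True, rng=rng)
test_out = dropout(x, rate=0.2, training=False, rng=rng)

print(np.unique(train_out))             # only 0.0 and 1.25 appear
print(train_out.mean(), test_out.mean())
```

Because each forward pass sees a different random mask, the network cannot rely on any single activation, which acts as a regularizer.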

In [0]:
def getModel(input_shape):
  inp = layers.Input(input_shape)
  
  conv = layers.Conv2D(32, 3, padding='same')(inp) #filters=32, kernel size=3
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(32, 3, padding='same')(conv)
  conv = layers.Activation('relu')(conv)
  pool = layers.MaxPool2D()(conv) #default pool_size=2
  
  conv = layers.Conv2D(64, 3, padding='same')(pool)
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(64, 3, padding='same')(conv)
  conv = layers.Activation('relu')(conv)
  pool = layers.MaxPool2D()(conv) 
  
  conv = layers.Conv2D(128, 3, padding='same')(pool) 
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(128, 3, padding='same')(conv)
  conv = layers.Activation('relu')(conv)
  
  pool = layers.GlobalMaxPool2D()(conv)
  # dropout
  fc = layers.Dense(40)(pool)
  fc = layers.Activation('relu')(fc)
  # dropout
  fc = layers.Dense(10)(fc)
  out = layers.Activation('softmax')(fc)   
  return models.Model(inp, out)
  
model = getModel((28, 28, 1))

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_X.reshape(-1, 28, 28, 1), train_Y,
          validation_data=(val_X.reshape(-1, 28, 28, 1), val_Y),
          batch_size=64,
          epochs=15,
          callbacks=[
              callbacks.ModelCheckpoint('weights.h5', verbose=1, save_best_only=True, save_weights_only=True),
              callbacks.ReduceLROnPlateau(patience=2, verbose=1),
              callbacks.EarlyStopping(patience=4, verbose=1)
          ]
         )

model.load_weights('weights.h5') 
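The three callbacks used above all watch the validation loss and react with different patience. The pure Python sketch below simulates that logic on a made-up loss curve. It is a simplification, not the Keras implementation: it ignores options such as `min_delta`, `cooldown` and `min_lr`, and `simulate_callbacks` is a hypothetical helper.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=2, stop_patience=4, factor=0.1):
    best, best_epoch = float("inf"), -1
    lr_wait = stop_wait = 0
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: this is where a best-only checkpoint would be saved.
            best, best_epoch = loss, epoch
            lr_wait = stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor      # reduce learning rate on plateau
                lr_wait = 0
        history.append((epoch, loss, lr))
        if stop_wait >= stop_patience:
            break                 # early stopping
    return best_epoch, history

# Made-up validation losses: improvement for 3 epochs, then overfitting.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.39, 0.30]
best_epoch, history = simulate_callbacks(val_losses)

print(best_epoch, len(history), history[-1][2])
```

In this simulation training stops after 7 epochs, and the best weights come from epoch index 2, which is why loading the checkpoint after `fit` matters: the final weights are not the best ones.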

In [0]:
#@title Possible solution

# def getModel(input_shape):
#   inp = layers.Input(input_shape)
  
#   conv = layers.Conv2D(32, 3, padding='same')(inp) #filters=32, kernel size=3
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(32, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
#   pool = layers.MaxPool2D()(conv) #default pool_size=2
  
#   conv = layers.Conv2D(64, 3, padding='same')(pool)
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(64, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
#   pool = layers.MaxPool2D()(conv) 
  
#   conv = layers.Conv2D(128, 3, padding='same')(pool) 
#   conv = layers.Activation('relu')(conv)
#   conv = layers.Conv2D(128, 3, padding='same')(conv)
#   conv = layers.Activation('relu')(conv)
  
#   pool = layers.GlobalMaxPool2D()(conv)
#   pool = layers.Dropout(0.2)(pool)
#   fc = layers.Dense(40)(pool)
#   fc = layers.Activation('relu')(fc)
#   fc = layers.Dropout(0.1)(fc)
#   fc = layers.Dense(10)(fc)
#   out = layers.Activation('softmax')(fc)
  
  
#   return models.Model(inp, out)
  
# model = getModel((28, 28, 1))

# model.compile(optimizer='adam', 
#               loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])

# model.fit(train_X.reshape(-1, 28, 28, 1), train_Y,
#           validation_data=(val_X.reshape(-1, 28, 28, 1), val_Y),
#           batch_size=64,
#           epochs=15,
#           callbacks=[
#               callbacks.ModelCheckpoint('weights.h5', verbose=1, save_best_only=True, save_weights_only=True),
#               callbacks.ReduceLROnPlateau(patience=2, verbose=1),
#               callbacks.EarlyStopping(patience=4, verbose=1)
#           ]
#          )

# model.load_weights('weights.h5') 

OK. Overfitting happened, but we took the right checkpoint, and it looks like the best result we have seen so far.
Let's verify with the test set.

In [0]:
model.evaluate(test_images_scaled.reshape(-1, 28, 28, 1), test_labels)


Almost 93% accuracy. A good improvement over 87% with the multilayer perceptron.

Let's try another model structure and look at several additional possibilities.

1. The code for generating the model graph is just code, so you can organize it in loops, generators, functions, classes... whatever you want.
2. Pay attention to how easily we added batch normalization.
3. Keras supports multiple inputs and outputs. While it is not needed here, I added 2 outputs to the next model. Please check how they are used in `Model`, `compile` and `fit`.
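Since batch normalization shows up in every block of the next model, here is a rough NumPy sketch of what such a layer computes in training mode: each channel is normalized with the statistics of the current batch, then scaled and shifted by learnable parameters. The function `batch_norm_train` and the random input are illustrative assumptions; at inference time a real layer uses moving averages of the statistics instead.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize every channel (last axis) over batch and spatial axes."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(8, 7, 7, 4))   # (batch, height, width, channels)

y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print(y.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(y.std(axis=(0, 1, 2)))   # close to 1 for every channel
```

Keeping the activation as a separate layer is what makes it possible to insert this normalization between the convolution and the ReLU.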

In [0]:
regularization = regularizers.L1L2(0, 1e-4)

def res_block(x):
  count = int(x.shape[-1])
  
  conv = layers.Conv2D(count, 3, padding='same', kernel_regularizer=regularization)(x)
  conv = layers.BatchNormalization()(conv)
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(count, 3, padding='same', kernel_regularizer=regularization)(conv)
  conv = layers.BatchNormalization()(conv)
  conv = layers.Activation('relu')(conv)
  
  return layers.add([conv, x])


def strided_res_block(x, filters):
  conv = layers.Conv2D(filters, 3, padding='same', strides=(2,2), kernel_regularizer=regularization)(x)
  conv = layers.BatchNormalization()(conv)
  conv = layers.Activation('relu')(conv)
  conv = layers.Conv2D(filters, 3, padding='same', kernel_regularizer=regularization)(conv)
  conv = layers.BatchNormalization()(conv)
  
  shortcut = layers.Conv2D(filters, 1, padding='same', strides=(2,2), kernel_regularizer=regularization)(x)
  shortcut = layers.BatchNormalization()(shortcut)
  
  s = layers.add([conv, shortcut])
  s = layers.Activation('relu')(s)
  
  return s


def getModel(input_shape):
  inp = layers.Input(input_shape)
  
  conv = layers.Conv2D(64, 3, padding='same', strides=(2,2), kernel_regularizer=regularization)(inp)
  conv = layers.BatchNormalization()(conv)
  conv = layers.Activation('relu')(conv)
    
  conv = res_block(conv)
  conv = res_block(conv)
  
  conv = strided_res_block(conv, 128)
  conv = res_block(conv)
  
  out2 = layers.GlobalAveragePooling2D()(conv)
  out2 = layers.Dropout(0.2)(out2)
  out2 = layers.Dense(10, activation='softmax', name='aux_out')(out2)
  
  conv = strided_res_block(conv, 256)
  conv = res_block(conv)
  
  conv = strided_res_block(conv, 512)
  conv = res_block(conv)   
  
  pool = layers.GlobalAveragePooling2D()(conv)
  pool = layers.Dropout(0.2)(pool)
  fc = layers.Dense(50)(pool)
  fc = layers.Activation('relu')(fc)
  fc = layers.Dense(10)(fc)
  out = layers.Activation('softmax', name='out')(fc)
   
  return models.Model(inputs=inp, outputs=[out, out2])
  
  
model = getModel((28, 28, 1))

model.compile(optimizer=optimizers.Adam(1e-3), 
              loss=['sparse_categorical_crossentropy', 'sparse_categorical_crossentropy'],
              loss_weights=[1, 0.5],
              metrics=['accuracy'])

model.fit(train_X.reshape(-1, 28, 28, 1), [train_Y, train_Y],
          validation_data=(val_X.reshape(-1, 28, 28, 1), [val_Y, val_Y]),
          batch_size=256,
          epochs=25,
          callbacks=[
              callbacks.ModelCheckpoint('weights.h5', verbose=1, 
                                        save_best_only=True, save_weights_only=True,
                                        monitor='val_out_loss'),
              callbacks.ReduceLROnPlateau(patience=2, verbose=1, monitor='val_out_loss'),
              callbacks.EarlyStopping(patience=5, verbose=1, monitor='val_out_loss')
          ]
         )

model.load_weights('weights.h5') 
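With `loss_weights=[1, 0.5]`, the quantity being minimized is a weighted sum of the two per-output losses. The NumPy sketch below shows that arithmetic on made-up probabilities for two samples; `sparse_cce` is a hypothetical helper that mimics the math of sparse categorical cross-entropy.

```python
import numpy as np

def sparse_cce(labels, probs):
    """Mean cross-entropy for integer labels and predicted probabilities."""
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))

labels = np.array([0, 1])
probs_out = np.array([[0.8, 0.2], [0.3, 0.7]])   # made-up main output
probs_aux = np.array([[0.6, 0.4], [0.5, 0.5]])   # made-up auxiliary output

loss_out = sparse_cce(labels, probs_out)
loss_aux = sparse_cce(labels, probs_aux)

# Weighted sum, matching loss_weights=[1, 0.5].
total = 1.0 * loss_out + 0.5 * loss_aux

print(loss_out, loss_aux, total)
```

To my understanding, the total loss reported by Keras also includes the kernel regularization penalties, so monitoring `val_out_loss` rather than `val_loss` tracks only the main output's cross-entropy.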

With 2 outputs, `predict` also returns 2 numpy arrays. We are interested only in the first one.

In [0]:
predict = model.predict(test_images_scaled.reshape(-1, 28, 28, 1))[0]
labels_predict = np.argmax(predict, axis=-1)

from sklearn.metrics import accuracy_score
print(accuracy_score(test_labels, labels_predict))  # signature is (y_true, y_pred)

0.9273
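For reference, the same accuracy can be computed by hand. The sketch below uses made-up probabilities for four samples to show the two steps: `argmax` turns each row of probabilities into a class index, and accuracy is the fraction of indices that match the true labels.

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
true_labels = np.array([1, 0, 2, 2])

predicted = np.argmax(probs, axis=-1)        # class with the highest probability
accuracy = np.mean(predicted == true_labels)

print(predicted, accuracy)  # [1 0 2 1] 0.75
```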


Accuracy did not improve.
So our ResNet-like structure did not help.
There may be a lot of explanations for that: from wrong hyperparameters to the possibility that no new structure would help much.

You can play with different structures on your own.
But now let's try some other tricks with another dataset.
Check the 2nd Colab notebook.