# Deep Learning: Image Classification

Zening Qu (zqu@uw.edu), Sean Yang

Last updated: May 11, 2020

## Introduction

People rarely train deep neural networks from scratch (with random initialization) these days. Instead, we often use some form of [transfer learning](http://cs231n.github.io/transfer-learning/), which starts from a [pretrained network](https://keras.rstudio.com/articles/applications.html) to reduce training time and improve predictive accuracy. In this assignment, we will explore and compare two widely used transfer learning techniques: feature extraction and fine-tuning.


**Dataset**

We will use the [Viziometrics dataset](https://canvas.uw.edu/files/59549223/download?download_frd=1) ([paper1](https://ieeexplore.ieee.org/abstract/document/7888968), [paper2](https://arxiv.org/abs/1908.07465)). The dataset contains 2k+ images from scientific publications, each hand-labeled as one of five classes: equation, photo, scheme, table, and visualization. The train, validation, and test sets have already been split for us.


**Tasks**

We have two tasks. In part 1, "feature extraction", we will use a pretrained network, ResNet50, to extract features from the five classes of images. Using these extracted features and labels, we will train a neural network and make predictions. This approach is called "feature extraction" because we use a pretrained network to extract features and then use the features ***instead of*** our raw image data to train our model for prediction.

In part 2, "fine-tuning", we will append our own prediction layers to ResNet50 and train this modified network on the raw images and the labels. This approach is called "fine-tuning" because ResNet50 was pretrained on ImageNet and we are only adjusting (fine-tuning) its weights on our own dataset. We are not learning the weights of this complicated modified model from scratch; we start from the ResNet50 pretrained weights.

As you complete the two tasks, you should notice that, with the same prediction layers, fine-tuning gives better validation accuracy than feature extraction. This is because fine-tuning also updates the pretrained ResNet50 layers during training, whereas feature extraction trains only the prediction layers. For the same reason, fine-tuning takes much longer to train than feature extraction.


**Before You Start**

Please make sure you have installed [numpy](https://numpy.org/), [Keras](https://keras.io/), and one of its backend engines (e.g., [TensorFlow](https://www.tensorflow.org/install/)).


**What to Turn In**

We have provided some starter code. Each task you need to complete is marked with a red school bag 🎒 and is worth a few points. Each task will ask you either to print something or to report certain metrics. **Please submit your completed notebook with the code and the required output of each task. You will not receive full points if either code or output is missing.**

👉 **_Please start early. It takes time to train these networks._**

## 0. Load the Data (2 points)

🎒<font color='red'>(2 points)</font> Put the dataset in the same folder as this notebook, then run the following starter code. Because our training set contains a lot of images, we use [`ImageDataGenerator.flow_from_directory()`](https://keras.io/preprocessing/image/) to load images in batches. Please refer to [Vijayabhaskar's tutorial](https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720) to understand the parameterization of the generators. This starter code should output something like:

```
Found xxx images belonging to 5 classes.
Found xxx images belonging to 5 classes.
Found xxx images belonging to 5 classes.
```

Please include this output in your submission.

In [1]:
import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import ResNet50, decode_predictions
from keras import layers
from keras.models import Model, Sequential
from keras.callbacks import ModelCheckpoint, EarlyStopping

Using TensorFlow backend.


In [2]:
# create train, validation, and test generators from our image directory

datagen = ImageDataGenerator()

train_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/train/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=32,
  class_mode="categorical",
  shuffle=False,
  seed=42
)

val_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/val/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=32,
  class_mode="categorical",
  shuffle=False,
  seed=42
)

test_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/test/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=1,
  class_mode=None,
  shuffle=False,
  seed=42
)

Found 2733 images belonging to 5 classes.
Found 1571 images belonging to 5 classes.
Found 1563 images belonging to 5 classes.
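As a quick sanity check on the generator settings, the sketch below computes how many batches one pass over each split takes. It assumes the image counts printed above and the `batch_size` values from the starter code; `batches_per_epoch` is an illustrative helper name, not a Keras function.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)

# image counts and batch sizes taken from the generators above
print(batches_per_epoch(2733, 32))  # train: 86
print(batches_per_epoch(1571, 32))  # validation: 50
print(batches_per_epoch(1563, 1))   # test: 1563
```

On the real generators, `len(train_generator)` should report the same number of batches.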


## 1. Feature Extraction (28 points)

We will use a pretrained network (ResNet50) to extract features from the images. Then we will train a not-so-deep neural network on the features (X) and the labels (y). Finally we will evaluate our network. Please read [Vikas' tutorial on feature extraction](https://www.learnopencv.com/keras-tutorial-transfer-learning-using-pre-trained-models/) for code examples on feature extraction; we will use the same pipeline here.

🎒<font color='red'>(2 points)</font> Run the following code to load a [ResNet50 network](https://keras.rstudio.com/reference/application_resnet50.html) pretrained on the [ImageNet](http://www.image-net.org/) dataset. Note that we say `include_top=False` to exclude the last (or top) fully-connected layer which is responsible for classifying the labels in ImageNet; we want to use the earlier layers to extract features from our own dataset. Use `model.summary()` to inspect the ResNet50 architecture. This function should print out all the layers - their names, types, output shapes etc. Please include the printout in your submission.

In [3]:
# download the pre-trained ResNet50 model
resnet = keras.applications.resnet.ResNet50(include_top=False, weights='imagenet', input_shape=(224,224,3))

# inspect the ResNet50 architecture
resnet.summary()

# observe that the last layer, conv5_block3_out (Activation), has output shape 7 * 7 * 2048
# let's save this dimension for later use
DIM = 7 * 7 * 2048

Model: "resnet50"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1_conv[0][0]                 
___________________________________________________________________________________________
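The numbers in the summary can be reproduced by hand. The sketch below uses illustrative helper names (they are not Keras functions) to check the first few rows. It assumes the standard ResNet50 stem, a 7×7 convolution with stride 2 and 64 filters; the kernel size and stride are not shown in the printout, so treat them as assumptions. It also checks the overall 32× downsampling that turns a 224×224 input into the 7×7 feature map used for `DIM`.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution with no extra padding ('valid')."""
    return (input_size - kernel_size) // stride + 1

def conv2d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def batchnorm_params(channels):
    """gamma, beta, moving mean, and moving variance: four values per channel."""
    return 4 * channels

padded = 224 + 2 * 3  # conv1_pad adds 3 pixels on each side
print(padded, conv_output_size(padded, 7, 2))  # 230 112
print(conv2d_params(7, 3, 64))                 # 9472, matches conv1_conv
print(batchnorm_params(64))                    # 256, matches conv1_bn
print(224 // 32, 7 * 7 * 2048)                 # 7 100352, the value stored in DIM
```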

🎒<font color='red'>(4 points)</font> Use ResNet50 to extract features from the images. Print the shape of the extracted features.

👉 We use [`model.predict()`](https://keras.io/models/sequential/#predict_generator) to extract features. This function will return the output of the last layer of ResNet50, namely, conv5_block3_out (Activation). These are the features captured by the pretrained network.

👉 This step is a bit slow. Setting `verbose=1` can make the wait a bit easier.

In [4]:
# # use ResNet50 to extract features from the images
train_features = resnet.predict(train_generator, verbose=1) # takes ~4min on my laptop




In [5]:
val_features = resnet.predict(val_generator, verbose=1) # 2min



In [6]:
test_features = resnet.predict(test_generator, verbose=1) # 2min



In [7]:
print(train_features.shape) 
print(val_features.shape)
print(test_features.shape)

# hint: if you encounter the error 
# 'ImportError: Could not import PIL.Image'
# try 'pip3 install pillow' and restart the notebook kernel
# see: https://stackoverflow.com/questions/48225729/importerrorcould-not-import-pil-image-working-with-keras-ternsorflow/50775336

(2733, 7, 7, 2048)
(1571, 7, 7, 2048)
(1563, 7, 7, 2048)


🎒<font color='red'>(2 points)</font> Observe that the features extracted by ResNet50 are 4D arrays. Following common practice, we want to reshape the features into 2D arrays, which will be the input X of our not-so-deep neural network classifier. Run the following code and print the shape of the 2D arrays.

In [8]:
# reshape the features to 2D arrays
train_X = train_features.reshape((-1, DIM))
val_X = val_features.reshape((-1, DIM))
test_X = test_features.reshape((-1, DIM))

print(train_X.shape, val_X.shape, test_X.shape)

(2733, 100352) (1571, 100352) (1563, 100352)
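To see what this reshape does, here is a small NumPy sketch on a made-up array (the tiny shapes are hypothetical stand-ins for the real `(N, 7, 7, 2048)` features). Each image's feature map is flattened into one row, and the order of the images is preserved, which matters because the rows must stay aligned with the labels.

```python
import numpy as np

# hypothetical miniature feature tensor: 4 "images", each a 2 x 2 x 3 feature map
features = np.arange(4 * 2 * 2 * 3).reshape((4, 2, 2, 3))
dim = 2 * 2 * 3

X = features.reshape((-1, dim))
print(X.shape)  # (4, 12)

# row i of X is exactly the flattened feature map of image i
for i in range(len(features)):
    assert np.array_equal(X[i], features[i].ravel())
```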


🎒<font color='red'>(3 points)</font> Read the class labels from the generators. Print `test_labels`.

In [9]:
# # read the class labels from the generators
train_labels = train_generator.classes
#print(train_labels)
val_labels = val_generator.classes
#print(val_labels)
test_labels = test_generator.classes


In [10]:
import sys
# print the full array instead of a truncated summary
np.set_printoptions(threshold=sys.maxsize)
print(test_labels)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 
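The integers in `classes` are class indices. According to the Keras documentation, `flow_from_directory` assigns them by sorting the class subfolder names alphanumerically, and the generator exposes the resulting mapping as `class_indices`. The sketch below mimics that ordering in plain Python. It assumes the subfolders are named exactly after the five classes listed in the introduction, which this notebook does not confirm.

```python
# assumed subfolder names (hypothetical: taken from the class list in the introduction)
class_names = ["visualization", "table", "photo", "equation", "scheme"]

# mimic the alphanumeric ordering used to assign label indices
class_indices = {name: index for index, name in enumerate(sorted(class_names))}
print(class_indices)

# invert the mapping to turn integer labels back into names
index_to_name = {index: name for name, index in class_indices.items()}
print([index_to_name[i] for i in [0, 0, 1, 4]])
```

On the real generators, `train_generator.class_indices` gives the actual mapping and is the value to trust.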

🎒<font color='red'>(3 points)</font> Observe that the labels are not one-hot encoded. We need to one-hot encode these labels and they will be the y that our model will predict.

In [11]:
# # get one-hot encoding of labels
def get_one_hot(labels, nb_classes):
    res = np.eye(nb_classes)[np.array(labels).reshape(-1)]
    return res.reshape(list(labels.shape)+[nb_classes])

In [12]:
# #use get_one_hot()
NUM_CLASSES = 5
train_y = get_one_hot(train_labels, NUM_CLASSES)
#print(train_y)
val_y = get_one_hot(val_labels, NUM_CLASSES)
#print(val_y)
test_y = get_one_hot(test_labels, NUM_CLASSES)
#print(test_y)
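As a quick check of the encoding, the sketch below applies the same `np.eye` indexing trick to a few made-up labels (the label values are hypothetical) and confirms that `argmax` recovers the original integers. `keras.utils.to_categorical` is a built-in alternative that should produce the same encoding.

```python
import numpy as np

labels = np.array([0, 2, 1, 4])  # hypothetical integer labels
nb_classes = 5

# row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(nb_classes)[labels]
print(one_hot)
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 1.]]

# argmax along the class axis undoes the encoding
assert np.array_equal(one_hot.argmax(axis=1), labels)
```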

🎒<font color='red'>(4 points)</font> Define a neural network that contains a few layers. An example network is given below. Please modify this network: you can add layers, remove layers, or change some of the parameters. Please print out your model architecture using `model.summary()`.

In [28]:
# create our model: a not-so-deep neural network
model = Sequential()

# input layer takes arrays of shape (*, DIM)
# todo: please modify this network to define your own model
model.add(layers.Dense(500, activation = "relu", input_shape=(DIM,))) 
model.add(layers.Dropout(0.3, noise_shape=None, seed=None))
model.add(layers.Dense(250, activation = "relu"))
model.add(layers.Dense(180, activation = "relu"))
model.add(layers.Dense(NUM_CLASSES, activation = "softmax"))



# print out network architecture
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 500)               50176500  
_________________________________________________________________
dropout_3 (Dropout)          (None, 500)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 250)               125250    
_________________________________________________________________
dense_8 (Dense)              (None, 180)               45180     
_________________________________________________________________
dense_9 (Dense)              (None, 5)                 905       
Total params: 50,347,835
Trainable params: 50,347,835
Non-trainable params: 0
_________________________________________________________________
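The `Param #` column can be verified by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below (`dense_params` is an illustrative helper, not a Keras function) reproduces the numbers for the example network and shows that almost all parameters sit in the first layer, because the flattened input has 100,352 features.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

DIM = 7 * 7 * 2048
layer_sizes = [DIM, 500, 250, 180, 5]  # Dropout adds no parameters

counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [50176500, 125250, 45180, 905]
print(sum(counts))  # 50347835
```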


🎒<font color='red'>(2 points)</font> Compile your model using [`model.compile()`](https://keras.io/models/model/#compile). Choose the `sgd` optimizer and the `categorical_crossentropy` loss. Set `metrics = ["accuracy"]`.

In [29]:
# # you need to compile the model before you can train it
model.compile(optimizer= 'sgd',loss='categorical_crossentropy',
              metrics=['accuracy'])
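For intuition about the loss values you will see during training: categorical cross-entropy for one example is the negative log of the probability the model assigns to the true class (Keras then averages this over the batch). The plain-Python sketch below, with made-up probabilities, shows that a model guessing uniformly over five classes has a loss of ln 5 ≈ 1.61, so validation losses well below that suggest the model is doing better than chance.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for a single example: -sum_k y_true[k] * log(y_pred[k])."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 0, 1, 0, 0]  # one-hot label for class 2

uniform = [0.2] * 5                        # chance-level prediction
confident = [0.02, 0.02, 0.9, 0.03, 0.03]  # hypothetical confident, correct prediction

print(round(categorical_crossentropy(y_true, uniform), 4))    # 1.6094
print(round(categorical_crossentropy(y_true, confident), 4))  # 0.1054
```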


🎒<font color='red'>(7 points)</font> Train the model you defined for about 10 epochs, or until the validation accuracy reaches 0.8 (whichever comes earlier). Note that the model should be trained using `train_X`, `train_y`, `val_X`, and `val_y`. `train_X` and `val_X` are our extracted features; `train_y` and `val_y` are our one-hot-encoded labels. Do not use `train_generator` or `val_generator` to train this model, because those generators represent our raw image data. We have already extracted features from the raw images and hence no longer need the images for training. Observe how the train and validation accuracy change over epochs. Report (in a code comment) the best train and validation accuracy you've seen during training.

👉 Note that you can [check-point](https://keras.io/callbacks/#modelcheckpoint) model weights during training. After training, you can load back the weights of your best model.

In [30]:
# # save model weights while model is under training
checkpoint = ModelCheckpoint(
     'feature-extraction-weights.{epoch:02d}-{val_loss:.2f}.hdf5',
     monitor='val_loss', 
     verbose=1,
     save_best_only=True
     )

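# note: `baseline` is the value the monitored quantity must improve on; with the default
# patience of 0, training stops as soon as val_loss fails to beat 0.8 (this is not the
# same as "stop once validation accuracy reaches 0.8")
stopping = EarlyStopping(monitor='val_loss', mode='auto', baseline=0.8, verbose=1)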
# # train the model for about 10 epochs
# # each epoch takes about 20 seconds on my laptop
model.fit(
    x = train_X, 
    y = train_y, 
    epochs = 10,
    validation_data = (val_X,val_y),
    callbacks = [checkpoint,stopping]
    )
## best  train_accuracy: 0.9751 
## best val_accuracy: 0.8434

Train on 2733 samples, validate on 1571 samples
Epoch 1/10

Epoch 00001: val_loss improved from inf to 1.16255, saving model to feature-extraction-weights.01-1.16.hdf5


<keras.callbacks.callbacks.History at 0x82eca5d90>
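The run above stopped after one epoch. A likely explanation is how `EarlyStopping` treats `baseline`: the monitored value has to improve on it, and with the default `patience=0` training ends at the first epoch that fails to do so. The sketch below is a simplified plain-Python re-creation of that rule as we understand it (it is not Keras source code), next to the rule the task actually asks for: stop once validation accuracy reaches a threshold. The per-epoch accuracy values are hypothetical.

```python
def early_stopping_epoch(val_losses, baseline, patience=0):
    """First (1-based) epoch at which a minimize-mode early-stopping rule fires, else None."""
    best = baseline
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

def first_epoch_reaching(val_accuracies, threshold):
    """First (1-based) epoch whose validation accuracy is at least `threshold`, else None."""
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc >= threshold:
            return epoch
    return None

# val_loss after epoch 1 in the log above was 1.16255, which does not beat baseline=0.8
print(early_stopping_epoch([1.16255], baseline=0.8))        # 1

# hypothetical validation accuracies, for illustration only
print(first_epoch_reaching([0.62, 0.74, 0.81, 0.84], 0.8))  # 3
```

In Keras, the second rule could be implemented as a small custom `Callback` whose `on_epoch_end` sets `self.model.stop_training = True` once the logged validation accuracy crosses the threshold.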

In [31]:
model.fit(
    x = train_X, 
    y = train_y, 
    epochs = 10,
    validation_data = (val_X,val_y),
    callbacks = [checkpoint]
    )

Train on 2733 samples, validate on 1571 samples
Epoch 1/10

Epoch 00001: val_loss improved from 1.16255 to 0.61890, saving model to feature-extraction-weights.01-0.62.hdf5
Epoch 2/10

Epoch 00002: val_loss did not improve from 0.61890
Epoch 3/10

Epoch 00003: val_loss did not improve from 0.61890
Epoch 4/10

Epoch 00004: val_loss did not improve from 0.61890
Epoch 5/10

Epoch 00005: val_loss did not improve from 0.61890
Epoch 6/10

Epoch 00006: val_loss improved from 0.61890 to 0.56602, saving model to feature-extraction-weights.06-0.57.hdf5
Epoch 7/10

Epoch 00007: val_loss did not improve from 0.56602
Epoch 8/10

Epoch 00008: val_loss improved from 0.56602 to 0.56091, saving model to feature-extraction-weights.08-0.56.hdf5
Epoch 9/10

Epoch 00009: val_loss did not improve from 0.56091
Epoch 10/10

Epoch 00010: val_loss did not improve from 0.56091


<keras.callbacks.callbacks.History at 0x827c0aa50>

🎒<font color='red'>(1 point)</font> Report the test accuracy of your best model.

In [32]:
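# # load the weights of your best model (model.load_weights)
# note: the numbers below come from the epoch-6 checkpoint; the training log above shows
# that epoch 8 (feature-extraction-weights.08-0.56.hdf5) had a slightly lower val_loss
model.load_weights("feature-extraction-weights.06-0.57.hdf5")
# # measure test accuracy
results = model.evaluate(test_X, test_y)
## best test_accuracy: 0.8195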



In [33]:
results

[0.5765993230721734, 0.8195777535438538]
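The second number returned by `model.evaluate` is the accuracy: the fraction of test images whose highest-probability class matches the true label. A minimal NumPy sketch of that computation, using made-up probabilities and labels, looks like this:

```python
import numpy as np

# hypothetical softmax outputs for 4 images over 5 classes
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.10, 0.70],
])
true_labels = np.array([0, 1, 2, 4])  # hypothetical integer labels

predicted = probs.argmax(axis=1)  # most likely class per image
accuracy = float((predicted == true_labels).mean())
print(predicted, accuracy)  # [0 1 1 4] 0.75
```

With the real model, `model.predict(test_X)` would supply `probs` and `test_labels` would supply the integer labels.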

## 2. Fine-tuning (10 points)

Now we will explore the second technique: fine-tuning. We will append our own classification layers to the ResNet50 layers and train the entire network (from images to labels). We will start over from the generators.

In [37]:
# create train, validation, and test generators from our image directory

datagen = ImageDataGenerator()

train_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/train/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=32,
  class_mode="categorical",
  shuffle=True,
  seed=42
)

val_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/val/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=32,
  class_mode="categorical",
  shuffle=True,
  seed=42
)

test_generator = datagen.flow_from_directory(
  directory=r"./viziometrics/test/",
  target_size=(224, 224),
  color_mode="rgb",
  batch_size=1,
  class_mode="categorical",
  shuffle=False,
  seed=42
)

Found 2733 images belonging to 5 classes.
Found 1571 images belonging to 5 classes.
Found 1563 images belonging to 5 classes.


🎒<font color='red'>(3 points)</font> Append classification layers to ResNet50. Please modify the example layers below to define your own classification layers. Print the network architecture of your own model.

In [38]:
# # add custom prediction layers to ResNet50
x = resnet.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
predictions = layers.Dense(NUM_CLASSES, activation= 'softmax')(x)
model = Model(inputs = resnet.input, outputs = predictions) 
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1_conv[0][0]                 
____________________________________________________________________________________________
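`GlobalAveragePooling2D` averages each of the 2048 feature maps over its 7×7 spatial grid, so the classification head sees 2,048 numbers per image instead of the 100,352 produced by flattening in part 1. The NumPy sketch below illustrates the operation on random data (the batch of 2 is arbitrary) and counts the parameters of the final `Dense` layer.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((2, 7, 7, 2048))  # stand-in for conv5_block3_out on 2 images

# global average pooling: mean over the two spatial axes
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (2, 2048)

# parameters of Dense(NUM_CLASSES) on top of the pooled features: weights + biases
NUM_CLASSES = 5
head_params = pooled.shape[1] * NUM_CLASSES + NUM_CLASSES
print(head_params)  # 10245
```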

🎒<font color='red'>(1 point)</font> Compile your model using [`model.compile()`](https://keras.io/models/model/#compile). Choose the `sgd` optimizer and the `categorical_crossentropy` loss. Set `metrics = ["accuracy"]`.

In [39]:
# # need to compile the model before training
model.compile(optimizer= 'sgd',loss='categorical_crossentropy',
              metrics=['accuracy'])

🎒<font color='red'>(5 points)</font> Train the model you defined for about 10 epochs, or until the validation accuracy reaches 0.8 (whichever comes earlier). This model should be trained on the images and their labels, so we can use `train_generator` and `val_generator` to fit the model. Checkpoint model weights during training. Report (in a code comment) the best train and validation accuracy you've seen during training.

In [40]:
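# # train the model for about 10 epochs
# # each epoch takes about 800 seconds on my laptop
# note: the filename prefix below is reused from part 1, so these fine-tuning
# checkpoints are also saved as 'feature-extraction-weights.*'
checkpoint = ModelCheckpoint(
     'feature-extraction-weights.{epoch:02d}-{val_loss:.2f}.hdf5',
     monitor='val_loss',
     verbose=1,
     save_best_only=True
     )

# note: with the default patience of 0, training stops as soon as val_loss fails to
# improve on `baseline`; this is not "stop once validation accuracy reaches 0.8"
stopping = EarlyStopping(monitor='val_loss', mode='auto', baseline=0.8, verbose=1)

# the number of batches per epoch must be a whole number;
# len(generator) is ceil(n / batch_size)
STEP_SIZE_TRAIN = len(train_generator)
STEP_SIZE_VAL = len(val_generator)
model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=val_generator,
                    validation_steps=STEP_SIZE_VAL,
                    epochs=10,
                    callbacks=[checkpoint, stopping]
)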

Epoch 1/10

Epoch 00001: val_loss improved from inf to 1.32264, saving model to feature-extraction-weights.01-1.32.hdf5


<keras.callbacks.callbacks.History at 0x96d79dc90>

In [47]:
## Validation accuracy : 0.5563
## Train accuracy : 0.9301

🎒<font color='red'>(1 point)</font> Report the test accuracy of your best model.

In [41]:
# # load the weights of your best model
# load_weights updates `model` in place, so there is nothing useful to assign
model.load_weights("feature-extraction-weights.01-1.32.hdf5")

In [42]:
STEP_SIZE_TEST = test_generator.n//test_generator.batch_size
STEP_SIZE_TEST

1563

In [43]:
# # measure test accuracy
score = model.evaluate_generator(generator=test_generator, steps=STEP_SIZE_TEST)

In [46]:
print("test accuracy score:",score[1])

test accuracy score: 0.5527831315994263
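To compare the two approaches on the same footing, the reported accuracies can be converted back into numbers of correctly classified test images. The sketch below uses only values printed in this notebook (1,563 test images and the two test accuracies); `correct_count` is an illustrative helper name.

```python
NUM_TEST_IMAGES = 1563

# test accuracies reported in this notebook
feature_extraction_acc = 0.8195777535438538
fine_tuning_acc = 0.5527831315994263

def correct_count(accuracy, num_images):
    """Number of correctly classified images implied by an accuracy value."""
    return round(accuracy * num_images)

for name, acc in [("feature extraction", feature_extraction_acc),
                  ("fine-tuning", fine_tuning_acc)]:
    print(f"{name}: {correct_count(acc, NUM_TEST_IMAGES)} of {NUM_TEST_IMAGES} correct")
# feature extraction: 1281 of 1563 correct
# fine-tuning: 864 of 1563 correct
```

Keep in mind that the fine-tuning run recorded here ended after a single epoch, so this gap may say more about the early stop than about the technique itself.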
