# Introduction

I was watching some of the videos of the [fast.ai](http://course.fast.ai/) deep learning course (highly recommended), and I had an idea for a small project. How many times have you taken a photo that, for some reason, came out upside down? I wanted to see whether I could train a neural network to predict whether an image is rotated.

This notebook demonstrates how to create a neural net based classifier to predict the rotation of a given image using Python and Keras.

I used the "Dogs vs. Cats" [dataset](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition). It contains 25,000 images of dogs and cats, but since I wanted this notebook to run relatively fast, I used only 200 dog images.

The code below converts the images into tensors and rotates each image by one of 4 angles: 0°, 90°, 180° or 270° (counterclockwise, as that is the direction `np.rot90` uses). Then it assigns a label to each image according to the rotation that was applied to it: '0' for 0°, '1' for 90°, '2' for 180° and '3' for 270°. Finally, it uses one-hot encoding to represent the labels.

In [None]:
%matplotlib inline
from keras.preprocessing.image import array_to_img,img_to_array,load_img
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
from keras.utils import to_categorical

TRAIN_PATH = '../input/dogs-vs-cats-redux-kernels-edition/train/'
NUM_OF_IMAGES = 200
NUMBER_OF_ROTATIONS = 4

images,images_arr,labels = [],[],[]
label = 0
for path in glob(TRAIN_PATH+'dog*')[:NUM_OF_IMAGES]:
    #load the image
    img = load_img(path,target_size=(224,224))
    
    #convert the image to a tensor
    img_arr = img_to_array(img)
    
    #rotate the image according to the label
    img_arr = np.rot90(img_arr,label)
    
    #convert the rotated tensor back to an image
    img = array_to_img(img_arr)

    #save the images,tensors and labels
    images.append(img)
    images_arr.append(img_arr)
    labels.append(label)
    
    #next image will be rotated 90 degrees more
    label = (label+1)%NUMBER_OF_ROTATIONS
    
images_arr = np.asarray(images_arr)
labels = to_categorical(labels)
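
To make the rotation and labelling steps concrete, here is a minimal NumPy-only sketch on a toy array. The names `toy`, `toy_labels` and `one_hot` are made up for illustration, and `np.eye(4)[...]` is used as a stand-in that produces the same values as one-hot encoding with `to_categorical`.

```python
import numpy as np

# A tiny 2x3 "image" with a single channel, so the rotation is easy to read.
toy = np.arange(6).reshape(2, 3, 1)

# np.rot90(m, k) rotates the first two axes counterclockwise by k * 90 degrees.
rotated = [np.rot90(toy, k) for k in range(4)]

# Height and width swap for odd k. The notebook's images are square (224x224),
# so their shape stays the same after every rotation.
shapes = [r.shape for r in rotated]

# One-hot encoding with plain NumPy: row i of the identity matrix encodes label i.
toy_labels = np.array([0, 1, 2, 3, 0])
one_hot = np.eye(4)[toy_labels]
```

Taking `argmax` along the last axis of `one_hot` recovers the integer labels, which is what the plotting code below does with `np.argmax(label)`.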

Let's plot the first 8 images with their corresponding rotations:

In [None]:
IMAGES_TO_PLOT = 8
_,axis = plt.subplots(1, IMAGES_TO_PLOT,figsize=(15,15))
for i,(img,label) in enumerate(zip(images,labels)):
    if i==IMAGES_TO_PLOT:
        break
        
    axis[i].imshow(img)
    axis[i].xaxis.set_visible(False) 
    axis[i].yaxis.set_visible(False)
    
    label = np.argmax(label)
    if label==0:
        axis[i].set_title('Original Image')
    else:
        axis[i].set_title('{}° Rotation'.format(label*90))

# Method

The general idea is pretty simple. We will take a trained network that can classify images into some predefined classes. One option is the [VGG16](https://arxiv.org/pdf/1409.1556.pdf) network, which looks like this (image taken from [here](http://book.paddlepaddle.org/03.image_classification/)):
<img src="http://book.paddlepaddle.org/03.image_classification/image/vgg16.png">

As you can see, the network is pretty big (i.e., "very deep"). Training such a model from scratch requires far more training examples than we use here, takes a few days, and needs more computing power than my laptop has.

Instead, we will use this network to generate features for our images. We will do that by keeping the weights of the network fixed and removing the last fully connected layers. In their place, we will add a new fully connected layer and a softmax layer that take the output of the VGG16 model as input and try to predict the rotation of the image.

The VGG16 network extracts useful features, and we train a new classifier to separate those features into our four classes. This technique is called transfer learning, as knowledge is transferred from one task to another. You can read more about it [here](http://cs231n.github.io/transfer-learning/).
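
The idea of a frozen feature extractor with a small trainable head can be sketched without any deep learning library. The toy example below is only an illustration under loud assumptions: a fixed random projection (`W_frozen`, a made-up name) stands in for the pretrained VGG16 layers, the data and labels are random, and a single softmax layer is trained with plain gradient descent instead of the two-layer Keras model used later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained network: a fixed random projection plus ReLU.
# (Hypothetical; in the notebook this role is played by VGG16 without its top.)
W_frozen = rng.normal(size=(20, 64)) / np.sqrt(20)

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: 200 random samples with random labels from 4 classes.
x = rng.normal(size=(200, 20))
y = rng.integers(0, 4, size=200)
y_one_hot = np.eye(4)[y]

features = extract_features(x)  # computed once, because W_frozen never changes
W_head = np.zeros((64, 4))      # the only trainable parameters

losses = []
for _ in range(200):
    probs = softmax(features @ W_head)
    # Cross-entropy: negative log-probability assigned to the true class.
    losses.append(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))
    # Gradient of the mean cross-entropy with respect to the head weights.
    grad = features.T @ (probs - y_one_hot) / len(y)
    W_head -= 0.05 * grad
```

Only `W_head` is updated in the loop, so the features never need to be recomputed. The loss starts at log(4), the value for a uniform guess over 4 classes, and decreases from there.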

# Implementation

Let's first create training and test sets:

In [None]:
images_train = images[:NUM_OF_IMAGES//2]
images_test = images[NUM_OF_IMAGES//2:]

images_arr_train = images_arr[:NUM_OF_IMAGES//2]
images_arr_test = images_arr[NUM_OF_IMAGES//2:]

labels_train = labels[:NUM_OF_IMAGES//2]
labels_test = labels[NUM_OF_IMAGES//2:]
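
Because the labels cycle through 0, 1, 2, 3 in order, splitting the data in the middle keeps the classes balanced. Here is a quick sanity check of that claim on synthetic labels, assuming all 200 images were actually found by `glob` (the names `toy_labels`, `train_counts` and `test_counts` are illustrative only):

```python
import numpy as np

num_images, num_rotations = 200, 4

# Same labelling pattern as the loading loop: 0, 1, 2, 3, 0, 1, ...
toy_labels = np.arange(num_images) % num_rotations

train_labels = toy_labels[:num_images // 2]
test_labels = toy_labels[num_images // 2:]

# Count how many examples of each rotation end up in each half.
train_counts = np.bincount(train_labels, minlength=num_rotations)
test_counts = np.bincount(test_labels, minlength=num_rotations)
```

Under that assumption, each half contains 25 images per rotation.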

As we are not going to fine-tune the weights of the VGG16 (we will keep them fixed during training), we can compute the extracted features in advance. This makes the training much faster, as we will not have to recompute them in each forward pass. Unless you run this on a GPU, it is going to take a few minutes.

In [None]:
from keras.applications.vgg16 import VGG16
import h5py
from keras.engine import topology

def load_split_weights(model, model_path_pattern='model_%d.h5', memb_size=102400000):  
    """Loads weights from split hdf5 files.
    
    Parameters
    ----------
    model : keras.models.Model
        Your model.
    model_path_pattern : str
        The path name should have a "%d" wild card in it.  For "model_%d.h5", the following
        files will be expected:
        model_0.h5
        model_1.h5
        model_2.h5
        ...
    memb_size : int
        The number of bytes per hdf5 file.  
    """
    model_f = h5py.File(model_path_pattern, "r", driver="family", memb_size=memb_size)
    topology.load_weights_from_hdf5_group_by_name(model_f, model.layers)
    
    return model

'''
This code is taken from https://www.kaggle.com/ekkus93/keras-models-as-datasets-test
As we are running on a Kaggle server, we can't download the VGG16 weights from GitHub.
If you are running it on your machine, you can simply replace this code with:
base_model = VGG16(weights='imagenet', include_top=False)
'''
vgg16 = VGG16(include_top=False, weights=None)  
keras_models_dir = '../input/keras-models'
model_path_pattern = keras_models_dir + "/vgg16_weights_tf_dim_ordering_tf_kernels_%d.h5" 
base_model = load_split_weights(vgg16, model_path_pattern)

def pretrained_features(arr,base_model):
    features = base_model.predict(arr,batch_size=100, verbose=1)
    return features.reshape((features.shape[0],-1))

In [None]:
features_train = pretrained_features(images_arr_train,base_model)
features_test = pretrained_features(images_arr_test,base_model)
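
To see what the `reshape` in `pretrained_features` does, here is a small sketch with a zero-filled placeholder instead of real network output. It assumes 224x224 inputs, for which VGG16 without its top layers produces a 7x7x512 feature map per image; `fake_features` is a made-up name.

```python
import numpy as np

# Placeholder with the shape base_model.predict would return for 3 images
# of size 224x224 (assumption: VGG16 with include_top=False).
fake_features = np.zeros((3, 7, 7, 512), dtype=np.float32)

# Flatten each feature map into one vector per image, as pretrained_features does.
flat = fake_features.reshape((fake_features.shape[0], -1))
```

Each image is therefore described by 7 * 7 * 512 = 25088 numbers, which becomes the input dimension of the classifier below.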

Now we can create a new classifier that, given the precomputed features, predicts the rotation of the corresponding image:

In [None]:
from keras.layers import Dense,Dropout
from keras.models import Sequential
from keras.optimizers import Adam
from keras.regularizers import l2
from keras.callbacks import EarlyStopping
from keras.initializers import RandomNormal

model = Sequential()
model.add(Dense(128, input_dim=features_train.shape[1],activation='relu',
                kernel_regularizer=l2(0.1),kernel_initializer=RandomNormal(stddev=0.001)))
model.add(Dropout(0.5))
model.add(Dense(4, activation='softmax',
                kernel_regularizer=l2(0.1),kernel_initializer=RandomNormal(stddev=0.001)))
model.compile(optimizer=Adam(lr=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(features_train, labels_train, batch_size=50, epochs=15,
          validation_data=(features_test,labels_test),
          callbacks=[EarlyStopping(patience=0)])

# Results

The classifier achieves about 82% accuracy on the validation set (a random classifier would achieve 25%).

Let's do some error analysis, starting with the confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns


predictions = model.predict_classes(features_test)
true_classes = np.argmax(labels_test,axis=1)
cnf_matrix = confusion_matrix(true_classes, predictions)

sns.heatmap(cnf_matrix,annot=True,annot_kws={"size": 14})
plt.ylabel('True Class')
plt.xlabel('Predicted Class')

The numbers on the diagonal are the counts of correctly classified images for each class; dividing each by the number of test images in that class gives the per-class accuracy. The classifier does better on images with no rotation and on images with a 90 degree rotation. It also confuses 90 degree rotations with 270 degree rotations, which makes sense as these images look similar.
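
Here is a minimal sketch of how per-class and overall accuracy can be read off a confusion matrix. The matrix `toy_cm` is made up for illustration and is not the result of this notebook.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
toy_cm = np.array([
    [23,  1,  0,  1],
    [ 2, 20,  0,  3],
    [ 1,  1, 22,  1],
    [ 1,  4,  1, 19],
])

# Per-class accuracy: correct predictions divided by the number of true
# examples of that class (the row sum).
per_class_accuracy = np.diag(toy_cm) / toy_cm.sum(axis=1)

# Overall accuracy: all correct predictions divided by all examples.
overall_accuracy = np.trace(toy_cm) / toy_cm.sum()
```

With these made-up counts, the per-class accuracies are 0.92, 0.80, 0.88 and 0.76, and the overall accuracy is 0.84.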

Let's examine the images with the lowest probability assigned by the model to the correct class (which corresponds to the highest cross-entropy loss):

In [None]:
predictions = model.predict_proba(features_test)
pred_true_class = predictions[range(len(predictions)),true_classes]
sorted_images = [images_test[i] for i in np.argsort(pred_true_class)]

_,axis = plt.subplots(1, 4,figsize=(15,15))
pred_true_class.sort()
for i in range(4):
    axis[i].imshow(array_to_img(sorted_images[i]))
    axis[i].xaxis.set_visible(False) 
    axis[i].yaxis.set_visible(False)
    axis[i].set_title('predicted: {0:.3f}'.format(pred_true_class[i]))

These are difficult cases, especially the second image from the left.
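
The indexing used above to pick the probability of the true class, and its link to the cross-entropy loss, may be easier to follow on a toy example. The probabilities in `toy_probs` are made up for illustration.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 images over the 4 rotations.
toy_probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.30, 0.25, 0.25],
    [0.05, 0.05, 0.10, 0.80],
])
toy_true = np.array([0, 2, 3])

# Pick, for each row, the probability assigned to the true class.
p_true = toy_probs[np.arange(len(toy_probs)), toy_true]

# Indices sorted from the least to the most confident correct-class probability.
order = np.argsort(p_true)

# Per-example cross-entropy loss: a low p_true means a high loss.
per_example_loss = -np.log(p_true)
```

Sorting by `p_true` in ascending order gives the same ranking as sorting by the loss in descending order, so the first entries of `order` are the hardest examples.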

Let's look at the images for which the model has assigned the highest probability to the true class:

In [None]:
predictions = model.predict_proba(features_test)
pred_true_class = predictions[range(len(predictions)),true_classes]
sorted_images = [images_test[i] for i in np.argsort(pred_true_class)]

_,axis = plt.subplots(1, 4,figsize=(15,15))
pred_true_class.sort()
for i in range(4):
    axis[i].imshow(array_to_img(sorted_images[-1*(i+1)]))
    axis[i].xaxis.set_visible(False) 
    axis[i].yaxis.set_visible(False)
    axis[i].set_title('predicted: {0:.3f}'.format(pred_true_class[-1*(i+1)]))

All images are clear and are not rotated.

# Next Steps

- Train with more data - I've only used 100 images for training, and more examples will probably improve the model. Additionally, the model used only dog images, and it would be interesting to see whether it can perform as well on richer datasets (e.g., ImageNet).
- Tune the hyper-parameters - I didn't invest much time in picking the best hyper-parameters. Grid or random search will probably yield a better model.
- Predict more than 4 rotations - the model was trained to predict which of 4 rotations was applied to an image. An obvious extension would be to train it to predict finer-grained rotations (e.g., 360 classes), or perhaps to predict a continuous rotation (i.e., regression).