## Cats and Dogs using the TensorFlow Backend

This notebook shows how to take the 'Cats and Dogs' classifier from the very first lesson of the amazing 'Practical Deep Learning For Coders' course, and convert it to run using TensorFlow (instead of Theano).

http://course.fast.ai/

Why do we care what backend we use?

There are a few reasons, many of which are well covered here (as well as some good instructions on installing TensorFlow):
    http://www.pyimagesearch.com/2016/11/14/installing-keras-with-tensorflow-backend/

My compelling use case is that TensorFlow has good support (and example code) for running models on Android and iOS.  To use that support, however, we need to be able to export our Keras model as a TensorFlow graph (more on this later).

For simplicity (and so you can inspect every step), I've taken the examples from the course notebook, and modified them so that everything fits into a single notebook without external library files.

#### Step 1: Install TensorFlow and configure it as your Keras backend

Install TensorFlow on your device or VM image (personally I use the Anaconda install):
    https://www.tensorflow.org/versions/master/get_started/os_setup#anaconda_installation
    
Configure TensorFlow as your Keras backend:
    https://keras.io/backend/
    
For reference, the following is my Keras config file (you might have a real machine and want to use a GPU instead):
```
$ cat $HOME/.keras/keras.json 
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```

Note: we could have saved ourselves a lot of trouble by using Theano ("th") dim ordering with the TensorFlow backend.  Although this is supported, I've found that training is *much* slower with Theano dim ordering, and doing the full conversion now will probably save us some work in the long run anyway.
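As a quick sanity check before importing Keras, the config file can be read with the standard library. This is only a sketch: `resolve_backend` is a hypothetical helper, not part of Keras, and it assumes the documented behaviour that the `KERAS_BACKEND` environment variable, when set, takes precedence over the `backend` entry in `keras.json`.

```python
import json
import os


def resolve_backend(config_path, environ=None):
    """Return the backend name Keras is expected to pick, or None if unknown.

    Hypothetical helper (not part of Keras). Assumes KERAS_BACKEND, when set,
    overrides the "backend" entry in keras.json.
    """
    environ = os.environ if environ is None else environ
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]
    if os.path.exists(config_path):
        with open(config_path) as f:
            return json.load(f).get("backend")
    return None


config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
print("Expected Keras backend:", resolve_backend(config_path))
```

If this prints anything other than `tensorflow`, fix the config (or the environment variable) before moving on to the imports.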


#### Step 2: Ensure we are using the TensorFlow backend.

OK, so let's do our regular imports, but make sure that Keras is using TensorFlow.

Keras reports which backend it is using the first time you import a Keras module.  Make sure it's TensorFlow before going any further.

In [1]:
import numpy as np
import tensorflow as tf

from keras import backend as K

from keras.utils.data_utils import get_file
from keras.models import Sequential
from keras.optimizers import SGD, RMSprop, Adam
from keras.layers import Input
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization

from keras.preprocessing import image

Using TensorFlow backend.


#### Step 3: Define our VGG16 model

The VGG16 model we define below is *almost* identical to the one from the Deep Learning Lesson 1 Notebook (https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson1.ipynb).

However, there are a few key differences:
* Some of the syntactic sugar we used with Theano for slicing tensors does not work with TensorFlow.
* TensorFlow orders image dimensions differently from Theano ("image_dim_ordering": "tf"). Theano puts the channel axis first (channels, rows, cols), while TensorFlow puts it last (rows, cols, channels).
* As you'll see later, in order to convert our existing weights we actually need our model to support both types of ordering (so we added code to support both).
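The two orderings and the channel reversal can be illustrated without TensorFlow at all. The following NumPy-only sketch mirrors the preprocessing step (mean subtraction, then split and concatenate in reverse) for both layouts and checks that they agree. `preprocess` is an illustrative stand-in, not the function used by the model, and the batch is random data.

```python
import numpy as np

# Per-channel means in RGB order (same values as used in this notebook).
VGG_MEAN = np.array([123.68, 116.779, 103.939])


def preprocess(x, dim_ordering):
    """Subtract the mean and reverse the channel axis (RGB -> BGR)."""
    if dim_ordering == 'th':          # (batch, channels, rows, cols)
        mean, axis = VGG_MEAN.reshape((1, 3, 1, 1)), 1
    else:                             # 'tf': (batch, rows, cols, channels)
        mean, axis = VGG_MEAN.reshape((1, 1, 1, 3)), 3
    x = x - mean
    # Split into three single-channel arrays, then concatenate in reverse.
    r, g, b = np.split(x, 3, axis=axis)
    return np.concatenate([b, g, r], axis=axis)


# A fake batch of two 4x4 RGB images in 'tf' ordering.
x_tf = np.random.RandomState(0).uniform(0, 255, size=(2, 4, 4, 3))
# The same batch in 'th' ordering: move the channel axis to position 1.
x_th = x_tf.transpose((0, 3, 1, 2))

out_tf = preprocess(x_tf, 'tf')
out_th = preprocess(x_th, 'th')

# Both paths should agree once the 'th' result is moved back to 'tf' layout.
print(np.allclose(out_tf, out_th.transpose((0, 2, 3, 1))))
```

The only things that change between the two layouts are the shape of the mean array and the axis that holds the channels, which is exactly what the model code has to switch on.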



In [2]:
def vgg_preprocess(x):
    # We need to construct the vgg_mean differently depending on the
    # dimension ordering.
    if (K.image_dim_ordering() == 'th'):
        vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((3,1,1))
        axis = 1
    else:
        vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((1,1,3))
        axis = 3
    
    x = x - vgg_mean     # subtract mean
    
    # TensorFlow tensors do not support this type of slice operation:
    #     return x[:, ::-1]    # reverse axis rgb->bgr
    # So we have to split and then concatenate the channels in reverse.
    
    r,g,b = tf.split(x, 3, axis)
    return tf.concat([b,g,r], axis)

# Define convenience functions for the repeating blocks of VGG16
def ConvBlock(layers, model, filters):
    for i in range(layers): 
        model.add(ZeroPadding2D((1,1)))
        model.add(Convolution2D(filters, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

def FCBlock(model):
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))

def VGG_16():
    model = Sequential()
    # The input dimensions differ depending on the dimension ordering.
    if (K.image_dim_ordering() == 'th'):
        model.add(Lambda(vgg_preprocess, input_shape=(3,224,224), output_shape=((3,224,224))))
    else:
        model.add(Lambda(vgg_preprocess, input_shape=(224,224,3), output_shape=((224,224,3))))
    ConvBlock(2, model, 64)
    ConvBlock(2, model, 128)
    ConvBlock(3, model, 256)
    ConvBlock(3, model, 512)
    ConvBlock(3, model, 512)

    model.add(Flatten())
    FCBlock(model)
    FCBlock(model)
    model.add(Dense(1000, activation='softmax'))
    return model

#### Step 4: Convert our predefined weights

This turns out to be more painful than you might think.  There are three issues we need to account for:
* TensorFlow uses correlation for its filters, while Theano uses convolution.  The two operations are equivalent except that the filters are flipped (on both the x and y axes) with respect to one another.
* The dim ordering used in our convolutional layers has changed, so we need to migrate the weights from a model defined using 'th' ordering to one that uses 'tf' ordering.
* We need to shuffle the weights of the first dense layer so that each row lines up with the right output of the last convolutional layer, because flattening orders those outputs differently under each dim ordering.
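To make the first two issues concrete, here is a small NumPy sketch. It is not the conversion code itself: `correlate2d_valid` and `convolve2d_valid` are hypothetical helpers written out by hand for clarity, and the kernel shapes are made-up toy sizes.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel without flipping it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def convolve2d_valid(image, kernel):
    """'Valid' 2-D convolution, written directly from its definition."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            for m in range(kh):
                for n in range(kw):
                    out[i, j] += image[i + kh - 1 - m, j + kw - 1 - n] * kernel[m, n]
    return out


rng = np.random.RandomState(0)
image = rng.rand(5, 5)
kernel = rng.rand(3, 3)

# Issue 1: convolving with a kernel equals correlating with the same kernel
# flipped on both spatial axes.
flipped = kernel[::-1, ::-1]
print(np.allclose(convolve2d_valid(image, kernel),
                  correlate2d_valid(image, flipped)))

# Issue 2: kernel layout.  A 'th' kernel stack is (filters, channels, rows, cols);
# 'tf' expects (rows, cols, channels, filters).
w_th = rng.rand(4, 3, 2, 5)
w_tf = w_th.transpose((2, 3, 1, 0))
print(w_th.shape, "->", w_tf.shape)
```

The flip and the transpose are independent steps: the first changes which weight multiplies which pixel, the second only changes where each weight is stored.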

We create two models to perform the migration: one (th_model) to load the 'th' weights into, and another (tf_model) to receive the converted 'tf' weights.  We load the Theano VGG16 weights into th_model, convert them, store them in tf_model, and then save tf_model's weights to disk.

You are free to use different models here (with your own custom weights); you just need to make sure the weights you are converting match the model.

You only need to do the messy conversion once.  After this, you can just use your saved TensorFlow weights from now on.
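The third issue, the dense-layer row order, can also be checked in isolation. The sketch below assumes the first dense layer's weight matrix has one row per flattened conv output, and expresses the 'th' to 'tf' reordering as a reshape and transpose. `th_to_tf_dense_rows` is a hypothetical helper, not part of Keras. For VGG16 the last conv block yields 512 channels on a 7x7 map, but a tiny example is easier to verify by eye.

```python
import numpy as np


def th_to_tf_dense_rows(w_th, channels, rows, cols):
    """Reorder dense weight rows from 'th' flatten order to 'tf' flatten order.

    Hypothetical helper for illustration.  w_th has shape
    (channels * rows * cols, units), with rows indexed as (channel, row, col).
    """
    units = w_th.shape[1]
    w = w_th.reshape((channels, rows, cols, units))
    # Move the channel axis last so a flatten yields (row, col, channel) order.
    return w.transpose((1, 2, 0, 3)).reshape((-1, units))


# Tiny check: 2 channels, a 2x3 feature map, 4 dense units.
C, H, W, units = 2, 2, 3, 4
rng = np.random.RandomState(0)
features_th = rng.rand(C, H, W)                 # conv output, 'th' layout
features_tf = features_th.transpose((1, 2, 0))  # same values, 'tf' layout
w_th = rng.rand(C * H * W, units)
w_tf = th_to_tf_dense_rows(w_th, C, H, W)

# The dense layer output must not depend on the layout.
out_th = features_th.reshape(-1).dot(w_th)
out_tf = features_tf.reshape(-1).dot(w_tf)
print(np.allclose(out_th, out_tf))
```

A permutation written this way is a handy reference for checking any hand-rolled row shuffle on a small matrix before applying it to real weights.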

In [3]:
# Download the precomputed model weights for Vgg16 (in Theano format).  They are
# too big to store on GitHub
fname = 'vgg16.h5'
fpath = get_file(fname, 'http://www.platform.ai/models/' + fname, cache_subdir='models')

# Ok, here is some messy conversion from a Theano (with 'th' dim ordering) kernel
# to a TensorFlow (with 'tf' dim ordering) kernel.

# Fortunately, Somshubra Majumdar (@titu1994) had already done a great job of figuring out how to migrate
# between the two.  I've borrowed most of the code from the URL below:
#   https://github.com/titu1994/Keras-Classification-Models/blob/62ab38512c6ef0fe4a0f30a00d33643f453cf12e/weight_conversion_theano.py

from keras.utils.layer_utils import convert_all_kernels_in_model

first_dense = True
nb_last_conv = 0

def shuffle_rows(original_w, nb_last_conv, nb_rows_dense):
    ''' Note :
    This algorithm to shuffle dense layer rows was provided by Kent Sommers (@kentsommer)
    in a gist : https://gist.github.com/kentsommer/e872f65926f1a607b94c2b464a63d0d3
    '''
    converted_w = np.zeros(original_w.shape)
    count = 0
    for index, row in enumerate(original_w):
        if (index % nb_last_conv) == 0 and index != 0:
            count += 1
        new_index = ((index % nb_last_conv) * nb_rows_dense) + count
        converted_w[new_index] = row
    return converted_w

# Create a vgg16 model with a Theano dim ordering.  You could load weights for 
# other models here, but you would need to create corresponding models.
K.set_image_dim_ordering('th')
th_model = VGG_16()
# Create a vgg16 model with a TensorFlow dim ordering
K.set_image_dim_ordering('tf')
tf_model = VGG_16()

# Load the original VGG16 model trained using the Theano backend (and 'th' dim ordering)
th_model.load_weights(fpath)

convert_all_kernels_in_model(th_model)

count_dense = 0
for layer in th_model.layers:
    if layer.__class__.__name__ == "Dense":
        count_dense += 1

if count_dense == 1:
    first_dense = False # If there is only 1 dense, no need to perform row shuffle in Dense layer

for index, th_layer in enumerate(th_model.layers):
    if th_layer.__class__.__name__ in ['Convolution1D',
                                       'Convolution2D',
                                       'Convolution3D',
                                       'AtrousConvolution2D',
                                       'Deconvolution2D']:
        weights = th_layer.get_weights()
        weights[0] = weights[0].transpose((2, 3, 1, 0))
        tf_model.layers[index].set_weights(weights)

        nb_last_conv = th_layer.nb_filter # preserve last number of convolutions to use with dense layers
    else:
        if th_layer.__class__.__name__ == "Dense" and first_dense:
            weights = th_layer.get_weights()
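            # Number of spatial positions (rows * cols) in the last conv output.
            nb_rows_dense_layer = weights[0].shape[0] // nb_last_conv

            # The 'th' rows are ordered (channel, row, col) and 'tf' expects
            # (row, col, channel), so row c * positions + s must move to
            # s * channels + c.  shuffle_rows takes its modulus first, so the
            # number of positions goes first and the channel count second.
            weights[0] = shuffle_rows(weights[0], nb_rows_dense_layer, nb_last_conv)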
            tf_model.layers[index].set_weights(weights)

            first_dense = False
        else:
            tf_model.layers[index].set_weights(th_layer.get_weights())

tf_model.save_weights('vgg16-tf.h5')

#### Step 5: Test

Check that we can load our adjusted VGG16 weights and train against the Cats and Dogs dataset using the TensorFlow backend.

We should see results that are comparable (in speed and accuracy) to using the Theano backend.

First we'll try the converted model...

In [4]:
# Set the ordering explicitly again, just to be sure.
K.set_image_dim_ordering('tf')

# Only using sample data for this example.  You can download and use the
# entire Cats and Dogs data set if you want.  It's just a bit too big for GitHub.
#
# I've tried it on the full data set, and it gets comparable accuracy to the
# Theano backend.  You'll need to try it yourself, if you don't believe me. ;)
#
# Update Feb 27th, 2017:  The above is a lie.  Something is broken in the
# conversion process.
path = "sample/"

# As large as you can, but no larger than 64 is recommended. 
# If you have an older or cheaper GPU, you'll run out of memory, so will have to decrease this.
batch_size=64

# Create our batches for the training and validation sets
imageGenerator = image.ImageDataGenerator()
train_batches = imageGenerator.flow_from_directory(path+'train', target_size=(224,224),
        class_mode='categorical', shuffle=True, batch_size=batch_size)
imageGenerator = image.ImageDataGenerator()
valid_batches = imageGenerator.flow_from_directory(path+'valid', target_size=(224,224),
        class_mode='categorical', shuffle=True, batch_size=batch_size)

# Test after we've done the conversion
model = VGG_16()
model.load_weights('vgg16-tf.h5')
model.pop()
for layer in model.layers: layer.trainable=False
model.add(Dense(train_batches.nb_class, activation='softmax'))
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_batches, samples_per_epoch=train_batches.nb_sample, 
                    validation_data=valid_batches, nb_val_samples=valid_batches.nb_sample,
                    nb_epoch=3)

Found 180 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f5b0b615898>

#### Now try again using the Theano dim ordering

For comparison, we train the same way from the original weights using 'th' dim ordering.  The two runs should give very similar results.
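Training curves on a 180-image sample are noisy, so a more direct check is to feed the same images through both models and compare the outputs. The sketch below shows only the comparison step. `max_abs_diff` is a hypothetical helper, and the two arrays are stand-in numbers, not results from this notebook.

```python
import numpy as np


def max_abs_diff(preds_a, preds_b):
    """Largest element-wise gap between two prediction arrays of equal shape."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    if preds_a.shape != preds_b.shape:
        raise ValueError("shape mismatch: %s vs %s" % (preds_a.shape, preds_b.shape))
    return float(np.max(np.abs(preds_a - preds_b)))


# Stand-in numbers; in practice these would come from calling predict() on
# each model with the same images (transposed to the matching dim ordering).
preds_th = np.array([[0.90, 0.10], [0.20, 0.80]])
preds_tf = np.array([[0.89, 0.11], [0.20, 0.80]])

print(max_abs_diff(preds_th, preds_tf) < 0.05)
```

If the weights were converted correctly, the gap between the two models on identical inputs should be tiny. A large gap points at the conversion rather than at the training run.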

In [6]:
K.set_image_dim_ordering('th')
# Create our batches for the training and validation sets
imageGenerator = image.ImageDataGenerator()
train_batches = imageGenerator.flow_from_directory(path+'train', target_size=(224,224),
        class_mode='categorical', shuffle=True, batch_size=batch_size)
imageGenerator = image.ImageDataGenerator()
valid_batches = imageGenerator.flow_from_directory(path+'valid', target_size=(224,224),
        class_mode='categorical', shuffle=True, batch_size=batch_size)

# Baseline: load the original Theano weights using 'th' dim ordering
model = VGG_16()
model.load_weights(fpath)
model.pop()
for layer in model.layers: layer.trainable=False
model.add(Dense(train_batches.nb_class, activation='softmax'))
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_batches, samples_per_epoch=train_batches.nb_sample, 
                    validation_data=valid_batches, nb_val_samples=valid_batches.nb_sample,
                    nb_epoch=3)

Found 180 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f5b0a044160>