# Fine-tuning InceptionResNetV2 (pretrained on ImageNet) on CIFAR10

Here we present the process of fine-tuning the InceptionResNetV2 network from [keras.applications](https://keras.io/applications/), which is already pretrained on the ImageNet database. We then add a few layers on top of it in order to train the network on the CIFAR10 dataset.

We use the [Keras](https://keras.io/) Python deep learning library and follow its [keras.applications](https://keras.io/applications/#usage-examples-for-image-classification-models) tutorial.

Here is how to load the InceptionResNetV2 CNN with Keras:
```python
from keras.applications.inception_resnet_v2 import InceptionResNetV2

# `classes` is only used when include_top=True; with include_top=False it is ignored
model = InceptionResNetV2(include_top=False, weights='imagenet',
                          input_tensor=None, input_shape=None, pooling=None, classes=1000)
```
The most important thing is to pass the parameter
```python
include_top=False
```

because it builds the CNN without the final (top) layers, which classify images into the 1000 ImageNet categories.

```python
network_names = ['incv2']

print("Available networks = ", network_names)
cnnid = 0  # int(input("Please choose the CNN network [0-{n}]: ".format(n=len(network_names)-1)))

selected_network = network_names[cnnid]
print("Selected network: ", selected_network)
```
```
Available networks =  ['incv2']
Selected network:  incv2
```


```python
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, GlobalAveragePooling2D
from keras import backend as K
```
```
Using TensorFlow backend.
```


# Load CIFAR10 data
Here we use [keras.datasets](https://keras.io/datasets/), which works much like our own `myutils.load_CIFAR10_dataset()` helper.

```python
from keras.datasets import cifar10
from matplotlib import pyplot as plt

n_classes = 10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

n_training = X_train.shape[0]
n_testing = X_test.shape[0]

# labels are loaded as (N, 1) arrays; flatten them to (N,)
y_train = y_train.flatten()
y_test = y_test.flatten()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

# show the first training image
plt.imshow(X_train[0])
plt.show()
```
```
(50000, 32, 32, 3) (50000,) (10000, 32, 32, 3) (10000,)
```
Output figure: the first training image, `X_train[0]`.
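CIFAR10 labels are plain integers from 0 to 9. The sketch below maps them to the standard CIFAR10 class names; the `CIFAR10_CLASSES` list and the `label_names` helper are hypothetical illustrations, not part of `keras.datasets`.

```python
import numpy as np

# Standard CIFAR10 class names, indexed by label (hypothetical helper list, not a Keras API)
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

def label_names(labels):
    """Map an array of integer labels to class-name strings."""
    return [CIFAR10_CLASSES[int(i)] for i in np.asarray(labels).ravel()]

# usage with a few example labels
print(label_names(np.array([0, 6, 9])))  # ['airplane', 'frog', 'truck']
```

With the notebook's data, `label_names(y_train[:5])` would list the classes of the first five training images.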

# Define the model

```python
from keras.models import Model
from keras.applications.inception_resnet_v2 import InceptionResNetV2

# input size of the network (CIFAR10 images are upsampled to this size)
input_shape = {
    'incv2': (224, 224, 3)
}[selected_network]

def create_model_incv2():
    tf_input = Input(shape=input_shape)
    # convolutional base pretrained on ImageNet, without the 1000-class top
    base_model = InceptionResNetV2(input_tensor=tf_input, weights='imagenet', include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)                          # (batch, H, W, C) -> (batch, C)
    x = Dense(1024, activation='relu')(x)                    # new fully connected layer
    predictions = Dense(n_classes, activation='softmax')(x)  # new 10-class classifier
    model = Model(inputs=base_model.input, outputs=predictions)
    return base_model, model

create_model = {
    'incv2': create_model_incv2,
}[selected_network]
```
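To see what the new top computes, here is a NumPy sketch of the forward pass of the added layers: global average pooling, a 1024-unit ReLU layer and a 10-way softmax. The random weights are placeholders, and the base-model output shape `(batch, 5, 5, 1536)` is an assumption (what we expect InceptionResNetV2 to produce for 224x224 inputs), used for illustration only.

```python
import numpy as np

def softmax(z):
    # subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def head_forward(features, W1, b1, W2, b2):
    """features: (batch, H, W, C) output of the convolutional base."""
    pooled = features.mean(axis=(1, 2))         # GlobalAveragePooling2D -> (batch, C)
    hidden = np.maximum(0.0, pooled @ W1 + b1)  # Dense(1024, activation='relu')
    return softmax(hidden @ W2 + b2)            # Dense(n_classes, activation='softmax')

rng = np.random.RandomState(0)
batch, H, W, C, n_hidden, n_classes = 2, 5, 5, 1536, 1024, 10
features = rng.standard_normal((batch, H, W, C))  # stand-in for base_model.output
W1 = rng.standard_normal((C, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_classes)) * 0.01
b2 = np.zeros(n_classes)

probs = head_forward(features, W1, b1, W2, b2)
print(probs.shape)        # (2, 10)
print(probs.sum(axis=1))  # each row sums to 1
```

Only `W1`, `b1`, `W2` and `b2` are new, randomly initialized parameters; this is why the first training phase below freezes the base and trains just these layers.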

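# Data generator for TensorFlow

The network is fed with batches of data, which is the common way of feeding neural networks in TensorFlow. Each batch of 32x32 CIFAR10 images is first resized to the network's input size by a TensorFlow op and then preprocessed.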
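The batching logic of `data_generator` can be checked without TensorFlow. This sketch keeps only the index bookkeeping (no resizing or preprocessing); `cyclic_batches` is a hypothetical helper and the sizes are toy values.

```python
import itertools
import numpy as np

def cyclic_batches(data, labels, batch_size):
    """Yield (data, labels) batches forever, wrapping to the start after the last batch.

    As in data_generator, the final batch of a pass may be smaller than batch_size.
    """
    n = data.shape[0]
    start = 0
    while True:
        end = start + batch_size
        yield data[start:end], labels[start:end]
        start = end
        if start >= n:
            start = 0

data = np.arange(10).reshape(10, 1)  # 10 toy samples
labels = np.arange(10)
sizes = [len(y) for _, y in itertools.islice(cyclic_batches(data, labels, 4), 5)]
print(sizes)  # [4, 4, 2, 4, 4]
```

Note that the data is not shuffled between passes; every epoch sees the batches in the same order.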

```python
# TensorFlow placeholder for a batch of images from the CIFAR10 dataset
batch_of_images_placeholder = tf.placeholder("uint8", (None, 32, 32, 3))

batch_size = {
    'incv2': 64,
}[selected_network]

# upsample each 32x32 image to input_shape[:2]; method=0 is bilinear interpolation.
# Note: the default InceptionResNetV2 input size is 299x299, here we use 224x224.
tf_resize_op = tf.image.resize_images(batch_of_images_placeholder, input_shape[:2], method=0)
```

```python
# data generator for the TensorFlow session
from keras.applications.inception_resnet_v2 import preprocess_input as incv2_preprocess_input

preprocess_input = {
    'incv2': incv2_preprocess_input,
}[selected_network]

def data_generator(sess, data, labels):
    def generator():
        start = 0
        end = start + batch_size
        n = data.shape[0]
        while True:
            # resize the current batch in TensorFlow, then apply the network's preprocessing
            batch_of_images_resized = sess.run(tf_resize_op, {batch_of_images_placeholder: data[start:end]})
            batch_of_images_preprocessed = preprocess_input(batch_of_images_resized)
            batch_of_labels = labels[start:end]

            # advance to the next batch and wrap around at the end of the dataset
            start += batch_size
            end += batch_size
            if start >= n:
                start = 0
                end = batch_size
            yield (batch_of_images_preprocessed, batch_of_labels)
    return generator
```
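As far as we know, `preprocess_input` for InceptionResNetV2 uses Keras' "tf" preprocessing mode, which scales pixel values from [0, 255] to [-1, 1]. A NumPy sketch of that scaling (our own reimplementation, not the library function):

```python
import numpy as np

def scale_to_unit_range(x):
    """Map pixel values in [0, 255] to [-1, 1], as in Keras' 'tf' preprocessing mode (our sketch)."""
    x = np.asarray(x, dtype=np.float32)
    return x / 127.5 - 1.0

print(scale_to_unit_range([0, 127.5, 255]))  # [-1.  0.  1.]
```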

# Create the session and build the model

```python
sess = tf.InteractiveSession()
K.set_session(sess)
# 0 - test, 1 - train; this is set globally, so it also applies to later predict() calls
K.set_learning_phase(1)

base_model, model = create_model()
```

```python
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
```
```
0 input_1
1 conv2d_1
2 batch_normalization_1
3 activation_1
4 conv2d_2
5 batch_normalization_2
6 activation_2
7 conv2d_3
8 batch_normalization_3
9 activation_3
10 max_pooling2d_1
11 conv2d_4
12 batch_normalization_4
13 activation_4
14 conv2d_5
15 batch_normalization_5
16 activation_5
17 max_pooling2d_2
18 conv2d_9
19 batch_normalization_9
20 activation_9
21 conv2d_7
22 conv2d_10
23 batch_normalization_7
24 batch_normalization_10
25 activation_7
26 activation_10
27 average_pooling2d_1
28 conv2d_6
29 conv2d_8
30 conv2d_11
31 conv2d_12
32 batch_normalization_6
33 batch_normalization_8
34 batch_normalization_11
35 batch_normalization_12
36 activation_6
37 activation_8
38 activation_11
39 activation_12
40 mixed_5b
41 conv2d_16
42 batch_normalization_16
43 activation_16
44 conv2d_14
45 conv2d_17
46 batch_normalization_14
47 batch_normalization_17
48 activation_14
49 activation_17
50 conv2d_13
51 conv2d_15
52 conv2d_18
53 batch_normalization_13
54 batch_normalization_15
55 batch_normalization
```

```python
for i, layer in enumerate(model.layers):
    print(i, layer.name)
```
```
0 input_1
1 conv2d_1
2 batch_normalization_1
3 activation_1
4 conv2d_2
5 batch_normalization_2
6 activation_2
7 conv2d_3
8 batch_normalization_3
9 activation_3
10 max_pooling2d_1
11 conv2d_4
12 batch_normalization_4
13 activation_4
14 conv2d_5
15 batch_normalization_5
16 activation_5
17 max_pooling2d_2
18 conv2d_9
19 batch_normalization_9
20 activation_9
21 conv2d_7
22 conv2d_10
23 batch_normalization_7
24 batch_normalization_10
25 activation_7
26 activation_10
27 average_pooling2d_1
28 conv2d_6
29 conv2d_8
30 conv2d_11
31 conv2d_12
32 batch_normalization_6
33 batch_normalization_8
34 batch_normalization_11
35 batch_normalization_12
36 activation_6
37 activation_8
38 activation_11
39 activation_12
40 mixed_5b
41 conv2d_16
42 batch_normalization_16
43 activation_16
44 conv2d_14
45 conv2d_17
46 batch_normalization_14
47 batch_normalization_17
48 activation_14
49 activation_17
50 conv2d_13
51 conv2d_15
52 conv2d_18
53 batch_normalization_13
54 batch_normalization_15
55 batch_normalization
...
430 activation_116
431 block17_10_mixed
432 block17_10_conv
433 block17_10
434 block17_10_ac
435 conv2d_118
436 batch_normalization_118
437 activation_118
438 conv2d_119
439 batch_normalization_119
440 activation_119
441 conv2d_117
442 conv2d_120
443 batch_normalization_117
444 batch_normalization_120
445 activation_117
446 activation_120
447 block17_11_mixed
448 block17_11_conv
449 block17_11
450 block17_11_ac
451 conv2d_122
452 batch_normalization_122
453 activation_122
454 conv2d_123
455 batch_normalization_123
456 activation_123
457 conv2d_121
458 conv2d_124
459 batch_normalization_121
460 batch_normalization_124
461 activation_121
462 activation_124
463 block17_12_mixed
464 block17_12_conv
465 block17_12
466 block17_12_ac
467 conv2d_126
468 batch_normalization_126
469 activation_126
470 conv2d_127
471 batch_normalization_127
472 activation_127
473 conv2d_125
474 conv2d_128
475 batch_normalization_125
476 batch_normalization_128
477 activation_125
478 activation_128
479 block17_13_
```

```python
# first: train only the top layers (which were randomly initialized),
# i.e. freeze all convolutional InceptionResNetV2 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (this must be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```
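Both `compile` calls in this notebook use `loss='categorical_crossentropy'`, which compares the softmax outputs with one-hot targets. A NumPy sketch of the batch-mean loss (the clipping epsilon is our own choice):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum_k y_true[k] * log(y_pred[k]).

    y_true: one-hot targets, shape (batch, n_classes)
    y_pred: predicted probabilities, shape (batch, n_classes)
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(10)[[3]]                          # one sample of class 3
uniform = np.full((1, 10), 0.1)                   # an untrained, uniform prediction
print(categorical_crossentropy(y_true, uniform))  # about 2.3026, i.e. ln(10)
```

A freshly initialized 10-class head therefore starts with a loss of roughly ln(10), and the loss approaches 0 as the predicted probability of the true class approaches 1.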

```python
# one-hot encode the integer labels: shape (50000,) -> (50000, 10)
y_train_one_hot = tf.one_hot(y_train, n_classes).eval()
```
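`tf.one_hot(...).eval()` builds and runs a TensorFlow op just to encode the labels. An equivalent NumPy alternative (not what the notebook uses) indexes rows of an identity matrix:

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return an (N, n_classes) float32 array with a 1 at each label's index."""
    return np.eye(n_classes, dtype=np.float32)[np.asarray(labels)]

print(one_hot([2, 0], 3))
# [[0. 0. 1.]
#  [1. 0. 0.]]
```

In the notebook, `one_hot(y_train, n_classes)` would produce the same `(50000, 10)` array as the TensorFlow call.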

```python
data_train_gen = data_generator(sess, X_train, y_train_one_hot)
```

```python
# train the new top layers for one epoch (the default);
# steps_per_epoch is rounded up so that the last, partial batch is included
model.fit_generator(data_train_gen(), int(np.ceil(n_training / batch_size)), verbose=1)
```
```
Epoch 1/1
```
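With 50000 training images and `batch_size = 64`, one pass over the data does not divide evenly. A small sketch of the arithmetic behind the `steps_per_epoch` argument of `fit_generator`:

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # number of generator calls needed to see every sample once;
    # the last batch is partial when n_samples is not a multiple of batch_size
    return math.ceil(n_samples / batch_size)

n_training, batch_size = 50000, 64
print(n_training / batch_size)                  # 781.25
print(steps_per_epoch(n_training, batch_size))  # 782
print(n_training - 781 * batch_size)            # 16 images in the last batch
```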

## Validation

```python
# resize and preprocess the whole test set at once
images_resized = sess.run(tf_resize_op, {batch_of_images_placeholder: X_test})
images = preprocess_input(images_resized)
```
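Resizing all 10000 test images at once produces a large float32 array: 10000 x 224 x 224 x 3 values x 4 bytes, about 6 GB. If that does not fit in memory, prediction can be done chunk by chunk. The sketch below is generic: `predict_in_chunks` is a hypothetical helper, and `transform` and `predict` are stand-ins for the resize-plus-preprocess step and `model.predict`.

```python
import numpy as np

def predict_in_chunks(X, transform, predict, chunk_size=500):
    """Apply transform + predict to X in chunks and stack the results.

    transform: maps a chunk of raw images to network inputs (stand-in for the
               TensorFlow resize followed by preprocess_input)
    predict:   maps network inputs to class probabilities (stand-in for model.predict)
    """
    outputs = []
    for start in range(0, X.shape[0], chunk_size):
        chunk = X[start:start + chunk_size]
        outputs.append(predict(transform(chunk)))
    return np.concatenate(outputs, axis=0)

# toy check with simple stand-in functions
X = np.arange(12, dtype=np.float32).reshape(12, 1)
probs = predict_in_chunks(X, transform=lambda c: c / 255.0,
                          predict=lambda c: np.hstack([c, 1 - c]), chunk_size=5)
print(probs.shape)  # (12, 2)
```

In this notebook the call would look roughly like `predict_in_chunks(X_test, lambda c: preprocess_input(sess.run(tf_resize_op, {batch_of_images_placeholder: c})), model.predict)`, so only one chunk of resized images is held in memory at a time.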

```python
result = model.predict(images, verbose=2)

# predicted class = index of the largest softmax output
y_pred = np.argmax(result, axis=1)

# accuracy on the test set
np.sum(y_pred == y_test) / n_testing
```
```
0.7323
```
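Taking the argmax of each softmax row and counting matches with `y_test` is the whole accuracy computation. As a vectorized NumPy helper (the name `accuracy` is hypothetical):

```python
import numpy as np

def accuracy(probs, y_true):
    """Fraction of rows whose highest-probability class equals the true label."""
    y_pred = np.argmax(probs, axis=1)
    return float(np.mean(y_pred == np.asarray(y_true)))

probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
print(accuracy(probs, [1, 0, 0, 0]))  # 0.75
```

With the notebook's variables, `accuracy(result, y_test)` gives the same number as the cell above.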

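## Train again a little bit

The training call in this cell is commented out, so the model is not trained any further and the next validation returns the same accuracy.

```python
# train the model on the new data for a "few" more epochs
# model.fit_generator(data_train_gen(), int(np.ceil(n_training / batch_size)), epochs=5, verbose=1)
```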

## Validate again

```python
result = model.predict(images, verbose=1)
y_pred = np.argmax(result, axis=1)
np.sum(y_pred == y_test) / n_testing
```
```
0.7323
```

## Train more layers

```python
# at this point, the top layers are well trained and we can start fine-tuning
# the convolutional layers of InceptionResNetV2. We freeze the bottom N layers
# (here N = 16) and train all the remaining layers.
for layer in model.layers[:16]:
    layer.trainable = False
for layer in model.layers[16:]:
    layer.trainable = True
```
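The slicing above freezes only the first 16 entries of `model.layers` (indices 0 to 15, i.e. up to `batch_normalization_5` in the listing above); everything from `activation_5` onward, including the new Dense layers, stays trainable. A toy sketch of the same pattern, using a hypothetical `Layer` stand-in class instead of Keras layers:

```python
class Layer:
    """Minimal stand-in for a Keras layer: just a name and a trainable flag (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_bottom(layers, n_frozen):
    # freeze layers[:n_frozen] and unfreeze the rest, mirroring the loops above
    for layer in layers[:n_frozen]:
        layer.trainable = False
    for layer in layers[n_frozen:]:
        layer.trainable = True
    return sum(layer.trainable for layer in layers)

layers = [Layer('layer_%d' % i) for i in range(20)]
n_trainable = freeze_bottom(layers, 16)
print(n_trainable)                                 # 4
print([l.name for l in layers if l.trainable][0])  # layer_16
```

On the real model, `sum(layer.trainable for layer in model.layers)` counts how many layers will be updated.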

```python
# we need to recompile the model for these modifications to take effect;
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss='categorical_crossentropy')
```
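For reference, classic (non-Nesterov) momentum SGD keeps a velocity `v` per weight and updates `v = momentum * v - lr * grad`, then `w = w + v`. We believe this matches Keras' `SGD` with `nesterov=False`, but the sketch below is our own, on a single scalar weight with a constant toy gradient:

```python
def sgd_momentum_step(w, v, grad, lr=0.001, momentum=0.9):
    """One momentum-SGD update; returns the new weight and velocity."""
    v = momentum * v - lr * grad
    return w + v, v

w, v = 1.0, 0.0
for step in range(2):
    w, v = sgd_momentum_step(w, v, grad=2.0)  # constant toy gradient
    print(step, round(w, 6), round(v, 6))
# 0 0.998 -0.002
# 1 0.9942 -0.0038
```

The velocity accumulates past gradients, so with a small `lr` like 0.001 the steps still grow when the gradient keeps pointing the same way, while the pretrained weights are not pushed too far in any single update.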

```python
# we train our model again, this time fine-tuning every layer from index 16 on
# (almost the whole InceptionResNetV2 base) alongside the top Dense layers
model.fit_generator(data_train_gen(), int(np.ceil(n_training / batch_size)), epochs=10, verbose=1)
```
```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```

```python
# model.fit_generator(data_train_gen(), int(np.ceil(n_training / batch_size)), epochs=3, verbose=1)
```

## Validate tuned network

```python
result = model.predict(images, verbose=1)
y_pred = np.argmax(result, axis=1)
np.sum(y_pred == y_test) / n_testing
```
```
0.9383
```

After fine-tuning, the network reaches 93.83% accuracy on the CIFAR10 test set.