## *Very Deep Convolutional Networks for Large-Scale Image Recognition, by K. Simonyan and A. Zisserman*
One model in the paper, denoted D or VGG-16, has 16 weight layers (13 convolutional and 3 fully connected). An implementation in Caffe (http://caffe.berkeleyvision.org/) has been
used for training the model on the ImageNet ILSVRC-2012 (http://imagenet.org/challenges/LSVRC/2012/) dataset, which includes images of 1,000 classes and is split into three sets:
training (1.3 million images), validation (50,000 images), and testing (100,000 images). Each input image
is 224 x 224 on three channels. The model achieves 7.5% top-5 error on ILSVRC-2012-val and
7.4% top-5 error on ILSVRC-2012-test.
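
Top-5 error counts a prediction as correct when the true label is among the five highest-scoring classes. The sketch below shows one way to compute it; the function name `top_k_error` and the toy scores and labels are illustrative assumptions, not ILSVRC data or code from the paper.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return float(misses) / len(labels)


# Toy data (hypothetical): 3 samples, 6 classes.
scores = [
    [0.05, 0.12, 0.40, 0.20, 0.15, 0.08],
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],
    [0.02, 0.03, 0.05, 0.10, 0.30, 0.50],
]
labels = [2, 1, 0]

top1 = top_k_error(scores, labels, k=1)  # 2 of 3 samples missed
top5 = top_k_error(scores, labels, k=5)  # 1 of 3 samples missed
print(top1, top5)
```

With only six classes the top-5 criterion is very forgiving; on the 1,000 ImageNet classes it is a much stricter test.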


The goal of this competition is to estimate the content of photographs for the purpose of retrieval and
automatic annotation, using a subset of the large hand-labeled ImageNet dataset (10 million labeled
images depicting 10,000+ object categories) for training. Test images will be presented with no
initial annotation (no segmentation or labels), and **algorithms will have to produce labelings
specifying what objects are present in the images**.


The weights learned by the model implemented in Caffe have been directly converted to Keras (for
more information, refer to: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3) and can be
preloaded into the Keras model, which is implemented here as described in the paper.

**Additional code** https://gist.github.com/nitish11/73ba862753929e08b3b319ff1e8c9c09

In [None]:
from keras import backend as K
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2, numpy as np

### Defining Network

In [None]:
# define a VGG16 network
# 16 Layers
def VGG_16(weights_path=None):
    model = Sequential()
    #-----------------------------------------------------
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Conv2D(64, (3, 3), activation='relu'))               # 1- layer
    
    model.add(ZeroPadding2D((1,1)))    
    model.add(Conv2D(64, (3, 3), activation='relu'))               # 2- layer
    
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    #----------------------------------------------------- 
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))              # 3- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))              # 4- layer
    
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    #-----------------------------------------------------
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))              # 5- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))              # 6- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))              # 7- layer
    
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    #-----------------------------------------------------
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 8- layer
   
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 9- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 10- layer
    
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    #-----------------------------------------------------
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 11- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 12- layer
    
    model.add(ZeroPadding2D((1,1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))              # 13- layer
    
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    #-----------------------------------------------------
    model.add(Flatten())

    #top layer of the VGG net
    model.add(Dense(4096, activation='relu'))                     # 14- layer
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))                     # 15- layer
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))                  # 16- layer

    if weights_path:
        model.load_weights(weights_path)

    return model
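
The layer numbering in the comments above can be checked with a little arithmetic. The sketch below walks through the same sequence of 3x3 convolutions and 2x2 poolings, assuming padding keeps the spatial size unchanged in every convolution (as the `ZeroPadding2D((1,1))` calls do). The names `CONFIG`, `DENSE_UNITS`, and `count_vgg16_params` are introduced here for illustration only.

```python
# Number of 3x3 filters per conv layer; "M" marks a 2x2 max pooling with stride 2.
CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"]
DENSE_UNITS = [4096, 4096, 1000]


def count_vgg16_params(size=224, channels=3):
    """Return (final spatial size, number of weight layers, total parameters)."""
    total = 0
    weight_layers = 0
    for item in CONFIG:
        if item == "M":
            size //= 2                                   # pooling halves rows and cols
        else:
            total += 3 * 3 * channels * item + item      # kernel weights + biases
            channels = item
            weight_layers += 1
    features = channels * size * size                    # length of the Flatten() output
    for units in DENSE_UNITS:
        total += features * units + units                # dense weights + biases
        features = units
        weight_layers += 1
    return size, weight_layers, total


final_size, weight_layers, total_params = count_vgg16_params()
print(final_size, weight_layers, total_params)  # 7 16 138357544
```

Most of the roughly 138 million parameters sit in the first dense layer (25,088 x 4,096 weights), which is also why the weights file is so large.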

### Download Image and Predicting
To obtain the weights of the model, search for the following file or download it directly:
- vgg16_weights.h5  (https://drive.google.com/uc?id=0Bz7KyqmuGsilT0J5dmRCM0ROVHc&export=download)

**Note:** This file is about 528 MB.

**Tip:** To look up the class name for the index printed by the prediction, consult:

        https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a


In [None]:
if __name__ == "__main__":
    im = cv2.resize(cv2.imread('cat-standing.jpg'), (224, 224)).astype(np.float32)
    im = im.transpose((2,0,1))
    im = np.expand_dims(im, axis=0)
    K.set_image_dim_ordering("th")
    
    # Test pretrained model
    model = VGG_16('vgg16_weights.h5')
    optimizer = SGD()
    model.compile(optimizer=optimizer, loss='categorical_crossentropy')
    out = model.predict(im)
    # index of the most probable ImageNet class
    print(np.argmax(out))

# <font color='brown'>Utilizing Keras built-in VGG-16 net module </font>

In [None]:
from keras.models import Model
from keras.preprocessing import image
from keras.optimizers import SGD
from keras.applications.vgg16 import VGG16
import matplotlib.pyplot as plt
import numpy as np
import cv2

### Prebuild model with pre-trained weights on imagenet

In [None]:
# prebuild model with pre-trained weights on imagenet
model = VGG16(weights='imagenet', include_top=True)
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')
# resize into VGG16 trained images' format
im = cv2.resize(cv2.imread('steam-locomotive.jpg'), (224, 224))
im = np.expand_dims(im, axis=0)

### Predict

In [None]:
# predict
out = model.predict(im)
plt.plot(out.ravel())
plt.show()
print(np.argmax(out))
# this should print 820 (steam locomotive)

# <font color='brown'> Extracting features from an intermediate layer in a DCNN </font>

Intermediate layers of a pre-trained network extract
general-purpose features from an image, and these features are likely to help in different kinds
of classification. **This has multiple advantages.**
- First, we can rely on publicly available large-scale
training and transfer this learning to novel domains. 
- Second, we can save the time and cost of expensive large-scale
training. 
- Third, we can provide reasonable solutions even when we don't have a large number of
training examples for our domain. We also get a good starting network shape for the task at hand,
instead of guessing it.

The following code implements this idea by extracting features from a specific layer.

In [None]:
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np

In [None]:
# pre-built and pre-trained deep learning VGG16 model
base_model = VGG16(weights='imagenet', include_top=True)
for i, layer in enumerate(base_model.layers):
    print (i, layer.name, layer.output_shape)

In [None]:
# extract features from the block4_pool layer
# (Keras 2 expects the keyword arguments `inputs` and `outputs`)
model = Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)
img_path = 'cat-standing.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# get the features from this block
features = model.predict(x)
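
The `preprocess_input` call matters because the VGG-16 weights come from a Caffe model trained on mean-centered BGR images. As far as we know, for VGG-16 it reorders the channels from RGB to BGR and subtracts a fixed per-channel ImageNet mean, without scaling. The NumPy sketch below mimics that behavior for channels-last input; the helper `vgg_preprocess` is a hypothetical stand-in, not the Keras implementation.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match the Caffe-trained VGG models).
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess(rgb_batch):
    """Approximate preprocess_input for a (samples, rows, cols, 3) RGB batch."""
    x = rgb_batch.astype(np.float32)
    x = x[..., ::-1]        # reorder channels: RGB -> BGR
    return x - BGR_MEAN     # subtract the mean, broadcast over samples and pixels


# Dummy batch: one 224 x 224 image with red = 255, green = blue = 128.
batch = np.full((1, 224, 224, 3), 128, dtype=np.uint8)
batch[..., 0] = 255
out = vgg_preprocess(batch)
print(out.shape, out[0, 0, 0])
```

Feeding raw pixel values into the network, as the earlier `cv2`-based cells do, skips this step and may reduce prediction quality.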

# <font color='brown'> Very deep inception-v3 net used for transfer learning </font>

Computer vision researchers now commonly use pre-trained CNNs to
generate representations for novel tasks, where the dataset may not be large enough to train an entire
CNN from scratch. Another common tactic is to take the pre-trained ImageNet network and then to
fine-tune the entire network to the novel task.

**Inception-v3** net is a very deep ConvNet **developed by Google**. The default input size for
this model is 299 x 299 on three channels.

**For more info:** https://keras.io/applications/

Suppose we have
a training dataset D in a domain different from ImageNet. D has 1,024 features in input and 200
categories in output. Let us see a code fragment:

In [None]:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K


### Create the base pre-trained model

In [None]:
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

We use a pre-trained Inception-v3; we do not include the top model because we want to fine-tune on D.
The new top consists of a dense layer with 1,024 units followed by a softmax dense
layer with 200 output classes. **x = GlobalAveragePooling2D()(x)** is used to convert the input to the correct
shape for the dense layer to handle. In fact, the **base_model.output** tensor has the shape (samples, channels,
rows, cols) for **dim_ordering="th"** or (samples, rows, cols, channels) for **dim_ordering="tf"**, but the dense layer needs
it as (samples, channels), and GlobalAveragePooling2D averages across (rows, cols). So if you look at
the last four layers (where include_top=True), you see these shapes:

**layer.name, layer.input_shape, layer.output_shape**

            ('mixed10', [(None, 8, 8, 320), (None, 8, 8, 768), (None, 8, 8, 768), (None, 8, 8, 192)], (None, 8, 8, 2048))
            ('avg_pool', (None, 8, 8, 2048), (None, 1, 1, 2048))
            ('flatten', (None, 1, 1, 2048), (None, 2048))
            ('predictions', (None, 2048), (None, 1000))

When you do include_top=False, you are removing the last three layers and exposing the mixed10 layer, so
the GlobalAveragePooling2D layer converts the (None, 8, 8, 2048) to (None, 2048), where each element in
the (None, 2048) tensor is the average value for each corresponding (8, 8) subtensor in the (None, 8,
8, 2048) tensor:

In [None]:
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer as first layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer with 200 classes as last layer
predictions = Dense(200, activation='softmax')(x)
# model to train (Keras 2 expects the keyword arguments `inputs` and `outputs`)
model = Model(inputs=base_model.input, outputs=predictions)
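
The averaging done by GlobalAveragePooling2D can be reproduced in plain NumPy. The sketch below assumes channels-last ordering and uses a random array as a stand-in for `base_model.output`; it is only meant to show which axes are averaged, not how Keras implements the layer.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for the mixed10 output: (samples, rows, cols, channels).
feature_maps = rng.rand(2, 8, 8, 2048).astype(np.float32)

# Average over the spatial axes (rows, cols); samples and channels are kept.
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (2, 2048)

# Each pooled value is the mean of one (8, 8) map.
single_map_mean = feature_maps[0, :, :, 5].mean()
print(np.isclose(pooled[0, 5], single_map_mean, atol=1e-5))
```

Because the spatial axes are averaged away, the dense layers that follow receive a fixed-length vector of 2,048 values per image.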

All the convolutional levels are pre-trained, so we freeze them during the training of the full model:

In [None]:
# that is, freeze all convolutional InceptionV3 layers
for layer in base_model.layers: layer.trainable = False

The model is then compiled and trained for a few epochs so that the top layers are trained:
    
            # compile the model (should be done *after* setting layers to non-trainable)
            model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
            # train the model on the new data for a few epochs
            model.fit_generator(...)

Then we freeze the lower layers of Inception and fine-tune the top Inception blocks. In this example, we
decide to freeze the first 172 layers (a hyperparameter to tune):

In [None]:
# we chose to train the top 2 inception blocks, that is, we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]: layer.trainable = False
for layer in model.layers[172:]: layer.trainable = True

The model is then recompiled for fine-tune optimization. We need to recompile the model for these
modifications to take effect:

In [None]:
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
# we train our model again (this time fine-tuning the top 2 inception blocks)
# alongside the top Dense layers
model.fit_generator(...)

Now we have a new deep network that reuses the standard Inception-v3 network, but it is trained on a
new domain D via transfer learning. Of course, there are many parameters to tune to achieve
good accuracy. However, we are now reusing a very large pre-trained network as a starting point via
transfer learning. In doing so, we avoid training the whole network from scratch on our own machines by reusing what is
already available in Keras.