# Convolutional Neural Networks and Transfer Learning

Today we are going to explore ConvNets for image classification, and how you might be able to use them for your final projects.

There is an insane number of possible arrangements of pixels. For a 512x512 RGB image with 8 bits per channel there are $256^{786432}$ possible arrangements (512 x 512 x 3 = 786,432 channel values, each taking one of 256 levels). Even Google only sees an infinitesimal fraction of the possible images.
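To get a feel for the size of that number, here is a small sketch of the arithmetic. The variable names are just for illustration; it counts the channel values and then estimates how many decimal digits $256^{786432}$ has without ever building the integer.

```python
import math

height, width, channels = 512, 512, 3   # RGB image
levels = 256                            # 8 bits per channel

n_values = height * width * channels    # independent channel values
# number of decimal digits of levels ** n_values, via logarithms
n_digits = math.floor(n_values * math.log10(levels)) + 1

print(n_values)   # 786432
print(n_digits)
```

The digit count comes out at roughly 1.9 million, so the number of possible images is a number with about 1.9 million digits.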

![](https://silverspaceship.com/static/shot_1.png)
Luckily for us, natural images share certain properties that make the problem much more tractable.

Nearby pixels tend to be similar to each other, and higher-level patterns build on top of lower-level ones: pixels form lines, and lines form shapes. It is rare for an image to contain large amounts of static.

We can exploit this tendency for nearby pixels to be related with something called convolutional filters.

![](img/conv-net2.png)

## Convolution

![](img/3D_Convolution_Animation.gif)
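As a rough sketch of what the animation shows, the NumPy code below slides a small filter over a toy image and sums the element-wise products at each position. The helper `conv2d_valid`, the toy image, and the hand-made vertical-edge filter are all assumptions for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 5x5 image: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# hypothetical hand-made vertical-edge filter
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = conv2d_valid(image, kernel)
print(response)   # each row is [3, 3, 0]
```

The response is large where the window straddles the dark-to-bright edge and zero where the window covers a flat region. In a ConvNet the filter values are learned rather than hand-written.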

## Max pooling

![](img/Max_pooling.png)
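A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming a single 2D feature map with even height and width. The helper name and the example values are made up for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = feature_map.shape
    # split into (row block, row within block, column block, column within block)
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])
pooled = max_pool_2x2(feature_map)
print(pooled)   # [[6 5]
                #  [7 9]]
```

Each output value is the largest activation in its 2x2 block, so the map shrinks by half in each direction while keeping the strongest responses.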

What are the layers learning?

First layer
![](img/filt1.jpeg)

These look like Gabor filters: oriented edge and stripe detectors.
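For comparison, a Gabor filter is a sinusoid at some orientation multiplied by a Gaussian envelope. The sketch below builds one with NumPy; the function name and all parameter values are assumptions picked for illustration, not values taken from any trained network.

```python
import numpy as np

def gabor_kernel(size=11, sigma=3.0, theta=0.0, wavelength=6.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: a sinusoid at angle `theta` under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotate the coordinate system by theta
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier

vertical_stripes = gabor_kernel(theta=0.0)
diagonal_stripes = gabor_kernel(theta=np.pi / 4)
print(vertical_stripes.shape)   # (11, 11)
```

Plotting a few of these at different values of `theta` and `wavelength` gives patches that resemble the learned first-layer filters.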

Subsequent layers get harder to visualize. ![But Inception networks can give it a go](http://yosinski.com/static/proj/deepvis_all_layers.jpg)

Famous Pre-trained Image Models
* VGG16
* VGG19
* InceptionV3


## VGG16

![](img/vgg16_croped.png)

Download the trained VGG-16 weights from [here](https://github.com/fchollet/deep-learning-models/releases).


#### Download labels too

In [1]:
!curl https://raw.githubusercontent.com/torch/tutorials/master/7_imagenet_classification/synset_words.txt -o synset_words.txt



## Install OpenCV

OpenCV is a collection of useful computer vision tools.

```conda install -c https://conda.binstar.org/menpo opencv3```

or

```pip install opencv-python```

## A Convolutional Neural Network Architecture in Practice

### VGG (Loading a Pretrained Network, first the harder & less stable way)

See the section below for the better way to load a pretrained network using Keras utilities.

The hard way is to explicitly define the network structure, and then go find and manually load the learned weights that correspond to that structure. Someone has already trained this network on a large dataset (ImageNet), saved those weights, and made them publicly downloadable.

**For your image projects, you would very likely want to use a pretrained network as the basis for an image classifier**. You might use something more recent than VGG like ResNet or Google Inception. You would use transfer learning on top of this, as detailed below.

In [2]:
from keras import backend as K
K.set_image_dim_ordering('th')

from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout, Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2, numpy as np
import pandas as pd

Using TensorFlow backend.


In [3]:
def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model
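
It can help to trace the spatial size through the `VGG_16` definition above to see what `Flatten()` receives. This is a small sketch with a made-up helper name: each zero-padded 3x3 convolution keeps the size, and each 2x2 max pool with stride 2 halves it.

```python
def vgg16_feature_map_sizes(input_size=224, n_blocks=5):
    """Spatial size after each of the five conv blocks in the VGG_16 definition."""
    size = input_size
    sizes = []
    for _ in range(n_blocks):
        # every ZeroPadding2D((1,1)) + 3x3 conv pair keeps the size:
        # size + 2 (padding) - 3 (kernel) + 1 == size
        size = size + 2 - 3 + 1
        # MaxPooling2D((2,2), strides=(2,2)) halves it
        size //= 2
        sizes.append(size)
    return sizes

sizes = vgg16_feature_map_sizes()
flattened = 512 * sizes[-1] * sizes[-1]   # 512 channels in the last block

print(sizes)      # [112, 56, 28, 14, 7]
print(flattened)  # 25088
```

So the first `Dense(4096)` layer sees a vector of 25,088 values per image.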

In [4]:
synset = pd.read_csv('synset_words.txt', skipinitialspace=True, names = ['synset', 'words'])

# note that we don't actually train/adjust the weights at all here
model = VGG_16('/Users/jeddy-metis/Documents/VGG/vgg16_weights.h5')
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')



In [5]:
def prepare_image(image_path):
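    # cv2.imread returns an array of shape (height, width, 3) in BGR channel order
    im = cv2.resize(cv2.imread(image_path), (224, 224)).astype(np.float32)

    # these subtractions are just mean centering the images
    # based on known ImageNet means for each color channel,
    # listed in BGR order to match OpenCV (blue, green, red)
    im[:,:,0] -= 103.939
    im[:,:,1] -= 116.779
    im[:,:,2] -= 123.68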

    im = im.transpose((2,0,1)) # adjust from (224, 224, 3) to (3, 224, 224) for keras
    im = np.expand_dims(im, axis=0) # adjust to (1, 3, 224, 224) for generating keras prediction
    return im
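
To check the shape bookkeeping in `prepare_image` without needing an image file, here is a sketch that runs the same steps on a synthetic array. The constant gray "image" is a stand-in for the output of `cv2.resize(cv2.imread(...))`, not real data.

```python
import numpy as np

# stand-in for a resized OpenCV image: (height, width, channel) in BGR order
fake_bgr = np.full((224, 224, 3), 128, dtype=np.float32)

bgr_means = np.array([103.939, 116.779, 123.68], dtype=np.float32)
centered = fake_bgr - bgr_means          # broadcasting subtracts one mean per channel

chw = centered.transpose((2, 0, 1))      # (224, 224, 3) -> (3, 224, 224)
batch = np.expand_dims(chw, axis=0)      # (3, 224, 224) -> (1, 3, 224, 224)

print(batch.shape)   # (1, 3, 224, 224)
```

The leading axis of length 1 is the batch dimension that `model.predict` expects.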

![](img/dog.jpeg)

This is one way to do it: we explicitly define the model structure and load pre-trained weights into it. We can then run the pre-trained model on any image we want, and see which label it predicts. Of course, the predicted label is drawn from the set of labels that the model was trained on (ImageNet).

In [6]:
img = prepare_image('img/dog.jpeg')
out = model.predict(img)
y_pred = np.argmax(out)

print(y_pred)
print(synset.loc[y_pred].synset)

259
n02112018 Pomeranian


## Instead, the preferred and stable way to do this!!

A more stable way to do this is to use the built-in Keras applications to load the network we want and tell it to download and use pretrained weights. **This is highly recommended over manually defining architectures and loading weights!**

Here are [more examples of keras transfer learning](https://keras.io/applications/) with modern pretrained CNNs. Check out the documentation specific to the model(s) you want to use.

In [7]:
from keras.applications.vgg16 import VGG16
from keras.applications.imagenet_utils import decode_predictions

img = prepare_image('img/dog.jpeg')

model = VGG16(weights='imagenet')
out = model.predict(img)
y_pred = np.argmax(out)

print('Predicted:', decode_predictions(out))

Predicted: [[('n02112018', 'Pomeranian', 0.4994781), ('n02113023', 'Pembroke', 0.12940948), ('n02115641', 'dingo', 0.06858055), ('n02085620', 'Chihuahua', 0.050600257), ('n02104365', 'schipperke', 0.035876952)]]
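
The top-5 list above is the softmax output sorted by probability. As a rough illustration of that ranking step (not the Keras implementation), here is a sketch with a hypothetical `top_k` helper and made-up four-class probabilities.

```python
import numpy as np

def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]

# hypothetical 4-class output, not real VGG16 predictions
labels = ['cat', 'dog', 'sloth', 'cup']
probs = np.array([0.10, 0.60, 0.05, 0.25])

print(top_k(probs, labels, k=2))   # [('dog', 0.6), ('cup', 0.25)]
```

Looking at more than the single `argmax` label is useful because it shows how confident the network is and what it confuses the image with.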


Close enough

![](img/sloth.jpg)

In [8]:
img = prepare_image('img/sloth.jpg')
out = model.predict(img)

print('Predicted:', decode_predictions(out))

Predicted: [[('n07930864', 'cup', 0.6601413), ('n03063599', 'coffee_mug', 0.19571926), ('n04131690', 'saltshaker', 0.021055091), ('n02493509', 'titi', 0.013833829), ('n02490219', 'marmoset', 0.01272898)]]


It took a huge amount of GPU time, power, and data to train this model, more than you will have access to over the next 3 weeks. So what are you to do if you want to do an image-related project?

**Cheat!**

### Transfer Learning

It turns out that the lower-level features learned by VGG16 on ImageNet are still applicable to other problems with natural images. If we can preserve the lower-level features, we can just train a new model on those features. (In fact, in the case of 'softmax', we can think of this as just training a new multinomial logistic regression on those convolutional features.)
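To make the "multinomial logistic regression on fixed features" view concrete, here is a NumPy sketch. The features are synthetic stand-ins for what a frozen convolutional base would output, and the sizes, learning rate, and iteration count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-in for bottleneck features: 2 classes, 100 samples each, 16 features
n_per_class, n_features, n_classes = 100, 16, 2
X = np.vstack([rng.normal(-1.0, 1.0, (n_per_class, n_features)),
               rng.normal(+1.0, 1.0, (n_per_class, n_features))])
y = np.repeat([0, 1], n_per_class)
Y = np.eye(n_classes)[y]                      # one-hot targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
for _ in range(200):                          # plain batch gradient descent
    P = softmax(X @ W + b)
    W -= 0.1 * X.T @ (P - Y) / len(X)         # gradient of the mean cross-entropy
    b -= 0.1 * (P - Y).mean(axis=0)

accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(accuracy)
```

Only `W` and `b` are learned here; the features never change, which is exactly the situation when the convolutional layers are kept fixed.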

Let's just snip off the last layer.

A Caveat

If we just add a new layer with default (randomly initialized) weights, it is going to be very wrong on the first iteration. Since it is so wrong, the gradients will be huge, and because we are using backpropagation those errors will be sent back into the lower-level features. This can quickly destroy what the rest of the network has learned.

In order to retrain this model we must protect the lower-level features until our new layers have become more stable. We can do this by freezing those layers.
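
Here is a toy NumPy sketch of what freezing means during training, using a two-layer linear model on random data. Everything here (sizes, learning rate, the names `W_base` and `W_head`) is an assumption for illustration; the point is only that the gradient step touches the new layer and leaves the base untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = rng.normal(size=(64, 1))

W_base = rng.normal(size=(8, 4))      # stands in for the pretrained, frozen layers
W_head = rng.normal(size=(4, 1))      # new, randomly initialized layer
W_base_before = W_base.copy()

def mse(W_base, W_head):
    return float(np.mean((X @ W_base @ W_head - y) ** 2))

loss_before = mse(W_base, W_head)
for _ in range(100):
    features = X @ W_base                         # forward pass through the frozen base
    error = features @ W_head - y
    grad_head = 2 * features.T @ error / len(X)   # gradient w.r.t. the head only
    W_head -= 0.005 * grad_head                   # the base receives no update
loss_after = mse(W_base, W_head)

print(loss_before, loss_after)
print(np.array_equal(W_base, W_base_before))      # True
```

Once the new layer has settled, one option is to unfreeze some of the later base layers and continue training with a small learning rate.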

Then we'll add our new layer.

In [9]:
from keras.models import Model

# note we exclude the final dense layers (include_top=False) and add one back below, which we would retrain ourselves
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(3,224,224))

# Freeze convolutional layers
for layer in base_model.layers:
    layer.trainable = False

x = base_model.output
x = Flatten()(x) # flatten from convolution tensor output 
predictions = Dense(2, activation='softmax')(x) # should match # of classes predicted

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
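
A quick sketch of how few parameters this setup actually trains, computed by hand rather than through Keras. It assumes the standard VGG16 layout (13 convolutions with 3x3 kernels) and a 224x224 input, so the flattened convolutional output has 512 x 7 x 7 values.

```python
# (in_channels, out_channels) for the 13 conv layers of VGG16, all 3x3 kernels
conv_layers = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]

# weights plus one bias per output channel
frozen_params = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in conv_layers)

n_classes = 2
flattened = 512 * 7 * 7                                  # conv output for a 224x224 input
trainable_params = flattened * n_classes + n_classes     # Dense weights + biases

print(frozen_params)     # 14714688
print(trainable_params)  # 50178
```

Roughly 14.7 million parameters stay frozen while only about 50 thousand are learned, which is why this approach gets by with comparatively little data.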

In [10]:
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
            loss='categorical_crossentropy', metrics=['accuracy'])

Then you would just train like normal.

In [None]:
model.fit(X_train,y_train)
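
One detail worth noting: with `categorical_crossentropy` and a `Dense(2, activation='softmax')` output, `y_train` needs one row per image and one column per class (one-hot). Keras provides a `to_categorical` utility for this; the NumPy sketch below shows the same idea with a hypothetical helper and made-up labels.

```python
import numpy as np

def to_one_hot(labels, n_classes):
    """Turn integer class labels of shape (n,) into one-hot rows of shape (n, n_classes)."""
    return np.eye(n_classes, dtype=np.float32)[np.asarray(labels)]

# hypothetical integer labels for a two-class problem
y_int = [0, 1, 1, 0]
y_train = to_one_hot(y_int, n_classes=2)

print(y_train.shape)   # (4, 2)
```

`X_train` would correspondingly have shape `(n, 3, 224, 224)` to match the `input_shape` used when building the base model.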

How much data do you need?

More!

Actually, with this bottleneck approach you don't need as much: 200-1000 representative images of each class will give good results, because
* The group that trained the base network (for VGG16, Oxford's Visual Geometry Group) has already done most of the hard work
* We can use image augmentation to increase our number of training samples

### Image Augmentation

![](img/DataAugmentation.png)

Possible Augmentations:
* Scale
* Rotation
* Skew
* Flips
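* Color tinting
* Blur
* Crops
* etc.

Any of these is fair game as long as it does not destroy the information you are trying to represent. For example, rotating a handwritten 6 by 180 degrees turns it into a 9.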
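
As a minimal sketch of two of these augmentations, the NumPy code below applies a random horizontal flip and a random crop to a fake image. The helper name, crop size, and random image are assumptions for illustration; in a real pipeline you would typically resize the crop back to the network's input size.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(image, crop=200):
    """Apply a random horizontal flip and a random crop to an (H, W, C) image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                 # horizontal flip
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)           # random crop position
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop, :]

# fake 224x224 RGB image with random pixel values
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
augmented = [random_augment(image) for _ in range(4)]

print([a.shape for a in augmented])   # four crops of shape (200, 200, 3)
```

Each call produces a slightly different view of the same picture, so one labeled image yields many training samples.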

[Check out this](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) for more with keras!

### Other Deep Learning Architectures

#### Auto-Encoders
![](img/Autoencoder.png)

An autoencoder tries to learn an approximation of its input data after a compression step. Autoencoders are very similar to matrix factorization, and are completely unsupervised.
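
The link to matrix factorization is easiest to see in the linear case: a linear autoencoder trained with squared error ends up spanning the same subspace as a truncated SVD. The sketch below skips training and uses the SVD directly as encoder and decoder; the synthetic data and helper name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data that really lives in 3 dimensions, embedded in 20
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))

def linear_autoencode(X, k):
    """Encode rows of X to k numbers and decode back, using the top-k SVD directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    codes = X @ Vt[:k].T          # encoder: 20 -> k
    return codes @ Vt[:k]         # decoder: k -> 20

errors = {k: float(np.mean((X - linear_autoencode(X, k)) ** 2)) for k in (1, 2, 3)}
print(errors)
```

The reconstruction error shrinks as the bottleneck widens and is essentially zero at `k=3`, the true dimensionality of this data. Real autoencoders add nonlinearities and learn the encoder and decoder by gradient descent.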

#### Variational Auto-Encoder
![](http://kvfrans.com/content/images/2016/08/vae.jpg)
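If you build in an understanding of noise, so that the encoder outputs a distribution over the compressed code (a mean and a variance) rather than a single point, you have a Variational Autoencoder (VAE).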
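
Two pieces usually distinguish a VAE from a plain autoencoder: sampling the latent code with noise, and a penalty that keeps the latent distribution close to a standard normal. Below is a NumPy sketch of both, assuming a Gaussian encoder that outputs `mu` and `log_var`; the example values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

# hypothetical encoder outputs for two inputs, latent dimension 2
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
log_var = np.array([[0.0, 0.0], [0.0, 0.0]])

z = sample_latent(mu, log_var)
print(kl_to_standard_normal(mu, log_var))   # [0. 1.]
```

The first input already matches the standard normal, so its penalty is zero; the second has shifted means and pays a penalty of 1. The full training loss adds this term to the reconstruction error.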

#### Siamese Network
![](img/siamese.png)

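A Siamese network passes two inputs through the same shared-weight network and compares the resulting outputs, so it tends to be used to compare inputs:

- Answer and question matching
- Standardization
- Facial recognition
- Picture captioning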
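
A small NumPy sketch of the shared-weights idea. The single-layer `embed` function and the contrastive loss are assumptions chosen for illustration (contrastive loss is one common choice for Siamese networks, not necessarily what the figure uses).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))     # one set of weights, shared by both branches

def embed(x):
    """The shared branch: both inputs go through exactly the same weights."""
    return np.tanh(x @ W)

def contrastive_loss(x1, x2, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    d = np.linalg.norm(embed(x1) - embed(x2))
    return d ** 2 if same else max(0.0, margin - d) ** 2

a = rng.normal(size=10)
print(contrastive_loss(a, a, same=True))    # 0.0: identical inputs, labeled as a match
print(contrastive_loss(a, a, same=False))   # 1.0: identical inputs, labeled as different
```

Because both inputs share `W`, the network learns a single embedding space in which distance means similarity.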

#### Generative Adversarial Networks
![](img/GAN.jpg)
Create a generative model and a discriminative model. Pit them against each other to improve both.
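
The "pitting against each other" shows up in the loss functions. Here is a NumPy sketch of the standard GAN losses computed from discriminator outputs; the function names and the probabilities are made up, and the generator loss uses the common non-saturating form.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(real) -> 1 and D(fake) -> 0."""
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Generator wants the discriminator to call its samples real."""
    return float(-np.mean(np.log(d_fake)))

# hypothetical discriminator outputs (probability that a sample is real)
confident = discriminator_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
fooled = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))

print(confident, fooled)
```

When the discriminator can no longer tell real from fake it outputs 0.5 everywhere and its loss sits at 2 ln 2 (about 1.39). Training alternates between lowering the discriminator loss and lowering the generator loss.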

[Really cool example of Adversarial Networks in action](http://carpedm20.github.io/faces/)

#### Stack GANs
GANs can be stacked with awesome results.
![](http://i.imgur.com/SGzE7vI.jpg)

https://arxiv.org/pdf/1612.03242v1.pdf

#### Image Analogy

![](https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/sugarskull-analogy.jpg)
![](https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/image-analogy-explanation.jpg)
![](https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg)

https://github.com/awentzonline/image-analogies

New architectures are being published every day. So much to read!

* [Curated List of Deep Learning papers](https://github.com/ChristosChristofidis/awesome-deep-learning)
* [Good reddit post for keeping up with the latest research](https://www.reddit.com/r/MachineLearning/comments/6d7nb1/d_machine_learning_wayr_what_are_you_reading_week/)
