# Kaggle Learn - Chapter 5: Deep Learning for computer vision

##### Introduction to Convnets
The following lines of code show you what a basic convnet looks like. It’s a stack of Conv2D and MaxPooling2D layers. 

### Instantiating a small convnet

In [1]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu',
                        input_shape=(28,28, 1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))


Using TensorFlow backend.


Importantly, a convnet takes as input tensors of shape (image_height, image_width, image_channels) (not including the batch dimension). In this case, we’ll configure the convnet to process inputs of size (28, 28, 1), which is the format of MNIST images. We’ll do this by passing the argument input_shape=(28, 28, 1) to the first layer.


Let's display the architecture of the convnet so far:

In [2]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink
as you go deeper in the network. The number of channels is controlled by the first argument passed to the Conv2D layers (32 or 64).
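
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes what the code above uses: 3 × 3 kernels with no padding and stride 1, and 2 × 2 pooling with stride 2. The helper names are illustrative and not part of Keras.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_output_size(size, window=2):
    """Spatial size after max pooling with stride equal to the window size."""
    return size // window


def conv_params(in_channels, out_channels, kernel=3):
    """kernel * kernel * in_channels weights plus one bias, per output filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


# Walk a 28-pixel-wide input through the conv/pool stack defined above.
size = 28
sizes = []
for layer in ["conv", "pool", "conv", "pool", "conv"]:
    size = conv_output_size(size) if layer == "conv" else pool_output_size(size)
    sizes.append(size)

params = [conv_params(1, 32), conv_params(32, 64), conv_params(64, 64)]

print(sizes)        # [26, 13, 11, 5, 3]
print(params)       # [320, 18496, 36928]
print(sum(params))  # 55744
```

The printed values match the Output Shape and Param # columns of the summary.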

The next step is to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier network like those you’re already familiar with: a stack of Dense layers. These classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. First we have to flatten the 3D outputs to 1D, and then add a few Dense layers on top.


In [3]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

We’ll do 10-way classification, using a final layer with 10 outputs and a softmax activation. Here’s what the network looks like now:

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________

In [5]:
# Training the Convnet on MNIST images
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28,28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
                             
model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x11b42ada0>

In [6]:
# Evaluating the Model
test_loss, test_acc = model.evaluate(test_images, test_labels)

# View the accuracy
test_acc



0.99160000000000004

In [7]:
# View the loss
test_loss

0.029627922806469725

Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel. It’s conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation (the convolution kernel), they’re transformed via a hardcoded max tensor operation. A big difference from convolution is that max pooling is usually done with 2 × 2 windows and stride 2, in order to downsample the feature maps by a factor of 2. On the other hand, convolution is typically done with 3 × 3 windows and no stride (stride 1).

Why downsample feature maps this way? Why not remove the max-pooling layers and keep fairly large feature maps all the way up? Let’s look at this option. The convolutional base of the model would then look like this:

In [8]:
# Why use maxpooling?
model_no_max_pool = models.Sequential()
model_no_max_pool.add(layers.Conv2D(32, (3,3), activation='relu',
                                   input_shape=(28,28,1)))
model_no_max_pool.add(layers.Conv2D(64, (3,3), activation='relu'))
model_no_max_pool.add(layers.Conv2D(64, (3,3), activation='relu'))

In [9]:
model_no_max_pool.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 22, 22, 64)        36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________



What’s wrong with this setup? Two things:
- It isn’t conducive to learning a spatial hierarchy of features. The 3 × 3 windows in the third layer will only contain information coming from 7 × 7 windows in the initial input. The high-level patterns learned by the convnet will still be very small with regard to the initial input, which may not be enough to learn to classify digits (try recognizing a digit by only looking at it through windows that are 7 × 7 pixels!). We need the features from the last convolution layer to contain information about the totality of the input.
- The final feature map has 22 × 22 × 64 = 30,976 total coefficients per sample. This is huge. If you were to flatten it to stick a Dense layer of size 512 on top, that layer would have 15.8 million parameters. This is far too large for such a small model and would result in intense overfitting.
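
Both numbers in the list above can be checked with a few lines of arithmetic. This is a sketch that assumes stride-1, unpadded 3 × 3 convolutions, as in the model without max pooling; the variable names are illustrative.

```python
kernel = 3
num_conv_layers = 3

# Each stride-1 convolution widens the receptive field by (kernel - 1) pixels.
receptive_field = 1 + num_conv_layers * (kernel - 1)

# Each unpadded convolution shrinks the feature map by (kernel - 1) pixels.
final_size = 28 - num_conv_layers * (kernel - 1)
coefficients = final_size * final_size * 64

# A Dense layer has one weight per (input, unit) pair plus one bias per unit.
dense_units = 512
dense_params = coefficients * dense_units + dense_units

print(receptive_field)  # 7
print(coefficients)     # 30976
print(dense_params)     # 15860224
```

The last figure is the "15.8 million parameters" quoted for a Dense layer of size 512.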


### Cats and Dogs Lesson

In [11]:
import os, shutil

# Use plain spaces in the path: shell-style '\ ' escapes are kept literally in a Python string
original_dataset_dir = '/Users/Kavi/Dropbox/DataScience/Tutorials/Tutorial - Keras/Data/og_data'
base_dir = '/Users/Kavi/Dropbox/DataScience/Tutorials/Tutorial - Keras/Data/cats_and_dogs_small'

os.mkdir(base_dir)

train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)

train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)

validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)

test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)

test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
    
    
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)

fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)
    

fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

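With shell-style escapes in original_dataset_dir (Tutorial\ -\ Keras), the cell fails on the first copy:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/Kavi/Dropbox/DataScience/Tutorials/Tutorial\\ -\\ Keras/Data/og_data/cat.0.jpg'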

In [None]:
print('total training cat images:', len(os.listdir(train_cats_dir)))

In [None]:
print('total training dog images:', len(os.listdir(train_dogs_dir)))

In [None]:
print('total validation cat images:', len(os.listdir(validation_cats_dir)))

In [None]:
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))

In [None]:
print('total test cat images:', len(os.listdir(test_cats_dir)))

In [None]:
print('total test dog images:', len(os.listdir(test_dogs_dir)))

In [3]:
# Instantiating a small convnet for dogs vs cats classification
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu', 
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 15, 15, 128)       147584    
__________

In [6]:
### Configuring the model for training
from keras import optimizers

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

In [8]:
# Using ImageDataGenerator to read image from directories
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(150, 150),
    batch_size=20, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir, target_size=(150, 150), batch_size=20,
    class_mode='binary')

NameError: name 'train_dir' is not defined

In [11]:
### Understanding Python Generators
def generator():
    i = 0
    while True:
        i += 1
        yield i
for item in generator():
    print(item)
    if item > 4:
        break

1
2
3
4
5


Let’s look at the output of one of these generators: it yields batches of 150 × 150 RGB images (shape (20, 150, 150, 3)) and binary labels (shape (20,)). There are 20 samples in each batch (the batch size). Note that the generator yields these batches indefinitely: it loops endlessly over the images in the target folder. For this reason, you need to break the iteration loop at some point:

In [None]:
for data_batch, labels_batch in train_generator:
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break

Let’s fit the model to the data using the generator. You do so using the fit_generator method, the equivalent of fit for data generators like this one. It expects as its first argument a Python generator that will yield batches of inputs and targets indefinitely, like this one does. Because the data is being generated endlessly, the Keras model needs to know how many samples to draw from the generator before declaring an epoch over. This is the role of the steps_per_epoch argument: after having drawn steps_per_epoch batches from the generator—that is, after having run for steps_per_epoch gradient descent steps—the fitting process will go to the next epoch. In this case, batches are 20 samples, so it will take 100 batches until you see your target of 2,000 samples.
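
The step counts follow directly from the dataset sizes. A minimal sketch, assuming the 2,000 training and 1,000 validation images set up earlier and a batch size of 20; steps_for is a hypothetical helper, not a Keras function.

```python
import math


def steps_for(sample_count, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(sample_count / batch_size)


train_steps = steps_for(2000, 20)
validation_steps = steps_for(1000, 20)

print(train_steps)       # 100
print(validation_steps)  # 50
```

Rounding up means a final, smaller batch is still counted when the sample count is not a multiple of the batch size.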

In [None]:
### Fitting the model using a batch generator
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)

In [None]:
# Saving the model
model.save('cats_and_dogs_small_1.h5')

In [12]:
# Displaying curves of loss and accuracy during training
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

NameError: name 'history' is not defined

You already know about a number of techniques that can help mitigate overfitting, such as dropout and weight decay (L2 regularization). We’re now going to work with a new one, specific to computer vision and used almost universally when processing images with deep-learning models: data augmentation.


### Data Augmentation

In [13]:
# Setting up a data augmentation configuration via ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest')

In [None]:
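# Displaying some randomly augmented training images
from keras.preprocessing import image

fnames = [os.path.join(train_cats_dir, fname) for fname in
          os.listdir(train_cats_dir)]

# Pick one image to augment
img_path = fnames[3]

# Read the image and resize it
img = image.load_img(img_path, target_size=(150, 150))

# Convert it to a NumPy array with shape (150, 150, 3)
x = image.img_to_array(img)

# Reshape it to (1, 150, 150, 3), since flow() expects a batch
x = x.reshape((1,) + x.shape)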

i = 0
for batch in datagen.flow(x, batch_size=1):
    plt.figure()
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
        
plt.show()

If you train a new network using this data-augmentation configuration, the network will never see the same input twice. But the inputs it sees are still heavily intercorrelated, because they come from a small number of original images: you can’t produce new information, you can only remix existing information. As such, this may not be enough to completely get rid of overfitting. To further fight overfitting, you’ll also add a Dropout layer to your model, right before the densely connected classifier.

In [20]:
# Defining a new convnet that includes dropout
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5)) # Dropout included
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

Let’s train the network using data augmentation and dropout.

In [None]:
# Training the convnet using data-augmentation generators
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation data must not be augmented
test_datagen = ImageDataGenerator(rescale=1./255)

# Use the augmenting generator for the training data
train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

history = model.fit_generator(
        train_generator,
        steps_per_epoch=100,
        epochs=100,
        validation_data=validation_generator,
        validation_steps=50)

In [None]:
# Save the second model
model.save('cats_and_dogs_small_2.h5')


And let’s plot the results again: see figures 5.12 and 5.13. Thanks to data augmentation and dropout, you’re no longer overfitting: the training curves are closely tracking the validation curves. You now reach an accuracy of 82%, a 15% relative improvement over the non-regularized model.

In [None]:
# Displaying curves of loss and accuracy during training of
# the new model

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

By using regularization techniques even further, and by tuning the network’s parameters (such as the number of filters per convolution layer, or the number of layers in the network), you may be able to get an even better accuracy, likely up to 86% or 87%. But it would prove difficult to go any higher just by training your own convnet from scratch, because you have so little data to work with. As a next step to improve your accuracy on this problem, you’ll have to use a pretrained model, which is the focus of the next two sections.

### Using the VGG16 model (external model) on our problem


The VGG16 model, among others, comes prepackaged with Keras. You can import it from the keras.applications module.

In [52]:
# Instantiating the VGG16 convolutional base
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
 2768896/58889256 [>.............................] - ETA: 33:45

KeyboardInterrupt: 

You pass three arguments to the constructor:
- weights specifies the weight checkpoint from which to initialize the model.
- include_top refers to including (or not) the densely connected classifier on top of the network. By default, this densely connected classifier corresponds to the 1,000 classes from ImageNet. Because you intend to use your own densely connected classifier (with only two classes: cat and dog), you don’t need to include it.
- input_shape is the shape of the image tensors that you’ll feed to the network. This argument is purely optional: if you don’t pass it, the network will be able to process inputs of any size.

Let’s look at a concrete example. Consider X with shape (32, 10) and y with shape (10,). First, we add an empty first axis to y, whose shape becomes (1, 10). Then, we repeat y 32 times alongside this new axis, so that we end up with a tensor Y with shape (32, 10), where Y[i, :] == y for i in range(0, 32). At this point, we can proceed to add X and Y, because they have the same shape.

In terms of implementation, no new 2D tensor is created, because that would be terribly inefficient. The repetition operation is entirely virtual: it happens at the algorithmic level rather than at the memory level. But thinking of the vector being repeated 32 times alongside a new axis is a helpful mental model. Here’s what a naive implementation would look like:
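
The notebook does not include that implementation, so here is a minimal sketch of one, assuming NumPy arrays with the shapes from the example. The function name is illustrative, and the explicit loops are for exposition only; in practice you would write X + y.

```python
import numpy as np


def naive_add_matrix_and_vector(x, y):
    """Add vector y to every row of matrix x, one element at a time."""
    assert x.ndim == 2
    assert y.ndim == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()  # work on a copy so the input is not overwritten
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x


X = np.random.random((32, 10))
y = np.random.random((10,))
Z = naive_add_matrix_and_vector(X, y)

print(Z.shape)                # (32, 10)
print(np.allclose(Z, X + y))  # True
```

The final check compares the loop against NumPy's built-in broadcasting, which gives the same result without materializing the repeated vector.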

In [None]:
conv_base.summary()


The final feature map has shape (4, 4, 512). That’s the feature on top of which you’ll stick a densely connected classifier.


At this point, there are two ways you could proceed:
- Running the convolutional base over your dataset, recording its output to a Numpy array on disk, and then using this data as input to a standalone, densely connected classifier similar to those you saw in part 1 of this book. This solution is fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won’t allow you to use data augmentation.
- Extending the model you have (conv_base) by adding Dense layers on top, and running the whole thing end to end on the input data. This will allow you to use data augmentation, because every input image goes through the convolutional base every time it’s seen by the model. But for the same reason, this technique is far more expensive than the first.

We’ll cover both techniques. Let’s walk through the code required to set up the first one: recording the output of conv_base on your data and using these outputs as inputs to a new model.

### Extracting features using the pretrained convolutional base

In [None]:
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

base_dir = '/Users/fchollet/Downloads/cats_and_dogs_small' 
train_dir = os.path.join(base_dir, 'train') 
validation_dir = os.path.join(base_dir, 'validation') 
test_dir = os.path.join(base_dir, 'test')

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(
                directory, target_size=(150,150),
                batch_size=batch_size, class_mode='binary')
    i = 0
    for inputs_batch, labels_batch in generator:
        feature_batch = conv_base.predict(inputs_batch)
        features[i * batch_size: (i +1) * batch_size] = feature_batch
        labels[i * batch_size: (i+1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            break
    return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)


The extracted features are currently of shape (samples, 4, 4, 512). You’ll feed them to a densely connected classifier, so first you must flatten them to (samples, 8192):

In [None]:
train_features = np.reshape(train_features, (2000, 4*4*512))
validation_features = np.reshape(validation_features, (1000, 4*4*512))
test_features = np.reshape(test_features, (1000, 4*4*512))

At this point, you can define your densely connected classifier (note the use of dropout for regularization) and train it on the data and labels that you just recorded.

In [None]:
from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4*4*512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
             loss='binary_crossentropy',
             metrics=['acc'])

history = model.fit(train_features, train_labels,
                   epochs=30, batch_size=20,
                   validation_data =(validation_features, validation_labels))

In [None]:
# Plotting the Results
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

### Feature Extraction with Data Augmentation

In [None]:
# Adding a densely connected classifier on top of the convolutional base
# Run this model only if you have a GPU

from keras import models
from keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

In [None]:
model.summary()

As you can see, the convolutional base of VGG16 has 14,714,688 parameters, which is very large. The classifier you’re adding on top has 2 million parameters.
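
The "2 million" figure for the classifier can be derived from the layer sizes. A sketch, assuming the (4, 4, 512) feature map and the Dense(256) and Dense(1) layers defined above; the variable names are illustrative.

```python
# Flatten turns the (4, 4, 512) feature map into a vector.
flattened = 4 * 4 * 512

# Dense layer parameters: inputs * units weights, plus one bias per unit.
dense_256 = flattened * 256 + 256
dense_1 = 256 * 1 + 1
classifier_params = dense_256 + dense_1

conv_base_params = 14714688  # figure quoted in the text for the VGG16 base
total_params = conv_base_params + classifier_params

print(flattened)          # 8192
print(classifier_params)  # 2097665
print(total_params)       # 16812353
```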

Before you compile and train the model, it’s very important to freeze the convolutional base. Freezing a layer or set of layers means preventing their weights from being updated during training. If you don’t do this, then the representations that were previously learned by the convolutional base will be modified during training. Because the Dense layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.

In Keras, you freeze a network by setting its trainable attribute to False:


In [None]:
print('This is the number of trainable weights '
      'before freezing the conv base:', len(model.trainable_weights))

In [53]:
conv_base.trainable=False

NameError: name 'conv_base' is not defined

In [54]:
print('This is the number of trainable weights '
      'after freezing the conv base:', len(model.trainable_weights))

With this setup, only the weights from the two Dense layers that you added will be trained. That’s a total of four weight tensors: two per layer (the main weight matrix and the bias vector). Note that in order for these changes to take effect, you must first compile the model. If you ever modify weight trainability after compilation, you should then recompile the model, or these changes will be ignored.
Now you can start training your model, with the same data-augmentation configuration that you used earlier.

##### Training the model end to end with a frozen convolutional base


In [61]:
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# Validation data must not be augmented, only rescaled
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(150, 150), batch_size=20,
    class_mode='binary')

# Validate on the held-out validation directory, not the training data
validation_generator = test_datagen.flow_from_directory(
    validation_dir, target_size=(150, 150), batch_size=20,
    class_mode='binary')

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-5),
              metrics=['acc'])

history = model.fit_generator(train_generator, steps_per_epoch=100,
                              epochs=30, validation_data=validation_generator,
                              validation_steps=50)

In [62]:
# Plotting the Results
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

NameError: name 'history' is not defined

### Fine Tuning

I stated earlier that it’s necessary to freeze the convolution base of VGG16 in order to be able to train a randomly initialized classifier on top. For the same reason, it’s only possible to fine-tune the top layers of the convolutional base once the classifier on top has already been trained. If the classifier isn’t already trained, then the error signal propagating through the network during training will be too large, and the representations previously learned by the layers being fine-tuned will be destroyed. Thus the steps for fine-tuning a network are as follows:
- Add your custom network on top of an already-trained base network.
- Freeze the base network.
- Train the part you added.
- Unfreeze some layers in the base network.
- Jointly train both these layers and the part you added.

You already completed the first three steps when doing feature extraction. Let’s proceed with step 4: you’ll unfreeze your conv_base and then freeze individual layers inside it.

In [63]:
conv_base.summary()

NameError: name 'conv_base' is not defined

You’ll fine-tune the last three convolutional layers, which means all layers up to block4_pool should be frozen, and the layers block5_conv1, block5_conv2, and block5_conv3 should be trainable.
Why not fine-tune more layers? Why not fine-tune the entire convolutional base? You could. But you need to consider the following:
    
- Earlier layers in the convolutional base encode more-generic, reusable features, whereas layers higher up encode more-specialized features. It’s more useful to fine-tune the more specialized features, because these are the ones that need to be repurposed on your new problem. There would be fast-decreasing returns in fine-tuning lower layers.
- The more parameters you’re training, the more you’re at risk of overfitting. The convolutional base has about 15 million parameters, so it would be risky to attempt to train it on your small dataset.
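The parameter figure in the last point can be checked with a little arithmetic. The sketch below assumes the standard VGG16 layout (thirteen 3×3 convolutions with the channel widths listed), which this section does not spell out. A 3×3 convolution has (3 · 3 · in + 1) · out parameters.

```python
# Channel widths (in, out) of the 13 conv layers in VGG16's convolutional base.
# Assumption: the standard VGG16 layout, not something stated in this chapter.
vgg16_blocks = {
    'block1': [(3, 64), (64, 64)],
    'block2': [(64, 128), (128, 128)],
    'block3': [(128, 256), (256, 256), (256, 256)],
    'block4': [(256, 512), (512, 512), (512, 512)],
    'block5': [(512, 512), (512, 512), (512, 512)],
}

def conv3x3_params(in_channels, out_channels):
    # 3x3 kernel weights for every (input, output) channel pair, plus one bias per output channel
    return (3 * 3 * in_channels + 1) * out_channels

params_per_block = {
    name: sum(conv3x3_params(i, o) for i, o in layers)
    for name, layers in vgg16_blocks.items()
}
total_params = sum(params_per_block.values())

print(params_per_block)
print('total:', total_params)
print('block5 share: %.0f%%' % (100 * params_per_block['block5'] / total_params))
```

Under that assumption the total comes to roughly 14.7 million, in line with the figure quoted above, and block5 alone accounts for nearly half of it.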

Thus, in this situation, it’s a good strategy to fine-tune only the top two or three layers in the convolutional base. Let’s set this up, starting from where you left off in the previous example.

In [66]:
# Freezing all layers up to a specific one
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    # Every layer from block5_conv1 onward becomes trainable
    if layer.name == 'block5_conv1':
        set_trainable = True
    layer.trainable = set_trainable
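To see which layers this loop leaves trainable without loading VGG16, here is a small stand-in. `FakeLayer` and `freeze_until` are hypothetical names used only for illustration, and the layer names are assumed to follow VGG16's `blockN_convM` / `blockN_pool` convention.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer: just a name and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_until(layers, first_trainable_name):
    """Freeze every layer that comes before `first_trainable_name`."""
    set_trainable = False
    for layer in layers:
        if layer.name == first_trainable_name:
            set_trainable = True
        layer.trainable = set_trainable

# Assumed tail of the VGG16 layer list (names follow the blockN_* convention)
names = ['block4_conv3', 'block4_pool',
         'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
layers = [FakeLayer(name) for name in names]
freeze_until(layers, 'block5_conv1')

trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)
```

Note that `block5_pool` also ends up flagged as trainable. That is harmless, because a pooling layer has no weights to update.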


Now you can begin fine-tuning the network. You’ll do this with the RMSProp optimizer, using a very low learning rate. The reason for using a low learning rate is that you want to limit the magnitude of the modifications you make to the representations of the three layers you’re fine-tuning. Updates that are too large may harm these representations.
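To get a feel for how the learning rate bounds the size of each update, here is a rough sketch of a single RMSprop-style step on one weight. It uses a simplified textbook form of the update (a running average of squared gradients with `rho=0.9`), not the Keras implementation, and `rmsprop_first_step` is a hypothetical helper.

```python
import math

def rmsprop_first_step(grad, lr, rho=0.9, eps=1e-7):
    """Size of the first RMSprop-style update for a single weight.

    Simplified form: avg_sq = (1 - rho) * grad**2,
    update = lr * grad / (sqrt(avg_sq) + eps).
    """
    avg_sq = (1.0 - rho) * grad ** 2
    return lr * grad / (math.sqrt(avg_sq) + eps)

grad = 2.5  # arbitrary example gradient
step_large = rmsprop_first_step(grad, lr=1e-3)
step_fine_tune = rmsprop_first_step(grad, lr=1e-5)

print('update with lr=1e-3:', step_large)
print('update with lr=1e-5:', step_fine_tune)
```

Because the gradient is divided by its own running magnitude, the step size is governed mainly by the learning rate. Dropping it from 1e-3 to 1e-5 shrinks each update a hundredfold.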

### Fine-Tuning the Model

In [68]:
from keras import optimizers

# A very low learning rate keeps the updates to the unfrozen layers small
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-5),
              metrics=['acc'])

history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                              validation_data=validation_generator,
                              validation_steps=50)


In [None]:
# Plotting the Results
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

### Smoothing the plots

In [70]:
# Creating the smoothing function (exponential moving average of the points)
def smooth_curve(points, factor=0.8):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

# Plotting the smoothed curves
plt.plot(epochs,
         smooth_curve(acc), 'bo', label='Smoothed training acc')
plt.plot(epochs,
         smooth_curve(val_acc), 'b', label='Smoothed validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs,
         smooth_curve(loss), 'bo', label='Smoothed training loss')
plt.plot(epochs,
         smooth_curve(val_loss), 'b', label='Smoothed validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
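The smoothing used here replaces each point with an exponential moving average of the points seen so far: s_t = factor · s_(t-1) + (1 - factor) · x_t. Here is a minimal, self-contained sketch on a made-up noisy series; the names `ema` and `spread` are only illustrative.

```python
def ema(points, factor=0.8):
    """Exponential moving average; the first point is kept as-is."""
    smoothed = []
    for point in points:
        if smoothed:
            smoothed.append(smoothed[-1] * factor + point * (1 - factor))
        else:
            smoothed.append(point)
    return smoothed

def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

noisy = [0.90, 0.96, 0.91, 0.97, 0.92, 0.98]  # made-up accuracy values
smoothed = ema(noisy)

print('raw spread:     ', round(spread(noisy), 4))
print('smoothed spread:', round(spread(smoothed), 4))
```

A larger `factor` gives more weight to the history and therefore a smoother but more lagging curve.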


The validation accuracy curve looks much cleaner. You’re seeing a nice 1% absolute improvement in accuracy, from about 96% to above 97%.

Note that the loss curve doesn’t show any real improvement (in fact, it’s deteriorating). You may wonder, how could accuracy stay stable or improve if the loss isn’t decreasing? The answer is simple: what you display is an average of pointwise loss values; but what matters for accuracy is the distribution of the loss values, not their average, because accuracy is the result of a binary thresholding of the class probability predicted by the model. The model may still be improving even if this isn’t reflected in the average loss.
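A tiny worked example makes this concrete. The predictions below are made up for illustration: in the second set, three samples are classified confidently and correctly while one is confidently wrong, which raises accuracy and average loss at the same time.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over a batch."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = [int((p > threshold) == bool(y)) for y, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

labels = [1, 1, 0, 0]

# Made-up predictions: hesitant everywhere, two samples on the wrong side of 0.5
early = [0.60, 0.45, 0.40, 0.55]
# Made-up predictions: three confident hits, one confidently wrong sample
late = [0.99, 0.95, 0.02, 0.98]

for name, preds in [('early', early), ('late', late)]:
    print(name,
          'acc=%.2f' % accuracy(labels, preds),
          'loss=%.3f' % binary_crossentropy(labels, preds))
```

The single confidently wrong sample dominates the average loss, yet it counts as just one error once the probabilities are thresholded.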

You can now finally evaluate this model on the test data:

In [None]:
test_generator = test_datagen.flow_from_directory(
    test_dir, target_size=(150, 150), batch_size=20, 
    class_mode='binary')

test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)

### Visualizing Intermediate Activations

Visualizing intermediate activations consists of displaying the feature maps that are output by various convolution and pooling layers in a network, given a certain input (the output of a layer is often called its activation, the output of the activation function). This gives a view into how an input is decomposed into the different filters learned by the network. You want to visualize feature maps with three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is by independently plotting the contents of every channel as a 2D image. Let’s start by loading the model that you saved in section 5.2:

In [None]:
# load in a model
from keras.models import load_model
model = load_model('cats_and_dogs_small_2.h5')
model.summary()

In [None]:
# Preprocessing a single image
img_path = '/Users/fchollet/Downloads/cats_and_dogs_small/test/cats/cat.1700.jpg'

from keras.preprocessing import image
import numpy as np

img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
# Adds a batch dimension: (150, 150, 3) -> (1, 150, 150, 3)
img_tensor = np.expand_dims(img_tensor, axis=0)
# Scales pixel values from [0, 255] to [0, 1]
img_tensor /= 255.
print(img_tensor.shape)

### Displaying the test picture

In [None]:
import matplotlib.pyplot as plt
plt.imshow(img_tensor[0])
plt.show()

In order to extract the feature maps you want to look at, you’ll create a Keras model that takes batches of images as input, and outputs the activations of all convolution and pooling layers. To do this, you’ll use the Keras class Model. A model is instantiated using two arguments: an input tensor (or list of input tensors) and an output tensor (or list of output tensors). The resulting class is a Keras model, just like the Sequential models you’re familiar with, mapping the specified inputs to the specified outputs. What sets the Model class apart is that it allows for models with multiple outputs, unlike Sequential. 

In [None]:
from keras import models

# Extracts the outputs of the top eight layers
layer_outputs = [layer.output for layer in model.layers[:8]]

# Creates a model that will return these outputs, given the model input
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

When fed an image input, this model returns the values of the layer activations in the original model. This is the first time you’ve encountered a multi-output model in this book: until now, the models you’ve seen have had exactly one input and one output. In the general case, a model can have any number of inputs and outputs. This one has one input and eight outputs: one output per layer activation.

##### Running the model in predict mode

In [None]:
# Returns a list of eight Numpy arrays: one array per layer activation
activations = activation_model.predict(img_tensor)

# For instance, this is the activation of the first convolution layer
# for the cat image input:
first_layer_activation = activations[0]
print(first_layer_activation.shape)

##### Visualizing the Channel

In [None]:
# Visualizing the fourth channel
import matplotlib.pyplot as plt
plt.matshow(first_layer_activation[0,:,:,4], cmap='viridis')

This channel appears to encode a diagonal edge detector. Let’s try the seventh channel (see figure 5.26), but note that your own channels may vary, because the specific filters learned by convolution layers aren’t deterministic.


In [None]:
# Visualizing the seventh channel
import matplotlib.pyplot as plt
plt.matshow(first_layer_activation[0,:,:,7], cmap='viridis')

This one looks like a “bright green dot” detector, useful to encode cat eyes. At this point, let’s plot a complete visualization of all the activations in the network (see figure 5.27). You’ll extract and plot every channel in each of the eight activation maps, and you’ll stack the results in one big image tensor, with channels stacked side by side.

##### Visualizing every channel in every intermediate activation

In [None]:
# Names of the layers, so you can have them as part of your plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

# Displays the feature maps, one figure per layer
for layer_name, layer_activation in zip(layer_names, activations):
    # Number of features (channels) in the feature map
    n_features = layer_activation.shape[-1]
    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]
    # Tiles the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # Tiles each filter into one big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :, col * images_per_row + row]
            # Post-processes the channel to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Displays the grid for this layer
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

There are a few things to note here:
- The first layer acts as a collection of various edge detectors. At that stage, the activations retain almost all of the information present in the initial picture.
- As you go higher, the activations become increasingly abstract and less visually interpretable. They begin to encode higher-level concepts such as “cat ear” and “cat eye.” Higher representations carry increasingly less information about the visual contents of the image, and increasingly more information related to the class of the image.
- The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image; but in the following layers, more and more filters are blank. This means the pattern encoded by the filter isn’t found in the input image.
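The sparsity described in the last point can be quantified by counting the channels that are zero everywhere. The sketch below uses made-up arrays in place of real activations. `blank_fraction` is a hypothetical helper; with the real model you would pass each entry of `activations` to it instead.

```python
import numpy as np

def blank_fraction(layer_activation):
    """Fraction of channels that are zero everywhere in a (1, h, w, channels) activation."""
    # A channel is blank when its largest absolute value is 0
    channel_max = np.abs(layer_activation[0]).max(axis=(0, 1))
    return float(np.mean(channel_max == 0))

rng = np.random.RandomState(0)

# Made-up "early layer": every channel responds somewhere
early = rng.rand(1, 8, 8, 4) + 0.1
# Made-up "deep layer": three of the four channels never fire
deep = np.zeros((1, 4, 4, 4))
deep[0, 1, 2, 0] = 1.0

print('early layer blank fraction:', blank_fraction(early))
print('deep layer blank fraction: ', blank_fraction(deep))
```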

### Defining the loss tensor for filter visualization

To implement gradient descent, you’ll need the gradient of this loss with respect to the model’s input. To do this, you’ll use the gradients function packaged with the backend module of Keras.

In [None]:
from keras.applications import VGG16
from keras import backend as K

model = VGG16(weights='imagenet', include_top=False)
layer_name = 'block3_conv1'
filter_index = 0

# The loss is the mean activation of the chosen filter in the chosen layer
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])

### Gradient Normalization Trick

In [71]:
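# gradients returns a list of tensors (of size 1 here), so keep only the first element
grads = K.gradients(loss, model.input)[0]
# Add 1e-5 before dividing to avoid accidentally dividing by 0.
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)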
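The effect of this normalization is easy to check on plain NumPy arrays. Whatever the original scale of the gradient, dividing by its root mean square brings the RMS close to 1, so every ascent step has a comparable magnitude. `normalize_gradient` is a hypothetical NumPy counterpart of the backend expression, applied here to made-up gradients.

```python
import numpy as np

def normalize_gradient(grads, eps=1e-5):
    """Divide a gradient array by its root mean square (plus eps to avoid dividing by 0)."""
    return grads / (np.sqrt(np.mean(np.square(grads))) + eps)

def rms(a):
    """Root mean square of an array."""
    return float(np.sqrt(np.mean(np.square(a))))

rng = np.random.RandomState(0)
tiny = rng.randn(1, 8, 8, 3) * 1e-3   # made-up gradient with a very small scale
huge = rng.randn(1, 8, 8, 3) * 1e+3   # made-up gradient with a very large scale

for name, g in [('tiny', tiny), ('huge', huge)]:
    print(name,
          'rms before: %.3g' % rms(g),
          'after: %.3g' % rms(normalize_gradient(g)))
```

The epsilon also means an all-zero gradient stays at zero rather than turning into NaN values.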


Now you need a way to compute the value of the loss tensor and the gradient tensor, given an input image. You can define a Keras backend function to do this: iterate is a function that takes a Numpy tensor (as a list of tensors of size 1) and returns a list of two Numpy tensors: the loss value and the gradient value.

### Fetching Numpy output value given Numpy input values

In [None]:
import numpy as np

iterate = K.function([model.input], [loss, grads])
loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])

### Loss maximization via stochastic gradient descent

In [None]:
# Starts from a gray image with some noise
input_img_data = np.random.random((1, 150, 150, 3)) * 20 + 128.

step = 1.  # Magnitude of each gradient update

# Runs gradient ascent for 40 steps
for i in range(40):
    # Computes the loss value and gradient value
    loss_value, grads_value = iterate([input_img_data])

    # Adjusts the input image in the direction that maximizes the loss
    input_img_data += grads_value * step
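The same loop structure can be tried on a toy problem where the answer is known. In this sketch a hypothetical `toy_iterate` plays the role of `iterate`: it returns the value and gradient of f(x) = -(x - 3)^2, whose maximum is at x = 3. A smaller step (0.1) is used because this toy gradient isn't normalized.

```python
def toy_iterate(x):
    """Hypothetical stand-in for `iterate`: returns (loss, gradient) of f(x) = -(x - 3)**2."""
    loss_value = -(x - 3.0) ** 2
    grad_value = -2.0 * (x - 3.0)
    return loss_value, grad_value

x = 0.0      # starting point
step = 0.1   # magnitude of each gradient update

# Runs gradient ascent for 40 steps
for i in range(40):
    loss_value, grad_value = toy_iterate(x)
    # Moves x in the direction that increases the loss
    x += grad_value * step

final_loss, _ = toy_iterate(x)
print('x after 40 steps:', round(x, 4), 'loss:', final_loss)
```

Adding the gradient (rather than subtracting it, as in gradient descent) is what turns the procedure into a maximization.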
    

### Utility function to convert a tensor into a valid image

In [None]:

def deprocess_image(x):
    # Normalizes the tensor: centers on 0, ensures that std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # Clips to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # Converts to an RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x

### Function to generate filter visualizations

In [None]:
# Builds a loss function that maximizes the activation of
# the nth filter of the layer under consideration
def generate_pattern(layer_name, filter_index, size=150):
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])

    # Computes the gradient of the input picture with regard to this loss
    grads = K.gradients(loss, model.input)[0]

    # Normalization trick: normalizes the gradient
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

    # Returns the loss and grads given the input picture
    iterate = K.function([model.input], [loss, grads])

    # Starts from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.

    # Runs gradient ascent for 40 steps
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step

    img = input_img_data[0]
    return deprocess_image(img)

plt.imshow(generate_pattern('block3_conv1', 0))

### Generating a grid of all filter response patterns in a layer

In [None]:
layer_name = 'block1_conv1'
size = 64
margin = 5

# Empty (black) image to store results
results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))

# Iterates over the rows of the results grid
for i in range(8):
    # Iterates over the columns of the results grid
    for j in range(8):
        # Generates the pattern for filter i + (j * 8) in layer_name
        filter_img = generate_pattern(layer_name, i + (j * 8), size=size)

        # Puts the result in the square (i, j) of the results grid
        horizontal_start = i * size + i * margin
        horizontal_end = horizontal_start + size
        vertical_start = j * size + j * margin
        vertical_end = vertical_start + size
        results[horizontal_start: horizontal_end,
                vertical_start: vertical_end, :] = filter_img

# Displays the results grid. The tiles hold 0-255 values, so cast to uint8:
# imshow expects float RGB data in [0, 1] and would clip everything else.
plt.figure(figsize=(20, 20))
plt.imshow(results.astype('uint8'))
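The index arithmetic for placing tiles is easy to get wrong, so here is a small pure-Python check of it. `tile_bounds` is a hypothetical helper that reproduces the start/end computation for one axis, using the same `size` and `margin` values as above.

```python
def tile_bounds(index, size=64, margin=5):
    """Start and end offsets of tile `index` along one axis of the grid."""
    start = index * size + index * margin
    return start, start + size

n_tiles = 8
grid_side = n_tiles * 64 + (n_tiles - 1) * 5   # same formula as the results array

bounds = [tile_bounds(i) for i in range(n_tiles)]
gaps = [bounds[k + 1][0] - bounds[k][1] for k in range(n_tiles - 1)]

print('grid side:', grid_side)
print('first tile:', bounds[0], 'last tile:', bounds[-1])
print('gaps between tiles:', gaps)
```

The last tile ends exactly at the edge of the grid, and every pair of neighbouring tiles is separated by one margin of black pixels.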