# An introduction to Convolutional Neural Networks in 3 Parts
In this notebook we will start with 2 different threads:

1. An introduction to processing images. In our previous work with images (for example, the MNIST data), the images were provided to us already preprocessed, as a nicely formatted CSV file. In this thread we will learn to process actual JPEG images.
2. An introduction to convolutional neural networks (CNNs).

Once you learn the basics of these two threads, you will combine them to create a CNN that processes images. Let's get started.


# Part 1: Dogs and Cats
Who doesn't love dogs or cats?

### from F. Chollet

## Part 1 Step 1: Downloading the data

The cats vs. dogs dataset that we will use isn't packaged with Keras. It was made available by Kaggle.com as part of a computer vision 
competition in late 2013, back when convnets weren't quite mainstream. You can download the original dataset at: 
`https://www.kaggle.com/c/dogs-vs-cats/data` (you will need to create a Kaggle account if you don't already have one -- don't worry, the 
process is painless).

The pictures are medium-resolution color JPEGs. They look like this:

![cats_vs_dogs_samples](https://s3.amazonaws.com/book.keras.io/img/ch5/cats_vs_dogs_samples.jpg)

Unsurprisingly, the cats vs. dogs Kaggle competition in 2013 was won by entrants who used convnets. The best entries could achieve up to 
95% accuracy. In our own example, we will get fairly close to this accuracy (in the next section), even though we will be training our 
models on less than 10% of the data that was available to the competitors.

This original dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543 MB (compressed). After downloading 
and uncompressing it, we will create a new dataset containing three subsets: a training set with 1000 samples of each class, a validation 
set with 500 samples of each class, and a test set with 500 samples of each class.

Here are a few lines of code to do this:

### Don't forget to change the directory names to match your file system.

In [3]:
import os, shutil
# The path to the directory where the original
# dataset was uncompressed
original_dataset_dir = '/Users/raz/data/dogTrain'

# The directory where we will
# store our smaller dataset
base_dir = '/Users/raz/data/cats_and_dogs_small'
# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs(base_dir, exist_ok=True)

# Directories for our training,
# validation and test splits
train_dir = os.path.join(base_dir, 'train')
os.makedirs(train_dir, exist_ok=True)
validation_dir = os.path.join(base_dir, 'validation')
os.makedirs(validation_dir, exist_ok=True)
test_dir = os.path.join(base_dir, 'test')
os.makedirs(test_dir, exist_ok=True)

# Directory with our training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
os.makedirs(train_cats_dir, exist_ok=True)

# Directory with our training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.makedirs(train_dogs_dir, exist_ok=True)

# Directory with our validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.makedirs(validation_cats_dir, exist_ok=True)

# Directory with our validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.makedirs(validation_dogs_dir, exist_ok=True)

# Directory with our test cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.makedirs(test_cats_dir, exist_ok=True)

# Directory with our test dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.makedirs(test_dogs_dir, exist_ok=True)

# Copy first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy next 500 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)
    
# Copy next 500 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)
    
# Copy first 1000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
    
# Copy next 500 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)
    
# Copy next 500 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

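The split performed by the cell above can be summarized as a mapping from split name to a range of file indices. The pure-Python sketch below (the names `SPLITS` and `split_filenames` are illustrative, not part of Keras) rebuilds the file lists without touching the disk and checks that no file lands in two splits, which is what keeps the test images unseen during training.

```python
# Index ranges used by the copy cell: [start, stop) for each split
SPLITS = {
    'train': (0, 1000),
    'validation': (1000, 1500),
    'test': (1500, 2000),
}

def split_filenames(label):
    """Return {split_name: [file names]} for one class label ('cat' or 'dog')."""
    return {
        name: ['{}.{}.jpg'.format(label, i) for i in range(start, stop)]
        for name, (start, stop) in SPLITS.items()
    }

cats = split_filenames('cat')
sizes = {name: len(files) for name, files in cats.items()}

# No file may appear in more than one split
train, val, test = (set(cats[n]) for n in ('train', 'validation', 'test'))
assert train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)
```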

## Part 1 Step 2: Data preprocessing

As you already know by now, data should be formatted into appropriately pre-processed floating point tensors before being fed into our 
network. Currently, our data sits on a drive as JPEG files, so the steps for getting it into our network are roughly:

* Read the picture files.
* Decode the JPEG content to RGB grids of pixels.
* Convert these into floating point tensors.
* Rescale the pixel values (between 0 and 255) to the [0, 1] interval (as you know, neural networks prefer to deal with small input values).
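The last two steps are plain array operations. Here is a minimal NumPy sketch, assuming the JPEG has already been decoded into a `uint8` array of shape `(height, width, 3)` (in practice an image library such as Pillow does the decoding, which is not shown). The function name `to_float_tensor` is hypothetical.

```python
import numpy as np

def to_float_tensor(pixels):
    """Convert a decoded uint8 RGB image to float32 values in [0, 1]."""
    return pixels.astype('float32') / 255.0

# Hypothetical decoded 150x150 RGB image with random pixel values 0..255
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

tensor = to_float_tensor(pixels)
# Stack several images along a new leading axis to form a batch
batch = np.stack([tensor, tensor])
```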

It may seem a bit daunting, but thankfully Keras has utilities to take care of these steps automatically. Keras has a module with image 
processing helper tools, located at `keras.preprocessing.image`. In particular, it contains the class `ImageDataGenerator`, which lets you 
quickly set up Python generators that automatically turn image files on disk into batches of preprocessed tensors. This is what we 
will use here.

In [None]:
from keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

**Note**: the generators resize the JPEG images to `target_size`. In the example above, each image becomes a 150x150 array with 3 color channels, so each batch of inputs has shape `(20, 150, 150, 3)`.

Suppose we construct a fully connected neural network model. We can fit it to the data using `train_generator` and `validation_generator`. We do this with the `fit_generator` method, the equivalent of `fit` for data generators 
like ours. It expects as its first argument a Python generator that yields batches of inputs and targets indefinitely, as ours does. 
Because the data is generated endlessly, Keras needs to know how many samples to draw from the generator before 
declaring an epoch over. This is the role of the `steps_per_epoch` argument: after having drawn `steps_per_epoch` batches from the 
generator, i.e. after having run `steps_per_epoch` gradient descent steps, the fitting process moves on to the next epoch. In our case, 
batches contain 20 samples, so it takes 100 batches to cover all 2000 training samples. (In newer Keras versions `fit_generator` is 
deprecated, and `fit` accepts generators directly with the same arguments.)

When using `fit_generator`, one may pass a `validation_data` argument, much like with the `fit` method. This argument may be a data 
generator itself, but it can also be a tuple of NumPy arrays. If you pass a generator as `validation_data`, then 
this generator is expected to yield batches of validation data endlessly, so you should also specify the `validation_steps` argument, 
which tells the process how many batches to draw from the validation generator for evaluation.

Here's an example:


    history = model.fit_generator(
          train_generator,
          steps_per_epoch=100,
          epochs=30,
          validation_data=validation_generator,
          validation_steps=50)
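To see why `steps_per_epoch` and `validation_steps` are needed, here is a pure-Python sketch of an endless batch generator (the helpers `endless_batches` and `steps_for` are illustrative, not Keras functions). Iterating over it never stops on its own, so one epoch is cut out with `itertools.islice`; the step counts in the example above are simply the number of samples divided by the batch size, rounded up.

```python
import itertools
import math

def endless_batches(num_samples, batch_size):
    """Yield lists of sample indices forever, starting over after the last
    sample, much like a Keras image generator keeps producing batches."""
    while True:
        for start in range(0, num_samples, batch_size):
            yield list(range(start, min(start + batch_size, num_samples)))

def steps_for(num_samples, batch_size):
    # Smallest number of batches that covers every sample once
    return math.ceil(num_samples / batch_size)

train_steps = steps_for(2000, 20)       # 100, the steps_per_epoch used above
validation_steps = steps_for(1000, 20)  # 50, the validation_steps used above

# Iterating endless_batches directly would never finish;
# islice stops after one epoch's worth of batches
one_epoch = list(itertools.islice(endless_batches(2000, 20), train_steps))
samples_seen = sum(len(batch) for batch in one_epoch)
```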
 
## Part 1 Step 3: create and compile a fully connected model.
Construct a fully connected model like you did for the MNIST data. The architecture should be:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    flatten_1 (Flatten)          (None, 67500)             0         
    _________________________________________________________________
    dense_3 (Dense)              (None, 512)               34560512  
    _________________________________________________________________
    dense_4 (Dense)              (None, 1)                 513       
    =================================================================
    Total params: 34,561,025
    Trainable params: 34,561,025
    Non-trainable params: 0
    __________________________
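The parameter counts in this summary follow from a simple rule: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit, and `Flatten` has no parameters. A quick check of the numbers (the helper `dense_params` is ours, for illustration):

```python
def dense_params(inputs, units):
    # weights (inputs x units) plus one bias per unit
    return inputs * units + units

flat = 150 * 150 * 3              # Flatten output size
hidden = dense_params(flat, 512)  # first Dense layer
output = dense_params(512, 1)     # single sigmoid output unit
total = hidden + output           # should match "Total params"
```

Almost all of the parameters sit in the first `Dense` layer, because every pixel value is connected to every hidden unit.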

Compile the network using the following parameters:

 parameter | value
 :---: | :---:
 loss |'binary_crossentropy'
 optimizer| optimizers.RMSprop(lr=1e-4)
 metrics | ['acc']
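For reference, `binary_crossentropy` averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the 0/1 label and `p` is the sigmoid output. A pure-Python sketch follows; Keras also clips predictions away from exactly 0 and 1 to avoid `log(0)`, and the `eps` value used here is an assumption.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a small loss; 50/50 guesses give log(2)
good = binary_crossentropy([1, 0], [0.99, 0.01])
unsure = binary_crossentropy([1, 0], [0.5, 0.5])
```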


In [None]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Flatten(input_shape=(150, 150, 3)))

# YOUR WORK HERE

network.summary()


## Part 1 Step 4: fit the model 
Fit the model as described in step 2 above.


## Part 1 Step 5: evaluate the model on the test data

In [None]:
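test_generator = test_datagen.flow_from_directory(test_dir, target_size=(150, 150), batch_size=20, class_mode='binary')
# 1000 test images / 20 per batch = 50 steps, so each test image is evaluated once
scoreSeg = network.evaluate_generator(test_generator, steps=50)
print("Accuracy =", scoreSeg[1])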



That accuracy is pretty disappointing. Shortly we will see if we can do better. But first ...

# Part 2: Convolutional Neural Networks 

## A guided tour

Let's first practice what we learned in the first Keras Python Notebook

## Part 2 Step 1. Load the MNIST Dataset
First let's load the MNIST dataset from Keras into Numpy arrays called: 

     train_images, train_labels, test_images, test_labels
     
(We've done this before)

## Part 2 Step 2: Reshape the Data
In the first notebook we reshaped the data into

     (60000, 28 * 28)
     
Basically, we flattened each image. For this notebook let's retain the 2D structure and add a channel axis, reshaping the training images to:

     (60000, 28, 28, 1)

and the test images to `(10000, 28, 28, 1)`. At the same time we should also set the type to be `float32`.

At the end of this step you should have new values for `train_images` and `test_images`
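A NumPy sketch of the two layouts, using a small random stand-in for the MNIST arrays (the array `images` is hypothetical; the real one has 60000 images). It also rescales the pixels to [0, 1], as we did for the JPEG images in Part 1.

```python
import numpy as np

# Hypothetical stand-in for the MNIST arrays: 6 images of 28x28 uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
n = images.shape[0]

# Flattened layout used in the first notebook: one 784-long row per image
flat = images.reshape((n, 28 * 28))

# Convnet layout: keep height and width, add a single (grayscale) channel axis,
# convert to float32 and rescale to [0, 1]
reshaped = images.reshape((n, 28, 28, 1)).astype('float32') / 255
```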

## Part 2 Step 3: Categorically Encode the Labels
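Categorical encoding turns each integer label into a vector of length 10 with a single 1 at the label's index (one-hot encoding). Keras provides `to_categorical` in `keras.utils` for this; the NumPy sketch below, with an illustrative `one_hot` helper, shows what that conversion computes.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    # Set position [row, label] to 1 for every row at once
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])
```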

## Part 2 Step 4: Creating the Model - The ConvNet layers
This part I will give to you. It was explained in the associated lecture.

In [None]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.summary()
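The output shapes and parameter counts printed by `model.summary()` can be derived by hand. With the default `'valid'` padding, a 3x3 convolution shrinks each spatial side by 2, a 2x2 max-pooling halves it (rounding down), and a `Conv2D` layer has `3*3*input_channels*filters` weights plus one bias per filter. A small pure-Python trace (all helper names are ours):

```python
def conv_out(size, kernel=3):
    # 'valid' padding: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2):
    # stride equal to the pool size; a leftover row/column is dropped
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    # one kernel x kernel x in_channels weight block per filter, plus a bias each
    return kernel * kernel * in_channels * filters + filters

def trace(size, channels, layers):
    """layers: list of (filters, followed_by_pooling) for each 3x3 Conv2D.
    Returns (final spatial size, final channels, per-conv parameter counts)."""
    params = []
    for filters, pool in layers:
        params.append(conv_params(channels, filters))
        size, channels = conv_out(size), filters
        if pool:
            size = pool_out(size)
    return size, channels, params

mnist = trace(28, 1, [(32, True), (64, True), (64, False)])
```

Calling `trace(150, 3, [(32, True), (64, True), (128, True), (128, True)])` traces the Part 3 model in the same way.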

## Part 2 Step 5. Adding Layers
Now we need to flatten the outputs of the ConvNet layers into the 1D representation our classifier needs.
The classification layer will be exactly the same as the one in the first notebook.

In [None]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# ADD CLASSIFICATION LAYER
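`Flatten` reshapes each `(3, 3, 64)` feature map into a vector of 576 values. Assuming the classifier from the first notebook is a 10-unit `Dense` layer with `softmax` activation, that layer turns its 10 outputs into probabilities that sum to 1. A NumPy sketch of both operations (the `softmax` helper is ours; subtracting the row maximum first avoids overflow without changing the result):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch of 2 feature maps from the last Conv2D layer
features = rng.standard_normal((2, 3, 3, 64)).astype('float32')

# Flatten: keep the batch axis, merge everything else
flat = features.reshape((features.shape[0], -1))

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Two rows of 10 logits; the second would overflow np.exp without the shift
probs = softmax(np.stack([np.arange(10.0), np.full(10, 1000.0)]))
```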

## Part 2 Step 6: Compile the Model
Now it is time to compile the model. Use the same optimizer, loss function, and accuracy metrics we used before.

## Part 2 Step 7. Fit the Model to the Training Data
Use 5 epochs and a batch size of 64. Don't forget to save the history. We will also use the `fit` parameter `validation_split`. Here is what the Keras documentation says about it: 

> validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.

Let's use a validation split of 0.2
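The quoted behavior is easy to mimic: the first `1 - validation_split` fraction of the arrays is used for training and the tail is held out. A NumPy sketch of that slicing (the function `split_like_validation_split` is illustrative; it mirrors the documented behavior rather than reproducing Keras's own code):

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    # Index where the held-out tail begins; the split happens before any shuffling
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10)
y = x % 2
(x_train, y_train), (x_val, y_val) = split_like_validation_split(x, y, 0.2)
```

With 60,000 training images and a split of 0.2, the last 12,000 images become the validation set. Because the tail is taken before shuffling, data sorted by label would produce a validation set containing only the last classes.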

We will keep the per-epoch loss and metric values returned by `fit` (a `History` object) in a variable:

    history = model.fit( ...

## Part 2 Step 8. Run the Model on The Test Data
Run the Model on the Test Data and print the accuracy

## Part 2 Step 9. Graph The Loss and Accuracy
Let's use Matplotlib to plot the training and validation loss side by side, as well as the training and validation accuracy:

In [None]:
import matplotlib.pyplot as plt

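# Depending on the Keras version and on the metric name passed to compile(),
# the accuracy entries are stored under 'acc' or 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]
loss = history.history['loss']
val_loss = history.history['val_loss']

# Number the epochs from 1 instead of 0
epochs = range(1, len(acc) + 1)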

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

# Part 3: Dogs 'n Cats with ConvNets
This Python notebook is a slight remix of one by François Chollet for his book *Deep Learning With Python*

## Previously ... 
we downloaded the data then divided it up into a set of image files inside a number of directories. Let's make sure we have those images

In [None]:
import os
base_dir = '/Users/raz/data/cats_and_dogs_small'
train_dir = os.path.join(base_dir, 'train')
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_dir = os.path.join(base_dir, 'validation')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
test_dir = os.path.join(base_dir, 'test')
test_cats_dir = os.path.join(test_dir, 'cats')
test_dogs_dir = os.path.join(test_dir, 'dogs')


print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))

This should print:
    
    total training cat images: 1000
    total training dog images: 1000
    total validation cat images: 500
    total validation dog images: 500
    total test cat images: 500
    total test dog images: 500
    
## Part 3 Step 1: Building a ConvNet Model
We've already built a small convnet for MNIST in the previous example, so you should be familiar with convnets. We will reuse the same general structure: our convnet will be a stack of alternating `Conv2D` (with ReLU activation) and `MaxPooling2D` layers. Here are the steps:

1. Create a sequential model
2. Add a `Conv2D` layer. Use 3x3 patches and a depth of 32. The input will be 150 x 150 pixel RGB images (depth of 3).
3. Add a `MaxPooling2D` layer with a pool size of 2 x 2.
4. Add another `Conv2D` layer. Use 3x3 patches and a depth of 64.
5. Add a `MaxPooling2D` layer with a pool size of 2 x 2.
6. Add another `Conv2D` layer. Use 3x3 patches and a depth of 128.
7. Add a `MaxPooling2D` layer with a pool size of 2 x 2.
8. Add another `Conv2D` layer. Use 3x3 patches and a depth of 128.
9. Add a `MaxPooling2D` layer with a pool size of 2 x 2.
10. Finish up by flattening and adding the densely connected classifier (these lines are already in the cell below).

In [None]:
# ADD YOUR CODE HERE




model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))




### Verify that we did it right. 
Let's check by using the summary method:

In [None]:
model.summary()

You should see:
    
    
		_________________________________________________________________
		Layer (type)                 Output Shape              Param #   
		=================================================================
		conv2d_1 (Conv2D)            (None, 148, 148, 32)      896       
		_________________________________________________________________
		max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0         
		_________________________________________________________________
		conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496     
		_________________________________________________________________
		max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0         
		_________________________________________________________________
		conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856     
		_________________________________________________________________
		max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0         
		_________________________________________________________________
		conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584    
		_________________________________________________________________
		max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0         
		_________________________________________________________________
		flatten_1 (Flatten)          (None, 6272)              0         
		_________________________________________________________________
		dense_1 (Dense)              (None, 512)               3211776   
		_________________________________________________________________
		dense_2 (Dense)              (None, 1)                 513       
		=================================================================
		Total params: 3,453,121
		Trainable params: 3,453,121
		Non-trainable params: 0
        
## Part 3 Step 2: Compile the Model

Set the parameters:

* set loss to be binary_crossentropy
* use `optimizers.RMSprop(lr=1e-4)` as the optimizer
* for metrics use `acc`



## Part 3 Step 3: Data Preprocessing
Use the same data preprocessing steps we used for the original Dogs n' Cats. You should create a `train_generator` and a `validation_generator`. 

## Part 3 Step 4: Fit the model
Use `fit_generator`. Don't forget to save the history.

1. Use 30 epochs
2. Each with 100 steps
3. Use the `validation_generator` as the validation data
4. Set the validation steps to 50.



## Part 3 Step 5: Evaluate the model using the test data.

## Part 3 Step 6: Plot the loss and accuracy:

## Look at those plots. 
What do they indicate? Seriously, spend a few minutes looking at them. Please.

# OPTIONAL: Using data augmentation

This section is directly from François Chollet 

Overfitting is caused by having too few samples to learn from, rendering us unable to train a model able to generalize to new data. Given infinite data, our model would be exposed to every possible aspect of the data distribution at hand: we would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by "augmenting" the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, our model would never see the exact same picture twice. This helps the model get exposed to more aspects of the data and generalize better.
In Keras, this can be done by configuring a number of random transformations to be performed on the images read by our ImageDataGenerator instance. Let's get started with an example:



In [None]:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

These are just a few of the options available (for more, see the Keras documentation). Let's quickly go over what we just wrote:

* `rotation_range` is a value in degrees (0-180), a range within which to randomly rotate pictures.
* `width_shift_range` and `height_shift_range` are ranges (as a fraction of total width or height) within which to randomly translate pictures 
horizontally or vertically.
* `shear_range` is for randomly applying shearing transformations.
* `zoom_range` is for randomly zooming inside pictures.
* `horizontal_flip` is for randomly flipping half of the images horizontally -- relevant when there are no assumptions of horizontal 
asymmetry (e.g. real-world pictures).
* `fill_mode` is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
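To make two of these options concrete, here is a NumPy sketch of a horizontal flip and of an integer-pixel horizontal shift whose exposed columns are filled with the nearest edge pixel, which is the idea behind `fill_mode='nearest'`. This is a simplified illustration, not Keras's implementation: `ImageDataGenerator` applies random affine transforms with interpolation, and the helper names below are ours.

```python
import numpy as np

def horizontal_flip(img):
    # img has shape (height, width, channels); reverse the width axis
    return img[:, ::-1, :]

def shift_width_nearest(img, dx):
    """Translate img by dx pixels along the width (positive = right),
    repeating the edge column to fill the exposed area."""
    width = img.shape[1]
    pad = abs(dx)
    padded = np.pad(img, ((0, 0), (pad, pad), (0, 0)), mode='edge')
    start = pad - dx
    return padded[:, start:start + width, :]

def random_width_shift(img, width_shift_range, rng):
    # Draw a shift of up to width_shift_range * width pixels in either direction
    max_dx = int(width_shift_range * img.shape[1])
    return shift_width_nearest(img, int(rng.integers(-max_dx, max_dx + 1)))

# A 1-pixel-high, 5-pixel-wide, single-channel "image" with values 0..4
row = np.arange(5).reshape((1, 5, 1))
```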

Let's take a look at our augmented images:

In [None]:
import matplotlib.pyplot as plt 
# This is the module with image preprocessing utilities
from keras.preprocessing import image

# Sort the listing so that fnames[107] refers to the same file on every machine
fnames = [os.path.join(train_dogs_dir, fname) for fname in sorted(os.listdir(train_dogs_dir))]

# We pick one image to "augment"
#print(os.listdir(train_dogs_dir).index('dog.788.jpg'))
img_path = fnames[107]
# Read the image and resize it
img = image.load_img(img_path, target_size=(150, 150))

# Convert it to a Numpy array with shape (150, 150, 3)
x = image.img_to_array(img)

# Reshape it to (1, 150, 150, 3)
x = x.reshape((1,) + x.shape)

# The .flow() command below generates batches of randomly transformed images.
# It will loop indefinitely, so we need to `break` the loop at some point!
i = 0
for batch in datagen.flow(x, batch_size=1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break

plt.show()

If we train a new network using this data augmentation configuration, our network will never see the same input twice. However, the inputs 
that it sees are still heavily intercorrelated, since they come from a small number of original images: we cannot produce new information, 
we can only remix existing information. As such, this might not be enough to completely get rid of overfitting. To further fight 
overfitting, we will also add a Dropout layer to our model using: 

     augmented_model.add(layers.Dropout(0.5))

between the flatten layer and the densely-connected classifier. Copy your model declaration from above and add this line.
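A `Dropout(0.5)` layer randomly zeroes half of its inputs at each training step and leaves them untouched at inference time. Keras uses the "inverted" variant, scaling the surviving values by `1 / (1 - rate)` during training so that the expected activation stays the same. A NumPy sketch of that behavior (the `dropout` function is ours, not the Keras layer):

```python
import numpy as np

def dropout(x, rate, training, rng):
    if not training or rate == 0.0:
        # At inference time the layer is the identity
        return x
    keep = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((4, 512))
trained = dropout(activations, 0.5, training=True, rng=rng)
inferred = dropout(activations, 0.5, training=False, rng=rng)
```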

## Compile the Model
Use the same settings as before

Sweet! Here is the code for the augmented data generator:

In [None]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=32,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')



## Fit the model
Now we will use these new generators to fit the model.

Use `fit_generator`. Don't forget to save the history.

1. Use 100 epochs
2. Each with 100 steps
3. Use the `validation_generator` as the validation data
4. Set the validation steps to 50

### Let's plot our results again
One plot showing the training and validation accuracy, another showing training and validation loss.



# Questions

1. What do these graphs show? How are they different from the graphs in the previous section?
2. What was the point of data augmentation? 
   1. How did it help?
   2. When would we use it?
3. In the very first code block of this notebook we created training, validation and test sets. 
   1. Why do we need them?
   2. What does each one do?
   3. Why can't we use the training set for validation?
4. What is a dropout layer?