# Convolutional Neural Network - Dog vs. Cat Image Classification

In this project we will train a Convolutional Neural Network (CNN) to classify images as either dogs or cats.

## Building the CNN

In [1]:
# Importing the Keras libraries and packages
# Sequential is what we will use to initialize our Neural Network
from keras.models import Sequential

# Conv2D is what we use to add our Convolutional layers.
# 2D for images. 3D would be for video (adding time)
from keras.layers import Conv2D

# MaxPooling2D is what we will use for the pooling step
from keras.layers import MaxPooling2D

# Flatten allows us to turn our pooling layer into a large feature input vector
from keras.layers import Flatten

# Dense to add fully connected layers
from keras.layers import Dense

Using TensorFlow backend.


In [2]:
# Initializing the CNN
classifier = Sequential()

### Convolution

![convolution](convolution.png)

In [3]:
# Conv2D adds a 2-dimensional convolutional layer
# 32 is the number of feature detectors (filters); (3,3) is the size of each one
# Our Convolution Layer therefore produces 32 feature maps
# input_shape is the shape of our input images: 64x64 pixels & 3 channels because we are using color images
# use the rectifier function (ReLU) for activation
classifier.add(Conv2D(32, (3,3), input_shape = (64,64,3), activation = 'relu'))

![input_shape](input_shape.png)
![dog_rgb](dog_rgb.png)
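
To make the convolution step concrete, here is a minimal NumPy sketch of one 3x3 feature detector sliding over a single-channel image. It is only an illustration, not how Keras implements `Conv2D`: the function name `convolve2d_valid`, the random image, and the edge-detector kernel are all hypothetical, and it assumes a stride of 1 with no padding (the `Conv2D` defaults).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the patch under it, summed
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# Hypothetical 64x64 single-channel image and a vertical-edge detector
image = np.random.rand(64, 64)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Apply the rectifier (ReLU) to the result, as the layer above does
feature_map = np.maximum(convolve2d_valid(image, kernel), 0)
print(feature_map.shape)  # (62, 62)
```

Each 3x3 detector trims one pixel from every border, so a 64x64 input yields a 62x62 feature map. With 32 detectors the layer outputs 32 such maps.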

### Pooling
Pooling condenses each feature map (generated by the convolution) into a smaller pooled feature map. It does this by sliding a 2x2 window across the feature map and keeping only the maximum value found in each window. The main reason we do this is to reduce the number of nodes produced by the Flattening step. This reduces model complexity and training time while keeping the strongest feature responses.

![maxpool](maxpool.png)

In [4]:
# set our pool_size to 2x2, so a 2x2 window slides over each feature map from the convolution
classifier.add(MaxPooling2D(pool_size = (2,2)))
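
The pooling step can be sketched in a few lines of NumPy. This is an illustrative stand-in rather than the Keras implementation: `max_pool_2x2` and the small example feature map are hypothetical, and the sketch assumes non-overlapping 2x2 windows with any odd trailing row or column dropped.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd
    h, w = h - h % 2, w - w % 2
    windows = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4x4 feature map shrinks to 2x2, and each remaining cell holds the strongest response from its window.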

### Adding a Second Convolutional Layer

In [5]:
# Note we do not need to provide the input shape because Keras infers it from the previous layer
# Same number of filters, kernel size, & activation function
classifier.add(Conv2D(32, (3,3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))

### Flattening
The flattening process converts the pooled feature maps into the input layer of an artificial neural network. We take all of our pooled feature maps and put them into one single vector.

![flattening](flattening.png)

In [6]:
classifier.add(Flatten())
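
As a sanity check on the size of that vector, the sketch below traces the spatial size through the two convolution and pooling stages defined above. The helper names `conv_output` and `pool_output` are hypothetical, and the arithmetic assumes the Keras defaults used in this notebook (stride-1 convolutions without padding, non-overlapping 2x2 pooling).

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (remainder dropped)."""
    return size // pool

size = 64
size = conv_output(size)   # 62
size = pool_output(size)   # 31
size = conv_output(size)   # 29
size = pool_output(size)   # 14

filters = 32
flattened_length = size * size * filters
print(size, flattened_length)  # 14 6272
```

Under these assumptions the flattened vector holds 14 x 14 x 32 = 6272 values.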

#### 2 Important Flattening Questions

 1. *Why don't we lose the spatial structure by flattening all these feature maps into one single vector?*
 
 By creating our feature maps, we already extracted the spatial structure information and encoded it as large numbers. These large numbers were generated by the feature detectors that we applied to the input image in the *Convolution* step. High numbers in the feature maps are associated with a specific feature in the input image. Then by applying *Max Pooling* we keep the large numbers because we take the max. The flattening step just consists of putting the numbers from the cells in the pooling layer into one single vector.<br>
 <br>
 2. *Why didn't we just skip the Convolution and Pooling steps and, instead, take all the pixels from the input image and flatten them into one single vector?*
 
 If we were to flatten the pixels from the input image and put them into one single vector, then each node of this massive vector would represent one pixel of the image independent of its surrounding pixels. We would only gain information from a single pixel instead of how each pixel is spatially connected to the pixels around it.

### Full Connection
We will use our flattened vector as the input to a classic artificial neural network (ANN), because an ANN is a great classifier for non-linear problems, such as image classification. The Dense() function creates a new layer, known as a fully connected layer, where every input node is connected to every node in that layer.

![full_connection](full_connection.png)

In [7]:
# take our classifier and add a fully-connected hidden layer
# A common practice that leads to good results is to set the number of hidden nodes to around 100
# 128 because it is a power of 2
# Set activation function to rectifier function
classifier.add(Dense(units = 128, activation = 'relu'))

# Now add our output layer
# It has a single node whose output is the probability that the image is a dog
# Because our outcome is binary, we choose the sigmoid function
classifier.add(Dense(units = 1, activation = 'sigmoid'))
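
The two Dense layers amount to two matrix multiplications, each followed by an activation. The NumPy sketch below shows one forward pass under stated assumptions: the weights are random rather than trained, the input is a made-up vector, and 6272 is the flattened length for this architecture (14 x 14 x 32). All variable names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical shapes matching the network above: 6272 flattened inputs,
# 128 hidden units, 1 output unit. Weights are random, not trained.
flattened = rng.random(6272)
w_hidden = rng.normal(scale=0.01, size=(6272, 128))
b_hidden = np.zeros(128)
w_out = rng.normal(scale=0.01, size=(128, 1))
b_out = np.zeros(1)

hidden = relu(flattened @ w_hidden + b_hidden)   # shape (128,)
probability = sigmoid(hidden @ w_out + b_out)    # shape (1,), value in (0, 1)
print(hidden.shape, probability.shape)
```

The sigmoid squashes the final value into the range (0, 1), which is why a single output node is enough for a two-class problem.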

### Compile the CNN

The last remaining step is to compile our CNN!

In [8]:
# call compile and set optimizer to 'adam', a variant of stochastic gradient descent
# set our loss function to binary_crossentropy because it corresponds to logarithmic loss & binary outcome
# if we had more than 2 outcomes, we would choose 'categorical_crossentropy' instead
# Use accuracy as our performance metric
classifier.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])

![log_loss](log_loss.png)
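
To see how the logarithmic loss behaves, here is a small pure-Python sketch of binary cross-entropy. It is an illustration of the formula, not the Keras implementation: the function, the clipping constant `eps`, and the example predictions are all hypothetical.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss over a batch of binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Label 1 = dog, 0 = cat (hypothetical predictions)
confident_and_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_and_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident_and_right, 4))  # 0.1054
print(round(confident_and_wrong, 4))  # 2.3026
```

Confident wrong answers are penalized far more heavily than confident right answers are rewarded, which pushes the network toward well-calibrated probabilities.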

## Fitting the CNN to the Images

In [9]:
# import Keras module
from keras.preprocessing.image import ImageDataGenerator

To reduce overfitting, we will implement an image augmentation trick that allows us to enrich our training set without needing to add more images. It does this by applying random transformations to the images in each batch, such as shearing, zooming, and flipping. Because the model rarely sees exactly the same image twice, it is less likely to memorize the training set and more likely to generalize to new images.

In [10]:
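# rescale multiplies every pixel value by the number provided; 1./255 maps [0, 255] to [0, 1]
# shear_range sets the maximum random shear (older Keras releases read it in radians, newer ones in degrees)
# zoom_range sets the maximum random zoom; 0.2 zooms in or out by up to 20%
# set horizontal_flip to True so the generator will randomly flip images horizontally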
train_datagen = ImageDataGenerator(rescale = 1./255, shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True)

In [11]:
# The test data is only rescaled, not augmented
test_datagen = ImageDataGenerator(rescale = 1./255)
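
Two of these operations are easy to reproduce by hand. The NumPy sketch below rescales a tiny made-up image and flips it horizontally. It only illustrates the idea: the 2x3 image is hypothetical, and `ImageDataGenerator` applies its transformations randomly and on the fly rather than like this.

```python
import numpy as np

# Hypothetical 2x3 RGB image with 8-bit pixel values (0 to 238)
image = np.arange(18, dtype=np.uint8).reshape(2, 3, 3) * 14

rescaled = image * (1.0 / 255)     # what rescale = 1./255 does: values now in [0, 1]
flipped = rescaled[:, ::-1, :]     # horizontal flip: reverse the width axis

print(rescaled.min(), rescaled.max())
# The leftmost column of the flipped image is the rightmost column of the original
print(np.array_equal(flipped[:, 0, :], rescaled[:, -1, :]))  # True
```

Rescaling is applied to both the training and test data so that the network always sees inputs in the same range. Flipping is applied only to the training data.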

In [12]:
# Create a training_set that yields batches of augmented images from our image data generator.
# Set path to image directory
# Set target_size to 64x64 pixels to match the input_shape of the CNN
# 32 is the size of our batches that include random samples
# batch_size also sets the number of images to pass through the CNN before the weights are updated
# class_mode = binary because we have 2 classes - cats and dogs
training_set = train_datagen.flow_from_directory('dataset/training_set', target_size = (64,64), batch_size = 32, class_mode = 'binary')

Found 8000 images belonging to 2 classes.


In [13]:
# Create a test_set that yields batches of rescaled (not augmented) images from our test data generator.
test_set = test_datagen.flow_from_directory('dataset/test_set', target_size = (64,64), batch_size = 32, class_mode = 'binary')

Found 2000 images belonging to 2 classes.


In [14]:
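# train our classifier with the training_set
# steps_per_epoch is the number of batches (not images) drawn from training_set per epoch
# with batch_size = 32, 8000 steps pass over the 8000 training images about 32 times per epoch
# epochs is the number of cycles to repeat the training
# validation_data is the test_set
# validation_steps is likewise the number of batches drawn from the test_set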
classifier.fit_generator(training_set, steps_per_epoch = 8000, epochs = 20, validation_data = test_set, validation_steps = 2000)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x11d241610>
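
In Keras 2, `steps_per_epoch` and `validation_steps` count batches rather than images. The sketch below works out how many steps a single pass over each set would need with the batch size used above. The helper `steps_for_one_pass` is a hypothetical name, and the numbers come from the generator output in this notebook.

```python
import math

def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

batch_size = 32
train_steps = steps_for_one_pass(8000, batch_size)  # 250
val_steps = steps_for_one_pass(2000, batch_size)    # 63

# With steps_per_epoch = 8000, each epoch draws 8000 batches instead
images_per_epoch = 8000 * batch_size                # 256000
print(train_steps, val_steps, images_per_epoch)     # 250 63 256000
```

Using 250 and 63 steps would make each epoch a single pass over the data and shorten training considerably, although the accuracy reached after 20 epochs would likely differ from the run shown here.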

### Results

After 20 epochs, our model reached **99.61%** accuracy on the training_set and **81.75%** on the test_set. The gap of roughly 18 percentage points shows that the model still overfits the training data. To further improve our model, we can consider adding additional *Convolutional Layers* and/or additional *Fully Connected Layers*. The downside of adding additional layers is that it adds model complexity and the model will take longer to train. We can also add more training images so our model has more examples to learn from.

### Single Predictions with CNNs
Now we will test the trained CNN model on two images (one of a dog and one of a cat) it has never seen before.

In [21]:
import numpy as np

In [22]:
from keras.preprocessing import image

In [24]:
# load our images
test_image1 = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64,64))
test_image2 = image.load_img('dataset/single_prediction/cat_or_dog_2.jpg', target_size = (64,64))

**Test Image 1**
![Stella](dataset/single_prediction/cat_or_dog_1.jpg)

**Test Image 2**
![Hobie](dataset/single_prediction/cat_or_dog_2.jpg)

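In [25]:
# convert the images to arrays of shape (64, 64, 3)
# rescale by 1/255 so the pixel values match what the CNN saw during training
test_image1 = image.img_to_array(test_image1) / 255.
test_image2 = image.img_to_array(test_image2) / 255.

In [26]:
# add a batch dimension, giving shape (1, 64, 64, 3), because predict expects a batch of images
test_image1 = np.expand_dims(test_image1, axis=0)
test_image2 = np.expand_dims(test_image2, axis=0)

In [27]:
result1 = classifier.predict(test_image1)
result2 = classifier.predict(test_image2)

In [28]:
training_set.class_indices

{'cats': 0, 'dogs': 1}

In [31]:
# the sigmoid output is a probability between 0 and 1,
# so compare it with 0.5 rather than testing for exactly 1
if result1[0][0] >= 0.5:
    prediction1 = 'dog'
else:
    prediction1 = 'cat'

if result2[0][0] >= 0.5:
    prediction2 = 'dog'
else:
    prediction2 = 'cat'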




In [32]:
print("Test Image 1 Prediction: " + prediction1)

Test Image 1 Prediction: dog


In [33]:
print("Test Image 2 Prediction: " + prediction2)

Test Image 2 Prediction: cat
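
The label lookup can also be written once as a small helper instead of repeating the if/else for every image. This is a sketch under stated assumptions: `label_from_probability` is a hypothetical function, the 0.5 threshold is a conventional choice, and the probabilities passed in below are made-up values rather than outputs of the trained model.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a {name: index} dictionary."""
    index_to_name = {index: name for name, index in class_indices.items()}
    predicted_index = 1 if probability >= threshold else 0
    return index_to_name[predicted_index]

class_indices = {'cats': 0, 'dogs': 1}  # as reported by training_set.class_indices
print(label_from_probability(0.93, class_indices))  # dogs
print(label_from_probability(0.12, class_indices))  # cats
```

Reading the names from `class_indices` keeps the labels correct even if the class folders are renamed or their order changes.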
