In [7]:
from keras.models import Sequential

Used to initialize the neural network. There are two ways of initializing a neural network in Keras: as a sequence of layers or as a graph.

In [8]:
from keras.layers import Convolution2D

Used for the convolutional step of the CNN, where we add the convolutional layers. Since we are working with images, we use the 2D class.

In [9]:
from keras.layers import MaxPooling2D

Used for step 2, the pooling step, which adds our pooling layers so we can reduce the size of our feature maps.

In [10]:
from keras.layers import Flatten

Used for step 3, flattening, where we convert all of the pooled feature maps created by convolution and max pooling into a single feature vector that becomes the input of our fully connected layers.

In [11]:
from keras.layers import Dense

Used to add the fully connected layers of the classic neural network.

Each import corresponds to one step in the construction of the CNN.

In [12]:
classifier = Sequential() # initializing the CNN as a sequence of layers.

In [13]:
classifier.add(Convolution2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))  # 32 filters of size 3x3 (Keras 2 kernel_size tuple)


The first argument is the number of filters. The number of filters we choose is also the number of feature maps we create, since each filter produces one feature map. So here we create 32 feature detectors of size 3x3, and our convolutional layer will be composed of 32 feature maps. The next argument gives the 3x3 size of each feature detector (passed as (3, 3) in Keras 2, or as two separate arguments, 3, 3, in the older Keras 1 API).
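To make the "one feature map per filter" relationship concrete, here is a minimal NumPy sketch, not Keras' actual implementation: 32 random 3x3 filters slide over one 64x64 RGB image and each produces a 62x62 feature map. The function name `conv2d_valid` is hypothetical, the filter values are random stand-ins for learned weights, and the bias and activation are left out.

```python
import numpy as np


def conv2d_valid(image, filters):
    """Naive 'valid' 2D convolution, computed as cross-correlation like CNN layers do.

    image:   (H, W, C) array, channels last
    filters: (k, k, C, n_filters) array, one 3D kernel per filter
    returns: (H - k + 1, W - k + 1, n_filters), i.e. one feature map per filter
    """
    h, w, _ = image.shape
    k, _, _, n_filters = filters.shape
    out = np.zeros((h - k + 1, w - k + 1, n_filters))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k, :]  # (k, k, C) window under the detectors
            # Multiply the window by every filter and sum over rows, columns and channels
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                # one 64x64 RGB image
filters = rng.standard_normal((3, 3, 3, 32))   # 32 feature detectors of size 3x3
feature_maps = conv2d_valid(image, filters)
print(feature_maps.shape)  # (62, 62, 32)
```

In the real layer each filter also has one bias, so this layer has 3 × 3 × 3 × 32 + 32 = 896 trainable parameters.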

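The next parameter, which we do not pass, is padding (called border_mode in Keras 1). It specifies how the feature detectors handle the borders of the input image. Its default is 'valid' (no padding, so each feature map is slightly smaller than the input); 'same' pads the borders so the feature maps keep the input's width and height. We leave it at the default.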
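The effect of the two padding modes on the feature map size can be sketched with the usual output-size formulas (these follow TensorFlow's definitions of 'valid' and 'same'). The helper `conv_output_size` is a hypothetical name used only for illustration.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output height/width of a convolution along one spatial axis."""
    if padding == "valid":
        # No padding: the detector must fit entirely inside the image
        return (size - kernel) // stride + 1
    if padding == "same":
        # Borders are zero-padded so that, with stride 1, the size is unchanged
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_size(64, 3))                  # 62
print(conv_output_size(64, 3, padding="same"))  # 64
```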

The next argument is input_shape, the shape of the input images to which we apply the feature detectors; we need to specify the expected format of our input images. Be careful: the docs for this parameter show input_shape = (3, 256, 256), which is the channels-first order of the Theano backend. Here we are using the TensorFlow backend (run the first cell to check), whose order is channels last: input_shape = (256, 256, 3), where 256, 256 are the dimensions of the input array and 3 is the number of channels, i.e., red, green, blue. It would be 1 if we were dealing with black-and-white images. In our case we use 64x64 color images, so input_shape = (64, 64, 3).
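The difference between the two orders is just which axis holds the channels. A small NumPy sketch, assuming plain arrays rather than Keras tensors, moves the channel axis from the front to the back. In Keras 2 you can also check which convention is active with `keras.backend.image_data_format()`, which returns 'channels_last' or 'channels_first'.

```python
import numpy as np

# A (channels, height, width) array: the Theano-style channels_first order
channels_first = np.zeros((3, 256, 256))

# Move the channel axis to the end to get TensorFlow's channels_last order
channels_last = np.transpose(channels_first, (1, 2, 0))
print(channels_last.shape)  # (256, 256, 3)

# A black-and-white (grayscale) 64x64 image has a single channel
grayscale = np.zeros((64, 64, 1))
print(grayscale.shape[-1])  # 1
```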

The last argument is the activation function, which we set to the rectifier (ReLU) so that our feature maps contain no negative values. Depending on the parameters of the convolution, some values in a feature map can come out negative, and ReLU replaces them with zero. This adds non-linearity to the CNN, which we need because classifying images is a non-linear problem. Intuitively, ReLU breaks up the linear, gradual flow of colors in an image so that the network can pick out distinct parts of it.
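ReLU itself is just an element-wise max with zero. A minimal NumPy sketch on a made-up 2x2 feature map:

```python
import numpy as np

# A tiny feature map with some negative responses (made-up values)
feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])

# ReLU: keep positive values, replace negative ones with 0
activated = np.maximum(feature_map, 0.0)
print(activated)
```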

Next we apply the max pooling step, which reduces the number of nodes we get in the flattening step, and therefore in the input layer of the fully connected network.


In [14]:
classifier.add(MaxPooling2D(pool_size = (2,2)))

This line reduces the size of the feature maps by dividing each spatial dimension by 2 (for example, 62x62 feature maps become 31x31).

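The pool_size parameter is the size of the window (here 2x2) from which the maximum value is taken. By default the window also moves by pool_size (the stride defaults to pool_size), so the windows do not overlap.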

We apply max pooling to reduce the size of the feature maps, and therefore the number of nodes in the upcoming fully connected layers. This reduces the model's complexity and execution time, generally without hurting performance much.

Max pooling keeps the highest value in each window, and those high values correspond to where the feature detectors found specific features in the input image. So we keep the spatial structure information, and with it the model's performance, while making the model cheaper and faster to run.
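Here is a minimal NumPy sketch of 2x2 max pooling with stride 2, assuming channels-last input; the function name `max_pool_2x2` is hypothetical. Like Keras' default 'valid' pooling, it drops a trailing odd row or column.

```python
import numpy as np


def max_pool_2x2(feature_maps):
    """2x2 max pooling with stride 2 on an (H, W, C) array."""
    h, w, c = feature_maps.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_maps[:h2 * 2, :w2 * 2, :]   # drop an odd last row/column
    # Split each spatial axis into (blocks, 2) and take the max inside every 2x2 block
    return trimmed.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))


x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool_2x2(x)[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]

# The 62x62x32 feature maps from the convolution become 31x31x32
print(max_pool_2x2(np.zeros((62, 62, 32))).shape)  # (31, 31, 32)
```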

Note: The reason we apply the convolutional and max pooling layers is so that when we reach the flattening step, we don't end up with a huge 1D vector of input nodes that represents every single pixel of the image independently of the pixels around it.

We want information about the spatial structure of the pixels, so we apply convolutional and max pooling layers; the feature maps they produce carry that spatial structure information.


In [15]:
classifier.add(Flatten())

Flattening step completed; a single vector was created that contains all of the 
information of the spatial structure of the images.
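Flattening is only a reshape: every value of the pooled feature maps is laid out in one long vector. A NumPy sketch using the shapes of the one-convolution model above (31x31 pooled maps, 32 of them):

```python
import numpy as np

# Pooled feature maps from the single-convolution model: 31x31 maps, 32 of them
pooled = np.random.default_rng(0).random((31, 31, 32))

# Flattening lays every value out in one long vector (31 * 31 * 32 values)
vector = pooled.reshape(-1)
print(vector.shape)  # (30752,)

# With a batch of images, keep the batch axis and flatten the rest
batch = np.zeros((32, 31, 31, 32))
print(batch.reshape(batch.shape[0], -1).shape)  # (32, 30752)
```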

Now the only thing left to do is to create a fully connected, classic neural network that will classify the images. It should classify them well thanks to this input vector (the flattened feature maps), which contains information about the spatial structure of the images.

In [16]:
classifier.add(Dense(units = 128, activation = 'relu'))  # 'units' was called 'output_dim' in Keras 1


This adds the hidden layer, i.e., the fully connected layer, using the Dense class.

The first parameter, units (called output_dim in the Keras 1 API), is the number of nodes in the hidden layer. How many nodes should we use? A common rule of thumb is to choose a number between the number of input nodes and the number of output nodes, and then experiment. It is also common practice to pick a number of the form

$$2^x$$

where x is an integer. So we go with x = 7: 128 nodes in the hidden layer.
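What the Dense layer computes can be sketched in NumPy as a matrix product plus bias, followed by ReLU. The weights below are random stand-ins, and 30752 is the flattened size of the one-convolution model (31 × 31 × 32).

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 30752, 128   # flattened size of the one-convolution model, 128 hidden nodes

x = rng.random(n_inputs)                              # one flattened input vector
W = rng.standard_normal((n_inputs, n_hidden)) * 0.01  # random stand-in weights
b = np.zeros(n_hidden)                                # biases

# Fully connected layer: every input node connects to every hidden node
hidden = np.maximum(x @ W + b, 0.0)                   # ReLU activation
print(hidden.shape)  # (128,)

n_params = W.size + b.size
print(n_params)  # 3936384
```

The parameter count shows why the pooling step matters: this single layer already has almost four million weights.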

Next we define the output layer. We use the sigmoid activation function because we have a binary outcome, and we set units = 1 (output_dim = 1 in Keras 1), since we only expect one output node: the predicted probability of one of the two classes, cat or dog.
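The sigmoid squashes any real-valued score into the interval (0, 1), which is what lets the single output node be read as a probability. A minimal NumPy sketch, with made-up input scores:

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-z))


scores = np.array([-4.0, 0.0, 4.0])   # made-up raw outputs of the last layer
print(sigmoid(scores))                 # roughly [0.018, 0.5, 0.982]
```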

In [17]:
classifier.add(Dense(units = 1, activation = 'sigmoid'))  # one node: a probability between 0 and 1


Next we compile the CNN using the compile() method:

In [18]:
classifier.compile(optimizer = 'adam',loss = 'binary_crossentropy', metrics = ['accuracy'])

For the optimizer we use adam, a variant of stochastic gradient descent (SGD). The loss function is binary_crossentropy, which corresponds to the logarithmic loss generally used by classification models such as logistic regression; we use it because we have a binary outcome.

Note: If we had more than two outcomes, we would need to use categorical_crossentropy
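Binary cross-entropy for labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$ is $L = -\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The NumPy sketch below computes it; the clipping constant `eps` is an assumption added only to avoid log(0), and the labels and predictions are made up.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean log loss for binary labels (0/1) and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


y_true = np.array([1, 0, 1, 0])
confident_right = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8, 0.2]))
confident_wrong = binary_crossentropy(y_true, np.array([0.2, 0.7, 0.4, 0.9]))
print(confident_right < confident_wrong)  # True
```

Confident, correct predictions give a small loss; confident, wrong ones are penalized heavily.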

The last parameter is the performance metric, i.e., how we measure the model's performance. We want the model to be accurate, so we set metrics = ['accuracy'].

----------------Part 2: Fitting the CNN to the images-----------------

We use a shortcut from the Keras documentation for image augmentation.

Image augmentation consists of preprocessing the images to reduce overfitting.

If we do not do this image augmentation then we will end up with great accuracy results on the training set,
but a much lower accuracy on the test set due to an overfit on the training set.

The first function we are going to use to perform image augmentation can be found at the top of 
https://keras.io/preprocessing/image/

Question: What is image augmentation? 

Answer: To begin with, one of the situations that leads to overfitting is having too little data to train our model. When this happens, the model finds correlations in the few observations of the training set but fails to GENERALIZE these correlations to a new set of observations.

When it comes to images, we need a lot of images to find and generalize some correlations.

In computer vision, our ML model does not simply need to find correlations between some independent variables and a dependent variable; it needs to find patterns in the pixels, and that requires many training images.

Data augmentation creates many batches of our images and, in each batch, applies random transformations to a random selection of the images, such as rotating, flipping, shifting, or shearing (pushing) them.

What we end up with is many more diverse images inside these batches, and therefore a lot more images to train on.

That's why it's called image augmentation: the training images are augmented.

And since the transformations are random, the model is very unlikely to see exactly the same picture twice across batches.

Summary: Image augmentation is a technique that lets us enrich our training set without adding more images, and therefore helps us get good performance with less overfitting, even with a small number of images.

The following code was copied and pasted (with a few name and number edits) from https://keras.io/preprocessing/image/, under the .flow_from_directory(directory) section.

Importing the ImageDataGenerator class below to perform the image augmentation

In [24]:
from keras.preprocessing.image import ImageDataGenerator

Now we rescale all of the pixel values.

Pixels take values between 0 and 255; by multiplying them by ${1/255}$, all of the pixel values end up between 0 and 1. Keeping the inputs on this small, consistent scale helps the network train more stably.
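The rescale=1./255 option amounts to one multiplication per pixel. A NumPy sketch on a few made-up 8-bit pixel values:

```python
import numpy as np

# Raw 8-bit pixel values range from 0 to 255
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Same operation as rescale=1./255: multiply every pixel by 1/255
scaled = pixels.astype(np.float32) * (1.0 / 255)
print(scaled)
```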

shear_range applies random shear transformations (transvections); we set it to 0.2.

zoom_range applies random zooms; we set it to 0.2, which means zoom factors between 0.8 and 1.2.

The 0.2 values are just how much we want to apply these random transformations.

horizontal_flip = True randomly flips images horizontally (left to right), which adds more variety so that we are unlikely to find the same image in different batches.
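ImageDataGenerator performs these transformations internally. To show the idea, here is a simplified NumPy sketch of two of them: a random left-right flip with probability 0.5, and drawing a zoom factor, using the rule from the ImageDataGenerator docs that a float zoom_range z means the range [1 - z, 1 + z]. The helper names are hypothetical, and the real generator also interpolates pixels and applies shear, which this sketch leaves out.

```python
import numpy as np

rng = np.random.default_rng(42)


def random_horizontal_flip(image, rng):
    """Mirror an (H, W, C) image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image


def sample_zoom_factor(zoom_range, rng):
    """A float zoom_range z gives zoom factors drawn uniformly from [1 - z, 1 + z]."""
    return rng.uniform(1 - zoom_range, 1 + zoom_range)


image = np.arange(12, dtype=float).reshape(2, 3, 2)   # tiny stand-in "image"
augmented = random_horizontal_flip(image, rng)
zoom = sample_zoom_factor(0.2, rng)                   # somewhere in [0.8, 1.2]
```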

In [30]:
train_datagen = ImageDataGenerator( 
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

In the next cell we create a generator for the test set. We only rescale it and leave everything else at the defaults, because the test images should not be augmented: we want to evaluate the model on the images as they are.

In [31]:
test_datagen = ImageDataGenerator(rescale=1./255)

We begin by setting the directory where the data is located. The path is simply 'dataset/training_set' because this Jupyter notebook sits in the same directory as the dataset folder. If your setup differs, modify the directory below to point to where the cats and dogs training and test sets are located.

Now we specify target_size, the size of the images expected by the CNN. The model expects (64, 64, 3), but target_size only takes the height and width, so we use (64, 64).

Then batch_size: the number of randomly sampled images in each batch that goes through the CNN before the weights are updated. We set it to 32 (the weights get updated after every batch).

Lastly, class_mode indicates whether the dependent variable is binary or has more than two categories. Since we have two classes, cats and dogs, we set class_mode = 'binary'.
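With class_mode = 'binary', each image gets the label 0 or 1 according to the class folder it sits in. The Keras docs state that, when classes are inferred from the subdirectories, they are ordered alphanumerically. The sketch below mimics that ordering; the function name and the folder names cats and dogs are assumptions, and the generator's real mapping is available as training_set.class_indices.

```python
def infer_class_indices(class_folder_names):
    """Assign label indices to class folders in alphanumeric order."""
    return {name: index for index, name in enumerate(sorted(class_folder_names))}


# Hypothetical layout:
#   dataset/training_set/cats/...
#   dataset/training_set/dogs/...
print(infer_class_indices(["dogs", "cats"]))  # {'cats': 0, 'dogs': 1}
```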

In [32]:
training_set = train_datagen.flow_from_directory(
        'dataset/training_set',
        target_size = (64, 64),
        batch_size = 32, 
        class_mode ='binary')

Found 8000 images belonging to 2 classes.


Note the output above, "Found 8000 images belonging to 2 classes." It appears because the folders are set up the way flow_from_directory expects: one subfolder per class inside the training directory. When you're classifying, make sure your folders are organized this way :)
We use similar parameter values for the test set below:

In [33]:
test_set = test_datagen.flow_from_directory(
        'dataset/test_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

Found 2000 images belonging to 2 classes.


Finally, the model.fit_generator function below: (model in our case is the classifier object from above)

The first argument is the training set so we say training_set.

The second parameter is steps_per_epoch: the number of batches drawn from the training set in each epoch. All of the observations of the training set should pass through the CNN during each epoch, so with 8000 training images in batches of 32 we need 8000 / 32 = 250 steps. (In Keras 1 this parameter was samples_per_epoch and counted images instead of batches, which is where the value 8000 comes from.)

The third parameter, epochs, is the number of epochs used to train the CNN. 25 is a reasonable choice, and you could choose more; it depends on how long you're willing to wait. The cell below sets epochs = 1 for a single pass.

Then the fourth parameter, validation_data; corresponds to the test set on which we want to evaluate the 
performance of our CNN. So validation_data = test_set.

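And finally, validation_steps is the number of test batches used for evaluation after each epoch. With 2000 test images in batches of 32 that is 62.5, rounded up to 63 so that every test image is used.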
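The arithmetic behind both step counts, as a small sketch: one step is one batch, so the number of steps is the number of images divided by the batch size, rounded up. In Keras 2 the iterators returned by flow_from_directory should report the same number through len(training_set) and len(test_set).

```python
import math

n_train, n_test, batch_size = 8000, 2000, 32

# One step = one batch, so an epoch needs enough batches to cover every image once
steps_per_epoch = math.ceil(n_train / batch_size)   # 250
validation_steps = math.ceil(n_test / batch_size)   # 63 (the last batch holds only 16 images)
print(steps_per_epoch, validation_steps)            # 250 63
```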

The following code will take 20 minutes to run on a CPU.

In [None]:
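classifier.fit_generator(
        training_set,
        steps_per_epoch = len(training_set),  # number of batches: 8000 images / 32 per batch = 250
        epochs = 1,
        validation_data = test_set,
        validation_steps = len(test_set))     # number of batches: 2000 / 32, rounded up = 63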

We want to make our model more accurate.

So we are going to make the model deeper by adding more layers.

How can we make it deeper? 

Two options: 

First option: Add another convolutional layer:

classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))  # not the first layer, so no input_shape

Second Option: Add another fully connected layer:

classifier.add(Dense(units = 128, activation = 'relu'))

For image data, adding another convolutional layer (followed by another max pooling layer) is usually the more effective option.

But you can always do both and compare which one trains faster and gives better results.

Here's all of the code in action with ANOTHER convolutional layer!

Goal: CNN with two convolutional layers; get a test set accuracy > 80%. Let's see what happens.

In [36]:
classifier = Sequential() # initializing the CNN as a sequence of layers.

classifier.add(Convolution2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

classifier.add(MaxPooling2D(pool_size = (2,2)))

classifier.add(Convolution2D(32, (3, 3), activation = 'relu')) # 2nd convolutional layer; doesn't need the input_shape param

classifier.add(MaxPooling2D(pool_size = (2,2))) # 2nd pooling layer, applied after the 2nd convolution

classifier.add(Flatten())

classifier.add(Dense(units = 128, activation = 'relu'))

classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'adam',loss = 'binary_crossentropy', metrics = ['accuracy'])

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator( 
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'dataset/training_set',
        target_size = (64, 64),
        batch_size = 32,
        class_mode ='binary')

test_set = test_datagen.flow_from_directory(
        'dataset/test_set',
        target_size=(64, 64), 
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch = len(training_set),  # 250 batches of 32 images
        epochs = 1,
        validation_data = test_set,
        validation_steps = len(test_set))     # 63 batches



Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1


<keras.callbacks.History at 0xb296343c8>

If you want to improve the model even more, you can choose a higher target_size for the images in both the training_set and test_set generators (and change input_shape in the first convolutional layer to match), so that the model gets more information about the pixel patterns. Larger images have more rows and more columns, and therefore more pixels to learn from.
(A GPU is recommended; don't do it on a stock MacBook like I did.)
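To see how much more information a larger target_size gives the fully connected part, here is a sketch that traces the image size through the two convolution + pooling blocks above (3x3 'valid' convolutions, 2x2 pooling, 32 filters). The function name is hypothetical.

```python
def flattened_size(target_size, n_conv_blocks=2, kernel=3, pool=2, n_filters=32):
    """Length of the flattened vector for the two-convolution model above."""
    size = target_size
    for _ in range(n_conv_blocks):
        size = size - kernel + 1   # 3x3 convolution with 'valid' padding
        size = size // pool        # 2x2 max pooling
    return size * size * n_filters


print(flattened_size(64))    # 6272
print(flattened_size(128))   # 28800
```

Doubling the image side from 64 to 128 gives the flattening step roughly 4.6 times as many inputs, which is also why training gets noticeably slower.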

To test a single image, run the following cell. We use 0.5 as our threshold because, by convention, that is the cutoff used when the output layer has a sigmoid activation and there are only two classes.

In [2]:
import numpy as np
from keras.preprocessing import image

# 'random.jpg' is a placeholder: point it at any cat or dog image on disk
test_image = image.load_img('random.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image) / 255.0  # rescale like the training images
test_image = np.expand_dims(test_image, axis = 0)    # add the batch dimension: (1, 64, 64, 3)
result = classifier.predict(test_image)
print(training_set.class_indices)                    # e.g. {'cats': 0, 'dogs': 1}; class 1 is the sigmoid's class
if result[0][0] >= 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'

print(prediction)

FileNotFoundError: [Errno 2] No such file or directory: 'random.jpg'
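The error above only means that no file called random.jpg exists in the notebook's directory. Separately, rather than hard-coding 'dog' and 'cat', the label can be read from the generator's class_indices, since the sigmoid output is the probability of the class with index 1. A minimal sketch; the helper name is hypothetical and the example mapping assumes folders named cats and dogs.

```python
def probability_to_label(probability, class_indices, threshold=0.5):
    """Turn the sigmoid output into a class name using the generator's class_indices."""
    index_to_class = {index: name for name, index in class_indices.items()}
    # The sigmoid output is the probability of the class with index 1
    return index_to_class[1] if probability >= threshold else index_to_class[0]


class_indices = {"cats": 0, "dogs": 1}   # hypothetical; use training_set.class_indices in practice
print(probability_to_label(0.83, class_indices))  # dogs
print(probability_to_label(0.12, class_indices))  # cats
```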