# Cats vs Dogs

In this workshop you will build a convolutional neural network (CNN) to classify images of cats and dogs. You should be familiar with the theory behind CNNs; see our lesson [here](http://caisplusplus.usc.edu/blog/curriculum/lesson7) for more info. Only fill in the TODO sections.

In [59]:
# Imports, make sure you have cv2 installed!
import os
import numpy as np
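# Note: scipy.ndimage.imread was removed in SciPy 1.2, so this import needs an older SciPy.
# On newer versions, imageio.imread is the replacement suggested by the SciPy deprecation notice.
from scipy.ndimage import imread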
import cv2
import sklearn.utils

# DO NOT CHANGE ANY OF THIS

DATA_PATH = './data/'
TEST_PERCENT = 0.1
# Load only a subset for the sake of time. In a real setting you would use the whole dataset.
SELECT_SUBSET_PERCENT = 0.4

# The cat and dog images vary in size, so we resize them all to the same size.
# DO NOT CHANGE
RESIZE_WIDTH=32
RESIZE_HEIGHT=32
# We are setting this to be 5 epochs for fast training times. In practice we would have many more epochs. 
EPOCHS = 5

## Load the Data
Load the train and test data sets. Do not modify this code at all. Make sure the cat and dog images are in ``./data``. You can find the data at https://www.kaggle.com/c/dogs-vs-cats/data

In [60]:
# Let's get started by loading the data.
# Make sure you have the data downloaded to ./data
# To download the data go to https://www.kaggle.com/c/dogs-vs-cats/data and download train.zip

X = []
Y = []

files = os.listdir(DATA_PATH)
# Shuffle so we are selecting about an equal number of dog and cat images.
shuffled_files = sklearn.utils.shuffle(files)
select_count = int(len(shuffled_files) * SELECT_SUBSET_PERCENT)

print('Going to load %i files' % select_count)

subset_files_select = shuffled_files[:select_count]

DISPLAY_COUNT = 1000

for i, input_file in enumerate(subset_files_select):
    if i % DISPLAY_COUNT == 0 and i != 0:
        print('Have loaded %i samples' % i)
        
    img = imread(DATA_PATH + input_file)
    # Resize the images to be the same size.
    img = cv2.resize(img, (RESIZE_WIDTH, RESIZE_HEIGHT), interpolation=cv2.INTER_CUBIC)
    X.append(img)
    if 'cat' == input_file.split('.')[0]:
        Y.append(0.0)
    else:
        Y.append(1.0)
        
X = np.array(X)
Y = np.array(Y)

test_size = int(len(X) * TEST_PERCENT)

test_X = X[:test_size]
test_Y = Y[:test_size]
train_X = X[test_size:]
train_Y = Y[test_size:]

print('Train set has dimensionality %s' % str(train_X.shape))
print('Test set has dimensionality %s' % str(test_X.shape))

# Apply some normalization here.
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X /= 255
test_X /= 255



Going to load 10000 files
Have loaded 1000 samples
Have loaded 2000 samples
Have loaded 3000 samples
Have loaded 4000 samples
Have loaded 5000 samples
Have loaded 6000 samples
Have loaded 7000 samples
Have loaded 8000 samples
Have loaded 9000 samples
Train set has dimensionality (9000, 32, 32, 3)
Test set has dimensionality (1000, 32, 32, 3)
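
The labelling rule in the loading loop above can be isolated as a small helper. This is only a sketch: the name `label_from_filename` is hypothetical, and it assumes the Kaggle naming scheme (`cat.0.jpg`, `dog.12.jpg`). Note that, as in the loop above, anything whose prefix is not `cat` gets the label 1.0, so stray files in ``./data`` would be counted as dogs.

```python
def label_from_filename(filename):
    """Return 0.0 for a cat image and 1.0 otherwise, mirroring the loading loop."""
    # Kaggle train files are assumed to be named like 'cat.0.jpg' or 'dog.12.jpg'.
    prefix = filename.split('.')[0]
    return 0.0 if prefix == 'cat' else 1.0


labels = [label_from_filename(name) for name in ['cat.0.jpg', 'dog.12.jpg']]
print(labels)  # [0.0, 1.0]
```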


## Preprocessing
Preprocessing is not required for this problem, but you can try some extra preprocessing steps to reach a higher accuracy.

In [62]:
######################################
#TODO: (Optional)
# Perform any data preprocessing steps



######################################
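
One possible preprocessing step, shown here only as a sketch, is per-channel standardization using statistics computed on the training set. The function name `standardize`, the `eps` value, and the random demo arrays are assumptions for illustration; in the notebook you would pass `train_X` and `test_X` instead.

```python
import numpy as np


def standardize(train, test, eps=1e-7):
    """Shift and scale each color channel using training-set statistics only."""
    # Average over samples, height, and width; keep one value per channel.
    mean = train.mean(axis=(0, 1, 2), keepdims=True)
    std = train.std(axis=(0, 1, 2), keepdims=True)
    # The test set reuses the training statistics so no test information leaks in.
    return (train - mean) / (std + eps), (test - mean) / (std + eps)


# Random stand-ins with the same layout as train_X and test_X.
rng = np.random.default_rng(0)
demo_train = rng.random((8, 32, 32, 3)).astype('float32')
demo_test = rng.random((2, 32, 32, 3)).astype('float32')

demo_train_std, demo_test_std = standardize(demo_train, demo_test)
print(demo_train_std.shape, demo_test_std.shape)
```

Whether this helps on this dataset is something to check against the validation accuracy rather than assume.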

### Defining the network
Here are some useful resources to help with defining a powerful network.
- Convolution layers (use the 2D convolution) https://keras.io/layers/convolutional/
- Batch norm layer https://keras.io/layers/normalization/
- Layer initializers https://keras.io/initializers/
- Dense layer https://keras.io/layers/core/#dense
- Activation functions https://keras.io/layers/core/#activation
- Regularizers: 
    - https://keras.io/layers/core/#dropout
    - https://keras.io/regularizers/
    - https://keras.io/callbacks/#earlystopping
    - https://keras.io/constraints/
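
To make the batch norm layer listed above more concrete, here is a minimal NumPy sketch of its training-time forward pass on a 2D batch. It is not the Keras implementation: the function name `batch_norm_forward` is hypothetical, and moving averages for inference are left out.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then apply a learned scale and shift."""
    mean = x.mean(axis=0)  # per-feature mean over the batch
    var = x.var(axis=0)    # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
demo_x = rng.normal(loc=5.0, scale=2.0, size=(64, 4))
demo_out = batch_norm_forward(demo_x, gamma=np.ones(4), beta=np.zeros(4))
print(demo_out.mean(axis=0), demo_out.var(axis=0))
```

With `gamma` at one and `beta` at zero, each feature of the output has roughly zero mean and unit variance, which is the starting point the layer learns from.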

In [66]:
######################################
#TODO:
# Import necessary layers.
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Activation, MaxPooling2D, Dropout, Flatten, BatchNormalization
from keras import optimizers
from keras import losses

######################################
# Implementation of the network
model = Sequential()

# First convolution layer; input_shape matches the resized 32x32 RGB images.
model.add(Conv2D(filters=32, kernel_size=3, activation='relu',
                 kernel_initializer='glorot_normal', input_shape=(32, 32, 3)))

# Add batch normalization before every layer except the first.
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=4, activation='relu',
                 kernel_initializer='glorot_normal'))

model.add(BatchNormalization())
model.add(Conv2D(filters=64, kernel_size=3, activation='relu',
                 kernel_initializer='glorot_normal'))
model.add(MaxPooling2D())

model.add(BatchNormalization())
model.add(Conv2D(filters=64, kernel_size=3, activation='relu',
                 kernel_initializer='glorot_normal'))
model.add(MaxPooling2D())

model.add(BatchNormalization())
model.add(Conv2D(filters=128, kernel_size=2, activation='relu',
                 kernel_initializer='glorot_normal'))

model.add(BatchNormalization())
model.add(MaxPooling2D())

# Flatten the feature maps and classify with fully connected layers.
model.add(Flatten())

model.add(BatchNormalization())
model.add(Dense(units=512))  # no activation is set here, so this layer is linear
model.add(Dropout(.5))
# A single sigmoid unit outputs the probability of the "dog" class (label 1.0).
model.add(Dense(units=1))
model.add(Activation('sigmoid'))



######################################
# Define your optimizer and your loss
optimizer = 'rmsprop'
loss = 'binary_crossentropy'
######################################


model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
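
It can help to trace how the spatial size shrinks through the network defined above. The sketch below assumes the Keras defaults used in that cell (`'valid'` padding and stride 1 for `Conv2D`, a 2x2 window for `MaxPooling2D`); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of non-overlapping max pooling (leftover pixels are dropped)."""
    return size // pool


# Layer sequence of the model above: (kind, kernel or pool size).
layers = [('conv', 3), ('conv', 4), ('conv', 3), ('pool', 2),
          ('conv', 3), ('pool', 2), ('conv', 2), ('pool', 2)]

size = 32
sizes = [size]
for kind, k in layers:
    size = conv_out(size, k) if kind == 'conv' else pool_out(size, k)
    sizes.append(size)

print(sizes)  # [32, 30, 27, 25, 12, 10, 5, 4, 2]

# The last convolution has 128 filters, so Flatten yields 2 * 2 * 128 values.
flattened_units = size * size * 128
print(flattened_units)  # 512
```

If you add or remove layers, a trace like this shows quickly whether the feature map would shrink below 1x1 before the `Flatten` layer.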

## Train Time
Train the network. Watch the validation loss and accuracy as training progresses. Don't change any of the parameters here except for the batch size.

In [67]:
######################################
#TODO:
# Define the batch size
batch_size = 100
######################################


model.fit(train_X, train_Y, batch_size=batch_size, epochs=EPOCHS, validation_split=0.2, verbose=1, shuffle=True)

Train on 7200 samples, validate on 1800 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x17176f080>
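
The "Train on 7200 samples, validate on 1800 samples" line follows from `validation_split=0.2`: Keras holds out the last fraction of the provided samples, before any shuffling. The arithmetic can be sketched as follows; the helper name `split_for_validation` is hypothetical and this is not the Keras source.

```python
def split_for_validation(num_samples, validation_split):
    """Return (train_count, validation_count) for a held-out tail fraction."""
    split_at = int(num_samples * (1.0 - validation_split))
    return split_at, num_samples - split_at


train_count, val_count = split_for_validation(9000, 0.2)
print(train_count, val_count)  # 7200 1800

# The validation samples are the tail of the data, not a random draw.
indices = list(range(10))
split_at, _ = split_for_validation(len(indices), 0.2)
print(indices[:split_at], indices[split_at:])  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```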

## Test Time
Now it's time to actually test the network. 

Aim for a test accuracy above **65%**!

In [68]:
loss, acc = model.evaluate(test_X, test_Y, batch_size=batch_size, verbose=1)

print('')
print('Got %.2f%% accuracy' % (acc * 100.))


Got 67.40% accuracy
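
For a single sigmoid output, the reported accuracy amounts to thresholding the predicted probabilities at 0.5 and comparing them with the labels. Here is a minimal NumPy sketch on made-up numbers; the function name `binary_accuracy` and the demo arrays are assumptions for illustration, not output from the model above.

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype('float32')
    return float(np.mean(y_pred == np.asarray(y_true)))


demo_true = np.array([0.0, 1.0, 1.0, 0.0])  # 0.0 = cat, 1.0 = dog
demo_prob = np.array([0.2, 0.7, 0.4, 0.6])  # hypothetical sigmoid outputs
demo_acc = binary_accuracy(demo_true, demo_prob)
print('Got %.2f%% accuracy' % (demo_acc * 100.))  # Got 50.00% accuracy
```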
