# Dog vs. Cat with Neural Networks
#### By Phillip Kuznetsov 
_philkuz [at] ml.berkeley.edu_

In this notebook, you will classify whether an image contains a dog or a cat. I've given you some starter code, but at the end of the day you will be doing most of the work yourself. 

Because you're dealing with image data and potentially large CNNs, your best bet is to try this out on a GPU. Luckily for you, ML@B has written a [nice guide on how to set up an AWS server for Deep Learning](https://github.com/mlberkeley/resources-guides/tree/master/AWS). 

An alternative is to use Google Colab, which takes care of most GPU setup issues; the one remaining chore is getting the data onto the machine. [This example notebook](https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb&scrollTo=BaCkyg5CV5jF) has a few ways to do it. The easiest, in my mind, is to follow the "upload_files()" cell, which has you upload directly from your file system.

Note that you need to download the data from this Kaggle competition: https://www.kaggle.com/c/dogs-vs-cats

You'll have to sign in to (or sign up for) Kaggle, but fortunately it's free! 
Then unzip the file, rename the extracted folder to `data`, and move it to the same directory as this notebook. From there you can run the rest of the code! 

Best of luck! If you encounter any bugs, please open an issue in the [issue tracker](https://github.com/mlberkeley/intro-dl-workshop/issues)! 

In [None]:
%matplotlib inline

In [None]:
!pip install keras pillow h5py

In [5]:
from __future__ import print_function

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPool2D, Flatten
from keras.optimizers import SGD,Adam
import numpy as np
import matplotlib.pyplot as plt
import os
import shutil

Using TensorFlow backend.


Prepare the data directory for the Keras image generator.

In [16]:
!unzip -q train.zip && mv train data

In [17]:
data_dir = './data'
train_dir = os.path.join(data_dir, 'train')
val_dir = os.path.join(data_dir, 'val')
val_ratio = 0.2

### Split data by classes

In [18]:
classes = ['dog', 'cat']
# split by classes
for c in classes:
    class_dir = os.path.join(data_dir, c)
    if not os.path.exists(class_dir):
        os.mkdir(class_dir)
    for fname in os.listdir(data_dir):
        fpath = os.path.join(data_dir, fname)
        if not os.path.isdir(fpath) and c in fname:
            os.rename(fpath, os.path.join(class_dir, fname))
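
If you want to see what the cell above does without touching the real data, here is a minimal sketch that runs the same idea on a throwaway directory. The helper name `split_by_class` and the three fake filenames are made up for illustration; the sketch assumes files are named like `dog.0.jpg` and `cat.0.jpg`, and it matches on the filename prefix rather than on a substring.

```python
import os
import tempfile

def split_by_class(data_dir, classes):
    """Move files such as 'dog.0.jpg' into data_dir/dog, based on the filename prefix."""
    for c in classes:
        class_dir = os.path.join(data_dir, c)
        if not os.path.exists(class_dir):
            os.mkdir(class_dir)
        for fname in os.listdir(data_dir):
            fpath = os.path.join(data_dir, fname)
            # only move regular files whose name starts with "<class>."
            if os.path.isfile(fpath) and fname.startswith(c + '.'):
                os.rename(fpath, os.path.join(class_dir, fname))

# build a tiny fake dataset of empty files in a temporary directory
demo_dir = tempfile.mkdtemp()
for name in ['dog.0.jpg', 'dog.1.jpg', 'cat.0.jpg']:
    open(os.path.join(demo_dir, name), 'w').close()

split_by_class(demo_dir, ['dog', 'cat'])
dog_files = sorted(os.listdir(os.path.join(demo_dir, 'dog')))
cat_files = sorted(os.listdir(os.path.join(demo_dir, 'cat')))
print(dog_files, cat_files)
```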

        

### Make train/val splits of data
This is just data preparation that works nicely with Keras' data loader API. If you're not familiar with what's going on, that's OK.

In [19]:
from random import shuffle

In [20]:
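# create data/train/<class> and data/val/<class> if they are missing
for d in [train_dir, val_dir]:
    for c in classes:
        split_class_dir = os.path.join(d, c)
        if not os.path.exists(split_class_dir):
            os.makedirs(split_class_dir)  # also creates d itself when needed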
    
for c in classes:
    class_dir = os.path.join(data_dir, c)
    class_train_dir = os.path.join(train_dir,c)
    class_val_dir = os.path.join(val_dir,c)
    files = os.listdir(class_dir)
    shuffle(files)
    cutoff_idx = int(len(files)* ( 1-val_ratio))
    train_files = files[:cutoff_idx]
    val_files = files[cutoff_idx:]
    for f in train_files:
        fpath = os.path.join(class_dir, f)
        if not os.path.isdir(fpath):
            os.rename(fpath, os.path.join(class_train_dir, f))
    for f in val_files:
        fpath = os.path.join(class_dir, f)
        if not os.path.isdir(fpath):
            os.rename(fpath, os.path.join(class_val_dir, f))
    
    os.rmdir(class_dir)  # class_dir is empty now; rmdir removes only this directory
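
The split logic itself is small enough to try on a plain list of names. This is only a sketch: `train_val_split` and its `seed` argument are hypothetical helpers that are not part of the notebook, added so the shuffle is repeatable.

```python
import random

def train_val_split(files, val_ratio, seed=None):
    """Shuffle a copy of `files`, then cut it into (train, val) lists."""
    files = list(files)
    random.Random(seed).shuffle(files)
    # same cutoff rule as the cell above
    cutoff_idx = int(len(files) * (1 - val_ratio))
    return files[:cutoff_idx], files[cutoff_idx:]

names = ['cat.%d.jpg' % i for i in range(10)]
train_files, val_files = train_val_split(names, val_ratio=0.2, seed=0)
print(len(train_files), len(val_files))
```

Every file ends up in exactly one of the two lists, so nothing is lost or duplicated by the split.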


In [None]:
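# Sanity check (assumed intent): the per-class staging directories
# should be gone once the train/val split has run.
for c in classes:
    class_dir = os.path.join(data_dir, c)
    if os.path.exists(class_dir):
        print('Leftover directory:', class_dir)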

### Hyperparameters

In [None]:
batch_size = 32
learning_rate = 1e-4

In [None]:
from keras.preprocessing.image import ImageDataGenerator
# batch_size comes from the hyperparameters cell above
generator = ImageDataGenerator()
train_data = generator.flow_from_directory(train_dir,
                shuffle=True,
                batch_size=batch_size,
                target_size=(224,224))
val_data = generator.flow_from_directory(val_dir,
                shuffle=True,
                batch_size=batch_size,
                target_size=(224,224))
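
`flow_from_directory` expects one subdirectory per class (here `dog` and `cat`) inside each split directory and infers the labels from those folder names. Before handing the folders to Keras, a quick count can confirm the layout. The sketch below is an illustration only: `count_images` is a made-up helper, and it runs on a temporary directory rather than on `./data`.

```python
import os
import tempfile

def count_images(split_dir):
    """Return {class_name: number_of_files} for a layout of split_dir/<class>/<files>."""
    counts = {}
    for c in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, c)
        if os.path.isdir(class_dir):
            counts[c] = len(os.listdir(class_dir))
    return counts

# fake "train" directory with two dogs and one cat (empty placeholder files)
demo_train = tempfile.mkdtemp()
for c, n in [('dog', 2), ('cat', 1)]:
    os.mkdir(os.path.join(demo_train, c))
    for i in range(n):
        open(os.path.join(demo_train, c, '%s.%d.jpg' % (c, i)), 'w').close()

counts = count_images(demo_train)
print(counts)
```

On the real data you would call `count_images(train_dir)` and `count_images(val_dir)` and expect roughly an 80/20 split per class.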

In [None]:
from keras.applications import VGG16

In [None]:
model = Sequential()
### You can use the layers in VGG16 (a pretrained network) if you want.
### It'll probably train faster.
### This relies on the VGG16 import in the cell above.
model.add(VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)))
model.add(Flatten()) # you must use this layer

# TODO add layers of the model
model.add(Dense(2, activation='softmax'))


In [None]:
# TODO compile the model with an Adam optimizer (or whatever you choose) and the proper loss function
# Pass metrics=['accuracy'] so that the evaluation cell can report accuracy.
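
A hint on the loss: the last layer is a two-unit softmax, and `flow_from_directory` yields one-hot labels by default (`class_mode='categorical'`), so a categorical cross-entropy loss (`'categorical_crossentropy'` in Keras) is a natural fit. The pure-Python sketch below is not Keras code; it only illustrates how that loss behaves for a single example, with made-up prediction values.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    # clip predictions at eps so that log(0) never occurs
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

label = [1, 0]  # one-hot label for the first class
confident_right = categorical_crossentropy(label, [0.9, 0.1])
confident_wrong = categorical_crossentropy(label, [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss is small when the model puts high probability on the correct class and grows quickly when it is confidently wrong.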

In [None]:
model.summary()

In [None]:
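# this is how you train; each step is one batch, so steps = images found by the generator // batch_size
model.fit_generator(train_data, steps_per_epoch=train_data.samples // batch_size, epochs=25,
                    validation_data=val_data, validation_steps=val_data.samples // batch_size)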
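
Where do the step counts come from? One step is one batch, so a full pass over the data takes (number of images) / (batch size) steps. Here is a small sketch of that arithmetic, assuming the 25,000 Kaggle training images were split 80/20 into 20,000 training and 5,000 validation images. `steps_for` is a made-up helper that rounds up so the final partial batch is counted; flooring instead simply skips it.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once (rounds up)."""
    return int(math.ceil(num_samples / float(batch_size)))

train_steps = steps_for(20000, 32)  # assumed training set size
val_steps = steps_for(5000, 32)     # assumed validation set size
print(train_steps, val_steps)
```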

In [None]:
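# evaluate the trained model on the validation set
# (score[1] is accuracy only if the model was compiled with metrics=['accuracy'])
score = model.evaluate_generator(val_data, steps=val_data.samples // batch_size)
print('Validation loss:', score[0])
print('Validation accuracy:', score[1])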