## Introduction to our first task: 'Dogs vs Cats'

**To install the Keras library:**
* `pip install tensorflow-gpu keras`

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
PATH = "data/dogscats/"
sz=224
batch_size=64

In [3]:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras.applications import ResNet50
from keras.models import Model, Sequential
from keras import backend as K
from keras.applications.resnet50 import preprocess_input

Using TensorFlow backend.


In [4]:
train_data_dir = f'{PATH}train'
validation_data_dir = f'{PATH}valid'

### Instead of creating a data object, in Keras we create a data generator

##### 1. Define the data generator(s)
* Choose what data augmentation you want to do
* Choose what kind of normalization you want to do
* Read the images directly from a directory: create a generator, then have it generate images from that directory
* Tell it the image size and the mini-batch size you want
* Do the same for the `validation_generator`, but without shuffling: otherwise the predictions come back in a different order from the labels, so you can't track how well you are doing
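To make the "augmentation" and "normalization" bullets concrete, here is a NumPy-only sketch of what happens to a single training image. It assumes that `keras.applications.resnet50.preprocess_input` uses the "caffe" convention (convert RGB to BGR, then subtract the ImageNet per-channel mean, with no scaling), and that Keras runs `preprocessing_function` after augmentation. The helpers `caffe_preprocess` and `random_horizontal_flip` are hypothetical illustrations, not Keras functions, and the shear and zoom augmentations are left out.

```python
import numpy as np

# Assumption: mirrors the "caffe"-style preprocessing used by
# keras.applications.resnet50.preprocess_input (RGB -> BGR, subtract the
# ImageNet per-channel mean in BGR order, no division by a std).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(img_rgb):
    """img_rgb: float array of shape (height, width, 3) with values in [0, 255]."""
    img_bgr = img_rgb[..., ::-1]          # reverse the channel axis: RGB -> BGR
    return img_bgr - IMAGENET_MEAN_BGR    # zero-center each channel

def random_horizontal_flip(img, rng):
    """Mirror left-right with probability 0.5, like horizontal_flip=True does per image."""
    return img[:, ::-1, :] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3)).astype(np.float32)

# augmentation first, then normalization
out = caffe_preprocess(random_horizontal_flip(img, rng))
print(out.shape)  # (224, 224, 3)
```

Only the training generator gets the random flip; the validation generator (`test_datagen`) applies just the normalization, so validation images are always seen the same way.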

In [5]:
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(validation_data_dir,
    shuffle=False, ### don't shuffle dataset in validation/test set, if you do, you can't track how well you're doing
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')

Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
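The `shuffle=False` comment above can be illustrated without Keras. In Keras the directory iterator exposes the labels in file order as `validation_generator.classes`, while a model's predictions come back in whatever order the batches were yielded. This sketch uses a hypothetical `iterate_batches` helper (not a Keras function) that yields image indices batch by batch, to show that the two orders only line up when shuffling is off.

```python
import numpy as np

def iterate_batches(n_images, batch_size, shuffle, seed=0):
    """Hypothetical stand-in for a directory iterator: yields batches of image indices."""
    idx = np.arange(n_images)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, n_images, batch_size):
        yield idx[start:start + batch_size]

# labels in file order, e.g. 5 cats (0) followed by 5 dogs (1)
labels = np.array([0] * 5 + [1] * 5)

def labels_in_prediction_order(shuffle):
    # predictions are produced in the order the batches are yielded
    order = np.concatenate(list(iterate_batches(len(labels), 4, shuffle)))
    return labels[order]

print(np.array_equal(labels_in_prediction_order(shuffle=False), labels))  # True
print(labels_in_prediction_order(shuffle=True))  # same labels, different order
```

With shuffling on, comparing predictions against `labels` position by position would mix up cats and dogs, which is why only the training generator shuffles.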


**Note: use `class_mode='categorical'` for multi-class classification**
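The difference between the two modes is the shape of the targets: `'binary'` gives one 0/1 value per image, while `'categorical'` gives a one-hot row per image. The sketch below builds both from hypothetical integer class indices with NumPy. The pairing with output layers and losses in the comments reflects common Keras usage; the three-class example is invented for illustration.

```python
import numpy as np

# class_mode='binary': 1D array, one 0/1 label per image.
# Usually paired with Dense(1, activation='sigmoid') and loss='binary_crossentropy'.
binary_indices = np.array([0, 1, 1, 0])            # e.g. cats=0, dogs=1
binary_targets = binary_indices.astype(np.float32)

# class_mode='categorical': 2D one-hot array, one row per image.
# Usually paired with Dense(num_classes, activation='softmax')
# and loss='categorical_crossentropy'.
num_classes = 3                                    # hypothetical three-class problem
categorical_indices = np.array([0, 2, 1, 0])
categorical_targets = np.eye(num_classes, dtype=np.float32)[categorical_indices]

print(binary_targets.shape, categorical_targets.shape)  # (4,) (4, 3)
```

So switching to `'categorical'` also means changing the final layer and the loss, not just the generator argument.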

##### 2. Make the Keras model
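* ResNet50 is used because Keras doesn't have ResNet34. This is so we compare apples to apples.
* Make the base model (ImageNet weights, `include_top=False` to drop the original classifier) and add a new head on top.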
* Manually freeze/unfreeze the layers

In [6]:
base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
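The new head built in the cell above is just three operations: average over the spatial grid, a ReLU dense layer, and a sigmoid output. This NumPy sketch traces the shapes through it with random, untrained weights. It assumes that for a 224x224 input the ResNet50 base (with `include_top=False`, channels-last) outputs a 7x7 grid with 2048 channels, and that the training folders are named `cats/` and `dogs/`, so class 1 is "dogs" in Keras's alphabetical ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: ResNet50 base output for one 224x224 image is (7, 7, 2048)
features = rng.standard_normal((1, 7, 7, 2048)).astype(np.float32)

# GlobalAveragePooling2D: mean over the two spatial axes -> (batch, 2048)
pooled = features.mean(axis=(1, 2))

# Dense(1024, activation='relu') with small random (untrained) weights
W1 = rng.standard_normal((2048, 1024)).astype(np.float32) * 0.01
b1 = np.zeros(1024, dtype=np.float32)
hidden = np.maximum(pooled @ W1 + b1, 0.0)

# Dense(1, activation='sigmoid') -> probability of class 1 (assumed "dogs")
W2 = rng.standard_normal((1024, 1)).astype(np.float32) * 0.01
b2 = np.zeros(1, dtype=np.float32)
prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

print(pooled.shape, hidden.shape, prob.shape)  # (1, 2048) (1, 1024) (1, 1)
```

Global average pooling is what lets the head accept any input size: whatever the spatial grid, it is reduced to one 2048-vector per image.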

##### 3. Loop through and freeze the layers you want
* After setting `trainable`, you need to compile the model (changes to `trainable` only take effect when the model is compiled)
* Pass the optimizer, loss, and metrics

In [7]:
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers: layer.trainable = False ### freeze the base model layers
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
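With every base layer frozen, the only weights that should still train are those of the new head. A dense layer has `inputs * units` weights plus `units` biases, and `GlobalAveragePooling2D` has none, so the trainable parameter count can be worked out by hand. The sketch below assumes the 2048-channel ResNet50 output from the model cell; `dense_params` is a hypothetical helper, and `model.summary()` is the way to confirm the figure on the real model.

```python
def dense_params(n_in, n_units):
    """Weights (n_in * n_units) plus one bias per unit."""
    return n_in * n_units + n_units

head_1 = dense_params(2048, 1024)  # pooled ResNet50 features -> Dense(1024)
head_2 = dense_params(1024, 1)     # Dense(1024) -> Dense(1)

print(head_1, head_2, head_1 + head_2)  # 2098176 1025 2099201
```

That is roughly 2.1 million trainable parameters, a small fraction of ResNet50 itself, which is what makes this first training stage comparatively cheap.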

##### 4. Fit
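* Keras expects the number of steps (batches) per epoch, here the number of images // batch size
* The number of workers
* The batch size itself is not passed to `fit_generator`; it was already set in the generators and is only used to compute the steps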
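The step counts passed below are plain integer arithmetic on the image counts reported by the generators (23000 training and 2000 validation images) and `batch_size=64`. Floor division leaves out the final partial batch, so each pass covers slightly fewer images than the full set; `math.ceil` would count the partial batch as an extra step. Which of the two to use is a design choice, not something the notebook states.

```python
import math

n_train, n_valid, batch_size = 23000, 2000, 64  # from the generator output above

train_steps = n_train // batch_size                   # value passed to fit_generator
train_left_out = n_train - train_steps * batch_size   # images beyond the last full batch
print(train_steps, train_left_out, math.ceil(n_train / batch_size))  # 359 24 360

valid_steps = n_valid // batch_size
valid_left_out = n_valid - valid_steps * batch_size
print(valid_steps, valid_left_out, math.ceil(n_valid / batch_size))  # 31 16 32
```

So each epoch uses 359 × 64 = 22,976 training images and 31 × 64 = 1,984 validation images.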

In [8]:
%%time
model.fit_generator(train_generator, steps_per_epoch=train_generator.n // batch_size, epochs=3, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

Epoch 1/3
Epoch 2/3
Epoch 3/3
CPU times: user 28min 28s, sys: 41.2 s, total: 29min 9s
Wall time: 15min 39s


<keras.callbacks.History at 0x7fbf7438fbe0>

##### 5. Retrain some of the layers
* Loop through the layers and manually set `trainable` to `True` or `False`.

In [9]:
len(model.layers)

177

In [10]:
# freeze the first split_at layers and make the layers after them trainable
split_at = 140
for layer in model.layers[:split_at]: layer.trainable = False
for layer in model.layers[split_at:]: layer.trainable = True
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])  # recompile so the new trainable flags take effect
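The split in the cell above is ordinary list slicing over `model.layers`. This sketch replays it on hypothetical stand-in objects (the `Layer` class below is not a Keras class; only its `trainable` flag matters here) with the 177 layers reported by `len(model.layers)`, to show how many layers end up frozen and how many are fine-tuned.

```python
class Layer:
    """Hypothetical stand-in for a Keras layer; only the trainable flag is modeled."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

layers = [Layer(f"layer_{i}") for i in range(177)]  # len(model.layers) from above
split_at = 140

for layer in layers[:split_at]:
    layer.trainable = False  # earlier layers: keep the pretrained ImageNet features fixed
for layer in layers[split_at:]:
    layer.trainable = True   # later layers plus the new head: fine-tune

frozen = sum(not layer.trainable for layer in layers)
print(frozen, len(layers) - frozen)  # 140 37
```

On the real model, these flags only matter once `model.compile` is called again, which is why the cell recompiles before the next `fit_generator` call.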

In [11]:
%%time
model.fit_generator(train_generator, steps_per_epoch=train_generator.n // batch_size, epochs=1, workers=3,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

Epoch 1/1
CPU times: user 12min 41s, sys: 37.2 s, total: 13min 19s
Wall time: 9min 59s


<keras.callbacks.History at 0x7fbf1b8557f0>