[View in Colaboratory](https://colab.research.google.com/github/ilmimris/Google_Colaboratory_using_TPU_dogs_vs_cats/blob/master/Google_Colaboratory_using_TPU_dogs_vs_cats.ipynb)

# Import Library

In [1]:
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf


In [0]:
import os, cv2, re, random
import numpy as np
from random import shuffle 
from tqdm import tqdm

In [0]:
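# Import from tensorflow.keras so that the same Keras implementation is used as for the model
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras import optimizers
from tensorflow.keras import backend as K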
from sklearn.model_selection import train_test_split

# Download data from `Kaggle`

In [0]:
os.chdir('/content')

In [5]:
!mkdir -p .kaggle dogs-vs-cats


In [0]:
import json
import zipfile

# Replace the placeholders with your own Kaggle credentials; never publish a real API key.
api_token = {"username": "YOUR_KAGGLE_USERNAME", "key": "YOUR_KAGGLE_API_KEY"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)

In [0]:
!mkdir -p ~/.kaggle
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json
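
To keep credentials out of the notebook text, the token file can be written by a small helper that receives the values from somewhere else, for example from environment variables. The following is a minimal sketch and not part of the original notebook: `write_kaggle_token` and its arguments are hypothetical names, and the demo writes into a temporary directory instead of the real `~/.kaggle`.

```python
import json
import os
import stat
import tempfile

def write_kaggle_token(username, key, directory):
    """Write kaggle.json into `directory`, readable and writable by the owner only."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, 'kaggle.json')
    with open(path, 'w') as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # same effect as chmod 600
    return path

# Demo with placeholder values in a throwaway directory (assumption: in Colab
# you would pass os.path.expanduser('~/.kaggle') and your real credentials).
demo_dir = tempfile.mkdtemp()
token_path = write_kaggle_token("YOUR_KAGGLE_USERNAME", "YOUR_KAGGLE_API_KEY", demo_dir)
with open(token_path) as f:
    saved = json.load(f)
```
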

In [0]:
!pip install kaggle
os.chdir('/content/dogs-vs-cats')
!kaggle competitions download -c dogs-vs-cats



# Extract data from zip

In [0]:
!rm -rf sampleSubmission.csv train test1

In [0]:
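# Extract every zip archive in the current directory; skip anything that is not a zip file
for file in os.listdir():
    if zipfile.is_zipfile(file):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()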

# Load data

In [0]:
img_width = 150
img_height = 150
TRAIN_DIR = './train/'
TEST_DIR = './test1/'
train_images_dogs_cats = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)] # use this for full dataset
test_images_dogs_cats = [TEST_DIR+i for i in os.listdir(TEST_DIR)]

Helper function to sort the image files based on the numeric value in each file name.

In [0]:
def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Raw string avoids the invalid-escape warning for \d
    return [atoi(c) for c in re.split(r'(\d+)', text)]
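
To see why a natural sort key is needed, compare it with plain string sorting on a few file names. This is an illustrative sketch: the four paths are made-up samples that follow the `cat.N.jpg` / `dog.N.jpg` naming pattern, not a listing of the real directory.

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split into digit and non-digit runs; digit runs compare as integers
    return [atoi(c) for c in re.split(r'(\d+)', text)]

names = ['./train/dog.10.jpg', './train/cat.2.jpg',
         './train/dog.2.jpg', './train/cat.10.jpg']

plain_sorted = sorted(names)                      # character by character: '10' sorts before '2'
natural_sorted = sorted(names, key=natural_keys)  # numeric: 2 sorts before 10

print(plain_sorted)
print(natural_sorted)
```

With the natural key, `cat.2.jpg` comes before `cat.10.jpg`, whereas plain string sorting puts `cat.10.jpg` first.
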

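Sort the training set, then keep 1300 images each of cats and dogs instead of all 25000 to speed up training. After sorting, the 12500 cat images come first and the 12500 dog images follow, which is why the two slices start at 0 and at 12500.

The test set is sorted as well.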

In [0]:
train_images_dogs_cats.sort(key=natural_keys)
train_images_dogs_cats = train_images_dogs_cats[0:1300] + train_images_dogs_cats[12500:13800] 

test_images_dogs_cats.sort(key=natural_keys)
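
The slice indices can be checked without the real files. The sketch below builds a synthetic list of 25000 names in the same `cat.N.jpg` / `dog.N.jpg` pattern (an assumption standing in for `os.listdir(TRAIN_DIR)`), sorts it with a natural key, and counts what the two slices select.

```python
import os
import re

def natural_keys(text):
    # Digit runs compare as integers, everything else as strings
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

# Synthetic stand-in for the training directory: 12500 dogs and 12500 cats
names = ['./train/%s.%d.jpg' % (kind, i) for kind in ('dog', 'cat') for i in range(12500)]
names.sort(key=natural_keys)

subset = names[0:1300] + names[12500:13800]
n_cats = sum(os.path.basename(p).startswith('cat') for p in subset)
n_dogs = sum(os.path.basename(p).startswith('dog') for p in subset)
print(n_cats, n_dogs)
```

Both counts come out at 1300, so the subset is balanced between the two classes.
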

Next, the images have to be converted to numeric arrays: each file is read with the OpenCV library and resized to `img_width` x `img_height`.

Labels for supervised learning are derived from the file names (1 for dog, 0 for cat).

The helper function below does both.

In [0]:
def prepare_data(list_of_images):
    """
    Returns two lists:
        x is a list of resized images (BGR arrays, as read by OpenCV)
        y is a list of labels (1 = dog, 0 = cat); files named neither way get no label
    """
    x = [] # images as arrays
    y = [] # labels

    for image in list_of_images:
        # cv2.resize takes the target size as (width, height)
        x.append(cv2.resize(cv2.imread(image), (img_width, img_height), interpolation=cv2.INTER_CUBIC))

    for i in list_of_images:
        # Check only the file name so that directory names cannot affect the label
        name = os.path.basename(i)
        if 'dog' in name:
            y.append(1)
        elif 'cat' in name:
            y.append(0)

    return x, y
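
The labeling rule can be looked at in isolation. In this sketch `label_from_path` is a hypothetical helper, not part of the notebook, and the three paths are illustrative; the last one assumes a numerically named file such as those in the test set.

```python
import os

def label_from_path(path):
    """Return 1 for dog, 0 for cat, None if the file name contains neither."""
    name = os.path.basename(path)
    if 'dog' in name:
        return 1
    if 'cat' in name:
        return 0
    return None

paths = ['./train/cat.0.jpg', './train/dog.0.jpg', './test1/1.jpg']
labels = [label_from_path(p) for p in paths]
print(labels)  # [0, 1, None]
```

A file without `dog` or `cat` in its name yields no label, which is why `prepare_data` returns a `y` shorter than `x` for unlabeled files.
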

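Generate X and Y using the helper function above.

Since `K.image_data_format()` is `channels_last`, the `input_shape` of the first Keras layer is `(img_width, img_height, 3)`; the 3 stands for the three color channels. Strictly speaking, `channels_last` means (height, width, channels), but width and height are both 150 here, so the order makes no difference.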

In [15]:
X, Y = prepare_data(train_images_dogs_cats)
print(K.image_data_format())

channels_last


Split the 2600 images into a training set (80%) and a validation set (20%). While the model is fit on the training set, accuracy and loss on the validation set are reported as well.

In [0]:
# Split the data into two sets: 80% for training, 20% for validation
X_train, X_val, Y_train, Y_val = train_test_split(X,Y, test_size=0.2, random_state=1)

In [0]:
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
batch_size = 16
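
The sample counts and the number of steps per epoch used later for training follow from simple arithmetic. A minimal sketch, assuming the 2600-image subset and the 80/20 split described above:

```python
total_images = 2600          # 1300 cats + 1300 dogs
batch_size = 16

nb_validation_samples = total_images * 20 // 100        # 20% held out
nb_train_samples = total_images - nb_validation_samples

steps_per_epoch = nb_train_samples // batch_size
validation_steps = nb_validation_samples // batch_size
leftover_validation = nb_validation_samples % batch_size

print(nb_train_samples, nb_validation_samples)   # 2080 520
print(steps_per_epoch, validation_steps)         # 130 32
print(leftover_validation)                       # 8
```

The training set divides evenly into 130 batches of 16. The validation set does not: 32 full batches account for 512 images, and the floor division leaves a remainder of 8.
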

# Create Model

In [18]:
model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), input_shape=(img_width, img_height, 3)))
model.add(layers.Activation('relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(32, (3, 3)))
model.add(layers.Activation('relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(64, (3, 3)))
model.add(layers.Activation('relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Flatten())
model.add(layers.Dense(64))
model.add(layers.Activation('relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1))
model.add(layers.Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
activation (Activation)      (None, 148, 148, 32)      0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 32)        9248      
_________________________________________________________________
activation_1 (Activation)    (None, 72, 72, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 64)        18496     
__________
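
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for a stride-1 convolution without padding and for a fully connected layer; the helper names are illustrative. The summary above is cut off after `conv2d_2`, so the figures for the later layers are computed here from the model definition rather than read from the output.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input plus one bias, for every unit
    return (inputs + 1) * units

size, channels = 150, 3
total = 0
rows = []
for filters in (32, 32, 64):
    params = conv_params(3, channels, filters)
    size = conv_out(size)
    rows.append(('conv', size, filters, params))
    size //= 2  # 2x2 max pooling halves each spatial dimension
    rows.append(('pool', size, filters, 0))
    channels = filters
    total += params

flat = size * size * channels  # output length of Flatten
total += dense_params(flat, 64) + dense_params(64, 1)

for row in rows:
    print(row)
print(flat, total)
```

The three convolution rows give 896, 9248 and 18496 parameters at spatial sizes 148, 72 and 34, matching the lines shown in the summary.
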

## Generate Augmentation Data

In [0]:
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation images are only rescaled: random augmentation would add noise to the validation metrics
val_datagen = ImageDataGenerator(rescale=1. / 255)

In [0]:
train_generator = train_datagen.flow(np.array(X_train), Y_train, batch_size=batch_size)
validation_generator = val_datagen.flow(np.array(X_val), Y_val, batch_size=batch_size)

## Convert to TPU Model

In [21]:
# Requires TensorFlow 1.x: the tf.contrib module was removed in TensorFlow 2.x
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu="grpc://" + os.environ['COLAB_TPU_ADDR'])))

INFO:tensorflow:Querying Tensorflow master (b'grpc://10.112.192.154:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 3340581412962874290)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 3148344655408965577)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 12528470306301961677)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 13923652748522337962)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 16460352434833523640)
INFO:tensorflow:*** Available Device: _Dev

# Training Model

In [22]:
history = tpu_model.fit_generator(
    train_generator, 
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size
)

Epoch 1/30
INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(2, 150, 150, 3), dtype=tf.float32, name='conv2d_input0'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='activation_4_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_input
INFO:tensorflow:Cloning RMSprop {'lr': 0.0010000000474974513, 'rho': 0.8999999761581421, 'decay': 0.0, 'epsilon': 1e-07}
INFO:tensorflow:Get updates: Tensor("loss/mul:0", shape=(), dtype=float32)
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 5.363210916519165 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_input
INFO:tensorflow:Cloning RMSprop {'lr': 0.0010000000474974513, 'rho': 0.8999999761581421, 'decay': 0.0, 'epsilon': 1e-07}
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 4.3816156387329
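
The `shape=(2, 150, 150, 3)` in the log above is consistent with the global batch of 16 being divided evenly among the 8 TPU cores reported earlier. A minimal sketch of that arithmetic follows; the function name is illustrative, and the even split is an assumption inferred from the log, not taken from the TensorFlow source.

```python
def per_core_batch(global_batch, num_cores):
    """Number of samples each core receives when a batch is split evenly."""
    if global_batch % num_cores != 0:
        raise ValueError("global batch size must be divisible by the number of cores")
    return global_batch // num_cores

# batch_size = 16 from the notebook, 8 cores from the TPU metadata log
shape = (per_core_batch(16, 8), 150, 150, 3)
print(shape)  # (2, 150, 150, 3)
```

Under this assumption, a batch size that is a multiple of 8 keeps every core equally loaded.
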

In [23]:
tpu_model.save_weights('model_weights.h5')
tpu_model.save('model_keras.h5')

INFO:tensorflow:Copying TPU weights to the CPU
INFO:tensorflow:Copying TPU weights to the CPU


# Acknowledgment 
Thanks to [sarvajna](https://www.kaggle.com/sarvajna/dogs-vs-cats-keras-solution)