<H1>Importing the Library and Installing WGET

Wget is a free utility for retrieving files from the Internet over widely used protocols such as HTTP and HTTPS. The Python wget package used here is not pre-installed, so the following cell installs it with pip for the interpreter that runs this notebook. The library will be used to download the dataset of 25,000 images from the website.

In [None]:
import sys
!{sys.executable} -m pip install wget

<H2>Using WGET to Download the Dataset

Set base_folder_path in the cell below to the base folder you want to use. The cell creates a Dataset folder inside it and then downloads the complete dataset from the website into that folder. This is the dataset the model will be trained on. The train/test split is decided in a later section.

In [None]:
import os
import wget

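def bar_custom(current, total, width=80):
    # Progress callback for wget.download: prints the percentage and byte counts
    print("Downloading: %d%% [%d / %d] bytes" % (current / total * 100, current, total))

base_folder_path = r'Insert Base Folder Path Here' #Insert Base Folder Path Here
# Create the Dataset folder inside the base folder, not in the current working directory
dataset = os.path.join(base_folder_path, 'Dataset')
os.makedirs(dataset, exist_ok=True)

url = "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip"
# os.path.join avoids the hard-coded backslash, which only works on Windows
wget.download(url, bar=bar_custom, out=os.path.join(dataset, 'Cats_Vs_Dogs.zip'))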

<H2>Extracting the Dataset

The dataset is downloaded from the website as a .zip file. This section of the notebook extracts the archive into the Dataset folder. To save storage, the zip file is deleted after extraction.

In [None]:
import zipfile

local_zip = os.path.join(dataset, 'Cats_Vs_Dogs.zip')
# The context manager closes the archive even if extraction fails
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall(dataset)

In [None]:
# Delete the archive now that its contents have been extracted
zip_path = os.path.join(dataset, 'Cats_Vs_Dogs.zip')
if os.path.exists(zip_path):
    os.remove(zip_path)
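
The extract-then-delete steps above can also be wrapped in a single helper. The following is a minimal sketch rather than the notebook's own code: the name extract_and_delete is hypothetical, and the zipfile.is_zipfile check is an added safeguard against an incomplete download. It is demonstrated on a tiny throwaway archive instead of the real dataset.

```python
import os
import tempfile
import zipfile


def extract_and_delete(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then delete the archive.

    Hypothetical helper; returns the list of extracted member names.
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError('{} is not a valid zip file'.format(zip_path))
    with zipfile.ZipFile(zip_path, 'r') as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    # Free the storage only after the extraction has succeeded
    os.remove(zip_path)
    return names


# Build a small demo archive in a temporary folder
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as archive:
    archive.writestr('PetImages/Cat/0.jpg', b'not a real image')

extracted = extract_and_delete(demo_zip, demo_dir)
print(extracted)
```

Deleting the archive inside the helper, after the with block, means a failed extraction leaves the zip file in place for another attempt.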

In [None]:
print(len(os.listdir('{}/PetImages/Cat/'.format(dataset))))
print(len(os.listdir('{}/PetImages/Dog/'.format(dataset))))

<H1>Image Data Generator

ImageDataGenerator is a class in the TensorFlow Keras preprocessing module. It generates batches of tensor image data with real-time data augmentation. When it reads images from disk, it requires the dataset to be labelled by sorting the images into one sub-folder per class. Two primary folders are required within the Dataset folder, namely Training and Testing/Validation, each containing the class sub-folders. ImageDataGenerator is particularly useful for multi-class classification problems, because the class labels are inferred from the sub-folder names. The files must therefore be sorted in this way before the model is trained.

<H2>Creating Sub-Directories

This cell creates the sub-directories that the ImageDataGenerator will read the dataset from.

In [None]:
folders_to_create = [
    '{}/Cats_Vs_Dogs'.format(dataset),
    '{}/Cats_Vs_Dogs/training'.format(dataset),
    '{}/Cats_Vs_Dogs/testing'.format(dataset),
    '{}/Cats_Vs_Dogs/training/Cats'.format(dataset),
    '{}/Cats_Vs_Dogs/training/Dogs'.format(dataset),
    '{}/Cats_Vs_Dogs/testing/Cats'.format(dataset),
    '{}/Cats_Vs_Dogs/testing/Dogs'.format(dataset)
]

for directory in folders_to_create:
    try:
        os.mkdir(directory)
        print(directory, 'Created')
    except OSError as error:
        # For example, the folder already exists or its parent is missing
        print(directory, 'Failed:', error)
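
As an alternative to the try/except loop, os.makedirs with exist_ok=True creates any missing parent folders and does not fail when a folder already exists, so the cell can be re-run safely. A minimal sketch, using a temporary folder as a stand-in for the Dataset folder (demo_root and leaf_folders are hypothetical names):

```python
import os
import tempfile

demo_root = tempfile.mkdtemp()  # stand-in for the Dataset folder

# Only the four leaf folders need to be listed
leaf_folders = [
    os.path.join(demo_root, 'Cats_Vs_Dogs', split, label)
    for split in ('training', 'testing')
    for label in ('Cats', 'Dogs')
]

for directory in leaf_folders:
    # Creates Cats_Vs_Dogs and the split folder on the way to each leaf
    os.makedirs(directory, exist_ok=True)

# Running the loop a second time is harmless
for directory in leaf_folders:
    os.makedirs(directory, exist_ok=True)

created = sorted(os.listdir(os.path.join(demo_root, 'Cats_Vs_Dogs')))
print(created)
```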

<H2>Sorting the Files to Respective Sub-Directories

In [None]:
import random
from shutil import copyfile

def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    """Copy a random SPLIT_SIZE fraction of SOURCE into TRAINING and the rest into TESTING."""
    all_files = []

    for file_name in os.listdir(SOURCE):
        file_path = os.path.join(SOURCE, file_name)

        # Keep only non-empty files
        if os.path.getsize(file_path):
            all_files.append(file_name)
        else:
            print('{} is being Discarded due to Zero File Size'.format(file_name))

    n_files = len(all_files)
    split_point = int(n_files * SPLIT_SIZE)

    # Sampling the full length returns a shuffled copy of the list
    shuffled = random.sample(all_files, n_files)

    train_set = shuffled[:split_point]
    test_set = shuffled[split_point:]

    for file_name in train_set:
        copyfile(os.path.join(SOURCE, file_name), os.path.join(TRAINING, file_name))

    for file_name in test_set:
        copyfile(os.path.join(SOURCE, file_name), os.path.join(TESTING, file_name))


CAT_SOURCE_DIR = '{}/PetImages/Cat/'.format(dataset)
TRAINING_CATS_DIR = '{}/Cats_Vs_Dogs/training/Cats/'.format(dataset)
TESTING_CATS_DIR = '{}/Cats_Vs_Dogs/testing/Cats/'.format(dataset)
DOG_SOURCE_DIR = '{}/PetImages/Dog/'.format(dataset)
TRAINING_DOGS_DIR = '{}/Cats_Vs_Dogs/training/Dogs/'.format(dataset)
TESTING_DOGS_DIR = '{}/Cats_Vs_Dogs/testing/Dogs/'.format(dataset)

split_size = .9
split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)
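
To see what the split produces, the shuffling and slicing logic in split_data can be tried on plain file names without copying anything. This is a sketch under assumptions: split_names is a hypothetical helper, the file names are made up, and the count of 12,500 per class assumes the 25,000 images are divided evenly between cats and dogs with no files discarded.

```python
import random


def split_names(names, split_size, seed=None):
    """Shuffle names and cut them into (training, testing) lists.

    Hypothetical helper that mirrors the slicing in split_data.
    """
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    shuffled = rng.sample(names, len(names))
    split_point = int(len(names) * split_size)
    return shuffled[:split_point], shuffled[split_point:]


# Made-up file names standing in for one class folder
names = ['{}.jpg'.format(i) for i in range(12500)]
train_names, test_names = split_names(names, 0.9, seed=0)

print(len(train_names), len(test_names))
```

Under these assumptions a split size of 0.9 gives 11,250 training and 1,250 testing names per class, and no name appears in both lists.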

In [None]:
# Point at the folders created and filled in the cells above
base_dir = os.path.join(dataset, 'Cats_Vs_Dogs')

train_dir = os.path.join(base_dir, 'training')
validation_dir = os.path.join(base_dir, 'testing')

# Directories with our training cat/dog pictures
train_cats_dir = os.path.join(train_dir, 'Cats')
train_dogs_dir = os.path.join(train_dir, 'Dogs')

# Directories with our validation cat/dog pictures
validation_cats_dir = os.path.join(validation_dir, 'Cats')
validation_dogs_dir = os.path.join(validation_dir, 'Dogs')

In [None]:
train_cat_fnames = os.listdir(train_cats_dir)
train_dog_fnames = os.listdir(train_dogs_dir)

print(train_cat_fnames[:10])
print(train_dog_fnames[:10])

In [None]:
print('Total Training Cat Images :', len(os.listdir(train_cats_dir)))
print('Total Training Dog Images :', len(os.listdir(train_dogs_dir)))

print('Total Validation Cat Images :', len(os.listdir(validation_cats_dir)))
print('Total Validation Dog Images :', len(os.listdir(validation_dogs_dir)))

<H2>Defining the Layers of the Convolutional Neural Network

In [None]:
import tensorflow as tf

In [None]:
model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image, 256x256, with 3 color channels
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2), 
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(256, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(512, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(), 
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'), 
    # Only 1 output neuron. It will contain a value from 0 to 1, where 0 stands for one class ('cats') and 1 for the other ('dogs')
    tf.keras.layers.Dense(1, activation='sigmoid')  
])

In [None]:
model.summary()
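
The spatial sizes that model.summary() reports can be worked out by hand. With the Keras defaults (padding='valid' and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D), each 3x3 convolution removes 2 pixels from the width and height, and each 2x2 pooling halves them, rounding down. A minimal sketch of that arithmetic in plain Python, with no TensorFlow required (conv_out and pool_out are hypothetical helper names):

```python
def conv_out(size, kernel=3):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool


size = 256                             # input images are 256x256
filters = [16, 32, 64, 128, 256, 512]  # filters of the six Conv2D layers
sizes = []

for n_filters in filters:
    size = pool_out(conv_out(size))    # one Conv2D followed by one MaxPooling2D
    sizes.append(size)
    print('after block with {} filters: {}x{}x{}'.format(n_filters, size, size, n_filters))

flattened = size * size * filters[-1]  # length of the Flatten output
dense_params = flattened * 512 + 512   # weights plus biases of Dense(512)
print('flattened:', flattened, 'Dense(512) parameters:', dense_params)
```

The feature map shrinks from 256x256 to 2x2 over the six blocks, so Flatten passes 2,048 values to the 512-neuron layer.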

In [None]:
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics = ['accuracy'])
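
For a single image with true label y (0 or 1) and predicted probability p, binary cross-entropy is -(y*log(p) + (1-y)*log(1-p)). The sketch below evaluates this formula in plain Python to show how strongly a confident wrong prediction is penalized. It is illustrative only: Keras additionally clips p away from exactly 0 and 1 and averages the loss over the batch.

```python
import math


def binary_cross_entropy(y_true, p):
    """Loss for one example with label y_true (0 or 1) and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# The true label is 1 ('dogs') in all three cases
confident_right = binary_cross_entropy(1, 0.9)
unsure = binary_cross_entropy(1, 0.5)
confident_wrong = binary_cross_entropy(1, 0.1)

print(round(confident_right, 4), round(unsure, 4), round(confident_wrong, 4))
```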

<H2>Setting the ImageDataGenerator Parameters

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255. Training images are also randomly augmented.
train_datagen = ImageDataGenerator(rescale=1 / 255,
    rotation_range=40,
    width_shift_range=.2,
    height_shift_range=.2,
    shear_range=.2,
    zoom_range=.2,
    horizontal_flip=True,
    fill_mode='nearest')
# Validation images are only rescaled, so the model is evaluated on unmodified images
test_datagen  = ImageDataGenerator(rescale=1 / 255)

# --------------------
# Flow training images in batches of 20 using train_datagen generator
# --------------------
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(256, 256))     
# --------------------
# Flow validation images in batches of 20 using test_datagen generator
# --------------------
validation_generator =  test_datagen.flow_from_directory(validation_dir,
                                                         batch_size=20,
                                                         class_mode = 'binary',
                                                         target_size = (256, 256))


In [None]:
history = model.fit(train_generator,
                    validation_data = validation_generator,
                    steps_per_epoch = 100,
                    epochs = 150,
                    validation_steps = 50,
                    verbose = 2)
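
steps_per_epoch and validation_steps set how many batches are drawn in each epoch. With batches of 20, 100 steps show the model 2,000 training images per epoch and 50 steps evaluate 1,000 validation images. The sketch below compares that with the number of steps needed to cover every image once. The image counts are assumptions based on a 90/10 split of 25,000 images and will be slightly lower if any files were discarded.

```python
import math

batch_size = 20
n_train = 22500  # assumed: 90% of 25,000 images
n_val = 2500     # assumed: the remaining 10%

# Images seen per epoch with the settings used in model.fit above
images_per_epoch = 100 * batch_size
val_images_per_epoch = 50 * batch_size

# Steps needed for one full pass over each set
full_train_steps = math.ceil(n_train / batch_size)
full_val_steps = math.ceil(n_val / batch_size)

print(images_per_epoch, val_images_per_epoch)
print(full_train_steps, full_val_steps)
```

Under these assumptions each epoch covers a subset of the data, and a full pass would need 1,125 training steps and 125 validation steps.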

In [None]:
import matplotlib.pyplot as plt

In [None]:
#-----------------------------------------------------------
# Retrieve the lists of results on the training and
# validation sets for each training epoch
#-----------------------------------------------------------
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc)) # Get number of epochs

#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot(epochs, acc, label = 'Training Accuracy', color = 'Red')
plt.plot(epochs, val_acc, label = 'Validation Accuracy', color = 'Blue')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.figure()

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, label = 'Training Loss', color = 'Red')
plt.plot(epochs, val_loss, label = 'Validation Loss', color = 'Blue')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
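
The curves can also be summarized numerically, for example by finding the epoch with the highest validation accuracy. A minimal sketch: the values in val_acc_demo are made up for illustration and stand in for history.history['val_accuracy'].

```python
# Made-up validation accuracies, one per epoch
val_acc_demo = [0.61, 0.68, 0.74, 0.73, 0.79, 0.78]

# Index of the epoch with the highest validation accuracy (0-based)
best_epoch = max(range(len(val_acc_demo)), key=val_acc_demo.__getitem__)

print('Best epoch (0-based):', best_epoch)
print('Validation accuracy:', val_acc_demo[best_epoch])
```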