<a href="https://colab.research.google.com/github/shuchimishra/Tensorflow_projects/blob/main/Tensorflow_Code/CNN/horses_v_humans_w_Imageaugmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ungraded Lab: Data Augmentation on the Horses or Humans Dataset

In the previous lab, you saw how data augmentation helped improve the model's performance on unseen data. By tweaking the cat and dog training images, the model was able to learn features that are also representative of the validation data. However, applying data augmentation requires a good understanding of your dataset. Transforming the images at random will not always yield good results.
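
To make "tweaking" the images concrete, the toy sketch below applies two common augmentations, a horizontal flip and a horizontal shift with nearest-pixel filling, to a tiny image stored as a nested list. The function names are hypothetical and the code only illustrates the idea; it is not how `ImageDataGenerator` is implemented.

```python
def horizontal_flip(image):
    """Mirror a 2-D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def shift_right_nearest(image, pixels):
    """Shift every row right by `pixels`, filling the gap with the nearest edge pixel."""
    if pixels <= 0:
        return [list(row) for row in image]
    shifted = []
    for row in image:
        k = min(pixels, len(row))  # never shift further than the row width
        shifted.append([row[0]] * k + list(row[:len(row) - k]))
    return shifted


# A made-up 2x4 "image" used only for illustration
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
]

print(horizontal_flip(image))         # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(shift_right_nearest(image, 1))  # [[0, 0, 1, 2], [4, 4, 5, 6]]
```

Both outputs are still plausible views of the same object, which is why such transforms can help. They cannot, however, invent poses or backgrounds that the training set never contained.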

In the next cells, you will apply the same techniques to the `Horses or Humans` dataset and analyze the results.

In [None]:
# Download the training set
!wget https://storage.googleapis.com/tensorflow-1-public/course2/week3/horse-or-human.zip

In [None]:
# Download the validation set
!wget https://storage.googleapis.com/tensorflow-1-public/course2/week3/validation-horse-or-human.zip

In [None]:
import os
import zipfile

# Extract the training data; the context manager closes the archive automatically
zip_file = 'horse-or-human.zip'
with zipfile.ZipFile(zip_file, 'r') as local_handler:
    local_handler.extractall('./horse-or-human')

In [None]:
# Extract the validation data; the context manager closes the archive automatically
zip_file = 'validation-horse-or-human.zip'
with zipfile.ZipFile(zip_file, 'r') as local_handler:
    local_handler.extractall('./validation-horse-or-human')

In [None]:
# Directory with training horse pictures
train_horse_dir = os.path.join('horse-or-human','horses')

# Directory with training human pictures
train_human_dir = os.path.join('horse-or-human','humans')

# Directory with validation horse pictures
validation_horse_dir = os.path.join('validation-horse-or-human','horses')

# Directory with validation human pictures
validation_human_dir = os.path.join('validation-horse-or-human','humans')

In [None]:
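# Count the images in each directory
print('Training horse images:', len(os.listdir(train_horse_dir)))
print('Training human images:', len(os.listdir(train_human_dir)))
print('Validation horse images:', len(os.listdir(validation_horse_dir)))
print('Validation human images:', len(os.listdir(validation_human_dir)))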

In [None]:
import tensorflow as tf
from tensorflow import keras

#Function to build the model
def create_model():
  model = keras.models.Sequential([
      # Note the input shape is the desired size of the image: 300x300 with 3 color channels
      # This is the first convolution
      keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(300,300,3)),
      keras.layers.MaxPooling2D((2,2)),
      # The second convolution
      keras.layers.Conv2D(64, (3,3), activation='relu'),
      keras.layers.MaxPooling2D((2,2)),
      # The third convolution
      keras.layers.Conv2D(128, (3,3), activation='relu'),
      keras.layers.MaxPooling2D((2,2)),
      # The fourth convolution
      keras.layers.Conv2D(128, (3,3), activation='relu'),
      keras.layers.MaxPooling2D((2,2)),
      # The fifth convolution
      keras.layers.Conv2D(128, (3,3), activation='relu'),
      keras.layers.MaxPooling2D((2,2)),
      # Dropout layer
      keras.layers.Dropout(0.3),
      # Flatten the results to feed into a DNN
      keras.layers.Flatten(),
      # 512 neuron hidden layer
      keras.layers.Dense(512, activation='relu'),
      # Only 1 output neuron. It will contain a value from 0-1, where 0 is for one class ('horses') and 1 is for the other ('humans')
      keras.layers.Dense(1, activation='sigmoid')
  ])

  # Set training parameters
  model.compile(optimizer='adam', #works better with image augmentation
                loss='binary_crossentropy',
                metrics=['accuracy'])  # metrics is passed as a list

  return model
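
The single sigmoid output can be turned into a class name by thresholding it. The sketch below assumes a 0.5 threshold and the class order that `flow_from_directory` assigns by sorting the subdirectory names (`horses` is 0, `humans` is 1); you can confirm the mapping with the generator's `class_indices` attribute. The helper name and the sample probabilities are hypothetical.

```python
# Assumed class order: subdirectory names sorted alphabetically
CLASS_NAMES = ['horses', 'humans']


def label_from_probability(probability, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a class name."""
    index = 1 if probability > threshold else 0
    return CLASS_NAMES[index]


# Made-up probabilities, for illustration only
print(label_from_probability(0.08))  # horses
print(label_from_probability(0.93))  # humans
```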

In [None]:
from keras.preprocessing.image import ImageDataGenerator

#Data Augmentation - Training data
train_datagen = ImageDataGenerator(
    # rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    # shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
    horizontal_flip=True,
    rescale=1.0/255.0
)

# Validation data: rescale only, no augmentation
validation_datagen = ImageDataGenerator(
    rescale=1.0/255.0
)

# Flow training images in batches of 32 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
    './horse-or-human',     # This is the source directory for training images
    target_size=(300, 300), # All images will be resized to 300x300
    class_mode='binary',    # Since we use binary_crossentropy loss, we need binary labels
    batch_size=32
)

# Flow validation images in batches of 32 using validation_datagen generator
validation_generator = validation_datagen.flow_from_directory(
    './validation-horse-or-human',  # This is the source directory for validation images
    target_size=(300, 300),         # All images will be resized to 300x300
    class_mode='binary',            # Since we use binary_crossentropy loss, we need binary labels
    batch_size=32
)

In [None]:
# Create the model
model = create_model()

#Initialize the constant EPOCH
EPOCH = 30

#Fit the model
history = model.fit(
    train_generator,
    epochs = EPOCH,
    verbose = 2,
    validation_data = validation_generator,
    # steps_per_epoch = 32,
    # validation_steps = 8
)
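
With `steps_per_epoch` and `validation_steps` left commented out, one epoch should cover each generator once, that is, as many batches as it takes to see every image. The sketch below shows that arithmetic; the helper name and the image count are hypothetical, so substitute the counts printed earlier in the notebook.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to cover every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


# Hypothetical example: 100 images in batches of 32
print(batches_per_epoch(100, 32))  # 4
```

If you do set `steps_per_epoch` explicitly, a value larger than this can exhaust a finite data source, while a smaller value means each epoch sees only part of the data.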

In [None]:
# # Constant for epochs
# EPOCHS = 20

# # Train the model
# history = model.fit(
#       train_generator,
#       steps_per_epoch=8,
#       epochs=EPOCHS,
#       verbose=1,
#       validation_data = validation_generator,
#       validation_steps=8)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

def plot_loss_acc(history):
  #-----------------------------------------------------------
  # Retrieve lists of results on the training and validation
  # sets for each training epoch
  #-----------------------------------------------------------
  acc      = history.history[     'accuracy' ]
  val_acc  = history.history[ 'val_accuracy' ]
  loss     = history.history[    'loss' ]
  val_loss = history.history['val_loss' ]

  epochs   = range(len(acc)) # Get number of epochs

  #------------------------------------------------
  # Plot training and validation accuracy per epoch
  #------------------------------------------------
  plt.plot  ( epochs,     acc, label='Training accuracy' )
  plt.plot  ( epochs, val_acc, label='Validation accuracy' )
  plt.title ('Training and validation accuracy')
  plt.grid()
  plt.legend()
  plt.xlabel("Epochs")
  plt.ylabel("Accuracy")
  plt.figure()

  #------------------------------------------------
  # Plot training and validation loss per epoch
  #------------------------------------------------
  plt.plot  ( epochs,     loss, label='Training loss' )
  plt.plot  ( epochs, val_loss, label='Validation loss' )
  plt.grid()
  plt.legend()
  plt.xlabel("Epochs")
  plt.ylabel("Loss")
  plt.title ('Training and validation loss'   )


In [None]:
# Plot training results
plot_loss_acc(history)
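
To complement the plots, the gap between training and validation accuracy can be summarized numerically. The sketch below is one possible summary over the last few epochs, not part of the original lab. It works on a plain dictionary shaped like `history.history`; the helper name and the sample values are made up for illustration.

```python
def summarize_gap(history_dict, last_n=5):
    """Average the last `last_n` epochs and report the train/validation accuracy gap."""
    acc = history_dict['accuracy'][-last_n:]
    val_acc = history_dict['val_accuracy'][-last_n:]
    mean_acc = sum(acc) / len(acc)
    mean_val_acc = sum(val_acc) / len(val_acc)
    return {
        'mean_train_acc': mean_acc,
        'mean_val_acc': mean_val_acc,
        'gap': mean_acc - mean_val_acc,              # large gap suggests overfitting
        'val_range': max(val_acc) - min(val_acc),    # large range means noisy validation
    }


# Made-up values, for illustration only (not results from this notebook)
example_history = {
    'accuracy': [0.8, 0.9, 1.0],
    'val_accuracy': [0.5, 0.7, 0.6],
}
print(summarize_gap(example_history, last_n=3))
```

After training, you could call `summarize_gap(history.history)` to get the same summary for the real run.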

In [None]:
model.summary()

In [None]:
# Count the GPUs visible to TensorFlow
no_of_gpu = len(tf.config.list_physical_devices('GPU'))
print("Total GPUs: ", no_of_gpu)


In [None]:
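from tensorflow.python.client import device_lib

# Print the GPU device name (an empty string means no GPU was found);
# only the last expression in a cell is displayed automatically
print(tf.test.gpu_device_name())
device_lib.list_local_devices()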

As you can see in the results, the augmentation techniques used here did not help much. The validation accuracy fluctuates and does not trend upward like the training accuracy. This might be because the additional training data still does not represent the features in the validation data. For example, some human or horse poses in the validation set cannot be mimicked by the image processing techniques that `ImageDataGenerator` provides. It might also be that the model has learned the backgrounds of the training images, so the white background of the validation set throws the model off even with cropping.

Try looking at the validation images in the `validation-horse-or-human` directory (note: if you are using Colab, you can use the file explorer on the left to explore the images) and see if you can augment the training images to match their characteristics. If this is not possible, you can consider other techniques, which you will see in next week's lessons.
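
One way to start that inspection is to pick a few validation files at random and look at them side by side with training images. The sketch below only selects the file paths; it assumes the extraction directory used earlier in this notebook, and the helper name and extension list are hypothetical.

```python
import os
import random

# Assumed image extensions; adjust if the dataset uses others
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def sample_image_paths(directory, n=4, seed=0):
    """Return up to `n` image paths from `directory`, chosen reproducibly."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    rng = random.Random(seed)  # fixed seed so the same files are picked each run
    chosen = rng.sample(names, min(n, len(names)))
    return [os.path.join(directory, name) for name in chosen]


validation_dir = os.path.join('validation-horse-or-human', 'humans')
if os.path.isdir(validation_dir):  # skip quietly if the data is not extracted yet
    for path in sample_image_paths(validation_dir):
        print(path)
```

Each returned path can then be loaded with `matplotlib.image.imread` and displayed with `plt.imshow` to compare poses and backgrounds against the training set.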