# **Project 3: Hot Dog Not Dog**


In [None]:
from google.colab import drive

# Mount the shared Google Drive folder
drive.mount('/content/gdrive', force_remount=True)

# Get the folder ID from the URL
folder_id = '1RdqEY7EuOLFJiNlVzFkAfuFKJ8si82bb'

# Construct the path to the shared folder
shared_folder_path = '/content/gdrive/My Drive/' + folder_id

# Change the current working directory to the shared folder
%cd '{shared_folder_path}'

## **Import the necessary libraries**

In [None]:
from PIL import Image
import pandas as pd
import requests
import numpy as np
import os
import matplotlib.pyplot as plt

# TensorFlow modules
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam


# scikit-learn modules
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Library for randomly selecting data points
import random

## **Load the dataset**

- Load the dataset, which is stored as `.jpg` files in one folder per class (the folder name is used as the label).
- Split the data into the train and the test dataset.

In [None]:
# Define a Function to Load Images and Labels
# This function walks through the directory structure, loads the images, and extracts labels from the directory names:

def load_images_and_labels(base_dir):
    images = []
    labels = []
    
    # Walk through the directory
    for dirpath, dirnames, filenames in os.walk(base_dir):
        # Extract the label from the last part of dirpath
        label = os.path.basename(dirpath)
        
        for file in filenames:
            # Check for .jpg files
            if file.endswith('.jpg'):
                file_path = os.path.join(dirpath, file)
                # Open and convert the image to RGB
                image = Image.open(file_path).convert('RGB')
                images.append(image)
                labels.append(label)
    
    return images, labels

In [None]:
# Call the Function with the Directory
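base_directory = '.'  # Current working directory (the shared folder set with %cd above); os.walk('') would find no files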
images, labels = load_images_and_labels(base_directory)

In [None]:
# Display one image from the list to confirm the import was successful
images[40]

Check the number of images in the training and the testing dataset.

In [None]:
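# Split dataset
# NOTE: run this cell after the data preparation cells below, which define float_images and encoded_labels
X, y = np.array(float_images), encoded_labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10)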
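To check the number of images in each split, a small helper along these lines could be used. This is a sketch, not part of the original notebook: `summarize_split` is a hypothetical name, and the label arrays in the example are placeholders rather than the real dataset. Note that `train_test_split` holds out 25% of the data by default when `test_size` is not given.

```python
import numpy as np

def summarize_split(y_train, y_test):
    """Return total and per-class image counts for each split (hypothetical helper)."""
    summary = {}
    for name, y in (("train", y_train), ("test", y_test)):
        classes, counts = np.unique(y, return_counts=True)
        summary[name] = {
            "total": int(len(y)),
            "per_class": dict(zip(classes.tolist(), counts.tolist())),
        }
    return summary

# Placeholder labels standing in for y_train / y_test (0 and 1 are the two encoded classes)
example_train = np.array([0, 1, 1, 0, 1, 0])
example_test = np.array([1, 0])
print(summarize_split(example_train, example_test))
```

A strongly uneven per-class count in either split would suggest passing `stratify=y` to `train_test_split`.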

**Observation:**

## **Visualizing images**

- Use X_train to visualize the first 9 images.
- Use y_train to print the first 9 labels.

In [None]:
# Visualizing the first 9 images in the training set and their classification labels

categories = np.unique(y_train)

# Set the number of columns and rows for the grid
cols = 3
rows = 3

# Create a 3x3 grid for displaying the first 9 images
for idx in range(rows * cols):
    ax = plt.subplot(rows, cols, idx + 1)
    # Images are RGB with values in 0-255 here, so cast to uint8 for display (no colormap needed)
    ax.imshow(X_train[idx].astype('uint8'))
    ax.set_title(categories[y_train[idx]])
    ax.axis('off')

plt.show()

## **Data preparation**

- Print the shape and the array of pixels for the first image in the training dataset.
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test dataset.
- One-hot encode the target variable.

In [None]:
# Get all the sizes into a list, then convert to a set
sizes = set([img.size for img in images])
sizes

In [None]:
# Count the number of distinct image sizes
len(sizes)

In [None]:
# Resize all images

def resize_images(images, new_size=(256, 256)):
    resized_images = []
    for img in images:
        # Resize the image using the LANCZOS filter for high-quality downsampling
        img = img.resize(new_size, Image.Resampling.LANCZOS)
        img = img.convert('RGB')
        resized_images.append(img)
    return resized_images

# Apply the resize function
images_resized = resize_images(images, new_size=(256, 256))

In [None]:
# Verify the resizing of all images
# Get all the sizes into a list, then convert to a set
sizes = set([img.size for img in images_resized])
sizes

In [None]:
# Convert all images to floating point numpy arrays
float_images = [np.array(img).astype(np.float32) for img in images_resized]

# Display the pixel values of the first image
print("Pixel Values:")
print(float_images[0])

### **Normalize the train and the test data**

In [None]:
# Normalizing the image pixel inputs
X_train = X_train / 255.0
X_test = X_test / 255.0

In [None]:
# Print the shapes of the test and training datasets
print('Test Dataset:', X_test.shape, y_test.shape)
print('Training Dataset:', X_train.shape, y_train.shape)

In [None]:
# Displaying the first normalized testing image and its pixel values
print("Shape of the first normalized testing image:", X_test[0].shape)
print("Pixel values of the first normalized testing image:")
print(X_test[0])

### **Encode Labels**

In [None]:
# Encode the y data
# Initialize the LabelEncoder
label_encoder = LabelEncoder()

# Fit label encoder and return encoded labels
encoded_labels = label_encoder.fit_transform(labels)

**Observation:**


# Augmentation

With 500 images of "hotdog" and 500 images of "not hotdog," we have a decently sized dataset to start with, especially considering the balance between the classes. The need to augment the dataset to increase its size depends on several factors:

1. **Model Complexity**
If you are using a complex model or deep neural network, these typically require large amounts of data to generalize well without overfitting. In such cases, even a dataset of 1,000 images might be insufficient, and data augmentation could help by artificially expanding the diversity and size of your training data.

2. **Performance Goals**
Consider your performance metrics and goals. If initial training results are not satisfactory or if the model performs well on training data but poorly on validation data (a sign of overfitting), then data augmentation might be necessary to improve the model’s ability to generalize.

3. **Variability in Data**
Data augmentation is particularly useful when you need the model to be robust against variations in inputs that are not well-represented in your dataset. For example, if your "hotdog" images are all taken under similar lighting conditions or from similar angles, your model might not perform well when presented with "hotdog" images under different conditions. Augmenting data to include transformed images (e.g., different rotations, lighting conditions, and crops) can help the model learn to recognize the key features of "hotdog" and "not hotdog" under a broader range of conditions.

4. **Computational Resources**
More data typically requires more computational power and longer training times. If resources are limited, you might start with your existing dataset to see how well you can optimize the model and consider augmentation if improvements are needed and computationally feasible.



In [None]:
# Apply augmentation to the whole training dataset
# Create an ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=20,      # Random rotation (degrees)
    width_shift_range=0.1,  # Random horizontal shift
    height_shift_range=0.1, # Random vertical shift
    shear_range=0.2,        # Shear intensity
    zoom_range=0.2,         # Random zoom
    horizontal_flip=True,   # Random horizontal flip
    vertical_flip=False,    # No vertical flip (upside-down photos are not expected)
    fill_mode='nearest'     # Fill mode for handling newly created pixels
)

# Create variables to hold the X and y training data
X_train_aug = []
y_train_aug = []

# Loop through all the images.
for i in range(len(X_train)):
    # Select the image
    img = X_train[i]
    # Select the label from the training data
    label = y_train[i]
    
    # Images are already RGB with shape (256, 256, 3), so no channel dimension is added;
    # only a batch dimension is needed to get the 4D input that datagen.flow expects
    img = np.expand_dims(img, axis=0)  # Shape: (1, 256, 256, 3)

    # Create one iterator per image and draw 5 augmented images from it
    aug_iter = datagen.flow(img, batch_size=1)
    for j in range(5):
        # Append a new image to the X list (the built-in next() works across Keras versions)
        X_train_aug.append(next(aug_iter)[0])
        # Append the label for the original image to the y list
        y_train_aug.append(label)

# Print the length of each list
print(len(X_train_aug))
print(len(y_train_aug))


In [None]:
# Prepare test data for the model
# The test images are already RGB arrays of shape (256, 256, 3), so no extra channel
# dimension is needed (adding one would produce an invalid 5D input)
X_test_np = np.asarray(X_test)

# Check the shape of the first image
X_test_np[0].shape

## **Model Building**

**ANN model**

In [None]:
### Fix the seed for random number generators
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)

### **Model Architecture**
- Write a function that returns a sequential model with the following architecture:
 - First hidden layer with **64 nodes and the relu activation** and the **input shape = (196608, )**, i.e., the flattened 256 x 256 x 3 image
 - Second hidden layer with **32 nodes and the relu activation**
 - Output layer with **activation as 'softmax' and number of nodes equal to the number of classes, i.e., 2**
 - Compile the model with the **loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.001), and metric equal to 'accuracy'**. Do not fit the model here, just return the compiled model.
- Call the nn_model_1 function and store the model in a new variable. 
- Print the summary of the model.
- Fit on the train data with a **validation split of 0.2, batch size = 128, verbose = 1, and epochs = 20**. Store the model building history to use later for visualization.
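The architecture listed above could be sketched roughly as follows. This is one possible reading of the list, not a reference solution: `input_dim` and `num_classes` are left as parameters because they depend on how the images are flattened and how many classes the labels contain, and the values in the example call are illustrative only.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

def nn_model_1(input_dim, num_classes):
    """Build and compile the first ANN; fitting is left to the caller."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),                  # flattened image vector
        layers.Dense(64, activation='relu'),              # first hidden layer
        layers.Dense(32, activation='relu'),              # second hidden layer
        layers.Dense(num_classes, activation='softmax'),  # one node per class
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.001),
                  metrics=['accuracy'])
    return model

# Illustrative values only; set these to the flattened image length and class count of the data
model_1 = nn_model_1(input_dim=1024, num_classes=2)
model_1.summary()
```

Because the loss is `categorical_crossentropy`, the targets passed to `fit` need to be one-hot encoded. Training would then look like `model_1.fit(X_flat, y_one_hot, validation_split=0.2, batch_size=128, verbose=1, epochs=20)`, where `X_flat` and `y_one_hot` are hypothetical names for the flattened images and encoded labels.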

### **Build and train an ANN model as per the above mentioned architecture.**

#### Transformer-based Model

In [None]:
from transformers import TFViTForImageClassification
from transformers import ViTImageProcessor

# Initialize the image processor (the successor to the deprecated ViTFeatureExtractor)
image_processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')

# Prepare the images for the transformer model
# X_train and X_test were already scaled to [0, 1] above, so rescaling by 1/255 is disabled here
def prepare_images_for_vit(images):
    return image_processor(list(images), do_rescale=False, return_tensors='np').pixel_values

# Prepare training and testing datasets
X_train_vit = prepare_images_for_vit(X_train)
X_test_vit = prepare_images_for_vit(X_test)

# Load the pre-trained Vision Transformer model
# The TF class is needed here: ViTForImageClassification is the PyTorch model and has no compile()/fit()
vit_model = TFViTForImageClassification.from_pretrained('google/vit-base-patch16-224-in21k', num_labels=2)

# Compile the model (the model outputs logits, so the loss is configured with from_logits=True)
vit_model.compile(optimizer=Adam(learning_rate=0.0001),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

# Fit the transformer model on the training data
history_vit = vit_model.fit(X_train_vit, y_train, epochs=5, batch_size=16, validation_split=0.1)

In [None]:
# Evaluate the model on the test data
test_loss, test_accuracy = vit_model.evaluate(X_test_vit, y_test)

# Print the results
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

#### CNN Model Building 

In [None]:
# CNN Model Architecture
def build_cnn_model(input_shape, num_classes):
    model = keras.Sequential([
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax')
    ])
    
    # Labels are integer-encoded by LabelEncoder (not one-hot), so the sparse loss is used
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.001),
                  metrics=['accuracy'])
    return model

# Assuming input_shape is (256, 256, 3) for RGB images and num_classes is 2 for 'hotdog' and 'not hotdog'
cnn_model = build_cnn_model((256, 256, 3), 2)
cnn_model.summary()

# Fit the CNN model on the training data
history_cnn = cnn_model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2)

In [None]:
# Evaluate the model on the test data
test_loss, test_accuracy = cnn_model.evaluate(X_test, y_test)

# Print the results
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

### **Plot the Training and Validation Accuracies and write down your Observations.**
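A plotting helper along these lines could be used for this step. It is a sketch: `plot_accuracy` is a hypothetical name, and it assumes the model was compiled with `metrics=['accuracy']` and fitted with a validation split, so that the Keras history dictionary contains the keys `accuracy` and `val_accuracy`.

```python
import matplotlib.pyplot as plt

def plot_accuracy(history_dict, title="Training vs. validation accuracy"):
    """Plot per-epoch training and validation accuracy from a Keras history dict."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(history_dict['accuracy'], label='Training accuracy')
    ax.plot(history_dict['val_accuracy'], label='Validation accuracy')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.set_title(title)
    ax.legend()
    return fig, ax
```

For example, `plot_accuracy(history_cnn.history)` followed by `plt.show()` would draw the curves for the CNN trained above. A training curve that keeps rising while the validation curve flattens or drops is the usual sign of overfitting.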

**Observations:_______**

Let's build one more model with higher complexity and see if we can improve the performance of the model. 

First, we need to clear the previous model's history from the Keras backend. Also, let's fix the seed again after clearing the backend.
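Clearing the backend and re-fixing the seeds could look like the sketch below. `reset_session` is a hypothetical helper name; the seed value 42 matches the one used earlier in the notebook.

```python
import random

import numpy as np
import tensorflow as tf

def reset_session(seed=42):
    """Clear Keras global state, then re-seed Python, NumPy, and TensorFlow."""
    tf.keras.backend.clear_session()
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)

reset_session()
```

Calling this before building each new model keeps layer names and random initializations from depending on what was built earlier in the session.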

### **Second Model Architecture**
- Write a function that returns a sequential model with the following architecture:
 - First hidden layer with **256 nodes and the relu activation** and the **input shape = (196608, )**, i.e., the flattened 256 x 256 x 3 image
 - Second hidden layer with **128 nodes and the relu activation**
 - Add the **Dropout layer with the rate equal to 0.2**
 - Third hidden layer with **64 nodes and the relu activation**
 - Fourth hidden layer with **64 nodes and the relu activation**
 - Fifth hidden layer with **32 nodes and the relu activation**
 - Add the **BatchNormalization layer**
 - Output layer with **activation as 'softmax' and number of nodes equal to the number of classes, i.e., 2**
 - Compile the model with the **loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.0005), and metric equal to 'accuracy'**. Do not fit the model here, just return the compiled model.
- Call the nn_model_2 function and store the model in a new variable.
- Print the summary of the model.
- Fit on the train data with a **validation split of 0.2, batch size = 128, verbose = 1, and epochs = 30**. Store the model building history to use later for visualization.
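A possible sketch of this deeper network is shown below. As with the first model, it is one reading of the list rather than a reference solution; `input_dim` and `num_classes` are parameters, and the values in the example call are illustrative only.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

def nn_model_2(input_dim, num_classes):
    """Build and compile the second, deeper ANN; fitting is left to the caller."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(256, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),                  # randomly drops 20% of activations during training
        layers.Dense(64, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.BatchNormalization(),          # normalizes activations before the output layer
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.0005),
                  metrics=['accuracy'])
    return model

# Illustrative values only; set these to the flattened image length and class count of the data
model_2 = nn_model_2(input_dim=1024, num_classes=2)
model_2.summary()
```

The model would then be fitted with `validation_split=0.2, batch_size=128, verbose=1, epochs=30`, and the returned history kept for plotting.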

### **Build and train the new ANN model as per the above mentioned architecture**

### **Plot the Training and Validation Accuracies and write down your Observations.**

**Observations:_______**

## **Predictions on the test data**

- Make predictions on the test set using the second model.
- Print the obtained results using the classification report and the confusion matrix.
- Final observations on the obtained results.

**Note:** Earlier, we noticed that each entry of the target variable is a one-hot encoded vector, but to print the classification report and confusion matrix, we must convert each entry of y_test to a single label.
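The conversion described in the note amounts to taking the index of the largest entry in each row. The sketch below uses NumPy only and placeholder arrays (not real model output); `to_single_labels` and `confusion_counts` are hypothetical helper names. In the notebook itself, `confusion_matrix` and `classification_report` from scikit-learn would normally be applied to the converted labels.

```python
import numpy as np

def to_single_labels(one_hot_or_probs):
    """Collapse one-hot vectors or softmax outputs to integer class labels."""
    return np.argmax(one_hot_or_probs, axis=1)

def confusion_counts(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair; rows are true classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[true_label, pred_label] += 1
    return matrix

# Placeholder data: 4 one-hot targets and 4 made-up softmax outputs for 2 classes
y_test_one_hot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])

y_true = to_single_labels(y_test_one_hot)
y_pred = to_single_labels(y_prob)
print(confusion_counts(y_true, y_pred, num_classes=2))
```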

### **Print the classification report and the confusion matrix for the test predictions. Write your observations on the final results.**

**Final Observations:__________**

## **Using Convolutional Neural Networks**

### **Load the dataset again and split the data into the train and the test dataset.**

Check the number of images in the training and the testing dataset.

**Observation:**


## **Data preparation**

- Print the shape and the array of pixels for the first image in the training dataset.
- Reshape the train and the test dataset because we always have to give a 4D array as input to CNNs.
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test dataset.
- One-hot encode the target variable.

Reshape the datasets so that they can be passed to CNNs. Remember that we always have to give a 4D array as input to CNNs.
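The 4D layout Keras expects is (number of images, height, width, channels). The sketch below shows one way to guarantee it; `to_4d` is a hypothetical helper, and the zero-filled arrays are placeholders for the real image data. RGB images already carry a channel axis, so only grayscale stacks need one added.

```python
import numpy as np

def to_4d(images):
    """Return images as a float32 array of shape (n, height, width, channels)."""
    arr = np.asarray(images, dtype=np.float32)
    if arr.ndim == 3:
        # Grayscale stack (n, height, width): append a single channel axis
        arr = arr[..., np.newaxis]
    if arr.ndim != 4:
        raise ValueError(f"Expected 3D or 4D input, got shape {arr.shape}")
    return arr

gray_batch = np.zeros((5, 32, 32))      # placeholder grayscale images
rgb_batch = np.zeros((5, 256, 256, 3))  # placeholder RGB images
print(to_4d(gray_batch).shape)
print(to_4d(rgb_batch).shape)
```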

Normalize the inputs from the 0-255 range to the 0-1 range.

Print the new shapes of the training and test datasets.

### **One-hot encode the labels in the target variable y_train and y_test.**
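One-hot encoding turns each integer label into a vector with a single 1 at the index of its class. The NumPy sketch below illustrates the idea with placeholder labels; `one_hot` is a hypothetical helper, and in a Keras workflow `tf.keras.utils.to_categorical` is the usual way to do the same thing.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels of shape (n,) to one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

# Placeholder labels: 0 and 1 stand for the two encoded classes
example_labels = np.array([0, 1, 1])
print(one_hot(example_labels, num_classes=2))
```

One-hot targets pair with the `categorical_crossentropy` loss; integer targets pair with `sparse_categorical_crossentropy`.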

**Observation:**


## **Model Building**

Now that we have done the data preprocessing, let's build a CNN model.
First, fix the seed for the random number generators.

### **Model Architecture**
- **Write a function** that returns a sequential model with the following architecture:
 - First Convolutional layer with **16 filters and the kernel size of 3x3**. Use the **'same' padding** and provide the **input shape = (256, 256, 3)**
 - Add a **LeakyRelu layer** with the **slope equal to 0.1**
 - Second Convolutional layer with **32 filters and the kernel size of 3x3 with 'same' padding**
 - Another **LeakyRelu** with the **slope equal to 0.1**
 - A **max-pooling layer** with a **pool size of 2x2**
 - **Flatten** the output from the previous layer
 - Add a **dense layer with 32 nodes**
 - Add a **LeakyRelu layer with the slope equal to 0.1**
 - Add the final **output layer with nodes equal to the number of classes, i.e., 2** and **'softmax' as the activation function**
 - Compile the model with the **loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.001), and metric equal to 'accuracy'**. Do not fit the model here, just return the compiled model.
- Call the function cnn_model_1 and store the output in a new variable.
- Print the summary of the model.
- Fit the model on the training data with a **validation split of 0.2, batch size = 32, verbose = 1, and epochs = 20**. Store the model building history to use later for visualization.
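The CNN described above could be sketched as follows. This is one reading of the list, not a reference solution: `input_shape` and `num_classes` are parameters, the example call uses illustrative values, and the LeakyReLU slope is passed positionally because the keyword name differs between Keras versions.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

def cnn_model_1(input_shape, num_classes):
    """Build and compile the first CNN; fitting is left to the caller."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),                 # slope of 0.1 for negative inputs
        layers.Conv2D(32, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),
        layers.MaxPooling2D(pool_size=(2, 2)), # halves height and width
        layers.Flatten(),
        layers.Dense(32),
        layers.LeakyReLU(0.1),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.001),
                  metrics=['accuracy'])
    return model

# Illustrative values only; use the actual image shape and class count of the data
cnn_1 = cnn_model_1(input_shape=(32, 32, 3), num_classes=2)
cnn_1.summary()
```

With 'same' padding the convolutions keep the spatial size, so only the pooling layer shrinks the feature maps before flattening.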

### **Build and train a CNN model as per the above mentioned architecture.**

### **Plot the Training and Validation Accuracies and Write your observations.**

**Observations:__________**

Let's build another model and see if we can get a better model with generalized performance.

First, we need to clear the previous model's history from the Keras backend. Also, let's fix the seed again after clearing the backend.

### **Second Model Architecture**

- Write a function that returns a sequential model with the following architecture:
 - First Convolutional layer with **16 filters and the kernel size of 3x3**. Use the **'same' padding** and provide the **input shape = (256, 256, 3)**
 - Add a **LeakyRelu layer** with the **slope equal to 0.1**
 - Second Convolutional layer with **32 filters and the kernel size of 3x3 with 'same' padding**
 - Add **LeakyRelu** with the **slope equal to 0.1**
 - Add a **max-pooling layer** with a **pool size of 2x2**
 - Add a **BatchNormalization layer**
 - Third Convolutional layer with **32 filters and the kernel size of 3x3 with 'same' padding**
 - Add a **LeakyRelu layer with the slope equal to 0.1**
 - Fourth Convolutional layer with **64 filters and the kernel size of 3x3 with 'same' padding**
 - Add a **LeakyRelu layer with the slope equal to 0.1**
 - Add a **max-pooling layer** with a **pool size of 2x2**
 - Add a **BatchNormalization layer**
 - **Flatten** the output from the previous layer
 - Add a **dense layer with 32 nodes**
 - Add a **LeakyRelu layer with the slope equal to 0.1**
 - Add a **dropout layer with the rate equal to 0.5**
 - Add the final **output layer with nodes equal to the number of classes, i.e., 2** and **'softmax' as the activation function**
 - Compile the model with the **categorical_crossentropy loss, the Adam optimizer (learning_rate = 0.001), and metric equal to 'accuracy'**. Do not fit the model here, just return the compiled model.
- Call the function cnn_model_2 and store the model in a new variable.
- Print the summary of the model.
- Fit the model on the train data with a **validation split of 0.2, batch size = 128, verbose = 1, and epochs = 30**. Store the model building history to use later for visualization.
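The second CNN could be sketched in the same way. Again this is one reading of the list under stated assumptions: `input_shape` and `num_classes` are parameters, the example values are illustrative, and the LeakyReLU slope is passed positionally for compatibility across Keras versions.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

def cnn_model_2(input_shape, num_classes):
    """Build and compile the second CNN; fitting is left to the caller."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # First convolutional block
        layers.Conv2D(16, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),
        layers.Conv2D(32, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.BatchNormalization(),
        # Second convolutional block
        layers.Conv2D(32, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),
        layers.Conv2D(64, kernel_size=(3, 3), padding='same'),
        layers.LeakyReLU(0.1),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.BatchNormalization(),
        # Classifier head
        layers.Flatten(),
        layers.Dense(32),
        layers.LeakyReLU(0.1),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.001),
                  metrics=['accuracy'])
    return model

# Illustrative values only; use the actual image shape and class count of the data
cnn_2 = cnn_model_2(input_shape=(32, 32, 3), num_classes=2)
cnn_2.summary()
```

Compared with the first CNN, the extra pooling, batch normalization, and dropout layers are the parts intended to improve generalization.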

### **Build and train the second CNN model as per the above mentioned architecture.**

### **Plot the Training and Validation accuracies and write your observations.**

**Observations:________**

## **Predictions on the test data**

- Make predictions on the test set using the second model.
- Print the obtained results using the classification report and the confusion matrix.
- Final observations on the obtained results.

### **Make predictions on the test data using the second model.** 

**Note:** Earlier, we noticed that each entry of the target variable is a one-hot encoded vector, but to print the classification report and confusion matrix, we must convert each entry of y_test to a single label.

### **Write your final observations on the performance of the model on the test data.**

**Final Observations:_________**