# Image Preprocessing and Binary Classification with Keras

## Objective
In this week's exercise, you will:
1. Learn how to perform image preprocessing in Keras.
2. Build and train a multilayer neural network for binary classification on a real-world dataset of cats and dogs.

---

## Step 1: Import Libraries
Let's start by importing the necessary libraries.


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


---

## Step 2: Load and Preprocess the Data
The course material uses the Keras `ImageDataGenerator` for image augmentation and preprocessing; Step 4 covers its recommended replacement.
First, download the dataset from Kaggle, then unzip the archives it contains.


In [5]:
# Install opendatasets (uncomment on first run)
#!pip install opendatasets --quiet

# Import the library
import opendatasets as od

# Download the Cats and Dogs dataset from the Kaggle "Dogs vs. Cats" competition
od.download("https://www.kaggle.com/c/dogs-vs-cats/data")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: maciejlipski
Your Kaggle Key: ··········
Downloading dogs-vs-cats.zip to ./dogs-vs-cats

100%|██████████| 812M/812M [00:11<00:00, 71.7MB/s]

Extracting archive ./dogs-vs-cats/dogs-vs-cats.zip to ./dogs-vs-cats

In [6]:
# Unzip the training and test archives extracted by the download above
!unzip -q dogs-vs-cats/train.zip
!unzip -q dogs-vs-cats/test1.zip


---

## Step 3: Learn About Undersampling and Implement It
Research online what undersampling is, and random undersampling in particular. It is a powerful technique that is often used in machine learning. Find out when it is used, and undersample your dataset using random undersampling.


In [7]:
import os
import numpy as np
import pandas as pd
from sklearn.utils import resample
from glob import glob

# Define the path to the dataset (replace with your path)
data_dir = 'train'

# Load all image paths
image_paths = glob(os.path.join(data_dir, '*.jpg'))

# Create a DataFrame with image paths and labels
df_cats_vs_dogs = pd.DataFrame({
    'image_path': image_paths,
    'label': ['dog' if 'dog' in os.path.basename(path) else 'cat' for path in image_paths]
})

# Check initial class distribution
print("Initial class distribution:")
print(df_cats_vs_dogs['label'].value_counts())

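# Separate majority and minority classes ('dog' is treated as the majority here;
# if the classes turn out to be balanced, the choice is arbitrary)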
majority_class = df_cats_vs_dogs[df_cats_vs_dogs['label'] == 'dog']
minority_class = df_cats_vs_dogs[df_cats_vs_dogs['label'] == 'cat']

# Perform random undersampling on the majority class
majority_undersampled = resample(
    majority_class,
    replace=False,
    n_samples=len(minority_class),  # Match the number of samples in the minority class
    random_state=42
)

# Combine the undersampled majority class with the minority class
df_undersampled = pd.concat([majority_undersampled, minority_class])

# Shuffle the dataset
df_undersampled = df_undersampled.sample(frac=1, random_state=42).reset_index(drop=True)

# Check the new class distribution
print("Class distribution after undersampling:")
print(df_undersampled['label'].value_counts())

# Save the file paths and labels to a CSV if needed
df_undersampled.to_csv('cats_vs_dogs_undersampled.csv', index=False)

Initial class distribution:
label
cat    12500
dog    12500
Name: count, dtype: int64
Class distribution after undersampling:
label
dog    12500
cat    12500
Name: count, dtype: int64
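
The output above shows that this dataset is already balanced (12,500 images per class), so the undersampling step leaves the counts unchanged. To see the effect, here is a minimal sketch on a made-up imbalanced DataFrame. The file names and the 90/10 split are hypothetical, `random_undersample` is an illustrative helper name, and pandas' `groupby(...).sample` is used as one possible alternative to `sklearn.utils.resample`.

```python
import pandas as pd

# Hypothetical imbalanced data: 90 dog images and 10 cat images
toy_df = pd.DataFrame({
    'image_path': [f'dog.{i}.jpg' for i in range(90)] + [f'cat.{i}.jpg' for i in range(10)],
    'label': ['dog'] * 90 + ['cat'] * 10,
})

def random_undersample(df, label_col='label', random_state=42):
    """Randomly drop rows so that every class has as many rows as the rarest class."""
    n_min = df[label_col].value_counts().min()
    # Draw n_min rows without replacement from each class
    balanced = df.groupby(label_col).sample(n=n_min, random_state=random_state)
    # Shuffle so that the classes are not grouped together
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)

toy_balanced = random_undersample(toy_df)
print(toy_balanced['label'].value_counts())
```

Because the helper derives the target size from the rarest class, it does not need to know in advance which class is the majority.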


---

## Step 4: Set Up ImageDataGenerator (or, More Specifically, Its Replacement)
We're sorry: the videos from the Coursera course are not always up to date. In this case, `ImageDataGenerator` is deprecated (see https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) and will be removed in a future version. The concept behind the newly recommended approach is very similar, though.
The new recommendation is to load images with `tf.keras.utils.image_dataset_from_directory` and transform the resulting `tf.data.Dataset` with preprocessing layers.

You may use ChatGPT for this task, and you can also check the following tutorials: <br>
https://www.tensorflow.org/tutorials/load_data/images <br>
https://www.tensorflow.org/guide/keras/preprocessing_layers <br>
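
`image_dataset_from_directory` infers labels from subdirectory names, whereas the `train` folder used in Step 3 is flat, with the class encoded in each file name. Below is a minimal sketch of one way to bridge that gap. `sort_into_class_dirs` and `load_datasets` are hypothetical helpers, and the `train_sorted` directory name, the 80/20 validation split, and the 150x150 image size (chosen to match Step 7) are assumptions.

```python
import os
import shutil
from glob import glob

def sort_into_class_dirs(src_dir, dst_dir):
    """Copy a flat folder of images into dst_dir/cat and dst_dir/dog."""
    for path in glob(os.path.join(src_dir, '*.jpg')):
        name = os.path.basename(path)
        # Same labeling rule as in Step 3
        label = 'dog' if 'dog' in name else 'cat'
        class_dir = os.path.join(dst_dir, label)
        os.makedirs(class_dir, exist_ok=True)
        shutil.copy(path, os.path.join(class_dir, name))

def load_datasets(root_dir, image_size=(150, 150), batch_size=32):
    """Return (train_ds, val_ds) with pixel values rescaled to [0, 1]."""
    import tensorflow as tf  # imported here so the helper above works without TensorFlow

    common = dict(
        validation_split=0.2, seed=42,   # same seed for both subsets so they do not overlap
        label_mode='binary',             # float labels of shape (batch, 1)
        image_size=image_size, batch_size=batch_size,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(root_dir, subset='training', **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(root_dir, subset='validation', **common)

    # Preprocessing layer applied to the dataset: map pixels from [0, 255] to [0, 1]
    rescale = tf.keras.layers.Rescaling(1.0 / 255)
    train_ds = train_ds.map(lambda x, y: (rescale(x), y)).prefetch(tf.data.AUTOTUNE)
    val_ds = val_ds.map(lambda x, y: (rescale(x), y)).prefetch(tf.data.AUTOTUNE)
    return train_ds, val_ds
```

Typical usage would be `sort_into_class_dirs('train', 'train_sorted')` followed by `train_ds, val_ds = load_datasets('train_sorted')`. Class names are taken from the subdirectories in alphanumerical order, so `cat` maps to label 0 and `dog` to label 1, which is consistent with the 0.5 threshold used in Step 7. Note that the helper copies the images rather than moving them, so it needs extra disk space.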

---

## Step 5: Build a Multilayer Neural Network
Now, let's build a multilayer neural network for binary classification.


In [None]:
# TODO build a model

# TODO compile the model
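
One possible way to fill in the TODOs above, offered as a sketch rather than the reference solution. It uses the layers imported in Step 1; the layer sizes, dropout rate, optimizer, and the 150x150x3 input shape (matching Step 7) are assumptions, and `build_model` is a hypothetical helper name. Inputs are assumed to be already rescaled to [0, 1].

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_model(input_shape=(150, 150, 3)):
    """Small convolutional network with a single sigmoid output for binary classification."""
    model = Sequential([
        tf.keras.Input(shape=input_shape),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),                      # regularization against overfitting
        Dense(1, activation='sigmoid'),    # output is a value between 0 and 1
    ])
    # Binary cross-entropy pairs with a single sigmoid output unit
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = build_model()
model.summary()
```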


---

## Step 6: Train the Model
Train the model using the dataset you created in Step 4.
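
A minimal sketch of the training call, assuming a compiled Keras model and `tf.data.Dataset` objects that yield `(images, labels)` batches. To stay self-contained, it trains a tiny stand-in model on random data, so the numbers it produces are meaningless. The epoch count and all `demo_` names are assumptions.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: random 150x150 RGB "images" with random 0/1 labels
rng = np.random.default_rng(42)
images = rng.random((16, 150, 150, 3), dtype=np.float32)
labels = rng.integers(0, 2, size=(16, 1)).astype(np.float32)
demo_train_ds = tf.data.Dataset.from_tensor_slices((images[:12], labels[:12])).batch(4)
demo_val_ds = tf.data.Dataset.from_tensor_slices((images[12:], labels[12:])).batch(4)

# Tiny stand-in model; in the exercise, use the model built in Step 5 instead
demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
demo_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# fit() iterates over the training dataset once per epoch and
# evaluates on the validation dataset at the end of each epoch
history = demo_model.fit(demo_train_ds, validation_data=demo_val_ds, epochs=2, verbose=0)

# history.history maps each metric name to a list with one value per epoch
print(sorted(history.history.keys()))
```

Comparing the training and validation entries in `history.history` across epochs is a common way to spot overfitting.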


---

## Step 7: Evaluate the Model
After training, you may upload some test images to evaluate your model.


In [None]:
from tensorflow.keras.preprocessing import image
import numpy as np
from google.colab import files

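def load_and_predict(model):
    # Open Colab's upload dialog and wait for the user to pick files
    uploaded_files = files.upload()

    for fn in uploaded_files.keys():
        path = '/content/' + fn
        # Resize to 150x150, which must match the model's input size
        img = image.load_img(path, target_size=(150, 150))

        # Convert to a batch of one image and rescale pixels to [0, 1]
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0) / 255.0

        # predict() returns shape (1, 1); the value is read as the probability of "dog"
        classes = model.predict(x)
        result = "a dog" if classes[0][0] > 0.5 else "a cat"

        print(f'The model predicts that the image is of {result}')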

# Call the function to upload images and get predictions
load_and_predict(model)