# Image Preprocessing and Binary Classification with Keras

## Objective
In this week's exercise, you will:
1. Learn how to preprocess images in Keras.
2. Build and train a multilayer neural network for binary classification on a real-world dataset of cats and dogs.

---

## Step 1: Import Libraries
Let's start by importing the necessary libraries.


In [29]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import os
import PIL
import PIL.Image
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


---

## Step 2: Load and Preprocess the Data
We will use the Keras `ImageDataGenerator` for image augmentation and preprocessing.
First, unzip the uploaded dataset.


In [2]:
!unzip -q kagglecatsanddogs_5340.zip

## Step 3: Learn about undersampling and implement it
Research online what undersampling and random undersampling are. Undersampling is a powerful technique that is often used in machine learning. Find out when it is used, then undersample your dataset using random undersampling.
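
To make the idea concrete, here is a minimal sketch of random undersampling in plain Python. It is an illustration under stated assumptions, not the `imblearn` implementation: the helper `random_undersample` and the toy data are hypothetical, and the sketch simply cuts every class down to the size of the rarest class.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=42):
    """Randomly drop samples until every class is as small as the rarest one."""
    rng = random.Random(seed)

    # Group the sample indices by class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    # Every class is cut down to the size of the smallest class.
    target = min(len(indices) for indices in indices_by_class.values())

    kept = []
    for indices in indices_by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # keep the original ordering of the surviving samples

    return [samples[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 7 samples of class 0 and 3 samples of class 1.
toy_samples = [f"img_{i}" for i in range(10)]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

balanced_samples, balanced_labels = random_undersample(toy_samples, toy_labels)
print(Counter(balanced_labels))  # both classes now have 3 samples
```

The majority class loses four randomly chosen samples, so both classes end up with three. The price of the balanced classes is that some data is thrown away.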

In [39]:
# undersample your dataset here
import pathlib
data_dir = pathlib.Path("/content/PetImages")
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(180, 180),
  batch_size=100)

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(180, 180),
  batch_size=100)

normalization_layer = tf.keras.layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch_train, labels_batch_train = next(iter(normalized_ds))
image_batch_train = image_batch_train.numpy()
labels_batch_train = labels_batch_train.numpy()

# Flatten each image to a 1D vector, because RandomUnderSampler expects 2D input
batch_size, height, width, channels = image_batch_train.shape
image_batch_train_reshaped = image_batch_train.reshape(
    batch_size, height * width * channels)

# Perform random undersampling on this single batch of 100 images
randomUnderSampler = RandomUnderSampler(random_state=42)
image_batch_train_sampled, labels_batch_train_sampled = randomUnderSampler.fit_resample(
    image_batch_train_reshaped, labels_batch_train)

# Reshape the images back to 4D
image_batch_train_sampled = image_batch_train_sampled.reshape(
    -1, height, width, channels)
Counter(labels_batch_train_sampled)


Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.


Counter({0: 39, 1: 39})

---

## Step 4: Set Up the Replacement for ImageDataGenerator
We're sorry: the videos from the Coursera course are not always up to date. In this case, `ImageDataGenerator` is deprecated (see https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) and will be removed in a future version. The concept behind the newly recommended approach is very similar, though.
The new recommendation is to load images with `tf.keras.utils.image_dataset_from_directory` and to transform the resulting `tf.data.Dataset` with preprocessing layers.

You may use ChatGPT for this task, and you can also consult the following tutorials: <br>
https://www.tensorflow.org/tutorials/load_data/images <br>
https://www.tensorflow.org/guide/keras/preprocessing_layers <br>

In [None]:
# TODO create a dataset using the recommended methods
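
One possible way to approach this TODO is sketched below. It is a hedged example, not the reference solution: the helper name `make_datasets`, the batch size of 32, and the choice of `RandomFlip` and `RandomRotation` as augmentations are assumptions made for illustration. The image size, split, and seed mirror the values used earlier in this notebook.

```python
import tensorflow as tf

IMAGE_SIZE = (180, 180)  # same size as used earlier in this notebook
BATCH_SIZE = 32          # assumption: adjust to your memory budget


def make_datasets(data_dir, validation_split=0.2, seed=123):
    """Load class-per-folder images and return (train_ds, val_ds)."""
    common = dict(
        validation_split=validation_split,
        seed=seed,
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common)

    # Preprocessing layers take over the rescaling and augmentation
    # that ImageDataGenerator used to handle.
    rescale = tf.keras.layers.Rescaling(1.0 / 255)
    augmentation_layers = [
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
    ]

    def augment(images):
        for layer in augmentation_layers:
            images = layer(images, training=True)
        return images

    # Augment only the training images; rescale both splits to [0, 1].
    train_ds = train_ds.map(
        lambda x, y: (rescale(augment(x)), y),
        num_parallel_calls=tf.data.AUTOTUNE)
    val_ds = val_ds.map(
        lambda x, y: (rescale(x), y),
        num_parallel_calls=tf.data.AUTOTUNE)

    return (train_ds.prefetch(tf.data.AUTOTUNE),
            val_ds.prefetch(tf.data.AUTOTUNE))
```

Assuming the images are unpacked to `/content/PetImages` as above, `train_ds, val_ds = make_datasets("/content/PetImages")` returns two datasets that can be passed to `model.fit(train_ds, validation_data=val_ds, epochs=10)`.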

---

## Step 5: Build a Multilayer Neural Network
Now, let's build a multilayer neural network for binary classification.


In [40]:
# TODO build a model
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(180, 180, 3)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

# TODO compile the model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
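
To get a feeling for the size of this network, the following sketch works out the layer shapes and parameter counts by hand. It assumes the Keras defaults for the layers above (no padding and stride 1 for `Conv2D`, non-overlapping 2x2 windows for `MaxPooling2D`); the helper functions are hypothetical and only do arithmetic. `model.summary()` is the way to confirm the numbers on the real model.

```python
def conv2d_output(height, width, kernel, filters):
    """Output shape of a convolution with no padding and stride 1."""
    return height - kernel + 1, width - kernel + 1, filters


def conv2d_params(kernel, in_channels, filters):
    """kernel x kernel x in_channels weights per filter, plus one bias each."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return inputs * units + units


# Input: 180 x 180 RGB images.
h, w, c = conv2d_output(180, 180, kernel=3, filters=64)  # (178, 178, 64)
h, w = h // 2, w // 2                                     # 2x2 max pooling: (89, 89)
flat = h * w * c                                          # size after Flatten

total = (conv2d_params(3, 3, 64)
         + dense_params(flat, 256)
         + dense_params(256, 2))

print("flattened features:", flat)   # 506944
print("total parameters:  ", total)  # 129780226
```

Almost all of the roughly 130 million parameters sit in the first `Dense` layer, because it is connected to every one of the 506,944 flattened features.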


---

## Step 6: Train the Model
Train the model using the dataset you created.


In [41]:
model.fit(image_batch_train_sampled, labels_batch_train_sampled, epochs=10)

Epoch 1/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 13s 4s/step - accuracy: 0.4512 - loss: 22.5349
Epoch 2/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 10s 3s/step - accuracy: 0.5156 - loss: 42.4401
Epoch 3/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 12s 3s/step - accuracy: 0.5312 - loss: 15.5235
Epoch 4/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 11s 3s/step - accuracy: 0.5968 - loss: 3.5953
Epoch 5/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 19s 3s/step - accuracy: 0.4570 - loss: 6.2309
Epoch 6/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 11s 3s/step - accuracy: 0.8026 - loss: 0.6186
Epoch 7/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 21s 4s/step - accuracy: 0.6305 - loss: 1.7651
Epoch 8/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 19s 3s/step - accuracy: 0.8929 - loss: 0.3067
Epoch 9/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 12s

<keras.src.callbacks.history.History at 0x781eed744100>
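
The `3/3` in each epoch of the log is the number of batches. A quick check, assuming that `model.fit` fell back to its default `batch_size` of 32 (none was passed) and using the 78 samples left after undersampling (39 per class):

```python
import math

num_samples = 39 + 39  # class counts after undersampling, from the Counter output
batch_size = 32        # Keras default for model.fit when none is given

steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 3
```

With only three small batches per epoch, the accuracy and loss values jump around a lot from one epoch to the next, so they say little about how well the model generalizes.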

---

## Step 7: Evaluate the Model
After training, you may upload some test images to evaluate your model.


In [63]:
from tensorflow.keras.preprocessing import image
import numpy as np
from google.colab import files

def load_and_predict(model):
    uploaded_files = files.upload()

    for fn in uploaded_files.keys():
        path = '/content/' + fn
        img = image.load_img(path, target_size=(180, 180))

        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0) / 255.0

        classes = model.predict(x)
        print(classes)
        result = "a dog" if classes[0][1] > 0.5 else "a cat"

        print(f'The model predicts that the image is of {result}')

# Call the function to upload images and get predictions
load_and_predict(model)

Saving 10066.jpg to 10066 (3).jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 95ms/step
[[0.9413221  0.05867789]]
The model predicts that the image is of a cat
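
The two numbers in the printed array are the softmax probabilities for class 0 and class 1. The sketch below turns them into a label by picking the most likely class. It assumes that the class names are `Cat` and `Dog`, taken from the sorted folder names (check `train_ds.class_names` to confirm); the helper `decode_prediction` is hypothetical.

```python
CLASS_NAMES = ["Cat", "Dog"]  # assumption: folder names in sorted order


def decode_prediction(probabilities, class_names=CLASS_NAMES):
    """Return (class name, probability) for the most likely class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


# Softmax output printed above for 10066.jpg.
prediction = [0.9413221, 0.05867789]
label, confidence = decode_prediction(prediction)
print(f"{label} ({confidence:.1%})")  # Cat (94.1%)
```

For two classes this gives the same answer as the `classes[0][1] > 0.5` check in `load_and_predict`, but it also works unchanged if more classes are added later.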
