# (PyTorch) Notebook 1B. Data Augmentation for Convolutional Neural Networks (CNN)
---
Organized and prepared by Christopher Monterola, updated by Kenneth Co.

This notebook was conceptualized, organized, and primarily prepared for the **Machine Learning** courses.

### This notebook uses the following references:
1. Python Machine Learning, Second Edition, Sebastian Raschka and Vahid Mirjalili, Packt Publishing Ltd. Birmingham B3 2PB, UK Sept 2017.
2. Hands-On Machine Learning with Scikit-Learn and TensorFlow, Aurélien Géron, O'Reilly 2017.
3. Deep Learning with Python, Francois Chollet, Manning New York 2018.
4. 2018 Google: https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/image_classification_part1.ipynb

Here we illustrate how data augmentation for CNNs, implemented with PyTorch and torchvision, can improve accuracy by reducing overfitting. We use the 2013 Cats vs. Dogs dataset to illustrate the methodology.

## Google Colab Setup

In [None]:
# !pip install -U torch torchvision
# !pip install -U torchinfo tqdm

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')
# DATA_DIR = '/content/drive/MyDrive/COSCI224 Machine Learning 3 Notebooks/data/'
# IMG_DIR = '/content/drive/MyDrive/COSCI224 Machine Learning 3 Notebooks/images/'
# MODEL_DIR = '/content/drive/MyDrive/COSCI224 Machine Learning 3 Notebooks/models/'

DATA_DIR = 'data/'
IMG_DIR = 'images/'
MODEL_DIR = 'models/'

# The Dogs vs. Cats Dataset (Kaggle 2013)

The Dogs vs. Cats dataset that you’ll use isn’t packaged with PyTorch or torchvision. It was made available by Kaggle as part of a computer-vision competition in late 2013, back when CNNs weren’t mainstream. You can download the original dataset from [here](https://www.kaggle.com/c/dogs-vs-cats/data).

The pictures are medium-resolution color JPEGs, and here are some examples:

![cats_vs_dogs_samples](https://user-images.githubusercontent.com/25600601/134775470-7cf33e7e-f2d0-4a89-85b1-4963bf1c99da.jpg)

Unsurprisingly, the dogs-vs-cats Kaggle competition in 2013 was won by entrants who used CNNs. The best entries achieved up to 95% accuracy. In this example, you’ll get up to $83\%$ accuracy on held-out images even though you’ll train your models on less than 10% of the data that was available to the competitors.

This dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543 MB (compressed). This notebook works with a much smaller, pre-filtered subset: a training set with 1,000 samples of each class and a validation set with 500 samples of each class.


# Step 1. Get the data
- Copy images to training, validation, and test directories

Use these commands if you want to download the data and run the notebook in your own directory.

In [2]:
# import os, shutil

# The path to the directory where the original
# dataset was uncompressed
# original_dataset_dir = DATA_DIR + 'dogs_vs_cats/train'

# The directory where we will
# store our smaller dataset
# base_dir = 'dogs_vs_cats/results/'
# os.makedirs(base_dir, exist_ok=True)

In [3]:
# # Use this to download the data
# !wget --no-check-certificate \
#     https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \
#     -O /tmp/cats_and_dogs_filtered.zip

In [4]:
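import os
import zipfile

# Path to the archive; if you used the wget command above,
# set this to '/tmp/cats_and_dogs_filtered.zip' instead.
local_zip = 'cats_and_dogs_filtered.zip'

# Extract into /tmp; the context manager closes the archive automatically
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('/tmp')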

In [5]:
base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# Directory with our training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')

# Directory with our training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Directory with our validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')

# Directory with our validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

Here are now the filenames of the `cats` and `dogs` `train` directories (file naming conventions are the same in the `validation` directory):

In [6]:
train_cat_fnames = os.listdir(train_cats_dir)
print(train_cat_fnames[:10])

train_dog_fnames = os.listdir(train_dogs_dir)
train_dog_fnames.sort()
print(train_dog_fnames[:10])



In [7]:
print('total training cat images:', len(os.listdir(train_cats_dir)))



In [8]:
print('total training dog images:', len(os.listdir(train_dogs_dir)))



In [9]:
print('total validation cat images:', len(os.listdir(validation_cats_dir)))



In [10]:
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))



Let's look at some of the dog and cat images.

In [11]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

# Index for iterating over images
pic_index = 0

In [12]:
# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_cat_pix = [os.path.join(train_cats_dir, fname)
                for fname in train_cat_fnames[pic_index-8:pic_index]]
next_dog_pix = [os.path.join(train_dogs_dir, fname)
                for fname in train_dog_fnames[pic_index-8:pic_index]]

for i, img_path in enumerate(next_cat_pix+next_dog_pix):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off') # Don't show axes (or gridlines)

  img = mpimg.imread(img_path)
  plt.imshow(img)

plt.show()



So you do indeed have 2,000 training images and 1,000 validation images; the filtered archive used here does not include a separate test split, so the validation set serves as the held-out data. Each split contains the same number of samples from each class: this is a balanced binary-classification problem, which means classification accuracy will be an appropriate measure of success.

# Step 2. Build your NN
In the previous example, we built a small CNN for MNIST. You’ll reuse the same general structure: the CNN will be a stack of alternating `nn.Conv2d` (with ReLU activation) and `nn.MaxPool2d` layers. But because you’re dealing with bigger images and a more complex problem (previously we used MNIST with $28 \times 28$ size and a single channel), you’ll have to make your network larger.

Accordingly, it will have one more `nn.Conv2d` + `nn.MaxPool2d` stage. This serves both to increase the capacity of the network and to further reduce the size of the feature maps so they aren’t overly large when you reach the `nn.Flatten` layer. Here, because you start from inputs of size 150 × 150 (a somewhat arbitrary choice), you end up with feature maps of size 7 × 7 just before the `nn.Flatten` layer.

<div class="alert alert-block alert-success">

**Note:** The depth of the feature maps progressively increases in the network (from 32 to 128), whereas the size of the feature maps decreases (from 148 × 148 to 7 × 7).

This is a pattern you’ll see in almost all CNNs. Because you’re attacking a binary-classification problem, you’ll end the network with a single unit (a dense layer of size 1) and a sigmoid activation, which squashes the output into the (0, 1) range. This unit will encode the probability that the network is looking at one class or the other.

</div>
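
The size arithmetic described above can be checked with a few lines of plain Python. This is only a sketch: the helper names `conv_out` and `pool_out` are made up for illustration, and the formulas assume 3 × 3 convolutions without padding and 2 × 2 max pooling with a stride equal to the kernel size, as in the network defined below.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Spatial size after a convolution (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial size after max pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


size = 150                      # input images are 150 x 150
channels = [32, 64, 128, 128]   # output depth of each conv stage
trace = []

for out_channels in channels:
    after_conv = conv_out(size)
    size = pool_out(after_conv)
    trace.append((out_channels, after_conv, size))
    print(f"depth {out_channels:3d}: conv -> {after_conv} x {after_conv}, "
          f"pool -> {size} x {size}")

# Number of values handed to the first dense layer after flattening
flat_features = channels[-1] * size * size
print("flattened features:", flat_features)
```

The last line gives the input size that the first fully connected layer has to accept.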

## Instantiate a small CNN for Dogs vs. Cats

In [13]:
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=(3, 3), padding=0) # No padding, which matches Keras' default ('valid') behavior
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(32, 64, kernel_size=(3, 3), padding=0)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv3 = nn.Conv2d(64, 128, kernel_size=(3, 3), padding=0)
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv4 = nn.Conv2d(128, 128, kernel_size=(3, 3), padding=0)
        self.relu4 = nn.ReLU()
        self.pool4 = nn.MaxPool2d(kernel_size=(2, 2))
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(6272, 512) # Calculated the input size to the first dense layer
        self.relu5 = nn.ReLU()
        self.fc2 = nn.Linear(512, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = self.pool3(self.relu3(self.conv3(x)))
        x = self.pool4(self.relu4(self.conv4(x)))
        x = self.flatten(x)
        x = self.relu5(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return x

Let’s look at how the dimensions of the feature maps change with every successive layer:

In [14]:
# !pip install -U torch --force-reinstall

In [15]:
# !pip install -U torchinfo

In [16]:
from torchinfo import summary

model = Net()
summary(model, input_size=(1, 3, 150, 150), col_names=["input_size", "output_size", "num_params"], depth=4)
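
As a cross-check on the summary, the parameter counts can be derived by hand. The sketch below assumes the layer sizes of `Net` above (3 × 3 kernels, one bias per output channel or unit); the helper names are illustrative and not part of any library.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def linear_params(in_features, out_features):
    """One weight per input plus one bias, for every output unit."""
    return (in_features + 1) * out_features


layer_params = {
    "conv1": conv2d_params(3, 32),
    "conv2": conv2d_params(32, 64),
    "conv3": conv2d_params(64, 128),
    "conv4": conv2d_params(128, 128),
    "fc1": linear_params(6272, 512),
    "fc2": linear_params(512, 1),
}

for name, count in layer_params.items():
    print(f"{name}: {count:,}")

total_params = sum(layer_params.values())
print(f"total: {total_params:,}")
```

Most of the parameters sit in `fc1`, which is why reducing the feature maps to 7 × 7 before flattening matters.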



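Unlike Keras, PyTorch has no separate compilation step: the optimizer and the loss function are created directly and used inside the training loop. The training cell below uses the Adam optimizer. Because you ended the network with a single sigmoid unit, you’ll use binary cross-entropy (`nn.BCELoss`) as the loss.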
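
To make the loss concrete, here is a minimal plain-Python sketch of binary cross-entropy applied to sigmoid outputs. The logits and labels are made-up values, and the `eps` clamp is an illustrative safeguard against `log(0)`, not a description of how `nn.BCELoss` is implemented.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for one sample: p is the predicted probability, y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # illustrative clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


logits = [2.0, -1.0, 0.0]   # hypothetical raw outputs of the last layer
labels = [1, 0, 1]          # hypothetical targets (0 = cat, 1 = dog)

probs = [sigmoid(z) for z in logits]
losses = [binary_cross_entropy(p, y) for p, y in zip(probs, labels)]
mean_loss = sum(losses) / len(losses)

for p, y, loss in zip(probs, labels, losses):
    print(f"p={p:.3f}  y={y}  loss={loss:.3f}")
print(f"mean loss: {mean_loss:.3f}")
```

A confident correct prediction contributes a small loss, while an undecided output of 0.5 contributes `log(2)` regardless of the label.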

# Step 3. Data preprocessing

As you know by now, data should be formatted into appropriately preprocessed floating point tensors before being fed into the network. Currently, the data sits on a drive as JPEG files, so the steps for getting it into the network are roughly as follows:

1. Read the picture files.   
2. Decode the JPEG content to RGB grids of pixels.   
3. Convert these into floating-point tensors.   
4. Rescale the pixel values (between 0 and 255) to the [0, 1] interval (neural networks prefer to deal with small input values).
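
Steps 3 and 4 can be sketched with NumPy alone. The example below is an assumption-laden illustration: a random `uint8` array stands in for a decoded JPEG (steps 1 and 2 are skipped), and the final transpose mimics the channels-first layout that PyTorch models expect. In the notebook itself, `transforms.ToTensor()` takes care of the conversion, rescaling, and layout change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a decoded JPEG: height x width x RGB, integer values in 0..255
decoded = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# Step 3: convert to floating point
as_float = decoded.astype(np.float32)

# Step 4: rescale pixel values from [0, 255] to [0, 1]
scaled = as_float / 255.0

# PyTorch convention: channels first, i.e. (C, H, W) instead of (H, W, C)
channels_first = np.transpose(scaled, (2, 0, 1))

print(decoded.shape, decoded.dtype)
print(channels_first.shape, channels_first.dtype)
print(float(channels_first.min()), float(channels_first.max()))
```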

In [17]:
# pip show torch torchvision

In [18]:
# %pip install torch==2.5.0 torchvision==0.20.0 --force-reinstall
%pip install torch==2.0.0 torchvision==0.15.0 --force-reinstall



In [19]:
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Define the batch size
BATCH_SIZE = 25

# Define the transformations for the training data
train_transform = transforms.Compose([
    transforms.Resize((150, 150)),  # Resize to the target size
    transforms.ToTensor(),  # Convert to a PyTorch tensor
])

# Define the transformations for the validation data
validation_transform = transforms.Compose([
    transforms.Resize((150, 150)),
    transforms.ToTensor(),
])

# Create the training & validation dataset
train_dataset = ImageFolder(root=train_dir, transform=train_transform)
validation_dataset = ImageFolder(root=validation_dir, transform=validation_transform)

# Create the training & validation dataloaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, pin_memory=True)
validation_loader = DataLoader(validation_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)

# Optional: Verify the class mapping (important for consistency with Keras)
print("Class mapping:", train_dataset.class_to_idx)



Let’s look at the output of one of these data loaders: it yields batches of 150 × 150 RGB images (shape `(25, 3, 150, 150)`, channels first) and integer class labels (shape `(25,)`). There are 25 samples in each batch (the batch size). Unlike a Keras generator, a `DataLoader` does not loop endlessly: iterating over it makes one pass over the dataset and then stops.

To inspect a single batch, break out of the loop after the first iteration:

In [20]:
# Example of how to iterate through the data loaders
for images, labels in train_loader:
    print("Image batch shape:", images.shape)
    print("Labels batch shape:", labels.shape)
    break  # Just to show one batch



Let’s fit the model using the data loaders. PyTorch has no built-in `fit` method, so the training loop is written by hand: each epoch makes one full pass over `train_loader`, running a forward pass, a backward pass, and an optimizer step for every batch.

Because a `DataLoader` is finite, there is no need for an argument such as Keras' `steps_per_epoch`: the number of gradient-descent steps per epoch follows from the dataset size and the batch size. In this case, batches are 25 samples, so it takes 80 batches to see all 2,000 training samples.

Validation works the same way: after each training pass, the loop switches the model to evaluation mode, disables gradient tracking, and makes one pass over `validation_loader` to compute the validation loss and accuracy. No `validation_steps` argument is needed either.
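
The number of batches per epoch is simple arithmetic, sketched below with a hypothetical helper. It mirrors how the length of a `DataLoader` depends on the dataset size, the batch size, and the `drop_last` option; the dataset sizes are the ones quoted in this notebook.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches in one pass over the data (hypothetical helper)."""
    if drop_last:
        return num_samples // batch_size       # incomplete last batch is discarded
    return math.ceil(num_samples / batch_size)  # incomplete last batch is kept


train_batches = batches_per_epoch(2000, 25)   # training set, batch size 25
val_batches = batches_per_epoch(1000, 25)     # validation set, batch size 25
aug_batches = batches_per_epoch(2000, 80)     # training set, batch size 80

print(train_batches, val_batches, aug_batches)

# When the batch size does not divide the dataset size, the last batch is smaller
print(batches_per_epoch(2000, 64), batches_per_epoch(2000, 64, drop_last=True))
```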

# Step 4. Fit the model

Set up parameters and settings.

In [21]:
# Declare relevant parameters
NUM_EPOCHS = 50

In [None]:
from tqdm import tqdm
import torch.optim as optim

# Check CUDA availability
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("CUDA is available. Training on GPU.")
else:
    device = torch.device("cpu")
    print("CUDA is not available. Training on CPU.")

# Move model to device
model = Net()
model.to(device)

# Optimizer and loss function
optimizer = optim.Adam(model.parameters())
criterion = nn.BCELoss()

# Store training history
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

for epoch in range(NUM_EPOCHS):
    # Training phase
    model.train()  # Set the model to training mode
    train_loss = 0.0
    train_correct = 0
    for images, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{NUM_EPOCHS} [Train]"):
        images, labels = images.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels.float().unsqueeze(1)) # Calculate loss - ensure labels are float and have correct shape

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Statistics
        train_loss += loss.item() * images.size(0)  # Multiply by batch size to get total loss for this batch
        preds = (outputs > 0.5).float() # Convert probabilities to binary predictions
        train_correct += (preds == labels.float().unsqueeze(1)).sum().item() # Count correct predictions

    # Validation phase
    model.eval()  # Set the model to evaluation mode
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():  # No need to track gradients during validation
        for images, labels in tqdm(validation_loader, desc=f"Epoch {epoch+1}/{NUM_EPOCHS} [Val]"):
            images, labels = images.to(device), labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels.float().unsqueeze(1))

            # Statistics
            val_loss += loss.item() * images.size(0)
            preds = (outputs > 0.5).float()
            val_correct += (preds == labels.float().unsqueeze(1)).sum().item()

    # Calculate average losses and accuracies for the epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = train_correct / len(train_loader.dataset)
    val_loss = val_loss / len(validation_loader.dataset)
    val_acc = val_correct / len(validation_loader.dataset)

    # Store the results in the history dictionary
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)

    # Print epoch results
    print(f"Epoch {epoch+1}/{NUM_EPOCHS}: "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
          f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

# Step 5. Saving the model
It’s good practice to always save your models after training.

## Step 5A Saving the whole model
One option is to save the entire model.

In [None]:
torch.save(model, MODEL_DIR + '1B-cats_and_dogs_small.pth')



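To load the model, call `torch.load` on the saved file. Because the whole object was pickled, the `Net` class definition must still be available (defined or importable) when loading. PyTorch 2.6 and later default to `weights_only=True`, which rejects pickled model objects, so the flag is set explicitly here; only do this with files you trust.

In [None]:
loaded_model = torch.load(MODEL_DIR + '1B-cats_and_dogs_small.pth', weights_only=False)
loaded_model.eval()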

## Step 5B Saving the trained model's state dictionary
Another option is to save and load only the state dictionary. This method requires a properly defined model class that is compatible with the saved `state_dict`.

In [None]:
torch.save(model.state_dict(), MODEL_DIR + '1B-cats_and_dogs_small_state-dict.pth')

In [None]:
loaded_model = Net()  # Assuming 'Net' is your model class
loaded_model.load_state_dict(torch.load(MODEL_DIR + '1B-cats_and_dogs_small_state-dict.pth', weights_only=True))
loaded_model.eval()
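
The round trip can be tried without touching the disk. The sketch below is illustrative only: `TinyNet` is a hypothetical stand-in for `Net`, and an in-memory buffer replaces the `.pth` file. It shows that a freshly constructed model reproduces the original outputs once the saved `state_dict` has been loaded into it.

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for Net, kept small for illustration."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc(x))


torch.manual_seed(0)
original = TinyNet()

# Save only the parameters (here to an in-memory buffer instead of a file)
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# Rebuild the architecture from the class, then load the saved parameters
restored = TinyNet()
restored.load_state_dict(torch.load(buffer, weights_only=True))
restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.equal(original(x), restored(x))

state_keys = sorted(original.state_dict().keys())
print(state_keys)
print("outputs match:", same_output)
```

The keys of the state dictionary are derived from the attribute names in the class, which is why the class used for loading has to match the one used for saving.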

<div class="alert alert-block alert-warning">

## ❓ Question ❓
What are the advantages and disadvantages of using `state_dict` versus saving and loading the entire model?
</div>

Let’s plot the loss and accuracy of the model over the training and validation data during training.

# Step 6: Displaying curves of loss and accuracy during training

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

acc = history['train_acc']
val_acc = history['val_acc']
loss = history['train_loss']
val_loss = history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

These plots are characteristic of **overfitting**. The training accuracy increases linearly over time, until it reaches nearly 100%, whereas the validation accuracy stalls at 70–73%. The validation loss reaches its minimum after only five epochs and then stalls, whereas the training loss keeps decreasing linearly until it reaches nearly 0. Because you have relatively few training samples (2,000), overfitting will be your number-one concern.

You already know about a number of techniques that can help mitigate overfitting, such as L2 regularization (or weight decay). Another is dropout, which randomly switches off units during training. We’re now going to work with a new one, specific to computer vision and used almost universally when processing images with deep-learning models: data augmentation. This is very important when you have limited image data!

<div class="alert alert-block alert-info">

## ⚠️ Checkpoint ⚠️

In the next 5-10 minutes, check with your LT that you understand everything that has happened so far.
</div>

# Step 7: Data augmentation

Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit.

Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, your model will never see the exact same picture twice. This helps expose the model to more aspects of the data and generalize better.

In PyTorch, this can be done by composing a number of random transformations from `torchvision.transforms` and passing them to the dataset, so that they are applied each time an image is loaded. Let’s get started with an example.

Note we only do augmentation for the training set.
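
To see what "random transformations" means in practice, here is a small NumPy sketch. It is a hand-rolled illustration, not torchvision's implementation: `hflip`, `translate`, and `augment` are hypothetical helpers covering only a horizontal flip and a shift, and a random array stands in for a real photo.

```python
import numpy as np


def hflip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]


def translate(img, dx, dy):
    """Shift by dx columns and dy rows, filling exposed pixels with 0."""
    h, w, _ = img.shape
    out = np.zeros_like(img)
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment(img, rng, max_shift_frac=0.1, flip_prob=0.5):
    """Apply a random flip followed by a random shift of up to 10% of the size."""
    h, w, _ = img.shape
    if rng.random() < flip_prob:
        img = hflip(img)
    max_dx, max_dy = int(w * max_shift_frac), int(h * max_shift_frac)
    dx = int(rng.integers(-max_dx, max_dx + 1))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    return translate(img, dx, dy)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 20, 3), dtype=np.uint8)  # stand-in photo

# Each call draws new random parameters, so one source image yields many variants
variants = [augment(image, rng) for _ in range(4)]
print([v.shape for v in variants])
```

Because new random parameters are drawn on every call, the model is shown a slightly different version of each image in every epoch, while the label stays the same.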

In [None]:
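data_augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),  # Rotate by a random angle in [-30, 30] degrees
    # Random shift (up to 10% of width/height), zoom (0.9x to 1.1x), and shear.
    # Note: torchvision measures shear in degrees, so 0.1 is a very mild shear.
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=0.1),
    transforms.RandomHorizontalFlip(p=0.5),  # Mirror half of the images left to right
    transforms.Resize((150, 150)),  # Resize to the target size
    transforms.ToTensor()  # Convert to a float tensor scaled to [0, 1]
])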

In [None]:
BATCH_SIZE = 80

In [None]:
# Create the training dataset with augmentation
train_dataset_augment = ImageFolder(root=train_dir, transform=data_augment)
train_loader_augment = DataLoader(train_dataset_augment, batch_size=BATCH_SIZE, shuffle=True, num_workers=6, pin_memory=True)

# Step 8: Adding Dropout
To further combat overfitting, let's also add a **Dropout** layer to your model, right before the densely connected classifier.

Dropout reduces overfitting by preventing a layer from relying too heavily on any particular input: during training, each input to the layer is randomly set to zero with a given probability. Dropout was introduced by Srivastava, Hinton, *et al.*, and you can download the research paper here: http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

<img width="981" alt="dropout" src="https://user-images.githubusercontent.com/25600601/134776375-ad90c14a-1a42-4ce9-8d84-13f4e6a99e26.png">
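
The mechanism can be sketched in plain Python. This is an illustrative re-implementation rather than PyTorch's code: the function name and inputs are made up, and it uses the "inverted dropout" convention that `nn.Dropout` also follows, where surviving values are scaled by `1 / (1 - p)` during training and nothing is changed at evaluation time.

```python
import random


def dropout(values, p=0.3, training=True, rng=random):
    """Zero each value with probability p; rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)          # evaluation mode: identity
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(0)
activations = [1.0] * 10000          # hypothetical layer inputs

train_out = dropout(activations, p=0.3, training=True, rng=rng)
eval_out = dropout(activations, p=0.3, training=False)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_after = sum(train_out) / len(train_out)

print(f"dropped fraction: {dropped_fraction:.3f}")
print(f"mean during training: {mean_after:.3f}")
print(f"unchanged at evaluation: {eval_out == activations}")
```

The rescaling keeps the expected value of each input the same in both modes, which is why `model.train()` and `model.eval()` must be set correctly around training and validation.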

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=(3, 3), padding='valid') # 'valid' means no padding, which matches Keras' default behavior
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(32, 64, kernel_size=(3, 3), padding='valid')
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv3 = nn.Conv2d(64, 128, kernel_size=(3, 3), padding='valid')
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv4 = nn.Conv2d(128, 128, kernel_size=(3, 3), padding='valid')
        self.relu4 = nn.ReLU()
        self.pool4 = nn.MaxPool2d(kernel_size=(2, 2))
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(0.3)
        self.fc1 = nn.Linear(6272, 512) # Calculated the input size to the first dense layer
        self.relu5 = nn.ReLU()
        self.fc2 = nn.Linear(512, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = self.pool3(self.relu3(self.conv3(x)))
        x = self.pool4(self.relu4(self.conv4(x)))
        x = self.flatten(x)
        x = self.dropout(x)
        x = self.relu5(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return x

In [None]:
model = Net()  # Assuming Net is your model class
summary(model, input_size=(1, 3, 150, 150), col_names=["input_size", "output_size", "num_params"], depth=4)

Let’s train the network using data augmentation and dropout.

In [None]:
# Declare relevant parameters
NUM_EPOCHS = 100

In [None]:
# Check CUDA availability
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("CUDA is available. Training on GPU.")
else:
    device = torch.device("cpu")
    print("CUDA is not available. Training on CPU.")

# Move model to device
model = Net()
model.to(device)

# Optimizer and loss function
optimizer = optim.Adam(model.parameters())
criterion = nn.BCELoss()

# Store training history
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

for epoch in range(NUM_EPOCHS):
    # Training phase
    model.train()  # Set the model to training mode
    train_loss = 0.0
    train_correct = 0
    for images, labels in tqdm(train_loader_augment, desc=f"Epoch {epoch+1}/{NUM_EPOCHS} [Train]"):
        images, labels = images.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels.float().unsqueeze(1)) # Calculate loss - ensure labels are float and have correct shape

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Statistics
        train_loss += loss.item() * images.size(0)  # Multiply by batch size to get total loss for this batch
        preds = (outputs > 0.5).float() # Convert probabilities to binary predictions
        train_correct += (preds == labels.float().unsqueeze(1)).sum().item() # Count correct predictions

    # Validation phase
    model.eval()  # Set the model to evaluation mode
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():  # No need to track gradients during validation
        for images, labels in tqdm(validation_loader, desc=f"Epoch {epoch+1}/{NUM_EPOCHS} [Val]"):
            images, labels = images.to(device), labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels.float().unsqueeze(1))

            # Statistics
            val_loss += loss.item() * images.size(0)
            preds = (outputs > 0.5).float()
            val_correct += (preds == labels.float().unsqueeze(1)).sum().item()

    # Calculate average losses and accuracies for the epoch
    train_loss = train_loss / len(train_loader_augment.dataset)
    train_acc = train_correct / len(train_loader_augment.dataset)
    val_loss = val_loss / len(validation_loader.dataset)
    val_acc = val_correct / len(validation_loader.dataset)

    # Store the results in the history dictionary
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)

    # Print epoch results
    print(f"Epoch {epoch+1}/{NUM_EPOCHS}: "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
          f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

<div class="alert alert-block alert-info">

## ⚠️ Checkpoint ⚠️

In the next 5-10 minutes, check with your LT that you understand everything that has happened so far. In particular:
- How does **Data Augmentation** work and how did it help with overfitting?
- How does **Dropout** work and how did it help with overfitting?
</div>

# Step 9: Saving the model one more time.

In [None]:
torch.save(model, MODEL_DIR + '1B-cats_and_dogs_small_v2.pth')

Let’s plot the results again. Thanks to data augmentation and dropout, you’re no longer overfitting: the training curves are closely tracking the validation curves. You now reach an accuracy of 83+%, an improvement of 11-12 percentage points over the non-regularized model.

In [None]:
%matplotlib inline

acc = history['train_acc']
val_acc = history['val_acc']
loss = history['train_loss']
val_loss = history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

# Step 10: Prototype testing
Let's now test the trained network on a single image that was not part of the training data.

In [None]:
from PIL import Image

img_path = IMG_DIR + 'dog.jpg'

# 1. Load the image using PIL and force three RGB channels
img = Image.open(img_path).convert('RGB')

# 2. Define the transformation to convert to a tensor (and resize)
transform = transforms.Compose([
    transforms.Resize((150, 150)),  # Resize to the 150 x 150 input the network expects
    transforms.ToTensor()            # Converts to tensor and scales to [0, 1]
])

# 3. Apply the transformation
img_tensor = transform(img)

# 4. Add a batch dimension (the model expects a batch of images)
img_tensor = img_tensor.unsqueeze(0)  # Now shape is (1, C, H, W)

print(img_tensor.shape)
print(img_tensor.dtype)

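# 5. Run the model in evaluation mode (disables dropout) without tracking gradients
model.eval()
with torch.no_grad():
    prediction = model(img_tensor.float().to(device))

# ImageFolder assigns labels alphabetically: cats -> 0, dogs -> 1
if prediction.item() >= 0.5:
    print("DOG")
else:
    print("CAT")
print(prediction)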

In [None]:
img