In [1]:
# Install the watermark package.
# This package is used to record the versions of other packages used in this Jupyter notebook.
# https://github.com/rasbt/watermark
!pip install -q -U watermark

In [26]:
import torch  # Importing the PyTorch library for deep learning
import torch.nn as nn  # Importing the neural network module of PyTorch
import torch.optim as optim  # Importing the optimization module of PyTorch
import torchvision.transforms as transforms  # Importing image transformations from torchvision
import torchvision.models as models  # Importing pre-trained models from torchvision
from PIL import Image  # Importing the Python Imaging Library for image processing

In [30]:
import warnings

# Disable all warnings
warnings.filterwarnings("ignore")

In [31]:
# Load the watermark extension to display information about the Python version and installed packages.
%reload_ext watermark

# Display the versions of Python and installed packages.
%watermark -a 'Fabiano Falcão' -ws "https://fabianumfalco.github.io/" --python --iversions

Author: Fabiano Falcão

Website: https://fabianumfalco.github.io/

Python implementation: CPython
Python version       : 3.10.6
IPython version      : 8.11.0

PIL        : 9.0.1
torch      : 2.0.0
torchvision: 0.15.1



In [32]:
# Set the device to CUDA (GPU) if available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  
print(f"Using device: {device}")  # Print the device being used

Using device: cuda


# ResNet-50

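ResNet-50 is a convolutional neural network (CNN) architecture from the ResNet family, first introduced in the paper "Deep Residual Learning for Image Recognition" by Kaiming He et al. in 2015. The family was proposed to make very deep networks trainable. The paper targets the degradation problem, where adding more layers to a plain network makes training accuracy worse, and its skip connections also ease the flow of gradients through deep networks.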

ResNet-50 has 50 weighted layers: 49 convolutional layers on the main path and one fully connected layer, interleaved with batch normalization, ReLU activations, and pooling. The main innovation of the ResNet family is the residual (skip) connection: each block adds its input to its output, so gradient information can propagate directly through the identity path, even in deep networks. These residual connections enable the model to learn richer and deeper representations of the images, thereby improving the accuracy of object recognition.

The pre-trained ResNet-50 weights used here were trained on ImageNet (the ILSVRC-2012 subset, with about 1.28 million training images across 1,000 classes). This allowed the model to learn general features of a wide range of objects and textures. ResNets achieved state-of-the-art results on ImageNet when they were introduced, and ResNet-50 has since become a standard baseline for image classification.

Furthermore, due to its deep learning capacity and rich representations, ResNet-50 has also been used as a base for transfer learning, where the learned features can be transferred to related tasks such as object detection and semantic segmentation.

In summary, ResNet-50 is a powerful CNN architecture that overcame the limitations of deep networks by introducing residual connections. It excelled in image classification tasks and has served as a foundation for many other computer vision applications.
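
To make the idea of a residual connection concrete, here is a minimal sketch of a residual block in PyTorch. The class name `TinyResidualBlock` and the layer sizes are illustrative assumptions; ResNet-50 itself uses three-layer bottleneck blocks (1x1, 3x3, 1x1 convolutions) rather than this simplified two-layer form.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Illustrative residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 convolutions that keep the spatial size and channel count
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x  # the skip path carries the input unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Add the input back before the final activation
        return self.relu(out + identity)


block = TinyResidualBlock(channels=8).eval()
x = torch.randn(1, 8, 16, 16)
y = block(x)
print(y.shape)  # same shape as the input
```

Because the block only has to learn the residual `F(x)`, it can fall back to something close to the identity mapping when extra depth is not useful.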

In [33]:
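# Load the pre-trained ResNet-50 model from torchvision
# https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html
# https://pytorch.org/hub/nvidia_deeplearningexamples_resnet50/
# The `pretrained=True` argument is deprecated since torchvision 0.13;
# `weights=ResNet50_Weights.IMAGENET1K_V1` loads the same weights.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)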

# Move the model to GPU if available
model = model.to(device)

# Set the model to evaluation mode.
# In this mode, layers that behave differently during training and inference switch to
# their inference behavior: batch normalization uses its running statistics and dropout
# (not used in ResNet-50) is disabled. eval() does not turn off gradient computation,
# so gradients with respect to the input image can still be computed later.
model.eval()


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

# Cross entropy loss

Cross entropy loss, also known as softmax loss or log loss, is a commonly used loss function in machine learning for multi-class classification tasks. It measures the dissimilarity between the predicted probability distribution and the true distribution of the target labels.

In the context of classification, the cross entropy loss quantifies the difference between the predicted class probabilities and the actual class labels. It calculates the average negative log likelihood of the predicted probabilities for the true class labels. The goal is to minimize this loss function during the training process to improve the accuracy of the model.

The formula for cross entropy loss involves taking the logarithm of the predicted probabilities, multiplying them with the true label's one-hot encoded representation, and summing the values across all classes. The loss is then averaged over the batch or the entire dataset.

By optimizing the cross entropy loss, the model learns to assign higher probabilities to the correct class labels and lower probabilities to the incorrect ones. This loss function encourages the model to produce more confident predictions for the correct classes and penalizes uncertain or incorrect predictions.

The cross entropy loss is widely used because it is effective for optimizing models in multi-class classification tasks. It provides a gradient signal that guides the model's parameters towards better predictions, facilitating the learning process. Many deep learning frameworks, including PyTorch, provide an implementation of the cross entropy loss for easy integration into the training pipeline.
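
As a small worked example of the description above, the following sketch computes the cross entropy for a single sample in plain Python. The helper names `softmax` and `cross_entropy` and the example logits are assumptions made for illustration. With a one-hot target, the sum over classes reduces to the negative log of the probability assigned to the true class.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


logits = [2.0, 1.0, 0.1]  # hypothetical model scores for three classes

loss_correct = cross_entropy(logits, 0)  # true class is the top-scoring one
loss_wrong = cross_entropy(logits, 2)    # true class is the lowest-scoring one

print(f"loss when the true class is 0: {loss_correct:.4f}")
print(f"loss when the true class is 2: {loss_wrong:.4f}")
```

The loss is small when the model gives the true class a high probability and grows as that probability shrinks. PyTorch's `nn.CrossEntropyLoss` performs the softmax step internally, so it takes raw logits rather than probabilities.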

In [34]:
# Define the loss function for multi-class classification.
# nn.CrossEntropyLoss() expects raw logits (it applies log-softmax internally) and integer
# class indices. No training happens in this notebook: the loss is only used to compute
# gradients with respect to the input image.
criterion = nn.CrossEntropyLoss()

In [35]:
# The transforms.Compose() function is used to chain the transformations together into 
# a single pipeline, allowing for easy and efficient preprocessing of the input images before
# feeding them into the model.

preprocess = transforms.Compose([
    transforms.Resize(256),  # Resize so the shorter side is 256 pixels, keeping the aspect ratio
    transforms.CenterCrop(224),  # Crop the center of the image to a size of 224x224 pixels
    transforms.ToTensor()  # Convert the image to a float tensor of shape (C, H, W) with values in [0, 1]
])
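
The pipeline above passes raw [0, 1] pixel values to the model, whereas torchvision's ImageNet weights are documented as expecting inputs normalized with the ImageNet channel mean and standard deviation. One way to keep the attack in pixel space while still normalizing is to fold the normalization into the model. The sketch below is an assumption-laden illustration, not part of the original notebook: `NormalizedModel` is a hypothetical wrapper, and `nn.Identity()` stands in for the real network so that the example runs without downloading weights.

```python
import torch
import torch.nn as nn

# Channel statistics documented for torchvision's ImageNet models
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


class NormalizedModel(nn.Module):
    """Hypothetical wrapper: normalizes [0, 1] pixels, then calls the wrapped model."""

    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        # Buffers follow the module across devices with .to(device)
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        return self.model((x - self.mean) / self.std)


# nn.Identity() is a stand-in for the classifier in this sketch
wrapped = NormalizedModel(nn.Identity(), IMAGENET_MEAN, IMAGENET_STD)

# An image whose pixels equal the channel means normalizes to all zeros
pixels = torch.tensor(IMAGENET_MEAN).view(1, 3, 1, 1).expand(1, 3, 4, 4)
normalized = wrapped(pixels)
print(normalized.abs().max().item())
```

With such a wrapper, the image tensor stays in [0, 1], so it can still be converted back to a PIL image directly and epsilon keeps its meaning as a pixel-space bound.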

In [36]:
# Preprocess the input image and convert it into a tensor
# .unsqueeze(0): This adds an extra dimension to the tensor, making it a 4-dimensional tensor
# with a batch size of 1. This is necessary as many deep learning models expect inputs in batch format.
image = preprocess(Image.open('dog.jpg')).unsqueeze(0)  

# Move the image tensor to the same device as the model (GPU if available)
image = image.to(device)

In [37]:
# Remove the extra batch dimension from the image tensor
image2 = torch.squeeze(image, 0)

# Create an instance of the ToPILImage transformation
T = transforms.ToPILImage()

# Convert the tensor image back to a PIL Image object
img = T(image2)

# Save the PIL Image as 'dog_original.jpg'
img.save('dog_original.jpg')

In [38]:
# Set the target label to 3
# In the ImageNet class index used by torchvision's ResNet-50, label 3 corresponds
# to the class "tiger shark".
# https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/
target_label = 3 # tiger shark, Galeocerdo cuvieri


# epsilon (ε)

In the context of adversarial attacks or perturbation-based techniques, epsilon (ε) is a small positive constant used to control the magnitude of perturbation applied to an image. It determines the maximum allowable change in pixel values for generating adversarial examples while maintaining visual similarity to the original image.

The purpose of using epsilon is to strike a balance between the effectiveness of the attack and the perceptibility of the perturbation. By setting an appropriate value for epsilon, one can control the level of distortion introduced to the image to deceive a machine learning model.

During an adversarial attack, a perturbation is added to the pixel values of the original image, and epsilon bounds its magnitude. In this notebook, pixel values lie in [0, 1] and epsilon is the maximum absolute change applied to any pixel. A smaller epsilon limits the perturbation, making it less likely to be noticed by humans but potentially less effective at fooling the model. A larger epsilon allows more significant perturbations, increasing the likelihood of misleading the model but potentially introducing noticeable visual changes.

The choice of epsilon depends on factors such as the sensitivity of the targeted model, the specific attack technique being employed, and the desired trade-off between stealthiness and attack success rate. It is often determined through experimentation and fine-tuning to achieve the desired level of adversarial perturbation while minimizing perceptibility.
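
The role of epsilon as a bound can be sketched with a single gradient-sign step on a toy model. Everything here is an illustrative assumption: `fgsm_step` is a hypothetical helper, the small linear classifier replaces ResNet-50 so the example runs quickly, and the input is random noise rather than a real image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def fgsm_step(model, x, label, epsilon):
    """One gradient-sign step that increases the loss for `label`."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), label)
    # Gradient of the loss with respect to the input pixels
    grad, = torch.autograd.grad(loss, x)
    x_adv = x + epsilon * grad.sign()
    # Keep the result a valid image with pixel values in [0, 1]
    return x_adv.clamp(0.0, 1.0).detach()


# Toy stand-in for an image classifier (10 classes, 3x8x8 inputs)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))

x = torch.rand(1, 3, 8, 8)   # random "image" with values in [0, 1]
label = torch.tensor([3])
epsilon = 0.03

x_adv = fgsm_step(toy_model, x, label, epsilon)
max_change = (x_adv - x).abs().max().item()
print(f"largest per-pixel change: {max_change:.4f}")
```

No pixel moves by more than epsilon, which is what makes epsilon a direct control on how visible the perturbation can be.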

In [39]:
# Set the value of epsilon to 0.03
epsilon = 0.03

# Enable gradient calculation for the image tensor
image.requires_grad = True

In [41]:
# Perform forward pass through the model
output = model(image)

# Calculate the loss using the criterion of the model's output compared to a target label
loss = criterion(output, torch.tensor([target_label]).to(device))

In [42]:
# Reset gradients to zero
model.zero_grad()

# Perform backward pass to calculate gradients
loss.backward()

In [43]:
# Take the sign of the gradient of the loss with respect to the input pixels.
# image.grad holds d(loss)/d(image). The gradient points in the direction of steepest
# ascent of the loss, and its sign gives the per-pixel direction used by the
# Fast Gradient Sign Method (FGSM).
sign_grad = torch.sign(image.grad)

# The loss was computed against target_label, so stepping along +sign_grad increases
# the loss for that class and pushes the prediction away from it. A targeted attack
# toward target_label would subtract the step instead.

# Generate the adversarial image by adding the perturbation based on the sign of the gradients
adversarial_image = image + epsilon * sign_grad

# Scaling the sign of the gradient by epsilon and adding it to the original image changes
# each pixel by exactly +epsilon or -epsilon (or leaves it unchanged where the gradient is zero).

# The resulting adversarial_image is a tensor with the same shape as the original image.
# The goal is an image that looks almost identical to the original but leads the model
# to a different prediction.

In [44]:
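# Remove the batch dimension, detach from the autograd graph, and clamp to [0, 1]
# so that out-of-range values do not wrap around when converted to 8-bit pixels
adversarial_image2 = torch.squeeze(adversarial_image, 0).detach().clamp(0, 1)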

# Convert the tensor to a PIL image
T = transforms.ToPILImage()
img = T(adversarial_image2)

# Save the adversarial image to disk
img.save('dog_adversarial.jpg')

In [51]:
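# ImageNet class indices are listed at:
# https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/

# Classify the adversarial image. The cell that defined adversarial_prediction is not
# shown, so this forward pass is an assumed reconstruction of that step.
with torch.no_grad():
    adversarial_prediction = torch.argmax(model(adversarial_image))

# Print the original prediction from the model
# 207 - golden retriever
print("Original prediction:", torch.argmax(output).item())

# Print the adversarial prediction obtained after applying the perturbation
# 222 - kuvasz
print("Adversarial prediction:", adversarial_prediction.item())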

Original prediction: 207
Adversarial prediction: 222
