# CAM and Object Detection (modified)

## Authors: 
Sat Arora, sat.arora@uwaterloo.ca \
Richard Fan, r43fan@uwaterloo.ca

### Project Goal ***REMOVE THIS***:
"CAM and object detection". First, you should implement some standard method for CAM for some (simple) classification network trained on image-level tags. You should also obtain object detection (spacial localization of the object approximate "center"). You should apply your approach to one specific object type (e.g. faces, or anything else). Training should be done on image-level tags (e.g. face, no face). You can come up with your specialized dataset, but feel free to use subsets of standard data. You can also test the ideas on real datasets where label noise is present.



## Abstract

Class Activation Maps (CAMs) are an important tool in Computer Vision. For a classification task, a CAM indicates the regions of an image that a Convolutional Neural Network (CNN) relied on when classifying the image as containing a certain object.

To explain what CAMs do, this report describes the motivation, ideas, and concepts behind our own CAM implementation. We then analyze the output in specific scenarios to better understand the algorithm's behaviour.

The approach and motivation are inspired by [Learning Deep Features for Discriminative Localization](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf) (Zhou, Khosla, Lapedriza, Oliva, Torralba), a paper published at CVPR 2016. We extend the approach by comparing a common classification CNN (specifically, ResNet18) with a CNN that we train ourselves, analyzing the differences in the predicted labels and the heat maps. These networks are trained on a face/no-face dataset with image-level labels.

## Team Contributions

Sat Arora: sat.arora@uwaterloo.ca
- Initial ResNet18 model for object detection.
- Heat map logic.
- Experimenting with multiple objects of same type.

Richard Fan: r43fan@uwaterloo.ca
- Creating custom model (and fine-tuning) for object detection.
- Heat map logic.
- Testing difference between custom model and ResNet18 model.

Fun fact: We were born on the same day.

## Motivation

### Conceptual Idea

As mentioned in the Abstract, the goal of CAMs is to indicate the regions of an image that are used by the CNN to identify a certain category.

In the case of categorization, the last layer before the output is a softmax layer, which determines which class is the most likely. If we apply a technique called **Global Average Pooling (GAP)** to the feature maps of the last convolutional layer, we can use the pooled values as the features of a fully-connected layer that produces the class scores fed into the softmax.

Note: The idea of GAP is straightforward. For a feature map $F_d$ of height $H$ and width $W$, it is defined as:
$$\text{GAP}(F_d) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_d(i, j)$$
Simply put, it averages each feature map into a single number, reducing the spatial dimensions of each map from $H \times W$ to $1 \times 1$.
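
As a small illustration of the formula above, here is a minimal NumPy sketch of GAP. The function name `global_average_pool` and the toy values are ours, chosen only for illustration; in PyTorch the same operation is normally done by a pooling layer rather than by hand.

```python
import numpy as np

def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Average each (H, W) feature map down to a single number.

    feature_maps has shape (D, H, W); the result has shape (D,).
    """
    d, h, w = feature_maps.shape
    # Sum over all spatial positions, then divide by H * W
    return feature_maps.reshape(d, h * w).sum(axis=1) / (h * w)

# Toy input: D = 2 feature maps of size 2 x 3 (illustrative values)
F = np.array([
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 0.0],
     [6.0, 6.0, 6.0]],
])

pooled = global_average_pool(F)
print(pooled)  # one average per feature map: 3.5 and 3.0
```

Each feature map contributes exactly one number, so a stack of $D$ maps becomes a vector of length $D$ regardless of $H$ and $W$.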

With this structure, we can exploit the fact that the class scores fed into the softmax are a linear combination of the pooled features: we can project the weights of the output layer back onto the convolutional feature maps. This leaves us with a heatmap of the regions that contribute most to a class score, since a location scores highly when feature maps with large weights for that class are strongly activated there. This technique is known as "Class Activation Mapping".

### How can this be more formally seen?
Say that $\forall (x,y)$, the activation of unit $k$ in the last convolutional layer of the CNN is $f_k(x,y)$. Then, after performing GAP, the pooled value for unit $k$ is $$F_k = \sum_{x,y}{f_k(x,y)}$$ where we drop the constant factor $\frac{1}{H \times W}$, since it rescales every $F_k$ equally and can be absorbed into the weights.

Thus, for an arbitrary class $c$, the input to the softmax in the final decision layer is $$S_c = \sum_k{w_k^cF_k}$$ where $w_k^c$ is the weight of unit $k$ for class $c$, i.e. the "importance" of $F_k$ for that class. Recall that the output of the softmax for class $c$ is $$\frac{\exp(S_c)}{\sum_{c_0}{\exp(S_{c_0})}}$$ If we plug $F_k = \sum_{x,y}{f_k(x,y)}$ into $S_c$, we get $$S_c = \sum_k{w_k^c\sum_{x,y}{f_k(x,y)}} = \sum_{x,y}{\sum_k{w_k^cf_k(x,y)}}$$

Define $M_c$ to be the CAM for $c$, with each spatial element $M_c(x,y) = \sum_k{w_k^cf_k(x,y)}$. Then we can rewrite the definition of the class score $S_c$ to be $$S_c = \sum_{x,y}{M_c(x,y)}$$

As such, we see that $M_c(x,y)$ is exactly the importance of the activation for $c$ at spatial coordinate $(x,y)$. 
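
The identity above can be checked numerically. The following is a minimal NumPy sketch with random toy values. The sizes `K`, `H`, `W`, `C` and all variable names are assumptions made for illustration and are not taken from our models.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W, C = 4, 3, 3, 2            # units, spatial size, classes (toy sizes)
f = rng.random((K, H, W))          # f_k(x, y): last-layer activations
w = rng.standard_normal((C, K))    # w_k^c: output-layer weights

# F_k = sum_{x,y} f_k(x, y)  (constant factor 1 / (H * W) omitted)
F = f.sum(axis=(1, 2))

# Class scores from the pooled features: S_c = sum_k w_k^c F_k
S_from_pooled = w @ F

# Class activation maps: M_c(x, y) = sum_k w_k^c f_k(x, y)
M = np.tensordot(w, f, axes=([1], [0]))   # shape (C, H, W)

# The same scores recovered by summing each map: S_c = sum_{x,y} M_c(x, y)
S_from_cam = M.sum(axis=(1, 2))

# Softmax over the class scores (shifted by the max for numerical stability)
probs = np.exp(S_from_cam - S_from_cam.max())
probs /= probs.sum()

print(np.allclose(S_from_pooled, S_from_cam))
```

Both routes give the same class scores, which is why each entry of $M_c$ can be read as the contribution of one spatial location to $S_c$.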

### What does this mean?

Each $f_k$ is a map of where a particular visual pattern is present in the image. The CAM is a weighted linear sum of these maps, so by upsampling the CAM to the size of the input image, we can identify the image regions that had the biggest influence on the score of a particular category.

*Put differently, the highlighted regions are the ones that led the CNN to assign the image to that class.*
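
To make the link to object localization concrete, here is a minimal NumPy sketch that upsamples a CAM to the image size and estimates an approximate object "center". It rests on several assumptions of ours: the helper names are hypothetical, nearest-neighbour upsampling stands in for the smoother interpolation a library such as ``cv2`` would provide, and the 20% relative threshold is an illustrative choice.

```python
import numpy as np

def upsample_nearest(cam: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2-D map to (out_h, out_w)."""
    h, w = cam.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return cam[rows[:, None], cols[None, :]]

def approximate_center(cam, out_h, out_w, rel_threshold=0.2):
    """Return (row, col) of the centroid of the region above the threshold."""
    cam = cam - cam.min()
    peak = cam.max()
    if peak == 0:
        # Constant map: no region stands out, fall back to the image center
        return (out_h - 1) / 2.0, (out_w - 1) / 2.0
    big = upsample_nearest(cam / peak, out_h, out_w)
    rows, cols = np.nonzero(big >= rel_threshold)
    return float(rows.mean()), float(cols.mean())

# Toy 7 x 7 CAM with one hot 2 x 2 region (illustrative values)
cam = np.zeros((7, 7))
cam[1:3, 4:6] = 1.0

center = approximate_center(cam, 224, 224)
print(center)
```

The centroid of the thresholded region is only one possible definition of the center. The location of the maximum of the CAM is a simpler alternative, but it is more sensitive to single noisy activations.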

## Code Libraries

Many libraries used in our implementation would be considered as "standard" in Computer Vision projects or courses, but we list out everything in the import order to get a better understanding of what each import is used for:

- ``PIL``: Used to read images from a directory. The loaded image is then converted to a tensor and passed to the network.

- ``torch`` / ``torchvision``: The main libraries for PyTorch (along with its own packages). These provide pre-set models (like ResNet18), and ability to create transformations and our own CNNs. This is extensively used for manipulating our tensors (along with ``numpy``, which is more forward-facing as will be seen), providing loaders for our training and testing process, and to perform training & computations on CUDA/MPS (GPU configurations) or the CPU.

- ``numpy``: Used to manipulate tensors from ``torch``, and acts as a middle layer to write data in a form that libraries such as ``cv2`` and ``PIL`` can understand. 

- ``cv2``: Used for dealing with image resizing, writing/drawing, and modifying. It is particularly useful in overlaying our heatmap on top of the image, and optionally writing an image to a directory for later use.

## Dataset
We use a dataset of face and non-face images found on Kaggle from Sagar Karar. To get this dataset and format it in the way that the program needs to read it, run the following commands.

**Note**: The first step assumes that you have the ``kaggle`` package installed on pip. Otherwise, click on [this link to the dataset page](https://www.kaggle.com/datasets/sagarkarar/nonface-and-face-dataset) and download the dataset. This will replace the first line in the bash script below.

In [14]:
import torch
import torchvision.models as models
import cv2
import numpy as np
from torchvision import transforms
import torch.nn as nn


In [2]:
# # Pre-trained model used: ResNet 18
# model = models.resnet(pretrained=True)
# final_convolution_layer = 'layer4'
# model.eval()




In [3]:
# # img = cv2.imread('river_hand.jpeg')

# if img is not None:
#     print("Image loaded successfully!")
# else:
#     print("Unable to load the image. Please check the file path.")
    
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

import torch
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Define transformations for data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to a uniform size
    transforms.ToTensor(),  # Convert images to PyTorch tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize the images
])

# Load train and test datasets
train_dataset = ImageFolder('Dataset/train', transform=transform)
test_dataset = ImageFolder('Dataset/test', transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)


In [4]:
import torchvision.models as models
import torch.nn as nn

# Load a pre-trained ResNet18 model
model = models.resnet18(pretrained=True)

# for param in model.parameters():
#     param.requires_grad = False

# for param in model.fc.parameters():
#     param.requires_grad = True

# Modify the final fully connected layer for binary classification
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)  # 2 output classes: face and no-face

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)



In [5]:
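# Training loop
# Use the Apple GPU (MPS) or CUDA when available, otherwise fall back to the CPU
if torch.backends.mps.is_available():
    device = torch.device('mps')
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
model.to(device)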
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# for param in model.parameters():
#     param.requires_grad = False
    
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)

    epoch_loss = running_loss / len(train_dataset)
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss:.4f}')

# Evaluation on test set
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Test Accuracy: {accuracy * 100:.2f}%')



Epoch [1/5], Loss: 0.1183
Epoch [2/5], Loss: 0.0264
Epoch [3/5], Loss: 0.0315
Epoch [4/5], Loss: 0.0444
Epoch [5/5], Loss: 0.0151
Test Accuracy: 99.06%


In [66]:
# model.eval()

# Remove the fully connected layer
model2 = nn.Sequential(*list(model.children())[:-2])
# model2 = model

In [83]:
from torch.nn import functional as F
finalconv_name = 'layer4'
from torch.autograd import Variable
from PIL import Image

correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Test Accuracy: {accuracy * 100:.2f}%')



LABELS_file = 'imagenet-simple-labels.json'
image_file = 'sat.png'


# hook the feature extractor
features_blobs = []
def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

model._modules.get(finalconv_name).register_forward_hook(hook_feature)

# get the softmax weight
params = list(model.parameters())
# print(params)
weight_softmax = np.squeeze(params[-2].data.cpu().numpy())

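def returnCAM(feature_conv, weight_softmax, class_idx):
    # Generate one class activation map per class index, upsampled to 256x256.
    # feature_conv: (1, nc, h, w) activations of the last conv block (batch size 1)
    # weight_softmax: (num_classes, nc) weights of the final fully connected layer
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    for idx in class_idx:
        # Weighted sum of the feature maps: M_c(x, y) = sum_k w_k^c f_k(x, y)
        cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h*w)))
        cam = cam.reshape(h, w)
        # Min-max normalize to [0, 255] for visualization
        cam = cam - np.min(cam)
        cam_img = cam / np.max(cam)
        cam_img = np.uint8(255 * cam_img)
        output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam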


normalize = transforms.Normalize(
   mean=[0.485, 0.456, 0.406],
   std=[0.229, 0.224, 0.225]
)
preprocess = transforms.Compose([
   transforms.Resize((224,224)),
   transforms.ToTensor(),
   normalize
])

# load test image
img_pil = Image.open(image_file)
if img_pil.mode == "RGBA":
    img_pil = img_pil.convert("RGB")
img_tensor = preprocess(img_pil)
# print("img", img_tensor)
# Add a batch dimension; wrapping in the deprecated Variable is not needed
img_variable = img_tensor.unsqueeze(0).to(device)
logit = model(img_variable)

# load the imagenet category list
# with open(LABELS_file) as f:
#     classes = json.load(f)

print("output", logit)

classes = ['face', 'no_face']

h_x = F.softmax(logit, dim=1).data.squeeze()
probs, idx = h_x.sort(0, True)
probs = probs.cpu().numpy()
idx = idx.cpu().numpy()

# output the prediction
for i in range(0, 2):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))

# generate class activation mapping for the top1 prediction
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])

# render the CAM and output
print('output CAM.jpg for the top1 prediction: %s'%classes[idx[0]])
img = cv2.imread('sat.png')
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0],(width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.5
cv2.imwrite('CAM2.jpg', result)

import matplotlib.pyplot as plt
# plt.imshow(result)

# print(heatmap.shape)


Test Accuracy: 98.87%
output tensor([[ 8.9453, -9.5607]], device='mps:0', grad_fn=<LinearBackward0>)
1.000 -> face
0.000 -> no_face
output CAM.jpg for the top1 prediction: face


In [53]:
# model.eval()

# # Remove the fully connected layer
# model = nn.Sequential(*list(model.children())[:-2])

# Load and preprocess the image
img = cv2.imread('sat.png')
# img = cv2.imread('river_hand.jpeg')
# img = cv2.imread('image_2.jpg')
# img = cv2.imread('tejas.jpg')
# img = cv2.imread('shahan.jpg')
# img = cv2.imread('osama.jpg')
# img = cv2.imread('Human1250 copy.png')

if img is not None:
    print("Image loaded successfully!")
else:
    print("Unable to load the image. Please check the file path.")


img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_img = preprocess(img).unsqueeze(0).to(device)

# print("input image", input_img)

# Forward pass to get feature maps
with torch.no_grad():
    feature_maps = model2(input_img)

# print("feature map", feature_maps)
# Get the weights of the final fully connected layer (not of a convolution):
# CAM projects the classifier weights onto the last convolutional feature maps
fc_weights = model.fc.weight.detach().cpu().numpy()  # shape (2, 512)

# Compute the class activation map (CAM) for class 0 ('face')
features = feature_maps[0].cpu().numpy()  # shape (512, 7, 7)
nc, h, w = features.shape
cam = fc_weights[0].dot(features.reshape(nc, h * w)).reshape(h, w)

# cam = np.maximum(cam, 0)  # ReLU activation
cam = cam - np.min(cam)
cam = cam / np.max(cam)
cam = cv2.resize(cam, (img.shape[1], img.shape[0]))

# Apply heatmap on the original image
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
# img is RGB here, while applyColorMap returns BGR, so convert before blending
heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)

superimposed_img = heatmap * 0.3 + img.astype('float32') * 0.5
superimposed_img = superimposed_img / superimposed_img.max()

# Display the image with the heatmap
import matplotlib.pyplot as plt
plt.imshow(np.uint8(255 * superimposed_img))

Image loaded successfully!


In [75]:
model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [9]:
# model.eval()

# # Remove the fully connected layer
# model = nn.Sequential(*list(model.children())[:-2])

# Load and preprocess the image
# img = cv2.imread('sat.png')
img = cv2.imread('2 faces.png')
# img = cv2.imread('river_hand.jpeg')
# img = cv2.imread('image_2.jpg')
# img = cv2.imread('tejas.jpg')
# img = cv2.imread('shahan.jpg')
# img = cv2.imread('osama.jpg')
# img = cv2.imread('Human1250 copy.png')

if img is not None:
    print("Image loaded successfully!")
else:
    print("Unable to load the image. Please check the file path.")

features_blobs = []
def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

print(model.eval())

model._modules.get('layer4').register_forward_hook(hook_feature)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_img = preprocess(img).unsqueeze(0).to(device)


# Forward pass to get feature maps
with torch.no_grad():
    feature_maps = model(input_img)

# # Get the weights of the final convolutional layer
# final_conv_layer = None
# for layer in reversed(model2):
#     # print("layer", layer)
#     if isinstance(layer, torch.nn.modules.conv.Conv2d):
#         final_conv_layer = layer
#         break

print("feature blobs", len(features_blobs))
print(feature_maps.shape)

# print(final_conv_layer)
# if final_conv_layer is None:
#     raise ValueError("Final convolutional layer not found in the model.")

# final_conv_layer_weights = final_conv_layer.weight.detach().cpu()

# Compute the class activation map (CAM)
# cam = np.zeros((feature_maps.shape[2], feature_maps.shape[3]), dtype=np.float32)
# for i in range(final_conv_layer_weights.size(0)):
#     weight = final_conv_layer_weights[i].detach().cpu().numpy()
#     # print(weight.shape)
#     params = list(model.parameters())
#     # print(params)
#     weight = np.squeeze(params[-2].data.cpu().numpy())
#     print(feature_maps.squeeze(0)[i].cpu().numpy().shape)
#     cam += np.sum(weight * feature_maps.squeeze(0)[i].cpu().numpy(), axis=0)


params = list(model.parameters())
#     # print(params)
weight = np.squeeze(params[-2].data.cpu().numpy())
print("weight shape", weight.shape)
# cam = np.sum(weight[0].T * feature_maps[0].cpu().numpy(), axis=0)
# cam = weight[0].dot(feature_maps[0].cpu().numpy().reshape(-1, 7 * 7))
cam = weight[0].dot(features_blobs[0].reshape(-1, 7 * 7))

print("cam", cam)
# cam = np.maximum(cam, 0)  # ReLU activation
# cam = cv2.resize(cam, (img.shape[1], img.shape[0]))
cam = cam.reshape(7, 7)
cam = cam - np.min(cam)
cam = cam / np.max(cam)
cam = np.uint8(255 * cam)
cam = cv2.resize(cam, (256, 256))
cam = cv2.resize(cam, (img.shape[1], img.shape[0])) 
print("shape", cam.shape)

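# Apply heatmap on the original image
# img was converted to RGB above, but applyColorMap and imwrite work in BGR,
# so convert back before blending
heatmap = cv2.applyColorMap(cam, cv2.COLORMAP_JET)
img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
result = heatmap * 0.3 + img_bgr * 0.5
cv2.imwrite('CAM3.jpg', result)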

# print(heatmap.shape)
# # heatmap = np.flip(heatmap, axis=0)
# superimposed_img = heatmap * 0.3 + img.astype('float32') * 0.5
# superimposed_img = superimposed_img / superimposed_img.max()

# # Display the original image and the image with the heatmap
# # cv2.imshow('Original Image', img)
# # cv2.imshow('CAM', np.uint8(255 * superimposed_img))
# import matplotlib.pyplot as plt 
# plt.imshow(img)

# plt.imshow(np.uint8(255 * superimposed_img))
# # cv2.waitKey(0)
# # cv2.destroyAllWindows()

Image loaded successfully!
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (r

True

In [25]:
model2.eval()

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Con

In [52]:
weight[0].dot(feature_maps[0].cpu().numpy().reshape(-1, 7 * 7)).shape

(49,)

In [8]:
import torch
import torchvision.models as models
import cv2
import numpy as np
from torchvision import transforms
import torch.nn as nn

# Load pre-trained model
model = models.resnet50(pretrained=True)
model.eval()

# Keep the classifier: CAM needs the weights of the final fully connected layer
fc = model.fc
fc_weights = fc.weight.detach().cpu().numpy()  # shape (1000, 2048)

# Remove the average pooling and fully connected layers
model = nn.Sequential(*list(model.children())[:-2])

# Load and preprocess the image
img = cv2.imread('image_1.jpg')

if img is not None:
    print("Image loaded successfully!")
else:
    print("Unable to load the image. Please check the file path.")

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_img = preprocess(img).unsqueeze(0)

# Forward pass to get the feature maps and the predicted class
with torch.no_grad():
    feature_maps = model(input_img)             # shape (1, 2048, 7, 7)
    logits = fc(feature_maps.mean(dim=(2, 3)))  # global average pooling, then fc
class_idx = logits.argmax(dim=1).item()

# Compute the class activation map (CAM) for the predicted class:
# a weighted sum of the feature maps, using the fc weights of that class
features = feature_maps.squeeze(0).cpu().numpy()
cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # shape (7, 7)

cam = np.maximum(cam, 0)  # ReLU activation
cam = cv2.resize(cam, (img.shape[1], img.shape[0]))
cam = cam - np.min(cam)
cam = cam / np.max(cam)

# Apply heatmap on the original image
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
print(heatmap.shape)
# img was converted to RGB above, while applyColorMap and imshow work in BGR
img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
superimposed_img = heatmap * 0.4 + img_bgr.astype('float32') * 0.6
superimposed_img = superimposed_img / superimposed_img.max()

# Display the original image and the image with the heatmap
cv2.imshow('Original Image', img_bgr)
cv2.imshow('CAM', np.uint8(255 * superimposed_img))
cv2.waitKey(0)
cv2.destroyAllWindows()




Image loaded successfully!
(395, 640, 3)
