# Explainable AI Final Project: CAM Implementation and CAM Occluded Cone Analysis
### Katie Hucker (kh509)

This notebook contains the implementation code for running CAM interpretations on nuScenes traffic cone images. The CAM interpretations are run on a fine-tuned YOLOv8 model that was trained to detect the traffic cones in the dataset.

The first section defines the code: the functions and the process, that is, HOW we get the CAM interpretations.

Then we compare traffic cones at two visibility levels: occluded (1-40% visible) and visible (80-100% visible). This is important to my Capstone group's analysis of how YOLO performs on heavily occluded traffic cones.  

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1s8mPbGM1qWdGn4XyP372Eam2KCHGJ2t1?usp=sharing)

# Section 1: The Code -- How can we create the CAM analysis?

This section breaks down the functions created and used, along with descriptions and implementation how-to comments.

In [None]:
!pip install ultralytics


In [None]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch
from ultralytics import YOLO

## Set-Up Code

This function prepares an image for model input by converting it to a PyTorch tensor, handling both NumPy arrays and tensors. It reorders the axes from HWC (height, width, channels) to CHW, normalizes pixel values to [0, 1], and adds a batch dimension.

In [None]:
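def preprocess_image(input_img):
    """Convert an image (NumPy array or tensor) to a batched CHW float tensor in [0, 1]."""
    # Accept either a NumPy array or an existing PyTorch tensor
    if isinstance(input_img, np.ndarray):
        input_tensor = torch.from_numpy(input_img)
    else:
        input_tensor = input_img

    # HWC -> CHW when the last axis holds the 3 color channels
    if input_tensor.dim() == 3 and input_tensor.shape[2] == 3:
        input_tensor = input_tensor.permute(2, 0, 1)

    # Add the batch dimension only when it is missing
    if input_tensor.dim() < 4:
        input_tensor = input_tensor.unsqueeze(0)
    input_tensor = input_tensor.float()

    # Normalize to [0, 1]
    if input_tensor.max() > 1:
        input_tensor = input_tensor / 255.0

    return input_tensor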
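
The layout change can be checked without a model. The sketch below uses only NumPy and a made-up 4x6 image (the array `toy_img` is an assumption for illustration, not a dataset image) to show what the HWC to CHW reordering, the scaling, and the batch dimension do to the shape.

```python
import numpy as np

# Hypothetical 4x6 RGB image with uint8 pixel values (not from the dataset)
toy_img = np.random.default_rng(0).integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

chw = toy_img.transpose(2, 0, 1)           # HWC -> CHW
scaled = chw.astype(np.float32) / 255.0    # scale pixel values to [0, 1]
batched = scaled[np.newaxis, ...]          # add batch dimension -> 1xCxHxW

print(toy_img.shape, "->", batched.shape)
```

The pixel at row 2, column 3, channel 1 of `toy_img` ends up at `batched[0, 1, 2, 3]`, which is a quick way to confirm that only the axis order changed.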

This function loads and runs a YOLO model on a given image. It reads the image using OpenCV, resizes it to the YOLO-compatible size (640x640), converts it to RGB format, and performs inference using the YOLO model.


In [None]:
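def run_model(model_path, img_path):
    """Load a YOLO model and run inference on one image resized to 640x640."""
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    img = cv2.resize(img, (640, 640))
    rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    model = YOLO(model_path)

    # Run inference without tracking gradients.
    # Note: the Ultralytics docs describe NumPy inputs as BGR, so passing the
    # RGB array here may swap the red and blue channels seen by the model.
    with torch.no_grad():
        results = model(rgb_img)

    return model, results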

## CAM Code

This function extracts intermediate feature maps from a specified layer in the YOLO model using a forward hook. It preprocesses the input image, sends it through the model, and captures the output of the given layer.

In [None]:
def extract_feature_maps(model, layer, input_img):

    # Follows the process described in the Zhou 2015 CAM paper cited below,
    # along with the other resources cited below

    # Preprocess image
    input_tensor = preprocess_image(input_img)

    # Move to device
    device = next(model.model.parameters()).device
    input_tensor = input_tensor.to(device)

    # Forward hook
    feature_maps = []

    def hook_fn(module, input, output):
        feature_maps.append(output.detach().cpu())

    handle = layer.register_forward_hook(hook_fn)

    # Run model
    with torch.no_grad():
        model(input_tensor)

    # Remove hook
    handle.remove()

    # Return feature maps if any were captured
    if feature_maps:
        return feature_maps[0]

    return None
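
The hook mechanism itself can be tried on a tiny network. The sketch below is a minimal illustration, assuming a made-up two-convolution `toy_net` in place of the YOLO model (the names `toy_net`, `captured`, and `save_output` are hypothetical); it shows that the hook records the output of the hooked layer rather than the output of the whole network.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; the notebook hooks a layer of the YOLO model instead
toy_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, stride=2, padding=1),
)

captured = []

def save_output(module, inputs, output):
    # Store a detached CPU copy of the hooked layer's output
    captured.append(output.detach().cpu())

# Hook the first convolution, run one forward pass, then remove the hook
handle = toy_net[0].register_forward_hook(save_output)
with torch.no_grad():
    final_out = toy_net(torch.rand(1, 3, 16, 16))
handle.remove()

print(captured[0].shape)   # output of the hooked layer
print(final_out.shape)     # output of the whole network
```

Removing the handle after the forward pass matters: a hook left in place would keep appending to `captured` on every later call.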

This function creates a visual activation map from a tensor of feature maps, typically extracted from an intermediate layer of a neural network like YOLO. It sums the absolute values across the channel dimension to capture general feature intensity, resizes the result to the original image's shape, and normalizes it to the [0, 1] range. Because no class-specific weights are applied, the map is class-agnostic: it highlights the areas of the image that most strongly activate the chosen layer, which is useful in CAM-style visualizations for understanding where the model focuses.

In [None]:
def generate_activation_map(feature_maps, input_img_shape):

    # Follows the process described in the Zhou 2015 CAM paper cited below,
    # along with the other resources cited below

    # check dimensions
    if feature_maps.dim() == 4:  # BCHW
        feature_map = feature_maps[0]  # CHW
    else:
        feature_map = feature_maps

    # Generate activation map by summing across channels
    activation_map = torch.sum(torch.abs(feature_map), dim=0).numpy()

    # Resize to input image size
    h, w = input_img_shape[:2]
    activation_map = cv2.resize(activation_map, (w, h))

    # Normalize
    activation_map = activation_map - np.min(activation_map)
    activation_map = activation_map / (np.max(activation_map) + 1e-7)

    return activation_map
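
To make the arithmetic concrete, here is a small worked example in NumPy on made-up feature maps (the array `toy_features` is hypothetical). It assumes nearest-neighbour upsampling via `np.kron` as a stand-in for `cv2.resize`, so the upsampled values differ from what the interpolating resize in the notebook would give.

```python
import numpy as np

# Hypothetical feature maps: 2 channels on a 2x2 grid (CHW)
toy_features = np.array([
    [[1.0, -2.0], [0.0, 3.0]],
    [[-1.0, 1.0], [0.0, -1.0]],
])

# Sum absolute activations over the channel axis
summed = np.abs(toy_features).sum(axis=0)

# Nearest-neighbour upsampling to 4x4 (the notebook uses cv2.resize instead)
upsampled = np.kron(summed, np.ones((2, 2)))

# Min-max normalization to [0, 1]
normalized = upsampled - upsampled.min()
normalized = normalized / (normalized.max() + 1e-7)

print(summed)
print(normalized.round(2))
```

Taking absolute values means that strongly negative activations count as much as strongly positive ones, which is why the map reflects overall intensity rather than evidence for a particular class.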

This function calls the functions above and returns the activation map computed from the features of the given layer.

In [None]:
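def create_cam(model, layer, input_img):
    """Extract feature maps from `layer` and turn them into an activation map."""
    feature_maps = extract_feature_maps(model, layer, input_img)
    if feature_maps is None:
        raise RuntimeError("The forward hook captured no output for this layer")

    print(f"Feature maps shape: {feature_maps.shape}")

    # Generate activation map
    activation_map = generate_activation_map(feature_maps, input_img.shape)
    print(f"Created activation map with shape {activation_map.shape}")

    return activation_map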


## Plotting Code

This function visualizes a Class Activation Map (CAM) by applying a heatmap over the input image to highlight the most activated regions. It normalizes the CAM mask, applies a colormap, and blends it with the original image.

In [None]:
def visualize_cam(img, mask, use_rgb=False, colormap=cv2.COLORMAP_JET, image_weight=0.5):

    if mask.ndim > 2: #Claude Sonnet 3.7 was used to generate this if statement 4/15/2025
        mask = mask[0]

    # [0, 1] range
    mask = mask - mask.min()
    mask = mask / (mask.max() + 1e-7)

    # Convert to uint8 needed for colormap
    mask_uint8 = np.uint8(255 * mask)

    heatmap = cv2.applyColorMap(mask_uint8, colormap) #https://docs.opencv.org/4.x/d3/d50/group__imgproc__colormap.html
    if use_rgb:
        heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
    heatmap = np.float32(heatmap) / 255

    #keep in [0, 1] range
    if np.max(img) > 1:
        img = img / 255.0

    # Apply the heatmap above the original image
    cam = (1 - image_weight) * heatmap + image_weight * img #Claude 3.7 was used to generate this line of code on 4/15/2025
    cam = cam / np.max(cam)
    return np.uint8(255 * cam)
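
The blending step is a weighted sum followed by a rescale. The sketch below applies the same formula to a made-up two-pixel image and heatmap (`toy_img` and `toy_heat` are hypothetical values, not output from `cv2.applyColorMap`) so the effect of `image_weight` is easy to see.

```python
import numpy as np

image_weight = 0.5

# Hypothetical 1x2 RGB image and heatmap, both already in [0, 1]
toy_img = np.array([[[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]]], dtype=np.float32)
toy_heat = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]], dtype=np.float32)

# Weighted sum of heatmap and image, then rescale so the maximum is 1
blend = (1 - image_weight) * toy_heat + image_weight * toy_img
blend = blend / blend.max()
overlay = np.uint8(255 * blend)

print(overlay)
```

With `image_weight=0.5` the image and heatmap contribute equally; values closer to 1 keep more of the original image visible under the heatmap.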

In [None]:
def plot_grid(original_img, detection_img, cam_results, output_dir, filename="comparison.jpg"):
    """
    Make a grid of all the image outputs
    """
    rows = 2
    cols = 3

    fig, axes = plt.subplots(rows, cols, figsize=(cols * 5, rows * 5))
    axes = axes.flatten()

    axes[0].imshow(original_img)
    axes[0].set_title("Original Image")
    axes[0].axis('off')


    axes[1].imshow(detection_img)
    axes[1].set_title("YOLO Detections")
    axes[1].axis('off')

    for i, result in enumerate(cam_results[:4]):   # Claude 3.7 was used to generate this for loop on 4/14
        axes[i+2].imshow(result['image'])
        axes[i+2].set_title(f"Layer {result['layer_idx']}: {result['layer_name']}")
        axes[i+2].axis('off')

    for i in range(len(cam_results) + 2, rows * cols):
        axes[i].axis('off')

    plt.tight_layout()
    os.makedirs(output_dir, exist_ok=True)  # make sure the output folder exists
    plt.savefig(os.path.join(output_dir, filename))
    print(f"Saved comparison to {os.path.join(output_dir, filename)}")



In [None]:
def base_detect_plot(model, results, img_path):

    # preprocess image
    img = cv2.imread(img_path)
    img = cv2.resize(img, (640, 640))
    rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_float = np.float32(rgb_img) / 255.0

    #show original image
    plt.figure(figsize=(10, 10))
    plt.imshow(rgb_img)
    plt.axis('off')
    plt.title("Original Image")

    #show detections
    det_img = results[0].plot()
    plt.figure(figsize=(10, 10))
    plt.imshow(det_img)
    plt.axis('off')
    plt.title("YOLO Detections")

    return rgb_img, det_img

## Main Function


In [None]:
def test_cam(model, img_path, output_dir="cam_results"):

    # preprocess image
    img = cv2.imread(img_path)
    img = cv2.resize(img, (640, 640))
    rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_float = np.float32(rgb_img) / 255.0
    os.makedirs(output_dir, exist_ok=True)

    # Layers to test based on YOLO v8 architecture and resource/AI recommendations
    layers_to_test = [
        (16, model.model.model[16]),  # Conv layer
        (19, model.model.model[19]),  # Conv layer
        (6, model.model.model[6]),    # Feature layer
        (8, model.model.model[8])     # Feature layer
    ]
    cam_results = []
    #call other functions based on layers
    for idx, layer in layers_to_test:
        print(f"\nTesting CAM on layer {idx}: {layer.__class__.__name__}")

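        cam = create_cam(model, layer, img_float)
        cam_image = visualize_cam(img_float, cam, use_rgb=True)
        filename = f"cam_layer_{idx}.jpg"

        # cam_image is RGB; OpenCV writes BGR, so convert before saving
        cv2.imwrite(os.path.join(output_dir, filename),
                    cv2.cvtColor(cam_image, cv2.COLOR_RGB2BGR))
        print(f"Saved CAM for layer {idx} to {filename}")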

        cam_results.append({
            'layer_idx': idx,
            'layer_name': layer.__class__.__name__,
            'image': cam_image,
            'filename': filename
        })
    return cam_results
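
When choosing which indices to hook, it can help to list the modules by position first. The sketch below is only an illustration: it uses a made-up `toy_layers` container as a stand-in for `model.model.model`, assuming the latter can be enumerated the same way an `nn.Sequential` can (the code above already indexes it by integer).

```python
import torch.nn as nn

# Hypothetical stand-in for model.model.model, an indexable sequence of modules
toy_layers = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3),
    nn.Upsample(scale_factor=2),
)

# Map each index to the class name of the module at that position
layer_names = {idx: layer.__class__.__name__ for idx, layer in enumerate(toy_layers)}
for idx, name in layer_names.items():
    print(idx, name)

# Keep only the convolution indices as hook candidates
conv_indices = [idx for idx, name in layer_names.items() if name == "Conv2d"]
```

On the real model the class names would be the Ultralytics module names that `test_cam` prints, such as the C2f blocks mentioned in the discussion below.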

In [None]:
#mount drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:

model_path = "/content/drive/MyDrive/Capstone/explainable models/best.pt"  # Path to your model
image_path = "/content/drive/MyDrive/Capstone/explainable models/images/n015-2018-07-18-11-41-49+0800__CAM_BACK__1531885800937525.jpg"  # Path to image


In [None]:
model, results = run_model(model_path, image_path)

In [None]:
rgb_image, detect_image = base_detect_plot(model, results, image_path)

In [None]:
output = test_cam(model, image_path)

In [None]:
plot_grid(rgb_image, detect_image, output, output_dir="cam_results")

# Section 2: Occluded Cone Analysis for CAM

This section compares the visible cones with the occluded cones and looks at how CAM focuses on each.

## Visible Cones

First we will analyze the cones labeled 80-100% visible in the nuScenes dataset.

In [None]:

model_path = "/content/drive/MyDrive/Capstone/explainable models/best.pt"  # Path to your model
image_path = "/content/drive/MyDrive/Capstone/explainable models/images/n008-2018-08-01-15-16-36-0400__CAM_BACK_RIGHT__1533151427028113.jpg"

In [None]:
model, results = run_model(model_path, image_path)

In [None]:
rgb_image, detect_image = base_detect_plot(model, results, image_path)

In [None]:
output = test_cam(model, image_path)

In [None]:
plot_grid(rgb_image, detect_image, output, output_dir="cam_results")

We compute activation maps at four different layers. Because we fine-tuned YOLO, the weights tuned for traffic cones may differ from those of the standard pretrained model, so it is not obvious in advance which layer is most informative. Layers 16 and 19 perform best, as they sit deeper in the architecture.

The better performance of layer 16 likely stems from its position in YOLOv8's architecture. Layer 16 is deep enough in the network to benefit from more complex feature representations while still maintaining the spatial resolution needed for precise localization. As a convolutional layer, it can also integrate low-level features (edges, colors) with higher-level semantic understanding. The other layers perform worse in both accuracy and response strength. Layers 6 and 8 (C2f) show more diffuse activations, suggesting they are still processing basic features such as edges and textures with less object-specific understanding. Layer 19, although deeper, performs worse than layer 16. It may encode more abstract representations that sacrifice fine-grained spatial information, or it may be tuned for larger objects or global scene understanding. We will keep this in mind as we assess the occluded cone.

We see strong responses on the cones in layers 16 and 19. Specifically, in layer 16 the cone shape shows through clearly, which is a good sign. Both layers also show strong responses in the form of red regions. In the earlier layers we see either misplaced strong responses or minimal response. These layers are probably still processing the broad features of the image.


## Occluded Cone Analysis


In [None]:

model_path = "/content/drive/MyDrive/Capstone/explainable models/best.pt"  # Path to your model
image_path = "/content/drive/MyDrive/Capstone/explainable models/images/n015-2018-07-11-11-54-16+0800__CAM_BACK_LEFT__1531281493197423.jpg"

In [None]:
model, results = run_model(model_path, image_path)

In [None]:
rgb_image, detect_image = base_detect_plot(model, results, image_path)

In [None]:
output = test_cam(model, image_path)

In [None]:
plot_grid(rgb_image, detect_image, output, output_dir="cam_results")

## Occluded Cone Discussion

This shows great results for layers 16 and 19! We see the cone shape very clearly, with layer 16 capturing the pointed shape of the cone exactly. It even avoids the gap in between, with no response there. There is some diffuse response on the building in these layers, which is interesting. It could be due to the similar circular shape of the building columns, or it could simply be YOLO assessing the rest of the image. However, we see a similarly poor response in layers 6 and 8.

## Overall Discussion

CAM shows an accurate and strong response on both the occluded and visible cones, which is a promising result. We can also clearly say that layer 16 is a good layer for assessing YOLO's performance at detecting traffic cones. In addition, this is the simple CAM method; there are more complex saliency methods such as Grad-CAM and Score-CAM. The CAM method performs well as is, so it would be interesting to see whether more advanced methods improve on it. This code took about a minute to run completely on an L4 GPU, so it is very quick compared to the other methods, and more accurate. However, unlike with some other methods, we do not know the specific response strength or which specific pixels contribute.
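
One possible way to put a number on how focused a map is, offered only as a sketch and not as part of the method used above, is to measure what share of the total activation falls inside a detection box. The helper `cam_fraction_in_box` and the toy map below are hypothetical; in practice the map would come from `create_cam` and the box from the YOLO results.

```python
import numpy as np

def cam_fraction_in_box(cam, box):
    """Return the share of total CAM activation inside box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    total = cam.sum()
    if total == 0:
        return 0.0
    return float(cam[y1:y2, x1:x2].sum() / total)

# Hypothetical 4x4 activation map: a hot 2x2 region plus one stray pixel
toy_cam = np.zeros((4, 4))
toy_cam[0:2, 0:2] = 1.0
toy_cam[3, 3] = 1.0

inside = cam_fraction_in_box(toy_cam, (0, 0, 2, 2))
print(inside)
```

A value near 1 would mean almost all of the response sits on the detected object, while a low value would indicate diffuse or misplaced activation like that seen in the earlier layers.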

## References

https://paperswithcode.com/method/cam

https://arxiv.org/abs/1512.04150


https://zilliz.com/learn/class-activation-mapping-CAM

https://medium.com/@stepanulyanin/implementing-grad-cam-in-pytorch-ea0937c31e82

https://arxiv.org/abs/1910.01279

https://github.com/rigvedrs/YOLO-V11-CAM

https://github.com/orgs/ultralytics/discussions/1985

https://github.com/ultralytics/ultralytics/issues/2020

