# Mask R-CNN Inference Visualization Pipeline (ResNet Backbone)

This notebook visualizes **every stage** of Mask R-CNN inference with a **ResNet-50/101 backbone**.

## Pipeline Stages Visualized:
1. **Input Image** - Original image preprocessing
2. **ResNet Backbone** - Feature extraction at multiple scales (C2-C5)
3. **Feature Pyramid Network (FPN)** - Multi-scale feature fusion (P2-P5)
4. **Region Proposal Network (RPN)** - Anchor-based proposals
5. **RoI Align** - Feature extraction with bilinear sampling
6. **Box Head** - FC layers for classification and regression
7. **Mask Head** - Convolutional layers for segmentation
8. **Grad-CAM Heatmaps** - Class activation maps showing "where the model looks"
9. **Final Predictions** - Boxes, classes, and masks

**Note:** This notebook uses torchvision's pretrained ResNet-FPN Mask R-CNN instead of our custom EfficientNet + CBAM backbone.

---

## 1. Setup and Configuration

In [None]:
import sys
import os
from pathlib import Path

# Add project root to path
PROJECT_ROOT = Path(os.getcwd()).parent if 'notebooks' in os.getcwd() else Path(os.getcwd())
sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cv2

# Render matplotlib figures inline in the notebook
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['figure.dpi'] = 100

# Check device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")
if device == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
!git clone https://github.com/michaelo-ponteski/isaid-instance-segmentation.git
%cd /kaggle/working/isaid-instance-segmentation
!git pull
!git switch gradcam
!pip install --upgrade wandb

In [None]:
import importlib
import visualization.gradcam_pipeline

importlib.reload(visualization.gradcam_pipeline)

from visualization.gradcam_pipeline import (
    GradCAMConfig,
    denormalize_image,
    overlay_heatmap,
    visualize_fpn_features,
    visualize_roi_align_grid,
    visualize_final_predictions,
    ISAID_CLASS_LABELS,
    ISAID_COLORS,
)

# Import torchvision's pretrained Mask R-CNN models
from torchvision.models.detection import (
    maskrcnn_resnet50_fpn,
    maskrcnn_resnet50_fpn_v2,
    MaskRCNN_ResNet50_FPN_Weights,
    MaskRCNN_ResNet50_FPN_V2_Weights,
)
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

print("Visualization pipeline and ResNet models imported successfully")

In [None]:
import wandb
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
my_secret = user_secrets.get_secret("wandb_key")
wandb.login(key=my_secret)

In [None]:
run = wandb.init()
# Update artifact path if you have a trained ResNet model
artifact = run.use_artifact('marek-olnk-put-pozna-/isaid-custom-segmentation/isaid-maskrcnn-resnet50-final:v0', type='model') 
artifact_dir = artifact.download()

## 2. Configuration

Set your model checkpoint path and image path here:

In [None]:
# =============================================================================
# CONFIGURATION 
# =============================================================================

# Backbone choice: "resnet50", "resnet50_v2", or "resnet101"
BACKBONE_CHOICE = "resnet50"

# Path to your trained model checkpoint
CHECKPOINT_PATH = artifact_dir

# Path to an image for inference (can be from validation set)
IMAGE_PATH = "/kaggle/input/isaid-patches/iSAID_patches/test/images/P0017_2400_3200_0_800.png"
# Buses and cars "/kaggle/input/isaid-patches/iSAID_patches/test/images/P0015_0_800_600_1400.png" 
# Airport "/kaggle/input/isaid-patches/iSAID_patches/test/images/P0017_2400_3200_0_800.png"

# Or use a sample from the dataset
USE_DATASET_SAMPLE = False  # Set to True to load from dataset
DATASET_ROOT = "data"  # Path to iSAID dataset
SAMPLE_INDEX = 0  # Which sample to visualize

# Model configuration (must match training)
NUM_CLASSES = 16  # iSAID has 15 classes + background

# Visualization settings
CONF_THRESHOLD = 0.5  # Confidence threshold for predictions
OUTPUT_DIR = "./gradcam_outputs_resnet"  # Where to save visualizations

print("Configuration set!")
print(f"  Backbone: {BACKBONE_CHOICE}")
print(f"  Checkpoint: {CHECKPOINT_PATH}")
print(f"  Output directory: {OUTPUT_DIR}")

## 3. Load Model and Create Pipeline

In [None]:
def create_maskrcnn_resnet(num_classes, backbone_type="resnet50", pretrained_coco=False):
    """
    Create Mask R-CNN with pretrained ResNet backbone.

    Args:
        num_classes: Number of classes (including background)
        backbone_type: "resnet50", "resnet50_v2", or "resnet101"
        pretrained_coco: Whether to use COCO pretrained weights

    Returns:
        Mask R-CNN model
    """
    if backbone_type == "resnet50":
        if pretrained_coco:
            weights = MaskRCNN_ResNet50_FPN_Weights.COCO_V1
            model = maskrcnn_resnet50_fpn(weights=weights)
        else:
            model = maskrcnn_resnet50_fpn(
                weights=None, weights_backbone="IMAGENET1K_V1"
            )

    elif backbone_type == "resnet50_v2":
        if pretrained_coco:
            weights = MaskRCNN_ResNet50_FPN_V2_Weights.COCO_V1
            model = maskrcnn_resnet50_fpn_v2(weights=weights)
        else:
            model = maskrcnn_resnet50_fpn_v2(
                weights=None, weights_backbone="IMAGENET1K_V1"
            )

    elif backbone_type == "resnet101":
        from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
        from torchvision.models.detection import MaskRCNN
        from torchvision.models import ResNet101_Weights

        backbone = resnet_fpn_backbone(
            backbone_name="resnet101",
            weights=ResNet101_Weights.IMAGENET1K_V1,
            trainable_layers=5,
        )
        model = MaskRCNN(backbone, num_classes=num_classes)
        return model

    else:
        raise ValueError(f"Unknown backbone type: {backbone_type}")

    # Replace heads for our num_classes
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_features_mask, hidden_layer, num_classes
    )

    return model


# Create model
print(f"Creating Mask R-CNN with {BACKBONE_CHOICE} backbone...")
model = create_maskrcnn_resnet(
    num_classes=NUM_CLASSES,
    backbone_type=BACKBONE_CHOICE,
    pretrained_coco=False,  # We'll load trained weights
)

# Print model summary
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nModel architecture:")
print(f"  - Backbone: {BACKBONE_CHOICE} with FPN")
print(f"  - Number of classes: {NUM_CLASSES}")
print(f"  - Total parameters: {total_params:,}")
print(f"  - Trainable parameters: {trainable_params:,}")

In [None]:
# Load checkpoint
checkpoint_path = PROJECT_ROOT / CHECKPOINT_PATH / "best_model.pth"

if checkpoint_path.exists():
    checkpoint = torch.load(str(checkpoint_path), map_location=device)
    if 'model_state_dict' in checkpoint:
        model.load_state_dict(checkpoint['model_state_dict'])
    else:
        model.load_state_dict(checkpoint)
    print("Model weights loaded successfully!")
else:
    print(f"Checkpoint not found at: {checkpoint_path}")
    print("Using an ImageNet-initialized backbone with randomly initialized heads for demonstration.")
    print("\nTo use trained weights, update CHECKPOINT_PATH in the configuration cell.")

model = model.to(device)
model.eval()
print(f"Model moved to {device} and set to eval mode")

## 4. Load Test Image

In [None]:
# Load image
if USE_DATASET_SAMPLE:
    try:
        from datasets.isaid_dataset import get_isaid_dataset
        from training.transforms import get_transform
        
        val_dataset = get_isaid_dataset(
            root=str(PROJECT_ROOT / DATASET_ROOT),
            split='val',
            transforms=get_transform(train=False),
        )
        
        image_tensor, target = val_dataset[SAMPLE_INDEX]
        image_np = denormalize_image(image_tensor)
        
        print(f"Loaded sample {SAMPLE_INDEX} from validation set")
        print(f"  Image shape: {image_tensor.shape}")
        print(f"  Number of GT objects: {len(target['boxes'])}")
        
        gt_classes = [ISAID_CLASS_LABELS[l.item()] for l in target['labels']]
        print(f"  GT classes: {gt_classes}")
        
    except Exception as e:
        print(f"Could not load from dataset: {e}")
        print("Falling back to image path...")
        USE_DATASET_SAMPLE = False

if not USE_DATASET_SAMPLE:
    image_path = Path(IMAGE_PATH)
    
    if image_path.exists():
        image_np = np.array(Image.open(image_path).convert('RGB'))
        print(f"Loaded image from {image_path}")
        print(f"  Image shape: {image_np.shape}")
    else:
        print(f"Image not found at {image_path}")
        print("Creating a dummy test image...")
        image_np = np.random.randint(0, 256, (800, 800, 3), dtype=np.uint8)  # upper bound is exclusive

# Display the input image
plt.figure(figsize=(10, 10))
plt.imshow(image_np)
plt.title('Input Image', fontsize=14)
plt.axis('off')
plt.show()

---

# Stage-by-Stage Visualization

Now we'll visualize each stage of the inference pipeline step by step.

## Stage 1: Preprocessing & Feature Extraction Setup
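Preprocessing here amounts to scaling pixels to [0, 1], moving channels first, and standardizing with the ImageNet mean and standard deviation. The sketch below reproduces that arithmetic in NumPy on a toy image; `normalize_imagenet` is a hypothetical helper name, not part of the project. Note that torchvision detection models also normalize their inputs inside `model.transform`, so whether to normalize beforehand depends on how the model was trained.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_imagenet(image_hwc_uint8):
    """Hypothetical helper: HWC uint8 image -> CHW float32, ImageNet-standardized."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    chw = scaled.transpose(2, 0, 1)                      # HWC -> CHW
    return (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

toy = np.full((4, 6, 3), 255, dtype=np.uint8)  # all-white toy image
out = normalize_imagenet(toy)
print(out.shape)     # (3, 4, 6)
print(out[:, 0, 0])  # per-channel value of a white pixel: (1 - mean) / std
```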

In [None]:
import torchvision.transforms.functional as F

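# Preprocess image for ResNet (standard ImageNet normalization).
# Caution: torchvision detection models apply the same normalization again
# inside model.transform, so keep this step only if training also fed the
# model pre-normalized tensors.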
def preprocess_image(image_np):
    """Preprocess image for inference."""
    # Convert to tensor and normalize
    image_tensor = F.to_tensor(image_np)
    # ImageNet normalization
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    image_tensor = (image_tensor - mean) / std
    return image_tensor

# Store extracted features using hooks
extracted_features = {}
fpn_features = {}
hooks = []

def make_hook(name):
    def hook(module, input, output):
        if isinstance(output, dict):
            for k, v in output.items():
                extracted_features[f"{name}_{k}"] = v.detach()
        else:
            extracted_features[name] = output.detach() if isinstance(output, torch.Tensor) else output
    return hook

# Register hooks for ResNet backbone
# Backbone layers
hooks.append(model.backbone.body.layer1.register_forward_hook(make_hook('resnet_layer1')))
hooks.append(model.backbone.body.layer2.register_forward_hook(make_hook('resnet_layer2')))
hooks.append(model.backbone.body.layer3.register_forward_hook(make_hook('resnet_layer3')))
hooks.append(model.backbone.body.layer4.register_forward_hook(make_hook('resnet_layer4')))

# FPN layers
def fpn_hook(module, input, output):
    if isinstance(output, dict):
        for k, v in output.items():
            fpn_features[k] = v.detach()
hooks.append(model.backbone.fpn.register_forward_hook(fpn_hook))

# Box head
hooks.append(model.roi_heads.box_head.register_forward_hook(make_hook('box_head')))

print("Registered forward hooks for feature extraction")

In [None]:
# Run inference
print("Running inference with feature extraction hooks...")

image_tensor = preprocess_image(image_np).to(device)

with torch.no_grad():
    predictions = model([image_tensor])[0]

# Move predictions to CPU
predictions = {k: v.cpu() if isinstance(v, torch.Tensor) else v for k, v in predictions.items()}

print(f"\n✓ Inference complete!")
print(f"   Detections found: {len(predictions['boxes'])}")
print(f"   Extracted features: {len(extracted_features)} tensors")
print(f"   FPN levels: {list(fpn_features.keys())}")

In [None]:
# List all extracted features
print("\n📊 Extracted Features Summary:")
print("=" * 60)

for name, tensor in sorted(extracted_features.items()):
    if isinstance(tensor, torch.Tensor):
        shape = list(tensor.shape)
        print(f"  {name:40s} -> {shape}")

print("\n📊 FPN Features:")
for name, tensor in sorted(fpn_features.items()):
    if isinstance(tensor, torch.Tensor):
        shape = list(tensor.shape)
        print(f"  {name:40s} -> {shape}")

## Stage 2: ResNet Backbone Features
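Each backbone stage yields a (C, H, W) tensor, which this stage collapses to a single 2-D image by taking the mean or the max over channels and then min-max scaling for display. A minimal NumPy sketch of that reduction on random data follows; `reduce_channels` is an illustrative name and the toy shape is an assumption.

```python
import numpy as np

def reduce_channels(feat_chw, mode="mean"):
    """Collapse a (C, H, W) feature map to (H, W) and rescale to [0, 1]."""
    reduced = feat_chw.mean(axis=0) if mode == "mean" else feat_chw.max(axis=0)
    span = reduced.max() - reduced.min()
    return (reduced - reduced.min()) / (span + 1e-8)  # epsilon guards flat maps

rng = np.random.default_rng(0)
toy_feat = rng.normal(size=(256, 8, 8)).astype(np.float32)  # stand-in for a C2 map
mean_map = reduce_channels(toy_feat, "mean")
max_map = reduce_channels(toy_feat, "max")
print(mean_map.shape, float(mean_map.min()), float(mean_map.max()))
```

The mean highlights regions where many channels respond together, while the max highlights regions where any single channel responds strongly.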

In [None]:
# Visualize ResNet backbone stages
backbone_stages = {
    'resnet_layer1': 'Layer 1 (C2)',
    'resnet_layer2': 'Layer 2 (C3)', 
    'resnet_layer3': 'Layer 3 (C4)',
    'resnet_layer4': 'Layer 4 (C5)',
}

fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()

ax_idx = 0
for stage_key, stage_name in backbone_stages.items():
    if stage_key in extracted_features:
        feat = extracted_features[stage_key]
        if feat.dim() == 4:
            feat = feat[0]
        
        # Mean activation
        feat_mean = feat.mean(dim=0).cpu().numpy()
        feat_mean = (feat_mean - feat_mean.min()) / (feat_mean.max() - feat_mean.min() + 1e-8)
        
        axes[ax_idx].imshow(feat_mean, cmap='viridis')
        axes[ax_idx].set_title(f'{stage_name}\nMean ({feat.shape[0]} channels)', fontsize=10)
        axes[ax_idx].axis('off')
        ax_idx += 1
        
        # Max activation
        feat_max = feat.max(dim=0)[0].cpu().numpy()
        feat_max = (feat_max - feat_max.min()) / (feat_max.max() - feat_max.min() + 1e-8)
        
        axes[ax_idx].imshow(feat_max, cmap='hot')
        axes[ax_idx].set_title(f'{stage_name}\nMax Activation', fontsize=10)
        axes[ax_idx].axis('off')
        ax_idx += 1

# Hide unused axes
for i in range(ax_idx, len(axes)):
    axes[i].axis('off')

fig.suptitle('Stage 2: ResNet Backbone Feature Maps', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Stage 3: Feature Pyramid Network (FPN)
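An FPN builds each pyramid level by upsampling the next coarser level by 2 and adding a lateral projection of the backbone map at that resolution. The sketch below shows only this top-down addition in NumPy. It assumes the lateral 1x1 convolutions have already mapped every input to the same channel count, and it omits the 3x3 output convolutions, so it is a simplification rather than torchvision's implementation.

```python
import numpy as np

def upsample2x_nearest(feat_chw):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return feat_chw.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(laterals):
    """laterals: list of (C, H, W) arrays ordered fine -> coarse."""
    fused = [None] * len(laterals)
    fused[-1] = laterals[-1]                    # coarsest level passes through
    for i in range(len(laterals) - 2, -1, -1):  # walk from coarse to fine
        fused[i] = laterals[i] + upsample2x_nearest(fused[i + 1])
    return fused

# Toy laterals at strides 4, 8, 16, 32 of a 64x64 input, all filled with ones
laterals = [np.ones((4, s, s), dtype=np.float32) for s in (16, 8, 4, 2)]
pyramid = top_down_fuse(laterals)
print([p.shape for p in pyramid])
print([float(p[0, 0, 0]) for p in pyramid])  # [4.0, 3.0, 2.0, 1.0]
```

The finest level accumulates contributions from every coarser level, which is why P2 carries both fine detail and high-level context.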

In [None]:
# Visualize FPN multi-scale features
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

fpn_levels = ['0', '1', '2', '3']  # ResNet FPN uses numeric keys
level_names = ['P2', 'P3', 'P4', 'P5']

for idx, (level, name) in enumerate(zip(fpn_levels, level_names)):
    if level in fpn_features:
        feat = fpn_features[level]
        if feat.dim() == 4:
            feat = feat[0]
        
        # Mean activation
        feat_mean = feat.mean(dim=0).cpu().numpy()
        feat_mean = (feat_mean - feat_mean.min()) / (feat_mean.max() - feat_mean.min() + 1e-8)
        
        # Top row: Raw feature map
        axes[0, idx].imshow(feat_mean, cmap='plasma')
        axes[0, idx].set_title(f'{name}: {feat.shape[1]}x{feat.shape[2]}\n({feat.shape[0]} channels)', fontsize=10)
        axes[0, idx].axis('off')
        
        # Bottom row: Overlay on image
        feat_resized = cv2.resize(feat_mean, (image_np.shape[1], image_np.shape[0]))
        overlay = overlay_heatmap(image_np, feat_resized, alpha=0.5, colormap='plasma')
        
        axes[1, idx].imshow(overlay)
        axes[1, idx].set_title(f'{name} Overlay on Image', fontsize=10)
        axes[1, idx].axis('off')

fig.suptitle('Stage 3: FPN Feature Maps at Each Scale', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Stage 4: RPN Proposals (Region Proposal Network)
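The boxes a detector returns have already been through non-maximum suppression (NMS), which keeps the highest-scoring box and drops lower-scoring boxes that overlap it beyond an IoU threshold. The following plain-Python sketch of greedy NMS is for illustration only; torchvision provides an optimized `torchvision.ops.nms`, and the 0.5 threshold here is just an example value.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```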

In [None]:
# Visualize the final detections (post-NMS output of the RoI heads) as a proxy;
# raw RPN proposals are not captured by the hooks registered above
import matplotlib.patches as patches

fig, axes = plt.subplots(1, 2, figsize=(16, 8))

boxes = predictions['boxes'].cpu().numpy()
scores = predictions['scores'].cpu().numpy()
labels = predictions['labels'].cpu().numpy()

# Left: All predictions colored by confidence
axes[0].imshow(image_np)
cmap = plt.get_cmap('RdYlGn')

for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box
    color = cmap(score)
    rect = patches.Rectangle(
        (x1, y1), x2-x1, y2-y1,
        linewidth=1, edgecolor=color, facecolor='none', alpha=0.7
    )
    axes[0].add_patch(rect)

axes[0].set_title(f'All Predictions ({len(boxes)} boxes)\nColored by confidence (red=low, green=high)', fontsize=12)
axes[0].axis('off')

# Right: High confidence predictions only
axes[1].imshow(image_np)

high_conf = scores >= CONF_THRESHOLD
for box, score, label in zip(boxes[high_conf], scores[high_conf], labels[high_conf]):
    x1, y1, x2, y2 = box
    color = np.array(ISAID_COLORS.get(label, [255, 255, 255])) / 255.0
    
    rect = patches.Rectangle(
        (x1, y1), x2-x1, y2-y1,
        linewidth=2, edgecolor=color, facecolor='none'
    )
    axes[1].add_patch(rect)
    axes[1].text(x1, y1-3, f'{ISAID_CLASS_LABELS[label]}: {score:.2f}',
                 fontsize=8, color='white', backgroundcolor=color)

axes[1].set_title(f'High Confidence (>={CONF_THRESHOLD}) Predictions ({high_conf.sum()} boxes)', fontsize=12)
axes[1].axis('off')

fig.suptitle('Stage 4: Final Detections After NMS (Proxy for RPN Proposals)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Stage 5: RoI Align Sampling Grid
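RoI Align divides a box into a fixed grid of bins, takes several bilinearly interpolated samples inside each bin, and averages them, which avoids the coordinate rounding of RoI Pooling. The plain-Python sketch below does this for a single-channel feature map. It assumes the box is already expressed in feature-map coordinates and that the value at index `i` sits at coordinate `i`, which simplifies the alignment conventions used by real implementations.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D list `feat` at fractional (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align_single(feat, box, output_size=2, sampling_ratio=2):
    """Average sampling_ratio**2 bilinear samples inside each output bin."""
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    out = []
    for by in range(output_size):
        row = []
        for bx in range(output_size):
            total = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    # Sample points are evenly spaced inside the bin
                    y = y1 + by * bin_h + (sy + 0.5) * bin_h / sampling_ratio
                    x = x1 + bx * bin_w + (sx + 0.5) * bin_w / sampling_ratio
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)
        out.append(row)
    return out

feat = [[float(c) for c in range(8)] for _ in range(8)]  # value equals column index
pooled = roi_align_single(feat, box=(0.0, 0.0, 4.0, 4.0))
print(pooled)  # [[1.0, 3.0], [1.0, 3.0]]
```

Because the toy feature map is linear in `x`, each pooled value equals the x-coordinate of its bin centre, which makes the sampling easy to check by hand.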

In [None]:
# Visualize RoI Align sampling grids
if len(predictions['boxes']) > 0:
    top_k = min(4, len(predictions['boxes']))
    top_boxes = predictions['boxes'][:top_k]
    
    fig = visualize_roi_align_grid(
        image_np,
        top_boxes,
        output_size=7,
        sampling_ratio=2,
        max_boxes=top_k,
        figsize=(20, 6)
    )
    fig.suptitle('Stage 5: RoI Align - Bilinear Sampling Grid Visualization', fontsize=14, fontweight='bold')
    plt.show()
else:
    print("No detections to visualize RoI Align grid")

## Stage 6: Box Head Analysis
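Besides class scores, the box head predicts four regression deltas per RoI that shift the proposal's centre and rescale its width and height. The sketch below shows the standard R-CNN decoding in plain Python. The default weights `(10, 10, 5, 5)` are what torchvision's RoI heads are understood to use by default; treat them as an assumption, and note that the clamping of `dw`/`dh` done in practice is omitted here.

```python
import math

def decode_box(proposal, deltas, weights=(10.0, 10.0, 5.0, 5.0)):
    """Apply (dx, dy, dw, dh) regression deltas to an (x1, y1, x2, y2) proposal."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = (d / s for d, s in zip(deltas, weights))
    new_cx, new_cy = cx + dx * w, cy + dy * h          # shift centre by a fraction of the size
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale size in log space
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

same = decode_box((10.0, 10.0, 30.0, 50.0), (0.0, 0.0, 0.0, 0.0))
shifted = decode_box((10.0, 10.0, 30.0, 50.0), (10.0, 0.0, 0.0, 0.0))
print(same)     # (10.0, 10.0, 30.0, 50.0)
print(shifted)  # (30.0, 10.0, 50.0, 50.0): moved right by one box width
```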

In [None]:
# Visualize box head features
if 'box_head' in extracted_features and len(predictions['boxes']) > 0:
    box_feats = extracted_features['box_head']
    
    # Box head outputs one flattened feature vector per RPN proposal (not per final detection)
    n_rois = min(8, box_feats.shape[0])
    
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    axes = axes.flatten()
    
    for i in range(n_rois):
        feat = box_feats[i].cpu().numpy()
        
        # Reshape to 2D for visualization if needed
        if feat.ndim == 1:
            side = int(np.sqrt(len(feat)))
            if side * side == len(feat):
                feat_2d = feat.reshape(side, side)
            else:
                feat_2d = feat[:256].reshape(16, 16) if len(feat) >= 256 else feat.reshape(-1, 1)
        else:
            feat_2d = feat
        
        axes[i].imshow(feat_2d, cmap='viridis')
        # box_head runs on every RPN proposal before score filtering and NMS,
        # so row i does not correspond to predictions[...][i]; label by proposal index.
        axes[i].set_title(f'Proposal {i+1}', fontsize=10)
        axes[i].axis('off')
    
    for i in range(n_rois, 8):
        axes[i].axis('off')
    
    fig.suptitle('Stage 6: Box Head Features per RoI', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
else:
    print("Box head features not captured or no detections")

## Stage 7: Mask Predictions
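The mask head predicts a small fixed-size probability map per detection (28x28 in torchvision's default configuration), which is then resized to the detection box, pasted into a full-image canvas, and thresholded. The NumPy sketch below illustrates that paste step with nearest-neighbour resizing for brevity, whereas torchvision interpolates bilinearly. `paste_mask` is a hypothetical helper, not a library function.

```python
import numpy as np

def paste_mask(mask_small, box, image_hw, threshold=0.5):
    """Resize a small probability mask to `box` and paste it into a full-size binary mask."""
    img_h, img_w = image_hw
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, img_w), min(y2, img_h)
    full = np.zeros((img_h, img_w), dtype=np.uint8)
    box_h, box_w = y2 - y1, x2 - x1
    if box_h <= 0 or box_w <= 0:
        return full  # degenerate box: nothing to paste
    # Nearest-neighbour index maps from box pixels back to mask cells
    rows = np.arange(box_h) * mask_small.shape[0] // box_h
    cols = np.arange(box_w) * mask_small.shape[1] // box_w
    resized = mask_small[rows[:, None], cols[None, :]]
    full[y1:y2, x1:x2] = (resized > threshold).astype(np.uint8)
    return full

toy_mask = np.zeros((28, 28), dtype=np.float32)
toy_mask[:, 14:] = 0.9  # right half is "object"
full = paste_mask(toy_mask, box=(10, 20, 30, 40), image_hw=(64, 64))
print(full.sum())  # 200: a 20x20 box with its right half set
```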

In [None]:
# Visualize individual mask predictions
if 'masks' in predictions and len(predictions['masks']) > 0:
    masks = predictions['masks'].cpu().numpy()
    
    n_masks = min(8, len(masks))
    fig, axes = plt.subplots(2, n_masks, figsize=(4*n_masks, 8))
    if n_masks == 1:
        axes = axes.reshape(2, 1)
    
    for i in range(n_masks):
        mask = masks[i]
        if mask.ndim == 3:
            mask = mask[0]  # Remove channel dim
        
        label = predictions['labels'][i].item()
        score = predictions['scores'][i].item()
        box = predictions['boxes'][i].cpu().numpy()
        
        # Top: Raw mask probability
        axes[0, i].imshow(mask, cmap='hot', vmin=0, vmax=1)
        axes[0, i].set_title(f'{ISAID_CLASS_LABELS[label]}\nP={score:.2f}', fontsize=10)
        axes[0, i].axis('off')
        
        # Bottom: Binary mask on image crop
        x1, y1, x2, y2 = box.astype(int)
        crop = image_np[max(0,y1):y2, max(0,x1):x2].copy()
        mask_binary = (mask > 0.5).astype(np.uint8)
        mask_crop = mask_binary[max(0,y1):y2, max(0,x1):x2]
        
        if crop.size > 0 and mask_crop.size > 0:
            if mask_crop.shape != crop.shape[:2]:
                mask_crop = cv2.resize(mask_crop, (crop.shape[1], crop.shape[0]))
            
            color = np.array(ISAID_COLORS.get(label, [255, 0, 0]))
            overlay = crop.copy().astype(np.float32)
            overlay[mask_crop > 0] = overlay[mask_crop > 0] * 0.5 + color * 0.5
            overlay = np.clip(overlay, 0, 255).astype(np.uint8)
            
            axes[1, i].imshow(overlay)
        axes[1, i].set_title('Masked Region', fontsize=10)
        axes[1, i].axis('off')
    
    fig.suptitle('Stage 7: Individual Mask Predictions', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
else:
    print("No mask predictions available")

## Stage 8: Grad-CAM Heatmaps
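Grad-CAM weights each channel of a feature map by the spatial average of the gradient of a target score with respect to that channel, sums the weighted channels, and applies a ReLU. The NumPy sketch below shows that computation on toy arrays. The `activations` and `gradients` inputs are assumed to come from forward and backward hooks on a chosen layer, which requires running the model with gradients enabled rather than under `torch.no_grad()`. A plain channel mean, by contrast, uses no gradient information.

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) arrays for one layer and one target score."""
    channel_weights = gradients.mean(axis=(1, 2))             # global-average-pool the gradients
    cam = np.tensordot(channel_weights, activations, axes=1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                                # ReLU keeps positive evidence only
    return cam / (cam.max() + 1e-8)                           # scale to [0, 1]

acts = np.zeros((2, 4, 4), dtype=np.float32)
acts[0, :2, :2] = 1.0  # channel 0 fires top-left
acts[1, 2:, 2:] = 1.0  # channel 1 fires bottom-right
grads = np.stack([np.full((4, 4), 1.0), np.full((4, 4), -1.0)]).astype(np.float32)
cam = grad_cam(acts, grads)
print(cam.round(2))
```

In the toy case the second channel has a negative gradient, so its region is suppressed even though it is strongly activated. An activation-only heatmap would highlight both regions equally.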

In [None]:
# Generate Grad-CAM style heatmaps from different FPN levels
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

levels = ['0', '1', '2', '3']
level_names = ['P2', 'P3', 'P4', 'P5']

for idx, (level, name) in enumerate(zip(levels, level_names)):
    if level in fpn_features:
        feat = fpn_features[level]
        if feat.dim() == 4:
            feat = feat[0]
        
        # Channel-mean activation heatmap (no gradients are used, so this is an
        # activation map that only approximates what true Grad-CAM would show)
        heatmap = feat.mean(dim=0).cpu().numpy()
        heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
        heatmap = cv2.resize(heatmap, (image_np.shape[1], image_np.shape[0]))
        
        # Top row: Heatmap only
        axes[0, idx].imshow(heatmap, cmap='jet')
        axes[0, idx].set_title(f'{name} Activation Map', fontsize=12)
        axes[0, idx].axis('off')
        
        # Bottom row: Overlay on image
        overlay = overlay_heatmap(image_np, heatmap, alpha=0.5, colormap='jet')
        axes[1, idx].imshow(overlay)
        axes[1, idx].set_title(f'{name} Overlay', fontsize=12)
        axes[1, idx].axis('off')

fig.suptitle('Stage 8: Grad-CAM Heatmaps - Where the Model Looks at Each FPN Level', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Combined multi-scale heatmap
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Original image
axes[0].imshow(image_np)
axes[0].set_title('Original Image', fontsize=14)
axes[0].axis('off')

# Combine heatmaps from multiple scales
combined_heatmap = np.zeros((image_np.shape[0], image_np.shape[1]), dtype=np.float32)
weights = {'0': 0.1, '1': 0.2, '2': 0.3, '3': 0.4}  # Higher weight for deeper features

for level, weight in weights.items():
    if level in fpn_features:
        feat = fpn_features[level]
        if feat.dim() == 4:
            feat = feat[0]
        
        heatmap = feat.mean(dim=0).cpu().numpy()
        heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
        heatmap = cv2.resize(heatmap, (image_np.shape[1], image_np.shape[0]))
        combined_heatmap += weight * heatmap

# Normalize combined
combined_heatmap = (combined_heatmap - combined_heatmap.min()) / (combined_heatmap.max() - combined_heatmap.min() + 1e-8)

# Multi-scale heatmap
axes[1].imshow(combined_heatmap, cmap='jet')
axes[1].set_title('Multi-Scale Combined Heatmap', fontsize=14)
axes[1].axis('off')

# Overlay
overlay = overlay_heatmap(image_np, combined_heatmap, alpha=0.5, colormap='jet')
axes[2].imshow(overlay)
axes[2].set_title('Combined Heatmap Overlay', fontsize=14)
axes[2].axis('off')

fig.suptitle('Multi-Scale Grad-CAM: Combined Feature Importance', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Stage 9: Final Predictions with Full Visualization

In [None]:
# Final comprehensive visualization
fig = visualize_final_predictions(
    image_np,
    predictions,
    gradcam_heatmap=combined_heatmap,
    class_labels=ISAID_CLASS_LABELS,
    conf_threshold=CONF_THRESHOLD,
    figsize=(18, 6)
)
fig.suptitle('Stage 9: Final Predictions with Masks and Grad-CAM', fontsize=14, fontweight='bold')
plt.show()

In [None]:
# Print detection summary
print("\n" + "="*60)
print("DETECTION SUMMARY")
print("="*60)

boxes = predictions['boxes'].cpu().numpy()
labels = predictions['labels'].cpu().numpy()
scores = predictions['scores'].cpu().numpy()

high_conf = scores >= CONF_THRESHOLD
print(f"\nTotal detections: {len(boxes)}")
print(f"High confidence (>={CONF_THRESHOLD}): {high_conf.sum()}")

print(f"\n{'#':<4} {'Class':<20} {'Confidence':<12} {'Box (x1,y1,x2,y2)'}")
print("-" * 70)

for i, (box, label, score) in enumerate(zip(boxes[high_conf], labels[high_conf], scores[high_conf])):
    class_name = ISAID_CLASS_LABELS.get(label, f'Class {label}')
    box_str = f"({box[0]:.0f}, {box[1]:.0f}, {box[2]:.0f}, {box[3]:.0f})"
    print(f"{i+1:<4} {class_name:<20} {score:<12.4f} {box_str}")

# Class distribution
if high_conf.sum() > 0:
    print("\n📊 Class Distribution:")
    unique_labels, counts = np.unique(labels[high_conf], return_counts=True)
    for label, count in zip(unique_labels, counts):
        class_name = ISAID_CLASS_LABELS.get(label, f'Class {label}')
        print(f"   {class_name}: {count}")

In [None]:
# Clean up hooks
for hook in hooks:
    hook.remove()
print("Hooks removed")

---

## Summary

This notebook visualized the complete inference pipeline of Mask R-CNN with **ResNet backbone**:

| Stage | Component | What We Saw |
|-------|-----------|-------------|
| 1 | Input | Original image preprocessing |
| 2 | Backbone | ResNet feature extraction (C2-C5) |
| 3 | FPN | Multi-scale feature fusion (P2-P5) |
| 4 | RPN | Region proposal generation |
| 5 | RoI Align | Bilinear sampling grid for each proposal |
| 6 | Box Head | Classification and bbox regression |
| 7 | Mask Head | Instance segmentation prediction |
| 8 | Grad-CAM | Where the model "looks" to make predictions |
| 9 | Output | Final boxes, classes, masks, and confidence scores |

**Key Differences from Custom EfficientNet + CBAM:**
- No CBAM attention modules (standard ResNet blocks)
- Deeper backbone with more parameters
- Standard FPN without additional attention
- Pretrained on COCO (transfer learning)