# Model Understanding with Captum

## Introduction

Captum’s approach to model interpretability is in terms of attributions. There are three kinds of attributions available in Captum:

- Feature Attribution - seeks to explain a particular output in terms of features of the input that generated it. Explaining whether a movie review was positive or negative in terms of certain words in the review is an example of feature attribution.
    - Question: “Which features of the input were most important for this prediction?”

    Example use cases:
    - In NLP: which words caused a sentence to be classified as positive or negative?
    - In vision: which pixels most influenced the model to recognize a cat?

- Layer Attribution - examines the activity of a model’s hidden layer for a particular input. Examining the spatially-mapped output of a convolutional layer in response to an input image is an example of layer attribution.
    - Goal: Explain what happens inside the network, at the layer level.
    - Question: “Which features or regions in this layer respond most to a given input?”

    Example use cases:
    - Visualizing activations of a CNN layer for an image
    - Understanding how attention layers behave in a transformer

- Neuron Attribution - is analogous to layer attribution, but focuses on the activity of a single neuron.
    - Goal: Go even deeper and analyze how the input contributes to the activation of a single neuron.
    - Question: “How important is this input for activating neuron X in layer Y?”

    Example use cases:
    - Understanding what specific neurons detect (edges, textures, or semantic features)
    - Debugging neuron saturation or dead neurons
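
The three attribution types differ in what is being explained and in terms of what. A minimal NumPy sketch, assuming a hypothetical two-layer ReLU network with random weights and using the simple gradient × input rule (not one of Captum's classes), computes one attribution of each type:

```python
import numpy as np

# Hypothetical toy network without biases (illustration only, not a Captum model):
#   h = relu(W1 @ x)   hidden layer with 4 neurons
#   y = W2 @ h         2 output logits
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

pre = W1 @ x
mask = (pre > 0).astype(float)   # derivative of ReLU at each hidden unit
h = mask * pre                   # ReLU activation
y = W2 @ h
target = 0                       # the output logit we want to explain

# Feature attribution: d y[target] / d x, times the input -> one score per input feature
grad_x = W1.T @ (mask * W2[target])
feature_attr = x * grad_x

# Layer attribution: d y[target] / d h, times the activation -> one score per hidden neuron
layer_attr = h * W2[target]

# Neuron attribution: d h[neuron] / d x, times the input -> one score per input feature
neuron = 2
neuron_attr = x * (mask[neuron] * W1[neuron])

print(feature_attr.sum(), layer_attr.sum(), y[target])
print(neuron_attr.sum(), h[neuron])
```

Because the toy network has no biases and ReLU is piecewise linear, each set of scores sums exactly to the value it explains: the feature and layer scores to `y[target]`, the neuron scores to `h[neuron]`. Captum provides related gradient-times-input methods as classes such as `InputXGradient` and `LayerGradientXActivation`, with gradients computed by PyTorch autograd rather than by hand.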

In this interactive notebook, we’ll look at Feature Attribution and Layer Attribution.

Each of the three attribution types has multiple attribution algorithms associated with it. Many attribution algorithms fall into two broad categories:

- Gradient-based algorithms - calculate the backward gradients of a model output, layer output, or neuron activation with respect to the input. Integrated Gradients (for features), Layer Gradient * Activation, and Neuron Conductance are all gradient-based algorithms.

- Perturbation-based algorithms - examine the changes in the output of a model, layer, or neuron in response to changes in the input. The input perturbations may be directed or random. Occlusion, Feature Ablation, and Feature Permutation are all perturbation-based algorithms.
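
To make the two categories concrete, here is a minimal NumPy sketch that applies one algorithm from each, Integrated Gradients and Feature Ablation, to the same input. It assumes a hypothetical one-unit logistic model with a hand-written gradient (Captum itself obtains gradients from PyTorch autograd), an all-zeros baseline, and a plain midpoint Riemann sum for the path integral:

```python
import numpy as np

# Hypothetical differentiable "model" (not from Captum): F(x) = sigmoid(w . x + b)
w = np.array([1.5, -2.0, 0.5, 3.0])
b = -0.25

def F(X):
    # Accepts a single input (n_features,) or a batch (n_points, n_features)
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def grad_F(X):
    # Analytic gradient of the sigmoid for a batch: dF/dx = F * (1 - F) * w
    s = F(X)
    return (s * (1.0 - s))[:, None] * w

def integrated_gradients(x, baseline, n_steps=200):
    # Gradient-based: average the gradient along the straight path baseline -> x
    # (midpoint Riemann sum), then scale by the input difference.
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    path = baseline + alphas[:, None] * (x - baseline)   # (n_steps, n_features)
    return (x - baseline) * grad_F(path).mean(axis=0)

def feature_ablation(x, baseline):
    # Perturbation-based: replace one feature at a time with its baseline value
    # and record how much the output drops. No gradients are needed.
    out = F(x)
    attr = np.empty_like(x)
    for i in range(x.size):
        perturbed = x.copy()
        perturbed[i] = baseline[i]
        attr[i] = out - F(perturbed)
    return attr

x = np.array([0.8, 0.1, -0.4, 0.6])
baseline = np.zeros_like(x)

ig = integrated_gradients(x, baseline)
ablation = feature_ablation(x, baseline)

# IG scores sum to F(x) - F(baseline) up to the Riemann-sum error (completeness);
# ablation scores are not guaranteed to.
print("IG:      ", ig, "sum:", ig.sum())
print("Ablation:", ablation, "sum:", ablation.sum())
print("F(x) - F(baseline):", F(x) - F(baseline))
```

For this monotone toy model both methods agree on the sign of every feature's contribution. Captum's `IntegratedGradients` can also report the completeness gap via `return_convergence_delta=True`.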

We’ll be examining algorithms of both types below.


Note: running `pip install captum Flask-Compress` uninstalled NumPy 2.3.2 and replaced it with NumPy 1.26.4.
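
To confirm which versions actually ended up installed after such a change, you can query package metadata from the standard library. A minimal sketch; the package list is an assumption, so adjust it to your environment:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages=("numpy", "torch", "torchvision", "captum")):
    # Map each package name to its installed version string, or None if absent
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print(installed_versions())
```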

## A First Example
Let’s begin with a simple, visual example. We’ll use a ResNet model pretrained on the ImageNet dataset, get a test input, and apply different Feature Attribution algorithms to examine how the input image affects the output. We’ll also visualize the resulting attribution map for some test images.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models

# Captum attribution algorithms and visualization helpers used in this notebook
import captum
from captum.attr import IntegratedGradients, Occlusion, LayerGradCam, LayerAttribution
from captum.attr import visualization as viz

import os, sys
import json

# Image loading and plotting
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
```
from matplotlib.colors import LinearSegmentedColormap

Now we’ll use the TorchVision model library to download a pretrained ResNet. Since we’re not training, we’ll place it in evaluation mode for now.

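```python
# Download ResNet-18 with ImageNet weights (string weight names need torchvision >= 0.13)
model = models.resnet18(weights='IMAGENET1K_V1')
# Evaluation mode: BatchNorm uses its stored running statistics instead of batch statistics
model = model.eval()
```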

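
Evaluation mode matters because some layers behave differently during training. ResNet-18 contains BatchNorm layers; the network below is a hypothetical toy stand-in that also adds Dropout so the effect is easy to see. It shows that `eval()` makes the forward pass deterministic while leaving autograd enabled, which gradient-based attribution relies on:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network (not ResNet) with layers whose behavior depends on train/eval mode
net = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),   # train: batch statistics; eval: stored running statistics
    nn.ReLU(),
    nn.Dropout(p=0.5),    # train: randomly zeroes units; eval: identity
    nn.Linear(16, 3),
)
x = torch.randn(4, 8)

net.train()
train_a, train_b = net(x), net(x)   # different dropout masks, so outputs differ

net.eval()
eval_a, eval_b = net(x), net(x)     # deterministic: identical outputs

# eval() does not disable autograd: the output still tracks gradients
print(torch.equal(eval_a, eval_b), eval_a.requires_grad)
```

Note that `eval()` and `torch.no_grad()` are independent: the first changes layer behavior, the second turns off gradient tracking.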
