### Module 3: Explaining Models (Concepts)

Let's see how concept activation vectors can be used to explain the GoogLeNet
image classification model. Specifically, we'll follow the `captum` package
tutorial to see whether the model uses the concept of "stripes" to classify
images into the "zebra" class. To define the concepts, we download reference
images from the Broden catalog. Both the concepts and the images to classify are
stored in [this archive](https://drive.google.com/file/d/18dDYwSH-OiovV8vDfmu8eMKIVFuweOy9/view?usp=sharing),
which can be extracted using

```
tar -zxvf data.tar.gz
```

Make sure the results are stored in a directory `data/concepts/` relative to
this notebook (or be prepared to change the paths below). The block below loads
libraries that are already installed in the `iisa312` conda environment, which
can be set up by downloading this [environment.yaml file](https://github.com/krisrs1128/talks/blob/master/2024/20241230/examples/environment-iisa312.yaml) and running

```
conda env create -yf environment-iisa312.yaml
```

In [1]:
from pathlib import Path
from torchvision import transforms
import PIL
import captum as cp
import captum.concept._utils.data_iterator as di
import glob
import torch
import torchvision

We'll need the helper functions below. They perform routine image loading and transformation steps and are not specific to concept models.

In [None]:
def load_tensor(filename):
    """Load a single image file as a tensor"""
    img = PIL.Image.open(filename).convert("RGB")
    return transform(img)

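def load_tensors(class_name, root_path="data/concepts/", transform=True):
    """Load all the images belonging to a class as a list of tensors (or of
    PIL images when transform is False). This assumes that each class gets a
    subdirectory of root_path."""
    path = Path(root_path) / class_name
    filenames = glob.glob(str(path / '*.jpg'))

    tensors = []
    for filename in filenames:
        # The boolean `transform` argument shadows the transform() function
        # inside this scope, so delegate to load_tensor() instead of calling
        # transform(img) directly (which would try to call a bool).
        if transform:
            tensors.append(load_tensor(filename))
        else:
            tensors.append(PIL.Image.open(filename).convert('RGB'))
    return tensors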

def transform(img):
    """Transform an image into a form that can be used for classification"""
    return transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
            ),
        ]
    )(img)
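
To make the geometry and scaling inside `transform` concrete, here is a minimal pure-Python sketch of what the pipeline does to an image's size and to a single pixel. The helper names (`resized_size`, `center_crop_box`, `normalize_pixel`) are hypothetical, and the code only approximates `Resize`, `CenterCrop`, and `Normalize`; the exact rounding used by torchvision may differ slightly.

```python
# Channel statistics copied from the transform() helper above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def resized_size(width, height, target=256):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Standardize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


width, height = resized_size(640, 480)
print(width, height)                   # size after the resize step
print(center_crop_box(width, height))  # window kept by the center crop
print(normalize_pixel(MEAN))           # a pixel equal to the mean maps to zeros
```

Whatever the original image size, the result is a 224 x 224 tensor whose channels are standardized with the same statistics used above.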

`captum` defines a `Concept` class. This is basically a glorified data loader,
differing from ordinary data loaders only in that it is associated with a
human-interpretable label, `name`. The concept objects created by the factory
below will loop over the images within the `concept_path/name` subdirectory.

In [None]:
def assemble_concept(name, id, concept_path):
    dataset = di.CustomIterableDataset(load_tensor, f"{str(concept_path / name)}/")
    concept_iter = di.dataset_to_dataloader(dataset)
    return cp.concept.Concept(id=id, name=name, data_iter=concept_iter)

We now define five instances of this `Concept` object. We are mainly interested in the `striped` concept, since it seems like it should be related to zebra classification. The other concepts serve as controls. `dotted` is a control with systematic structure that we don't expect to be related to zebra classification at all. The other three are random subsets of ImageNet that share no systematic structure, so they are a somewhat more distant control.

In [None]:
concept_path = Path("data/concepts/")
stripes_concept = assemble_concept("striped", 0, concept_path=concept_path)
dotted_concept = assemble_concept("dotted", 1, concept_path=concept_path)
random0_concept = assemble_concept("random500_0", 2, concept_path=concept_path)
random1_concept = assemble_concept("random500_1", 3, concept_path=concept_path)
random2_concept = assemble_concept("random500_2", 4, concept_path=concept_path)
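
Since each concept simply iterates over the files in its subdirectory, a missing or empty folder is easy to overlook. The sketch below is one way to check the directory layout assumed above. The helper `count_concept_images` is hypothetical (it is not part of `captum`), and it assumes the images use the `.jpg` extension, as in `load_tensors`.

```python
from pathlib import Path


def count_concept_images(root, names, pattern="*.jpg"):
    """Map each concept name to the number of matching image files.

    Missing subdirectories are reported as 0 rather than raising.
    """
    root = Path(root)
    counts = {}
    for name in names:
        folder = root / name
        counts[name] = len(list(folder.glob(pattern))) if folder.is_dir() else 0
    return counts


concept_names = ["striped", "dotted", "random500_0", "random500_1", "random500_2"]
counts = count_concept_images("data/concepts/", concept_names)
for name, n_images in counts.items():
    status = "ok" if n_images > 0 else "MISSING OR EMPTY"
    print(f"{name:12s} {n_images:5d}  {status}")
```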

We will explain the GoogLeNet model loaded in the block below. We need to
access activations, but we won't be doing any training, so we call
`model.eval()` to put layers such as dropout and batch normalization into
inference mode.

In [5]:
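# `pretrained=True` is deprecated in recent torchvision releases; `weights=` replaces it.
model = torchvision.models.googlenet(weights=torchvision.models.GoogLeNet_Weights.IMAGENET1K_V1)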
model = model.eval()

The TCAV implementation in captum has an object-oriented design. We first define a generic TCAV explainer associated with a model and a set of layers of interest (in this case, a few of the `inception4` layers). We can then apply this object to arbitrary concept objects and response classes. Since the object is already associated with a model and layers, we no longer need to specify them in each call: we can just ask whether a concept is related to a class, and the TCAV explainer will know to look it up in the correct model. Notice that we are using integrated gradients to define the class's sensitivity to perturbations of the model activations. This differs from the original TCAV paper, which uses ordinary gradients.

In [None]:
layers=['inception4c', 'inception4d', 'inception4e']

tcav = cp.concept.TCAV(
    model=model, 
    layers=layers,
    layer_attr_method=cp.attr.LayerIntegratedGradients(model, None, multiply_by_inputs=False)
)

Finally, we can apply this object to see whether stripes are related to zebra
classification. Our first version defines the concept direction by classifying
striped images vs. random ImageNet images. Here, 340 is the label for the zebra
class in ImageNet, and `n_steps` is the number of steps in the integrated
gradients approximation.

In [7]:
classification_data = [[stripes_concept, random0_concept]]
zebra_images = load_tensors('zebra', transform=False)
zebra_tensors = torch.stack([transform(img) for img in zebra_images])

response_id = 340
tcav_scores = tcav.interpret(
    inputs=zebra_tensors, 
    experimental_sets=classification_data,
    n_steps=5,
    target=response_id
)
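
To build intuition for `n_steps`, recall that integrated gradients averages the gradient along a straight path from a baseline to the input, and that the average is approximated at a finite number of points. The sketch below illustrates the idea on a scalar toy function using a simple midpoint rule. It is an assumption-laden illustration, not captum's implementation: `averaged_gradient` and `toy_gradient` are hypothetical names, and captum may use a different quadrature rule.

```python
def averaged_gradient(grad_fn, x, baseline=0.0, n_steps=5):
    """Average grad_fn along the straight path from baseline to x.

    Uses the midpoint rule with n_steps evenly spaced points. This is an
    illustrative approximation, not captum's implementation.
    """
    total = 0.0
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps  # midpoint of the step-th subinterval
        total += grad_fn(baseline + alpha * (x - baseline))
    return total / n_steps


def toy_gradient(z):
    """Gradient of the toy function f(z) = z**3."""
    return 3 * z**2


# The exact path-averaged gradient from 0 to 2 is (f(2) - f(0)) / 2 = 4.
for n_steps in (5, 50, 500):
    print(n_steps, averaged_gradient(toy_gradient, x=2.0, n_steps=n_steps))
```

More steps give a closer approximation at a proportionally higher cost, so a small value such as `n_steps=5` trades some accuracy for speed.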

The `sign_count` output below is the test statistic $T_{k}$ introduced in our notes. Values close to one mean that the class gradients $\nabla y_{k}\left(h\left(\mathbf{x}\right)\right)$ almost always lie in the same half space as the concept activation direction $\mathbf{v}$. The key `'0-2'` identifies the pair of concept ids being contrasted (`striped` and `random500_0`), and each tensor lists the scores in that order. The striped concept's sign count is close to 1 in every layer (at least 0.88), while the random concept's is always below 0.15.

In [8]:
tcav_scores

defaultdict(<function captum.concept._core.tcav.TCAV.interpret.<locals>.<lambda>()>,
            {'0-2': defaultdict(None,
                         {'inception4c': {'sign_count': tensor([0.9231, 0.0769]),
                           'magnitude': tensor([ 0.6405, -0.6405])},
                          'inception4d': {'sign_count': tensor([1., 0.]),
                           'magnitude': tensor([ 1.2210, -1.2210])},
                          'inception4e': {'sign_count': tensor([0.8846, 0.1154]),
                           'magnitude': tensor([ 0.3268, -0.3268])}})})
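
As a rough illustration of how a sign count can be computed, the sketch below takes one gradient vector per input image and reports the fraction whose inner product with a concept direction is positive. The vectors are made-up two-dimensional values and the function names are hypothetical; captum works with layer activations and attributions rather than plain lists.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sign_count(gradients, direction):
    """Fraction of gradient vectors with a positive directional derivative."""
    positive = sum(1 for g in gradients if dot(g, direction) > 0)
    return positive / len(gradients)


# Made-up 2-D "gradients", one per input image (illustrative values only).
gradients = [[1.0, 0.2], [0.8, -0.1], [0.5, 0.9], [-0.3, 0.4]]
cav = [1.0, 0.0]              # hypothetical concept direction v
opposite = [-c for c in cav]  # direction pointing toward the contrasting concept

print(sign_count(gradients, cav))       # 3 of 4 gradients agree with v
print(sign_count(gradients, opposite))  # the remaining 1 of 4
```

When two concepts are contrasted with a single linear boundary, their directions point in opposite ways, which is consistent with the two `sign_count` entries summing to one and the two `magnitude` entries being negatives of each other in the output above.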

The blocks below repeat this analysis with the dotted rather than the random control. Again, the striped concept defines a concept activation direction $\mathbf{v}$ that is much more relevant to zebra class prediction: its sign count is 1 in all three layers.

In [9]:
classification_data = [[stripes_concept, dotted_concept]]

tcav_scores = tcav.interpret(
    inputs=zebra_tensors, 
    experimental_sets=classification_data,
    n_steps=5,
    target=response_id
)

In [10]:
tcav_scores

defaultdict(<function captum.concept._core.tcav.TCAV.interpret.<locals>.<lambda>()>,
            {'0-1': defaultdict(None,
                         {'inception4c': {'sign_count': tensor([1., 0.]),
                           'magnitude': tensor([ 1.4850, -1.4850])},
                          'inception4d': {'sign_count': tensor([1., 0.]),
                           'magnitude': tensor([ 1.3086, -1.3086])},
                          'inception4e': {'sign_count': tensor([1., 0.]),
                           'magnitude': tensor([ 0.6868, -0.6868])}})})
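
The nested dictionary is awkward to scan, so it can help to flatten it into one row per layer and concept. The sketch below is one way to do this. It assumes, based on the outputs above, that each top-level key joins the contrasted concept ids with a hyphen; `summarize` is a hypothetical helper, and plain lists stand in for the tensors.

```python
def summarize(scores, concept_names):
    """Flatten nested TCAV scores into (layer, concept, sign_count) rows."""
    rows = []
    for experiment, layers in scores.items():
        # Keys such as "0-1" list the ids of the concepts being contrasted.
        ids = [int(i) for i in experiment.split("-")]
        for layer, stats in layers.items():
            for concept_id, value in zip(ids, stats["sign_count"]):
                rows.append((layer, concept_names[concept_id], float(value)))
    return rows


# Plain lists stand in for tensors; values copied from the output above.
scores = {
    "0-1": {
        "inception4c": {"sign_count": [1.0, 0.0]},
        "inception4d": {"sign_count": [1.0, 0.0]},
        "inception4e": {"sign_count": [1.0, 0.0]},
    }
}
names = {0: "striped", 1: "dotted"}
for layer, concept, value in summarize(scores, names):
    print(f"{layer:12s} {concept:8s} {value:.4f}")
```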