# Testing with Concept Activation Vectors (TCAV) - A step by step tutorial

Testing with Concept Activation Vectors (TCAV) is a concept-based interpretability method introduced by [Kim et al. (2018)](https://arxiv.org/pdf/1711.11279.pdf). It quantitatively measures how strongly a pre-defined, human-understandable concept influences the predictions made by a trained deep neural network (DNN).

The authors of the paper give an illustrative example of such a question: how does the concept of _stripes_ guide the DNN to predict that an image belongs to the _zebra_ class?


## 1) Define the concept and class of interest

We start by manually selecting a set of example images containing the concept of interest. These images can reflect the concept in different ways. In the case of stripes, for example, they can be pictures of the texture, or images containing striped objects. These images also don't need to be part of the training set of the DNN.

In [1]:
concept = "striped"

Let's create a function for loading the images of interest:

In [5]:
from pathlib import Path
from PIL import Image

import torch
from torchvision import transforms

preprocessing = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

def get_images_input(images_path, transform):
    imgs_files = list(Path(images_path).iterdir())
    
    prepro_imgs = []
    for file in imgs_files:
        img = Image.open(file).convert("RGB")
        img_prepro = transform(img)
        img_unsq = img_prepro.unsqueeze(0)
        prepro_imgs.append(img_unsq)
    
    imgs_tensor = torch.cat(prepro_imgs)
    
    return imgs_tensor


Let's load the concept images and random images:

In [4]:
concept_images = get_images_input(f"/Users/martina.gonzales/data/tcav/image/concepts/{concept}", preprocessing)
random_images = get_images_input(f"/Users/martina.gonzales/data/tcav/image/concepts/random_0", preprocessing)

print(f"Shape of concept images input: {concept_images.shape}")
print(f"Shape of random images input: {random_images.shape}")


In [None]:
zebra_images = get_images_input(f"/Users/martina.gonzales/data/tcav/image/imagenet/zebra", preprocessing)

## 2) Get the DNN's internal representations of the concept

The second step is to pass the images of the concept as well as the random images to the pre-trained DNN and extract their internal representations (a.k.a. activations). Usually, we only explore the activations of one or a couple of layers in the network.

For this example we will explore how the concepts activate `layer3` of a `resnet50` model:

In [None]:
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model.eval();

layers = ["layer3"]

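There are multiple ways to extract the internal representations of a network, for example forward hooks or the `torchvision.models.feature_extraction` utilities. Here we use forward hooks, which run a callback every time a module produces its output.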

Let's create the hooks for the activations:

In [None]:
def get_representation(mod, inp, output):
    output = output.detach()
    features.append(output)

for layer_name, layer in model.named_modules():
    if layer_name in layers:
        handle = layer.register_forward_hook(get_representation)

Let's obtain the activations and create the feature matrix:

In [None]:
feature_matrix = []

for images_input in [concept_images, random_images]:
    features = []
    out = model(images_input)
    features = torch.cat(features)
    features = features.reshape((features.shape[0], -1))
    feature_matrix.append(features)

feature_matrix = torch.cat(feature_matrix)

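Let's verify that the shape of the feature matrix is the expected one. For 224×224 inputs, `layer3` of `resnet50` outputs a 1024×14×14 tensor per image, so we expect one row per image (concept plus random) and 1024 · 14 · 14 = 200,704 columns: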

In [None]:
feature_matrix.shape

Let's create a vector of class ids:

In [None]:
import numpy as np

class_ids = np.concatenate(
    (np.zeros(len(concept_images)), np.ones(len(random_images))),
    axis=0
)

## 3) Compute Concept Activation Vectors (CAVs)

Create classifier:

In [None]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=0.01, random_state=0)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(feature_matrix, class_ids, test_size=0.33)

clf.fit(X_train.detach().numpy(), y_train)

Let's inspect how accurately the classifier distinguishes the concept images from the random images:

In [None]:
score = clf.score(X_test.detach().numpy(), y_test)
print(score)

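Finally, let's create our CAVs. These are the weights of the linear classifier. Because the concept images were labelled 0 and the random images 1, `clf.coef_[0]` points towards the random class, and its negation points towards the concept: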

In [None]:
cavs = torch.tensor(np.array([-1 * clf.coef_[0], clf.coef_[0]]))

Let's inspect the shape of the CAVs:

In [None]:
cavs.shape

## 4) Compute directional derivatives

$$S_{C,\,k,\,l}\left(x\right)\,=\,\lim_{\epsilon\to 0}\frac{h_{l,k}\left(f_{l}\left(x\right)+\epsilon v_{C}^{l}\right)-h_{l,k}\left(f_{l}\left(x\right)\right)}{\epsilon}$$

> where:
>
> --> $h_{l,k}$ maps the activations of layer $l$ to the logit for class $k$
>
> --> $f_l(x)$ is the activations for input $x$ at layer $l$
>
> --> $v^l_C$ is a unit CAV vector for a concept $C$ in layer $l$

$$S_{C,\,k,\,l}\left(x\right) = \nabla h_{l,k}\left(f_{l}\left(x\right)\right)\cdot v_{C}^{l}$$

> That is, the directional derivative of $h_{l,k}$ at the point $f_l(x)$ along $v^l_C$ equals the dot product of the gradient of $h_{l,k}$ at $f_l(x)$ with $v^l_C$.
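
The equivalence between the limit definition and the gradient form can be checked numerically. The sketch below uses a small made-up function `h` in place of the real network head, and arbitrary vectors `a` and `v` in place of $f_l(x)$ and $v^l_C$; all of these names and values are assumptions for illustration only.

```python
import numpy as np

# Toy stand-in for h_{l,k}: maps a 3-d "activation" vector to a scalar "logit".
def h(a):
    return np.sin(a[0]) + a[1] * a[2]

# Analytic gradient of the toy function above.
def grad_h(a):
    return np.array([np.cos(a[0]), a[2], a[1]])

a = np.array([0.5, -1.0, 2.0])   # stands in for the activations f_l(x)
v = np.array([1.0, 2.0, -2.0])
v = v / np.linalg.norm(v)        # unit-length stand-in for the CAV

eps = 1e-6
finite_diff = (h(a + eps * v) - h(a)) / eps   # limit definition with a small epsilon
dot_product = grad_h(a) @ v                   # gradient form

print(finite_diff, dot_product)
```

Both numbers should agree up to the finite-difference error, which is why the rest of this step only needs the gradient and a dot product.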

In [None]:
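# Reload the model so that the hook registered in step 2 is no longer attached
model = resnet50(pretrained=True)
model.eval();

# Unlike in step 2, the outputs are not detached: they must stay in the
# autograd graph so we can differentiate the logits w.r.t. the activations
def get_representation(mod, inp, output):
    activations.append(output)

# Hooks fire in forward order: layer3 activations first, then the fc logits
model.layer3.register_forward_hook(get_representation)
model.fc.register_forward_hook(get_representation)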
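
Reloading the model is one way to get rid of the hook registered earlier. A lighter alternative is to keep the handle returned by `register_forward_hook` and call `remove()` on it. The sketch below shows this on a tiny `nn.Sequential` network, which is only an assumed stand-in for the ResNet; `toy_model`, `captured` and `store_output` are hypothetical names.

```python
import torch
from torch import nn

# Toy network standing in for the ResNet (assumption for illustration).
toy_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
toy_model.eval()

captured = []

def store_output(module, inputs, output):
    captured.append(output.detach())

# register_forward_hook returns a handle that can detach the hook later.
handle = toy_model[0].register_forward_hook(store_output)

with torch.no_grad():
    toy_model(torch.randn(5, 4))   # hook fires: one entry is captured

handle.remove()

with torch.no_grad():
    toy_model(torch.randn(5, 4))   # hook is gone: nothing new is captured

print(len(captured), captured[0].shape)
```

With the handle removed, the same model instance can be reused with a different set of hooks.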

In [None]:
activations = []

out = model(zebra_images)

The variable `activations` should now be a list containing the activations for the layer of our choice and the logits for all classes:

In [None]:
print(f"Layer activations: {activations[0].shape}")
print(f"Logits: {activations[1].shape}")

But we only want to explore the effects on the logit corresponding to our class of interest:

In [None]:
zebra_id = 340

logits_class = activations[1][:, zebra_id]
print(logits_class)

Let's prepare the inputs:

In [None]:
logits_class = torch.unbind(logits_class)
layer_activations = (activations[0],)

Let's now compute the gradients using the `autograd` functionality of pytorch:

In [None]:
grads = torch.autograd.grad(logits_class, layer_activations)

The result is a tuple with one tensor per input. Its only element is a torch tensor of the same shape as the layer's activations.

In [None]:
print(type(grads[0]))
print(grads[0].shape)

We can now obtain our directional derivatives. This is computed by taking the dot product between the gradients and the concept vector:

In [None]:
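# Flatten the gradients to (n_images, n_features) and match the dtype of the CAVs,
# since torch.matmul requires both operands to share a dtype
grads_flat = grads[0].reshape(grads[0].shape[0], -1).to(cavs.dtype)
grads_flat.shape

# cavs[0] is the CAV pointing towards the concept class
dir_derivative = torch.matmul(grads_flat, cavs[0])
dir_derivative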

## Test CAVs

$$\mathrm{TCAV}_{Q_{C,k,l}} = \frac{\left|\left\{x \in X_{k} : S_{C,k,l}\left(x\right) > 0\right\}\right|}{\left|X_{k}\right|}$$

- The TCAV score is the fraction of class-$k$ inputs $X_k$ whose layer-$l$ activations are positively influenced by the concept $C$, i.e. whose directional derivative $S_{C,k,l}(x)$ is positive.


In [None]:
tcav_score = torch.sum(dir_derivative > 0) / dir_derivative.shape[0]
print(f"TCAV score: {tcav_score}")

## Statistical testing

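A single CAV depends on the particular set of random images it was trained against, so a single TCAV score could reflect noise rather than a real effect. To guard against this, the CAV training is repeated many times (at least 500), each time against a different set of random images, and a two-sided t-test is run on the resulting TCAV scores. A concept is only considered relevant to the class if the scores differ significantly from what would be expected by chance.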
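
A minimal sketch of such a significance test, assuming the TCAV scores from repeated runs have already been collected. The scores here are synthetic, and comparing them against scores from CAVs trained on random-versus-random image sets with `scipy.stats.ttest_ind` is one possible setup, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical TCAV scores from 30 repeated runs (synthetic, for illustration only).
concept_scores = np.clip(rng.normal(0.8, 0.05, size=30), 0.0, 1.0)  # concept vs. random CAVs
random_scores = np.clip(rng.normal(0.5, 0.05, size=30), 0.0, 1.0)   # random vs. random CAVs

# Two-sided independent-samples t-test (the default for ttest_ind).
t_stat, p_value = stats.ttest_ind(concept_scores, random_scores)
is_significant = p_value < 0.05

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant: {is_significant}")
```

When several concepts are tested at once, the significance threshold would usually need a multiple-comparison correction.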

## Explore results

In [None]:
from torch.nn.functional import cosine_similarity

cos_sim = []
for img_idx in range(len(zebra_images)):
    # Detach from the autograd graph and match the dtype of the CAV
    img_activations = layer_activations[0][img_idx].detach().reshape(1, -1).to(cavs.dtype)
    cos = cosine_similarity(img_activations, cavs[0].unsqueeze(dim=0))
    cos_sim.append(cos.item())

cos_sim = torch.tensor(cos_sim)

In [None]:
sorted_vals = torch.argsort(cos_sim)
min_vals = sorted_vals[:3]
max_vals = sorted_vals[-3:]

In [None]:
import matplotlib.pyplot as plt

for val in min_vals:
    plt.figure()
    plt.imshow(zebra_images[val].permute(1, 2, 0))
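
The tensors in `zebra_images` are still normalized, so `imshow` will clip many pixel values and the colours will look off. The sketch below undoes the normalization before plotting, assuming the same ImageNet mean and standard deviation used in the preprocessing pipeline; `denormalize` is a hypothetical helper, and the round trip uses a synthetic image.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def denormalize(img_chw):
    """Undo channel-wise normalization; returns an (H, W, 3) array in [0, 1]."""
    img_hwc = np.transpose(np.asarray(img_chw), (1, 2, 0))
    return np.clip(img_hwc * IMAGENET_STD + IMAGENET_MEAN, 0.0, 1.0)

# Round trip on a synthetic (3, H, W) image with values in [0, 1).
rng = np.random.default_rng(0)
original = rng.random((3, 8, 8))
normalized = (original - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
restored = denormalize(normalized)

print(restored.shape)
```

In the plotting loop this would be used as `plt.imshow(denormalize(zebra_images[val].numpy()))`.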

## Disadvantages

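- Assumes linearity: a CAV comes from a linear classifier, so the method assumes the concept is linearly separable from random examples in the activation space of the chosen layer.
- Manual annotation: the example images for each concept have to be collected and curated by hand.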

-------
## Resources

- [TCAV with Captum](https://captum.ai/tutorials/TCAV_Image) tutorial.
- [Towards better understanding of gradient-based attribution methods for deep neural networks](https://arxiv.org/pdf/1711.06104.pdf) by Ancona et al. (2018), _ICLR_.