# Feature Attribution Image Generation

This notebook generates images that visualize feature attribution data overlaid on the original images.

The following image generation options are supported:

## __Single Images__ 
Each data instance contains 4 images of the fish. In this mode, each of those images is saved as its own SEPARATE image file.

## __Confusion Matrices__ 
NOTE: THIS REQUIRES THAT ATTRIBUTIONS WERE PERFORMED AGAINST ALL CLASSES rather than a single class. Otherwise, the dimensions of the attribution data are incompatible with this operation. Refer to the "Feature Attribution Data Generation" notebook for how to do this.

A Feature Attribution Confusion Matrix shows, in one image, the attributions of a single data instance against ALL possible classes. Each column is the class the attribution was done against (0, 1, or 2), and each row is the image (one of the 4 in the instance) that was attributed.
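Because confusion-matrix mode fails when the attribution data has the wrong dimensions, it can help to check the shapes first. The sketch below assumes the layouts implied by the loops in the generation cell: images shaped `(N, 4, C, H, W)` and all-class attributions shaped `(N, num_classes, 4, C, H, W)`. The helper `check_confusion_matrix_shapes` is hypothetical, and NumPy arrays stand in for the PyTorch tensors.

```python
import numpy as np

def check_confusion_matrix_shapes(attributions, images, num_classes=3):
    """Hypothetical pre-flight check for confusion-matrix mode.

    Assumes images are (N, 4, C, H, W) and all-class attributions are
    (N, num_classes, 4, C, H, W), as implied by the notebook's loops.
    Returns (rows, columns) of the resulting grid.
    """
    if attributions.ndim != images.ndim + 1:
        raise ValueError(
            f"Expected one extra (class) axis on the attributions, got "
            f"{attributions.shape} vs images {images.shape}; "
            "were attributions computed against all classes?")
    n, classes, per_instance = attributions.shape[:3]
    if classes != num_classes:
        raise ValueError(f"Expected {num_classes} classes, got {classes}")
    if (n, per_instance) + attributions.shape[3:] != images.shape:
        raise ValueError(f"Shape mismatch: {attributions.shape} vs {images.shape}")
    # Grid layout: row = image within the instance, column = target class.
    return per_instance, classes

images = np.zeros((2, 4, 3, 8, 8))
all_class = np.zeros((2, 3, 4, 3, 8, 8))
print(check_confusion_matrix_shapes(all_class, images))  # (4, 3)
```

Passing single-class attributions (shape `(N, 4, C, H, W)`) raises a `ValueError` instead of failing partway through plotting.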

## Inputs
The settings that control the notebook are set through a group of variables (all written in capital letters, with underscores in place of spaces, e.g. `ORIGINAL_IMAGES_PATH`). Change these values before running the notebook.

### Required Arguments
The notebook requires the following information to be provided. 

#### File Paths
* `ORIGINAL_IMAGES_PATH` (string) - A path to the original images saved as a numpy array `(*.npy)`
* `ATTRIBUTION_DATA_PATH` (string) - A path to the attribution data generated by the "Feature Attribution Data Generation" notebook, saved as a PyTorch tensor `(*.pt)`
* `TRUE_LABELS_PATH` (string) - A path to the true labels of the dataset, saved as a numpy array `(*.npy)`
* `PREDICTED_LABELS_PATH` (string) - A path to the predicted labels of the dataset, saved as a numpy array `(*.npy)`
* `FINAL_IMAGES_FOLDER_PATH` (string) - A path to an EMPTY FOLDER where the generated images will be saved

#### Generation Options
* `CONFUSION_MATRIX_ENABLED` (boolean) - `True` to generate confusion matrices, `False` to generate single images
* `ATTRIBUTION_DONE_AGAINST` (Enum, refer to the `AttributionDoneAgainst` class options `TRUE_LABELS` and `PREDICTED_LABELS`) - Only applies when `CONFUSION_MATRIX_ENABLED = False`. Determines whether each image title states that the attribution was done against the true label or the predicted label.
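
Since the output folder is expected to be empty, a quick check before generation avoids mixing new images with files from an earlier run. This is a minimal sketch: `check_output_folder` is a hypothetical helper, and creating the folder when it is missing is an assumption of this sketch rather than something the notebook does.

```python
import os

def check_output_folder(path):
    """Hypothetical helper: make sure the output folder exists and is empty.

    Creates the folder if it is missing (a design choice of this sketch) and
    raises if it already contains entries.
    """
    os.makedirs(path, exist_ok=True)
    existing = os.listdir(path)
    if existing:
        raise FileExistsError(
            f"{path!r} is not empty ({len(existing)} entries); "
            "choose an empty folder for FINAL_IMAGES_FOLDER_PATH")
    return path
```

In the notebook this could be called as `check_output_folder(FINAL_IMAGES_FOLDER_PATH)` after Google Drive is mounted. Drop the `makedirs` call if the folder must already exist.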

### Optional Arguments
These options have defaults but can be changed for custom results. They are passed DIRECTLY to the `visualize_image_attr` function that Captum provides, so their values must match the Captum documentation: https://captum.ai/api/utilities.html#visualization. The most relevant parts are summarized or quoted from that documentation below.

* `METHOD` (string) (default value: "blended_heat_map") - the visualization method. Options:
  * "heat_map" - display a heatmap of attributions
  * "blended_heat_map" - put the heatmap over a greyscale version of the image
  * "original_image" - Just show the original image
  * "masked_image" - mask image (pixel-wise multiply) by normalized attribution values
  * "alpha_scaling" - set the alpha channel of each pixel to normalized attribution value
* `SIGN` (string) (default value: "all") - Determines which attribution values to show. Options:
  * "positive" - only display positive attributions
  * "absolute_value" - display the absolute value of all attributions
  * "negative" - only display negative attributions
  * "all" - display both positive and negative attributions. Note that if you set `METHOD` to "masked_image" or "alpha_scaling" the "all" option is NOT supported.
* `ALPHA_OVERLAY` (default value: 0.8) (float between 0 and 1) - the opacity of the heat map when `METHOD` is "blended_heat_map", which controls how prominently the zebrafish appears in the background. Higher alpha values make the background image fainter.
* `SHOW_COLORBAR` (default value: True) (boolean) - Determines whether a colorbar is added that maps the colors on the image (red/green) to their attribution values.
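The constraints listed above can be checked before any images are generated, so an invalid combination fails immediately rather than on the first call to Captum. This is a sketch: `check_viz_options` is a hypothetical helper, and it only accepts the values documented in the list above.

```python
# Values listed above for Captum's visualize_image_attr.
VALID_METHODS = {"heat_map", "blended_heat_map", "original_image",
                 "masked_image", "alpha_scaling"}
VALID_SIGNS = {"positive", "absolute_value", "negative", "all"}
# Methods that, per the note above, do not support SIGN = "all".
NO_ALL_SIGN = {"masked_image", "alpha_scaling"}

def check_viz_options(method, sign, alpha_overlay):
    """Hypothetical pre-flight check for METHOD, SIGN and ALPHA_OVERLAY."""
    if method not in VALID_METHODS:
        raise ValueError(f"Unknown METHOD {method!r}")
    if sign not in VALID_SIGNS:
        raise ValueError(f"Unknown SIGN {sign!r}")
    if method in NO_ALL_SIGN and sign == "all":
        raise ValueError(f"SIGN='all' is not supported with METHOD={method!r}")
    if not 0.0 <= alpha_overlay <= 1.0:
        raise ValueError("ALPHA_OVERLAY must be between 0 and 1")
```

With the defaults, `check_viz_options("blended_heat_map", "all", 0.8)` passes, while `check_viz_options("masked_image", "all", 0.8)` raises a `ValueError`.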

## Outputs
* Confusion matrices OR individual images, saved to `FINAL_IMAGES_FOLDER_PATH`. For example, a dataset with 285 instances yields 285 x 4 individual images or 285 confusion-matrix images.
  * If individual images are generated, each file is named `Index: (main instance index), (subindex).png`. The main instance index locates the instance in the PyTorch tensor/dataset, and the subindex identifies the specific image within that instance.
  * If confusion matrices are generated, each file name contains only the main index: `Index:(main instance index).png`
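The naming scheme and file counts above can be written as small functions, which makes it easy to predict which files a run should produce. The function names here are hypothetical; the string patterns mirror the `format` calls in the generation cell.

```python
def single_image_names(instance_idx, images_per_instance=4):
    # Mirrors the "Index: {0}, {1}" pattern used for single images.
    return [f"Index: {instance_idx}, {sub}" for sub in range(images_per_instance)]

def confusion_matrix_name(instance_idx):
    # Mirrors the "Index:{0}" pattern used for confusion matrices.
    return f"Index:{instance_idx}"

def expected_file_count(num_instances, confusion_matrix, images_per_instance=4):
    # One file per instance for confusion matrices, one per image otherwise.
    return num_instances if confusion_matrix else num_instances * images_per_instance

print(single_image_names(7))             # ['Index: 7, 0', 'Index: 7, 1', 'Index: 7, 2', 'Index: 7, 3']
print(expected_file_count(285, False))   # 1140
```

These names have no extension, so matplotlib's `savefig` uses `rcParams["savefig.format"]` (PNG by default) and appends `.png`. The colons are fine on Linux-based storage such as Colab and Google Drive but are not valid in Windows file names.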

In [1]:
# Load Dependencies
import torch
import numpy as np
import matplotlib.pyplot as plt

!pip install -q captum
!pip install tqdm

# tqdm shows the progress of the generation process as a
# "loading" bar with "it/s" (iterations per second), giving you a rough
# idea of how long the process will take while it runs
from tqdm.auto import tqdm
from tqdm.contrib import tenumerate
from captum.attr import visualization as viz



In [2]:
# Fixed options for controlling the title of individual images,
# please refer to the Cell below for more information

from enum import Enum

class AttributionDoneAgainst(Enum):
  TRUE_LABELS = 1
  PREDICTED_LABELS = 2

In [3]:
# Control Variables 

## File Paths

### Path to the original images used for feature attribution, must be a .npy file
ORIGINAL_IMAGES_PATH = "/content/drive/Shareddrives/Exploding Gradients/X_cropped_b.npy"
### Path to the attribution data generated from the "Feature Attribution Data Generation" Notebook,
### This should be a ".pt" or saved PyTorch Tensor file
ATTRIBUTION_DATA_PATH = "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/Deconvolution All Class.pt"
### Path to the true labels of the dataset, should have the dimensions
### [1,x] where x is the number of pieces of data
TRUE_LABELS_PATH = "/content/drive/Shareddrives/Exploding Gradients/y_b.npy"
### Path to the predicted labels of the dataset, should have the dimensions
### [1,x] where x is the number of pieces of data
PREDICTED_LABELS_PATH =  "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/predicted_labels.npy"
### Path to where you want to save your images
FINAL_IMAGES_FOLDER_PATH = "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/Deconvolution Confusion Matrices"

## Denote whether a confusion matrix is desired.
## If set to False, only single images are generated.
## True requires that feature attribution against ALL possible classes
## was performed in the "Feature Attribution Data Generation" notebook; otherwise
## the dimensions of the data will not line up!
CONFUSION_MATRIX_ENABLED = True

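## Denote whether attribution was done against the true labels or the predicted labels.
## This is ONLY USED WHEN CONFUSION_MATRIX_ENABLED = False (single-image output).
## It determines whether each image title says the attribution was done
## against the true label or the predicted label.
ATTRIBUTION_DONE_AGAINST = AttributionDoneAgainst.PREDICTED_LABELS
# The generation cell reads the misspelled name ATTRIBTUION_DONE_AGAINST,
# so keep it pointing at the correctly spelled setting above.
ATTRIBTUION_DONE_AGAINST = ATTRIBUTION_DONE_AGAINST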

## Image Processing Options

### These options are passed directly to Captum's visualize_image_attr function.
### More information about the function can be found here:
### https://captum.ai/api/utilities.html
METHOD = "blended_heat_map"
SIGN = "all"
ALPHA_OVERLAY = 0.8
SHOW_COLORBAR = True

In [4]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
# Load original images from the .npy file and convert them to a float tensor.
# The stored layout is channels-last, (N, 4, H, W, C); reorder it to
# channels-first, (N, 4, C, H, W), to match the attribution tensors.
images = np.load(ORIGINAL_IMAGES_PATH)
images_tensor = torch.from_numpy(images).float()
images_tensor = images_tensor.permute(0, 1, 4, 2, 3)

# Load attribution data
attributions_tensor = torch.load(ATTRIBUTION_DATA_PATH, map_location=torch.device('cpu'))

# Load labels
true_labels = np.squeeze(np.load(TRUE_LABELS_PATH))
predicted_labels = np.squeeze(np.load(PREDICTED_LABELS_PATH))
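
The loading cell reorders the image axes from channels-last to channels-first, and the generation cell later converts each image back with `permute(1, 2, 0)` for Captum. The sketch below replays those steps on a small NumPy array, assuming the `.npy` file holds data shaped `(N, 4, H, W, C)` (the array sizes are arbitrary). It shows that two axis swaps (2 with 4, then 3 with 4) equal a single transpose with order `(0, 1, 4, 2, 3)`. NumPy's `swapaxes`/`transpose` reorder axes the same way as `torch.swapaxes`/`permute`.

```python
import numpy as np

# Dummy batch: 2 instances x 4 images x 5 rows x 6 cols x 3 channels (channels-last).
x = np.random.rand(2, 4, 5, 6, 3)

# Two consecutive swaps: axes 2<->4, then 3<->4 ...
swapped = np.swapaxes(np.swapaxes(x, 2, 4), 3, 4)
# ... equal one transpose to (N, 4, C, H, W).
permuted = np.transpose(x, (0, 1, 4, 2, 3))

print(swapped.shape)                       # (2, 4, 3, 5, 6)
print(np.array_equal(swapped, permuted))   # True

# Each (C, H, W) image goes back to (H, W, C) before it is passed to Captum.
print(np.transpose(permuted[0, 0], (1, 2, 0)).shape)  # (5, 6, 3)

# Labels stored as [1, x] become a flat (x,) vector after np.squeeze.
labels = np.squeeze(np.zeros((1, 285)))
print(labels.shape)                        # (285,)
```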

In [6]:
# If the user chooses to generate a confusion matrix
if CONFUSION_MATRIX_ENABLED:
  # For each piece of data (an "instance"), iterate over its associated feature attribution tensor
  # (here, three tensors combined as one, because the attribution was done against all
  # possible classes) and its image tensor. Because the SAME set of images was used for every
  # attribution, we reuse those images for each column of the confusion matrix.
  for instance_idx, (multi_class_attribution_tensor, images) in tenumerate(zip(attributions_tensor, images_tensor),
                                                                           total = len(attributions_tensor)):
    # Create a plot with 4 rows and 3 columns:
    # each row is one image of the fish in the instance,
    # each column is one class that attribution was done against.
    # constrained_layout minimizes the whitespace between subplots; matplotlib
    # ignores plt.subplots_adjust on such a figure, so it is not called here.
    main_fig, main_ax = plt.subplots(4, 3, constrained_layout=True)
    # Set the size of the plot
    main_fig.set_size_inches(10.5, 6, forward=True)
    # Add a title for the whole figure
    main_fig.suptitle("Index {0}".format(instance_idx), fontsize=16)
    # Iterate over each single-class attribution tensor out of the combined three.
    # "col" gives the column index for the plot
    for col, single_class_attribution in enumerate(multi_class_attribution_tensor):
      # Out of each single-class attribution tensor, iterate over the four
      # sub-attributions, one per image in the instance
      for row, (attribution, image) in enumerate(zip(single_class_attribution, images)):
        # Generate the title for the subplot: its subindex, the class label,
        # and whether that class is the true label, the predicted label,
        # both, or neither.
        subimage_name_title_prefix = "Subindex: {0}, Label:{1}".format(row, col)
        if(col == int(true_labels[instance_idx]) and col == int(predicted_labels[instance_idx])):
          subimage_label = "True & Predicted"
        elif(col == int(predicted_labels[instance_idx])):
          subimage_label = "Predicted"
        elif(col == int(true_labels[instance_idx])):
          subimage_label = "True"
        else:
          subimage_label = ""

        subimage_name = subimage_name_title_prefix + "\n" + subimage_label
        # File name for the saved confusion matrix (the figure title is set above)
        image_name = "Index:{0}".format(instance_idx)
        _ = viz.visualize_image_attr(attribution.permute(1,2,0).detach().numpy(), 
                                    image.permute(1,2,0).detach().numpy(), 
                                    method=METHOD, 
                                    sign=SIGN, 
                                    alpha_overlay=ALPHA_OVERLAY,
                                    show_colorbar=SHOW_COLORBAR, 
                                    use_pyplot = False,
                                    plt_fig_axis = (main_fig, main_ax[row][col]),
                                    title=subimage_name)
    # Save the confusion matrix image to the user-specified location
    main_fig.savefig(FINAL_IMAGES_FOLDER_PATH + "/" + image_name, bbox_inches='tight')
    # Close the figure so it is not displayed in the notebook and does not stay
    # in memory, which matters when producing a large number of confusion matrices
    plt.close(main_fig)
# If the user only wants single images to be generated
else:
  # iterate over all instances, including the images and attributions associated with the instance
  for instance_idx, (images, attributions) in tenumerate(zip(images_tensor, attributions_tensor),
                                                         total = len(attributions_tensor)):
    # for each instance (which contains 4 images), iterate over each subinstance
    # (1 image, 1 piece of data for attribution)
    for subinstance_idx, (image, attribution) in enumerate(zip(images, 
                                                               attributions)):
      # Indicate the Index and Subindex
      image_name_index = "Index: {0}, {1}".format(instance_idx, 
                                                  subinstance_idx)
      # Indicate the True and Predicted Labels associated with the image
      image_name_labels = " True Label: {0}, Predicted Label: {1}".format(int(true_labels[instance_idx]), 
                                                                          int(predicted_labels[instance_idx]))
      
      # If the attribution was done against true or predicted labels,
      # then the title should reflect that.
      if(ATTRIBTUION_DONE_AGAINST == AttributionDoneAgainst.TRUE_LABELS):
        image_name = image_name_index + "\n" +  image_name_labels + "\n"  + " Attributed Against: True Label"
      elif(ATTRIBTUION_DONE_AGAINST == AttributionDoneAgainst.PREDICTED_LABELS):
        image_name = image_name_index + "\n" + image_name_labels + "\n"  + " Attributed Against: Predicted Label"

      # Call the visualization function provided by Captum. 
      # Note that the "visualize_image_attr" function
      # will automatically create a plot if none are passed in. Thus, there is
      # no need to instantiate an empty plot like the Confusion Matrix handling
      # code above.
      fig, _ = viz.visualize_image_attr(attribution.permute(1,2,0).detach().numpy(), 
                                        image.permute(1,2,0).detach().numpy(), 
                                        method=METHOD, 
                                        sign=SIGN, 
                                        alpha_overlay=ALPHA_OVERLAY,
                                        show_colorbar=SHOW_COLORBAR, 
                                        use_pyplot = False,
                                        title=image_name)
      # Save the figure to the user-specified directory.
      fig.savefig(FINAL_IMAGES_FOLDER_PATH + "/" + image_name_index, bbox_inches='tight')

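
The per-column tags in the confusion-matrix branch follow a small decision rule that can be pulled out and tested on its own. This is a sketch: `column_label` is a hypothetical helper that reproduces the `if`/`elif` chain in the cell above, not a function the notebook defines.

```python
def column_label(col, true_label, predicted_label):
    """Tag for a confusion-matrix column, following the generation cell's rule."""
    # Labels may be loaded as floats, so compare them as integers.
    col, true_label, predicted_label = int(col), int(true_label), int(predicted_label)
    if col == true_label and col == predicted_label:
        return "True & Predicted"
    if col == predicted_label:
        return "Predicted"
    if col == true_label:
        return "True"
    return ""

# An instance whose true class is 2 but which was predicted as class 0:
print([column_label(c, 2, 0) for c in range(3)])  # ['Predicted', '', 'True']
```

Inside the loop, this would replace the `if`/`elif` chain with `subimage_label = column_label(col, true_labels[instance_idx], predicted_labels[instance_idx])`.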
