# Feature Attribution Data Generation

This notebook is the workhorse of the Feature Attribution pipeline. It computes the feature attributions that the "Feature Attribution Image Generation" and "Feature Attribution Bokeh Generation" notebooks consume, depending on which output you would like to see.

## Known Incompatibilities/Computational Intensity Issues

Not all classification models support all attribution methods, and some methods are impractical with the standard computational resources provided by Colab/Colab Pro. Consider Colab Pro+, or download all necessary notebooks and run them locally.

* Any ResNet-based model is incompatible with DeepLift because of a known issue with repeated ReLU use (refer to the official Captum GitHub issue here: https://github.com/pytorch/captum/issues/378)
* LRP (Layer-wise Relevance Propagation) does not work with EfficientNet. Even after removal of the augmentation layers, it seems to fail upon detecting SiLU layers.
* Methods such as Shapley Value Sampling and Occlusion take a great deal of time. This can be reduced by either:
  * Using a larger mask, in which case you sacrifice the granularity of the prediction for faster evaluation time
  * Using larger strides, in the case of Occlusion
* There is an option to *try* running the process on Google's TPU (Tensor Processing Unit). It was added during development to see whether a TPU would be any faster, but it currently fails.
* Integrated Gradients can be so memory intensive that Colab auto-terminates the session. This seems to depend on which model is being used.
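
To make the Occlusion speed/granularity trade-off above concrete, the sketch below counts how many window positions (and therefore roughly how many occluded forward passes per target class) a given window shape and stride produce for one (4, 3, 130, 750) instance. The helper name `count_occlusion_windows` is hypothetical, and the per-dimension formula `ceil((size - window) / stride) + 1` is an assumption about how Captum places the windows, so treat the numbers as estimates.

```python
import math

def count_occlusion_windows(input_shape, window_shape, strides):
    """Estimate the number of sliding-window positions for one data instance."""
    total = 1
    for size, window, stride in zip(input_shape, window_shape, strides):
        # Positions along this dimension (assumed formula, see the text above)
        total *= math.ceil((size - window) / stride) + 1
    return total

instance_shape = (4, 3, 130, 750)

# Default settings of this notebook: non-overlapping 26 x 75 windows
coarse = count_occlusion_windows(instance_shape, (1, 3, 26, 75), (1, 3, 26, 75))
# Alternative settings listed in the configuration cell: smaller window, smaller stride
fine = count_occlusion_windows(instance_shape, (1, 3, 13, 75), (1, 3, 10, 10))

print(coarse)  # 200
print(fine)    # 3588
```

Under these assumptions, the default window needs about 200 evaluations per class, while the finer alternative needs roughly 18 times as many.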

## Input 

### Required Arguments
* `NUMBER_TO_PROCESS` (integer or None; if an integer, it should NEVER EXCEED the total number of data instances) - The number of data instances to perform attribution on. This is useful for debugging, or if you only want the attributions for the first n pieces of data. If this is set to None, all data is processed.
* `MODEL_PATH` (string) - A path to the saved classifier model, which should be saved as a *.pt file
* `X_PATH` (string) - A path to the dataset containing the images to be classified, which should be saved in *.npy format
* `Y_PATH` (string) - A path to the TRUE labels of the dataset, which should be stored as a *.npy file
* `ATTRIBUTIONS_PATH` (string) - A path to where the tensor containing all the attribution data should be saved. The resulting file will be saved as a *.pt file.
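
The constraints on the required arguments can be checked before starting a long run. The following is a minimal sketch, not part of the notebook: the helper `check_required_arguments` and the file names in the usage lines are hypothetical, and 285 is simply the number of instances in the dataset used later in this notebook.

```python
def check_required_arguments(number_to_process, n_instances, model_path,
                             x_path, y_path, attributions_path):
    """Return a list of problems found in the required arguments (empty if none)."""
    problems = []

    # NUMBER_TO_PROCESS is either None or a positive integer no larger than the dataset
    if number_to_process is not None:
        if not isinstance(number_to_process, int) or number_to_process < 1:
            problems.append("NUMBER_TO_PROCESS must be a positive integer or None")
        elif number_to_process > n_instances:
            problems.append("NUMBER_TO_PROCESS exceeds the number of data instances")

    # Each path is expected to carry the file extension described above
    expected = [("MODEL_PATH", model_path, ".pt"),
                ("X_PATH", x_path, ".npy"),
                ("Y_PATH", y_path, ".npy"),
                ("ATTRIBUTIONS_PATH", attributions_path, ".pt")]
    for name, path, suffix in expected:
        if not path.endswith(suffix):
            problems.append(f"{name} should end with {suffix}")
    return problems

# Hypothetical file names, for illustration only
problems = check_required_arguments(10, 285, "model.pt", "X.npy", "Y.npy", "attributions.pt")
print(problems)  # []
```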

### Optional Arguments
* `BATCH_SIZE` (integer) (default value of 1) - The number of data instances to load onto the selected backend for processing at once.
* `ATTRIBUTION_METHOD` (AttributionMethods Enum) (default value of AttributionMethods.DECONVOLUTION) - Selects the attribution method to use. The ATTRIBUTION_METHOD variable can be set to any of the following:
  * `AttributionMethods.INTEGRATED_GRADIENTS` - If this is used, please consider modifying the variables prefixed with INTEGRATED_GRADIENTS for better performance/accuracy
  * `AttributionMethods.DEEPLIFT`
  * `AttributionMethods.SALIENCY_MAPS`
  * `AttributionMethods.INPUT_X_GRADIENT`
  * `AttributionMethods.GUIDED_BACKPROP`
  * `AttributionMethods.DECONVOLUTION`
  * `AttributionMethods.LRP` - Note: this has NOT been thoroughly tested and per the prior text, does not seem to work with EfficientNet. Its behavior with ResNet is unknown.
  * `AttributionMethods.OCCLUSION` - If this is used, please consider modifying the variables prefixed with OCCLUSION for better performance/accuracy
  * `AttributionMethods.SHAPLEY_VALUE_SAMPLING` - If this is used, please consider modifying the variables prefixed with SHAPLEY_VALUE_SAMPLING for better performance/accuracy
* `ATTRIBUTION_TARGET_MODE` (AttributionTargetMode Enum) - Determines if the attributions should be done against a single class target (specified by the Y_PATH variable) or against all classes (0, 1, and 2). This can be controlled by setting the `ATTRIBUTION_TARGET_MODE` variable to one of the following:
  * `AttributionTargetMode.SINGLE_CLASS`
  * `AttributionTargetMode.ALL_CLASS`
* `BACKEND` (Backend Enum) - The device that the feature attribution will be done on. This variable must be set to a member of the Backend Enum (ex: BACKEND = Backend.GPU). BACKEND can be set to the following:
  * `Backend.CPU`
  * `Backend.GPU`
  * `Backend.TPU`: WARNING the TPU option has NOT been thoroughly tested and is a byproduct of attempting to find faster methods for Feature Attribution. It is highly recommended to NOT use this setting. If you would like to test it, make sure that the commented lines in the library import cell are uncommented to install the necessary dependencies as recommended by Google.
* `OCCLUSION_SLIDING_WINDOW_SHAPES` (4 Integer tuple) (Default value of (1, 3, 26, 75)) - The shape of the window to occlude. It should have one fewer dimension than the input tensor. The input tensor to the classifier is (1, 4, 3, 130, 750), so when creating the sliding window we drop the leading “1” (the batch dimension). Thus, the window occupies 1 of the 4 images, covers all 3 color channels (we aren’t interested in occluding individual color channels), and covers a 26 x 75 subsection of the 130 x 750 image.
* `OCCLUSION_STRIDES` (4 Integer tuple) (Default value of (1, 3, 26, 75)) - Specifies how far to slide the window at each step. It should have the same number of dimensions as OCCLUSION_SLIDING_WINDOW_SHAPES. With the default values, consecutive windows do not overlap: once a 26 x 75 region has been occluded, the window moves to the next untouched 26 x 75 region.
* `OCCLUSION_PERTURBATIONS_PER_EVAL` (Integer) (Default value of 160) - How many occluded inputs are passed to the classifier in a single forward pass. If there is sufficient GPU/System RAM, this number should be made as large as possible to speed up processing.
* `SHAPLEY_VALUE_SAMPLING_FEATURE_MASK` (numpy array) - This should be set to a numpy array that has the same dimensions as a single instance of data, but which contains integers indicating which pixels are to be clumped together as a “hyperpixel” (for example, for a single instance of data which is 4 x 3 x 130 x 750, a 4 x 3 x 13 x 75 section of the array could be all 0’s, then the next section be all 1’s, indicating that each of those sections is to be treated as one large pixel).
* `SHAPLEY_VALUE_SAMPLING_N_SAMPLES` (Integer) (Default value of 10) - The number of feature permutations tested. Captum defaults to 25, but this was reduced to 10 in an earlier attempt to reduce memory use.
* `SHAPLEY_VALUE_SAMPLING_PERTURBATIONS_PER_EVAL` (Integer) (Default value of 80) - Allows multiple ablations to be processed simultaneously when data is passed to the classifier. Captum defaults to 1, but 80 is provided from an earlier attempt to speed up the process while staying within available memory.
* `INTEGRATED_GRADIENTS_N_STEPS` (Integer) (Default value of 20) - The number of steps used by the approximation method of Integrated Gradients. 50 is the Captum default, but it is set to 20 here to reduce the memory intensity of the attribution method (and avoid crashes).
* `INTEGRATED_GRADIENTS_METHOD` (String) (Default value of “riemann_trapezoid”) - The integration method to be used for Integrated Gradients. Captum defaults to “gausslegendre”, but “riemann_trapezoid” seems to be less memory intensive, in conjunction with a reduced number of steps for `INTEGRATED_GRADIENTS_N_STEPS`. Captum accepts the following arguments:
  * “riemann_right”
  * “riemann_left”
  * “riemann_middle”
  * “riemann_trapezoid”
  * “gausslegendre”
* `INTEGRATED_GRADIENTS_INTERNAL_BATCH_SIZE` (int) (Default value of None) - When set, the n_steps * BATCH_SIZE evaluation points are divided into chunks of at most this size, which the classifier processes sequentially. IF THIS VARIABLE IS USED, its value must be at least equal to BATCH_SIZE.
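
As an illustration of the “hyperpixel” mask described for Shapley Value Sampling above, here is a compact NumPy sketch. It assumes 26 x 75 blocks that each span all 3 color channels of one image, which is the layout built step by step in this notebook's mask cell, not the 13 x 75 example in the text; the variable names are illustrative only.

```python
import numpy as np

n_images, n_channels, height, width = 4, 3, 130, 750
block_h, block_w = 26, 75                          # assumed hyperpixel size
rows, cols = height // block_h, width // block_w   # 5 x 10 blocks per image

# One integer id per hyperpixel: 4 images x 5 rows x 10 columns = 200 ids
block_ids = np.arange(n_images * rows * cols).reshape(n_images, 1, rows, cols)

# Expand every id so it covers all 3 channels and a 26 x 75 block of pixels
feature_mask = (block_ids.repeat(n_channels, axis=1)
                         .repeat(block_h, axis=2)
                         .repeat(block_w, axis=3))

print(feature_mask.shape)  # (4, 3, 130, 750)
print(feature_mask.max())  # 199
```

All pixels sharing an id are perturbed together, so a larger block means fewer features to permute and a faster but coarser attribution.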

## Output

* A single tensor containing all the attribution data, saved as a PyTorch tensor file. The dimensions vary depending on whether attribution was done against a SINGLE class or ALL classes.
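
The sketch below spells out the shapes one would expect for the two target modes. It is an assumption derived from the shape comment in the final assembly cell of this notebook (`(285, 3, 4, 3, 130, 750)` for all classes) and from concatenating single-class batches along the first dimension; the helper `expected_attribution_shape` is hypothetical.

```python
def expected_attribution_shape(n_instances, all_classes,
                               n_classes=3, instance_shape=(4, 3, 130, 750)):
    """Expected shape of the saved attribution tensor (illustrative only)."""
    if all_classes:
        # One attribution per instance and per class
        return (n_instances, n_classes) + tuple(instance_shape)
    # One attribution per instance, against its label from Y_PATH
    return (n_instances,) + tuple(instance_shape)

print(expected_attribution_shape(285, all_classes=True))   # (285, 3, 4, 3, 130, 750)
print(expected_attribution_shape(285, all_classes=False))  # (285, 4, 3, 130, 750)
```

Note that the notebook squeezes the all-class tensor before saving, so any dimension of size 1 (for example, when only one instance is processed) would be dropped from the saved shape.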

In [None]:
import torch
import torch.optim as optim
from torch.optim import lr_scheduler

import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

import torchvision
from torchvision import datasets, models, transforms
import copy

import numpy as np
import pandas as pd

%matplotlib inline 
import matplotlib.pyplot as plt
import time
import os
import random
import math
import string


from skimage.filters import sobel
from skimage.color import rgb2gray

# TPU support
## Insufficient testing has been done to see what methods support using the TPU but the ability to enable the TPU has been provided here.
## Ensure that the Runtime for the Colab notebook has "TPU" selected prior to use!
#!pip install -q cloud-tpu-client==0.10 torch==1.11.0 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.11-cp37-cp37m-linux_x86_64.whl
#import torch_xla
#import torch_xla.core.xla_model as xm


# Install Captum for Feature Attribution
!pip install -q captum

# Install tqdm for the progress bars
!pip install -q tqdm
from tqdm.auto import tqdm
from tqdm.contrib import tenumerate

# Import Captum functions
from captum.attr import *

# Enum base class for the option enums defined below
from enum import Enum

In [None]:
# THIS CODE ONLY APPLIES IF YOU ARE TRYING TO DO SHAPLEY VALUE SAMPLING
## This gives an example of how to build a mask for Shapley Value Sampling

## a single instance of data has the shape [4, 3, 130, 750]
## we want to build a mask that has the same dimensions

# one unit block covering all 3 channels and 26x75 pixels, all zeros
unit_mat_26x75 = np.zeros((3,26,75), dtype=int)

# create a single row of 10 blocks (26x75 each), numbered 0 to 9
unit_row_26x750_list = []

for i in range(10):
  unit_row_26x750_list.append(unit_mat_26x75 + i)

unit_row_26x750 = np.concatenate((tuple(unit_row_26x750_list)), axis=2)

unit_row_26x750.shape

# create multiple rows to create one base matrix
# Should be 5 instances of rows
# need to increment the values of each row by 10
# row offsets: [0, 10, 20, 30, 40]

unit_matrix_130x750_list = []
for i in np.arange(0,50,10):
  unit_matrix_130x750_list.append(unit_row_26x750 + i)

unit_matrix_130x750 = np.concatenate(tuple(unit_matrix_130x750_list), axis=1)

unit_matrix_130x750.shape

# need 4 base matrices to create one whole mask
# want 4 x 3 x 130 x 750

full_matrix_4x3x130x750_list = []
for i in np.arange(0, 200, 50):
  full_matrix_4x3x130x750_list.append(unit_matrix_130x750 + i)

# Stack the 4 per-image masks; np.stack keeps the integer dtype that a feature mask needs
# (np.zeros would silently convert the ids to floats)
full_matrix_4x3x130x750 = np.stack(full_matrix_4x3x130x750_list, axis=0)

full_matrix_4x3x130x750.shape

shapley_feature_mask = full_matrix_4x3x130x750

In [None]:
class AttributionMethods(Enum):
  INTEGRATED_GRADIENTS = 1
  DEEPLIFT = 2
  SALIENCY_MAPS = 3
  INPUT_X_GRADIENT = 4
  GUIDED_BACKPROP = 5
  DECONVOLUTION = 6
  LRP = 7
  OCCLUSION = 8
  SHAPLEY_VALUE_SAMPLING = 9

class AttributionTargetMode(Enum):
  SINGLE_CLASS = 1 
  ALL_CLASS = 2

class Backend(Enum):
  CPU = "cpu"
  GPU = "cuda:0"
  TPU = "tpu"

In [1]:
# Main Control Flow Variables

## Method for Feature Attribution
ATTRIBUTION_METHOD = AttributionMethods.DECONVOLUTION

## Targets to perform attribution against
### If SINGLE_CLASS then attribution is only done against whatever labels
### are supplied to the Y_PATH variable, if ALL_CLASS then attributions
### are done against all possible classes
ATTRIBUTION_TARGET_MODE = AttributionTargetMode.ALL_CLASS

## Select the device for the feature attribution processing to be done on
BACKEND = Backend.GPU
## Controls how many data instances should be attributed in one go
BATCH_SIZE = 1

# Number of pieces of data to process, useful for debugging if you 
# have a large dataset but only want the first n to be attributed.
# if left equal to None, then ALL DATA will be processed
NUMBER_TO_PROCESS = None

## File Paths
### Path to the saved model
MODEL_PATH = "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/model1e-050.5.2022-05-22 12_13_10.pt"
X_PATH = "/content/drive/Shareddrives/Exploding Gradients/X_cropped_b.npy"
Y_PATH = "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/predicted_labels.npy"
ATTRIBUTIONS_PATH = "/content/drive/MyDrive/Fish Attribution/model1e-050.5.2022-05-22 12:13:10.pt Work/Deconvolution All Class.pt"

## Attribution-method-specific options


## Occlusion
### Alternative working dimensions include:
### Strides: (1,3,10,10)
### Window Shape: (1,3,13,75)


OCCLUSION_SLIDING_WINDOW_SHAPES = (1,3,26,75)
OCCLUSION_STRIDES = (1,3,26,75)
OCCLUSION_PERTURBATIONS_PER_EVAL = 160

SHAPLEY_VALUE_SAMPLING_FEATURE_MASK = full_matrix_4x3x130x750
SHAPLEY_VALUE_SAMPLING_N_SAMPLES = 10
SHAPLEY_VALUE_SAMPLING_PERTURBATIONS_PER_EVAL = 80

INTEGRATED_GRADIENTS_N_STEPS = 20 # default is 50, may be too intensive for Colab to handle
INTEGRATED_GRADIENTS_METHOD = 'riemann_trapezoid'
INTEGRATED_GRADIENTS_INTERNAL_BATCH_SIZE = None


# Check that GPU is available, if it isn't then BACKEND will automatically toggle to CPU
if(BACKEND == Backend.GPU):
  if(torch.cuda.is_available()):
    print("GPU Selected, and confirmed available!")
    device = torch.device("cuda:0")
  else:
    print("GPU Selected, but not found! Switching to CPU for backend")
    BACKEND = Backend.CPU
    device = torch.device("cpu")
elif(BACKEND == Backend.TPU):
  assert os.environ.get('COLAB_TPU_ADDR'), 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'
  print("TPU Selected")
  device = xm.xla_device()
elif(BACKEND == Backend.CPU):
  print("CPU Selected")
  device = torch.device("cpu")

In [None]:
# Recursively sets inplace=False on every nn.ReLU in the model so that activations are not overwritten in place
def reluToInplaceFalse(model):
  for name, child in model.named_children():
    if isinstance(child, nn.ReLU):
      setattr(child, 'inplace', False)
    else:
      reluToInplaceFalse(child)

In [None]:
# Load the Model Class

TARGET_WIDTH = 750
TARGET_HEIGHT = 130
from torchvision.transforms.transforms import RandomRotation, RandomAdjustSharpness, RandomGrayscale
import torchvision.transforms.functional as tf

def init_weights(m):
  if isinstance(m, nn.Linear):
    nn.init.kaiming_normal_(m.weight, nonlinearity='relu')


class Classifier(torch.nn.Module):

  def __init__(self, backbone='resnet', multi_backbone = False, device ="cuda:0",dropout_rate = 0.2, do_augmentation = False, target_height=TARGET_HEIGHT, target_width=TARGET_WIDTH):
    super().__init__()
    self.multi_backbone = multi_backbone # Bool: Indicates if we use multibackbone

    # In the following section we download the appropriate pretrained model
    if backbone == "vgg19":
      backbone = torchvision.models.vgg19(pretrained=True)
      self.out_channels = 25088
      
    elif backbone == "resnet18":
      backbone = torchvision.models.resnet18(pretrained=True)
      self.out_channels = 512

    elif backbone == "resnet50":
      backbone = torchvision.models.resnet50(pretrained=True)
      self.out_channels = 2048

    elif backbone == "Efficientnet b1":
      backbone = torchvision.models.efficientnet_b1(pretrained=True)
      self.out_channels = 1280

    elif backbone == "Efficientnet b3":
      backbone = torchvision.models.efficientnet_b3(pretrained=True)
      self.out_channels = 1536

    elif backbone == "Efficientnet b5":
      backbone = torchvision.models.efficientnet_b5(pretrained=True)
      self.out_channels = 2048

    elif backbone == "Efficientnet b7":
      backbone = torchvision.models.efficientnet_b7(pretrained=True)
      self.out_channels = 2560
    else:
      raise ValueError(f'Invalid backbone "{backbone}"')
      
    # Disabling inplace ReLU because GradCAM doesn't work if it is enabled
    reluToInplaceFalse(backbone)
     
    modules = list(backbone.children())[:-1]

    if self.multi_backbone: #We create the backbones and put them on the device
      self.backbone1 = nn.Sequential(*copy.deepcopy(modules)).to(device)
      self.backbone2 = nn.Sequential(*copy.deepcopy(modules)).to(device)
      self.backbone3 = nn.Sequential(*copy.deepcopy(modules)).to(device)
      self.backbone4 = nn.Sequential(*copy.deepcopy(modules)).to(device)

    else:
      self.backbone =  nn.Sequential(*modules).to(device)

    self.do_augmentation = do_augmentation

    # Note: These are not all of the augmentations performed, see custom_augmentation()
    self.unlabeled_augmentation = nn.Sequential(transforms.RandomVerticalFlip(0.5),
                                      transforms.RandomCrop(size=(target_height,target_width)),
                                      transforms.RandomRotation(10, interpolation=transforms.InterpolationMode.BILINEAR, fill=1),
                                      transforms.Normalize(0, 1)
    )

    self.bottleneck_dim = 256

    # This is the linear layer to compress each backbone
    self.fc_bb = nn.Sequential(nn.BatchNorm1d(self.out_channels),
                               nn.Dropout(dropout_rate),
                               nn.Linear(self.out_channels, self.bottleneck_dim),
                               nn.BatchNorm1d(self.bottleneck_dim),
                               nn.ReLU())
    self.fc_bb.apply(init_weights)

    self.fc_hflip1 = nn.Sequential(nn.Dropout(dropout_rate),
                                   nn.Linear(self.bottleneck_dim, 1))
    self.fc_hflip1.apply(init_weights)

    self.fc_hflip2 = nn.Sequential(nn.Dropout(dropout_rate),
                                   nn.Linear(self.bottleneck_dim, 1))
    self.fc_hflip2.apply(init_weights)

    self.fc_hflip3 = nn.Sequential(nn.Dropout(dropout_rate),
                                   nn.Linear(self.bottleneck_dim, 1))
    self.fc_hflip3.apply(init_weights)

    self.fc_hflip4 = nn.Sequential(nn.Dropout(dropout_rate),
                                   nn.Linear(self.bottleneck_dim, 1))
    self.fc_hflip4.apply(init_weights)

    #This is the final classification layer
    self.fc = nn.Sequential(nn.Dropout(dropout_rate),
                            nn.Linear(self.bottleneck_dim * 4, 3))
    self.fc.apply(init_weights)

    # A softmax is applied in eval mode
    self.softmax = nn.Softmax(dim=1)              
     
  def forward(self, x):
    if self.do_augmentation and self.training:
      imgs, hflip_labels = map(list, zip(*[self.custom_augmentation(x[:,i]) for i in range(4)])) #list of 4 images
      hflip_labels = [torch.Tensor(hflip_label).float().unsqueeze(1) for hflip_label in hflip_labels]
    else:
      imgs = [x[:,i] for i in range(4)] #list of 4 images
      hflip_labels = None
    
    if self.multi_backbone:
      encodings = [self.fc_bb(self.backbone1(imgs[0]).flatten(1)), 
                   self.fc_bb(self.backbone2(imgs[1]).flatten(1)),
                   self.fc_bb(self.backbone3(imgs[2]).flatten(1)),
                   self.fc_bb(self.backbone4(imgs[3]).flatten(1))]
    else:
      encodings = [self.fc_bb(self.backbone(img).flatten(1)) for img in imgs]

    logits = self.fc(torch.cat(encodings,1))
    if self.training:
      # get hflip predictions
      hflip_preds = [self.fc_hflip1(encodings[0]),
                     self.fc_hflip2(encodings[1]),
                     self.fc_hflip3(encodings[2]),
                     self.fc_hflip4(encodings[3])]
      hflip_preds = torch.cat(hflip_preds, 1)
      # Use the device of the logits rather than relying on a global `device` variable
      hflip_labels = torch.cat(hflip_labels, 1).to(logits.device)
      return logits, hflip_preds, hflip_labels
    else:
      return self.softmax(logits)

  def custom_augmentation(self, images):
    hflip_labels = np.random.choice([0, 1], size = images.size(0))
    for i, hflip_label in enumerate(hflip_labels):
      if hflip_label == 1:
        images[i] = tf.hflip(images[i])
    images = self.unlabeled_augmentation(images)
    return images, hflip_labels

In [None]:
# Load the model
from google.colab import drive
drive.mount('/content/drive')

model = torch.load(MODEL_PATH, map_location="cpu")
model.eval()
model.zero_grad()

# Remove the augmentation layers, since LRP apparently does not handle them
# FAILS: SiLU does not work with LRP
# Credit: 
if(ATTRIBUTION_METHOD == AttributionMethods.LRP):
  model = nn.Sequential(*list(model.children())[1:])

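if(ATTRIBUTION_METHOD == AttributionMethods.SHAPLEY_VALUE_SAMPLING):
  # Build the mask tensor from the configured mask: integer ids, plus a leading batch dimension
  shapley_feature_mask = torch.tensor(SHAPLEY_VALUE_SAMPLING_FEATURE_MASK, dtype=torch.long).unsqueeze(0).to(device)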

# Put the model onto the device
model.to(device);

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Prepare data
x = np.load(X_PATH)
y = np.load(Y_PATH)

print(x.shape)
print(y.shape)

tensor_x = torch.Tensor(x) 
tensor_y = torch.Tensor(y).long()

tensor_x = torch.swapaxes(tensor_x,2,4)
tensor_x = torch.swapaxes(tensor_x,3,4)

print(tensor_x.shape)
print(tensor_y.shape)
from torch.utils.data import TensorDataset

attribution_ds = TensorDataset(tensor_x ,tensor_y) 
attribution_dl = DataLoader(attribution_ds, BATCH_SIZE ,shuffle = False)
del x,y

(285, 4, 130, 750, 3)
(285, 1)
torch.Size([285, 4, 3, 130, 750])
torch.Size([285, 1])


In [None]:
## Depending on user selection, create the feature attribution object

if (ATTRIBUTION_METHOD == AttributionMethods.INTEGRATED_GRADIENTS):
  print("Integrated Gradients Chosen")
  attribution_obj = IntegratedGradients(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.DEEPLIFT):
  print("DeepLift Chosen")
  attribution_obj = DeepLift(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.SALIENCY_MAPS):
  print("Saliency Maps Chosen")
  attribution_obj = Saliency(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.INPUT_X_GRADIENT):
  print("Input X Gradient Chosen")
  attribution_obj = InputXGradient(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.GUIDED_BACKPROP):
  print("Guided Backpropagation Chosen")
  attribution_obj = GuidedBackprop(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.DECONVOLUTION):
  print("Deconvolution Chosen")
  attribution_obj = Deconvolution(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.LRP):
  print("LRP (Layer-wise Relevance Propagation)")
  attribution_obj = LRP(model)
elif (ATTRIBUTION_METHOD == AttributionMethods.OCCLUSION):
  print("Occlusion Selected")
  attribution_obj = Occlusion(model)
elif(ATTRIBUTION_METHOD == AttributionMethods.SHAPLEY_VALUE_SAMPLING):
  print("Shapley Value Sampling Selected")
  attribution_obj = ShapleyValueSampling(model)

Deconvolution Chosen


In [None]:
all_attributions = []
# If NUMBER_TO_PROCESS is set, the loop below stops after that many batches
# (equal to that many instances when BATCH_SIZE is 1);
# otherwise every batch in the DataLoader is processed.
# total_batches is only used to size the progress bar.
if(NUMBER_TO_PROCESS is not None):
  total_batches = NUMBER_TO_PROCESS
else:
  total_batches = len(attribution_dl)

for iterations, (images, labels) in tenumerate(attribution_dl, total = total_batches):
  # send images to the device
  images = images.to(device)
  # perform attributions
  # If the option to attribute against ALL targets is chosen,
  # loop over every class index (0, 1 and 2) for each batch
  if(ATTRIBUTION_METHOD == AttributionMethods.INTEGRATED_GRADIENTS):
    if(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.SINGLE_CLASS):
      attributions = attribution_obj.attribute(images, 
                                               target = labels.squeeze().to(device),
                                               method = INTEGRATED_GRADIENTS_METHOD,
                                               n_steps = INTEGRATED_GRADIENTS_N_STEPS,
                                               internal_batch_size = INTEGRATED_GRADIENTS_INTERNAL_BATCH_SIZE)
      all_attributions.append(attributions.cpu())     
    elif(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.ALL_CLASS):
      sub_attributions = []
      for i in range(3): # classes are 0 1 and 2 respectively
        # attribute against class i (not the stored label) and collect one result per class
        sub_attributions.append(attribution_obj.attribute(images, 
                                                          target = i,
                                                          method = INTEGRATED_GRADIENTS_METHOD,
                                                          n_steps = INTEGRATED_GRADIENTS_N_STEPS,
                                                          internal_batch_size = INTEGRATED_GRADIENTS_INTERNAL_BATCH_SIZE).cpu())
      all_attributions.append(sub_attributions)
  elif(ATTRIBUTION_METHOD == AttributionMethods.OCCLUSION):
    if(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.SINGLE_CLASS):
      attributions = attribution_obj.attribute(images, 
                                              target=labels.squeeze().to(device), 
                                              strides=OCCLUSION_STRIDES,
                                              sliding_window_shapes=OCCLUSION_SLIDING_WINDOW_SHAPES,
                                              perturbations_per_eval=OCCLUSION_PERTURBATIONS_PER_EVAL,
                                              show_progress=False)
      all_attributions.append(attributions.cpu())  
    elif(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.ALL_CLASS):
        sub_attributions = []
        for i in range(3): # classes are 0 1 and 2 respectively
          sub_attributions.append(attribution_obj.attribute(images, 
                                                            target=i,
                                                            strides=OCCLUSION_STRIDES,
                                                            sliding_window_shapes=OCCLUSION_SLIDING_WINDOW_SHAPES,
                                                            perturbations_per_eval=OCCLUSION_PERTURBATIONS_PER_EVAL,
                                                            show_progress=False).squeeze().cpu())
        all_attributions.append(sub_attributions)
  elif(ATTRIBUTION_METHOD == AttributionMethods.SHAPLEY_VALUE_SAMPLING):
    if(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.SINGLE_CLASS):
      attributions = attribution_obj.attribute(images, 
                                               target=labels.squeeze().to(device),
                                               feature_mask = shapley_feature_mask,
                                               n_samples = SHAPLEY_VALUE_SAMPLING_N_SAMPLES,
                                               perturbations_per_eval=SHAPLEY_VALUE_SAMPLING_PERTURBATIONS_PER_EVAL,
                                               show_progress=False)
      all_attributions.append(attributions.cpu())  
    elif(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.ALL_CLASS):
        sub_attributions = []
        for i in range(3): # classes are 0 1 and 2 respectively
          # attribute against class i (not the stored label)
          sub_attributions.append(attribution_obj.attribute(images, 
                                                            target=i,
                                                            feature_mask = shapley_feature_mask,
                                                            n_samples = SHAPLEY_VALUE_SAMPLING_N_SAMPLES,
                                                            perturbations_per_eval=SHAPLEY_VALUE_SAMPLING_PERTURBATIONS_PER_EVAL,
                                                            show_progress=False).cpu())
        all_attributions.append(sub_attributions)
  else:
    if(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.SINGLE_CLASS):
      attributions = attribution_obj.attribute(images,
                                               target=labels.squeeze().to(device))
      # move results off the device so they do not accumulate in GPU memory
      all_attributions.append(attributions.cpu())
    elif(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.ALL_CLASS):
        sub_attributions = []
        for i in range(3): # classes are 0 1 and 2 respectively
          sub_attributions.append(attribution_obj.attribute(images, 
                                                            target=i).cpu())
        all_attributions.append(sub_attributions)


  # delete the current batch of images and labels from memory
  del images, labels
  # If things were run on the GPU, empty the cache to prevent memory overloading
  if(BACKEND == Backend.GPU):
    torch.cuda.empty_cache()
  # prematurely break out of loop if an explicit amount of data was requested
  if(NUMBER_TO_PROCESS is not None and (iterations == NUMBER_TO_PROCESS - 1)):
    break

  0%|          | 0/285 [00:00<?, ?it/s]

  "required_grads has been set automatically." % index
  "Setting backward hooks on ReLU activations."


In [None]:
# final shape should be (285, 3, 4, 3, 130, 750)
# Credit to: https://discuss.pytorch.org/t/nested-lists-of-tensors-to-tensor/121449/2
if(ATTRIBUTION_TARGET_MODE == AttributionTargetMode.ALL_CLASS):
  all_attributions_tensor_size = [len(all_attributions), len(all_attributions[0])] + list(all_attributions[0][0].shape)
  all_attributions_tensor = torch.empty(all_attributions_tensor_size)
  for i in range(len(all_attributions)):
    for j in range(len(all_attributions[0])):
      all_attributions_tensor[i][j] = all_attributions[i][j]
  all_attributions_tensor = all_attributions_tensor.squeeze()
else:
  all_attributions_tensor = torch.cat(all_attributions, 0)

del all_attributions

In [None]:
# Save the attribution tensors back to drive

torch.save(all_attributions_tensor, ATTRIBUTIONS_PATH)