Releases: pytorch/captum
Captum v0.7.0 Release
The Captum 0.7.0 release adds new functionalities for language model attribution, dataset level attribution, and a few improvements and bug fixes for existing methods.
Language Model Attribution
Captum 0.7.0 adds new APIs for language model attribution, making it substantially easier to define interpretable text features with corresponding baselines and masks. These new wrappers are compatible with most attribution methods in Captum and help users understand how different aspects of a prompt impact an LLM’s predicted response. More details can be found in our paper:
Using Captum to Explain Generative Language Models
Example:
from captum.attr import ShapleyValueSampling, LLMAttribution, TextTemplateInput
shapley_values = ShapleyValueSampling(model)
llm_attr = LLMAttribution(shapley_values, tokenizer)
inp = TextTemplateInput(
    # the text template
    "{} lives in {}, {} and is a {}. {} personal interests include",
    # the values of the features
    ["Dave", "Palm Coast", "FL", "lawyer", "His"],
    # the reference baseline values of the features
    baselines=["Sarah", "Seattle", "WA", "doctor", "Her"],
)
res = llm_attr.attribute(inp)
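The returned result can then be inspected or visualized. A minimal hedged sketch, assuming the result object exposes sequence- and token-level attributions as in the Captum LLM attribution tutorial:
print(res.seq_attr)  # attribution of each input feature to the whole response (assumed accessor)
res.plot_token_attr(show=True)  # per-output-token attribution heatmap (assumed helper)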
DataLoader Attribution
DataLoader Attribution is a new wrapper which provides an easy-to-use approach for obtaining attribution on a full dataset by providing a data loader rather than a single input (PR #1155, #1158).
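A minimal hedged sketch of the intended usage, assuming the wrapper is exposed as DataLoaderAttribution in captum.attr and wraps a perturbation-based method such as FeatureAblation; the dataset and batch size below are placeholders:
from torch.utils.data import DataLoader
from captum.attr import FeatureAblation, DataLoaderAttribution
fa = FeatureAblation(model)
dl_fa = DataLoaderAttribution(fa)
# placeholder DataLoader over an evaluation dataset
dataloader = DataLoader(eval_dataset, batch_size=16)
dataset_attributions = dl_fa.attribute(dataloader)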
Attribution Improvements
Captum 0.7.0 has added a few improvements to existing attribution methods including:
- Multi-task attribution for Shapley Values and Shapley Value Sampling is now supported, allowing users to get attributions for multiple target outputs simultaneously (PR #1173)
- LayerGradCam now supports returning attributions for each channel independently without summing across channels (PR #1086, thanks to @dzenanz for this contribution)
Bug Fixes
- Visualization utilities were updated to use the new keyword argument visible to ensure compatibility with Matplotlib 3.7 (PR #1118)
- The default visualization mode in visualize_timeseries_attr has been fixed to appropriately utilize overlay_individual (PR #1152, thanks to @teddykoker for this contribution)
Captum v0.6.0 Release
The Captum v0.6.0 release introduces a new feature, Stochastic Gates. This release also enhances Influential Examples and includes a series of other improvements and bug fixes.
Stochastic Gates
Stochastic Gates is a technique to enforce sparsity by approximating L0 regularization. It can be used for network pruning and feature selection. As directly optimizing L0 is a non-differentiable combinatorial problem, Stochastic Gates approximates it by using certain continuous probability distributions (e.g., Concrete, Gaussian) as smoothed Bernoulli distributions, so the optimization can be reparameterized in terms of the distributions' parameters. See the original papers on these techniques for more details.
Captum provides two Stochastic Gates implementations using different distributions as the smoothed Bernoulli: BinaryConcreteStochasticGates and GaussianStochasticGates. They are available under captum.module, a new subpackage collecting neural network building blocks that are useful for model understanding. A usage example:
import torch
from captum.module import GaussianStochasticGates
n_gates = 5  # number of gates
stg = GaussianStochasticGates(n_gates, reg_weight=0.01)
inputs = torch.randn(3, n_gates)  # mock inputs with batch size of 3
gated_inputs, reg = stg(inputs)  # gate the inputs
loss = model(gated_inputs) # use gated inputs in the downstream network
# optimize sparsity regularization together with the model loss
loss += reg
...
# verify the learned gate values to see how model is using the inputs
print(stg.get_gate_values())
Influential Examples
Influential Examples is a new function pillar introduced in the previous version. This release continues to build on it and introduces many improvements to the existing TracInCP family. Some of the changes are incompatible with the previous version. Details are listed below:
- Support loss functions with reduction of mean in TracInCPFast and TracInCPFastRandProj (#913)
- TracInCP classes add a new argument show_progress to optionally display progress bars for the computation (#898, #1046)
- TracInCP provides a new public method self_influence which computes the self influence scores among the examples in the given data. influence can no longer compute self influence scores, and the argument inputs cannot be None (#994, #1069, #1087, #1072)
- The previous constructor argument influence_src_dataset in TracInCP is renamed to train_dataset (#994)
- Added GPU support to TracInCPFast and TracInCPFastRandProj (#969)
- TracInCP and TracInCPFastRandProj provide a new public method compute_intermediate_quantities which computes “embedding” vectors for examples in the given data (#1068)
- TracInCP classes support a new optional argument test_loss_fn for use cases where different losses are used for training and testing examples (#1073)
- Revised the interface of the method influence: removed the arguments unpack_inputs and target. Now, the inputs argument must be a tuple where the last element is the label (#1072). A minimal usage sketch follows this list.
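For illustration, a minimal hedged sketch of the revised influence interface; the checkpoint path, loss function, and batch size are placeholder assumptions, not values from the release notes:
from torch.nn import CrossEntropyLoss
from captum.influence import TracInCP
tracin = TracInCP(
    model,                          # trained PyTorch model
    train_dataset,                  # training dataset (or DataLoader)
    checkpoints=["checkpoint.pt"],  # placeholder checkpoint path(s)
    loss_fn=CrossEntropyLoss(),     # loss used during training
    batch_size=16,
)
# per the revised interface, inputs is a tuple whose last element is the label
scores = tracin.influence((test_features, test_labels))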
Notable Changes
- LRP now raises an error when it detects that the model reuses any modules (#911)
- Fixed the bug that the concept order changes in TCAV’s output (#915, #909)
- Fixed the data type issue of using Captum’s built-in SGD linear models in Lime (#938, #910)
- All submodules are now accessible under the top-level captum module, so users can import captum and access everything underneath it, e.g., captum.attr (#912, #992, #680)
- Added a new attribution visualization utility for time series data (#980)
- Improved version detection to fix some compatibility issues caused by dependencies’ versions (#940, #999)
- Fixed an index bug in the tutorial “Interpret regression models using Boston House Prices Dataset” (#1014, #1012)
- Refactored FeatureAblation and FeaturePermutation to verify the output type of forward_func and its shape when perturbation_per_eval > 1 (#1047, #1049, #1091)
- Changed the Housing Regression tutorial to use the California housing dataset (#1041)
- Improved the error message for invalid input types when the required data type is tensor or tuple[tensor] (#1083)
- Switched from module backward_hook to tensor forward_hook for many attribution algorithms that need tensor gradients, like DeepLift and LayerLRP, so those algorithms can now support models with in-place modules (#979, #914)
- Added an optional mask argument to the FGSM and PGD adversarial attacks under captum.robust to specify which elements are perturbed (#1043); a minimal sketch is shown after this list
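For illustration, a minimal hedged sketch of the new mask argument; the model, inputs, labels, epsilon, and the exact masking convention (1 = perturbable, 0 = frozen) are assumptions, not details from the release notes:
import torch
from captum.robust import FGSM
fgsm = FGSM(model)
# assume mask matches the input shape; zero entries are left unperturbed
mask = torch.zeros_like(inputs)
mask[:, :10] = 1  # only allow the first 10 features to be perturbed
adversarial_inputs = fgsm.perturb(inputs, epsilon=0.1, target=labels, mask=mask)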
Captum v0.5.0 Release
The Captum v0.5.0 release introduces a new function pillar, Influential Examples, with a few code improvements and bug fixes.
Influential Examples
Influential Examples implements the method TracInCP. It calculates the influence score of a given training example on a given test example, which approximately answers the question “if the given training example were removed from the training data, how much would the model’s loss on the given test example change?”. TracInCP can be used for:
- identifying proponents/opponents, which are the training examples with the most positive/negative influence on a given test example
- identifying mis-labelled data
Captum currently offers the following variant implementations of TracInCP:
- TracInCP - Computes influence scores using gradients at all specified layers. Can be used for identifying proponents/opponents and identifying mis-labelled data. Both computations take time linear in the training data size.
- TracInCPFast - Like TracInCP, but computes influence scores using only gradients in the last fully-connected layer, and is expedited using a computational trick.
- TracInCPFastRandProj - A version of TracInCPFast which is specialized for computing proponents/opponents. In particular, pre-processing enables computation of proponents/opponents in constant time. The tradeoff is the linear time and memory required for pre-processing. Random projections can be used to reduce memory usage. This class should not be used for identifying mis-labelled data.
A tutorial demonstrating the usage is available at https://captum.ai/tutorials/TracInCP_Tutorial
Notable Changes
- The minimum required PyTorch version is now v1.6.0 (#876)
- Enabled the argument model_id in TCAV and removed AV from the public concept module (PR #811)
- Added a new configurable argument attribute_to_layer_input in TCAV, set for both layer activation and attribution (#864)
- Renamed the argument raw_input to raw_input_ids in the visualization util VisualizationDataRecord (PR #804)
- Support a configurable eps argument in DeepLift (PR #835)
- Captum now leverages register_full_backward_hook, introduced in PyTorch v1.8.0. Attribution to neuron output in NeuronDeepLift, NeuronGuidedBackprop, and NeuronDeconvolution is deprecated and will be removed in the next major release, v0.6.0 (PR #837)
- Fixed the issue that Lime and KernelShap fail to handle empty tensor inputs like tensor([[],[],[]]) (PR #812)
- Fixed the bug that visualization_transform of ImageFeature in Captum Insights was not applied (PR #871)
Captum v0.4.1 Release
The Captum v0.4.1 release includes three new tutorials, a few code improvements and bug fixes.
New Tutorials
Robustness tutorial:
- Applying robustness attacks and metrics to CIFAR model and dataset
Concept tutorials:
- TCAV for an image classification model (GoogleNet)
- TCAV for NLP sentiment analysis model
Improvements
- Reduced unnecessary reliance on NumPy across the codebase by replacing such usages with PyTorch equivalents when possible (PR #714, #755, #760)
- Enhanced the error message for missing module rules in LRP (PR #727)
- Switched the linter to ufmt from the previous black + isort and reformatted the code accordingly (PR #739)
- Generalized the implementation of captum._utils.av for TCAV to use, and refactored TCAV to simplify the creation of datasets used to train concept models (PR #747)
Bug Fixes
- Fixed the device error when using TCAV on CUDA (Issues #719, #720, #721, PR #725)
- Captum Insights now caches a subset of batches from the dataset for reuse, fixing the issue of data not showing after iterating through all batches (PR #728)
- Corrected the loading of the reference word embedding in the tutorial “Interpreting BERT Part 1” (PR #743)
- Renamed the util safe_div’s argument default_value to default_denom and unified its behavior for different denominator types (Issue #654, PR #751)
Captum v0.4.0 Release
The Captum 0.4.0 release adds new functionalities for concept-based interpretability, evaluating model robustness, new attribution methods including Layerwise Relevance Propagation (LRP), and improvements to existing attribution methods.
Concept-Based Interpretability
Captum 0.4.0 adds TCAV (Testing with Concept Activation Vectors) to Captum, allowing users to identify significance of user-defined concepts on a model’s prediction. TCAV has been implemented in a generic manner, allowing users to define custom concepts with example inputs for any modality including vision and text.
Robustness
Captum 0.4.0 also includes new tools to understand model robustness including implementations of adversarial attacks (Fast Gradient Sign Method and Projected Gradient Descent) as well as robustness metrics to evaluate the impact of different attacks or perturbations on a model. Robustness metrics included in this release include:
- Attack Comparator - Allows users to quantify the impact of any input perturbation (such as torchvision transforms, text augmentation, etc.) or adversarial attack on a model and compare the impact of different attacks
- Minimal Perturbation - Identifies the minimum perturbation needed to cause a model to misclassify the perturbed input
This robustness tooling enables model developers to better understand potential model vulnerabilities as well as analyze counterfactual examples to better understand a model’s decision boundary.
Layerwise Relevance Propagation (LRP)
We also add a new attribution method LRP (Layerwise Relevance Propagation) to Captum in the 0.4.0 release, as well as a layer attribution variant, Layer LRP. Layer-wise relevance propagation is based on a backward propagation mechanism applied sequentially to all layers of the model. The model output score represents the initial relevance which is decomposed into values for each neuron of the underlying layers. Thanks to @nanohanno for contributing this method to Captum and @rGure for providing feedback!
New Tutorials
We have added new tutorials to demonstrate Captum with BERT, usage of Lime, and DLRM recommender models. These tutorials are:
- Interpreting BERT Models (Part 2)
- LIME for Image & Text Classification
- Interpreting Deep Learning Recommender Models (DLRM) (https://github.com/facebookresearch/dlrm) with Captum
Additionally, the following fixes and updates to existing tutorials have been added:
- The IMDB tutorial has been updated with a new model (trained with a larger embedding and updated dependencies) for reproducibility.
- Interpreting BERT Models (Part 1) has been updated to make use of LayerIntegratedGradients with multiple layers to obtain attributions simultaneously.
Attribution Improvements
Captum 0.4.0 has added improvements to existing attribution methods including:
- Neuron conductance now supports a selector function (in addition to providing a neuron index) to select the target neuron for attribution, which enables support for layers with input / output as a tuple of tensors (PR #602).
- Lime now supports a generator to be returned by the perturbation function, rather than only a single sample, to better support enumeration of perturbations for interpretable model training (PR #619).
- KernelSHAP has been improved to perform weighted sampling of vectors for interpretable model training, rather than uniformly sampling vectors and weighting only when training. This change scales better with larger numbers of features, since weights for larger numbers of features were previously leading to arithmetic underflow (PR #619).
- A new option show_progress has been added to all perturbation-based attribution methods, which shows a progress bar to help users track progress of attribution computation (Issue #630 , PR #581).
- A new flag normalize has been added to the infidelity evaluation metric, which normalizes and scales the infidelity score when set (Issue #613, PR #639)
- All perturbation-based attribution methods now support boolean input tensors (PR #666).
- Lime’s default regularization for Lasso regression has been reduced from 1.0 to 0.01 to avoid frequent issues with attribution results being 0 (Issue #679, PR #689).
Bug Fixes
- Gradient-based attribution methods have been fixed to not zero previously stored grads, which avoids warnings related to accessing grad of non-leaf tensors (Issue #421, #491, PR #597).
- Captum tests were previously included in Captum distributions unnecessarily; tests are no longer packaged with Captum releases (Issue #629 , PR #635).
- Captum’s dependency on matplotlib in Conda environments has been changed to matplotlib-base, since pyqt is not used in Captum (Issue #644, PR #648).
- Layer attribution methods now set gradient requirements only starting at the target layer rather than at the inputs, which ensures support for models with int or boolean input tensors (PR #647, #643).
- Lime and Kernel SHAP int overflow issues (with sklearn interpretable model training) have been resolved, and all interpretable model inputs / outputs are converted to floats prior to training (PR #649).
- The original parameter names that were renamed in v0.3 for NoiseTunnel, KernelShap, and Lime have been removed in 0.4.0 and no longer emit deprecation warnings (PR #558).
Captum v0.3.1 Release
Captum v0.3.1 includes some improvements and minor fixes beyond the functionalities added in Captum v0.3.0.
Improvements
Captum v0.3.1 has added improvements to existing attribution methods including:
- LayerIntegratedGradients now supports computing attributions for multiple layers simultaneously. (PR #532).
- NoiseTunnel now supports an internal batch size to split noised inputs into batches and appropriately aggregate results (PR #555).
- visualize_text now has an option return_html to export the visualization as HTML code (PR #548).
- A utility wrapper was added to allow computing attributions for intermediate layers and inputs simultaneously (PR #534).
Captum Insights
- Attributions for multiple models can be compared in Captum Insights (PR #551).
- Various improvements to reduce package size of Captum Insights (PR #556 and #562).
Bug Fixes
- Some parameter names were renamed in NoiseTunnel, Kernel Shap, and Lime to avoid conflicting names when combining Noise Tunnel or metrics with attribution methods. Deprecated arguments now raise warnings and will be removed in 0.4.0 (PR #558).
- Feature Ablation now supports cases where the output may be on a different device than the input, which may occur in model-parallel setups (#528).
- Lime (and KernelShap) were fixed to appropriately handle int or long input types (#570).
Captum v0.3.0 Release
The third release, v0.3.0, of Captum adds new attribution algorithms including Lime and KernelSHAP, metrics for assessing attribution results including infidelity and sensitivity, and improvements to existing attribution methods.
Metrics (Sensitivity and Infidelity)
Captum 0.3.0 adds metrics to estimate the trustworthiness of model explanations. Currently available metrics include Sensitivity-Max and Infidelity.
Infidelity measures the mean squared error between the explanation's predicted effect of an input perturbation (the dot product of the attribution with the perturbation) and the predictor function's actual change under that perturbation. Sensitivity measures the degree to which explanations change under subtle input perturbations, using a Monte Carlo sampling-based approximation. These metrics are available in captum.metrics, and documentation is available on the Captum website.
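For illustration, a minimal hedged sketch; the perturbation function, noise scale, attribution method, and target are placeholder assumptions:
import torch
from captum.attr import IntegratedGradients
from captum.metrics import infidelity, sensitivity_max
def perturb_fn(inputs):
    # return the perturbation and the perturbed inputs
    noise = torch.randn_like(inputs) * 0.01
    return noise, inputs - noise
ig = IntegratedGradients(model)
attributions = ig.attribute(inputs, target=target)
infid = infidelity(model, perturb_fn, inputs, attributions, target=target)
sens = sensitivity_max(ig.attribute, inputs, target=target)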
Lime and KernelSHAP
In Captum 0.3.0, we also add surrogate-model interpretability methods including Lime and KernelSHAP. Lime is an interpretability method that trains an interpretable surrogate model by sampling points around a specified input example and using model evaluations at these points to train a simpler interpretable 'surrogate' model, such as a linear model.
We offer two implementation variants of this method, LimeBase and Lime. LimeBase provides a generic framework to train a surrogate interpretable model, while Lime provides a more specific implementation than LimeBase in order to expose a consistent API with other perturbation-based algorithms. KernelSHAP is a method that uses the Lime framework to compute Shapley Values.
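For example, a minimal hedged sketch; the feature mask is omitted, and the number of samples and target are placeholder assumptions:
from captum.attr import Lime, KernelShap
lime = Lime(model)
lime_attr = lime.attribute(inputs, target=target, n_samples=200)
kernel_shap = KernelShap(model)
ks_attr = kernel_shap.attribute(inputs, target=target, n_samples=200)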
New Tutorials
We have added new tutorials to demonstrate Captum with CV tasks such as segmentation as well as in distributed environments. These tutorials are:
- Using Captum with torch.distributed
- Interpreting a semantic segmentation model
Attribution Improvements
Captum 0.3.0 has added improvements to existing attribution methods including:
- LayerActivation and LayerGradientXActivation now support computing attributions for multiple layers simultaneously. (PR #456).
- Neuron attribution methods now support providing a callable to select or aggregate multiple neurons for attribution, as well as slices to select a range of neurons. (PR #490, #495). The parameter name neuron_index has been deprecated and is replaced by neuron_selector, which supports either indices or a callable.
- Feature ablation and feature permutation now allow attribution with respect to multiple batch-aggregate scalars (e.g., loss) simultaneously (PR #425).
- Most attribution methods now support a multiply_by_inputs argument. For attribution methods which include a multiplier of inputs or inputs - baselines, this argument selects whether these multipliers should be incorporated or left out to obtain marginal attributions. (PR #432)
- Methods accepting internal batch size were updated to generate batches lazily rather than splitting an expanded input tensor, eliminating memory constraints when experimenting with a large number of steps. (PR #333).
Bug Fixes
- Providing target as a list with inputs on CUDA devices now works appropriately. (Issue #316, PR #317)
- DeepLift issues with DataParallel models, particularly when providing additional forward args or multiple targets, have been fixed. (PR #335)
- Hooks added within an attribution method were previously not being removed if the attribution method encountered an exception before removing the hook. All hooks are now removed even if an exception is raised during attribution. (PR #340)
- LayerDeepLift was fixed to avoid applying hooks on the target layer when attributing layer output, which caused incorrect results or errors with some non-linearities (Issue #382, PR #390, #415).
- Non-leaf tensor gradient warning when using NoiseTunnel with Saliency has been fixed. (Issue #421, PR #426)
- Text visualization helpers now have an option to display a legend. (Issue #401, PR #403)
- Image visualization helpers fixed to normalize even if outlier threshold is close to 0 (Issue #393, PR #458).
Captum v0.2.0 Release
The second release, v0.2.0, of Captum adds a variety of new attribution algorithms as well as additional tutorials, type hints, and Google Colab support for Captum Insights.
New Attribution Algorithms
The following new attribution algorithms are provided, which can be applied to any type of PyTorch model, including DataParallel models. While the first release focused primarily on gradient-based attribution methods such as Integrated Gradients, the new algorithms include perturbation-based methods, marked by ^ below. We also add new attribution methods designed primarily for convolutional networks, denoted by * below. All attribution methods share a consistent API structure to make it easy to switch between attribution methods.
Attribution of model output with respect to the input features
1. Guided Backprop *
2. Deconvolution *
3. Guided GradCAM *
4. Feature Ablation ^
5. Feature Permutation ^
6. Occlusion ^
7. Shapley Value Sampling ^
Attribution of model output with respect to the layers of the model
1. Layer GradCAM
2. Layer Integrated Gradients
3. Layer DeepLIFT
4. Layer DeepLIFT SHAP
5. Layer Gradient SHAP
6. Layer Feature Ablation ^
Attribution of neurons with respect to the input features
1. Neuron DeepLIFT
2. Neuron DeepLIFT SHAP
3. Neuron Gradient SHAP
4. Neuron Guided Backprop *
5. Neuron Deconvolution *
6. Neuron Feature Ablation ^
^ Denotes Perturbation-Based Algorithm. These methods compute attribution by evaluating the model on perturbed versions of the input as opposed to using gradient information.
* Denotes attribution method designed primarily for convolutional networks.
New Tutorials
We have added new tutorials to demonstrate Captum on BERT models, regression cases, and using perturbation-based methods. These tutorials include:
- Interpreting question answering with BERT
- Interpreting regression models using Boston House Prices Dataset
- Feature Ablation on Images
Type Hints
The Captum code base is now fully typed with Python type hints and type checked using mypy. Users can now accurately type-check code using Captum.
Bug Fixes and Minor Features
- All Captum methods now support in-place modules and operations. (Issue #156)
- Computing convergence delta was fixed to work appropriately on CUDA devices. (Issue #163)
- A ReLU flag was added to Layer GradCAM to optionally apply a ReLU operation to the returned attributions. (Issue #179)
- All layer and neuron attribution methods now support attribution with respect to either the input or output of a module, based on the attribute_to_layer_input and attribute_to_neuron_input flags.
- All layer attribution methods now support modules with multiple outputs.
Captum Insights
- Captum Insights now works on Google Colab. (Issue #116)
- Captum Insights can also be launched as a Jupyter Notebook widget.
- New attribution methods in Captum Insights:
- Deconvolution
- Deep Lift
- Guided Backprop
- Input X Gradient
- Saliency
Initial Release of Captum for Model Interpretability, Attribution and Debugging
We just released our first version of the PyTorch Captum library for model interpretability!
Highlights
This first release, v0.1.0, supports a number of gradient-based attribution algorithms as well as Captum Insights, a visualization tool for model debugging and understanding.
Attribution Algorithms
The following general purpose gradient-based attribution algorithms are provided. These can be applied to any type of PyTorch model and input features, including image, text, and multimodal.
- Attribution of output of the model with respect to the input features
  - Saliency
  - InputXGradient
  - IntegratedGradients
  - DeepLift
  - DeepLiftShap
  - GradientShap
- Attribution of output of the model with respect to the layers of the model
  - LayerActivation
  - LayerGradientXActivation
  - LayerConductance
  - InternalInfluence
- Attribution of neurons with respect to the input features
  - NeuronGradient
  - NeuronIntegratedGradients
  - NeuronConductance
- Attribution Algorithm + noisy sampling
  - NoiseTunnel: helps to reduce the noise in the attributions that are assigned by attribution algorithms by using different noise tunnel techniques such as smoothgrad, smoothgrad_sq and vargrad (see the sketch after this list).
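As an illustration, a minimal hedged sketch of NoiseTunnel wrapping Integrated Gradients; the target, number of samples, and noise scale are placeholder assumptions, and the argument names follow the current API, which may differ slightly from v0.1.0:
from captum.attr import IntegratedGradients, NoiseTunnel
ig = IntegratedGradients(model)
nt = NoiseTunnel(ig)
# smoothgrad: average attributions over several noisy copies of the input
attributions = nt.attribute(inputs, nt_type="smoothgrad", nt_samples=10, stdevs=0.2, target=3)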
Batch and Data Parallel Optimizations
Since some of the algorithms, like Integrated Gradients, expand input tensors internally, we want to make sure we can scale those tensors and our forward/backward computations efficiently. For that reason, we developed a feature that chunks the expanded tensors into internal_batch_size pieces. This argument can be passed to the attribute method and makes the library run forward and backward passes on each batch separately, combining the results after computing gradients.
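For example, a minimal sketch; the target, number of steps, and internal batch size are placeholder assumptions:
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
# approximate the path integral with 200 steps, but expand and evaluate the
# inputs in chunks of 32 to bound peak memory usage
attributions = ig.attribute(inputs, target=3, n_steps=200, internal_batch_size=32)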
The algorithms that support batched optimization are:
- IntegratedGradients
- LayerConductance
- InternalInfluence
- NeuronConductance
PyTorch data parallel models are also supported across all Captum algorithms, allowing users to take advantage of multiple GPUs when applying interpretability algorithms.
More details on these algorithms can be found on our website at captum.ai/docs/algorithms
Captum Insights
Captum Insights provides these algorithms in an interactive Jupyter notebook-based tool for model debugging and understanding. It can be used embedded within a notebook or run as a standalone application.
Features:
- Visualize attribution across sampled data for classification models
- Multimodal support for text, image, and general features in a single model
- Filtering and debugging specific sets of classes and misclassified examples
- Jupyter notebook support for easy model and dataset modification
Insights is built with standard web technologies including JavaScript, CSS, React, Yarn and Flask.