If you find an overlooked paper, please open an issue or pull request and provide it in this format:
- **[]** Paper Name [[pdf]]() [[code]]()
- Visualizing and Understanding Convolutional Networks [pdf]
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps [pdf] [saliency code]
- Striving for Simplicity: The All Convolutional Net [pdf]
- Understanding Neural Networks Through Deep Visualization [pdf]
- Synthesizing the preferred inputs for neurons in neural networks via deep generator networks [pdf]
- Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks [pdf]
- Understanding Deep Image Representations by Inverting Them [pdf]
- Visualizing deep convolutional neural networks using natural pre-images [pdf]
- Understanding Neural Networks via Feature Visualization: A survey [pdf]
- Conditional iterative generation of images in latent space [pdf]
- Interpretable Explanations of Black Boxes by Meaningful Perturbation [pdf] [code] [code] [code]
- Gradient-Based Attribution Methods [pdf]
- Top-down Neural Attention by Excitation Backprop [pdf] [code]
- Salient Deconvolutional Networks [pdf]
- Explaining and Interpreting LSTMs [pdf]
- Explaining and Harnessing Adversarial Examples [pdf] [code] [code] [code]
- Adversarial Training for Free! [pdf] [code] [video]
- Fast Adversarial Training with Smooth Convergence [pdf] [code]
- Intriguing properties of neural networks [pdf]
- High Confidence Predictions for Unrecognizable Images [pdf]
- Contrastive Explanations in Neural Networks [pdf] [code] [slides]
- Towards better understanding of gradient-based attribution methods for Deep Neural Networks [pdf]
- On the (In)fidelity and Sensitivity of Explanations [pdf] [code]
- Unsupervised learning of object semantic parts from internal states of CNNs by population encoding [pdf]
- Diverse feature visualizations reveal invariances in early layers of deep neural networks [pdf]
- Interpretation of Neural Networks is Fragile [pdf]
- Towards Better Analysis of Deep Convolutional Neural Networks [pdf]
- Do semantic parts emerge in Convolutional Neural Networks? [pdf]
- Do Convolutional Neural Networks Learn Class Hierarchy? [pdf]
- A Benchmark for Interpretability Methods in Deep Neural Networks [pdf]
- On the Robustness of Interpretability Methods [pdf]
- Sanity Checks for Saliency Maps [pdf]
- Sanity Checks for Saliency Metrics [pdf]
- Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks [pdf]
- Transformer Interpretability Beyond Attention Visualization [pdf] [code] [video]
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers [pdf] [code]
- Optimizing Relevance Maps of Vision Transformers Improves Robustness [pdf] [code]
- Investigating the influence of noise and distractors on the interpretation of neural networks [pdf]
- Do Explanations Explain? Model Knows Best [pdf] [code]
- Visualizing Deep Neural Network Decisions: Prediction Difference Analysis [pdf] [code]
- Visualizing and Understanding Generative Adversarial Networks [pdf] [code] [website]
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness [pdf] [code]
- Deep Image Prior [pdf] [code] [code] [code] [website]
- How Do Vision Transformers Work? [pdf]
- Breaking Batch Normalization for better explainability of Deep Neural Networks through Layer-wise Relevance Propagation [pdf]
- Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers [pdf]
- Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis [pdf]
- Explaining image classifiers by removing input features using generative models [pdf] [code]
- Do Vision Transformers See Like Convolutional Neural Networks? [pdf]
- Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball [pdf]
- Explaining Knowledge Distillation by Quantifying the Knowledge [pdf]
- Interpreting Super-Resolution Networks with Local Attribution Maps [pdf]
- Is the deconvolution layer the same as a convolutional layer? [pdf]
- Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed [pdf]
- Gradient Inversion with Generative Image Prior [pdf] [code]
- Explaining Local, Global, And Higher-Order Interactions In Deep Learning [pdf]
- Pitfalls of Explainable ML: An Industry Perspective [pdf]
- Do Feature Attribution Methods Correctly Attribute Features? [pdf] [code]
- Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis [pdf] [code]
- What do neural networks learn in image classification? A frequency shortcut perspective [pdf]
- The effectiveness of feature attribution methods and its correlation with automatic evaluation scores [pdf]
- Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations [pdf]
- The (Un)reliability of saliency methods [pdf]
- Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation [pdf]
- Explainable Models with Consistent Interpretations [pdf] [code]
- Interpreting Multivariate Shapley Interactions in DNNs [pdf]
- Finding and Fixing Spurious Patterns with Explanations [pdf]
- Monitoring Shortcut Learning using Mutual Information [pdf]
- Dissecting Deep Learning Networks - Visualizing Mutual Information [pdf]
- Revisiting Backpropagation Saliency Methods [pdf]
- Towards Visually Explaining Variational Autoencoders [pdf] [code] [code] [video] [video]
- Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution [pdf]
- Understanding Deep Networks via Extremal Perturbations and Smooth Masks [pdf] [code]
- Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks [pdf]
- Towards Robust Interpretability with Self-Explaining Neural Networks [pdf]
- Influence-Directed Explanations for Deep Convolutional Networks [pdf]
- Interpretable Basis Decomposition for Visual Explanation [pdf] [code]
- Real Time Image Saliency for Black Box Classifiers [pdf]
- Bias Also Matters: Bias Attribution for Deep Neural Network Explanation [pdf]
- Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation [pdf]
- Distilling Critical Paths in Convolutional Neural Networks [pdf]
- Understanding intermediate layers using linear classifier probes [pdf]
- Neural Response Interpretation through the Lens of Critical Pathways [pdf] [code] [code]
- Interpret Neural Networks by Identifying Critical Data Routing Paths [pdf]
- Reconstructing Training Data from Trained Neural Networks [pdf] [website]
- Visualizing Deep Similarity Networks [pdf] [code]
- Improving Deep Learning Interpretability by Saliency Guided Training [pdf] [code]
- Understanding Prediction Discrepancies in Machine Learning Classifiers [pdf]
- Intriguing Properties of Vision Transformers [pdf] [code]
- From Clustering to Cluster Explanations via Neural Networks [pdf]
- Compositional Explanations of Neurons [pdf]
- What Does CNN Shift Invariance Look Like? A Visualization Study [pdf] [code] [project]
- Explainability Methods for Graph Convolutional Neural Networks [pdf] [code]
- What do Vision Transformers Learn? A Visual Exploration [pdf]
- Learning Accurate and Interpretable Decision Rule Sets from Neural Networks [pdf]
- Visual Explanation for Deep Metric Learning [pdf] [code]
- Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations [pdf]
- Understanding Black-box Predictions via Influence Functions [pdf] [code]
- Unmasking Clever Hans predictors and assessing what machines really learn [pdf]
- Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation [pdf]
- Quantitative Evaluations on Saliency Methods: An Experimental Study [pdf]
- Metrics for saliency map evaluation of deep learning explanation methods [pdf]
- Neural Networks are Decision Trees [pdf]
- Towards Generating Human-Centered Saliency Maps without Sacrificing Accuracy [blog]
- Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data [pdf] [code] [code] [pip] [powerlaw]
- Exploring Explainability for Vision Transformers [blog] [code]
- Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces [pdf]
- Are Transformers More Robust Than CNNs? [pdf] [code]
- Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers [pdf] [code]
- Explanatory Interactive Machine Learning [pdf]
- Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets [pdf]
- Studying How to Efficiently and Effectively Guide Models with Explanations [pdf] [supp]
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [pdf]
- Fixing Localization Errors to Improve Image Classification [pdf]
- Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias [pdf]
- On Guiding Visual Attention with Language Specification [pdf]
- Improving Interpretability via Regularization of Neural Activation Sensitivity [pdf]
- L1-Norm Gradient Penalty for Noise Reduction of Attribution Maps [pdf]
- Identifying Spurious Correlations and Correcting them with an Explanation-based Learning [pdf]
- Visual Attention Consistency under Image Transforms for Multi-Label Image Classification [pdf]
- Improving performance of deep learning models with axiomatic attribution priors and expected gradients [pdf]
- Fast Axiomatic Attribution for Neural Networks [pdf] [code]
- Detecting Statistical Interactions from Neural Network Weights [pdf]
- What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods [pdf] [code] [blog]
- The Hidden Language of Diffusion Models [pdf] [code] [website]
- Investigating Vision Transformer representations [blog]
- Mean Attention Distance in Vision Transformers [pdf] [code]
- Interpreting Vision and Language Generative Models with Semantic Visual Priors [pdf]
- Learning Concise and Descriptive Attributes for Visual Recognition [pdf]
- Visual Classification via Description from Large Language Models [pdf] [code] [website]
- Representation Engineering: A Top-Down Approach to AI Transparency [pdf] [code] [website]
- Multimodal Neurons in Pretrained Text-Only Transformers [pdf] [website]
- Are Vision Language Models Texture or Shape Biased and Can We Steer Them? [pdf]
- Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation [pdf] [website]
- [OpenXAI] Towards a Transparent Evaluation of Model Explanations [pdf] [code] [website]
- [TracIn] Estimating Training Data Influence by Tracing Gradient Descent [pdf] [code] [code]
- [VoG] Estimating Example Difficulty using Variance of Gradients [pdf] [code] [project]
- [D-RISE] Black-box Explanation of Object Detectors via Saliency Maps [pdf]
- [SmoothGrad] Removing noise by adding noise [pdf]
- [Integrated Gradients] Axiomatic Attribution for Deep Networks [pdf] [code] [code] (see the IG/SmoothGrad sketch after this list)
- [BlurIG] Attribution in Scale and Space [pdf] [code]
- [IDGI] A Framework to Eliminate Explanation Noise from Integrated Gradients [pdf] [code]
- [GIG] Guided Integrated Gradients: an Adaptive Path Method for Removing Noise [pdf] [code]
- [SPI] Beyond Single Path Integrated Gradients for Reliable Input Attribution via Randomized Path Sampling [pdf] [supp]
- [IIA] Visual Explanations via Iterated Integrated Attributions [pdf] [supp] [code]
- [Integrated Hessians] Explaining Explanations: Axiomatic Feature Interactions for Deep Networks [pdf] [code]
- [Archipelago] How does this interaction affect me? Interpretable attribution for feature interactions [pdf] [code]
- [I-GOS] Visualizing Deep Networks by Optimizing with Integrated Gradients [pdf]
- [MoreauGrad] Sparse and Robust Interpretation of Neural Networks via Moreau Envelope [pdf] [code]
- [SAGs] One Explanation is Not Enough: Structured Attention Graphs for Image Classification [pdf]
- [LRP] On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation [pdf] [pdf] [pdf] [tutorial] [code] [code] [blog]
- [DeepDream] Inceptionism: Going Deeper into Neural Networks [blog] [code] [code] [code]
- [RISE] Randomized Input Sampling for Explanation of Black-box Models [pdf] [code] [website]
- [DeepLIFT] Learning Important Features Through Propagating Activation Differences [pdf] [video] [code]
- [ROAD] A Consistent and Efficient Evaluation Strategy for Attribution Methods [pdf] [code]
- [Layer Masking] Towards Improved Input Masking for Convolutional Neural Networks [pdf] [code]
- [Summit] Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations [pdf]
- [SHAP] A Unified Approach to Interpreting Model Predictions [pdf] [code]
- [MM-SHAP] A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks [pdf] [code] [video]
- [Anchors] High-Precision Model-Agnostic Explanations [pdf] [code]
- [Layer Conductance] How Important Is a Neuron? [pdf] [pdf]
- [BiLRP] Building and Interpreting Deep Similarity Models [pdf] [code]
- [CGC] Consistent Explanations by Contrastive Learning [pdf] [code]
- [DeepInversion] Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion [pdf] [code]
- [GradInversion] See through Gradients: Image Batch Recovery via GradInversion [pdf]
- [GradViT] Gradient Inversion of Vision Transformers [pdf]
- [Plug-In Inversion] Model-Agnostic Inversion for Vision with Data Augmentations [pdf]
- [GIFD] A Generative Gradient Inversion Method with Feature Domain Optimization [pdf]
- [X-OIA] Explainable Object-induced Action Decision for Autonomous Vehicles [pdf] [code] [website]
- [CAT-XPLAIN] Causality for Inherently Explainable Transformers [pdf] [code]
- [CLRP] Understanding Individual Decisions of CNNs via Contrastive Backpropagation [pdf] [code]
- [HINT] Leveraging Explanations to Make Vision and Language Models More Grounded [pdf]
- [BagNet] Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet [pdf] [code] [blog]
- [SMERF] Sanity Simulations for Saliency Methods [pdf]
- [ELUDE] Generating interpretable explanations via a decomposition into labelled and unlabelled features [pdf]
- [C3LT] Cycle-Consistent Counterfactuals by Latent Transformations [pdf]
- [B-cos] Alignment is All We Need for Interpretability [pdf] [code]
- [ShapNets] Shapley Explanation Networks [pdf] [code]
- [CALM] Keep CALM and Improve Visual Feature Attribution [pdf] [code]
- [SGLRP] Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation [pdf]
- [DTD] Explaining NonLinear Classification Decisions with Deep Taylor Decomposition [pdf] [code]
- [GradCAT] Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding [pdf]
- [FastSHAP] Real-Time Shapley Value Estimation [pdf] [code]
- [VisualBackProp] Efficient visualization of CNNs [pdf]
- [NBDT] Neural-Backed Decision Trees [pdf] [code]
- [XRAI] Better Attributions Through Regions [pdf]
- [MeGe, ReCo] How Good is your Explanation? Algorithmic Stability Measures to Assess the Quality of Explanations for Deep Neural Networks [pdf]
- [FCDD] Explainable Deep One-Class Classification [pdf] [code]
- [DiCE] Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations [pdf] [code] [blog]
- [ARM] Blending Anti-Aliasing into Vision Transformer [pdf] [code]
- [RelEx] Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation [pdf] [code]
- [X-Pruner] eXplainable Pruning for Vision Transformers [pdf] [code]
- [ShearletX] Explaining Image Classifiers with Multiscale Directional Image Representation [pdf]
- [MACO] Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization [pdf] [website]
- [Guided Zoom] Questioning Network Evidence for Fine-Grained Classification [pdf] [pdf] [code]
- [DAAM] Interpreting Stable Diffusion Using Cross Attention [pdf] [code] [demo]
- [Diffusion Explainer] Visual Explanation for Text-to-image Stable Diffusion [pdf] [website] [video]
- [ECLIP] Exploring Visual Explanations for Contrastive Language-Image Pre-training [pdf]
- [CNC] Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations [pdf]
- [AMC] Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations [pdf]
- [ClickMe] Learning what and where to attend [pdf]
- [MaskTune] Mitigating Spurious Correlations by Forcing to Explore [pdf]
- [CoDA-Nets] Convolutional Dynamic Alignment Networks for Interpretable Classifications [pdf]
- [ABN] Attention Branch Network: Learning of Attention Mechanism for Visual Explanation [pdf] [pdf]
- [RES] A Robust Framework for Guiding Visual Explanation [pdf]
- [IAA] Aligning Eyes between Humans and Deep Neural Network through Interactive Attention Alignment [pdf]
- [DiFull] Towards Better Understanding Attribution Methods [pdf] [code]
- [AttentionViz] A Global View of Transformer Attention [pdf]
- [Rosetta Neurons] Mining the Common Units in a Model Zoo [pdf] [code] [website]
- [SAFARI] Versatile and Efficient Evaluations for Robustness of Interpretability [pdf]
- [LANCE] Stress-testing Visual Models by Generating Language-guided Counterfactual Images [pdf] [code] [website]
- [FunnyBirds] A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods [pdf] [code]
- [MAGI] Multi-Annotated Explanation-Guided Learning [pdf]
- [CCE] Towards Visual Contrastive Explanations for Neural Networks [pdf]
- [CNN Filter DB] An Empirical Investigation of Trained Convolutional Filters [pdf] [code]
- [VLSlice] Interactive Vision-and-Language Slice Discovery [pdf] [code] [website] [demo] [video] [video]
- [Feature Sieve] Overcoming Simplicity Bias in Deep Networks using a Feature Sieve [pdf] [blog]
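
Many of the attribution entries above reduce to a few lines of autograd. As a reference point, here is a minimal sketch of [Integrated Gradients] and [SmoothGrad] in plain PyTorch; the all-zero baseline, step count, and noise level are illustrative assumptions, not the papers' official settings:

```python
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    # Average gradients along the straight path from baseline to input,
    # then scale by (input - baseline). x is a single-example batch (1, ...).
    if baseline is None:
        baseline = torch.zeros_like(x)  # assumed all-zero baseline
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point)[0, target].backward()  # scalar logit of the target class
        grads += point.grad
    return (x - baseline) * grads / steps

def smoothgrad(model, x, target, noise_std=0.15, samples=25):
    # Average vanilla gradients over noisy copies of the input.
    grads = torch.zeros_like(x)
    for _ in range(samples):
        noisy = (x + noise_std * torch.randn_like(x)).requires_grad_(True)
        model(noisy)[0, target].backward()
        grads += noisy.grad
    return grads / samples
```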
- [CAM] Learning Deep Features for Discriminative Localization [pdf]
- [Grad-CAM] Visual Explanations from Deep Networks via Gradient-based Localization [pdf] [code] [code] [website] (see the sketch after this list)
- [Grad-CAM++] Improved Visual Explanations for Deep Convolutional Networks [pdf] [code]
- [Score-CAM] Score-Weighted Visual Explanations for Convolutional Neural Networks [pdf] [code] [code]
- [LayerCAM] Exploring Hierarchical Class Activation Maps for Localization [pdf] [code]
- [Eigen-CAM] Class Activation Map using Principal Components [pdf]
- [XGrad-CAM] Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs [pdf] [code]
- [Ablation-CAM] Visual Explanations for Deep Convolutional Network via Gradient-free Localization [pdf]
- [Group-CAM] Group Score-Weighted Visual Explanations for Deep Convolutional Networks [pdf] [code]
- [FullGrad] Full-Gradient Representation for Neural Network Visualization [pdf]
- [Relevance-CAM] Your Model Already Knows Where to Look [pdf] [code]
- [Poly-CAM] High resolution class activation map for convolutional neural networks [pdf] [code]
- [Smooth Grad-CAM++] An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models [pdf] [code]
- [Zoom-CAM] Generating Fine-grained Pixel Annotations from Image Labels [pdf]
- [FD-CAM] Improving Faithfulness and Discriminability of Visual Explanation for CNNs [pdf] [code]
- [LIFT-CAM] Towards Better Explanations of Class Activation Mapping [pdf]
- [Shap-CAM] Visual Explanations for Convolutional Neural Networks based on Shapley Value [pdf]
- [HiResCAM] Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks [pdf]
- [FAM] Visual Explanations for the Feature Representations from Deep Convolutional Networks [pdf]
- [MinMaxCAM] Improving object coverage for CAM-based Weakly Supervised Object Localization [pdf]
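
The CAM variants above share one mechanism: weight the feature maps of a late convolutional layer by class-specific importance scores and sum them. A minimal Grad-CAM sketch using hooks; the ResNet-50 backbone and the `layer4` target layer are illustrative assumptions (for a maintained implementation, see the PyTorch Grad-CAM library in the tools section):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
acts, grads = {}, {}

# Record the target layer's activations (forward) and gradients (backward).
model.layer4.register_forward_hook(lambda m, i, o: acts.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

def grad_cam(x, target_class):
    model.zero_grad()
    model(x)[0, target_class].backward()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted sum of maps
    cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
    return (cam / (cam.max() + 1e-8)).detach()           # heatmap in [0, 1]
```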
- [LIME] "Why Should I Trust You?": Explaining the Predictions of Any Classifier [pdf] [code] (usage sketch after this group)
- [InteractionLIME] Model-Agnostic Visual Explanations via Approximate Bilinear Models [pdf]
- [NormLime] A New Feature Importance Metric for Explaining Deep Neural Networks [pdf]
- [GALE] Global Aggregations of Local Explanations for Black Box models [pdf]
- [D-LIME] A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems [pdf]
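
For orientation, typical usage of the `lime` package for the [LIME] entry above looks roughly like this; the dummy classifier, random image, and sample counts are placeholders to keep the sketch self-contained:

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def classifier_fn(images):
    # Batch of HxWx3 arrays -> class probabilities; wrap your model here.
    # This stand-in returns uniform probabilities over 10 classes.
    return np.full((len(images), 10), 0.1)

image = np.random.rand(224, 224, 3)
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, classifier_fn,
                                         top_labels=3, num_samples=200)
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
overlay = mark_boundaries(img, mask)  # superpixel outline for display
```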
- [CBM] Concept Bottleneck Models [pdf] [code] (see the sketch after this group)
- [Label-free CBM] Label-Free Concept Bottleneck Models [pdf] [code]
- [PCBMs] Post-hoc Concept Bottleneck Models [pdf] [code]
- [CDM] Sparse Linear Concept Discovery Models [pdf] [code]
- [BotCL] Learning Bottleneck Concepts in Image Classification [pdf] [code]
- [LaBo] Language Model Guided Concept Bottlenecks for Interpretable Image Classification [pdf] [code]
- [CompMap] Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? [pdf] [code] [website]
- [FVLC] Faithful Vision-Language Interpretation via Concept Bottleneck Models [pdf]
- [DN-CBM] Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [pdf] [code]
- [ECBMs] Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations [pdf] [code]
- Promises and Pitfalls of Black-Box Concept Learning Models [pdf]
- Do Concept Bottleneck Models Learn as Intended? [pdf]
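
The concept-bottleneck entries above share a two-stage structure: the input is first mapped to human-interpretable concept scores, and the label is predicted from those scores alone, which also enables test-time concept intervention. A minimal sketch; the backbone, dimensions, and the NaN-masked intervention interface are assumptions of this sketch, not any paper's API:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    # x -> concepts -> y; the label head sees only concept scores, so each
    # prediction can be inspected and edited concept-by-concept.
    def __init__(self, backbone, feat_dim=512, n_concepts=112, n_classes=200):
        super().__init__()
        self.backbone = backbone                      # any feature extractor
        self.concept_head = nn.Linear(feat_dim, n_concepts)
        self.label_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x, intervention=None):
        c = torch.sigmoid(self.concept_head(self.backbone(x)))
        if intervention is not None:  # NaN = keep prediction, else override
            c = torch.where(intervention.isnan(), c, intervention)
        return c, self.label_head(c)
```

Training then combines a concept loss (e.g. BCE against concept annotations) with the usual label loss.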
- [Network Dissection] Quantifying Interpretability of Deep Visual Representations [pdf] [code] [website]
- [CLIP-Dissect] Automatic Description of Neuron Representations in Deep Vision Networks [pdf] [code]
- [Net2Vec] Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks [pdf]
- [MILAN] Natural Language Descriptions of Deep Visual Features [pdf] [code] [website]
- [INViTE] INterpret and Control Vision-Language Models with Text Explanations [pdf] [code]
- [CLIP-Decomposition] Interpreting CLIP's Image Representation via Text-Based Decomposition [pdf] [code] [website]
- [Second-Order CLIP-Decomposition] Interpreting the Second-Order Effects of Neurons in CLIP [pdf] [code] [website]
- [ZS-A2T] Zero-shot Translation of Attention Patterns in VQA Models to Natural Language [pdf] [code]
- [FALCON] Identifying Interpretable Subspaces in Image Representations [pdf] [code]
- [STAIR] Learning Sparse Text and Image Representation in Grounded Tokens [pdf]
- [DISCOVER] Making Vision Networks Interpretable via Competition and Dissection [pdf]
- [DeViL] Decoding Vision features into Language [pdf] [code]
- [LaViSE] Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [pdf] [code]
- [ProtoTrees] Neural Prototype Trees for Interpretable Fine-grained Image Recognition [pdf] [code]
- [ProtoPNet] This Looks Like That: Deep Learning for Interpretable Image Recognition [pdf] [code]
- [ST-ProtoPNet] Learning Support and Trivial Prototypes for Interpretable Image Classification [pdf]
- [Deformable ProtoPNet] An Interpretable Image Classifier Using Deformable Prototypes [pdf]
- [SPARROW] Semantically Coherent Prototypes for Image Classification [pdf]
- [Proto2Proto] Can you recognize the car, the way I do? [pdf] [code]
- [PDiscoNet] Semantically consistent part discovery for fine-grained recognition [pdf] [code]
- [ProtoPool] Interpretable Image Classification with Differentiable Prototypes Assignment [pdf] [code]
- [ProtoPShare] Prototype Sharing for Interpretable Image Classification and Similarity Discovery [pdf]
- [PW-Net] Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes [pdf] [code]
- [ProtoPDebug] Concept-level Debugging of Part-Prototype Networks [pdf]
- [DSX] Describe, Spot and Explain: Interpretable Representation Learning for Discriminative Visual Reasoning [pdf]
- [HINT] Hierarchical Neuron Concept Explainer [pdf] [code]
- [ConceptSHAP] On Completeness-aware Concept-Based Explanations in Deep Neural Networks [pdf] [code]
- [CW] Concept Whitening for Interpretable Image Recognition [pdf]
- [VRX] Interpreting with Structural Visual Concepts [pdf]
- [MOCE] Extracting Model-Oriented Concepts for Explaining Deep Neural Networks [pdf] [code]
- [ConceptExplainer] Interactive Explanation for Deep Neural Networks from a Concept Perspective [pdf]
- [ProtoSim] Prototype-based Dataset Comparison [pdf] [code] [website]
- [TCAV] Quantitative Testing with Concept Activation Vectors [pdf] [code] [book chapter] (see the sketch below)
- [SACV] Hidden Layer Interpretation with Spatial Activation Concept Vector [pdf] [code]
- [ACE] Towards Automatic Concept-based Explanations [pdf] [code]
- [DFF] Deep Feature Factorization For Concept Discovery [pdf] [code] [code] [blog and code]
- [CRP] From “Where” to “What”: Towards Human-Understandable Explanations through Concept Relevance Propagation [pdf] [code]
- [FeatUp] A Model-Agnostic Framework for Features at Any Resolution [pdf] [code] [colab] [website] [demo]
- [LENS] A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation [pdf] [website]
- [CRAFT] Concept Recursive Activation FacTorization for Explainability [pdf] [code] [website]
- Deep ViT Features as Dense Visual Descriptors [pdf] [supp] [code] [website]
- Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks [pdf] [code]
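
Most concept-probing methods above start from the same primitive as [TCAV]: fit a linear direction in a layer's activation space that separates concept examples from random ones, then test how the class logit changes along that direction. A hedged sketch (activation extraction is omitted; the sign-counting score follows the TCAV paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    # Fit a linear probe; its (normalized) normal vector is the CAV.
    X = np.concatenate([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(logit_grads, cav):
    # logit_grads: (N, D) gradients of the class logit w.r.t. the layer's
    # activations; score = fraction with a positive directional derivative.
    return float((logit_grads @ cav > 0).mean())
```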
- Distill
- Multimodal Neurons in Artificial Neural Networks [paper] [blog] [code]
- The Building Blocks of Interpretability [paper]
- Visualizing the Impact of Feature Attribution Baselines [paper]
- An Overview of Early Vision in InceptionV1 [paper]
- Feature Visualization [paper] (see the sketch after this group)
- Differentiable Image Parameterizations [paper]
- Deconvolution and Checkerboard Artifacts [paper]
- Visualizing memorization in RNNs [paper]
- Exploring Neural Networks with Activation Atlases [paper]
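
The Distill articles above revolve around feature visualization by optimization: start from noise and gradient-ascend the image toward high activation of a chosen unit. A bare-bones sketch that deliberately omits the regularizers (transformation robustness, frequency-space parameterization) the articles show are essential for clean images; the layer, channel, and step count are arbitrary choices:

```python
import torch
from torchvision.models import googlenet, GoogLeNet_Weights

model = googlenet(weights=GoogLeNet_Weights.DEFAULT).eval()
acts = {}
model.inception4a.register_forward_hook(lambda m, i, o: acts.update(a=o))

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(256):
    opt.zero_grad()
    model(img)
    (-acts["a"][0, 42].mean()).backward()  # maximize channel 42's activation
    opt.step()
```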
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About [pdf]
- How Well Do Self-Supervised Models Transfer? [pdf] [code]
- A critical analysis of self-supervision, or what we can learn from a single image [pdf] [code] [video]
- How transferable are features in deep neural networks? [pdf]
- Understanding the Role of Self-Supervised Learning in Out-of-Distribution Detection Task [pdf]
- Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning [pdf] [code] [website]
- Revealing the Dark Secrets of Masked Image Modeling [pdf]
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [pdf] [code]
- Understanding Failure Modes of Self-Supervised Learning [pdf]
- Explaining Self-Supervised Image Representations with Visual Probing [pdf] [pdf] [code]
- Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations [pdf]
- What Happens to the Source Domain in Transfer Learning? [pdf]
- Overwriting Pretrained Bias with Finetuning Data [pdf]
- Exploring Model Transferability through the Lens of Potential Energy [pdf] [code]
- How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [pdf] [supp]
- What Contrastive Learning Learns Beyond Class-wise Features? [pdf]
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [pdf]
- What makes instance discrimination good for transfer learning? [pdf] [website]
- Revisiting the Transferability of Supervised Pretraining: an MLP Perspective [pdf]
- Intriguing Properties of Contrastive Losses [pdf] [code]
- When Does Contrastive Visual Representation Learning Work? [pdf]
- What Makes for Good Views for Contrastive Learning? [pdf] [code]
- What Should Not Be Contrastive in Contrastive Learning [pdf]
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases [pdf]
- Are all negatives created equal in contrastive instance discrimination? [pdf]
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [pdf] [code]
- On Pretraining Data Diversity for Self-Supervised Learning [pdf] [code]
- Circuits [series]
- Transformer Circuits [series]
- Progress measures for grokking via mechanistic interpretability [pdf]
- Circuit Component Reuse Across Tasks in Transformer Language Models [pdf]
- Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks [pdf]
- TransformerLens
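
TransformerLens (above) is the common tooling behind much of this circuits-style work. A typical cache-and-inspect pattern, assuming the current API (check the library's docs):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in")
logits, cache = model.run_with_cache(tokens)

# Layer-0 attention patterns: (batch, head, query_pos, key_pos)
attn = cache["pattern", 0]
print(attn.shape, model.to_str_tokens(tokens))
```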
- [GVE] Generating visual explanations [pdf]
- [PJ-X] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence [pdf] [code]
- [FME] Faithful Multimodal Explanation for Visual Question Answering [pdf]
- [RVT] Natural Language Rationales with Full-Stack Visual Reasoning [pdf] [code]
- [e-UG] e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks [pdf] [code]
- [NLX-GPT] A Model for Natural Language Explanations in Vision and Vision-Language Tasks [pdf] [code]
- [Uni-NLX] Unifying Textual Explanations for Vision and Vision-Language Tasks [pdf] [code]
- [Explain Yourself] Leveraging Language Models for Commonsense Reasoning [pdf]
- [e-SNLI] Natural Language Inference with Natural Language Explanations [pdf]
- [CLEVR-X] A Visual Reasoning Dataset for Natural Language Explanations [pdf] [code] [website]
- [VQA-E] Explaining, Elaborating, and Enhancing Your Answers for Visual Questions [pdf]
- [PtE] Are Training Resources Insufficient? Predict First Then Explain! [pdf]
- [WT5] Training Text-to-Text Models to Explain their Predictions [pdf]
- [RExC] Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations [pdf] [code]
- [ELV] Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [pdf] [code]
- [FEB] Few-Shot Self-Rationalization with Natural Language Prompts [pdf]
- [CALeC] Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations [pdf]
- [OFA-X] Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations [pdf] [code]
- [S3C] Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning [pdf]
- [ReVisE] A Recursive Approach Towards Vision-Language Explanation [pdf] [code]
- [Multimodal-CoT] Multimodal Chain-of-Thought Reasoning in Language Models [pdf] [code]
- [CCoT] Compositional Chain-of-Thought Prompting for Large Multimodal Models [pdf]
- Grounding Visual Explanations [pdf]
- Textual Explanations for Self-Driving Vehicles [pdf] [code]
- Measuring Association Between Labels and Free-Text Rationales [pdf] [code]
- Reframing Human-AI Collaboration for Generating Free-Text Explanations [pdf]
- Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations [pdf]
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned [pdf] [code]
- Quantifying Attention Flow in Transformers [pdf] (rollout sketch at the end of this section)
- Locating and Editing Factual Associations in GPT [pdf] [code] [colab] [colab] [video] [website]
- Visualizing and Understanding Neural Machine Translation [pdf]
- Transformer Feed-Forward Layers Are Key-Value Memories [pdf]
- A Diagnostic Study of Explainability Techniques for Text Classification [pdf] [code]
- A Survey of the State of Explainable AI for Natural Language Processing [pdf]
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [pdf] [code]
- Why use attention as explanation when we have saliency methods? [pdf]
- Attention is Not Only a Weight: Analyzing Transformers with Vector Norms [pdf]
- Attention is not Explanation [pdf]
- Attention is not not Explanation [pdf]
- Analyzing Individual Neurons in Pre-trained Language Models [pdf]
- Identifying and Controlling Important Neurons in Neural Machine Translation [pdf]
- “Will You Find These Shortcuts?” A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification [pdf] [blog]
- Interpreting Language Models with Contrastive Explanations [pdf] [code]
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers [pdf] [pdf] [code]
- Discretized Integrated Gradients for Explaining Language Models [pdf] [code]
- Did the Model Understand the Question? [pdf]
- Explaining Compositional Semantics for Neural Sequence Models [pdf] [code]
- Fooling Explanations in Text Classifiers [pdf]
- Interpreting GPT: The Logit Lens [blog]
- A Circuit for Indirect Object Identification in GPT-2 small [pdf]
- Inside BERT, from the BERT-related-papers GitHub repo [link]
- Massive Activations in Large Language Models [pdf] [code] [website]
- Language Models Represent Space and Time [pdf] [code]
- Are self-explanations from Large Language Models faithful? [pdf]
- Awesome LLM Interpretability
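
Several entries above analyze raw attention weights; Quantifying Attention Flow in Transformers motivates attention rollout instead: average heads, mix in the residual identity, renormalize, and chain the matrices across layers. A minimal sketch; the 0.5/0.5 residual weighting is one common convention, assumed here, and the input is the per-layer attention list that `transformers` models return with `output_attentions=True`:

```python
import torch

def attention_rollout(attentions):
    # attentions: list of (batch, heads, seq, seq) tensors, one per layer.
    # Returns (batch, seq, seq) attributions from output to input positions.
    rollout = None
    for layer_attn in attentions:
        a = layer_attn.mean(dim=1)                       # average over heads
        a = 0.5 * a + 0.5 * torch.eye(a.size(-1))        # residual connection
        a = a / a.sum(dim=-1, keepdim=True)              # renormalize rows
        rollout = a if rollout is None else a @ rollout  # chain layers
    return rollout
```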
- Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications [pdf]
- Benchmarking and Survey of Explanation Methods for Black Box Models [pdf]
- An Empirical Study of Deep Neural Network Explanation Methods [pdf] [code]
- Methods for Interpreting and Understanding Deep Neural Networks [pdf]
- From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI [pdf]
- Leveraging Explanations in Interactive Machine Learning: An Overview [pdf]
- [SLOT-Attention] Object-Centric Learning with Slot Attention [pdf] [code] [code]
- [SCOUTER] Slot Attention-based Classifier for Explainable Image Recognition [pdf] [code]
- [SPOT] Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers [pdf] [code]
- Captum (usage sketch after this list)
- PyTorch Grad-CAM [github] [docs]
- Lucid [tensorflow] [pytorch]
- Zennit [github] [docs] [paper]
- TorchCAM [github] [docs] [demo]
- pytorch-cnn-visualizations
- VL-InterpreT [pdf] [github] [demo] [video]
- DeepExplain
- TorchRay [github] [docs]
- grad-cam-pytorch
- ViT-Prisma
- CLIP Explainability
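
Captum (top of this list) exposes many of the attribution methods from the papers section behind a single attribute() interface. A short, self-contained usage sketch; the untrained ResNet and random input are stand-ins:

```python
import torch
from torchvision.models import resnet18
from captum.attr import IntegratedGradients, NoiseTunnel

model = resnet18(weights=None).eval()      # stand-in classifier
inputs = torch.randn(1, 3, 224, 224)       # stand-in image batch
ig = IntegratedGradients(model)
attr = ig.attribute(inputs, target=0, n_steps=50)

# SmoothGrad-style averaging wrapped around any Captum attribution method
nt = NoiseTunnel(ig)
smooth = nt.attribute(inputs, target=0, nt_type="smoothgrad", nt_samples=25)
print(attr.shape, smooth.shape)            # both match the input shape
```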
- Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
- Interpretable Machine Learning
- Transformer Circuits
- OpenAI Microscope
- Summary - Captum
- Alibi Docs
- jacobgil blogs
- Stanford CS231n slides
- TU Berlin Notes
- Tutorial Notebooks
- NPTEL-NOC IITM Videos [Early Methods] [Visualization Methods] [CAM Methods] [Recent Methods] [Beyond Explaining]
- AI Explained Video Series by Fiddler AI
- XAI Explained Video Series by DeepFindr
- Visualizing and Understanding lecture video (Stanford CS231n)
- CVPR 2021 Tutorial
- CVPR 2023 Tutorial
- CS231n Assignments Solutions
- Filter and Feature Maps Visualization [blog] [blog] [blog] [pytorch discuss]
- Hooks in PyTorch [tutorial] [tutorial] [tutorial] [tutorial] (see the sketch below)
- Feature Extraction using Torch FX [tutorial]
- Feature extraction for model inspection [tutorial]
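
The hook and feature-extraction tutorials above boil down to two patterns: register a forward hook yourself, or let torchvision's FX-based create_feature_extractor rewrite the model to return named intermediate nodes. A combined sketch; the layer3 node choice is arbitrary (use get_graph_node_names to list valid names):

```python
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import (
    create_feature_extractor, get_graph_node_names)

model = resnet18(weights=None).eval()

# Pattern 1: a forward hook stores a module's output as the model runs.
feats = {}
handle = model.layer3.register_forward_hook(
    lambda mod, inp, out: feats.update(layer3=out.detach()))
model(torch.randn(1, 3, 224, 224))
handle.remove()

# Pattern 2: torch FX rewrites the model to return named intermediate nodes.
print(get_graph_node_names(model)[1][:5])        # eval-mode node names
extractor = create_feature_extractor(model, return_nodes={"layer3": "feat"})
out = extractor(torch.randn(1, 3, 224, 224))
print(feats["layer3"].shape, out["feat"].shape)  # both (1, 256, 14, 14)
```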