Merge remote-tracking branch 'origin' into rollout
gsarti committed Apr 30, 2024
2 parents e27c6b9 + 04dde30 commit 287caa4
Showing 50 changed files with 2,419 additions and 275 deletions.
26 changes: 10 additions & 16 deletions CHANGELOG.md
@@ -4,26 +4,20 @@

## 🚀 Features

- Support for multi-GPU attribution ([#238](https://github.com/inseq-team/inseq/pull/238))
- Added `inseq attribute-context` CLI command to support the [PECoRe framework](https://arxiv.org/abs/2310.01188) for detecting and attributing context reliance in generative LMs ([#237](https://github.com/inseq-team/inseq/pull/237))

## 🔧 Fixes & Refactoring

- Fix `ContiguousSpanAggregator` and `SubwordAggregator` edge case of single-step generation ([#247](https://github.com/inseq-team/inseq/pull/247))
- Move tensors to CPU right away in the forward pass to avoid OOM when cloning ([#245](https://github.com/inseq-team/inseq/pull/245))
- Fix `remap_from_filtered` behavior on sequence_scores tensors. ([#245](https://github.com/inseq-team/inseq/pull/245))
- Use torch-native padding when converting lists of `FeatureAttributionStepOutput` to `FeatureAttributionSequenceOutput` in `get_sequences_from_batched_steps`. ([#245](https://github.com/inseq-team/inseq/pull/245))
- Bump `ruff` version ([#245](https://github.com/inseq-team/inseq/pull/245))
- Drop `poetry` in favor of [`uv`](https://github.com/astral-sh/uv) to accelerate package installation and simplify config in `pyproject.toml`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Drop `darglint` in favor of `pydoclint`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Replace Arxiv with ACL Anthology badge in `README`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Add first version of `CHANGELOG.md` ([#249](https://github.com/inseq-team/inseq/pull/249))
- Added multithread support for running tests using `pytest-xdist`
- Added new models `DbrxForCausalLM`, `OlmoForCausalLM`, `Phi3ForCausalLM`, `Qwen2MoeForCausalLM` to model config.

## 🔧 Fixes and Refactoring

- Fix an issue in the attention implementation from [#268](https://github.com/inseq-team/inseq/issues/268) where non-terminal positions in the tensor were set to NaN if they were 0s ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix the pad token in cases where it is not specified by default in the loaded model (e.g. for Qwen models) ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix bug reported in [#266](https://github.com/inseq-team/inseq/issues/266) making `value_zeroing` unusable for SDPA attention. This enables using the method on models using SDPA attention as default (e.g. `GemmaForCausalLM`) without passing `model_kwargs={'attn_implementation': 'eager'}` ([#267](https://github.com/inseq-team/inseq/pull/267)).

## 📝 Documentation and Tutorials

*No changes*

## 💥 Breaking Changes

*No changes*
*No changes*
2 changes: 1 addition & 1 deletion Makefile
@@ -60,7 +60,7 @@ install-dev:

.PHONY: install-ci
install-ci:
make uv-activate && uv pip install -e .[lint]
make uv-activate && uv pip install -r requirements-dev.txt

.PHONY: update-deps
update-deps:
10 changes: 9 additions & 1 deletion README.md
@@ -147,6 +147,10 @@ Use the `inseq.list_feature_attribution_methods` function to list all available

- `lime`: ["Why Should I Trust You?": Explaining the Predictions of Any Classifier](https://arxiv.org/abs/1602.04938) (Ribeiro et al., 2016)

- `value_zeroing`: [Quantifying Context Mixing in Transformers](https://aclanthology.org/2023.eacl-main.245/) (Mohebbi et al. 2023)

- `reagent`: [ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models](https://arxiv.org/abs/2402.00794) (Zhao et al., 2024)

#### Step functions

Step functions are used to extract custom scores from the model at each step of the attribution process with the `step_scores` argument in `model.attribute`. They can also be used as targets for attribution methods relying on model outputs (e.g. gradient-based methods) by passing them as the `attributed_fn` argument. The following step functions are currently supported:
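
For example, the `probability` step score can be computed for every generated token alongside the attribution scores. The following is a minimal sketch (the model and method names are just examples):

```python
import inseq

# Any supported model/method pair works here; "gpt2" and "saliency" are placeholders
model = inseq.load_model("gpt2", "saliency")

# step_scores requests extra per-step quantities; "probability" is one of the built-in step functions
out = model.attribute(
    "The developer argued with the designer because",
    step_scores=["probability"],
)
out.show()
```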
@@ -301,7 +305,10 @@ If you use Inseq in your research we suggest to include a mention to the specifi

## Research using Inseq

Inseq has been used in various research projects. A list of known publications that use Inseq to conduct interpretability analyses of generative models is shown below. If you know more, please let us know or submit a pull request (*last updated: February 2024*).
Inseq has been used in various research projects. A list of known publications that use Inseq to conduct interpretability analyses of generative models is shown below.

> [!TIP]
> Last update: April 2024. Please open a pull request to add your publication to the list.
<details>
<summary><b>2023</b></summary>
@@ -322,6 +329,7 @@ Inseq has been used in various research projects. A list of known publications t
<ol>
<li><a href="https://arxiv.org/abs/2401.12576">LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools</a> (Wang et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2402.00794">ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models</a> (Zhao et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2404.02421">Revisiting subword tokenization: A case study on affixal negation in large language models</a> (Truong et al., 2024)</li>
</ol>

</details>
6 changes: 3 additions & 3 deletions docs/source/conf.py
@@ -21,13 +21,13 @@
# -- Project information -----------------------------------------------------

project = "inseq"
copyright = "2023, The Inseq Team, Licensed under the Apache License, Version 2.0"
copyright = "2024 , The Inseq Team, Licensed under the Apache License, Version 2.0"
author = "The Inseq Team"

# The short X.Y version
version = "0.6"
version = "0.7"
# The full version, including alpha/beta/rc tags
release = "0.6.0.dev0"
release = "0.7.0.dev0"


# Prefix link to point to master, comment this during version release and uncomment below line
6 changes: 3 additions & 3 deletions docs/source/main_classes/cli.rst
@@ -23,7 +23,7 @@ Three commands are supported:

- ``inseq attribute-dataset``: Extends ``attribute`` to full dataset using Hugging Face ``datasets.load_dataset`` API.

- ``inseq attribute-context``: Detects and attributes context dependence for generation tasks using the approach of `Sarti et al. (2023) <https://arxiv.org/abs/2310.0118>`__.
- ``inseq attribute-context``: Detects and attributes context dependence for generation tasks using the approach of `Sarti et al. (2023) <https://arxiv.org/abs/2310.01188>`__.

``attribute``
-----------------------------------------------------------------------------------------------------------------------
@@ -47,6 +47,6 @@ The ``attribute-dataset`` command extends the ``attribute`` command to full data
-----------------------------------------------------------------------------------------------------------------------

The ``attribute-context`` command detects and attributes context dependence for generation tasks using the approach of
`Sarti et al. (2023) <https://arxiv.org/abs/2310.0118>`__. The command takes the following arguments:
`Sarti et al. (2023) <https://arxiv.org/abs/2310.01188>`__. The command takes the following arguments:

.. autoclass:: inseq.commands.attribute_context.attribute_context_args.AttributeContextArgs
.. autoclass:: inseq.commands.attribute_context.attribute_context_args.AttributeContextArgs
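
A hypothetical invocation is sketched below; the flag names mirror the ``AttributeContextArgs`` fields and are assumptions to be checked against ``inseq attribute-context --help``:

.. code:: bash

    inseq attribute-context \
        --model_name_or_path gpt2 \
        --attribution_method saliency \
        --input_context_text "George was sick yesterday." \
        --input_current_text "His colleagues asked him if"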
40 changes: 38 additions & 2 deletions docs/source/main_classes/feature_attribution.rst
@@ -17,7 +17,7 @@ Attribution Methods
.. autoclass:: inseq.attr.FeatureAttribution
:members:

Gradient Attribution Methods
Gradient-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.GradientAttributionRegistry
@@ -67,7 +67,7 @@ Layer Attribution Methods
:members:


Attention Attribution Methods
Internals-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.InternalsAttributionRegistry
@@ -76,3 +76,39 @@

.. autoclass:: inseq.attr.feat.AttentionWeightsAttribution
:members:

Perturbation-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.PerturbationAttributionRegistry
:members:

.. autoclass:: inseq.attr.feat.OcclusionAttribution
:members:

.. autoclass:: inseq.attr.feat.LimeAttribution
:members:

.. autoclass:: inseq.attr.feat.ValueZeroingAttribution
:members:

.. autoclass:: inseq.attr.feat.ReagentAttribution
:members:

.. automethod:: __init__

.. code:: python

    import inseq

    model = inseq.load_model(
        "gpt2-medium",
        "reagent",
        keep_top_n=5,
        stopping_condition_top_k=3,
        replacing_ratio=0.3,
        max_probe_steps=3000,
        num_probes=8
    )
    out = model.attribute("Super Mario Land is a game that developed by")
    out.show()
6 changes: 6 additions & 0 deletions inseq/attr/feat/__init__.py
@@ -17,6 +17,9 @@
from .perturbation_attribution import (
LimeAttribution,
OcclusionAttribution,
PerturbationAttributionRegistry,
ReagentAttribution,
ValueZeroingAttribution,
)

__all__ = [
@@ -39,4 +42,7 @@
"OcclusionAttribution",
"LimeAttribution",
"SequentialIntegratedGradientsAttribution",
"ValueZeroingAttribution",
"PerturbationAttributionRegistry",
"ReagentAttribution",
]
8 changes: 6 additions & 2 deletions inseq/attr/feat/attribution_utils.py
@@ -144,11 +144,15 @@ def extract_args(
def get_source_target_attributions(
attr: Union[StepAttributionTensor, tuple[StepAttributionTensor, StepAttributionTensor]],
is_encoder_decoder: bool,
has_sequence_scores: bool = False,
) -> tuple[Optional[StepAttributionTensor], Optional[StepAttributionTensor]]:
if isinstance(attr, tuple):
if is_encoder_decoder:
return (attr[0], attr[1]) if len(attr) > 1 else (attr[0], None)
if has_sequence_scores:
return (attr[0], attr[1], attr[2])
else:
return (attr[0], attr[1]) if len(attr) > 1 else (attr[0], None)
else:
return (None, attr[0])
return (None, None, attr[0]) if has_sequence_scores else (None, attr[0])
else:
return (attr, None) if is_encoder_decoder else (None, attr)
55 changes: 46 additions & 9 deletions inseq/attr/feat/feature_attribution.py
@@ -114,6 +114,7 @@ def __init__(self, attribution_model: "AttributionModel", hook_to_model: bool =
self.use_hidden_states: bool = False
self.use_predicted_target: bool = True
self.use_model_config: bool = False
self.is_final_step_method: bool = False
if hook_to_model:
self.hook(**kwargs)

@@ -272,6 +273,35 @@ def _run_compatibility_checks(self, attributed_fn) -> None:
" method."
)

@staticmethod
def _build_multistep_output_from_single_step(
single_step_output: FeatureAttributionStepOutput,
attr_pos_start: int,
attr_pos_end: int,
) -> list[FeatureAttributionStepOutput]:
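# Expands a single final-step attribution into one FeatureAttributionStepOutput per generated
# position by slicing the cached attribution tensors, so that methods attributing only the
# last generation step can still be rendered as a regular sequence of attribution steps.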
if single_step_output.step_scores:
raise ValueError("step_scores are not supported for final step attribution methods.")
num_seq = len(single_step_output.prefix)
steps = []
for pos_idx in range(attr_pos_start, attr_pos_end):
step_output = single_step_output.clone_empty()
step_output.source = single_step_output.source
step_output.prefix = [single_step_output.prefix[seq_idx][:pos_idx] for seq_idx in range(num_seq)]
step_output.target = (
single_step_output.target
if pos_idx == attr_pos_end - 1
else [[single_step_output.prefix[seq_idx][pos_idx]] for seq_idx in range(num_seq)]
)
if single_step_output.source_attributions is not None:
step_output.source_attributions = single_step_output.source_attributions[:, :, pos_idx - 1]
if single_step_output.target_attributions is not None:
step_output.target_attributions = single_step_output.target_attributions[:, :pos_idx, pos_idx - 1]
single_step_output.step_scores = {}
if single_step_output.sequence_scores is not None:
step_output.sequence_scores = single_step_output.sequence_scores
steps.append(step_output)
return steps

def format_contrastive_targets(
self,
target_sequences: TextSequences,
@@ -416,9 +446,9 @@ def attribute(
target_lengths=targets_lengths,
method_name=self.method_name,
show=show_progress,
pretty=pretty_progress,
pretty=False if self.is_final_step_method else pretty_progress,
attr_pos_start=attr_pos_start,
attr_pos_end=attr_pos_end,
attr_pos_end=1 if self.is_final_step_method else attr_pos_end,
)
whitespace_indexes = find_char_indexes(sequences.targets, " ")
attribution_outputs = []
@@ -427,6 +457,8 @@

# Attribution loop for generation
for step in range(attr_pos_start, iter_pos_end):
if self.is_final_step_method and step != iter_pos_end - 1:
continue
tgt_ids, tgt_mask = batch.get_step_target(step, with_attention=True)
step_output = self.filtered_attribute_step(
batch[:step],
@@ -450,7 +482,7 @@
contrast_targets_alignments=contrast_targets_alignments,
)
attribution_outputs.append(step_output)
if pretty_progress:
if pretty_progress and not self.is_final_step_method:
tgt_tokens = batch.target_tokens
skipped_prefixes = tok2string(self.attribution_model, tgt_tokens, end=attr_pos_start)
attributed_sentences = tok2string(self.attribution_model, tgt_tokens, attr_pos_start, step + 1)
@@ -464,19 +496,24 @@
skipped_suffixes,
whitespace_indexes,
show=show_progress,
pretty=pretty_progress,
pretty=True,
)
else:
update_progress_bar(pbar, show=show_progress, pretty=pretty_progress)
update_progress_bar(pbar, show=show_progress, pretty=False)
end = datetime.now()
close_progress_bar(pbar, show=show_progress, pretty=pretty_progress)
close_progress_bar(pbar, show=show_progress, pretty=False if self.is_final_step_method else pretty_progress)
batch.detach().to("cpu")
if self.is_final_step_method:
attribution_outputs = self._build_multistep_output_from_single_step(
attribution_outputs[0],
attr_pos_start=attr_pos_start,
attr_pos_end=iter_pos_end,
)
out = FeatureAttributionOutput(
sequence_attributions=FeatureAttributionSequenceOutput.from_step_attributions(
attributions=attribution_outputs,
tokenized_target_sentences=target_tokens_with_ids,
pad_id=self.attribution_model.pad_token,
has_bos_token=self.attribution_model.is_encoder_decoder,
pad_token=self.attribution_model.pad_token,
attr_pos_end=attr_pos_end,
),
step_attributions=attribution_outputs if output_step_attributions else None,
@@ -593,7 +630,7 @@ def filtered_attribute_step(
step_output.step_scores[score] = get_step_scores(score, step_fn_args, step_fn_extra_args).to("cpu")
# Reinsert finished sentences
if target_attention_mask is not None and is_filtered:
step_output.remap_from_filtered(target_attention_mask, orig_batch)
step_output.remap_from_filtered(target_attention_mask, orig_batch, self.is_final_step_method)
step_output = step_output.detach().to("cpu")
return step_output

16 changes: 9 additions & 7 deletions inseq/attr/feat/internals_attribution.py
@@ -17,11 +17,10 @@
from typing import Any, Optional

from captum._utils.typing import TensorOrTupleOfTensorsGeneric
from captum.attr._utils.attribution import Attribution

from ...data import MultiDimensionalFeatureAttributionStepOutput
from ...utils import Registry
from ...utils.typing import MultiLayerMultiUnitScoreTensor
from ...utils.typing import InseqAttribution, MultiLayerMultiUnitScoreTensor
from .feature_attribution import FeatureAttribution

logger = logging.getLogger(__name__)
@@ -38,7 +37,7 @@ class AttentionWeightsAttribution(InternalsAttributionRegistry):

method_name = "attention"

class AttentionWeights(Attribution):
class AttentionWeights(InseqAttribution):
@staticmethod
def has_convergence_delta() -> bool:
return False
@@ -74,9 +73,9 @@ def attribute(
:class:`~inseq.data.MultiDimensionalFeatureAttributionStepOutput`: A step output containing attention
weights for each layer and head, with shape :obj:`(batch_size, seq_len, n_layers, n_heads)`.
"""
# We adopt the format [batch_size, sequence_length, num_layers, num_heads]
# We adopt the format [batch_size, sequence_length, sequence_length, num_layers, num_heads]
# for consistency with other multi-unit methods (e.g. gradient attribution)
decoder_self_attentions = decoder_self_attentions[..., -1, :].to("cpu").clone().permute(0, 3, 1, 2)
decoder_self_attentions = decoder_self_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
if self.forward_func.is_encoder_decoder:
sequence_scores = {}
if len(inputs) > 1:
@@ -85,10 +84,11 @@
target_attributions = None
sequence_scores["decoder_self_attentions"] = decoder_self_attentions
sequence_scores["encoder_self_attentions"] = (
encoder_self_attentions.to("cpu").clone().permute(0, 3, 4, 1, 2)
encoder_self_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
)
cross_attentions = cross_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
return MultiDimensionalFeatureAttributionStepOutput(
source_attributions=cross_attentions[..., -1, :].to("cpu").clone().permute(0, 3, 1, 2),
source_attributions=cross_attentions,
target_attributions=target_attributions,
sequence_scores=sequence_scores,
_num_dimensions=2, # num_layers, num_heads
@@ -106,6 +106,8 @@ def __init__(self, attribution_model, **kwargs):
self.use_attention_weights = True
# Does not rely on predicted output (i.e. decoding strategy agnostic)
self.use_predicted_target = False
# Needs only the final generation step to extract scores
self.is_final_step_method = True
self.method = self.AttentionWeights(attribution_model)

def attribute_step(
Expand Down
4 changes: 4 additions & 0 deletions inseq/attr/feat/ops/__init__.py
@@ -1,13 +1,17 @@
from .discretized_integrated_gradients import DiscretetizedIntegratedGradients
from .lime import Lime
from .monotonic_path_builder import MonotonicPathBuilder
from .reagent import Reagent
from .rollout import rollout_fn
from .sequential_integrated_gradients import SequentialIntegratedGradients
from .value_zeroing import ValueZeroing

__all__ = [
"DiscretetizedIntegratedGradients",
"MonotonicPathBuilder",
"ValueZeroing",
"Lime",
"Reagent",
"SequentialIntegratedGradients",
"rollout_fn",
]
