Merge remote-tracking branch 'origin' into rollout
gsarti committed Apr 30, 2024
2 parents e27c6b9 + 04dde30 commit 287caa4
Showing 50 changed files with 2,419 additions and 275 deletions.
26 changes: 10 additions & 16 deletions CHANGELOG.md
@@ -4,26 +4,20 @@

## 🚀 Features

- Support for multi-GPU attribution ([#238](https://github.com/inseq-team/inseq/pull/238))
- Added `inseq attribute-context` CLI command to support the [PECoRe framework](https://arxiv.org/abs/2310.01188) for detecting and attributing context reliance in generative LMs ([#237](https://github.com/inseq-team/inseq/pull/237))

## 🔧 Fixes & Refactoring

- Fix `ContiguousSpanAggregator` and `SubwordAggregator` edge case of single-step generation ([#247](https://github.com/inseq-team/inseq/pull/247))
- Move tensors to CPU right away in the forward pass to avoid OOM when cloning ([#245](https://github.com/inseq-team/inseq/pull/245))
- Fix `remap_from_filtered` behavior on sequence_scores tensors. ([#245](https://github.com/inseq-team/inseq/pull/245))
- Use torch-native padding when converting lists of `FeatureAttributionStepOutput` to `FeatureAttributionSequenceOutput` in `get_sequences_from_batched_steps`. ([#245](https://github.com/inseq-team/inseq/pull/245))
- Bump `ruff` version ([#245](https://github.com/inseq-team/inseq/pull/245))
- Drop `poetry` in favor of [`uv`](https://github.com/astral-sh/uv) to accelerate package installation and simplify config in `pyproject.toml`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Drop `darglint` in favor of `pydoclint`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Replace Arxiv with ACL Anthology badge in `README`. ([#249](https://github.com/inseq-team/inseq/pull/249))
- Add first version of `CHANGELOG.md` ([#249](https://github.com/inseq-team/inseq/pull/249))
- Added multithread support for running tests using `pytest-xdist`
- Added new models `DbrxForCausalLM`, `OlmoForCausalLM`, `Phi3ForCausalLM`, `Qwen2MoeForCausalLM` to model config.

## 🔧 Fixes and Refactoring

- Fix an issue in the attention implementation from [#268](https://github.com/inseq-team/inseq/issues/268) where non-terminal positions in the tensor were set to NaN if they were 0s ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix the pad token in cases where it is not specified by default in the loaded model (e.g. for Qwen models) ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix bug reported in [#266](https://github.com/inseq-team/inseq/issues/266) making `value_zeroing` unusable for SDPA attention. This enables using the method on models using SDPA attention as default (e.g. `GemmaForCausalLM`) without passing `model_kwargs={'attn_implementation': 'eager'}` ([#267](https://github.com/inseq-team/inseq/pull/267)).

## 📝 Documentation and Tutorials

*No changes*

## 💥 Breaking Changes

*No changes*
*No changes*
2 changes: 1 addition & 1 deletion Makefile
@@ -60,7 +60,7 @@ install-dev:

.PHONY: install-ci
install-ci:
make uv-activate && uv pip install -e .[lint]
make uv-activate && uv pip install -r requirements-dev.txt

.PHONY: update-deps
update-deps:
10 changes: 9 additions & 1 deletion README.md
@@ -147,6 +147,10 @@ Use the `inseq.list_feature_attribution_methods` function to list all available

- `lime`: ["Why Should I Trust You?": Explaining the Predictions of Any Classifier](https://arxiv.org/abs/1602.04938) (Ribeiro et al., 2016)

- `value_zeroing`: [Quantifying Context Mixing in Transformers](https://aclanthology.org/2023.eacl-main.245/) (Mohebbi et al. 2023)

- `reagent`: [ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models](https://arxiv.org/abs/2402.00794) (Zhao et al., 2024)

#### Step functions

Step functions are used to extract custom scores from the model at each step of the attribution process with the `step_scores` argument in `model.attribute`. They can also be used as targets for attribution methods relying on model outputs (e.g. gradient-based methods) by passing them as the `attributed_fn` argument. The following step functions are currently supported:
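
For example, the `probability` step score can be computed for every generated token alongside the attribution scores. The following is a minimal sketch (the model and method names are just examples):

```python
import inseq

# Any supported model/method pair works here; "gpt2" and "saliency" are placeholders
model = inseq.load_model("gpt2", "saliency")

# step_scores requests extra per-step quantities; "probability" is one of the built-in step functions
out = model.attribute(
    "The developer argued with the designer because",
    step_scores=["probability"],
)
out.show()
```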
@@ -301,7 +305,10 @@ If you use Inseq in your research we suggest to include a mention to the specifi

## Research using Inseq

Inseq has been used in various research projects. A list of known publications that use Inseq to conduct interpretability analyses of generative models is shown below. If you know more, please let us know or submit a pull request (*last updated: February 2024*).
Inseq has been used in various research projects. A list of known publications that use Inseq to conduct interpretability analyses of generative models is shown below.

> [!TIP]
> Last update: April 2024. Please open a pull request to add your publication to the list.
<details>
<summary><b>2023</b></summary>
@@ -322,6 +329,7 @@ Inseq has been used in various research projects. A list of known publications t
<ol>
<li><a href="https://arxiv.org/abs/2401.12576">LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools</a> (Wang et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2402.00794">ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models</a> (Zhao et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2404.02421">Revisiting subword tokenization: A case study on affixal negation in large language models</a> (Truong et al., 2024)</li>
</ol>

</details>
6 changes: 3 additions & 3 deletions docs/source/conf.py
@@ -21,13 +21,13 @@
# -- Project information -----------------------------------------------------

project = "inseq"
copyright = "2023, The Inseq Team, Licensed under the Apache License, Version 2.0"
copyright = "2024 , The Inseq Team, Licensed under the Apache License, Version 2.0"
author = "The Inseq Team"

# The short X.Y version
version = "0.6"
version = "0.7"
# The full version, including alpha/beta/rc tags
release = "0.6.0.dev0"
release = "0.7.0.dev0"


# Prefix link to point to master, comment this during version release and uncomment below line
6 changes: 3 additions & 3 deletions docs/source/main_classes/cli.rst
@@ -23,7 +23,7 @@ Three commands are supported:

- ``inseq attribute-dataset``: Extends ``attribute`` to full dataset using Hugging Face ``datasets.load_dataset`` API.

- ``inseq attribute-context``: Detects and attributes context dependence for generation tasks using the approach of `Sarti et al. (2023) <https://arxiv.org/abs/2310.0118>`__.
- ``inseq attribute-context``: Detects and attributes context dependence for generation tasks using the approach of `Sarti et al. (2023) <https://arxiv.org/abs/2310.01188>`__.

``attribute``
-----------------------------------------------------------------------------------------------------------------------
@@ -47,6 +47,6 @@ The ``attribute-dataset`` command extends the ``attribute`` command to full data
-----------------------------------------------------------------------------------------------------------------------

The ``attribute-context`` command detects and attributes context dependence for generation tasks using the approach of
`Sarti et al. (2023) <https://arxiv.org/abs/2310.0118>`__. The command takes the following arguments:
`Sarti et al. (2023) <https://arxiv.org/abs/2310.01188>`__. The command takes the following arguments:

.. autoclass:: inseq.commands.attribute_context.attribute_context_args.AttributeContextArgs
.. autoclass:: inseq.commands.attribute_context.attribute_context_args.AttributeContextArgs
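
A hypothetical invocation is sketched below; the flag names mirror the ``AttributeContextArgs`` fields and are assumptions to be checked against ``inseq attribute-context --help``:

.. code:: bash

    inseq attribute-context \
        --model_name_or_path gpt2 \
        --attribution_method saliency \
        --input_context_text "George was sick yesterday." \
        --input_current_text "His colleagues asked him if"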
40 changes: 38 additions & 2 deletions docs/source/main_classes/feature_attribution.rst
@@ -17,7 +17,7 @@ Attribution Methods
.. autoclass:: inseq.attr.FeatureAttribution
:members:

Gradient Attribution Methods
Gradient-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.GradientAttributionRegistry
@@ -67,7 +67,7 @@ Layer Attribution Methods
:members:


Attention Attribution Methods
Internals-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.InternalsAttributionRegistry
@@ -76,3 +76,39 @@

.. autoclass:: inseq.attr.feat.AttentionWeightsAttribution
:members:

Perturbation-based Attribution Methods
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: inseq.attr.feat.PerturbationAttributionRegistry
:members:

.. autoclass:: inseq.attr.feat.OcclusionAttribution
:members:

.. autoclass:: inseq.attr.feat.LimeAttribution
:members:

.. autoclass:: inseq.attr.feat.ValueZeroingAttribution
:members:

.. autoclass:: inseq.attr.feat.ReagentAttribution
:members:

.. automethod:: __init__

.. code:: python

    import inseq

    model = inseq.load_model(
        "gpt2-medium",
        "reagent",
        keep_top_n=5,
        stopping_condition_top_k=3,
        replacing_ratio=0.3,
        max_probe_steps=3000,
        num_probes=8
    )
    out = model.attribute("Super Mario Land is a game that developed by")
    out.show()
6 changes: 6 additions & 0 deletions inseq/attr/feat/__init__.py
@@ -17,6 +17,9 @@
from .perturbation_attribution import (
LimeAttribution,
OcclusionAttribution,
PerturbationAttributionRegistry,
ReagentAttribution,
ValueZeroingAttribution,
)

__all__ = [
@@ -39,4 +42,7 @@
"OcclusionAttribution",
"LimeAttribution",
"SequentialIntegratedGradientsAttribution",
"ValueZeroingAttribution",
"PerturbationAttributionRegistry",
"ReagentAttribution",
]
8 changes: 6 additions & 2 deletions inseq/attr/feat/attribution_utils.py
@@ -144,11 +144,15 @@ def extract_args(
def get_source_target_attributions(
attr: Union[StepAttributionTensor, tuple[StepAttributionTensor, StepAttributionTensor]],
is_encoder_decoder: bool,
has_sequence_scores: bool = False,
) -> tuple[Optional[StepAttributionTensor], Optional[StepAttributionTensor]]:
if isinstance(attr, tuple):
if is_encoder_decoder:
return (attr[0], attr[1]) if len(attr) > 1 else (attr[0], None)
if has_sequence_scores:
return (attr[0], attr[1], attr[2])
else:
return (attr[0], attr[1]) if len(attr) > 1 else (attr[0], None)
else:
return (None, attr[0])
return (None, None, attr[0]) if has_sequence_scores else (None, attr[0])
else:
return (attr, None) if is_encoder_decoder else (None, attr)
55 changes: 46 additions & 9 deletions inseq/attr/feat/feature_attribution.py
@@ -114,6 +114,7 @@ def __init__(self, attribution_model: "AttributionModel", hook_to_model: bool =
self.use_hidden_states: bool = False
self.use_predicted_target: bool = True
self.use_model_config: bool = False
self.is_final_step_method: bool = False
if hook_to_model:
self.hook(**kwargs)

@@ -272,6 +273,35 @@ def _run_compatibility_checks(self, attributed_fn) -> None:
" method."
)

@staticmethod
def _build_multistep_output_from_single_step(
single_step_output: FeatureAttributionStepOutput,
attr_pos_start: int,
attr_pos_end: int,
) -> list[FeatureAttributionStepOutput]:
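# Expands a single final-step attribution into one FeatureAttributionStepOutput per generated
# position by slicing the cached attribution tensors, so that methods attributing only the
# last generation step can still be rendered as a regular sequence of attribution steps.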
if single_step_output.step_scores:
raise ValueError("step_scores are not supported for final step attribution methods.")
num_seq = len(single_step_output.prefix)
steps = []
for pos_idx in range(attr_pos_start, attr_pos_end):
step_output = single_step_output.clone_empty()
step_output.source = single_step_output.source
step_output.prefix = [single_step_output.prefix[seq_idx][:pos_idx] for seq_idx in range(num_seq)]
step_output.target = (
single_step_output.target
if pos_idx == attr_pos_end - 1
else [[single_step_output.prefix[seq_idx][pos_idx]] for seq_idx in range(num_seq)]
)
if single_step_output.source_attributions is not None:
step_output.source_attributions = single_step_output.source_attributions[:, :, pos_idx - 1]
if single_step_output.target_attributions is not None:
step_output.target_attributions = single_step_output.target_attributions[:, :pos_idx, pos_idx - 1]
single_step_output.step_scores = {}
if single_step_output.sequence_scores is not None:
step_output.sequence_scores = single_step_output.sequence_scores
steps.append(step_output)
return steps

def format_contrastive_targets(
self,
target_sequences: TextSequences,
@@ -416,9 +446,9 @@ def attribute(
target_lengths=targets_lengths,
method_name=self.method_name,
show=show_progress,
pretty=pretty_progress,
pretty=False if self.is_final_step_method else pretty_progress,
attr_pos_start=attr_pos_start,
attr_pos_end=attr_pos_end,
attr_pos_end=1 if self.is_final_step_method else attr_pos_end,
)
whitespace_indexes = find_char_indexes(sequences.targets, " ")
attribution_outputs = []
@@ -427,6 +457,8 @@

# Attribution loop for generation
for step in range(attr_pos_start, iter_pos_end):
if self.is_final_step_method and step != iter_pos_end - 1:
continue
tgt_ids, tgt_mask = batch.get_step_target(step, with_attention=True)
step_output = self.filtered_attribute_step(
batch[:step],
@@ -450,7 +482,7 @@
contrast_targets_alignments=contrast_targets_alignments,
)
attribution_outputs.append(step_output)
if pretty_progress:
if pretty_progress and not self.is_final_step_method:
tgt_tokens = batch.target_tokens
skipped_prefixes = tok2string(self.attribution_model, tgt_tokens, end=attr_pos_start)
attributed_sentences = tok2string(self.attribution_model, tgt_tokens, attr_pos_start, step + 1)
@@ -464,19 +496,24 @@
skipped_suffixes,
whitespace_indexes,
show=show_progress,
pretty=pretty_progress,
pretty=True,
)
else:
update_progress_bar(pbar, show=show_progress, pretty=pretty_progress)
update_progress_bar(pbar, show=show_progress, pretty=False)
end = datetime.now()
close_progress_bar(pbar, show=show_progress, pretty=pretty_progress)
close_progress_bar(pbar, show=show_progress, pretty=False if self.is_final_step_method else pretty_progress)
batch.detach().to("cpu")
if self.is_final_step_method:
attribution_outputs = self._build_multistep_output_from_single_step(
attribution_outputs[0],
attr_pos_start=attr_pos_start,
attr_pos_end=iter_pos_end,
)
out = FeatureAttributionOutput(
sequence_attributions=FeatureAttributionSequenceOutput.from_step_attributions(
attributions=attribution_outputs,
tokenized_target_sentences=target_tokens_with_ids,
pad_id=self.attribution_model.pad_token,
has_bos_token=self.attribution_model.is_encoder_decoder,
pad_token=self.attribution_model.pad_token,
attr_pos_end=attr_pos_end,
),
step_attributions=attribution_outputs if output_step_attributions else None,
@@ -593,7 +630,7 @@ def filtered_attribute_step(
step_output.step_scores[score] = get_step_scores(score, step_fn_args, step_fn_extra_args).to("cpu")
# Reinsert finished sentences
if target_attention_mask is not None and is_filtered:
step_output.remap_from_filtered(target_attention_mask, orig_batch)
step_output.remap_from_filtered(target_attention_mask, orig_batch, self.is_final_step_method)
step_output = step_output.detach().to("cpu")
return step_output

16 changes: 9 additions & 7 deletions inseq/attr/feat/internals_attribution.py
@@ -17,11 +17,10 @@
from typing import Any, Optional

from captum._utils.typing import TensorOrTupleOfTensorsGeneric
from captum.attr._utils.attribution import Attribution

from ...data import MultiDimensionalFeatureAttributionStepOutput
from ...utils import Registry
from ...utils.typing import MultiLayerMultiUnitScoreTensor
from ...utils.typing import InseqAttribution, MultiLayerMultiUnitScoreTensor
from .feature_attribution import FeatureAttribution

logger = logging.getLogger(__name__)
@@ -38,7 +37,7 @@ class AttentionWeightsAttribution(InternalsAttributionRegistry):

method_name = "attention"

class AttentionWeights(Attribution):
class AttentionWeights(InseqAttribution):
@staticmethod
def has_convergence_delta() -> bool:
return False
@@ -74,9 +73,9 @@ def attribute(
:class:`~inseq.data.MultiDimensionalFeatureAttributionStepOutput`: A step output containing attention
weights for each layer and head, with shape :obj:`(batch_size, seq_len, n_layers, n_heads)`.
"""
# We adopt the format [batch_size, sequence_length, num_layers, num_heads]
# We adopt the format [batch_size, sequence_length, sequence_length, num_layers, num_heads]
# for consistency with other multi-unit methods (e.g. gradient attribution)
decoder_self_attentions = decoder_self_attentions[..., -1, :].to("cpu").clone().permute(0, 3, 1, 2)
decoder_self_attentions = decoder_self_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
if self.forward_func.is_encoder_decoder:
sequence_scores = {}
if len(inputs) > 1:
@@ -85,10 +84,11 @@
target_attributions = None
sequence_scores["decoder_self_attentions"] = decoder_self_attentions
sequence_scores["encoder_self_attentions"] = (
encoder_self_attentions.to("cpu").clone().permute(0, 3, 4, 1, 2)
encoder_self_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
)
cross_attentions = cross_attentions.to("cpu").clone().permute(0, 4, 3, 1, 2)
return MultiDimensionalFeatureAttributionStepOutput(
source_attributions=cross_attentions[..., -1, :].to("cpu").clone().permute(0, 3, 1, 2),
source_attributions=cross_attentions,
target_attributions=target_attributions,
sequence_scores=sequence_scores,
_num_dimensions=2, # num_layers, num_heads
@@ -106,6 +106,8 @@ def __init__(self, attribution_model, **kwargs):
self.use_attention_weights = True
# Does not rely on predicted output (i.e. decoding strategy agnostic)
self.use_predicted_target = False
# Needs only the final generation step to extract scores
self.is_final_step_method = True
self.method = self.AttentionWeights(attribution_model)

def attribute_step(
Expand Down
4 changes: 4 additions & 0 deletions inseq/attr/feat/ops/__init__.py
@@ -1,13 +1,17 @@
from .discretized_integrated_gradients import DiscretetizedIntegratedGradients
from .lime import Lime
from .monotonic_path_builder import MonotonicPathBuilder
from .reagent import Reagent
from .rollout import rollout_fn
from .sequential_integrated_gradients import SequentialIntegratedGradients
from .value_zeroing import ValueZeroing

__all__ = [
"DiscretetizedIntegratedGradients",
"MonotonicPathBuilder",
"ValueZeroing",
"Lime",
"Reagent",
"SequentialIntegratedGradients",
"rollout_fn",
]
