Attribution Classes refactoring, Aggregator for postponed score aggregation #130

Merged
merged 15 commits into main from full-attribution-output
Apr 4, 2022

Conversation

gsarti (Member) commented Mar 16, 2022

Description

Currently, attributions are aggregated at the token level during the step-by-step computation, providing only the aggregated per-token scores as output. Ideally, we want to make the aggregation step happen as late as possible (TBD) to enable various analyses (e.g. per-neuron attributions, different aggregation strategies) that are currently not supported.

The changes performed are the following:

  • Refactor the classes in batch.py to use a shared TensorWrapper backbone class for common operations.
  • Decouple gradient attribution-specific logic from the main attribution flow via inheritance, creating specific output subclasses containing e.g. convergence delta parameters. This will be useful to seamlessly add the per-layer logic of attention attribution.
  • Preserve the full attribution tensors until the end of the attribution process. This in turn involves:
    • Rethinking the structure of FeatureAttributionRawStepOutput and FeatureAttributionStepOutput and the transition method FeatureAttribution.make_attribution_output to account for preserving the attribution tensors.
    • Modifying FeatureAttributionSequenceOutput.from_step_attributions to take max_target_seq_len step attributions of shape (batch_size, max_source_seq_len, hidden_size) and produce source-target and target-target attributions of shapes (source_seq_len, target_seq_len, hidden_size) and (target_seq_len, target_seq_len, hidden_size) respectively, where the sequence lengths are variable for every generated object (i.e. end padding is removed by truncating to the token sequence length; see the sketch after this list).
  • Create an Aggregator abstract class and a SumNormAggregator (the current strategy, used as default). Add an aggregator field to FeatureAttributionSequenceOutput that is not picked up when saving and defaults to SumNormAggregator. The aggregator attached to the class provides the default behavior for aggregating attributions when printing the object, calling .show(), computing maximum, minimum, etc.
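
As a reference for the reshaping described above, here is a minimal sketch of the stacking and truncation performed by from_step_attributions. The helper name and exact indexing are illustrative assumptions, not the actual implementation:

```python
import torch

def from_step_attributions_sketch(step_attributions, source_len, target_len):
    """Hypothetical sketch: turn per-step attribution tensors into
    per-sequence source attributions with end padding removed.

    step_attributions: list of max_target_seq_len tensors, one per generation
        step, each of shape (batch_size, max_source_seq_len, hidden_size).
    source_len, target_len: true (unpadded) lengths of one sequence in the batch.
    """
    # Stack the steps along a new target dimension:
    # (batch_size, max_source_seq_len, max_target_seq_len, hidden_size)
    stacked = torch.stack(step_attributions, dim=2)
    # Select one sequence and truncate end padding on both token axes,
    # yielding (source_seq_len, target_seq_len, hidden_size).
    return stacked[0, :source_len, :target_len, :]
```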

Idea for the Aggregator design: Aggregator contains a mapping from FeatureAttributionSequenceOutput field names to the functions that need to be applied to them (sketched after the checklist below). Some checks need to be performed after the full aggregation process to ensure that the show method will work:

  • source_attributions have a shape corresponding to source_tokens x target_tokens
  • target_attributions have a shape corresponding to target_tokens x target_tokens
  • probabilities, if present, have a shape of target_tokens
  • all other per-step outputs have a shape of target_tokens (e.g. deltas)
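
A minimal Python sketch of this design idea, with hypothetical class internals; the exact normalization used by the real SumNormAggregator is an assumption here, sum-then-normalize is only the general idea:

```python
import torch

def sum_norm(t: torch.Tensor) -> torch.Tensor:
    # (source_tokens, target_tokens, hidden_size) -> (source_tokens, target_tokens):
    # sum scores over the hidden dimension, then normalize each generation step.
    summed = t.sum(dim=-1)
    return summed / summed.norm(dim=0, keepdim=True)

class Aggregator:
    """Sketch: a mapping from FeatureAttributionSequenceOutput field names
    to the functions applied to them during aggregation."""

    aggregate_fn: dict = {}  # field name -> aggregation function

    def aggregate(self, out):
        for field, fn in self.aggregate_fn.items():
            value = getattr(out, field, None)
            if value is not None:
                setattr(out, field, fn(value))
        self.check_output(out)
        return out

    def check_output(self, out):
        # Checks performed after the full aggregation so that show() works.
        n_src, n_tgt = len(out.source_tokens), len(out.target_tokens)
        assert out.source_attributions.shape == (n_src, n_tgt)
        if out.target_attributions is not None:
            assert out.target_attributions.shape == (n_tgt, n_tgt)
        if getattr(out, "probabilities", None) is not None:
            assert out.probabilities.shape == (n_tgt,)

class SumNormAggregator(Aggregator):
    aggregate_fn = {"source_attributions": sum_norm}
```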

gsarti (Member, Author) commented Mar 17, 2022

Final touches still missing: save/load on disk is broken because TokenWithId is not serializable.

gsarti (Member, Author) commented Mar 21, 2022

Adding ContiguousSpanAggregator directly here, allowing aggregation over predefined source or target spans. Aggregation works, but the aggregation_fn used for attributions & probabilities needs fixing. Save/load was moved to pickle, with only the info section saved as JSON to keep it user-readable, but it is still not working for now.

EDIT: thanks to json-tricks we are able to save the FeatureAttributionOutput object to JSON and preserve a user-readable info section at the top of the file!
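
For reference, a hypothetical usage sketch of span aggregation and JSON saving; the aggregate() signature and span format are assumptions for illustration, only the json_tricks call is the library's actual API:

```python
from json_tricks import dump

# `out` is assumed to be a FeatureAttributionSequenceOutput from a previous
# attribution call; ContiguousSpanAggregator is added in this PR (import omitted).
# Merge the scores of a multi-token source span (e.g. tokens 2-4) into one row.
span_aggregator = ContiguousSpanAggregator()
aggregated = out.aggregate(span_aggregator, source_spans=[(2, 4)])
aggregated.show()

# json-tricks serializes the nested objects and arrays to JSON while the
# info section at the top of the file stays human-readable.
with open("attribution.json", "w") as f:
    dump(out, f, indent=2)
```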

gsarti changed the title from "Postpone attribution aggregation and data classes refactoring" to "[WIP] Postpone attribution aggregation and data classes refactoring" on Mar 21, 2022
gsarti (Member, Author) commented Mar 23, 2022

Everything should be in place at this point; only some minimal testing for the two aggregators is still missing before the merge.

gsarti changed the title from "[WIP] Postpone attribution aggregation and data classes refactoring" to "Postpone attribution aggregation and data classes refactoring" on Apr 4, 2022
gsarti changed the title from "Postpone attribution aggregation and data classes refactoring" to "Attribution classes refactoring, Aggregator for postponed score aggregation" on Apr 4, 2022
gsarti (Member, Author) commented Apr 4, 2022

Finished tests, merging. Summary of the changes:

  • Centralized shared methods from tensor-holding classes (in inseq.data.batch and inseq.data.attribution) into a new abstract TensorWrapper class in inseq.data.data_utils.
  • Aggregation from full-scale gradient attribution (i.e. one score per weight) is no longer performed during attribution steps, but is instead postponed and delegated to classes of the newly added Aggregator family. Providing a default aggregator that is triggered when calling FeatureAttributionSequenceOutput.show() preserves the original two-step attribute-show procedure while enabling more flexibility. See examples of usage in test_aggregator.py and the sketch after this list.
  • Removed the intermediate FeatureAttributionRawStepOutput; the final FeatureAttributionStepOutput is now built from the start of the attribution process and progressively filled with content.
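
A minimal end-to-end sketch of the preserved two-step attribute-show flow; the model identifier and the aggregator keyword are illustrative assumptions, not taken from this PR (see test_aggregator.py for actual usage):

```python
import inseq

# Attribution now preserves the full granular tensors (one score per weight);
# aggregation only happens when the scores are visualized.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-it", "integrated_gradients")
out = model.attribute("Hello world, here is a sentence!")

# show() triggers the default aggregator (SumNormAggregator) attached
# to each FeatureAttributionSequenceOutput...
out.sequence_attributions[0].show()

# ...while a different Aggregator can be supplied explicitly
# (hypothetical keyword; ContiguousSpanAggregator import omitted).
out.sequence_attributions[0].show(aggregator=ContiguousSpanAggregator())
```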

gsarti closed this on Apr 4, 2022
gsarti reopened this on Apr 4, 2022
gsarti changed the title from "Attribution classes refactoring, Aggregator for postponed score aggregation" to "Attribution Classes refactoring, Aggregator for postponed score aggregation" on Apr 4, 2022
gsarti merged commit 034a46d into main on Apr 4, 2022
gsarti deleted the full-attribution-output branch on Apr 4, 2022 at 14:57