Aggregation functions, named aggregators, contrastive context step functions, inseq.explain
#182
Description
`inseq.explain`: The `inseq.explain(ID)` function can be used to easily get more information about the class or function indexed by `ID` in the inseq library (currently attribution methods, step functions, aggregators and aggregation functions). For example:

This is intended to be used together with `list_aggregators`, `list_feature_attribution_methods`, etc. to get more information without having to navigate the documentation.

Aggregation functions: This PR generalizes the aggregation logic that was previously used inside `AttentionWeightAttribution`, moving it to the post-attribution `aggregate` step. This logic includes:

- `select_idx` in `aggregate`, to perform the chosen aggregation only across the selected dimensions (e.g. `select_idx=(0, 5)` will take into account only the first five elements of the last dimension for aggregation).
- `aggregate_fn`, chosen among the ones made available through the new `AggregationFunction` class (e.g. `"max"`, `"mean"`, `"vnorm"`).
- `normalize=False`
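The combination of `select_idx` and a named `aggregate_fn` can be sketched in plain Python as follows. This is not inseq's implementation: `AGGREGATION_FNS` and `aggregate_last_dim` are hypothetical names, scores are plain nested lists rather than tensors, `"vnorm"` is assumed to mean the Euclidean (L2) norm, and a two-int tuple is read as a `(start, end)` slice of the last dimension, matching the `select_idx=(0, 5)` example.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

# Hypothetical registry of named aggregation functions (names taken from the PR text).
AGGREGATION_FNS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
    # Assumed meaning of "vnorm": the Euclidean (L2) norm of the vector.
    "vnorm": lambda xs: math.sqrt(sum(x * x for x in xs)),
}


def aggregate_last_dim(
    scores: List[List[float]],
    aggregate_fn: str = "mean",
    select_idx: Optional[Union[Tuple[int, int], Sequence[int]]] = None,
) -> List[float]:
    """Collapse the last dimension of a 2D score matrix with a named function.

    select_idx=(start, end) keeps positions start..end-1 of the last dimension;
    a list of ints keeps exactly those positions; None keeps everything.
    """
    fn = AGGREGATION_FNS[aggregate_fn]
    out: List[float] = []
    for row in scores:
        if select_idx is None:
            kept = list(row)
        elif isinstance(select_idx, tuple):
            start, end = select_idx
            kept = row[start:end]
        else:
            kept = [row[i] for i in select_idx]
        out.append(fn(kept))
    return out
```

With `scores = [[1.0, 2.0, 3.0, 4.0, 5.0, 100.0]]`, `aggregate_last_dim(scores, "max", (0, 5))` returns `[5.0]`, because the sixth value falls outside the selection and is ignored.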
Named aggregators: Using aggregators becomes much easier in this PR thanks to named aliases that do not require imports. Similarly, an `AggregatorPipeline` can be defined simply as a list of strings. Moreover, specifying the name of an `aggregate_fn` instead of that of an aggregator will automatically aggregate the last attribution dimension using that function (see the available choices with `inseq.list_aggregation_functions`).

Before:

Now:
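To illustrate how string aliases can stand in for imported classes, here is a minimal sketch of name resolution for a pipeline given as a list of strings. The registries `AGGREGATORS` and `AGGREGATION_FNS`, the aliases `"transpose"` and `"normalize_rows"`, and the helpers `resolve` and `run_pipeline` are all hypothetical; inseq's own aliases and signatures may differ. As described for named aggregators, a name that refers to an aggregation function rather than an aggregator collapses the last dimension.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]

# Hypothetical aggregation functions: each reduces a vector of scores to one scalar.
AGGREGATION_FNS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def _normalize_rows(m: Matrix) -> Matrix:
    # Rescale every row so that its entries sum to 1.
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row])
    return out


# Hypothetical named aggregators: each maps a matrix to another matrix.
AGGREGATORS: Dict[str, Callable[[Matrix], Matrix]] = {
    "transpose": _transpose,
    "normalize_rows": _normalize_rows,
}


def resolve(name: str) -> Callable[[Matrix], Matrix]:
    """Map a string alias to a matrix -> matrix step."""
    if name in AGGREGATORS:
        return AGGREGATORS[name]
    if name in AGGREGATION_FNS:
        fn = AGGREGATION_FNS[name]
        # An aggregation-function name collapses the last dimension, keeping it
        # as a size-1 axis so that later steps still receive a matrix.
        return lambda m: [[fn(row)] for row in m]
    raise ValueError(f"Unknown aggregator or aggregation function: {name!r}")


def run_pipeline(m: Matrix, pipeline: List[str]) -> Matrix:
    """Apply a pipeline given as a plain list of names, left to right."""
    for name in pipeline:
        m = resolve(name)(m)
    return m
```

Keeping aggregators and aggregation functions in separate registries lets a single string list mix both kinds of step without any imports on the caller's side.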
New step functions: The `contrast_prob`, `pcxmi` and `kl_divergence` functions were added to the pre-registered step functions inside the library. While `contrast_prob_diff` returns the delta in probabilities between the regular and contrastive prediction options, `contrast_prob` is the delta in probabilities for the same target token given a different source and/or preceding target context. `pcxmi` and `kl_divergence` use `contrast_prob` to derive information-theoretic quantities given the two context options.

💥 Breaking changes:
- The `contrast_prob_diff` step function now takes `contrast_targets` directly in (list of) string, `BatchEncoding` or `Batch` format instead of `contrast_ids` and `contrast_attention_mask`. Examples in the docs and README were updated accordingly.
- The `attribute` call for the `attention` attribution method no longer takes any parameter, since the aggregation is postponed until after the attribution (previously, it accepted indexes and aggregation function names). Hence, the method now returns the full attention tensor of size `[source_len, target_len, num_layers, num_heads]` rather than the pre-filtered and pre-aggregated one.

Here is an example of how previous attention attribution results can be reproduced with the new postponed aggregation:
Before (v0.4.0): aggregation during attribution

Now: postponed aggregation. Same result, but gives users more control over the aggregation process.
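As a rough illustration of postponed aggregation, the sketch below takes a full attention tensor of shape `[source_len, target_len, num_layers, num_heads]` (as nested Python lists), aggregates the heads first and then the layers, optionally restricting each to a `(start, end)` range. The helper names `reduce_last_dim` and `aggregate_attention` are hypothetical, and the heads-then-layers order is an assumption rather than a description of what inseq does internally.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

Nested = Union[float, List["Nested"]]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def reduce_last_dim(
    t: List[Nested],
    fn: Callable[[Sequence[float]], float],
    select_idx: Optional[Tuple[int, int]] = None,
) -> Nested:
    """Collapse the innermost (last) dimension of a nested list with fn."""
    if t and isinstance(t[0], list):
        # Not at the last dimension yet: recurse into every sub-list.
        return [reduce_last_dim(sub, fn, select_idx) for sub in t]
    # Last dimension reached: optionally keep only positions start..end-1.
    kept = t if select_idx is None else t[select_idx[0]:select_idx[1]]
    return fn(kept)


def aggregate_attention(
    att: List[Nested],
    head_fn: Callable[[Sequence[float]], float] = mean,
    layer_fn: Callable[[Sequence[float]], float] = mean,
    head_idx: Optional[Tuple[int, int]] = None,
    layer_idx: Optional[Tuple[int, int]] = None,
) -> Nested:
    """[source_len][target_len][num_layers][num_heads] -> [source_len][target_len]."""
    per_layer = reduce_last_dim(att, head_fn, head_idx)  # heads collapsed
    return reduce_last_dim(per_layer, layer_fn, layer_idx)  # layers collapsed
```

For example, `aggregate_attention(att, layer_fn=max)` averages the heads and then takes the maximum over layers, while passing `head_idx=(0, 1)` keeps only the first head before averaging.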