Custom alignment of contrast_targets for contrastive attribution methods #195
Description
The current implementation of contrastive attribution methods can only be applied to tokens at the same positions across the original and contrastive sequences, limiting the applicability of such methods to real-world contrastive pairs in which differences are not necessarily minimal.
This PR introduces a new parameter `contrast_targets_alignments` (`List[Tuple[int, int]]`, or `List[List[Tuple[int, int]]]` if more than one sequence is attributed) that can be used to provide custom alignments between the original `generated_texts` used for attribution and the `contrast_targets` used as contrastive pairs when `attributed_fn` is set to a contrastive function (`contrast_prob` or `contrast_prob_diff`).

Example
The following example shows the current problematic behavior of `attributed_fn=contrast_prob_diff` when two largely different sentences are provided:
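A minimal sketch of such a call, assuming a MarianMT English-to-French checkpoint; both the model choice and the sentences are illustrative, not taken from the PR:

```python
import inseq

# Illustrative model choice: any seq2seq model supported by inseq works here.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")

out = model.attribute(
    "I soon found the painting I was looking for.",
    # Original target used for attribution.
    "J'ai vite trouvé le tableau que je cherchais.",
    attributed_fn="contrast_prob_diff",
    # Largely different contrastive target: with the default 1:1 position
    # matching, most generation steps end up contrasting unrelated tokens.
    contrast_targets="Je l'ai trouvé rapidement.",
)
out.show()
```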
Using `contrast_targets_alignments` we can specify pairs of `(original_idx, contrast_idx)` to align the contents of `contrast_targets` to the attributed sequence:
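A sketch with the same hypothetical model and sentences as above; the index pairs are purely illustrative, since actual values depend on how the tokenizer segments the two target sequences:

```python
out = model.attribute(
    "I soon found the painting I was looking for.",
    "J'ai vite trouvé le tableau que je cherchais.",
    attributed_fn="contrast_prob_diff",
    contrast_targets="Je l'ai trouvé rapidement.",
    # Hypothetical (original_idx, contrast_idx) pairs over target token
    # positions, e.g. aligning "trouvé" with "trouvé" and "vite" with
    # "rapidement" despite their different positions in the two sentences.
    contrast_targets_alignments=[(0, 0), (1, 4), (2, 2), (3, 1), (4, 3)],
)
out.show()
```

For batched attribution of multiple sequences, one such list per sequence is passed as a `List[List[Tuple[int, int]]]`.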
Finally, a `contrast_targets_alignments="auto"` option is provided to allow automatic word alignment. Words in the original and contrastive target sequences are aligned automatically using the cosine similarity of embeddings produced by a massively multilingual encoder model (`sentence-transformers/LaBSE`).
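The same hypothetical call with automatic alignment:

```python
out = model.attribute(
    "I soon found the painting I was looking for.",
    "J'ai vite trouvé le tableau que je cherchais.",
    attributed_fn="contrast_prob_diff",
    contrast_targets="Je l'ai trouvé rapidement.",
    # Words are aligned automatically via cosine similarity of LaBSE
    # embeddings; the aligner model is loaded lazily on first use.
    contrast_targets_alignments="auto",
)
out.show()
```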
Notes
- If the new `contrast_targets_alignments` parameter is not specified, the current behavior is preserved (1:1 match, example 1).
- Provided alignments need to cover all tokens of the original sequence, since contrastive attribution is performed at every generation step for that sequence. This is likely to produce some nonsensical contrast pairs, so meaningful pairs need to be selected post-attribution for further analysis. If the provided alignments do not cover all original tokens, the current behavior is to raise a warning and add 1:1 alignments for the missing ones (see the sketch after these notes).
- If a token of the original sequence is aligned with multiple tokens in the contrast, the current behavior is to use the first (by position in the sentence) non-aligned token among those, if any, or the first one if all of them are already aligned.
- In the presence of `contrast_targets` differing from the aligned original tokens, the output tokens produced by `model.attribute` are modified to reflect this using the `Contrast → Original` notation. This might change in the future with the introduction of an ad-hoc field for contrast targets in the output, to preserve maximal information.
- `sentence-transformers/LaBSE` is chosen as the default aligner since it covers 109 languages. At the moment, the aligner model cannot be set programmatically by users. It is loaded lazily when the `"auto"` option is used and kept cached for subsequent calls.
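As a sketch of the fallback mentioned in the second note (same hypothetical model and sentences as in the examples), alignments covering only part of the original tokens trigger the warning and the 1:1 fill:

```python
out = model.attribute(
    "I soon found the painting I was looking for.",
    "J'ai vite trouvé le tableau que je cherchais.",
    attributed_fn="contrast_prob_diff",
    contrast_targets="Je l'ai trouvé rapidement.",
    # Deliberately partial, hypothetical alignments: only the first two
    # original token positions are covered. A warning is raised and the
    # remaining original positions are completed with 1:1 alignments.
    contrast_targets_alignments=[(0, 0), (1, 4)],
)
```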