How to programmatically extract attribution scores per token? #160
Hi @MoritzLaurer, thank you for your interest! This part is still quite undocumented, but we hope to add more details in the docs soon. At the end of the Getting started section in the docs we show an example of the attribution output, which I report here:

>>> print(out)
FeatureAttributionOutput({
sequence_attributions: list with 1 elements of type GradientFeatureAttributionSequenceOutput: [
GradientFeatureAttributionSequenceOutput({
source: list with 13 elements of type TokenWithId:[
'▁Hello', '▁world', ',', '▁here', '\'', 's', '▁the', '▁In', 'se', 'q', '▁library', '!', '</s>'
],
target: list with 12 elements of type TokenWithId:[
'▁Bonjour', '▁le', '▁monde', ',', '▁voici', '▁la', '▁bibliothèque', '▁Ins', 'e', 'q', '!', '</s>'
],
source_attributions: torch.float32 tensor of shape [13, 12, 512] on CPU,
...
})
],
step_attributions: None,
info: {
...
}
})

As you can see, the source sequence contains 13 tokens and the target contains 12, while the attribution computed with a gradient-based method is a 3D tensor of shape [source_len, target_len, hidden_size] ([13, 12, 512] here). For gradient methods, a default aggregator is applied to collapse the hidden dimension, yielding a single score per source-target token pair. To obtain the aggregated scores and pair them with the tokens, assuming a gradient method that returns a 3D tensor, you could do something like:

import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")
# Produces a FeatureAttributionOutput containing 1 GradientFeatureAttributionSequenceOutput
out = model.attribute(<YOUR_INPUT>)
# The source and, if present, target attributions have shapes of [src_len, tgt_len] and [tgt_len, tgt_len]
# respectively after this step
aggregated_attribution = out.sequence_attributions[0].aggregate()
# Creating a mapping of [src_token, tgt_token] -> attribution score
score_map = {}
for src_idx, src_tok in enumerate(aggregated_attribution.source):
for tgt_idx, tgt_tok in enumerate(aggregated_attribution.target):
score_map[(src_tok.token, tgt_tok.token)] = aggregated_attribution.source_attributions[src_idx, tgt_idx].item()
print(score_map)
{('▁Hello', '▁Bonjour'): 0.8095492720603943,
('▁Hello', '▁le'): 0.5914772152900696,
('▁Hello', '▁monde'): 0.655048131942749,
('▁Hello', ','): 0.6247086524963379,
('▁Hello', '▁voici'): 0.7142019271850586,
('▁Hello', '▁la'): 0.623748779296875,
('▁Hello', '▁bibliothèque'): 0.3409218192100525,
('▁Hello', '▁Ins'): 0.28728920221328735,
('▁Hello', 'e'): 0.18802204728126526,
('▁Hello', 'q'): 0.13516321778297424,
('▁Hello', '!'): 0.792391300201416,
('▁Hello', '</s>'): 0.7535314559936523,
('▁world', '▁Bonjour'): 0.39373481273651123,
('▁world', '▁le'): 0.3593481779098511,
...

Hope it helps! I'd be curious to hear any ideas you might have on what a better API for accessing such scores could look like!
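As a small follow-up sketch (my own, not from the thread): once you have the `score_map` dictionary built above, you can already do simple downstream processing in plain Python, e.g. ranking source tokens by their attribution to a given target token. Token strings and scores below are dummy values for illustration:

```python
# Hypothetical example: rank source tokens by attribution score for a
# given target token, using a (src_token, tgt_token) -> score mapping
# like the one built in the snippet above. Scores here are made up.
score_map = {
    ("▁Hello", "▁Bonjour"): 0.81,
    ("▁world", "▁Bonjour"): 0.39,
    ("▁Hello", "▁le"): 0.59,
    ("▁world", "▁le"): 0.36,
}

def top_sources(score_map, target_token, k=2):
    """Return the k source tokens most attributed to `target_token`."""
    pairs = [
        (src, score)
        for (src, tgt), score in score_map.items()
        if tgt == target_token
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]

print(top_sources(score_map, "▁Bonjour"))  # [('▁Hello', 0.81), ('▁world', 0.39)]
```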
Great, that works, thanks! (Intuitively, I would probably enable people to return this as a pandas DataFrame for downstream analysis, but that would probably add another dependency.)
I am not sure we want
Yeah, I think that makes sense. A format that enables easy transformation to a df, e.g. with
Extracting scores and converting them to pandas format will be made easier by
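For reference, here is a minimal sketch (my own, not from the thread or the linked work) of how the `score_map` dictionary from the answer above could be turned into a pandas DataFrame and pivoted into a source-by-target matrix; token strings and scores are dummy values:

```python
import pandas as pd

# Dummy (src_token, tgt_token) -> score mapping, in the shape of the
# score_map built in the answer above.
score_map = {
    ("▁Hello", "▁Bonjour"): 0.81,
    ("▁Hello", "▁le"): 0.59,
    ("▁world", "▁Bonjour"): 0.39,
    ("▁world", "▁le"): 0.36,
}

# One row per (source token, target token, score) triple
df = pd.DataFrame(
    [(src, tgt, score) for (src, tgt), score in score_map.items()],
    columns=["source_token", "target_token", "score"],
)
# Pivot into a [source x target] attribution matrix
matrix = df.pivot(index="source_token", columns="target_token", values="score")
print(matrix)
```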
❓ Question
How do I programmatically extract the per-token scores to have them in a list or dictionary, mapped to each token?
I understand how to show the scores per token visually, but I don't know how to extract them from the "out" object for further downstream processing.