Why can the target attribution sometimes be 'nan'? #268
Comments
Hi @frankdarkluo, nans are used in the target attribution tensor to mark positions that are not being used for the current prediction step due to the causal attention mask of the model. That said, in the case above the single step of attribution should not have any nans in the scores. Could you try to reproduce this on another CausalLM model?
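To illustrate the causal-mask behavior described above, here is a minimal sketch (plain `torch`, not Inseq's actual internals) of how positions excluded by the causal mask can end up as nan in a step-by-step attribution matrix:

```python
import torch

# Hypothetical 4-step causal setting: entry (i, j) is the attribution of
# target position j when predicting step i. Positions above the diagonal
# lie in the "future" and are excluded by the causal mask, so they are
# marked with nan rather than a score.
steps = 4
scores = torch.rand(steps, steps)
causal_mask = torch.tril(torch.ones(steps, steps, dtype=torch.bool))
attributions = scores.masked_fill(~causal_mask, float("nan"))
print(attributions)
```

Only past positions carry scores; the upper-triangular entries are nan by construction, which is the expected pattern — nans elsewhere would indicate a bug.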
Thank you @gsarti. I tried 'gpt2-large', and the result is even worse: more 'nan's appear.
When I use 'Qwen/Qwen1.5-0.5B-Chat', another error appeared while loading the attribution model. The error happens when I do
Even though I have specifically set
the same error still happens. How should I fix this? I would greatly appreciate your help!
Hi @frankdarkluo, thanks for the follow-up! Indeed, attention had an issue in which 0s in the tensor were set to

Re: the usage of Qwen, this works for me on #269:

```python
import inseq
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'Qwen/Qwen1.5-0.5B-Chat'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # device_map='cuda',
    # torch_dtype=torch.float16,
)
toker = AutoTokenizer.from_pretrained(model_name)
messages = [
    {"role": "system", "content": "The following are multiple choice questions: You should directly answer the question by choosing the correct option."},
    {"role": "user", "content": "Question: The morning temperature in a city is 41\u00b0F. If a sunny, mild day is forecast, which temperature is most likely for 2:00 p.m.?\nOptions:\nC. 32\u00b0 F\nD. 41\u00b0 F\nA. 78\u00b0 F\nB. 98\u00b0 F\nAnswer:"}
]
question = toker.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(question)
qa_model = inseq.load_model(model, "attention", tokenizer=model_name, tokenizer_kwargs={"legacy": False})
out = qa_model.attribute(
    question,
    generation_args={"max_new_tokens": 20, 'do_sample': False, "skip_special_tokens": False}
)
out_agg = out.aggregate(normalise=True)
out_agg.show(do_aggregation=False)
```

The only main adjustment is the usage of the chat template.
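If nans still show up after running a snippet like the one above, a quick sanity check is to look for them directly in the attribution tensor. A minimal sketch with a stand-in tensor (the variable name `attr` is illustrative, not Inseq's API):

```python
import torch

# Stand-in for a small target attribution tensor; in practice this
# would be taken from the attribution output of the model.
attr = torch.tensor([[0.2, float("nan")],
                     [0.5, 0.3]])

# Indices of all nan entries, one [row, col] pair per entry.
nan_positions = torch.isnan(attr).nonzero(as_tuple=False)
print(nan_positions.tolist())  # [[0, 1]]
```

Comparing the reported positions against the causal mask makes it easy to tell expected (masked) nans apart from spurious ones.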
In addition, I found an interesting phenomenon: the attribution score is no longer 'nan' for the source tokens, but shows some meaningful logits instead. Is it because previously there was some precision threshold for those very small logits, which then became 'nan'?
Hey @frankdarkluo, yes, it was being rounded after the fourth decimal!
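The rounding behavior mentioned above is easy to reproduce with plain Python: a score smaller than the display precision collapses to zero when rounded to four decimal places (a toy illustration, not Inseq's actual display code):

```python
# A very small attribution score, below the fourth-decimal threshold.
score = 3e-5
rounded = round(score, 4)
print(score, rounded)  # 3e-05 0.0
```

Any score below 5e-5 disappears at this precision, which is why tiny but meaningful values could previously end up displayed as missing.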
Question
When I load my decoder-only language model to analyze the target_attributions with the attention-based method, some tokens get 'nan' attributions.
My code is basically
The printed result is shown as
I wonder: