why some prompt doesn't work, the hidden_states will be nan after GemmaModel.forward #10
Comments
Hi, which model variant did you use? On which platform? Can you please share the command to reproduce this?
Hi, I'm seeing the exact same behavior and error, but relatively randomly and seemingly related to the prompt content. I've used several of the Torch models (2b and 7b, not quantized) on a CUDA 11.8 device running Ubuntu 20.04. I can replicate it by adding some digits to the default prompt, e.g. prompt="The meaning of life 189 is". Hopefully someone else can verify this?
Thanks @vupjing and @xrbailey for opening this issue and giving some details! I was able to reproduce your issue for 7b (I couldn't reproduce it for 2b). I used a machine with 8 A100 chips, CUDA 11.8, on Debian 10. My results can be replicated with the following steps.
I will investigate further to find the root cause. Thanks again for reporting!
@pengchongjin The hardware is a 3080 Ti laptop (16 GB VRAM) running Windows, with torch 2.1.1+cu118 and torchvision 0.16.1+cu118.
@michaelmoynihan Even a single "attention" in the prompt will trigger this issue; the command is below:
Could this be due to the bfloat16 dtype?
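For what it's worth, the reduced precision theory is easy to demonstrate in isolation. The debug output below shows float16 tensors, whose largest finite value is about 65504; once an activation overflows to inf, a subsequent inf - inf (as happens inside a max-subtracted softmax) yields NaN. A minimal sketch using numpy scalars as stand-ins for the model's tensors (this illustrates the failure mode, it is not the gemma code path itself):

```python
import numpy as np

# float16 tops out around 65504; anything larger overflows to inf.
x = np.float16(60000.0)
doubled = x * np.float16(2.0)
print(doubled)  # inf

# Once inf appears, common reductions produce NaN.
# Example: subtracting the max before a softmax does inf - inf -> nan.
logits = np.array([60000.0, 70000.0], dtype=np.float16)  # 70000 is already inf in fp16
shifted = logits - logits.max()
print(shifted)                  # [-inf  nan]
print(np.isnan(shifted).any())  # True
```

This would also be consistent with the reported RuntimeError, since a NaN anywhere in the logits propagates into the sampling probabilities.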
I've encountered a peculiar issue with the Gemma model that I'd like to share. I consistently get a RuntimeError: probability tensor contains either inf, nan or an element < 0 under specific circumstances when using different prompts. Here are the details. I use 3 different prompts:
prompt1: 'Gemma is a large language model, please introduce it.'
prompt2: 'Gemma is an instruction tuned model with seven billion parameters, please introduce it.'
prompt3: 'Gemma is an instruction tuned model with 7 billion parameters, please introduce it.'
I do not reload the model between runs; I only change the prompts and execute. Interestingly, if I first run the model with prompt 1 and then with prompt 2, the second prompt almost always succeeds. This behavior seems to suggest some form of state dependency or sensitivity to the numerical representation in the prompts. I thought this might be of interest to the community, especially if others are experiencing similar issues.
I made the same observation. Just putting an integer into the prompt seems to be the most likely way to make it happen. Perhaps a bug in the tokenizer, or another state dependency in how the model is represented in memory, as @ShadovvSinger suggests?
Try loading the model with
Question as in the title above: some prompts work, for example the default prompt "the meaning of the life", but the prompt below does not:
"the self-attention is important for transformer because"
Some basic debug info is below:
===DEBUG===: after model -> hidden_states = tensor([[[-10.6953, 3.7734, 0.0226, ..., -0.6284, -1.8652, -1.2998]]],
device='cuda:0', dtype=torch.float16)
===DEBUG===: hidden_states = tensor([[[-21.8125, -1.1279, 4.8867, ..., -6.3945, 0.7524, 4.8867]]],
device='cuda:0', dtype=torch.float16) kv_write_indices= tensor([12], device='cuda:0')
===DEBUG===: after model -> hidden_states = tensor([[[-6.8164, 2.2676, 0.6655, ..., 1.5391, -2.5996, -2.0840]]],
device='cuda:0', dtype=torch.float16)
===DEBUG===: hidden_states = tensor([[[-18.0000, -0.4390, 5.7070, ..., -5.7070, 1.7559, 0.8779]]],
device='cuda:0', dtype=torch.float16) kv_write_indices= tensor([13], device='cuda:0')
===DEBUG===: after model -> hidden_states = tensor([[[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
dtype=torch.float16)
Traceback (most recent call last):
After running for a while, the hidden_states become NaN after GemmaModel.forward.
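To pin down which decoding step first goes bad, one can check each step's hidden states for non-finite values as they are produced. A minimal sketch, using numpy arrays as stand-ins for the torch tensors printed above (`first_nan_step` is a hypothetical helper, not part of the gemma codebase; with torch the same check would use `torch.isfinite`):

```python
import numpy as np

def first_nan_step(hidden_states_per_step):
    """Return the index of the first step whose hidden states contain
    NaN or inf, or None if every step is finite."""
    for step, hs in enumerate(hidden_states_per_step):
        if not np.isfinite(hs).all():
            return step
    return None

# Illustrative data mimicking the trace above: the model is fine for a
# number of steps, then one step's hidden states come back all-NaN.
steps = [np.ones((1, 1, 4), dtype=np.float16) * i for i in range(14)]
steps.append(np.full((1, 1, 4), np.nan, dtype=np.float16))
print(first_nan_step(steps))  # 14
```

Logging the offending step (and the kv_write_indices value at that step, as in the debug prints above) makes it much easier to correlate the failure with a specific token in the prompt.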