Description
While playing around with the new Llama-3 models on the Vulkan backend (using the server example), I sometimes end up in a situation where the model suddenly starts generating complete gibberish. Once this happens, the server keeps generating garbage only, even when evaluating a new prompt that used to work before.
A server restart fixes the output (until the next time it happens).
My setup:
GPU: Vulkan device: AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64 (gfx1010)
OS: Windows 10 22H2
I suspect some operations are randomly generating NaNs, which persist even after clearing the KV cache. It reminds me a bit of #5243, except it doesn't always happen.
I'll try to build a simple setup to consistently cause this issue.
Edit: I can't find a new prompt that causes the problem, and I can't really share the one I already have; if I try to remove the sensitive information, it no longer triggers the issue... The prompt I have consistently triggers the garbage output with the Llama-3-8B base model (tested with Q3_K_S/Q3_K_M/Q4_K_S), but not with the instruct model. The same prompt causes no issue on other backends.