Description
While playing around with the new Llama-3 models on the Vulkan backend (using the server example), I sometimes end up in a situation where the model suddenly starts generating complete gibberish. Once this happens, the server keeps generating garbage only, even when evaluating a new prompt that used to work before.
A server restart fixes the output (until the next time it happens).
My setup:
GPU: Vulkan device: AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64 (gfx1010)
OS: Windows 10 22H2
I suspect some operations are randomly generating NaNs, which persist even after clearing the KV cache. It reminds me a bit of #5243, except it doesn't always happen.
I'll try to build a simple setup to consistently cause this issue.
Edit: I can't find a new prompt that causes the problem, and I can't really share the one I already have; if I try to remove the sensitive information, it no longer triggers the issue... The prompt I have consistently triggers the garbage output with the Llama-3-8B base model (tested with Q3_K_S/Q3_K_M/Q4_K_S), but not with the instruct model. The same prompt causes no issue on other backends.