Vulkan: possible NaN propagation on llama-3 8B (more testing required) #6874

@stduhpf

Description

Sometimes, when playing around with the new Llama-3 models on the Vulkan backend (using the server example), I end up in a situation where the model suddenly starts generating complete gibberish. Once this happens, the server only generates garbage, even when evaluating a new prompt that worked before.

A server restart fixes the output (until the next time it happens).

My setup:
GPU: Vulkan device: AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64 (gfx 1010),
OS: Windows 10 22H2

I suspect some operations are randomly generating NaNs, which stay even after clearing the KV cache. Reminds me a bit of #5243, except it doesn't always happen.

I'll try to build a simple setup to consistently cause this issue.

Edit: I can't find a new prompt that causes the problem, and I can't really share the one I already have: if I try to remove the sensitive information from it, it no longer triggers the issue... The prompt I have consistently breaks the Llama-3-8B base model (tested with Q3_K_S/Q3_K_M/Q4_K_S), but not the instruct model. The same prompt causes no issue on other backends.
