@xi-guo-0

Description:

This PR resolves an assertion failure in ggml_metal_op_flash_attn_ext that occurs when the has_kvpad variable evaluates to false.

Root Cause

I observed an inconsistency in how the has_kvpad variable is determined within ggml-metal-ops.cpp.

In one section of the file (around line 1980), the logic was intentionally hard-coded to true, likely as a temporary fix:

// const bool has_kvpad = ne11 % ncpsg != 0;
const bool has_kvpad = true;

However, later in the file (around line 2156), the original expression is still used, so the two sites can disagree:

const bool has_kvpad = ne11 % ncpsg != 0;
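To make the mismatch concrete, here is a minimal, self-contained sketch of how two diverging definitions of has_kvpad can trip a check like the one in the trace below. It is a simplified model, not the actual ggml code: the names fa_has_kvpad, fa_extra_pad, fa_extra_pad_forced, fa_invariant_holds and the pad_bytes parameter are all hypothetical, and it assumes that the hard-coded site feeds the size that the assertion later compares against 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: one shared definition of the padding predicate,
// using the expression quoted in this PR (ne11 % ncpsg != 0).
static bool fa_has_kvpad(int64_t ne11, int64_t ncpsg) {
    return ne11 % ncpsg != 0;
}

// Hypothetical stand-in for the size query: bytes reserved for padding.
// pad_bytes is an illustrative parameter, not a real ggml value.
static size_t fa_extra_pad(int64_t ne11, int64_t ncpsg, size_t pad_bytes) {
    return fa_has_kvpad(ne11, ncpsg) ? pad_bytes : 0;
}

// The same query with the predicate hard-coded to true,
// mirroring the first site quoted above.
static size_t fa_extra_pad_forced(size_t pad_bytes) {
    const bool has_kvpad = true;
    return has_kvpad ? pad_bytes : 0;
}

// Models the check in the else branch: when the dispatch site decides
// that no padding is needed, the reserved size must be zero.
static bool fa_invariant_holds(bool has_kvpad_at_dispatch, size_t extra_pad) {
    return has_kvpad_at_dispatch || extra_pad == 0;
}
```

With a shared predicate, fa_invariant_holds(fa_has_kvpad(256, 32), fa_extra_pad(256, 32, 64)) is true, because both sites agree that no padding is needed. Swapping in fa_extra_pad_forced(64) makes it false for the same inputs, which is the shape of the failure reported here. Routing both sites through one helper is one way to keep them from drifting apart again.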

Command:

./llama-simple -m ~/Library/Caches/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf

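This inconsistency caused the following assertion to fail in my case: with has_kvpad false at the second site, the code takes the un-patched else branch, which asserts that ggml_metal_op_flash_attn_ext_extra_pad(op) returns 0: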

ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_set_rows_f16_i64', name = 'kernel_set_rows_f16_i64'
ggml_metal_library_compile_pipeline: loaded kernel_set_rows_f16_i64                       0x101590890 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f16', name = 'kernel_cpy_f32_f16'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f16                            0x101590b90 | th_max = 1024 | th_width =   32
Assertion failed: (ggml_metal_op_flash_attn_ext_extra_pad(op) == 0), function ggml_metal_op_flash_attn_ext, file ggml-metal-ops.cpp, line 2367.
<bos>Hello my name isProcess 89743 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x0000000100485f34 libggml-metal.0.dylib`ggml_metal_op_flash_attn_ext(ctx=0x00000007ed1700c0, idx=16) at ggml-metal-ops.cpp:2367:13
   2364
   2365             need_sync = true;
   2366         } else {
-> 2367             assert(ggml_metal_op_flash_attn_ext_extra_pad(op) == 0);
   2368         }
   2369
   2370         if (need_sync) {

@xi-guo-0 xi-guo-0 requested a review from ggerganov as a code owner November 16, 2025 04:47
@github-actions github-actions bot added the ggml and Apple Metal labels Nov 16, 2025
@ggerganov
Member

Fixed with #17295

@ggerganov ggerganov closed this Nov 16, 2025
@xi-guo-0 xi-guo-0 deleted the fix/metal-flash-attn-crash branch November 16, 2025 08:53