graph : ensure DS32 kq_mask_lid is F32 #23864

Open
CISC wants to merge 1 commit into master from cisc/graph-ds32-lid-mask-fix

CISC (Member) commented May 29, 2026

Overview

cont #23346
cont #23764

Additional information

Since build_attn_inp_kq_mask returns an F16 mask when flash attention is enabled, pass a modified copy of cparams when building kq_mask_lid so that it stays F32.

// mask indexer scores
ggml_tensor * indexer_kq_mask = inp_attn_dsa->get_kq_mask_lid();
indexer_score = ggml_add(ctx0, indexer_score, indexer_kq_mask);
cb(indexer_score, "indexer_score", il);

This is a bit hacky; I'm open to better solutions. cc @am17an
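
As a rough illustration of the workaround described above, the following C++ sketch models it with hypothetical stand-in types. `mask_type`, `cparams_sketch`, `build_mask_type`, and `build_lid_mask_type` are invented for this example and are not llama.cpp's API. Which cparams field the PR actually overrides is not shown in the description, so the sketch assumes it is a flash-attention flag.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; NOT the real llama.cpp types.
enum class mask_type { F16, F32 };

struct cparams_sketch {
    bool flash_attn = false; // assumed flag that switches the mask to F16
};

// Assumed behaviour, taken from the PR description: the mask builder
// returns an F16 mask when flash attention is enabled, F32 otherwise.
static mask_type build_mask_type(const cparams_sketch & cparams) {
    return cparams.flash_attn ? mask_type::F16 : mask_type::F32;
}

// Build the indexer mask from a modified copy of cparams, so the caller's
// cparams is left untouched and only this mask takes the F32 path.
static mask_type build_lid_mask_type(const cparams_sketch & cparams) {
    cparams_sketch cparams_lid = cparams; // copy, do not mutate the original
    cparams_lid.flash_attn = false;       // force F32 for this mask only
    return build_mask_type(cparams_lid);
}

// Small usage example: with flash attention on, the regular mask is F16
// while the indexer mask remains F32.
static void demo() {
    cparams_sketch cparams;
    cparams.flash_attn = true;

    assert(build_mask_type(cparams)     == mask_type::F16);
    assert(build_lid_mask_type(cparams) == mask_type::F32);
    assert(cparams.flash_attn); // the original parameters are unchanged
}
```

The point of the copy is that the regular attention mask can stay F16 for the flash-attention path, while the mask added to indexer_score has a matching type.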

Requirements

CISC requested a review from ggerganov on May 29, 2026, 10:48
am17an (Contributor) commented May 29, 2026

Does this mask need to be f32?

CISC (Member, Author) commented May 29, 2026

> Does this mask need to be f32?

Either that or we have to cast indexer_score to F16.

fairydreaming (Collaborator) commented

So... I checked how DeepSeek V3.2 works in master (a couple of hours too late) and ended up here. But this PR helps: the ggml_cuda_op_add error is gone.
