[SYCL] fix flash_attention crash for Qwen3-Coder #21377

Merged
ggerganov merged 1 commit into ggml-org:master from arthw:fix_flash_atten on Apr 6, 2026

arthw commented Apr 3, 2026

The existing code branches do not cover the case hit by Qwen3-Coder-Next-UD-IQ1_M.gguf, which causes the flash_attention crash.
This PR adds code as a final handler for that case.
Verified that the LLM runs and that all related UT cases pass.

arthw requested a review from a team as a code owner April 3, 2026 15:37
arthw requested a review from ggerganov April 3, 2026 15:37
github-actions bot added the ggml and SYCL labels Apr 3, 2026
ggerganov added the merge ready label Apr 3, 2026

arthw commented Apr 5, 2026

@ggerganov
Please review and merge this PR!
Thank you!

ggerganov merged commit f51fd36 into ggml-org:master Apr 6, 2026
183 of 190 checks passed
iamwavecut pushed a commit to iamwavecut/llama-cpp-turboquant that referenced this pull request Apr 8, 2026