
Conversation

@jiachengjason
Contributor

  1. Patched the failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) when enabling WMMA on RDNA4 (verified that all test cases pass when running `./build/bin/test-backend-ops test -o MUL_MAT`).

  2. Quick cleanup of mma.cuh to add ggml_cuda_memcpy_1 back for half2 and bfloat162.

for #17156

@github-actions github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Nov 25, 2025
@meven3000

Can confirm this resolves the incorrect Qwen model output. Thanks!

@JohannesGaessler JohannesGaessler merged commit 3e18dba into ggml-org:master Nov 26, 2025
59 of 63 checks passed
am17an pushed a commit to am17an/llama.cpp that referenced this pull request Nov 27, 2025
…7502)

* patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4

* Quick cleanup on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162



3 participants