jeffbolznv (Collaborator)

I've only looked at the directed perf tests, and there may still be more to do. But this gets these types to the same order of magnitude as Q4_K (included for comparison):

before:
  MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  626 runs -  1597.97 us/run -  60.13 GFLOP/run -  37.63 TFLOPS
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):               276 runs -  3629.79 us/run -  60.13 GFLOP/run -  16.57 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                252 runs -  3981.24 us/run -  60.13 GFLOP/run -  15.10 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 338 runs -  2970.81 us/run -  60.13 GFLOP/run -  20.24 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):               284 runs -  3525.55 us/run -  60.13 GFLOP/run -  17.06 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 330 runs -  3039.16 us/run -  60.13 GFLOP/run -  19.78 TFLOPS

after:
  MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  626 runs -  1598.70 us/run -  60.13 GFLOP/run -  37.61 TFLOPS
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):               632 runs -  1585.54 us/run -  60.13 GFLOP/run -  37.92 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                552 runs -  1811.94 us/run -  60.13 GFLOP/run -  33.19 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 496 runs -  2016.71 us/run -  60.13 GFLOP/run -  29.82 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):               670 runs -  1494.11 us/run -  60.13 GFLOP/run -  40.24 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 670 runs -  1496.61 us/run -  60.13 GFLOP/run -  40.18 TFLOPS
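For reference, the GFLOP/run and TFLOPS columns in these tables follow directly from the test dimensions: multiplying an m×k matrix by a k×n matrix costs 2·m·n·k floating-point operations (one multiply and one add per output element per k step), and TFLOPS is that count divided by the measured time per run. A quick sketch of the arithmetic (a hand calculation for illustration, not part of test-backend-ops):

```python
# Derive GFLOP/run and TFLOPS from the MUL_MAT test dimensions above.
m, n, k = 4096, 512, 14336
flops_per_run = 2 * m * n * k          # 60,129,542,144 ~= 60.13 GFLOP

# TFLOPS = FLOPs per run / seconds per run / 1e12, using the optimized
# iq2_xxs result at 1585.54 us/run as an example:
us_per_run = 1585.54
tflops = flops_per_run / (us_per_run * 1e-6) / 1e12

print(f"{flops_per_run / 1e9:.2f} GFLOP/run")   # 60.13 GFLOP/run
print(f"{tflops:.2f} TFLOPS")                   # 37.92 TFLOPS
```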

@jeffbolznv jeffbolznv requested a review from 0cc4m January 30, 2025 17:42
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend), devops (improvements to build systems and github actions), and ggml (changes relating to the ggml tensor library for machine learning) labels Jan 30, 2025
@0cc4m 0cc4m left a comment

LGTM

@0cc4m 0cc4m merged commit 2c6c8df into ggml-org:master Feb 6, 2025
41 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
* vulkan: optimize coopmat2 iq2/iq3 callbacks

* build: trigger CI on GLSL compute shader changes
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025