Closed
Labels: Nvidia GPU (Issues specific to Nvidia GPUs)
Description
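Noticed a roughly 10% performance loss in token generation (tg) on the AGX Orin this week; a bisect led me to f77c13b (#16715). The first run below is build 3cfa9c3 (6840), the second is build f77c13b (6841).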
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s | 
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 37.09 ± 0.58 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 37.31 ± 0.05 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 37.33 ± 0.02 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 37.20 ± 0.01 | 
build: 3cfa9c3 (6840)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s | 
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 33.21 ± 0.44 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 33.39 ± 0.04 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 33.40 ± 0.02 | 
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 33.29 ± 0.01 | 
build: f77c13b (6841)
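To put a number on the regression, here is a minimal sketch that computes the per-test slowdown from the t/s means in the two tables above. The dictionary and function names are my own for illustration, and the ± spread is ignored; with these figures the drop works out to about 10.5% on every test.

```python
# Mean t/s copied from the two llama-bench tables above (the ± spread is ignored).
before = {"tg32": 37.09, "tg64": 37.31, "tg128": 37.33, "tg512": 37.20}  # build 3cfa9c3 (6840)
after = {"tg32": 33.21, "tg64": 33.39, "tg128": 33.40, "tg512": 33.29}   # build f77c13b (6841)


def percent_drop(old: float, new: float) -> float:
    """Relative throughput loss going from old to new, in percent."""
    return (old - new) / old * 100.0


# Per-test slowdown, keyed by the llama-bench test name.
drops = {test: percent_drop(before[test], after[test]) for test in before}

for test, drop in drops.items():
    print(f"{test}: {before[test]:.2f} -> {after[test]:.2f} t/s ({drop:.1f}% slower)")
```

The loss is nearly identical across tg32 through tg512, so it does not appear to depend on generation length.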