
tests: test-backend-ops -j <N> to run tests in parallel #23637

Merged: ggerganov merged 1 commit into ggml-org:master from jeffbolznv:test-backend-ops-j on May 26, 2026

Conversation

@jeffbolznv
Contributor

Overview

Create a pool of N threads, each of which grabs a chunk of up to 100 tests at a time to iterate through. The chunk size decreases as fewer tests remain.
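
For illustration, the chunk-grabbing scheme described above could be sketched roughly as follows. This is not the code from the PR: `run_chunked` is a hypothetical helper, and the chunk-size rule (remaining work divided by twice the thread count, clamped between 1 and the cap) is an assumption chosen only to show chunks shrinking as the queue drains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): call fn(i) once for every i in
// [0, n_items), spreading the work over n_threads worker threads.
// Assumed chunk rule: remaining / (2 * n_threads), clamped to [1, max_chunk],
// so chunks get smaller as fewer items remain.
void run_chunked(size_t n_items, size_t n_threads, size_t max_chunk,
                 const std::function<void(size_t)> & fn) {
    assert(n_threads > 0 && max_chunk > 0);

    std::mutex mtx;
    size_t next = 0; // index of the first unclaimed item, guarded by mtx

    auto worker = [&]() {
        for (;;) {
            size_t begin = 0;
            size_t end   = 0;
            {
                // Claim the next chunk under the lock.
                std::lock_guard<std::mutex> lock(mtx);
                if (next >= n_items) {
                    return; // nothing left to claim
                }
                const size_t remaining = n_items - next;
                const size_t chunk =
                    std::clamp<size_t>(remaining / (2 * n_threads), 1, max_chunk);
                begin = next;
                end   = next + chunk;
                next  = end;
            }
            // Process the claimed chunk without holding the lock.
            for (size_t i = begin; i < end; ++i) {
                fn(i);
            }
        }
    };

    std::vector<std::thread> pool;
    for (size_t t = 0; t < n_threads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto & th : pool) {
        th.join();
    }
}
```

Large chunks early on keep lock traffic low, while small chunks near the end reduce the chance that one thread is left finishing a long tail alone.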

Each thread uses its own dev and cpu backend, and set_n_threads_fn is not called on the cpu backend.

Fix some TSAN issues that arose:

  • In init_tensor_uniform, don't use static vector of generators.
  • Replace gmtime with versions that don't use a global variable.
  • Guard calls to print_test_result with a mutex.
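
As a rough sketch of the three kinds of fixes listed above (not the PR's actual code), the patterns might look like this. The function names `fill_uniform`, `utc_timestamp`, and `print_line` are hypothetical, and using a `thread_local` generator is an assumption about one way to avoid a shared static vector of generators.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <random>
#include <string>
#include <time.h>
#include <vector>

// Hypothetical stand-in for a tensor initializer: each thread owns its own
// generator, so no generator state is shared between threads.
void fill_uniform(std::vector<float> & data, float lo, float hi) {
    assert(lo < hi);
    thread_local std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<float> dist(lo, hi);
    for (float & v : data) {
        v = dist(gen);
    }
}

// Format a UTC timestamp without gmtime(), which returns a pointer to
// shared static storage. gmtime_r (POSIX) and gmtime_s (MSVC) write into
// a caller-provided buffer instead.
std::string utc_timestamp(time_t t) {
    struct tm tm_buf = {};
#if defined(_WIN32)
    gmtime_s(&tm_buf, &t);
#else
    gmtime_r(&t, &tm_buf);
#endif
    char buf[32];
    strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", &tm_buf);
    return std::string(buf);
}

// Hypothetical stand-in for result printing: a single mutex keeps lines
// from different threads from interleaving.
void print_line(const std::string & line) {
    static std::mutex print_mtx;
    std::lock_guard<std::mutex> lock(print_mtx);
    std::fputs(line.c_str(), stdout);
    std::fputc('\n', stdout);
}
```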

This should be TSAN clean, at least when running CPU backends.

Timings on my system (single-threaded -> large N):

  • vulkan (with shaders cached): 3:10 -> 1:47
  • cuda: 2:15 -> 0:55

There are locking issues in ggml-vulkan that prevent scaling when pipelines are being compiled, which I'll fix separately.

#23595 will help a bit with scaling (fixes a big stutter at the start). #23376 will help with vulkan.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES. I hand-wrote the initial implementation a while back, but used Claude to clean it up a bit and codex to triage and fix TSAN issues.

Member

@ggerganov ggerganov left a comment

Nice - should utilize this in the CI runs in a follow-up PR

@ggerganov ggerganov added the merge ready label May 25, 2026
@ggerganov ggerganov merged commit 7623de1 into ggml-org:master May 26, 2026
61 of 63 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 26, 2026
* origin/master: (59 commits)
ggml-zendnn : fixed naming of matmul function (ggml-org#20964)
ci : do not allocate ccache for 3rd-party hosted runners (ggml-org#23730)
ci : move [no release] check to dedicated check_release job (ggml-org#23734)
ci : add `[no release]` keyword + fix sanitizer builds (ggml-org#23728)
ci : move macos jobs to the apple workflow + fix names (ggml-org#23721)
vulkan: optimize conv2d and implement coopmat1 support (ggml-org#22620)
ci : remove vulkan SDK dep from webgpu job (ggml-org#23718)
hexagon: add support for CONCAT op (ggml-org#23648)
ci : move more CPU jobs to self-hosted runners (ggml-org#23715)
ci : move sanitizer jobs to self-hosted runners (ggml-org#23713)
ci : reduce (disable SYCL and CANN builds/releases) (ggml-org#23705)
convert : support Gemma4ForCausalLM architecture (ggml-org#23682)
models : Attach Mistral3 NVFP4 weight scales (ggml-org#23629)
SYCL: implement ggml_sycl_pool_vmm (ggml-org#22862)
tests: test-backend-ops -j <N> to run tests in parallel (ggml-org#23637)
model : add support for talkie-1930-13b (ggml-org#22596)
ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MUL_MAT pipeline (ggml-org#23594)
[WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling (ggml-org#23457)
CUDA: missing PDL sync for FWHT, better fallback (ggml-org#23690)
metal : add apple device id (ggml-org#23566)
...
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Labels

merge ready, testing
