ci : migrate ggml ci to self-hosted runners #16116
Conversation
Force-pushed from e05d6c2 to b50a96f
Force-pushed from f0704eb to 3848517
It'll definitely be good to have a Vulkan CI. Is it not crashing now on Nvidia? BTW the V100 is pretty old and I wonder if it's worth upgrading to something newer with coopmat and coopmat2 support. Budget-wise it should be possible to move the existing arm and x86 runs to the free GitHub machines (I think 16GB and 4 cores are enough for the CI) or have the amx machine do double duty as an x86 CPU runner.
On both V100 and T4, the F16 EXP tests are failing because instead of
Is this a known issue, and can we work around it?
Maybe #15652?
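If the failures are fp16 range related, note that half precision overflows to inf once exp(x) exceeds the fp16 maximum of 65504, i.e. for inputs above log(65504) ≈ 11.09. A minimal numpy illustration (not the actual test code; the clamp shown is just one hypothetical workaround):

```python
import numpy as np

# fp16 can represent values only up to 65504, so exp(x) overflows
# to inf for x > log(65504) ~= 11.09
print(np.exp(np.float16(11.0)))  # finite
print(np.exp(np.float16(12.0)))  # inf (fp16 overflow)

# a hypothetical workaround: clamp the input before exponentiating
x = np.float16(12.0)
x_clamped = np.minimum(x, np.float16(11.0))
print(np.exp(x_clamped))         # finite again
```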
Yes, probably a patch like this is needed. I'll leave it to be fixed on

For the Vulkan runs, we now have:
Let me know if there are any additional Vulkan configs that we can add, or any machines from the Azure cloud that would be useful to exercise in addition to the current ones.
For Vulkan your runs already cover most cases, including the FP16 path (Mac with MoltenVK), the int dot path (V100), and coopmat 1 and 2 (T4). We also already run the tests for the FP16 path with llvmpipe using the standard GitHub machines. The only remaining path is the FP32 one (run with

If you're thinking of getting another machine, the NVads V710 is an AMD one which could be used for the ROCm CI as well as an additional Vulkan test case for benchmarking and detecting driver bugs. Having an Intel GPU would be nice for SYCL and Vulkan as well, but Azure doesn't have those.

FYI, while I was looking into this I also discovered that the V100 Azure machines will be discontinued at the end of this month, so that'll have to be dealt with eventually.
I think this is a good starting point. Some notes:
- A lot of this could be consolidated into a single job using a matrix
- It may be too heavy to run the full ggml-ci on every PR push; it may be necessary to leave the full version for master only and use lighter tests for PRs
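The two notes above could be sketched roughly like this; the runner labels, job name, and gating condition are placeholders, not the labels actually registered on the self-hosted machines (only the `ci/run.sh` invocation follows the repo's existing CI script):

```yaml
# Hypothetical sketch: one matrix job instead of per-backend jobs,
# with the heavy full run gated to pushes to master.
jobs:
  ggml-ci:
    if: github.ref == 'refs/heads/master'
    strategy:
      fail-fast: false
      matrix:
        include:
          - { runner: self-hosted-v100, backend: cuda }
          - { runner: self-hosted-t4,   backend: vulkan }
          - { runner: self-hosted-v710, backend: rocm }
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - name: Run ggml-ci
        run: bash ./ci/run.sh ./tmp/results ./tmp/mnt
```

A separate, lighter job without the `if` gate could then cover PR pushes.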
@netrunnereve Do you know which driver I need to install for the AMD V710 GPU? https://github.com/ggml-org/llama.cpp/actions/runs/17892383263/job/50874519651?pr=16116 Currently I have installed the Vulkan SDK and the project builds, but it does not detect the GPU. Edit: I think I found it: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/azure-n-series-amd-gpu-driver-linux-installation-guide
Perhaps limit them to pushes to the related backend?
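That kind of gating can be done with a paths filter on the workflow trigger; a sketch, where the paths are assumptions about the repo layout and the workflow filename is hypothetical:

```yaml
# Hypothetical sketch: run the Vulkan job only when Vulkan sources
# (or this workflow file) change in a PR.
on:
  push:
    branches: [master]
  pull_request:
    paths:
      - 'ggml/src/ggml-vulkan/**'
      - '.github/workflows/ggml-ci.yml'
```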
* ci : migrate ggml ci to a self-hosted runners
* ci : add T4 runner
* ci : add instructions for adding self-hosted runners
* ci : disable test-backend-ops from debug builds due to slowness
* ci : add AMD V710 runner (vulkan)
* cont : add ROCM workflow
* ci : switch to qwen3 0.6b model
* cont : fix the context size
Add `ggml-ci` into commits to trigger the CI.

Disabled `test-backend-ops` from Debug builds due to slowness (alternatively, we can add a Release build config with asserts enabled and run just that).
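A concrete illustration of the commit-message trigger described above (a sketch; the `ggml-ci` tag string comes from the note, everything else is generic git):

```shell
# Sketch: the full ggml CI runs on commits whose message contains
# the "ggml-ci" tag, so include it when a full run is wanted.
git commit --allow-empty -m "ci : trigger full ggml-ci run

ggml-ci"

# confirm the tag is present in the latest commit message
git log -1 --format=%B | grep -q "ggml-ci" && echo "full CI will run"
```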