ci: split backend-jobs into single-arch and multi-arch matrices #9746
Merged
Symptom (run 25612992409): backend-merge-jobs failed with
"quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found"
even though the per-arch build for -cpu-llama-cpp pushed that exact
digest 14h31m earlier.
Root cause: backend-merge-jobs was gated on the WHOLE backend-jobs
matrix (`needs: backend-jobs`). The multi-arch -cpu-llama-cpp legs
finished within 30 min, but a single-arch CUDA-12-llama-cpp slot in
the same matrix queued for ~8h (max-parallel: 8 throttle) and then
took ~6h to build cold. By the time it freed the merge to run, quay's
GC had reaped the per-arch digests pushed by the fast multi-arch legs
the day before.
Fix: split the linux backend matrix in two.
  backend-jobs-multiarch  - entries with `platform-tag` set (paired
                            per-arch legs that feed backend-merge-jobs).
  backend-jobs-singlearch - entries without `platform-tag` (heavy
                            standalone builds: CUDA, ROCm, Intel oneAPI,
                            vLLM, sglang, etc.).
backend-merge-jobs now `needs:` only backend-jobs-multiarch. The
multi-arch matrix completes in ~2-3h, well inside quay's GC window.
Heavy single-arch entries keep running independently with no merge
dependency.
scripts/changed-backends.js gains a splitByArch() helper that
partitions filtered entries by whether `platform-tag` is set, and
emits matrix-singlearch + matrix-multiarch + has-backends-singlearch
+ has-backends-multiarch outputs (replacing the previous combined
matrix / has-backends pair). Applied in both the full-matrix and
filtered-matrix code paths. Smoke test: 199 single-arch + 72
multi-arch = 271 linux entries, plus 35 darwin; 36 merge-matrix
entries (one per multi-arch backend pair). Matches expectation.
Local `make backends/<name>` is unaffected — the script's outputs
only feed CI workflow matrices.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit that referenced this pull request on May 11, 2026
backend_build.yml pushes by canonical digest only (push-by-digest=true, no tags applied at build time). User-facing tagging happens in backend_merge.yml's `imagetools create` step. Before this commit, scripts/changed-backends.js emitted a merge entry only for tag-suffixes with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python images, vLLM, sglang, transformers, diffusers, ...) pushed its digest untagged and stayed that way until quay's GC reaped it.

Symptom: tag releases shipped multi-arch backends tagged correctly, but no v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared in the registry.

Changes:
- scripts/changed-backends.js drops the `group.length < 2` skip and emits two merge matrices, one per arch class, so each downstream merge job can `needs:` only its corresponding build matrix.
- backend.yml splits backend-merge-jobs into multiarch and singlearch variants. The split preserves PR #9746's fix: slow singlearch CUDA builds (~6h) must not gate multiarch merges, or quay's GC reaps the multiarch per-arch digests before they're tagged.
- backend_pr.yml mirrors the split.
- backend_build.yml renames the digest artifact from `digests<suffix>-<platform-tag>` to `digests<suffix>--<platform-tag-or-"single">`. The `--` separator prevents the merge-side glob from over-matching sibling backends whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs -cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder keeps the name well-formed when platform-tag is empty.
- backend_merge.yml updates the download pattern to match.

Verified locally: a tag-push event now expands to 36 multiarch merge entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one per singleton, including -gpu-nvidia-cuda-12-vllm at index 24).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
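To make the prefix over-matching concrete, here is a minimal sketch of why a single `-` separator is ambiguous and `--` is not. Everything in it is an assumption for illustration: `globToRegExp` is a hypothetical helper, not the matcher the artifact download step uses, and the `amd64` / `arm64` platform tags and the exact download patterns are made up.

```javascript
// Hypothetical helper: turn a simple `*` glob into an anchored RegExp.
// It only illustrates the prefix problem; it is NOT the matcher used by
// the workflow's artifact download step.
function globToRegExp(glob) {
  const escaped = glob
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Old scheme: digests<suffix>-<platform-tag> (platform tags are assumed).
const oldNames = [
  'digests-cpu-vllm-amd64',
  'digests-cpu-vllm-arm64',
  'digests-cpu-vllm-omni-amd64',
];

// New scheme: digests<suffix>--<platform-tag-or-"single">.
const newNames = [
  'digests-cpu-vllm--amd64',
  'digests-cpu-vllm--arm64',
  'digests-cpu-vllm-omni--amd64',
];

// With a single `-`, the -cpu-vllm pattern also picks up -cpu-vllm-omni.
const oldMatches = oldNames.filter((n) => globToRegExp('digests-cpu-vllm-*').test(n));

// With `--`, the sibling no longer matches: after "-cpu-vllm-" it continues
// with "omni", not with a second "-".
const newMatches = newNames.filter((n) => globToRegExp('digests-cpu-vllm--*').test(n));

console.log(oldMatches.length, newMatches.length); // 3 2
```

The same reasoning applies to the `-cpu-mlx` / `-cpu-mlx-audio` pair mentioned above.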
Summary
Fix for the merge-job GC failure observed in run 25612992409: split the linux backend matrix into two separate jobs so `backend-merge-jobs` only waits on the multi-arch entries (which feed it) instead of the whole matrix (which includes 6h-cold-build single-arch CUDA/ROCm/vLLM entries).
What happened
`-cpu-llama-cpp` per-arch builds pushed their digests at 22:16 / 22:37 May 9. The merge job didn't fire until 12:47 May 10 — 14h31m later — because it gated on the entire `backend-jobs` matrix, and a single-arch `-gpu-nvidia-cuda-12-llama-cpp` slot was queued + cold-built for that whole window. By then, quay's untagged-manifest GC had reaped the per-arch digests:
```
ERROR: quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found
```
Note: `provenance: false` (the previous fix from PR #9730) IS being applied correctly — `--attest type=provenance,disabled=true` is in the buildx command line. This failure is a separate, GC-driven issue.
What changes
- `scripts/changed-backends.js`: new `splitByArch()` helper partitions filtered linux entries by whether `platform-tag` is set:
  - `platform-tag` set → multi-arch (paired per-arch legs that feed `backend-merge-jobs`)
  - `platform-tag` not set → single-arch (heavy standalone builds)

  Emits `matrix-singlearch` + `matrix-multiarch` + `has-backends-singlearch` + `has-backends-multiarch` outputs in both full-matrix and filtered-matrix paths.
- `.github/workflows/backend.yml` + `backend_pr.yml`: replace the single `backend-jobs` job with two:

| Job | Matrix | `needs:` |
|---|---|---|
| `backend-jobs-multiarch` | `matrix-multiarch` | `generate-matrix` |
| `backend-jobs-singlearch` | `matrix-singlearch` | `generate-matrix` |
| `backend-merge-jobs` | `merge-matrix` | `[generate-matrix, backend-jobs-multiarch]` (CHANGED) |

`backend-jobs-darwin` is unchanged: the Darwin matrix isn't split (no per-arch legs).
Smoke test
Total linux: 271, matches expected. No matrix entry is lost or double-counted.
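The "nothing lost, nothing double-counted" property can be sketched as a simple partition check. This is a minimal illustration, not the code in `scripts/changed-backends.js`: the signature and return shape of `splitByArch` here are assumptions, and the sample entries (including the `tag-suffix` key and the `amd64` / `arm64` values) are made up.

```javascript
// Sketch of a partition by `platform-tag`. The real splitByArch() in
// scripts/changed-backends.js may differ in signature and return shape.
function splitByArch(entries) {
  const multiarch = [];
  const singlearch = [];
  for (const entry of entries) {
    // A non-empty `platform-tag` marks a per-arch leg of a multi-arch image;
    // a missing or empty value marks a standalone single-arch build.
    (entry['platform-tag'] ? multiarch : singlearch).push(entry);
  }
  return { multiarch, singlearch };
}

// Made-up sample entries (keys and values are illustrative only).
const entries = [
  { 'tag-suffix': '-cpu-llama-cpp', 'platform-tag': 'amd64' },
  { 'tag-suffix': '-cpu-llama-cpp', 'platform-tag': 'arm64' },
  { 'tag-suffix': '-gpu-nvidia-cuda-12-llama-cpp' },
  { 'tag-suffix': '-gpu-nvidia-cuda-12-vllm', 'platform-tag': '' },
];

const { multiarch, singlearch } = splitByArch(entries);

// Partition check: every entry lands in exactly one bucket.
const lost = entries.filter((e) => !multiarch.includes(e) && !singlearch.includes(e));
const doubled = entries.filter((e) => multiarch.includes(e) && singlearch.includes(e));

console.log(multiarch.length, singlearch.length, lost.length, doubled.length); // 2 2 0 0
```

Because each entry is pushed to exactly one of the two arrays, the bucket sizes always sum to the input size, which is the same invariant as 199 + 72 = 271 above.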
Test plan
`backend_pr.yml` runs cleanly with the new outputs.
Why this also makes us more robust
Before: a single slow heavy backend in the matrix could starve the merge job. Now the merge fires as soon as the per-arch legs feeding it finish, independent of unrelated slow builds. Even if quay's GC window tightens in the future, the wait is bounded by the slowest multi-arch leg (typically <2h for the C++ + lighter Python backends) rather than the slowest entry in the entire matrix.
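As a rough model of that bound: a job with `needs:` can only start once the slowest job it depends on has finished, so the merge's start time is the maximum over its dependency set. The sketch below uses illustrative durations loosely based on the figures quoted in this PR (about 0.5h for the multi-arch legs, about 8h of queueing plus 6h of build for the CUDA slot); the job labels and the `mergeStartsAfter` helper are hypothetical.

```javascript
// Illustrative finish times in hours, loosely based on the numbers quoted in
// this PR. These are not measurements; the labels are made up.
const finishedAfterHours = {
  'cpu-llama-cpp (amd64 leg)': 0.5,
  'cpu-llama-cpp (arm64 leg)': 0.5,
  'cuda-12-llama-cpp (single-arch)': 14, // ~8h queued + ~6h cold build
};

// Hypothetical helper: a dependent job starts after its slowest dependency.
function mergeStartsAfter(needed) {
  return Math.max(...needed.map((name) => finishedAfterHours[name]));
}

const allJobs = Object.keys(finishedAfterHours);

// Before: the merge needed the whole matrix.
const before = mergeStartsAfter(allJobs);

// After: the merge needs only the multi-arch legs that feed it.
const after = mergeStartsAfter(allJobs.filter((name) => !name.includes('single-arch')));

console.log({ before, after }); // { before: 14, after: 0.5 }
```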
Assisted-by: Claude:claude-opus-4-7