ci: split backend-jobs into single-arch and multi-arch matrices by localai-bot · Pull Request #9746 · mudler/LocalAI

localai-bot · 2026-05-10T16:14:04Z

Summary

Fix for the merge-job GC failure observed in run 25612992409: split the Linux backend matrix into two separate jobs so that backend-merge-jobs waits only on the multi-arch entries that feed it, instead of on the whole matrix, which includes single-arch CUDA/ROCm/vLLM entries that can take ~6h to build cold.

What happened

The `-cpu-llama-cpp` per-arch builds pushed their digests at 22:16 and 22:37 on May 9. The merge job didn't fire until 12:47 on May 10, 14h31m after the first push, because it was gated on the entire `backend-jobs` matrix, and a single-arch `-gpu-nvidia-cuda-12-llama-cpp` slot spent that whole window queued and then cold-building. By then, quay's untagged-manifest GC had reaped the per-arch digests:

```
ERROR: quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found
```
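As a quick check of the 14h31m figure, here is a minimal sketch that diffs the wall-clock times quoted above. It assumes both times are in the same timezone (the source does not say which), so they are encoded with `Date.UTC` purely for the arithmetic.

```javascript
// Whole minutes between two timestamps (milliseconds since epoch).
function minutesBetween(start, end) {
  return Math.round((end - start) / 60000);
}

// Format a minute count as e.g. "14h31m".
function formatHoursMinutes(totalMinutes) {
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours}h${String(minutes).padStart(2, "0")}m`;
}

// Assumption: both times share one timezone. Date.UTC months are 0-based, so 4 = May.
const firstPush = Date.UTC(2026, 4, 9, 22, 16); // first per-arch digest push
const mergeStart = Date.UTC(2026, 4, 10, 12, 47); // merge job finally fires

const gap = minutesBetween(firstPush, mergeStart);
console.log(formatHoursMinutes(gap)); // prints "14h31m"
```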

Note: `provenance: false` (the previous fix from PR #9730) IS being applied correctly; `--attest type=provenance,disabled=true` appears on the buildx command line. This failure is a separate, GC-driven issue.

What changes

`scripts/changed-backends.js`: a new `splitByArch()` helper partitions the filtered Linux entries by whether `platform-tag` is set:

- Entries with `platform-tag` → multi-arch (paired per-arch legs that feed backend-merge-jobs)
- Entries without `platform-tag` → single-arch (heavy standalone builds)

It emits the `matrix-singlearch`, `matrix-multiarch`, `has-backends-singlearch`, and `has-backends-multiarch` outputs in both the full-matrix and filtered-matrix code paths.
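A minimal sketch of that partition, assuming each matrix entry is a plain object whose `platform-tag` is either a non-empty string (multi-arch leg) or missing/empty (single-arch). The `{ include: [...] }` matrix shape, the `tag-suffix` field, the `amd64`/`arm64` tag values, and the `name=value` output lines (the format GitHub Actions reads from the `$GITHUB_OUTPUT` file) are assumptions for illustration; the real `splitByArch()` may differ.

```javascript
// Hypothetical sketch of splitByArch(); not the actual code in scripts/changed-backends.js.
// An entry counts as multi-arch when its `platform-tag` is a non-empty string.
function splitByArch(entries) {
  const singlearch = [];
  const multiarch = [];
  for (const entry of entries) {
    if (entry["platform-tag"]) {
      multiarch.push(entry);
    } else {
      singlearch.push(entry);
    }
  }
  return { singlearch, multiarch };
}

// Render the four outputs as `name=value` lines (assumed format, one line per output).
function formatOutputs({ singlearch, multiarch }) {
  const outputs = {
    "matrix-singlearch": JSON.stringify({ include: singlearch }),
    "matrix-multiarch": JSON.stringify({ include: multiarch }),
    "has-backends-singlearch": String(singlearch.length > 0),
    "has-backends-multiarch": String(multiarch.length > 0),
  };
  return Object.entries(outputs)
    .map(([name, value]) => `${name}=${value}`)
    .join("\n");
}

// Example with hypothetical entries.
const entries = [
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "amd64" },
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "arm64" },
  { "tag-suffix": "-gpu-nvidia-cuda-12-llama-cpp", "platform-tag": "" },
];
const split = splitByArch(entries);
console.log(split.multiarch.length, split.singlearch.length); // prints "2 1"
console.log(formatOutputs(split));
```

Partitioning on a single field keeps the two lists disjoint by construction: every entry lands in exactly one of them.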

`.github/workflows/backend.yml` and `backend_pr.yml`: replace the single `backend-jobs` job with two, and narrow what `backend-merge-jobs` waits on:

| Job | Consumes | `needs:` |
| --- | --- | --- |
| `backend-jobs-multiarch` | `matrix-multiarch` | `generate-matrix` |
| `backend-jobs-singlearch` | `matrix-singlearch` | `generate-matrix` |
| `backend-merge-jobs` | `merge-matrix` | `[generate-matrix, backend-jobs-multiarch]` (CHANGED) |

`backend-jobs-darwin` is unchanged: the Darwin matrix isn't split because it has no per-arch legs.
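The effect of the `needs:` change can be sketched as a dependency walk: a job starts only after every job it transitively `needs:` has finished. The graphs below are a simplified model built from the table and from the commit message's `needs: backend-jobs`, not the actual workflow YAML.

```javascript
// Simplified model of the `needs:` graph before and after this PR.
const before = {
  "generate-matrix": [],
  "backend-jobs": ["generate-matrix"],
  "backend-merge-jobs": ["backend-jobs"],
};

const after = {
  "generate-matrix": [],
  "backend-jobs-multiarch": ["generate-matrix"],
  "backend-jobs-singlearch": ["generate-matrix"],
  "backend-merge-jobs": ["generate-matrix", "backend-jobs-multiarch"],
};

// Collect every job that must finish before `job` can start (transitive closure of needs).
function prerequisites(graph, job, seen = new Set()) {
  for (const dep of graph[job] || []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      prerequisites(graph, dep, seen);
    }
  }
  return seen;
}

console.log([...prerequisites(before, "backend-merge-jobs")].sort().join(", "));
// prints "backend-jobs, generate-matrix"
console.log([...prerequisites(after, "backend-merge-jobs")].sort().join(", "));
// prints "backend-jobs-multiarch, generate-matrix"
```

In the "after" graph, `backend-jobs-singlearch` never appears among the merge job's prerequisites, which is exactly the decoupling this PR is after.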

Smoke test

matrix-singlearch:    199 entries
matrix-multiarch:      72 entries  (36 backends × 2 per-arch legs)
matrix-darwin:         35 entries
merge-matrix:          36 entries  (one per multi-arch backend pair)

Total Linux entries: 199 + 72 = 271, matching the expected count. No matrix entry is lost or double-counted.
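The smoke-test invariants can be expressed as two kinds of checks: entry-level (the two Linux lists are disjoint and together cover every Linux entry) and count-level (single-arch plus multi-arch equals the Linux total, and each multi-arch pair contributes exactly two legs). A minimal sketch follows; the helper names are hypothetical, entries are compared by object identity, and Darwin entries are left out because they are not split.

```javascript
// Hypothetical entry-level check: report lost or double-counted entries.
function checkSplit(all, singlearch, multiarch) {
  const problems = [];
  const seen = new Set();
  for (const entry of [...singlearch, ...multiarch]) {
    if (seen.has(entry)) {
      problems.push(`double-counted: ${JSON.stringify(entry)}`);
    }
    seen.add(entry);
  }
  for (const entry of all) {
    if (!seen.has(entry)) {
      problems.push(`lost: ${JSON.stringify(entry)}`);
    }
  }
  if (seen.size !== all.length) {
    problems.push(`expected ${all.length} distinct entries, got ${seen.size}`);
  }
  return problems;
}

// Count-level checks using the numbers reported in the smoke test.
function checkCounts({ singlearch, multiarch, merge, linuxTotal }) {
  return {
    totalsMatch: singlearch + multiarch === linuxTotal,
    twoLegsPerPair: multiarch === 2 * merge,
  };
}

console.log(checkCounts({ singlearch: 199, multiarch: 72, merge: 36, linuxTotal: 271 }));
// prints { totalsMatch: true, twoLegsPerPair: true }

const legA = { "platform-tag": "amd64" };
const legB = { "platform-tag": "" };
console.log(checkSplit([legA, legB], [legB], [legA]).length); // prints 0
```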

Test plan

- PR-side `backend_pr.yml` runs cleanly with the new outputs.
- After merge, the next master push exercises the new split: multi-arch backend builds and the merge job complete within ~2-3h regardless of any concurrent slow CUDA/ROCm single-arch builds.
- No remaining "manifest not found" failures from quay GC on slow runs.
- Tag pushes still rebuild everything (`FORCE_ALL=true` → `emitFullMatrix` → both singlearch and multiarch get full lists).
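A sketch of the two selection paths named in the last item. It assumes `FORCE_ALL` arrives as an environment-style string and that the filtered path keeps entries whose hypothetical `backend` field is in a set of changed backends. Both paths feed the same split, which is why a forced run fills both matrices with their full lists.

```javascript
// Hypothetical sketch of the selection paths; the real script may differ.
function splitByArch(entries) {
  return {
    singlearch: entries.filter((entry) => !entry["platform-tag"]),
    multiarch: entries.filter((entry) => Boolean(entry["platform-tag"])),
  };
}

// Full list when FORCE_ALL=true (e.g. tag pushes); otherwise only entries whose
// hypothetical `backend` field is in the set of changed backends.
function selectEntries(allEntries, changedBackends, env) {
  if (env.FORCE_ALL === "true") {
    return allEntries;
  }
  return allEntries.filter((entry) => changedBackends.has(entry.backend));
}

// Both paths end in the same split, so the two matrices derive from one selected list.
function buildMatrices(allEntries, changedBackends, env) {
  return splitByArch(selectEntries(allEntries, changedBackends, env));
}

const allEntries = [
  { backend: "llama-cpp", "platform-tag": "amd64" },
  { backend: "llama-cpp", "platform-tag": "arm64" },
  { backend: "vllm", "platform-tag": "" },
];

const forced = buildMatrices(allEntries, new Set(), { FORCE_ALL: "true" });
console.log(forced.multiarch.length, forced.singlearch.length); // prints "2 1"

const filtered = buildMatrices(allEntries, new Set(["vllm"]), {});
console.log(filtered.multiarch.length, filtered.singlearch.length); // prints "0 1"
```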

Why this also makes us more robust

Before: a single slow, heavy backend anywhere in the matrix could starve the merge job. Now the merge fires as soon as the per-arch legs feeding it finish, independent of unrelated slow builds. Even if quay's GC window tightens in the future, the wait is bounded by the slowest multi-arch leg (typically <2h for the C++ and lighter Python backends) rather than by the slowest entry in the entire matrix.
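The bound can be made concrete with a small model: a job that `needs:` a matrix job can start only when the slowest entry in that matrix finishes. The durations below are rough figures from the incident (multi-arch legs done within about 30 minutes; the CUDA single-arch slot queued ~8h, then built ~6h), in minutes; the 20-minute leg is a hypothetical value. This illustrates the gating rule, not the GitHub Actions scheduler.

```javascript
// Finish times (minutes after the run starts) of matrix entries, rough incident figures.
const multiarchFinish = [20, 30]; // -cpu-llama-cpp per-arch legs; 20 is hypothetical, both within 30 min
const singlearchFinish = [8 * 60 + 6 * 60]; // CUDA-12 llama-cpp slot: ~8h queued + ~6h cold build

// A job gated on one or more matrices starts once the slowest needed entry finishes.
function gateTime(...matrices) {
  return Math.max(0, ...matrices.flat());
}

const mergeBefore = gateTime(multiarchFinish, singlearchFinish); // merge needed the whole matrix
const mergeAfter = gateTime(multiarchFinish); // merge needs only the multi-arch job

console.log(mergeBefore / 60, mergeAfter); // prints "14 30"
```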

Assisted-by: Claude:claude-opus-4-7

Symptom (run 25612992409): backend-merge-jobs failed with "quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found" even though the per-arch build for -cpu-llama-cpp pushed that exact digest 14h31m earlier.

Root cause: backend-merge-jobs was gated on the WHOLE backend-jobs matrix (`needs: backend-jobs`). The multi-arch -cpu-llama-cpp legs finished within 30 min, but a single-arch CUDA-12-llama-cpp slot in the same matrix queued for ~8h (`max-parallel: 8` throttle) and then took ~6h to build cold. By the time that slot finished and the merge could run, quay's GC had reaped the per-arch digests pushed by the fast multi-arch legs the day before.

Fix: split the linux backend matrix in two.

- backend-jobs-multiarch: entries with `platform-tag` set (paired per-arch legs that feed backend-merge-jobs).
- backend-jobs-singlearch: entries without `platform-tag` (heavy standalone builds: CUDA, ROCm, Intel oneAPI, vLLM, sglang, etc.).

backend-merge-jobs now `needs:` only backend-jobs-multiarch. The multi-arch matrix completes in ~2-3h, well inside quay's GC window. Heavy single-arch entries keep running independently with no merge dependency.

scripts/changed-backends.js gains a splitByArch() helper that partitions filtered entries by whether `platform-tag` is set, and emits matrix-singlearch + matrix-multiarch + has-backends-singlearch + has-backends-multiarch outputs (replacing the previous combined matrix / has-backends pair). The split is applied in both the full-matrix and filtered-matrix code paths.

Smoke test: 199 single-arch + 72 multi-arch = 271 Linux entries, plus 35 darwin entries; 36 merge-matrix entries (one per multi-arch backend pair). Matches expectation.

Local `make backends/<name>` is unaffected: the script's outputs only feed CI workflow matrices.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

backend_build.yml pushes by canonical digest only (push-by-digest=true, no tags applied at build time). User-facing tagging happens in backend_merge.yml's `imagetools create` step. Before this commit, scripts/changed-backends.js emitted a merge entry only for tag-suffixes with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python images, vLLM, sglang, transformers, diffusers, ...) pushed its digest untagged and stayed that way until quay's GC reaped it.

Symptom: tag releases shipped multi-arch backends tagged correctly, but no v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared in the registry.

Changes:

- scripts/changed-backends.js drops the `group.length < 2` skip and emits two merge matrices, one per arch class, so each downstream merge job can `needs:` only its corresponding build matrix.
- backend.yml splits backend-merge-jobs into multiarch and singlearch variants. The split preserves PR #9746's fix: slow singlearch CUDA builds (~6h) must not gate multiarch merges, or quay's GC reaps the multiarch per-arch digests before they're tagged.
- backend_pr.yml mirrors the split.
- backend_build.yml renames the digest artifact from `digests<suffix>-<platform-tag>` to `digests<suffix>--<platform-tag-or-"single">`. The `--` separator prevents the merge-side glob from over-matching sibling backends whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs -cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder keeps the name well-formed when platform-tag is empty.
- backend_merge.yml updates the download pattern to match.

Verified locally: a tag-push event now expands to 36 multiarch merge entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one per singleton, including -gpu-nvidia-cuda-12-vllm at index 24).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
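A minimal sketch of the two merge-side changes in the second commit message: grouping build entries by tag-suffix without the old `group.length < 2` skip (so singleton backends also get a merge entry, routed to their own merge matrix), and the new `digests<suffix>--<platform-tag-or-"single">` artifact name. The field names, the `amd64` tag value, the old `digests<suffix>-*` download pattern, and the `*`-only glob matcher (a stand-in for the download step's pattern matching) are all assumptions for illustration.

```javascript
// Hypothetical artifact name for one build leg: digests<suffix>--<platform-tag or "single">.
function digestArtifactName(entry) {
  return `digests${entry["tag-suffix"]}--${entry["platform-tag"] || "single"}`;
}

// Group entries by tag-suffix and emit one merge entry per group, including single-leg
// groups (no `group.length < 2` skip). Groups with a platform-tag go to the multiarch
// merge matrix, the rest to the singlearch one.
function mergeMatrices(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const suffix = entry["tag-suffix"];
    if (!groups.has(suffix)) groups.set(suffix, []);
    groups.get(suffix).push(entry);
  }
  const multiarch = [];
  const singlearch = [];
  for (const [suffix, group] of groups) {
    const target = group.some((e) => e["platform-tag"]) ? multiarch : singlearch;
    target.push({ "tag-suffix": suffix, legs: group.length });
  }
  return { multiarch, singlearch };
}

// Minimal glob matcher supporting only `*`; other characters are matched literally.
function globMatches(pattern, name) {
  const parts = pattern.split("*").map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${parts.join(".*")}$`).test(name);
}

// Old scheme: the -cpu-vllm pattern also catches the -cpu-vllm-omni artifact.
console.log(globMatches("digests-cpu-vllm-*", "digests-cpu-vllm-omni-amd64")); // true
// New scheme: the `--` separator stops the over-match.
const omni = { "tag-suffix": "-cpu-vllm-omni", "platform-tag": "amd64" };
const vllm = { "tag-suffix": "-cpu-vllm", "platform-tag": "amd64" };
console.log(globMatches("digests-cpu-vllm--*", digestArtifactName(omni))); // false
console.log(globMatches("digests-cpu-vllm--*", digestArtifactName(vllm))); // true
```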

mudler merged commit 35f6db8 into master May 10, 2026
52 of 53 checks passed

mudler deleted the ci/split-backend-matrix branch May 10, 2026 16:15

mudler merged 1 commit into master from ci/split-backend-matrix

2 participants