
ci: split backend-jobs into single-arch and multi-arch matrices #9746

Merged
mudler merged 1 commit into master from ci/split-backend-matrix on May 10, 2026

@localai-bot
Collaborator

Summary

Fix for the merge-job GC failure observed in run 25612992409: split the linux backend matrix into two separate jobs so backend-merge-jobs only waits on the multi-arch entries (which feed it) instead of the whole matrix (which includes 6h-cold-build single-arch CUDA/ROCm/vLLM entries).

What happened

`-cpu-llama-cpp` per-arch builds pushed their digests at 22:16 / 22:37 May 9. The merge job didn't fire until 12:47 May 10 — 14h31m later — because it gated on the entire `backend-jobs` matrix, and a single-arch `-gpu-nvidia-cuda-12-llama-cpp` slot was queued + cold-built for that whole window. By then, quay's untagged-manifest GC had reaped the per-arch digests:

```
ERROR: quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found
```
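The 14h31m gap can be checked from the two timestamps above. This is a minimal sketch that assumes both times are in the same timezone (UTC is used as a stand-in), takes the earlier push time of 22:16, and takes the year from the PR date:

```javascript
// Elapsed time between the first per-arch digest push and the merge job start.
// Assumption: both timestamps share a timezone, so UTC works as a stand-in.
const pushed = Date.UTC(2026, 4, 9, 22, 16);  // May 9, 22:16 (months are 0-based)
const merged = Date.UTC(2026, 4, 10, 12, 47); // May 10, 12:47

const totalMinutes = (merged - pushed) / 60000;
const hours = Math.floor(totalMinutes / 60);
const minutes = totalMinutes % 60;

console.log(`${hours}h${minutes}m`); // 14h31m
```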

Note: `provenance: false` (the previous fix from PR #9730) IS being applied correctly — `--attest type=provenance,disabled=true` is in the buildx command line. This failure is a separate, GC-driven issue.

What changes

scripts/changed-backends.js — new splitByArch() helper partitions filtered linux entries by whether platform-tag is set:

  • Entries with platform-tag → multi-arch (paired per-arch legs that feed backend-merge-jobs)
  • Entries without platform-tag → single-arch (heavy standalone builds)

Emits matrix-singlearch + matrix-multiarch + has-backends-singlearch + has-backends-multiarch outputs in both full-matrix and filtered-matrix paths.
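The partition described above could look roughly like the following. This is a hypothetical sketch, not the code from scripts/changed-backends.js: the sample entries, the `amd64`/`arm64` values, and the `{ include: [...] }` output shape are assumptions made for illustration. Only the `platform-tag` test and the four output names come from this PR.

```javascript
// Hypothetical sketch of the partition; the real helper may differ in detail.
function splitByArch(entries) {
  const multiarch = [];
  const singlearch = [];
  for (const entry of entries) {
    // A non-empty platform-tag marks one leg of a per-arch pair.
    (entry["platform-tag"] ? multiarch : singlearch).push(entry);
  }
  return { multiarch, singlearch };
}

// Illustrative entries; only `platform-tag` matters for the split.
const entries = [
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "amd64" },
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "arm64" },
  { "tag-suffix": "-gpu-nvidia-cuda-12-llama-cpp" },
];

const { multiarch, singlearch } = splitByArch(entries);

// The four outputs named in the PR, serialized as strings for a workflow.
// The `{ include: [...] }` wrapper is an assumed matrix shape.
const outputs = {
  "matrix-multiarch": JSON.stringify({ include: multiarch }),
  "matrix-singlearch": JSON.stringify({ include: singlearch }),
  "has-backends-multiarch": String(multiarch.length > 0),
  "has-backends-singlearch": String(singlearch.length > 0),
};
console.log(outputs);
```

Because every entry goes to exactly one of the two arrays, the split cannot drop or duplicate a matrix entry.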

.github/workflows/backend.yml + backend_pr.yml — replace the single backend-jobs job with two:

| Job | Consumes | needs: |
| --- | --- | --- |
| backend-jobs-multiarch | matrix-multiarch | generate-matrix |
| backend-jobs-singlearch | matrix-singlearch | generate-matrix |
| backend-merge-jobs | merge-matrix | [generate-matrix, backend-jobs-multiarch] (CHANGED) |

backend-jobs-darwin is unchanged — Darwin matrix isn't split (no per-arch legs).

Smoke test

matrix-singlearch:    199 entries
matrix-multiarch:      72 entries  (36 backends × 2 per-arch legs)
matrix-darwin:         35 entries
merge-matrix:          36 entries  (one per multi-arch backend pair)

Total linux: 271, matches expected. No matrix entry is lost or double-counted.
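The smoke-test totals can be re-derived from the reported counts. This is a small arithmetic check, not part of the PR's scripts:

```javascript
// Counts reported by the smoke test above.
const counts = { singlearch: 199, multiarch: 72, darwin: 35, merge: 36 };
const legsPerBackend = 2; // each multi-arch backend builds two per-arch legs

const totalLinux = counts.singlearch + counts.multiarch;
const totalAll = totalLinux + counts.darwin;
const expectedMerge = counts.multiarch / legsPerBackend;

console.log(totalLinux);                      // 271
console.log(totalAll);                        // 306
console.log(expectedMerge === counts.merge);  // true
```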

Test plan

  • PR-side backend_pr.yml runs cleanly with the new outputs.
  • After merge, the next master push exercises the new split. Multi-arch backend builds + merge job complete within ~2-3h regardless of any concurrent slow CUDA/ROCm singlearch builds.
  • No remaining "manifest not found" failures from quay GC on slow runs.
  • Tag pushes still rebuild everything (FORCE_ALL=true → emitFullMatrix → both singlearch and multiarch get full lists).

Why this also makes us more robust

Before: a single slow heavy backend in the matrix could starve the merge job. Now the merge fires as soon as the per-arch legs feeding it finish, independent of unrelated slow builds. Even if quay's GC window tightens in the future, the wait is bounded by the slowest multi-arch leg (typically <2h for the C++ + lighter Python backends) rather than the slowest entry in the entire matrix.
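The effect on merge start time can be illustrated with a toy model of `needs:` gating, where a job starts once every job it depends on has finished. The job keys are made-up labels, and the durations are rounded from the incident described in this PR (multi-arch legs done within 30 minutes, CUDA slot queued ~8h plus ~6h cold build):

```javascript
// Toy model of `needs:` gating. Values are hours after the run starts,
// rounded from the incident; the keys are illustrative labels, not job IDs.
const finishedAt = {
  "multiarch-legs": 0.5,  // -cpu-llama-cpp per-arch legs
  "singlearch-cuda": 14,  // ~8h queued + ~6h cold build
};

// A dependent job can start only when its slowest dependency is done.
function startsAt(needs) {
  return Math.max(...needs.map((job) => finishedAt[job]));
}

// Before: the merge waited on the whole matrix.
const before = startsAt(["multiarch-legs", "singlearch-cuda"]);
// After: the merge waits only on the multi-arch legs that feed it.
const after = startsAt(["multiarch-legs"]);

console.log({ before, after }); // { before: 14, after: 0.5 }
```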

Assisted-by: Claude:claude-opus-4-7

Symptom (run 25612992409): backend-merge-jobs failed with
"quay.io/go-skynet/local-ai-backends@sha256:fdbd93ca...: not found"
even though the per-arch build for -cpu-llama-cpp pushed that exact
digest 14h31m earlier.

Root cause: backend-merge-jobs was gated on the WHOLE backend-jobs
matrix (`needs: backend-jobs`). The multi-arch -cpu-llama-cpp legs
finished within 30 min, but a single-arch CUDA-12-llama-cpp slot in
the same matrix queued for ~8h (max-parallel: 8 throttle) and then
took ~6h to build cold. By the time it freed the merge to run, quay's
GC had reaped the per-arch digests pushed by the fast multi-arch legs
the day before.

Fix: split the linux backend matrix in two.

  backend-jobs-multiarch  - entries with `platform-tag` set (paired
    per-arch legs that feed backend-merge-jobs).
  backend-jobs-singlearch - entries without `platform-tag` (heavy
    standalone builds: CUDA, ROCm, Intel oneAPI, vLLM, sglang, etc.).

backend-merge-jobs now `needs:` only backend-jobs-multiarch. The
multi-arch matrix completes in ~2-3h, well inside quay's GC window.
Heavy single-arch entries keep running independently with no merge
dependency.

scripts/changed-backends.js gains a splitByArch() helper that
partitions filtered entries by whether `platform-tag` is set, and
emits matrix-singlearch + matrix-multiarch + has-backends-singlearch
+ has-backends-multiarch outputs (replacing the previous combined
matrix / has-backends pair). Applied in both the full-matrix and
filtered-matrix code paths. Smoke test: 199 single-arch + 72
multi-arch = 271 linux entries, plus 35 darwin; 36 merge-matrix
entries (one per multi-arch backend pair). Matches expectation.

Local `make backends/<name>` is unaffected — the script's outputs
only feed CI workflow matrices.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler merged commit 35f6db8 into master May 10, 2026
52 of 53 checks passed
mudler deleted the ci/split-backend-matrix branch May 10, 2026 16:15
mudler added a commit that referenced this pull request May 11, 2026
backend_build.yml pushes by canonical digest only (push-by-digest=true,
no tags applied at build time). User-facing tagging happens in
backend_merge.yml's `imagetools create` step. Before this commit,
scripts/changed-backends.js emitted a merge entry only for tag-suffixes
with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python
images, vLLM, sglang, transformers, diffusers, ...) pushed its digest
untagged and stayed that way until quay's GC reaped it. Symptom: tag
releases shipped multi-arch backends tagged correctly, but no
v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared
in the registry.

Changes:

- scripts/changed-backends.js drops the `group.length < 2` skip and
  emits two merge matrices, one per arch class, so each downstream
  merge job can `needs:` only its corresponding build matrix.
- backend.yml splits backend-merge-jobs into multiarch and singlearch
  variants. The split preserves PR #9746's fix: slow singlearch CUDA
  builds (~6h) must not gate multiarch merges, or quay's GC reaps the
  multiarch per-arch digests before they're tagged.
- backend_pr.yml mirrors the split.
- backend_build.yml renames the digest artifact from
  `digests<suffix>-<platform-tag>` to
  `digests<suffix>--<platform-tag-or-"single">`. The `--` separator
  prevents the merge-side glob from over-matching sibling backends
  whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs
  -cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder
  keeps the name well-formed when platform-tag is empty.
- backend_merge.yml updates the download pattern to match.
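The prefix over-match that the `--` separator avoids can be shown with a minimal `*`-only glob matcher. This is a sketch under assumptions: the download patterns and the `amd64` platform tag are illustrative, and the matcher is a stand-in for whatever matching the artifact download step actually performs.

```javascript
// Minimal glob matcher supporting only `*` (an illustrative stand-in).
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

const matches = (glob, names) =>
  names.filter((name) => globToRegExp(glob).test(name));

// Old naming: digests<suffix>-<platform-tag>
const oldNames = ["digests-cpu-vllm-amd64", "digests-cpu-vllm-omni-amd64"];
// New naming: digests<suffix>--<platform-tag-or-"single">
const newNames = ["digests-cpu-vllm--amd64", "digests-cpu-vllm-omni--amd64"];

const oldHits = matches("digests-cpu-vllm-*", oldNames);
const newHits = matches("digests-cpu-vllm--*", newNames);

console.log(oldHits); // both names: the -cpu-vllm-omni artifact is over-matched
console.log(newHits); // only "digests-cpu-vllm--amd64"
```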

Verified locally: a tag-push event now expands to 36 multiarch merge
entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one
per singleton, including -gpu-nvidia-cuda-12-vllm at index 24).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
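The merge-matrix change in the commit above (dropping the `group.length < 2` skip and emitting one merge matrix per arch class) could be sketched as follows. The function name, the entry shape, and the rule used to classify a group are assumptions for illustration, not the code in scripts/changed-backends.js.

```javascript
// Hypothetical sketch: group build entries by tag-suffix and emit one merge
// entry per group, split by arch class.
function buildMergeMatrices(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = entry["tag-suffix"];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }

  const multiarch = [];
  const singlearch = [];
  for (const [suffix, group] of groups) {
    // No `group.length < 2` skip: singleton groups also get a merge entry,
    // so their digests are tagged instead of staying untagged.
    const isMultiarch = group.some((e) => e["platform-tag"]);
    (isMultiarch ? multiarch : singlearch).push({ "tag-suffix": suffix });
  }
  return { multiarch, singlearch };
}

// Illustrative input: one two-leg backend and one singleton.
const merge = buildMergeMatrices([
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "amd64" },
  { "tag-suffix": "-cpu-llama-cpp", "platform-tag": "arm64" },
  { "tag-suffix": "-gpu-nvidia-cuda-12-vllm" },
]);

console.log(merge.multiarch.length);  // 1
console.log(merge.singlearch.length); // 1
```

Keeping the two lists separate lets each merge job depend only on its own build matrix, which is what preserves the fix from this PR.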