
ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance)#9730

Merged

mudler merged 8 commits into master from ci/finish-migration on May 9, 2026
Conversation

@localai-bot
Collaborator

Summary

Final consolidated migration PR. Eight commits, scoped for revertability.

| # | Commit | Phase | Scope |
|---|--------|-------|-------|
| 1 | ba7771b3 | 2.6 | `image_build.yml` refactor + new `image_merge.yml` (with `-core` placeholder bug fix) |
| 2 | ad490623 | 2.5 | Backend matrix fan-out: 34 multi-arch entries → 68 per-arch entries |
| 3 | 26096ee8 | 2.6 | `image.yml` + `image-pr.yml` splits + 2 image merge jobs |
| 4 | 7d436aa3 | 4 | Retire 5 single-arch bigger-runner entries → ubuntu-latest |
| 5 | 24bf7c5c | 5.1 | 9 Intel oneAPI backends → ubuntu-latest |
| 6 | f1c4d669 | 5.2 | 15 ROCm Python backends → ubuntu-latest |
| 7 | c36435e6 | 5.3 | 6 CUDA Python backends (vLLM/sglang) → ubuntu-latest |
| 8 | 37c2a05b | fix | `provenance: false` on multi-registry digest push (root-caused on master via #9727 pilot) |

Critical fix in this PR (commit 8)

The pilot in #9727 surfaced a real bug: `docker/build-push-action@v7` with `push-by-digest=true` to two registries generates a per-registry provenance attestation, so the manifest-list digest diverges across registries. `steps.build.outputs.digest` then matches only one of them, and the merge step's `imagetools create <reg>@sha256:<digest>` fails on the other. Concrete failure observed in run 25581983094 / job 75110021491: the arm64 build of `-cpu-faster-whisper` produced quay manifest list `66b4cfc8…` and dockerhub manifest list `e0733c3b…`; the merge tried `quay.io/...@sha256:e0733c3b...` and got "not found".

Setting `provenance: false` on the "Build and push by digest" step makes the manifest-list digest content-only and identical across registries.
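As a sketch, the shape of the fixed step (step name, image names, and matrix inputs are illustrative, not copied from the repo's workflow; the load-bearing lines are `provenance: false` and `push-by-digest=true`):

```yaml
# Hypothetical fragment of backend_build.yml / image_build.yml.
- name: Build and push by digest
  id: build
  uses: docker/build-push-action@v7
  with:
    platforms: ${{ matrix.platform }}
    # No per-registry attestation manifest, so the manifest-list digest
    # is content-only and identical on quay and dockerhub.
    provenance: false
    outputs: type=image,"name=quay.io/go-skynet/local-ai,localai/localai",push-by-digest=true,name-canonical=true,push=true
```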

State after merge

  • Zero arc-runner-set references in any workflow / matrix file.
  • Zero bigger-runner references in .github/backend-matrix.yml. (One residual reference in test-extra.yml for the vibevoice-cpp transcription test — out of scope here, separate concern.)
  • All 36 multi-arch backends build natively per arch (amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm) and merge into a single tagged manifest list.
  • The 4 multi-arch backends previously on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) also flipped to free tier — Phase 4 work bundled with their split.
  • LocalAI server image's core and -gpu-vulkan builds are now per-arch native + manifest-merge.

Test plan

  • A master push after merge runs: per-arch backend builds for any touched backend → backend-merge-jobs matrix → final tagged manifest list. `imagetools inspect` on the result shows both platforms.
  • Touch one Intel oneAPI backend (Phase 5.1, lowest risk) → confirm it builds on ubuntu-latest with setup-build-disk /mnt headroom.
  • Touch one ROCm backend (Phase 5.2) → same.
  • Touch a CUDA vLLM/sglang backend (Phase 5.3, highest risk) → watch peak df -h /mnt during flash-attn build. If disk runs out, revert just commit 7 (c36435e6) and keep the rest.
  • Sunday weekly cron rebuilds full matrix without regression.

Revert plan

Each Phase 4/5 commit (4–7) is one runs-on: flip per backend batch. If a specific batch fails on master, that commit is the unit of revert. The infrastructure commits (1–3, 8) are foundational; the migration commits (4–7) carry the risk.

🤖 Generated with Claude Code
Assisted-by: Claude:claude-opus-4-7

mudler added 8 commits May 8, 2026 23:16
Mirror the backend_build.yml + backend_merge.yml pattern shipped in
PR #9726 for the LocalAI server image:

- image_build.yml accepts optional platform-tag (default ''), scopes
  registry cache to cache-localai<suffix>-<platform-tag>, and pushes
  by canonical digest only on push events. Digests upload as artifacts
  named digests-localai<suffix>-<platform-tag>, with a "-core"
  placeholder when tag-suffix is empty so the merge job's download
  pattern doesn't over-match across multiple suffixes.
- image_merge.yml is a new reusable workflow that downloads matching
  digest artifacts and assembles the final tagged manifest list via
  docker buildx imagetools create.

Image names differ from backend_*.yml: the LocalAI server is published
under quay.io/go-skynet/local-ai and localai/localai (not -backends).

Not yet wired into image.yml / image-pr.yml — Commit C does that.
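The download-and-merge pattern described above can be sketched as follows (artifact names follow the convention in this commit message; the exact job is assumed, not quoted from image_merge.yml):

```yaml
# Hypothetical merge-job fragment. Each digests-localai-core-* artifact
# contains files named after an image digest; imagetools create stitches
# them into one tagged manifest list.
- name: Download digests
  uses: actions/download-artifact@v4
  with:
    pattern: digests-localai-core-*
    path: /tmp/digests
    merge-multiple: true
- name: Create manifest list and push
  working-directory: /tmp/digests
  run: |
    docker buildx imagetools create -t quay.io/go-skynet/local-ai:master \
      $(printf 'quay.io/go-skynet/local-ai@sha256:%s ' *)
```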

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Convert all remaining linux/amd64,linux/arm64 entries in
backend-matrix.yml to per-arch + manifest-merge form. Each was a
single matrix entry running both arches on x86 under QEMU emulation;
each becomes two entries — amd64 on ubuntu-latest, arm64 on
ubuntu-24.04-arm (native).

Four backends that were on bigger-runner (-cpu-llama-cpp,
-cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have
both legs moved to free tier as part of the same change. They are
compile-only (no torch/CUDA install) and fit comfortably with the
setup-build-disk /mnt relocation. Phase 4 (next commit) retires the
remaining 5 single-arch bigger-runner entries.

After this commit:
- 271 total matrix entries (was 237)
- 0 multi-arch entries left
- 36 per-arch pairs (34 new + 2 pilots from PR #9727)
- 5 bigger-runner entries remaining (single-arch, Phase 4 target)
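The shape of the fan-out, shown on a hypothetical entry (field names assumed to match .github/backend-matrix.yml):

```yaml
# Before: one entry, both arches built on x86 under QEMU emulation
- backend: "faster-whisper"
  platforms: "linux/amd64,linux/arm64"
  runs-on: "ubuntu-latest"

# After: two native legs, merged later into a single manifest list
- backend: "faster-whisper"
  platforms: "linux/amd64"
  runs-on: "ubuntu-latest"
- backend: "faster-whisper"
  platforms: "linux/arm64"
  runs-on: "ubuntu-24.04-arm"
```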

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Mirror the backend per-arch split for the main LocalAI image:

- image.yml's core-image-build matrix: split the core ('') and
  -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on
  ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
- New top-level core-image-merge and gpu-vulkan-image-merge jobs
  call image_merge.yml after core-image-build completes.
- image-pr.yml's image-build matrix: split the -vulkan-core entry.
  No merge job added on the PR side — image_build.yml's digest-push
  is push-only-event-gated, so a PR-side merge would have nothing
  to download.

After this commit, no workflow file references
linux/amd64,linux/arm64 in a single matrix slot.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Migrate the remaining 5 single-arch bigger-runner entries to
ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt
relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of
working space — enough for ROCm dev image (~16 GB), CUDA toolkit
(~5 GB), and the per-backend compile/install steps these entries do.
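For reference, the general shape of a /mnt relocation on free-tier runners looks like the sketch below (the repo's actual setup-build-disk action may differ in detail):

```yaml
# Hypothetical step: point Docker's data-root at the larger /mnt
# ephemeral disk that GitHub-hosted runners expose.
- name: Relocate Docker data-root to /mnt
  run: |
    sudo systemctl stop docker
    sudo mkdir -p /mnt/docker
    echo '{ "data-root": "/mnt/docker" }' | sudo tee /etc/docker/daemon.json
    sudo systemctl start docker
    df -h /mnt   # confirm working-space headroom before heavy builds
```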

Backends migrated:
- -gpu-nvidia-cuda-12-llama-cpp
- -gpu-nvidia-cuda-12-turboquant
- -gpu-rocm-hipblas-faster-whisper
- -gpu-rocm-hipblas-coqui
- -cpu-ik-llama-cpp

After this commit, .github/backend-matrix.yml has zero bigger-runner
references. The bigger-runner used in tests-vibevoice-cpp-grpc-
transcription (test-extra.yml) is a separate concern handled in a
follow-up.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Intel oneAPI base image is ~6 GB; each backend's wheel install
stays well within the ~100 GB working space provided by Phase 3's
setup-build-disk /mnt relocation. Lowest-risk batch of the
arc-runner-set retirement.

Backends migrated:
  vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts,
  fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
ROCm dev image (~16 GB) plus per-backend torch/wheels install fits
on ubuntu-latest with the /mnt-relocated Docker root. These entries
include the heavier vLLM/sglang/transformers/diffusers stack on
ROCm; if any specific backend OOMs or runs out of disk, individual
flips back to arc-runner-set are revertable per-entry.

Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on
arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/
ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/
voxcpm/pocket-tts/neutts).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest
backends in the matrix — flash-attn intermediate layers can spike
disk usage during build. setup-build-disk's /mnt relocation gives
~100 GB working space which fits the documented peak.

Highest-risk batch of the arc-runner-set retirement; if any
backend fails to build on free tier, the per-entry runs-on flip
is the unit of revert.

Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.

After this commit, .github/backend-matrix.yml has zero references
to arc-runner-set or bigger-runner. The migration is complete.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7
pushes a single build to TWO registries simultaneously with
push-by-digest=true, buildx generates a per-registry provenance
attestation manifest (because mode=max — the default for push:true —
includes the runner ID). That makes the resulting manifest-list digest
diverge across registries:

  arm64 -cpu-faster-whisper build:
    image manifest:          sha256:d3bdd34b... (identical, content-only)
    quay manifest list:      sha256:66b4cfc8... (with quay attestation)
    dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)

steps.build.outputs.digest returns only one of the list digests
(empirically the dockerhub one). The merge job then asks
"quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that
list has digest 66b4cfc8 there. Result: imagetools create fails with
"not found" and the merge job fails (run 25581983094, job 75110021491).

Setting provenance: false drops the per-registry attestation; the
manifest-list digest becomes pure content, identical across both
registries, and steps.build.outputs.digest works on either lookup.
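Concretely, the failure mode looks like this (digests from the run above; the backend image name is assumed, and the elided digests are not runnable as-is):

```shell
# With per-registry attestations, the two registries hold DIFFERENT
# manifest-list digests, so the single digest output can only resolve
# against one of them:
docker buildx imagetools create \
  -t quay.io/go-skynet/local-ai-backends:master-cpu-faster-whisper \
  quay.io/go-skynet/local-ai-backends@sha256:e0733c3b...
# -> "not found" on quay, whose copy of the list is sha256:66b4cfc8...
# With provenance: false, both registries hold the same content-only
# digest and either registry@digest lookup resolves.
```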

Applied to backend_build.yml and image_build.yml — both refactored
to use the same multi-registry digest-push pattern in the prior PRs.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler merged commit f0374aa into master on May 9, 2026
52 checks passed
mudler deleted the ci/finish-migration branch on May 9, 2026 at 07:37
localai-bot added the enhancement (New feature or request) label on May 9, 2026