
ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance)#9730

Merged

mudler merged 8 commits into master from ci/finish-migration on May 9, 2026
Conversation

@localai-bot
Collaborator

Summary

Final consolidated migration PR. Eight commits, scoped for revertability.

| # | Commit | Phase | Scope |
|---|--------|-------|-------|
| 1 | ba7771b3 | 2.6 | `image_build.yml` refactor + new `image_merge.yml` (with `-core` placeholder bug fix) |
| 2 | ad490623 | 2.5 | Backend matrix fan-out: 34 multi-arch entries → 68 per-arch entries |
| 3 | 26096ee8 | 2.6 | `image.yml` + `image-pr.yml` splits + 2 image merge jobs |
| 4 | 7d436aa3 | 4 | Retire 5 single-arch bigger-runner entries → ubuntu-latest |
| 5 | 24bf7c5c | 5.1 | 9 Intel oneAPI backends → ubuntu-latest |
| 6 | f1c4d669 | 5.2 | 15 ROCm Python backends → ubuntu-latest |
| 7 | c36435e6 | 5.3 | 6 CUDA Python backends (vLLM/sglang) → ubuntu-latest |
| 8 | 37c2a05b | fix | `provenance: false` on multi-registry digest push (root-caused on master via #9727 pilot) |

Critical fix in this PR (commit 8)

The pilot in #9727 surfaced a real bug: `docker/build-push-action@v7` with `push-by-digest=true` to two registries generates a per-registry provenance attestation, so the manifest-list digest diverges across registries. `steps.build.outputs.digest` then matches only one of them, and the merge step's `imagetools create <reg>@sha256:<digest>` fails on the other. Concrete failure observed in run 25581983094 / job 75110021491: the arm64 build of `-cpu-faster-whisper` produced quay manifest list `66b4cfc8…` and dockerhub manifest list `e0733c3b…`; the merge tried `quay.io/...@sha256:e0733c3b...` and got "not found".

Setting `provenance: false` on the "Build and push by digest" step makes the manifest-list digest content-only and identical across registries.
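As a sketch, the shape of the fixed step (step name, image names, and matrix inputs are illustrative, not copied from the repo's workflow; the load-bearing lines are `provenance: false` and `push-by-digest=true`):

```yaml
# Hypothetical fragment of backend_build.yml / image_build.yml.
- name: Build and push by digest
  id: build
  uses: docker/build-push-action@v7
  with:
    platforms: ${{ matrix.platform }}
    # No per-registry attestation manifest, so the manifest-list digest
    # is content-only and identical on quay and dockerhub.
    provenance: false
    outputs: type=image,"name=quay.io/go-skynet/local-ai,localai/localai",push-by-digest=true,name-canonical=true,push=true
```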

State after merge

  • Zero arc-runner-set references in any workflow / matrix file.
  • Zero bigger-runner references in .github/backend-matrix.yml. (One residual reference in test-extra.yml for the vibevoice-cpp transcription test — out of scope here, separate concern.)
  • All 36 multi-arch backends build natively per arch (amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm) and merge into a single tagged manifest list.
  • The 4 multi-arch backends previously on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) also flipped to free tier — Phase 4 work bundled with their split.
  • LocalAI server image's core and -gpu-vulkan builds are now per-arch native + manifest-merge.

Test plan

  • A master push after merge runs: per-arch backend builds for any touched backend → backend-merge-jobs matrix → final tagged manifest list. `imagetools inspect` on the result shows both platforms.
  • Touch one Intel oneAPI backend (Phase 5.1, lowest risk) → confirm it builds on ubuntu-latest with setup-build-disk /mnt headroom.
  • Touch one ROCm backend (Phase 5.2) → same.
  • Touch a CUDA vLLM/sglang backend (Phase 5.3, highest risk) → watch peak df -h /mnt during flash-attn build. If disk runs out, revert just commit 7 (c36435e6) and keep the rest.
  • Sunday weekly cron rebuilds full matrix without regression.

Revert plan

Each Phase 4/5 commit (4–7) is one runs-on: flip per backend batch. If a specific batch fails on master, that commit is the unit of revert. The infrastructure commits (1–3, 8) are foundational; the migration commits (4–7) carry the risk.

🤖 Generated with Claude Code
Assisted-by: Claude:claude-opus-4-7

mudler added 8 commits May 8, 2026 23:16
Mirror the backend_build.yml + backend_merge.yml pattern shipped in
PR #9726 for the LocalAI server image:

- image_build.yml accepts optional platform-tag (default ''), scopes
  registry cache to cache-localai<suffix>-<platform-tag>, and pushes
  by canonical digest only on push events. Digests upload as artifacts
  named digests-localai<suffix>-<platform-tag>, with a "-core"
  placeholder when tag-suffix is empty so the merge job's download
  pattern doesn't over-match across multiple suffixes.
- image_merge.yml is a new reusable workflow that downloads matching
  digest artifacts and assembles the final tagged manifest list via
  docker buildx imagetools create.

Image names differ from backend_*.yml: the LocalAI server is published
under quay.io/go-skynet/local-ai and localai/localai (not -backends).

Not yet wired into image.yml / image-pr.yml — Commit C does that.
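The download-and-merge pattern described above can be sketched as follows (artifact names follow the convention in this commit message; the exact job is assumed, not quoted from image_merge.yml):

```yaml
# Hypothetical merge-job fragment. Each digests-localai-core-* artifact
# contains files named after an image digest; imagetools create stitches
# them into one tagged manifest list.
- name: Download digests
  uses: actions/download-artifact@v4
  with:
    pattern: digests-localai-core-*
    path: /tmp/digests
    merge-multiple: true
- name: Create manifest list and push
  working-directory: /tmp/digests
  run: |
    docker buildx imagetools create -t quay.io/go-skynet/local-ai:master \
      $(printf 'quay.io/go-skynet/local-ai@sha256:%s ' *)
```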

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Convert all remaining linux/amd64,linux/arm64 entries in
backend-matrix.yml to per-arch + manifest-merge form. Each was a
single matrix entry running both arches on x86 under QEMU emulation;
each becomes two entries — amd64 on ubuntu-latest, arm64 on
ubuntu-24.04-arm (native).

Four backends that were on bigger-runner (-cpu-llama-cpp,
-cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have
both legs moved to free tier as part of the same change. They are
compile-only (no torch/CUDA install) and fit comfortably with the
setup-build-disk /mnt relocation. Phase 4 (next commit) retires the
remaining 5 single-arch bigger-runner entries.

After this commit:
- 271 total matrix entries (was 237)
- 0 multi-arch entries left
- 36 per-arch pairs (34 new + 2 pilots from PR #9727)
- 5 bigger-runner entries remaining (single-arch, Phase 4 target)
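The shape of the fan-out, shown on a hypothetical entry (field names assumed to match .github/backend-matrix.yml):

```yaml
# Before: one entry, both arches built on x86 under QEMU emulation
- backend: "faster-whisper"
  platforms: "linux/amd64,linux/arm64"
  runs-on: "ubuntu-latest"

# After: two native legs, merged later into a single manifest list
- backend: "faster-whisper"
  platforms: "linux/amd64"
  runs-on: "ubuntu-latest"
- backend: "faster-whisper"
  platforms: "linux/arm64"
  runs-on: "ubuntu-24.04-arm"
```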

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Mirror the backend per-arch split for the main LocalAI image:

- image.yml's core-image-build matrix: split the core ('') and
  -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on
  ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
- New top-level core-image-merge and gpu-vulkan-image-merge jobs
  call image_merge.yml after core-image-build completes.
- image-pr.yml's image-build matrix: split the -vulkan-core entry.
  No merge job added on the PR side — image_build.yml's digest-push
  is push-only-event-gated, so a PR-side merge would have nothing
  to download.

After this commit, no workflow file references
linux/amd64,linux/arm64 in a single matrix slot.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Migrate the remaining 5 single-arch bigger-runner entries to
ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt
relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of
working space — enough for ROCm dev image (~16 GB), CUDA toolkit
(~5 GB), and the per-backend compile/install steps these entries do.
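For reference, the general shape of a /mnt relocation on free-tier runners looks like the sketch below (the repo's actual setup-build-disk action may differ in detail):

```yaml
# Hypothetical step: point Docker's data-root at the larger /mnt
# ephemeral disk that GitHub-hosted runners expose.
- name: Relocate Docker data-root to /mnt
  run: |
    sudo systemctl stop docker
    sudo mkdir -p /mnt/docker
    echo '{ "data-root": "/mnt/docker" }' | sudo tee /etc/docker/daemon.json
    sudo systemctl start docker
    df -h /mnt   # confirm working-space headroom before heavy builds
```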

Backends migrated:
- -gpu-nvidia-cuda-12-llama-cpp
- -gpu-nvidia-cuda-12-turboquant
- -gpu-rocm-hipblas-faster-whisper
- -gpu-rocm-hipblas-coqui
- -cpu-ik-llama-cpp

After this commit, .github/backend-matrix.yml has zero bigger-runner
references. The bigger-runner used in tests-vibevoice-cpp-grpc-
transcription (test-extra.yml) is a separate concern handled in a
follow-up.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Intel oneAPI base image is ~6 GB; each backend's wheel install
stays well within the ~100 GB working space provided by Phase 3's
setup-build-disk /mnt relocation. Lowest-risk batch of the
arc-runner-set retirement.

Backends migrated:
  vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts,
  fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
ROCm dev image (~16 GB) plus per-backend torch/wheels install fits
on ubuntu-latest with the /mnt-relocated Docker root. These entries
include the heavier vLLM/sglang/transformers/diffusers stack on
ROCm; if any specific backend OOMs or runs out of disk, individual
flips back to arc-runner-set are revertable per-entry.

Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on
arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/
ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/
voxcpm/pocket-tts/neutts).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest
backends in the matrix — flash-attn intermediate layers can spike
disk usage during build. setup-build-disk's /mnt relocation gives
~100 GB working space which fits the documented peak.

Highest-risk batch of the arc-runner-set retirement; if any
backend fails to build on free tier, the per-entry runs-on flip
is the unit of revert.

Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.

After this commit, .github/backend-matrix.yml has zero references
to arc-runner-set or bigger-runner. The migration is complete.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7
pushes a single build to TWO registries simultaneously with
push-by-digest=true, buildx generates a per-registry provenance
attestation manifest (because mode=max — the default for push:true —
includes the runner ID). That makes the resulting manifest-list digest
diverge across registries:

  arm64 -cpu-faster-whisper build:
    image manifest:          sha256:d3bdd34b... (identical, content-only)
    quay manifest list:      sha256:66b4cfc8... (with quay attestation)
    dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)

steps.build.outputs.digest returns only one of the list digests
(empirically the dockerhub one). The merge job then asks
"quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that
list has digest 66b4cfc8 there. Result: imagetools create fails with
"not found" and the merge job fails (run 25581983094, job 75110021491).

Setting provenance: false drops the per-registry attestation; the
manifest-list digest becomes pure content, identical across both
registries, and steps.build.outputs.digest works on either lookup.
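Concretely, the failure mode looks like this (digests from the run above; the backend image name is assumed, and the elided digests are not runnable as-is):

```shell
# With per-registry attestations, the two registries hold DIFFERENT
# manifest-list digests, so the single digest output can only resolve
# against one of them:
docker buildx imagetools create \
  -t quay.io/go-skynet/local-ai-backends:master-cpu-faster-whisper \
  quay.io/go-skynet/local-ai-backends@sha256:e0733c3b...
# -> "not found" on quay, whose copy of the list is sha256:66b4cfc8...
# With provenance: false, both registries hold the same content-only
# digest and either registry@digest lookup resolves.
```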

Applied to backend_build.yml and image_build.yml — both refactored
to use the same multi-registry digest-push pattern in the prior PRs.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler merged commit f0374aa into master on May 9, 2026
52 checks passed
mudler deleted the ci/finish-migration branch on May 9, 2026 at 07:37
localai-bot added the enhancement (New feature or request) label on May 9, 2026