ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization (#9727)
Merged
Convert two backends from QEMU-emulated multi-arch (linux/amd64,linux/arm64
on a single ubuntu-latest) to native per-arch + manifest-list merge:
- amd64 leg on ubuntu-latest
- arm64 leg on ubuntu-24.04-arm (native, ~5-10x faster than emulated)
- merge job assembles both digests under the final tag via
docker buildx imagetools create
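In effect, the merge job runs something like the following (a sketch only; the image name matches the repo's published backends image, the tag follows the pilot backend, and the digest values are illustrative placeholders):

```shell
# Assemble the final tag as a manifest list from the two per-arch digests.
# <amd64-digest> / <arm64-digest> come from the per-leg digest artifacts.
IMAGE=quay.io/go-skynet/local-ai-backends
TAG=master-cpu-faster-whisper

docker buildx imagetools create \
  --tag "${IMAGE}:${TAG}" \
  "${IMAGE}@sha256:<amd64-digest>" \
  "${IMAGE}@sha256:<arm64-digest>"
```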
Backends piloted:
- -cpu-faster-whisper (small Python, fast baseline)
- -cpu-llama-cpp-quantization (heavier compile path, stress test)
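A minimal sketch of what the split looks like in `.github/backend-matrix.yml` (field names other than `platform-tag` are illustrative, not necessarily the exact schema):

```yaml
# Before: one emulated multi-arch entry
- tag-suffix: "-cpu-faster-whisper"
  runs-on: "ubuntu-latest"
  platforms: "linux/amd64,linux/arm64"

# After: two native per-arch entries, later merged by tag-suffix
- tag-suffix: "-cpu-faster-whisper"
  runs-on: "ubuntu-latest"
  platforms: "linux/amd64"
  platform-tag: "amd64"
- tag-suffix: "-cpu-faster-whisper"
  runs-on: "ubuntu-24.04-arm"
  platforms: "linux/arm64"
  platform-tag: "arm64"
```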
Infrastructure changes that the rest of Phase 2 (Tasks 2.5+) will reuse:
- .github/backend-matrix.yml entries gain a `platform-tag` field
('amd64'/'arm64') for matrix entries that participate in the split.
Other entries omit it; backend_build.yml already defaults missing
values to '' (empty cache key suffix preserved as cache<suffix>-).
- backend.yml + backend_pr.yml forward `platform-tag` from matrix to
the reusable backend_build.yml.
- scripts/changed-backends.js groups filtered entries by tag-suffix
and emits a `merge-matrix` (plus `has-merges`) for groups of size>=2.
Singletons aren't merged.
- backend.yml + backend_pr.yml gain a `backend-merge-jobs` job that
consumes merge-matrix and calls backend_merge.yml after backend-jobs.
PR variant is also event-gated so the no-op-on-PR merge job doesn't
even start.
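The grouping step in scripts/changed-backends.js can be sketched roughly like this (an illustrative reimplementation of the behavior described above, not the exact helper):

```javascript
// Group per-arch entries by tag-suffix and emit a merge-matrix entry only
// for groups of two or more legs; singletons need no manifest merge.
function computeMergeMatrix(entries) {
  const groups = new Map();
  for (const e of entries) {
    if (!e['platform-tag']) continue; // only split entries participate
    const key = e['tag-suffix'];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(e);
  }
  const merges = [];
  for (const [suffix, legs] of groups) {
    if (legs.length < 2) continue; // singleton: nothing to merge
    const latestValues = new Set(legs.map((l) => l['tag-latest']));
    if (latestValues.size > 1) {
      // Cheap insurance for the upcoming 34-backend fan-out.
      console.warn(`tag-latest disagrees across legs for ${suffix}`);
    }
    merges.push({ 'tag-suffix': suffix, 'tag-latest': legs[0]['tag-latest'] });
  }
  return merges;
}
```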
The other 34 multi-arch entries are unchanged in this PR -- Task 2.5
fans out the same shape to them once the pilot is observed green.
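The job-level event gate on the PR-side merge job might look like the following (a sketch; the job and output names follow the description above, the remaining keys are illustrative):

```yaml
backend-merge-jobs:
  # Gate at the job level: on pull_request the matrix never instantiates.
  if: github.event_name != 'pull_request'
  needs: [generate-matrix, backend-jobs]
  strategy:
    matrix:
      include: ${{ fromJson(needs.generate-matrix.outputs.merge-matrix) }}
  uses: ./.github/workflows/backend_merge.yml
```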
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Owner comment (mudler): can only test on master

mudler added a commit that referenced this pull request on May 8, 2026:
The PR that introduced the per-arch + manifest-merge pilot (#9727) only touched CI infrastructure files, so the path filter correctly skipped backend builds on its merge commit. To observe the new backend-merge-jobs flow assemble a real manifest list, this commit touches faster-whisper's Makefile so its two new per-arch entries schedule and the merge job runs. The trailing comment is the smallest possible diff and is harmless to the build.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit that referenced this pull request on May 9, 2026:
…etire self-hosted, fix provenance) (#9730)

* ci: add per-arch + manifest-merge support for LocalAI server image

  Mirror the backend_build.yml + backend_merge.yml pattern shipped in PR #9726 for the LocalAI server image:

  - image_build.yml accepts optional platform-tag (default ''), scopes registry cache to cache-localai<suffix>-<platform-tag>, and pushes by canonical digest only on push events. Digests upload as artifacts named digests-localai<suffix>-<platform-tag>, with a "-core" placeholder when tag-suffix is empty so the merge job's download pattern doesn't over-match across multiple suffixes.
  - image_merge.yml is a new reusable workflow that downloads matching digest artifacts and assembles the final tagged manifest list via docker buildx imagetools create. Image names differ from backend_*.yml: the LocalAI server is published under quay.io/go-skynet/local-ai and localai/localai (not -backends).

  Not yet wired into image.yml / image-pr.yml — Commit C does that.

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fan out per-arch split to remaining 34 backends

  Convert all remaining linux/amd64,linux/arm64 entries in backend-matrix.yml to per-arch + manifest-merge form. Each was a single matrix entry running both arches on x86 under QEMU emulation; each becomes two entries — amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native).

  Four backends that were on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have both legs moved to free tier as part of the same change. They are compile-only (no torch/CUDA install) and fit comfortably with the setup-build-disk /mnt relocation. Phase 4 (next commit) retires the remaining 5 single-arch bigger-runner entries.

  After this commit:
  - 271 total matrix entries (was 237)
  - 0 multi-arch entries left
  - 36 per-arch pairs (34 new + 2 pilots from PR #9727)
  - 5 bigger-runner entries remaining (single-arch, Phase 4 target)

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: split LocalAI image multi-arch entries per arch + merge

  Mirror the backend per-arch split for the main LocalAI image:

  - image.yml's core-image-build matrix: split the core ('') and -gpu-vulkan entries into amd64 + arm64 legs each; amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
  - New top-level core-image-merge and gpu-vulkan-image-merge jobs call image_merge.yml after core-image-build completes.
  - image-pr.yml's image-build matrix: split the -vulkan-core entry. No merge job added on the PR side — image_build.yml's digest-push is push-only-event-gated, so a PR-side merge would have nothing to download.

  After this commit, no workflow file references linux/amd64,linux/arm64 in a single matrix slot.

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: retire bigger-runner from backend matrix (Phase 4)

  Migrate the remaining 5 single-arch bigger-runner entries to ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of working space — enough for the ROCm dev image (~16 GB), the CUDA toolkit (~5 GB), and the per-backend compile/install steps these entries do.

  Backends migrated:
  - -gpu-nvidia-cuda-12-llama-cpp
  - -gpu-nvidia-cuda-12-turboquant
  - -gpu-rocm-hipblas-faster-whisper
  - -gpu-rocm-hipblas-coqui
  - -cpu-ik-llama-cpp

  After this commit, .github/backend-matrix.yml has zero bigger-runner references. The bigger-runner used in tests-vibevoice-cpp-grpc-transcription (test-extra.yml) is a separate concern handled in a follow-up.

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1)

  The Intel oneAPI base image is ~6 GB; each backend's wheel install stays well within the ~100 GB working space provided by Phase 3's setup-build-disk /mnt relocation. Lowest-risk batch of the arc-runner-set retirement.

  Backends migrated: vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts, fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 15 ROCm Python backends to free tier (Phase 5.2)

  The ROCm dev image (~16 GB) plus the per-backend torch/wheels install fits on ubuntu-latest with the /mnt-relocated Docker root. These entries include the heavier vLLM/sglang/transformers/diffusers stack on ROCm; if any specific backend OOMs or runs out of disk, individual flips back to arc-runner-set are revertable per entry.

  Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/voxcpm/pocket-tts/neutts).

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 6 CUDA Python backends to free tier (Phase 5.3)

  The vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest backends in the matrix — flash-attn intermediate layers can spike disk usage during build. setup-build-disk's /mnt relocation gives ~100 GB of working space, which fits the documented peak. Highest-risk batch of the arc-runner-set retirement; if any backend fails to build on free tier, the per-entry runs-on flip is the unit of revert.

  Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.

  After this commit, .github/backend-matrix.yml has zero references to arc-runner-set or bigger-runner. The migration is complete.

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: disable provenance on multi-registry digest pushes

  Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7 pushes a single build to TWO registries simultaneously with push-by-digest=true, buildx generates a per-registry provenance attestation manifest (because mode=max — the default for push:true — includes the runner ID). That makes the resulting manifest-list digest diverge across registries. For the arm64 -cpu-faster-whisper build:

  - image manifest: sha256:d3bdd34b... (identical, content-only)
  - quay manifest list: sha256:66b4cfc8... (with quay attestation)
  - dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)

  steps.build.outputs.digest returns only one of the list digests (empirically the dockerhub one). The merge job then asks for "quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that list has digest 66b4cfc8 there. Result: imagetools create fails with "not found" and the merge job fails (run 25581983094, job 75110021491).

  Setting provenance: false drops the per-registry attestation; the manifest-list digest becomes pure content, identical across both registries, and steps.build.outputs.digest works on either lookup. Applied to backend_build.yml and image_build.yml — both refactored to use the same multi-registry digest-push pattern in the prior PRs.

  Assisted-by: Claude:claude-opus-4-7
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

--------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
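The provenance fix described in the last commit above boils down to a single input on the build step. A sketch of the relevant workflow fragment (other inputs omitted; step name and matrix reference are illustrative):

```yaml
- name: Build and push by digest
  uses: docker/build-push-action@v7
  with:
    platforms: ${{ matrix.platforms }}
    outputs: type=image,push-by-digest=true,push=true
    # Without this, buildx attaches a per-registry provenance attestation
    # and the manifest-list digest diverges between quay and dockerhub.
    provenance: false
```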
Summary
Pilots Phase 2.3 + 2.4 of the CI migration plan: convert two backends from QEMU-emulated multi-arch to native per-arch + manifest-list merge. This validates the split-and-merge pattern end-to-end on real CI before fanning out to the other 34 multi-arch entries (Task 2.5, follow-up PR).
What changes
Two pilot backends split (`.github/backend-matrix.yml`):

- `-cpu-faster-whisper` (small Python, fast baseline)
- `-cpu-llama-cpp-quantization` (heavier compile, stress test)

For each: the single `platforms: 'linux/amd64,linux/arm64'` matrix entry is replaced with two per-arch entries — amd64 leg on `ubuntu-latest`, arm64 leg on `ubuntu-24.04-arm` (native, ~5–10× faster than emulated). Each new entry carries `platform-tag: 'amd64' | 'arm64'`, which the previously-merged Phase 2.1 wires into `backend_build.yml` to scope the registry cache and the digest artifact name.

Merge-job infrastructure (reused by Task 2.5+):
- `.github/workflows/backend.yml` and `backend_pr.yml` forward `platform-tag` from matrix to `backend_build.yml`.
- A `backend-merge-jobs` job in both workflows consumes a `merge-matrix` output from `generate-matrix` and calls the existing `backend_merge.yml` (already shipped in PR #9726, "ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief)").
- `scripts/changed-backends.js` gains a `computeMergeMatrix(entries)` helper that groups filtered linux entries by `tag-suffix`, emits an entry only for groups of size ≥ 2, and warns if `tag-latest` disagrees across legs (cheap insurance for the 34-backend fan-out coming next).
- The merge job is gated on `github.event_name != 'pull_request'` so the no-op-on-PR run doesn't even start.

What's NOT in here (follow-ups)
- `image.yml` / `image-pr.yml` (3 multi-arch entries).
- Migrating `bigger-runner` and `arc-runner-set` jobs to free tier (depends on Phase 3 disk relief, already shipped).

Decisions worth flagging
- Singletons aren't merged: the `computeMergeMatrix` helper skips them.
- `tag-latest` mismatch guard: a cheap warning is surfaced if the two legs disagree on `tag-latest`. Won't fire today (the two pilot legs both say `'auto'`); future-proofs the 34-entry fan-out.
- The `if:` is at the job level (`github.event_name != 'pull_request'`), so the matrix doesn't even instantiate on PRs — saves a runner over relying on `backend_merge.yml`'s internal step gates.

Test plan
- On this PR: `backend_pr.yml`'s `generate-matrix` job runs and emits an empty `merge-matrix` (no backend dir touched), so `backend-merge-jobs` is correctly skipped via `has-merges == 'false'`.
- On master, a change under `backend/python/faster-whisper/` schedules:
  - two `backend_build.yml` jobs (amd64 + arm64 native), each pushing by digest under `digests-cpu-faster-whisper-amd64` / `-arm64`;
  - a `backend-merge-jobs` matrix entry for `-cpu-faster-whisper` that downloads both digests and runs `docker buildx imagetools create` to produce the final tagged manifest list.
- Verify that `docker buildx imagetools inspect quay.io/go-skynet/local-ai-backends:master-cpu-faster-whisper` shows two platforms (`linux/amd64`, `linux/arm64`).
- Verify that `faster-whisper` finishes faster than the previous emulated multi-arch run (compare wall-clock from before/after).
- Same checks for `-cpu-llama-cpp-quantization` (the heavier one).

Plan reference: `docs/superpowers/plans/2026-05-08-ci-migration-to-gha-free-tier.md` (uncommitted working artifact).

Assisted-by: Claude:claude-opus-4-7