
ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization#9727

Merged
mudler merged 1 commit into master from ci/per-arch-split-pilot on May 8, 2026

Conversation

@localai-bot (Collaborator)

Summary

Pilots Phase 2.3 + 2.4 of the CI migration plan: convert two backends from QEMU-emulated multi-arch to native per-arch + manifest-list merge. This validates the split-and-merge pattern end-to-end on real CI before fanning out to the other 34 multi-arch entries (Task 2.5, follow-up PR).

What changes

Two pilot backends split (.github/backend-matrix.yml):

  • -cpu-faster-whisper (small Python, fast baseline)
  • -cpu-llama-cpp-quantization (heavier compile, stress test)

For each, the single platforms: 'linux/amd64,linux/arm64' matrix entry is replaced with two per-arch entries: the amd64 leg on ubuntu-latest and the arm64 leg on ubuntu-24.04-arm (native, ~5–10× faster than emulated). Each new entry carries platform-tag: 'amd64' | 'arm64', which the previously merged Phase 2.1 plumbing wires into backend_build.yml to scope the registry cache and the digest artifact name.
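Illustratively, the split replaces one matrix entry with two. This is a sketch, not the literal file contents; field names beyond platforms/platform-tag (tag-suffix, runs-on) are assumptions based on the description above:

```yaml
# Before: one QEMU-emulated multi-arch entry
- tag-suffix: '-cpu-faster-whisper'
  platforms: 'linux/amd64,linux/arm64'
  runs-on: 'ubuntu-latest'

# After: two native per-arch entries, merged later by backend_merge.yml
- tag-suffix: '-cpu-faster-whisper'
  platforms: 'linux/amd64'
  platform-tag: 'amd64'
  runs-on: 'ubuntu-latest'
- tag-suffix: '-cpu-faster-whisper'
  platforms: 'linux/arm64'
  platform-tag: 'arm64'
  runs-on: 'ubuntu-24.04-arm'
```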

Merge-job infrastructure (reused by Task 2.5+):

  • .github/workflows/backend.yml and backend_pr.yml forward platform-tag from matrix to backend_build.yml.
  • A new backend-merge-jobs job in both workflows consumes a merge-matrix output from generate-matrix and calls the existing backend_merge.yml (already shipped in PR #9726, "ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief)").
  • scripts/changed-backends.js gains a computeMergeMatrix(entries) helper that groups filtered linux entries by tag-suffix, emits an entry only for groups of size ≥ 2, and warns if tag-latest disagrees across legs (cheap insurance for the 34-backend fan-out coming next).
  • PR-side merge job is also event-gated on github.event_name != 'pull_request' so the no-op-on-PR run doesn't even start.
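The grouping helper described above can be sketched as follows. This is a hypothetical reconstruction from the description, not the actual contents of scripts/changed-backends.js; field names and output shape are assumptions:

```javascript
// Sketch of computeMergeMatrix: group filtered linux entries by
// tag-suffix, emit a merge entry only for groups of size >= 2, and
// warn when legs of a group disagree on tag-latest.
function computeMergeMatrix(entries) {
  const groups = new Map();
  for (const e of entries) {
    if (!groups.has(e['tag-suffix'])) groups.set(e['tag-suffix'], []);
    groups.get(e['tag-suffix']).push(e);
  }

  const merge = [];
  for (const [suffix, legs] of groups) {
    // Singletons push by digest only and need no manifest list.
    if (legs.length < 2) continue;
    // Cheap insurance for the 34-backend fan-out: surface mismatches.
    const latest = new Set(legs.map((l) => l['tag-latest']));
    if (latest.size > 1) {
      console.warn(`tag-latest mismatch for ${suffix}: ${[...latest].join(', ')}`);
    }
    merge.push({ 'tag-suffix': suffix, 'tag-latest': legs[0]['tag-latest'] });
  }
  return { 'merge-matrix': merge, 'has-merges': merge.length > 0 };
}
```

With the two pilot legs plus an untouched single-arch entry as input, only the pilot pair would produce a merge entry.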

What's NOT in here (follow-ups)

  • Task 2.5: fan out the same shape to the other 34 multi-arch entries.
  • Task 2.6: same pattern for image.yml / image-pr.yml (3 multi-arch entries).
  • Phases 4–5: migrate bigger-runner and arc-runner-set jobs to free tier (depends on Phase 3 disk relief, already shipped).

Decisions worth flagging

  • Singletons not merged: backends with a single matrix entry (single-arch) push by digest only and don't need a manifest list. The computeMergeMatrix helper skips them.
  • tag-latest mismatch guard: cheap warning surfaced if the two legs disagree on tag-latest. Won't fire today (the two pilot legs both say 'auto'); future-proofs the 34-entry fan-out.
  • PR variant gating: the merge job's if: is at the job level (github.event_name != 'pull_request'), so the matrix doesn't even instantiate on PRs — saves a runner over relying on backend_merge.yml's internal step gates.
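The job-level gate described above looks roughly like this in the PR workflow (job names, matrix plumbing, and inputs are illustrative assumptions, not the literal workflow contents):

```yaml
backend-merge-jobs:
  # Job-level gate: on pull requests the matrix never instantiates,
  # saving a runner versus backend_merge.yml's internal step gates.
  if: ${{ github.event_name != 'pull_request' }}
  needs: [generate-matrix, backend-jobs]
  strategy:
    matrix:
      include: ${{ fromJSON(needs.generate-matrix.outputs.merge-matrix) }}
  uses: ./.github/workflows/backend_merge.yml
```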

Test plan

  • On this PR, the backend_pr.yml generate-matrix job runs and emits an empty merge-matrix (no backend dir touched), so backend-merge-jobs is correctly skipped via has-merges == 'false'.
  • After merge, the first push that touches backend/python/faster-whisper/ schedules:
    • 2 per-arch backend_build.yml jobs (amd64 + arm64 native), each pushing by digest under digests-cpu-faster-whisper-amd64 / -arm64.
    • 1 backend-merge-jobs matrix entry for -cpu-faster-whisper that downloads both digests and runs docker buildx imagetools create to produce the final tagged manifest list.
  • docker buildx imagetools inspect quay.io/go-skynet/local-ai-backends:master-cpu-faster-whisper shows two platforms (linux/amd64, linux/arm64).
  • arm64 native build of faster-whisper finishes faster than the previous emulated multi-arch run (compare wall-clock from before/after).
  • Same checks for -cpu-llama-cpp-quantization (the heavier one).
  • The weekly Sunday cron (added in PR #9726, "ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief)") still rebuilds the full matrix, and the merge-matrix correctly contains the two pilots.

Plan reference: docs/superpowers/plans/2026-05-08-ci-migration-to-gha-free-tier.md (uncommitted working artifact).

Assisted-by: Claude:claude-opus-4-7

Convert two backends from QEMU-emulated multi-arch (linux/amd64,linux/arm64
on a single ubuntu-latest) to native per-arch + manifest-list merge:
- amd64 leg on ubuntu-latest
- arm64 leg on ubuntu-24.04-arm (native, ~5-10x faster than emulated)
- merge job assembles both digests under the final tag via
  docker buildx imagetools create

Backends piloted:
- -cpu-faster-whisper (small Python, fast baseline)
- -cpu-llama-cpp-quantization (heavier compile path, stress test)

Infrastructure changes that the rest of Phase 2 (Tasks 2.5+) will reuse:
- .github/backend-matrix.yml entries gain a `platform-tag` field
  ('amd64'/'arm64') for matrix entries that participate in the split.
  Other entries omit it; backend_build.yml already defaults missing
  values to '' (empty cache key suffix preserved as cache<suffix>-).
- backend.yml + backend_pr.yml forward `platform-tag` from matrix to
  the reusable backend_build.yml.
- scripts/changed-backends.js groups filtered entries by tag-suffix
  and emits a `merge-matrix` (plus `has-merges`) for groups of size>=2.
  Singletons aren't merged.
- backend.yml + backend_pr.yml gain a `backend-merge-jobs` job that
  consumes merge-matrix and calls backend_merge.yml after backend-jobs.
  PR variant is also event-gated so the no-op-on-PR merge job doesn't
  even start.

The other 34 multi-arch entries are unchanged in this PR -- Task 2.5
fans out the same shape to them once the pilot is observed green.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler (Owner)

mudler commented May 8, 2026

can only test on master

@mudler mudler merged commit cb68cd1 into master May 8, 2026
51 checks passed
@mudler mudler deleted the ci/per-arch-split-pilot branch May 8, 2026 22:04
mudler added a commit that referenced this pull request May 8, 2026
The PR that introduced the per-arch + manifest-merge pilot (#9727)
only touched CI infrastructure files, so the path filter correctly
skipped backend builds on its merge commit. To observe the new
backend-merge-jobs flow assemble a real manifest list, this commit
touches faster-whisper's Makefile so its two new per-arch entries
schedule and the merge job runs.

The trailing comment is the smallest possible diff and is harmless
to the build.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit that referenced this pull request May 9, 2026
…etire self-hosted, fix provenance) (#9730)

* ci: add per-arch + manifest-merge support for LocalAI server image

Mirror the backend_build.yml + backend_merge.yml pattern shipped in
PR #9726 for the LocalAI server image:

- image_build.yml accepts optional platform-tag (default ''), scopes
  registry cache to cache-localai<suffix>-<platform-tag>, and pushes
  by canonical digest only on push events. Digests upload as artifacts
  named digests-localai<suffix>-<platform-tag>, with a "-core"
  placeholder when tag-suffix is empty so the merge job's download
  pattern doesn't over-match across multiple suffixes.
- image_merge.yml is a new reusable workflow that downloads matching
  digest artifacts and assembles the final tagged manifest list via
  docker buildx imagetools create.

Image names differ from backend_*.yml: the LocalAI server is published
under quay.io/go-skynet/local-ai and localai/localai (not -backends).

Not yet wired into image.yml / image-pr.yml — Commit C does that.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fan out per-arch split to remaining 34 backends

Convert all remaining linux/amd64,linux/arm64 entries in
backend-matrix.yml to per-arch + manifest-merge form. Each was a
single matrix entry running both arches on x86 under QEMU emulation;
each becomes two entries — amd64 on ubuntu-latest, arm64 on
ubuntu-24.04-arm (native).

Four backends that were on bigger-runner (-cpu-llama-cpp,
-cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have
both legs moved to free tier as part of the same change. They are
compile-only (no torch/CUDA install) and fit comfortably with the
setup-build-disk /mnt relocation. Phase 4 (next commit) retires the
remaining 5 single-arch bigger-runner entries.

After this commit:
- 271 total matrix entries (was 237)
- 0 multi-arch entries left
- 36 per-arch pairs (34 new + 2 pilots from PR #9727)
- 5 bigger-runner entries remaining (single-arch, Phase 4 target)

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: split LocalAI image multi-arch entries per arch + merge

Mirror the backend per-arch split for the main LocalAI image:

- image.yml's core-image-build matrix: split the core ('') and
  -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on
  ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
- New top-level core-image-merge and gpu-vulkan-image-merge jobs
  call image_merge.yml after core-image-build completes.
- image-pr.yml's image-build matrix: split the -vulkan-core entry.
  No merge job added on the PR side — image_build.yml's digest-push
  is push-only-event-gated, so a PR-side merge would have nothing
  to download.

After this commit, no workflow file references
linux/amd64,linux/arm64 in a single matrix slot.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: retire bigger-runner from backend matrix (Phase 4)

Migrate the remaining 5 single-arch bigger-runner entries to
ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt
relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of
working space — enough for ROCm dev image (~16 GB), CUDA toolkit
(~5 GB), and the per-backend compile/install steps these entries do.

Backends migrated:
- -gpu-nvidia-cuda-12-llama-cpp
- -gpu-nvidia-cuda-12-turboquant
- -gpu-rocm-hipblas-faster-whisper
- -gpu-rocm-hipblas-coqui
- -cpu-ik-llama-cpp

After this commit, .github/backend-matrix.yml has zero bigger-runner
references. The bigger-runner used in tests-vibevoice-cpp-grpc-
transcription (test-extra.yml) is a separate concern handled in a
follow-up.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1)

Intel oneAPI base image is ~6 GB; each backend's wheel install
stays well within the ~100 GB working space provided by Phase 3's
setup-build-disk /mnt relocation. Lowest-risk batch of the
arc-runner-set retirement.

Backends migrated:
  vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts,
  fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 15 ROCm Python backends to free tier (Phase 5.2)

ROCm dev image (~16 GB) plus per-backend torch/wheels install fits
on ubuntu-latest with the /mnt-relocated Docker root. These entries
include the heavier vLLM/sglang/transformers/diffusers stack on
ROCm; if any specific backend OOMs or runs out of disk, individual
flips back to arc-runner-set are revertable per-entry.

Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on
arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/
ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/
voxcpm/pocket-tts/neutts).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 6 CUDA Python backends to free tier (Phase 5.3)

vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest
backends in the matrix — flash-attn intermediate layers can spike
disk usage during build. setup-build-disk's /mnt relocation gives
~100 GB working space which fits the documented peak.

Highest-risk batch of the arc-runner-set retirement; if any
backend fails to build on free tier, the per-entry runs-on flip
is the unit of revert.

Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.

After this commit, .github/backend-matrix.yml has zero references
to arc-runner-set or bigger-runner. The migration is complete.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: disable provenance on multi-registry digest pushes

Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7
pushes a single build to TWO registries simultaneously with
push-by-digest=true, buildx generates a per-registry provenance
attestation manifest (because mode=max — the default for push:true —
includes the runner ID). That makes the resulting manifest-list digest
diverge across registries:

  arm64 -cpu-faster-whisper build:
    image manifest:        sha256:d3bdd34b... (identical, content-only)
    quay manifest list:    sha256:66b4cfc8... (with quay attestation)
    dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)

steps.build.outputs.digest returns only one of the list digests
(empirically the dockerhub one). The merge job then asks
"quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that
list has digest 66b4cfc8 there. Result: imagetools create fails with
"not found" and the merge job fails (run 25581983094, job 75110021491).

Setting provenance: false drops the per-registry attestation; the
manifest-list digest becomes pure content, identical across both
registries, and steps.build.outputs.digest works on either lookup.

Applied to backend_build.yml and image_build.yml — both refactored
to use the same multi-registry digest-push pattern in the prior PRs.
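The fix amounts to one input on the build step. provenance is a real docker/build-push-action input; the surrounding step details below are an illustrative sketch, not the literal workflow contents:

```yaml
- name: Build and push by digest
  uses: docker/build-push-action@v7
  with:
    platforms: ${{ matrix.platforms }}
    outputs: type=image,push-by-digest=true,name-canonical=true,push=true
    # Drop the per-registry provenance attestation so the manifest-list
    # digest is pure content, identical on quay.io and Docker Hub.
    provenance: false
```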

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot localai-bot added the enhancement New feature or request label May 9, 2026