Conversation
Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from backend_build.yml, image_build.yml, and test.yml into a single composite action. The three callers had slightly different inline blocks; the composite uses the more aggressive backend_build/image_build variant for all three callers — test.yml jobs now also purge snapd, edge/firefox/ powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost + $AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners. In test.yml, actions/checkout now runs before the composite action call because the composite lives at ./.github/actions/free-disk-space and requires a checked-out repo. The original ordering relied on jlumbroso/free-disk-space@main being a remote action; this is the minimum-invasive change to support a local composite. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Run scripts/changed-backends.js on master pushes too (not just PRs) so unrelated commits don't rebuild all ~210 backend container images. Tag pushes still build the full matrix via FORCE_ALL. Push events use the GitHub Compare API to diff event.before..event.after. Edge cases (first push with zero base, API truncation beyond 300 files, missing fields, network failure) fall back to "run everything" — better safe than silently miss a backend. The matrix literal moves from .github/workflows/backend.yml into a new data-only file at .github/backend-matrix.yml (outside workflows/ so actionlint doesn't try to parse it as a workflow). Both backend.yml and backend_pr.yml now consume the dynamic matrix output uniformly via fromJson(needs.generate-matrix.outputs.matrix); the script reads the matrix from the new location. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free pool while migration is in flight. Lift after Phases 4-5 retire the self-hosted runners. Also drops a leftover commented-out max-parallel line that lived in backend.yml since the previous matrix shape. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Prepare backend_build.yml for the multi-arch split. The reusable
workflow now accepts a `platform-tag` input ("amd64" / "arm64") that
scopes the registry cache to cache<suffix>-<platform-tag> and (on push
events) pushes the resulting image by canonical digest only. Digests
are uploaded as artifacts named digests<suffix>-<platform-tag> for the
merge job (Task 2.2) to consume.
`platform-tag` is optional with empty default during the migration —
existing callers continue to work unchanged (their cache key just
becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will
update callers to pass an explicit "amd64" / "arm64" value. Phase 6
flips the input to required: true once every caller is wired.
PR builds keep their existing tag-based push to ci-tests but pick up
the per-arch cache key. Multi-arch PR builds remain emulated in this
commit; they migrate when the matrix entries split (Tasks 2.3+).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Joins per-arch digest artifacts (uploaded by backend_build.yml when called with platform-tag) into a single tagged multi-arch manifest list via `docker buildx imagetools create`. Called once per backend by backend.yml after both per-arch build jobs succeed. The workflow generates final tags identically to the previous monolithic build job (same docker/metadata-action invocation), so consumers of quay.io/go-skynet/local-ai-backends and localai/localai-backends see no tag-shape change. Two imagetools calls (one per registry) reference the same per-arch digests under different image names. Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix entries to expand into per-arch + merge jobs that call this workflow. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and restarting with data-root pointing there yields ~100 GB of working space (combined with the apt-clean from Task 1.1) — enough for ROCm dev image + vLLM torch install + flash-attn intermediate layers. This is the structural change that lets Phases 4 and 5 of the migration plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest. The composite action is no-op on self-hosted runners (where /mnt isn't expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted pool's /mnt shape separately before enabling). Wired into both backend_build.yml and image_build.yml between free-disk-space and the first Docker operation. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
buildx CLI runs as the unprivileged 'runner' user and creates config dirs under TMPDIR before binding them into the buildkit container. /mnt is root-owned by default, so the original mkdir produced a permission-denied when buildx tried to write there: ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied Mirror /tmp's permission mode (1777 — world-writable with sticky bit) on /mnt/docker-tmp so non-root processes can stage their config. Caught by the first PR run (image-build hipblas job) on PR #9726. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Path-filtering backend.yml master push (the previous commit's main optimization) skips backends whose source didn't change. That broke the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on %Y-W%V busts the install layer's cache on a new ISO week, but only when the build actually runs. Untouched Python backends (torch, transformers, vllm with no version pin) would otherwise ship stale wheels indefinitely. Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule events have no event.ref / event.before, so the script's changedFiles == null fallback (scripts/changed-backends.js) emits the full matrix automatically — no script change needed. C++/Go backends with pinned deps cache-hit and complete fast, so the weekly cost is dominated by Python re-resolves which is exactly what we want. workflow_dispatch added so a maintainer can trigger an ad-hoc full-matrix rebuild without faking a tag push. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Merged
6 tasks
mudler
added a commit
that referenced
this pull request
May 9, 2026
…etire self-hosted, fix provenance) (#9730) * ci: add per-arch + manifest-merge support for LocalAI server image Mirror the backend_build.yml + backend_merge.yml pattern shipped in PR #9726 for the LocalAI server image: - image_build.yml accepts optional platform-tag (default ''), scopes registry cache to cache-localai<suffix>-<platform-tag>, and pushes by canonical digest only on push events. Digests upload as artifacts named digests-localai<suffix>-<platform-tag>, with a "-core" placeholder when tag-suffix is empty so the merge job's download pattern doesn't over-match across multiple suffixes. - image_merge.yml is a new reusable workflow that downloads matching digest artifacts and assembles the final tagged manifest list via docker buildx imagetools create. Image names differ from backend_*.yml: the LocalAI server is published under quay.io/go-skynet/local-ai and localai/localai (not -backends). Not yet wired into image.yml / image-pr.yml — Commit C does that. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: fan out per-arch split to remaining 34 backends Convert all remaining linux/amd64,linux/arm64 entries in backend-matrix.yml to per-arch + manifest-merge form. Each was a single matrix entry running both arches on x86 under QEMU emulation; each becomes two entries — amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). Four backends that were on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have both legs moved to free tier as part of the same change. They are compile-only (no torch/CUDA install) and fit comfortably with the setup-build-disk /mnt relocation. Phase 4 (next commit) retires the remaining 5 single-arch bigger-runner entries. After this commit: - 271 total matrix entries (was 237) - 0 multi-arch entries left - 36 per-arch pairs (34 new + 2 pilots from PR #9727) - 5 bigger-runner entries remaining (single-arch, Phase 4 target) Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: split LocalAI image multi-arch entries per arch + merge Mirror the backend per-arch split for the main LocalAI image: - image.yml's core-image-build matrix: split the core ('') and -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). - New top-level core-image-merge and gpu-vulkan-image-merge jobs call image_merge.yml after core-image-build completes. - image-pr.yml's image-build matrix: split the -vulkan-core entry. No merge job added on the PR side — image_build.yml's digest-push is push-only-event-gated, so a PR-side merge would have nothing to download. After this commit, no workflow file references linux/amd64,linux/arm64 in a single matrix slot. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: retire bigger-runner from backend matrix (Phase 4) Migrate the remaining 5 single-arch bigger-runner entries to ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of working space — enough for ROCm dev image (~16 GB), CUDA toolkit (~5 GB), and the per-backend compile/install steps these entries do. Backends migrated: - -gpu-nvidia-cuda-12-llama-cpp - -gpu-nvidia-cuda-12-turboquant - -gpu-rocm-hipblas-faster-whisper - -gpu-rocm-hipblas-coqui - -cpu-ik-llama-cpp After this commit, .github/backend-matrix.yml has zero bigger-runner references. The bigger-runner used in tests-vibevoice-cpp-grpc- transcription (test-extra.yml) is a separate concern handled in a follow-up. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1) Intel oneAPI base image is ~6 GB; each backend's wheel install stays well within the ~100 GB working space provided by Phase 3's setup-build-disk /mnt relocation. Lowest-risk batch of the arc-runner-set retirement. Backends migrated: vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts, fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 15 ROCm Python backends to free tier (Phase 5.2) ROCm dev image (~16 GB) plus per-backend torch/wheels install fits on ubuntu-latest with the /mnt-relocated Docker root. These entries include the heavier vLLM/sglang/transformers/diffusers stack on ROCm; if any specific backend OOMs or runs out of disk, individual flips back to arc-runner-set are revertable per-entry. Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/ ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/ voxcpm/pocket-tts/neutts). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 6 CUDA Python backends to free tier (Phase 5.3) vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest backends in the matrix — flash-attn intermediate layers can spike disk usage during build. setup-build-disk's /mnt relocation gives ~100 GB working space which fits the documented peak. Highest-risk batch of the arc-runner-set retirement; if any backend fails to build on free tier, the per-entry runs-on flip is the unit of revert. Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}. After this commit, .github/backend-matrix.yml has zero references to arc-runner-set or bigger-runner. The migration is complete. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: disable provenance on multi-registry digest pushes Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7 pushes a single build to TWO registries simultaneously with push-by-digest=true, buildx generates a per-registry provenance attestation manifest (because mode=max — the default for push:true — includes the runner ID). That makes the resulting manifest-list digest diverge across registries: arm64 -cpu-faster-whisper build: image manifest: sha256:d3bdd34b... (identical, content-only) quay manifest list: sha256:66b4cfc8... (with quay attestation) dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation) steps.build.outputs.digest returns only one of the list digests (empirically the dockerhub one). The merge job then asks "quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that list has digest 66b4cfc8 there. Result: imagetools create fails with "not found" and the merge job fails (run 25581983094, job 75110021491). Setting provenance: false drops the per-registry attestation; the manifest-list digest becomes pure content, identical across both registries, and steps.build.outputs.digest works on either lookup. Applied to backend_build.yml and image_build.yml — both refactored to use the same multi-registry digest-push pattern in the prior PRs. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR ships Phases 1–3 of the CI migration plan to retire
arc-runner-setandbigger-runnerin favor of GHA free tier (ubuntu-latest+ubuntu-24.04-arm). Self-hosted runners stay in use until Phases 4–5 land in follow-ups; this PR is the foundation those depend on.What's in here
backend.ymlmaster push —scripts/changed-backends.jsnow handles push events via the GitHub Compare API. Master pushes only rebuild backends whose source touched. Tag pushes still rebuild everything viaFORCE_ALL. The matrix literal moves frombackend.ymlto.github/backend-matrix.yml(data-only) so bothbackend.ymlandbackend_pr.ymluse the dynamic${{ fromJson(...) }}form.max-parallel: 8cap on backend matrices (lift after Phases 4–5).free-disk-spacecomposite action (.github/actions/free-disk-space/) consolidates the duplicated apt/dotnet/android/ghc cleanup blocks frombackend_build.yml,image_build.yml, andtest.yml.test.ymlnow runs the superset cleanup that previously only ran in the build workflows.backend_build.ymlper-arch refactor — accepts a new optionalplatform-taginput that scopes the registry cache (cache<suffix>-<platform-tag>). On push events the build pushes by canonical digest only and uploads the digest as an artifact nameddigests<suffix>-<platform-tag>for the merge job to consume.platform-tagis optional with empty default during migration to keep CI green per-task; it flips torequired: truein Phase 6 once all callers wire it up.backend_merge.ymlnew reusable workflow that downloads per-arch digest artifacts and runsdocker buildx imagetools createto assemble the final tagged manifest list. Tag generation matches the previous monolithic build job exactly, so consumers see no tag-shape change. Not yet wired in — Phase 2 follow-ups expandbackend.ymlmatrix entries to call it.setup-build-diskcomposite action (.github/actions/setup-build-disk/) relocates Docker's data-root to/mnt(~75 GB free, vs ~20 GB on/after the apt cleanup). Wired intobackend_build.ymlandimage_build.yml. No-op on self-hosted and on non-X64 hosted runners. This is the structural change that lets Phases 4–5 move the heavy CUDA/ROCm/vLLM jobs ontoubuntu-latest.What's NOT in here (follow-ups)
backend.yml(and 3 inimage*.yml) into amd64-leg + arm64-leg + merge-job triples. Needs CI observation per pilot before fan-out.bigger-runnerjobs toubuntu-latest.arc-runner-setjobs toubuntu-latest.arc-runner-set/bigger-runnerlabels). Skipped here — would break CI today since those labels still appear in.github/backend-matrix.yml.Plan reference:
docs/superpowers/plans/2026-05-08-ci-migration-to-gha-free-tier.md(uncommitted working artifact).Test plan
backend_pr.ymlexercises the newchanged-backends.jspush/PR/tag dispatch — confirm only changed backends scheduled when the PR touches one backend dirgenerate-matrixfilters correctly; an unrelated commit should schedule no backend rebuildsFORCE_ALL=truerebuilds the whole matrixtest.ymltests-linuxjob runs green with the consolidated cleanupfree-disk-spaceaction emits sanedf -hnumbers in workflow logssetup-build-diskaction successfully relocates Docker root on a representativeubuntu-latestjob;df -h /mntshows ≥ 80 GB free post-setup;docker inforeports/mnt/docker-dataas data rootbackend_build.ymlpush step uploads adigests<suffix>-artifact containing the canonical digest (verified once Phase 2.3+ wires a pilot)ubuntu-24.04-armL4T jobs (platform-tagoptional default keeps their cache key ascache<suffix>-, an orphan but valid key)Assisted-by: Claude:claude-opus-4-7