ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) by localai-bot · Pull Request #9738 · mudler/LocalAI

localai-bot · 2026-05-09T20:00:26Z

Summary

PR 2 of 2. Refactors Dockerfile.llama-cpp, Dockerfile.ik-llama-cpp, and Dockerfile.turboquant to FROM the prebuilt base-grpc-* images shipped in PR #9737. Drops ~250 lines from each variant Dockerfile (the gRPC stage, apt deps, and CUDA/ROCm/Vulkan toolchain installs all live in the base now). On a LLAMA_VERSION bump, the gRPC compile (~25–35 min cold) is skipped; the variant build only recompiles llama.cpp itself.

Commits

A e61ddedf — backend_build.yml adds optional builder-base-image input + BUILDER_BASE_IMAGE build-arg. Backward-compatible.
B f2867878 — refactor 3 variant Dockerfiles to FROM ${BUILDER_BASE_IMAGE}. Each shrinks ~300 → ~70 lines.
C 6142f986 — backend.yml + backend_pr.yml forward builder-base-image from matrix.
D 5a2e7115 — matrix entries: 21 of 23 llama-cpp-derived entries gain builder-base-image.
E 79c7c7b3 — adds the missing base-grpc-l4t-cuda-12-arm64 variant to base-images.yml matrix and maps the 2 legacy JetPack entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64-turboquant) to it.

After all 5 commits: 23 of 23 llama-cpp-derived matrix entries are mapped.

What changes

Variant Dockerfile	Before	After
`Dockerfile.llama-cpp`	320 lines (gRPC stage + apt + CUDA/ROCm/Vulkan installs + compile)	80 lines (FROM prebuilt + compile)
`Dockerfile.ik-llama-cpp`	305 lines	69 lines
`Dockerfile.turboquant`	317 lines	77 lines

The compile step + ccache mount + package step are unchanged.

Mapping

`(build-type, platforms)`	`builder-base-image`
`('', linux/amd64)`	`:base-grpc-amd64`
`('', linux/arm64)`	`:base-grpc-arm64`
`('cublas' cuda 12, linux/amd64)`	`:base-grpc-cuda-12-amd64`
`('cublas' cuda 13, linux/amd64)`	`:base-grpc-cuda-13-amd64`
`('cublas' cuda 13, linux/arm64)`	`:base-grpc-cuda-13-arm64`
`('hipblas', linux/amd64)`	`:base-grpc-rocm-amd64`
`('vulkan', linux/amd64)`	`:base-grpc-vulkan-amd64`
`('vulkan', linux/arm64)`	`:base-grpc-vulkan-arm64`
`('sycl_*', linux/amd64)`	`:base-grpc-intel-amd64`
`('cublas' cuda 12 + JetPack base, linux/arm64)`	`:base-grpc-l4t-cuda-12-arm64` (new)

Bootstrap before merging

PR 2's matrix references base-grpc-l4t-cuda-12-arm64 which doesn't exist on quay yet (PR 1 only built 9 variants; this PR adds the 10th). The branch's workflow_dispatch of base-images.yml has been triggered — see run 25610460480.

The 9 existing variants will hit BuildKit cache and finish in ~5 min; the new l4t-cuda-12-arm64 variant builds cold (~30 min). Merge after that run completes successfully.

Test plan

Bootstrap dispatch run on the branch completes — all 10 base-grpc-* tags are pushed to quay.
Merge PR.
Next master push that touches backend/cpp/llama-cpp/ (e.g. an organic LLAMA_VERSION bump) schedules the per-arch -cpu-llama-cpp build, the merge job, and the legacy -nvidia-l4t-arm64-llama-cpp build. All consume their respective prebuilt bases. Expected wall-clock: ~25–35 min faster cold per variant vs pre-PR (skips gRPC compile).
Compare a future hot-cache build (same LLAMA_VERSION) against pre-PR — ccache + skipped gRPC stage should bring -gpu-nvidia-cuda-13-llama-cpp from the recent 6h timeout to well under 1h.

Revert plan

Each commit is independently revertable:

E: revert if l4t-cuda-12 variant has issues — falls back to "those 2 entries aren't mapped" (broken builds).
D: revert if matrix mapping has issues — backends fall back to empty builder-base-image and Dockerfiles fail because they require it. Means revert C+D together.
C: same as D; revert C+D together.
B: revert if the variant Dockerfile refactor has issues — the from-source path comes back. Revert B+C+D+E together.
A: revert only if builder-base-image input has issues — but this is purely additive, low-risk.

The cleanest safe revert is "revert PR 2 entirely" which restores the from-source variant Dockerfiles. The base images on quay remain harmlessly available for future use.

Assisted-by: Claude:claude-opus-4-7

…args

Adds an optional builder-base-image input. When set, BUILDER_BASE_IMAGE is forwarded as a build-arg AND BUILDER_TARGET=builder-prebuilt is set to select the variant Dockerfile's prebuilt-base stage. When empty, BUILDER_TARGET=builder-fromsource (the default) keeps the existing from-source build path.

This makes the prebuilt-base optimization opt-in per matrix entry without breaking local `make backends/<name>` invocations or backends whose Dockerfile doesn't have a prebuilt path.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
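
The derivation described in this commit message (non-empty builder-base-image selects builder-prebuilt, empty selects builder-fromsource) can be sketched as below. This is an illustration only: the real logic lives in backend_build.yml, the function names are hypothetical, and omitting BUILDER_BASE_IMAGE entirely when the input is empty is an assumption rather than something the PR states.

```python
from typing import Dict, List


def derive_build_args(builder_base_image: str = "") -> Dict[str, str]:
    """Model the BUILDER_TARGET derivation from the builder-base-image input."""
    if builder_base_image:
        # Prebuilt path: forward the image and select the prebuilt stage.
        return {
            "BUILDER_BASE_IMAGE": builder_base_image,
            "BUILDER_TARGET": "builder-prebuilt",
        }
    # Default path: standalone from-source build, as used by local `make`.
    # Assumption: BUILDER_BASE_IMAGE is simply not passed in this case.
    return {"BUILDER_TARGET": "builder-fromsource"}


def as_docker_flags(build_args: Dict[str, str]) -> List[str]:
    """Render build args as `docker build` command-line flags."""
    flags: List[str] = []
    for key, value in build_args.items():
        flags += ["--build-arg", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    args = derive_build_args("quay.io/go-skynet/ci-cache:base-grpc-amd64")
    print(" ".join(as_docker_flags(args)))
```

Keeping from-source as the default is what lets callers that know nothing about the new input keep working unchanged.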

…rebuilt + from-source

Restructure the three llama.cpp-derived Dockerfiles so each supports two builder paths in a single file, selected via the BUILDER_TARGET build-arg:

BUILDER_TARGET=builder-fromsource (default)
- Standalone build: gRPC stage + apt installs + (conditionally) CUDA/ROCm/Vulkan + compile.
- Used by `make backends/llama-cpp` locally and any caller that doesn't supply a prebuilt base.

BUILDER_TARGET=builder-prebuilt
- FROM ${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache:base-grpc-* shipped in PR #9737).
- Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs.
- Used by CI when the matrix entry sets builder-base-image.

Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage (BuildKit doesn't support variable expansion directly in COPY --from), then COPY --from=builder pulls package output from the chosen path. BuildKit prunes the unreferenced builder, so each build only does the work for the chosen path.

The compile RUN is identical between both builder stages, so it's factored into .docker/<name>-compile.sh and bind-mounted into both. ccache mount + cache-id stay per-arch / per-build-type.

Local DX preserved: `make backends/llama-cpp` (no extra args) defaults to BUILDER_TARGET=builder-fromsource and works exactly as before.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Plumbs the new optional builder-base-image input from matrix into backend_build.yml.

backend_build.yml derives BUILDER_TARGET from whether builder-base-image is set, so matrix entries that map to a prebuilt base get the prebuilt path; entries that don't (python/go/rust backends) fall through to the default builder-fromsource (which their own Dockerfiles don't reference, so it's a no-op for them).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

For every entry whose Dockerfile is llama-cpp/ik-llama-cpp/turboquant, add a builder-base-image field pointing at the appropriate prebuilt quay.io/go-skynet/ci-cache:base-grpc-* tag.

backend_build.yml derives BUILDER_TARGET from this field's presence: non-empty -> builder-prebuilt; empty -> builder-fromsource. So this commit alone activates the prebuilt-base path for these 23 backends in CI, while local `make backends/<name>` (no extra args) keeps the from-source path.

Mapping by (build-type, arch):
- '' / amd64 -> base-grpc-amd64
- '' / arm64 -> base-grpc-arm64
- cublas-12 / amd64 -> base-grpc-cuda-12-amd64
- cublas-13 / amd64 -> base-grpc-cuda-13-amd64
- cublas-13 / arm64 -> base-grpc-cuda-13-arm64
- hipblas / amd64 -> base-grpc-rocm-amd64
- vulkan / amd64 -> base-grpc-vulkan-amd64
- vulkan / arm64 -> base-grpc-vulkan-arm64
- sycl_* / amd64 -> base-grpc-intel-amd64
- cublas-12 + JetPack r36.4.0 / arm64 -> base-grpc-l4t-cuda-12-arm64

Cold-build savings expected: ~25-35 min per variant (skips the gRPC compile + toolchain install that's now in the base).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Two matrix entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64-turboquant) build against nvcr.io/nvidia/l4t-jetpack:r36.4.0 + CUDA 12 ARM64. They're distinct from -nvidia-l4t-cuda-13-arm64-* which use Ubuntu 24.04 + CUDA 13 sbsa.

Add the missing JetPack-based variant to base-images.yml so those two entries' builder-base-image mapping in the previous commit resolves.

Bootstrap order before merging this PR (re-run base-images.yml on this branch; the 9 existing variants hit BuildKit cache, only the new l4t-cuda-12-arm64 builds cold):

    gh workflow run base-images.yml --ref ci/base-images-consumers

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Pre-extraction, the apt + protoc + cmake + conditional CUDA/ROCm/Vulkan + gRPC install logic was duplicated across four files:
- backend/Dockerfile.base-grpc-builder (CI prebuilt-base source of truth)
- backend/Dockerfile.llama-cpp (builder-fromsource stage)
- backend/Dockerfile.ik-llama-cpp (builder-fromsource stage)
- backend/Dockerfile.turboquant (builder-fromsource stage)

A bump to e.g. CUDA toolkit packages had to be made in 4 places, and drift between the prebuilt base and the variant-Dockerfile from-source path was a real concern (ik-llama-cpp's hipblas branch was already missing the rocBLAS Kernels echo that llama-cpp / turboquant / base-grpc-builder all had).

Factor the install logic into a single .docker/install-base-deps.sh that reads its inputs from env vars and runs conditionally on BUILD_TYPE / CUDA_*_VERSION / TARGETARCH. Each Dockerfile now bind-mounts the script alongside .docker/apt-mirror.sh and invokes it from a single RUN step. The variant Dockerfiles' grpc-source stage is removed entirely: the script handles gRPC compile + install at /opt/grpc, and the builder-fromsource stage mirrors builder-prebuilt by copying /opt/grpc/. to /usr/local/.

Result:
- install-base-deps.sh: 244 lines (one source of truth)
- Dockerfile.base-grpc-builder: 268 -> 98 lines
- Dockerfile.llama-cpp: 361 -> 157 lines
- Dockerfile.ik-llama-cpp: 348 -> 151 lines
- Dockerfile.turboquant: 355 -> 154 lines
- Total Dockerfile lines: 1332 -> 560 (58% reduction)

Bit-equivalence between prebuilt and from-source paths is now enforced by construction: both invoke the same script with the same inputs. A side-effect is that ik-llama-cpp now also gets the rocBLAS Kernels echo + clblas block parity it was previously missing. Includes the BUILD_TYPE=clblas branch (libclblast-dev) for parity even though no current CI matrix entry uses it.

After this commit's force-push, base-images.yml needs to be redispatched on this branch: the Dockerfile.base-grpc-builder content shifts, so the existing cache won't apply for the install layer (the gRPC layer also rebuilds since it's now in the same RUN step).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
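
The line totals quoted in this commit message can be checked with a few lines of arithmetic. The sketch below uses only the per-file counts stated above; the variable names are illustrative. It also computes the net figure once the 244-line shared script is counted, which the commit message does not state.

```python
# Per-file line counts as quoted in the commit message.
before = {
    "Dockerfile.base-grpc-builder": 268,
    "Dockerfile.llama-cpp": 361,
    "Dockerfile.ik-llama-cpp": 348,
    "Dockerfile.turboquant": 355,
}
after = {
    "Dockerfile.base-grpc-builder": 98,
    "Dockerfile.llama-cpp": 157,
    "Dockerfile.ik-llama-cpp": 151,
    "Dockerfile.turboquant": 154,
}
SHARED_SCRIPT_LINES = 244  # install-base-deps.sh

total_before = sum(before.values())
total_after = sum(after.values())

# Reduction across the four Dockerfiles only, as reported in the message.
dockerfile_reduction_pct = round(100 * (1 - total_after / total_before))

# Net reduction when the new shared script is counted as well.
net_after = total_after + SHARED_SCRIPT_LINES
net_reduction_pct = round(100 * (1 - net_after / total_before))

if __name__ == "__main__":
    print(f"Dockerfiles: {total_before} -> {total_after} lines "
          f"({dockerfile_reduction_pct}% reduction)")
    print(f"Including install-base-deps.sh: {net_after} lines "
          f"({net_reduction_pct}% reduction)")
```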

cuda-nvcc-12-0 isn't installable via apt on the JetPack r36.4.0 base image: JetPack ships CUDA preinstalled at /usr/local/cuda and its apt feed doesn't carry the cuda-nvcc-* packages from the public repositories. The original matrix entry for -nvidia-l4t-arm64-llama-cpp on master sets skip-drivers: 'true' for exactly this reason; the new base-grpc-l4t-cuda-12-arm64 base needs to match.

Also forwards SKIP_DRIVERS as a build-arg from matrix into the build (was missing entirely before this commit).

Caught by run 25612030775, where l4t-cuda-12-arm64 failed at:

    E: Package 'cuda-nvcc-12-0' has no installation candidate

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

#9742)

The migration shipped over a sequence of PRs (#9726 → #9727 → #9730 → #9731 → #9737 → #9738 plus a handful of direct-to-master fixes) and left the .agents/ docs significantly out of date. Updated:

- .agents/ci-caching.md (significant rewrite)
  - Cache key shape: now includes per-arch suffix (cache<suffix>-<arch>).
  - New "Workflow surfaces" overview table.
  - New "Pre-built base images (base-grpc-*)" section covering the 10 quay.io/go-skynet/ci-cache:base-grpc-* tags, the multi-target Dockerfile pattern (builder-fromsource / builder-prebuilt / aliasing FROM), the BUILDER_BASE_IMAGE → BUILDER_TARGET derivation, the bootstrap-on-branch order for new variants.
  - New "Per-arch native builds + manifest merge" section: split matrix entries, push-by-digest, backend_merge.yml, why provenance: false matters.
  - New "Path filter on master push" section: changed-backends.js handles push events via the Compare API; weekly Sunday cron is the safety net for unpinned Python deps.
  - New "ccache for C++ backend builds" section.
  - New "Composite actions" section: free-disk-space and setup-build-disk.
  - New "Concurrency" section documenting the per-PR-per-commit group fix.
  - Darwin section gains the brew link --overwrite note (after-cache-restore symlinks weren't restored) and the llama-cpp-darwin consolidation context.
  - "Self-hosted runners" section confirming the matrix is free of arc-runner-set / bigger-runner references except the residual test-extra.yml vibevoice case.
  - "Touching the cache pipeline" rule list extended (provenance, install-base-deps.sh single-source-of-truth, base-images bootstrap order).
- .agents/adding-backends.md
  - Section 2 title: backend.yml -> backend-matrix.yml (path moved).
  - New paragraph on per-arch entries (platform-tag + paired matrix rows + auto-firing merge job).
  - New paragraph on builder-base-image for llama-cpp / ik-llama-cpp / turboquant.
  - Final checklist line updated accordingly.
- .agents/building-and-testing.md
  - Reference: backend.yml -> backend-matrix.yml.
  - Note about builder-base-image and BUILDER_TARGET defaulting to builder-fromsource for local builds.
- AGENTS.md
  - One-line description update for ci-caching.md to mention the new infrastructure (per-arch keys, base-grpc-*, manifest-merge, setup-build-disk, path filter).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

mudler force-pushed the ci/base-images-consumers branch from 79c7c7b to b060813 Compare May 9, 2026 20:48

mudler added 5 commits May 9, 2026 20:50

mudler force-pushed the ci/base-images-consumers branch from b060813 to 8411ea5 Compare May 9, 2026 20:50

mudler added 2 commits May 9, 2026 21:15

mudler merged commit 593f3a8 into master May 9, 2026
62 checks passed

mudler deleted the ci/base-images-consumers branch May 9, 2026 22:03

localai-bot mentioned this pull request May 9, 2026

docs(agents): update CI caching docs after the GHA-free-tier migration #9742

Merged

3 tasks

mudler merged 7 commits into master from ci/base-images-consumers
