ci: add pre-built base-grpc-builder image infrastructure (PR 1/2)#9737

Merged
mudler merged 1 commit into master from ci/base-images-grpc-cuda on May 9, 2026
@localai-bot
Collaborator

Summary

This is PR 1 of a 2-PR rollout. Lands the infrastructure for pre-built builder base images that PR 2's variant Dockerfile refactor will consume. Landing this PR alone changes no existing CI behavior — the variant Dockerfiles (Dockerfile.llama-cpp, Dockerfile.ik-llama-cpp, Dockerfile.turboquant) and the matrix are untouched. The base images don't exist on quay until someone manually runs the new workflow.

What's in here

  • backend/Dockerfile.base-grpc-builder (new) — single parameterized Dockerfile that produces a fully-prepped builder base for a (BASE_IMAGE, BUILD_TYPE, CUDA_*, TARGETARCH) tuple. Final image contents: apt build deps + protoc v27.1 + cmake 3.31.10 + gRPC v1.65.0 at /opt/grpc + (conditionally) CUDA / ROCm 7.2.1 / Vulkan SDK 1.4.335.0 toolchains. Source of truth for install logic is the existing Dockerfile.llama-cpp lines 9–261; install steps are copied verbatim so the produced image is bit-equivalent to what variant builds produce today.

  • .github/workflows/base-images.yml (new) — workflow that builds + pushes 9 base-image variants to quay.io/go-skynet/ci-cache:

    • base-grpc-amd64 / base-grpc-arm64 (Ubuntu 24.04, CPU-only)
    • base-grpc-cuda-12-amd64 (Ubuntu 24.04 + CUDA 12.8)
    • base-grpc-cuda-13-amd64 (Ubuntu 22.04 + CUDA 13.0)
    • base-grpc-cuda-13-arm64 (Ubuntu 24.04 + CUDA 13.0 sbsa)
    • base-grpc-rocm-amd64 (rocm/dev-ubuntu-24.04:7.2.1)
    • base-grpc-vulkan-amd64 / base-grpc-vulkan-arm64
    • base-grpc-intel-amd64 (intel/oneapi-basekit)

    Triggers: Saturdays 05:00 UTC cron + workflow_dispatch + master push when this workflow or Dockerfile.base-grpc-builder itself changes. Uses native arm64 hosted runners (ubuntu-24.04-arm); no QEMU emulation. Each variant has its own BuildKit registry cache scoped per tag.

Bootstrap (one-time after merging this PR)

gh workflow run base-images.yml --ref master

Wait ~30 min for all 9 variants to push to quay. Then PR 2 (consumer-side refactor) can land.
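
One way to confirm the bootstrap finished is to check that every expected tag resolves on the registry. The following is a minimal sketch, assuming the nine tag names listed above and a local `docker buildx` install; the function names are illustrative and not part of this PR.

```shell
# Hypothetical helpers: tag names come from the variant list above,
# function names are made up for this sketch.
REPO="quay.io/go-skynet/ci-cache"

# Prints the nine expected base-image tags, one per line.
expected_base_tags() {
    for suffix in amd64 arm64 cuda-12-amd64 cuda-13-amd64 cuda-13-arm64 \
                  rocm-amd64 vulkan-amd64 vulkan-arm64 intel-amd64; do
        echo "base-grpc-${suffix}"
    done
}

# Prints every tag that cannot be resolved on the registry and returns
# non-zero if at least one is missing. Needs network access.
missing_base_tags() {
    status=0
    for tag in $(expected_base_tags); do
        if ! docker buildx imagetools inspect "${REPO}:${tag}" >/dev/null 2>&1; then
            echo "${tag}"
            status=1
        fi
    done
    return "${status}"
}
```

Running `missing_base_tags && echo "all bases present"` after the workflow completes would then give a quick go/no-go signal before PR 2 lands.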

Why split into two PRs

PR 2 will refactor the three variant Dockerfiles to FROM these prebuilt bases, with no from-source fallback. Their CI builds will fail if the bases don't exist on quay yet. Sequential rollout (infrastructure → bootstrap → consumers) avoids a broken-master window.

Test plan

  • Static checks pass (actionlint, yamllint, docker buildx build --check) — already verified locally.
  • Merge this PR — should be inert; existing backend.yml and image.yml runs are unaffected.
  • Run gh workflow run base-images.yml --ref master — observe all 9 matrix variants build successfully and push to quay.io/go-skynet/ci-cache:base-grpc-*. Native arm64 legs land on ubuntu-24.04-arm; no emulated builds.
  • Subsequent runs (cron / dispatch) hit BuildKit registry cache and complete much faster.
  • PR 2 (separate) consumes the bases by switching variant Dockerfiles to FROM ${BUILDER_BASE_IMAGE}.

Assisted-by: Claude:claude-opus-4-7

Introduces a parameterized Dockerfile.base-grpc-builder that produces
a fully-prepped builder base image (apt deps + protoc + cmake + gRPC
at /opt/grpc + conditional CUDA/ROCm/Vulkan toolchains) and a
base-images.yml workflow that builds + pushes 9 variants to
quay.io/go-skynet/ci-cache:base-grpc-*:

  base-grpc-amd64                 (Ubuntu 24.04, CPU-only)
  base-grpc-arm64                 (Ubuntu 24.04, CPU-only)
  base-grpc-cuda-12-amd64         (Ubuntu 24.04 + CUDA 12.8)
  base-grpc-cuda-13-amd64         (Ubuntu 22.04 + CUDA 13.0)
  base-grpc-cuda-13-arm64         (Ubuntu 24.04 + CUDA 13.0 sbsa)
  base-grpc-rocm-amd64            (rocm/dev-ubuntu-24.04:7.2.1 + hipblas)
  base-grpc-vulkan-amd64          (Ubuntu 24.04 + Vulkan SDK 1.4.335)
  base-grpc-vulkan-arm64          (Ubuntu 24.04 + Vulkan SDK ARM 1.4.335)
  base-grpc-intel-amd64           (intel/oneapi-basekit:2025.3.2)

The variant Dockerfiles (Dockerfile.llama-cpp, ik-llama-cpp, turboquant)
are NOT touched in this PR. PR 2 will refactor them to FROM these
prebuilt bases. This PR is intentionally inert - landing it changes no
existing CI behavior. The base images don't exist on quay until
someone manually triggers the workflow.

Bootstrap after merge:
  gh workflow run base-images.yml --ref master
Wait ~30 min for all 9 variants to push, then merge PR 2 (the
consumer-side refactor that uses BUILDER_BASE_IMAGE build-arg to
FROM these tags).

Triggers afterwards:
  - Saturdays 05:00 UTC (cron) - picks up upstream security updates,
    runs ~24h before the backend.yml Sunday cron so bases are fresh.
  - workflow_dispatch - manual ad-hoc rebuild.
  - master push touching Dockerfile.base-grpc-builder or this workflow.

Why split into two PRs: the variant Dockerfiles in PR 2 will FROM the
prebuilt bases and have no from-source fallback. Their CI builds fail
if the bases don't exist on quay yet. Landing infrastructure first +
manual bootstrap + then consumer refactor avoids a broken-master window.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler merged commit 28e2962 into master on May 9, 2026
52 checks passed
@mudler deleted the ci/base-images-grpc-cuda branch on May 9, 2026 at 16:44
mudler added a commit that referenced this pull request May 9, 2026
…rebuilt + from-source

Restructure the three llama.cpp-derived Dockerfiles so each supports
two builder paths in a single file, selected via the BUILDER_TARGET
build-arg:

  BUILDER_TARGET=builder-fromsource (default)
    - Standalone build: gRPC stage + apt installs + (conditionally)
      CUDA/ROCm/Vulkan + compile.
    - Used by `make backends/llama-cpp` locally and any caller that
      doesn't supply a prebuilt base.

  BUILDER_TARGET=builder-prebuilt
    - FROM ${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache:
      base-grpc-* shipped in PR #9737).
    - Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs.
    - Used by CI when the matrix entry sets builder-base-image.

Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage
(BuildKit doesn't support variable expansion directly in COPY --from),
then COPY --from=builder pulls package output from the chosen path.
BuildKit prunes the unreferenced builder, so each build only does the
work for the chosen path.
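
The aliasing-stage pattern described above can be illustrated with a toy Dockerfile. This is a sketch only, not the content of Dockerfile.llama-cpp: the `ubuntu:24.04` default, the `/out` path, the placeholder RUN steps, and the `write_demo_dockerfile` helper are all assumptions made for the example.

```shell
# Writes a toy multi-target Dockerfile into the directory given as $1.
# Hypothetical: stage bodies are placeholders, not the real install/compile steps.
write_demo_dockerfile() {
    mkdir -p "$1"
    cat > "$1/Dockerfile" <<'EOF'
# Global args, declared before the first FROM so FROM lines can use them.
ARG BUILDER_TARGET=builder-fromsource
ARG BUILDER_BASE_IMAGE=ubuntu:24.04

# Path 1: standalone build (stands in for gRPC + toolchains + compile).
FROM ubuntu:24.04 AS builder-fromsource
RUN mkdir -p /out && echo fromsource > /out/path.txt

# Path 2: start from a prebuilt base supplied by the caller.
FROM ${BUILDER_BASE_IMAGE} AS builder-prebuilt
RUN mkdir -p /out && echo prebuilt > /out/path.txt

# Alias stage: resolves the chosen path to the fixed name "builder".
FROM ${BUILDER_TARGET} AS builder

FROM scratch
COPY --from=builder /out/ /
EOF
}
```

Assuming Docker with BuildKit is available, `write_demo_dockerfile /tmp/demo` followed by `docker buildx build --build-arg BUILDER_TARGET=builder-prebuilt --output type=local,dest=/tmp/demo/out /tmp/demo` should exercise only the prebuilt stage, since the other builder is never referenced.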

The compile RUN is identical between both builder stages, so it's
factored into .docker/<name>-compile.sh and bind-mounted into both.
ccache mount + cache-id stay per-arch / per-build-type.

Local DX preserved: `make backends/llama-cpp` (no extra args) defaults
to BUILDER_TARGET=builder-fromsource and works exactly as before.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit that referenced this pull request May 9, 2026
…rpc images (PR 2/2) (#9738)

* ci(backend_build): plumb builder-base-image and BUILDER_TARGET build-args

Adds an optional builder-base-image input. When set, BUILDER_BASE_IMAGE
is forwarded as a build-arg AND BUILDER_TARGET=builder-prebuilt is set
to select the variant Dockerfile's prebuilt-base stage. When empty,
BUILDER_TARGET=builder-fromsource (the default) keeps the existing
from-source build path.

This makes the prebuilt-base optimization opt-in per matrix entry
without breaking local `make backends/<name>` invocations or backends
whose Dockerfile doesn't have a prebuilt path.
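
The derivation rule can be sketched in a few lines of shell. The function names below are hypothetical and the real workflow may express the same rule differently (for example as a workflow expression); only the rule itself comes from the text above.

```shell
# Non-empty builder-base-image selects the prebuilt stage; an empty value
# keeps the from-source default. Hypothetical helper names.
builder_target() {
    if [ -n "$1" ]; then
        echo "builder-prebuilt"
    else
        echo "builder-fromsource"
    fi
}

# Prints the docker build-arg flags for a (possibly empty) base image.
# BUILDER_BASE_IMAGE is only forwarded when a base image was supplied.
builder_build_args() {
    args="--build-arg BUILDER_TARGET=$(builder_target "$1")"
    if [ -n "$1" ]; then
        args="${args} --build-arg BUILDER_BASE_IMAGE=$1"
    fi
    printf '%s\n' "${args}"
}
```

For instance, `builder_build_args ""` yields only the `BUILDER_TARGET=builder-fromsource` flag, which matches what a local `make backends/<name>` would effectively get.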

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(llama-cpp,ik-llama-cpp,turboquant): multi-target Dockerfiles for prebuilt + from-source

Restructure the three llama.cpp-derived Dockerfiles so each supports
two builder paths in a single file, selected via the BUILDER_TARGET
build-arg:

  BUILDER_TARGET=builder-fromsource (default)
    - Standalone build: gRPC stage + apt installs + (conditionally)
      CUDA/ROCm/Vulkan + compile.
    - Used by `make backends/llama-cpp` locally and any caller that
      doesn't supply a prebuilt base.

  BUILDER_TARGET=builder-prebuilt
    - FROM ${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache:
      base-grpc-* shipped in PR #9737).
    - Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs.
    - Used by CI when the matrix entry sets builder-base-image.

Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage
(BuildKit doesn't support variable expansion directly in COPY --from),
then COPY --from=builder pulls package output from the chosen path.
BuildKit prunes the unreferenced builder, so each build only does the
work for the chosen path.

The compile RUN is identical between both builder stages, so it's
factored into .docker/<name>-compile.sh and bind-mounted into both.
ccache mount + cache-id stay per-arch / per-build-type.

Local DX preserved: `make backends/llama-cpp` (no extra args) defaults
to BUILDER_TARGET=builder-fromsource and works exactly as before.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(backend.yml,backend_pr.yml): forward builder-base-image from matrix

Plumbs the new optional builder-base-image input from matrix into
backend_build.yml. backend_build.yml derives BUILDER_TARGET from
whether builder-base-image is set, so matrix entries that map to a
prebuilt base get the prebuilt path; entries that don't (python/go/
rust backends) fall through to the default builder-fromsource (which
their own Dockerfiles don't reference, so it's a no-op for them).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(backend-matrix): wire builder-base-image to llama-cpp variants

For every entry whose Dockerfile is llama-cpp/ik-llama-cpp/turboquant,
add a builder-base-image field pointing at the appropriate prebuilt
quay.io/go-skynet/ci-cache:base-grpc-* tag.

backend_build.yml derives BUILDER_TARGET from this field's presence:
non-empty -> builder-prebuilt; empty -> builder-fromsource. So this
commit alone activates the prebuilt-base path for these 23 backends
in CI, while local `make backends/<name>` (no extra args) keeps the
from-source path.

Mapping by (build-type, arch):
- '' / amd64        -> base-grpc-amd64
- '' / arm64        -> base-grpc-arm64
- cublas-12 / amd64 -> base-grpc-cuda-12-amd64
- cublas-13 / amd64 -> base-grpc-cuda-13-amd64
- cublas-13 / arm64 -> base-grpc-cuda-13-arm64
- hipblas / amd64   -> base-grpc-rocm-amd64
- vulkan / amd64    -> base-grpc-vulkan-amd64
- vulkan / arm64    -> base-grpc-vulkan-arm64
- sycl_* / amd64    -> base-grpc-intel-amd64
- cublas-12 + JetPack r36.4.0 / arm64 -> base-grpc-l4t-cuda-12-arm64
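
The mapping above can be read as a lookup function. The sketch below is illustrative only: in the PR the matrix entries carry an explicit builder-base-image field, and `base_tag_for` is a hypothetical helper. It uses the build-type keys exactly as written in the list and assumes that cublas-12 on arm64 always means the JetPack r36.4.0 entries.

```shell
# Maps (build-type, arch) to a base-grpc-* tag following the list above.
# Hypothetical helper; returns non-zero for combinations not in the list.
base_tag_for() {
    case "$1/$2" in
        /amd64)          suffix="amd64" ;;
        /arm64)          suffix="arm64" ;;
        cublas-12/amd64) suffix="cuda-12-amd64" ;;
        cublas-12/arm64) suffix="l4t-cuda-12-arm64" ;;  # assumed: JetPack entries only
        cublas-13/amd64) suffix="cuda-13-amd64" ;;
        cublas-13/arm64) suffix="cuda-13-arm64" ;;
        hipblas/amd64)   suffix="rocm-amd64" ;;
        vulkan/amd64)    suffix="vulkan-amd64" ;;
        vulkan/arm64)    suffix="vulkan-arm64" ;;
        sycl_*/amd64)    suffix="intel-amd64" ;;
        *)               return 1 ;;
    esac
    echo "base-grpc-${suffix}"
}
```

A call such as `base_tag_for vulkan arm64` prints the tag name only; prefixing it with `quay.io/go-skynet/ci-cache:` gives the full image reference.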

Cold-build savings expected: ~25-35 min per variant (skips the gRPC
compile + toolchain install that's now in the base).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: add base-grpc-l4t-cuda-12-arm64 variant for legacy JetPack entries

Two matrix entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64-
turboquant) build against nvcr.io/nvidia/l4t-jetpack:r36.4.0 + CUDA
12 ARM64. They're distinct from -nvidia-l4t-cuda-13-arm64-* which use
Ubuntu 24.04 + CUDA 13 sbsa. Add the missing JetPack-based variant
to base-images.yml so those two entries' builder-base-image mapping
in the previous commit resolves.

Bootstrap order before merging this PR (re-run base-images.yml on
this branch — 9 existing variants hit BuildKit cache, only the new
l4t-cuda-12-arm64 builds cold):

  gh workflow run base-images.yml --ref ci/base-images-consumers

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: extract base-builder install logic into .docker/install-base-deps.sh

Pre-extraction, the apt + protoc + cmake + conditional CUDA/ROCm/Vulkan
+ gRPC install logic was duplicated across four files:
  - backend/Dockerfile.base-grpc-builder (CI prebuilt-base source of truth)
  - backend/Dockerfile.llama-cpp (builder-fromsource stage)
  - backend/Dockerfile.ik-llama-cpp (builder-fromsource stage)
  - backend/Dockerfile.turboquant (builder-fromsource stage)

A bump to e.g. CUDA toolkit packages had to be made in 4 places, and
drift between the prebuilt base and the variant-Dockerfile from-source
path was a real concern (ik-llama-cpp's hipblas branch was already
missing the rocBLAS Kernels echo that llama-cpp / turboquant /
base-grpc-builder all had).

Factor the install logic into a single .docker/install-base-deps.sh
that reads its inputs from env vars and runs conditionally on
BUILD_TYPE / CUDA_*_VERSION / TARGETARCH. Each Dockerfile now bind-
mounts the script alongside .docker/apt-mirror.sh and invokes it from
a single RUN step.
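
As a rough picture of what an env-driven installer of this kind looks like, here is a hypothetical outline. It does not reproduce .docker/install-base-deps.sh: the step names are placeholders, and the exact variable names `CUDA_MAJOR_VERSION` / `CUDA_MINOR_VERSION` and the `cublas*` match are assumptions standing in for the CUDA_*_VERSION inputs mentioned above.

```shell
# Prints the install steps that would run for the current environment.
# Hypothetical outline: real package lists and commands are omitted.
plan_install_steps() {
    echo "apt-build-deps"
    echo "protoc"
    echo "cmake"
    # Toolchain step depends on BUILD_TYPE (and arch / CUDA version).
    case "${BUILD_TYPE:-}" in
        cublas*) echo "cuda-${CUDA_MAJOR_VERSION:-?}.${CUDA_MINOR_VERSION:-?}-${TARGETARCH:-amd64}" ;;
        hipblas) echo "rocm" ;;
        vulkan)  echo "vulkan-sdk-${TARGETARCH:-amd64}" ;;
        clblas)  echo "libclblast-dev" ;;
    esac
    # gRPC is compiled last and installed under /opt/grpc.
    echo "grpc"
}
```

Because every input arrives through the environment, the same function body serves both the prebuilt-base Dockerfile and the from-source stages, for example `BUILD_TYPE=vulkan TARGETARCH=arm64 plan_install_steps`.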

The variant Dockerfiles' grpc-source stage is removed entirely — the
script handles gRPC compile + install at /opt/grpc, and the
builder-fromsource stage mirrors builder-prebuilt by copying
/opt/grpc/. to /usr/local/.

Result:
  - install-base-deps.sh: 244 lines (one source of truth)
  - Dockerfile.base-grpc-builder: 268 -> 98 lines
  - Dockerfile.llama-cpp: 361 -> 157 lines
  - Dockerfile.ik-llama-cpp: 348 -> 151 lines
  - Dockerfile.turboquant: 355 -> 154 lines
  - Total Dockerfile lines: 1332 -> 560 (58% reduction)
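
The totals can be re-derived from the per-file line counts listed above; shell arithmetic is used here purely as a check.

```shell
# Line counts before and after the extraction, as reported above.
before=$((268 + 361 + 348 + 355))
after=$((98 + 157 + 151 + 154))

# Integer percentage reduction, rounded to the nearest whole number.
reduction=$(( ((before - after) * 100 + before / 2) / before ))

echo "before=${before} after=${after} reduction=${reduction}%"
# prints: before=1332 after=560 reduction=58%
```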

Bit-equivalence between prebuilt and from-source paths is now enforced
by construction: both invoke the same script with the same inputs.
A side-effect is that ik-llama-cpp now also gets the rocBLAS Kernels
echo + clblas block parity it was previously missing.

Includes the BUILD_TYPE=clblas branch (libclblast-dev) for parity even
though no current CI matrix entry uses it.

After this commit's force-push, base-images.yml needs to be redispatched
on this branch — the Dockerfile.base-grpc-builder content shifts so the
existing cache won't apply for the install layer (gRPC layer also
rebuilds since it's now in the same RUN step).

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(base-images): skip-drivers on JetPack l4t variant

cuda-nvcc-12-0 isn't installable via apt on the JetPack r36.4.0 base
image — JetPack ships CUDA preinstalled at /usr/local/cuda and its
apt feed doesn't carry the cuda-nvcc-* packages from the public
repositories. The original matrix entry for -nvidia-l4t-arm64-llama-cpp
on master sets skip-drivers: 'true' for exactly this reason; the
new base-grpc-l4t-cuda-12-arm64 base needs to match.

Also forwards SKIP_DRIVERS as a build-arg from matrix into the build
(was missing entirely before this commit).

Caught by run 25612030775 — l4t-cuda-12-arm64 failed at:
  E: Package 'cuda-nvcc-12-0' has no installation candidate
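
The effect of skip-drivers can be sketched as a guard around the CUDA package install. This is a hypothetical illustration, not the actual script: the variable names `CUDA_MAJOR_VERSION` / `CUDA_MINOR_VERSION` and the function name are assumptions, while the package name pattern and the /usr/local/cuda path come from the message above.

```shell
# Decides what to do about the CUDA compiler package.
# With SKIP_DRIVERS=true (as on the JetPack base) nothing is installed via apt
# and the CUDA toolkit already present in the image is used instead.
cuda_install_plan() {
    if [ "${SKIP_DRIVERS:-false}" = "true" ]; then
        echo "skip: use preinstalled CUDA at /usr/local/cuda"
    else
        echo "install: cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}"
    fi
}
```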

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit that referenced this pull request May 9, 2026
#9742)

The migration shipped over a sequence of PRs (#9726, #9727, #9730, #9731, #9737, #9738, plus a handful of direct-to-master fixes) and
left the .agents/ docs significantly out of date.

Updated:

- .agents/ci-caching.md (significant rewrite)
  - Cache key shape: now includes per-arch suffix (cache<suffix>-<arch>).
  - New "Workflow surfaces" overview table.
  - New "Pre-built base images (base-grpc-*)" section covering the 10
    quay.io/go-skynet/ci-cache:base-grpc-* tags, the multi-target
    Dockerfile pattern (builder-fromsource / builder-prebuilt /
    aliasing FROM), the BUILDER_BASE_IMAGE → BUILDER_TARGET derivation,
    the bootstrap-on-branch order for new variants.
  - New "Per-arch native builds + manifest merge" section: split
    matrix entries, push-by-digest, backend_merge.yml, why
    provenance: false matters.
  - New "Path filter on master push" section: changed-backends.js
    handles push events via the Compare API; weekly Sunday cron is
    the safety net for unpinned Python deps.
  - New "ccache for C++ backend builds" section.
  - New "Composite actions" section: free-disk-space and
    setup-build-disk.
  - New "Concurrency" section documenting the per-PR-per-commit group
    fix.
  - Darwin section gains the brew link --overwrite note (after-
    cache-restore symlinks weren't restored) and the llama-cpp-darwin
    consolidation context.
  - "Self-hosted runners" section confirming the matrix is free of
    arc-runner-set / bigger-runner references except the residual
    test-extra.yml vibevoice case.
  - "Touching the cache pipeline" rule list extended (provenance,
    install-base-deps.sh single-source-of-truth, base-images bootstrap
    order).

- .agents/adding-backends.md
  - Section 2 title: backend.yml -> backend-matrix.yml (path moved).
  - New paragraph on per-arch entries (platform-tag + paired matrix
    rows + auto-firing merge job).
  - New paragraph on builder-base-image for llama-cpp / ik-llama-cpp /
    turboquant.
  - Final checklist line updated accordingly.

- .agents/building-and-testing.md
  - Reference: backend.yml -> backend-matrix.yml.
  - Note about builder-base-image and BUILDER_TARGET defaulting to
    builder-fromsource for local builds.

- AGENTS.md
  - One-line description update for ci-caching.md to mention the new
    infrastructure (per-arch keys, base-grpc-*, manifest-merge,
    setup-build-disk, path filter).

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>