ci: migrate CI and release workflows to Blacksmith runners #5300

Merged
avallete merged 10 commits into develop from claude/ci-optimization-blacksmith-sQKP3 on May 19, 2026

avallete commented May 19, 2026

What

Moves the CI and release pipelines onto Blacksmith runners and threads Blacksmith's caching layers through the workflows that benefit from them.

Runner moves:

  • PR-critical jobs in test.yml (check, test-core, test-e2e) → blacksmith-8vcpu-ubuntu-2404.
  • cli-go-ci.yml::test (Go unit/integration) → blacksmith-8vcpu-ubuntu-2404.
  • Release build (heavy multi-target Bun compile) → blacksmith-32vcpu-ubuntu-2404.
  • Release smoke-test matrix: Linux → blacksmith-8vcpu-ubuntu-2404, macOS → blacksmith-6vcpu-macos-latest, Windows → blacksmith-8vcpu-windows-2025. macos-15-intel stays on GitHub-hosted (Blacksmith macOS is ARM-only) and is skipped on prereleases so beta wall-clock isn't gated by the slowest leg — stable releases on main still run it.
  • Release publish / publish-homebrew / publish-scoop → blacksmith-2vcpu-ubuntu-2404.
  • Low-frequency Linux jobs (cli-go-api-sync, cli-go-mirror, cli-go-codeql non-Swift legs) → matching Blacksmith sizes for consistency.
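
A minimal sketch of how a smoke-test matrix could express these runner labels, including the prerelease skip for the Intel leg. Only the runner labels and the `prerelease` input name come from this PR; the job name, the `workflow_call` wiring, and the `fromJSON` approach are assumptions for illustration and may differ from the actual workflow.

```yaml
# Hypothetical reusable-workflow fragment (job name and wiring are assumed).
on:
  workflow_call:
    inputs:
      prerelease:
        type: boolean
        default: false

jobs:
  smoke-test:
    strategy:
      fail-fast: false
      matrix:
        # Job-level `if` cannot read the matrix context, so this sketch
        # picks the list of legs up front and omits the GitHub-hosted
        # macos-15-intel leg when `prerelease` is true.
        os: >-
          ${{ fromJSON(inputs.prerelease
          && '["blacksmith-8vcpu-ubuntu-2404","blacksmith-6vcpu-macos-latest","blacksmith-8vcpu-windows-2025"]'
          || '["blacksmith-8vcpu-ubuntu-2404","blacksmith-6vcpu-macos-latest","blacksmith-8vcpu-windows-2025","macos-15-intel"]') }}
    runs-on: ${{ matrix.os }}
    steps:
      # Placeholder step; the real smoke tests are not shown in this PR text.
      - run: echo "smoke test on ${{ matrix.os }}"
```

Selecting the list with an expression keeps the Intel leg out of the run entirely on prereleases, rather than starting the job and skipping its steps.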

Caching:

  • useblacksmith/checkout@v1 on test-e2e to exploit Blacksmith's sticky-disk git mirror with fetch-depth: 0 (the cli-e2e shards need full history for nx affected).
  • Upstream actions/cache@v5, actions/setup-node@v6, actions/setup-go@v6 — all SHA-pinned — for cache + toolchain setup. Initially this PR swapped to the useblacksmith/cache, useblacksmith/setup-node, useblacksmith/setup-go forks; partway through, those forks were archived in favor of Blacksmith's runner-level interception, which transparently routes upstream cache API calls to the same colocated backend. The final commit reverts to upstream and bumps the residual actions/cache@v4 in test.yml to @v5 so every cache step gets the new acceleration with continued upstream security patches.

All third-party action pins use full commit SHAs with trailing # v<N> comments.
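
As an illustration of the pin format, a sketch of two steps follows. The `useblacksmith/checkout` SHA is the one resolved for `v1` in the commits below; the `actions/cache` SHA is a placeholder, and the cache path and key are hypothetical values, not taken from the workflows.

```yaml
steps:
  # Full 40-char commit SHA with the tag it resolves to as a trailing comment.
  - uses: useblacksmith/checkout@41cdeedae8edb2e684ba22896a5fd2a3cb85db6b # v1
    with:
      fetch-depth: 0 # full history, as needed for nx affected

  # Placeholder SHA: substitute the commit that the v5 tag resolves to.
  - uses: actions/cache@<full-40-char-commit-sha> # v5
    with:
      path: ~/.cache/example # hypothetical path
      key: example-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }} # hypothetical key
```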

Why

CI wall-clock on this repo had grown enough to visibly slow merges (test.yml routinely ~20+ min, release build ~30+ min). Blacksmith's larger runners and colocated cache cut both materially, and the test-e2e checkout in particular benefits from the sticky-disk git mirror because the fetch-depth: 0 clone was the dominant fixed cost in that job.

The upstream-vs-fork pivot matters for hygiene: the archived useblacksmith/* cache forks still execute when SHA-pinned, but get no future security patches and miss out on Blacksmith's newer transparent acceleration. Going through upstream actions gives us both — Blacksmith routing and normal upstream maintenance — at no behavioral cost.

Scope deliberately excluded

  • cli-go-pg-prove.yml / cli-go-publish-migra.yml Docker builds: separate follow-up, requires migrating to useblacksmith/setup-docker-builder@v1 + useblacksmith/build-push-action@v2 and dropping cache-from: type=gha (the GHA cache backend is not transparently routed by Blacksmith). Tracked, not in this PR.
  • Native arm64 smoke runner (replacing the QEMU emulation in the Linux smoke leg): planned follow-up.
  • Whether to delete test.yml's explicit Go-binary cache step now that $GOCACHE is colocated: deferred until warm-cache rebuild time can be measured on Blacksmith.

https://claude.ai/code/session_01KgHCbVTurxo4K9KivytQbt

First step of the Blacksmith rollout from CLI-1497. Every job that runs
.github/actions/setup pays 30-45s for the bun install + pnpm install
chain; the Blacksmith variants back the bun toolchain cache and pnpm
store with sticky-disk reads, so warm runs should drop to ~5-10s across
every job.

useblacksmith/cache@v5 and useblacksmith/setup-node@v5 fall back to the
upstream actions on non-Blacksmith runners, so the macOS and Windows
smoke-test legs that consume this composite are unaffected.

Refs CLI-1498.
avallete requested a review from a team as a code owner May 19, 2026 13:05
avallete enabled auto-merge (squash) May 19, 2026 13:06
avallete disabled auto-merge May 19, 2026 15:26
claude and others added 9 commits May 19, 2026 15:27
Step 2 of the Blacksmith rollout from CLI-1497. Swaps runs-on from
ubuntu-latest (GitHub-hosted 4 vCPU / 16 GB) to blacksmith-8vcpu-ubuntu-2404
on the four CPU-bound jobs that gate every PR:

  - test.yml::check
  - test.yml::test-core
  - test.yml::test-e2e (3 shards)
  - cli-go-ci.yml::test

Combined with the sticky-disk setup action from the previous commit, the
plan estimates PR wall-clock roughly halves (~7m -> ~3-4m). test-e2e-summary,
coverage, lint, start, link, and codegen stay on ubuntu-latest -- they're
short, not CPU-bound, and outside this PR's scope.

Refs CLI-1499.

Step 3 of the Blacksmith rollout from CLI-1497. The build job is the
longest serial step on the release critical path (~6m47s on
large-linux-x86), driving 8x bun --compile plus 6x go build. Moving it
to a 32 vCPU Blacksmith runner should shave 1.5-2 minutes; the
sticky-disk node_modules and go-build caches from the previous steps
compound on top.

Supersedes the org-provisioned large-linux-x86 swap -- per the parent
plan, that label was the only remaining holdout once everything else
migrates to Blacksmith.

Refs CLI-1500.

Step 4 of the Blacksmith rollout from CLI-1497. Replaces ubuntu-latest
with blacksmith-8vcpu-ubuntu-2404 in the smoke-test matrix. The Linux
leg pulls 6 images x 2 platforms via docker; Blacksmith runners ship a
local registry mirror that makes those pulls near-instant, so we rely
on it instead of any explicit pre-pull/cache machinery.

The QEMU setup stays -- arm64 docker subtests still execute via
emulation. A future PR can split the leg into native amd64 + arm64
Blacksmith runners (per the parent plan's PR7).

The macOS and Windows entries are unchanged in this step; the next commit moves them to Blacksmith's macOS and Windows runners.

Refs CLI-1501.

Extends the smoke-test matrix migration:

  - macos-latest          -> blacksmith-6vcpu-macos-latest (drop-in,
    both Apple Silicon ARM64)
  - windows-latest        -> blacksmith-8vcpu-windows-2025 (Public
    Beta; bumps OS from Server 2022 to Server 2025). The Windows
    smoke test (apps/cli/tests/smoke-test-windows.ts) does not use
    Docker or WSL, so Blacksmith's "no Linux containers on Windows"
    caveat does not apply here.

macos-15-intel stays on GitHub-hosted -- Blacksmith macOS is ARM-only,
so the gating leg of the release pipeline cannot move yet.

Refs CLI-1501.

The macos-15-intel runner is the only smoke-test leg that cannot move
to Blacksmith (no Intel macOS option) and is the wall-clock floor of
every release. Beta releases trade Intel coverage for speed: stable
promotion to main still runs the full matrix and catches Intel-only
regressions before npm publish.

PR smoke (smoke-test-pr.yml passes prerelease: true) is also skipped
under this rule, accepting the trade-off that Intel-only issues will
surface at stable promotion rather than at PR time.

Refs CLI-1497.

Step 5 of the Blacksmith rollout from CLI-1497.

setup-go -> useblacksmith/setup-go@v5: $GOCACHE moves to a sticky disk,
so cgo + race-instrumented test binaries don't fully rebuild on every
run. Swapped in every cli-go-ci.yml job (test, lint, start, link,
codegen) plus release-shared.yml::build and test.yml's check / test-core
/ test-e2e.

checkout -> useblacksmith/checkout@v1 only on jobs that need
fetch-depth: 0 (semantic-release plan, fast-forward, and nx-affected
in test-e2e). Default depth-1 checkouts are left on actions/checkout@v6
because the incremental fetch win there is negligible.

Refs CLI-1502.

Match the action-hardening policy (d2ddf9f) by replacing the floating
@v5 / @v1 refs introduced in earlier rollout commits with full 40-char
commit SHAs and a trailing "# v<N>" comment, matching the format
already used for actions/checkout, actions/setup-node, actions/setup-go,
oven-sh/setup-bun, etc.

Resolved tag SHAs (via git ls-remote, 2026-05-19):
  useblacksmith/cache       v5 -> 71c7c918062ba3861252d84b07fe5ab2a6b467a6
  useblacksmith/setup-node  v5 -> 65c6ca86fdeb0ab3d85e78f57e4f6a7e4780b391
  useblacksmith/setup-go    v5 -> f12a3dabb4171193018e496855e47349b360c056
  useblacksmith/checkout    v1 -> 41cdeedae8edb2e684ba22896a5fd2a3cb85db6b

Dependabot (github-actions ecosystem) already groups major bumps in
.github/dependabot.yml, so these will get automated updates the same
way the upstream actions do.

The useblacksmith/cache, useblacksmith/setup-node, useblacksmith/setup-go
forks are now archived. Blacksmith's colocated cache is applied at the
runner level (network/DNS interception of cache API calls), so upstream
actions/cache@v5, actions/setup-node@v6 and actions/setup-go@v6 hit the
same 4x backend transparently — with continued security patches and
upstream improvements.

Revert prior swaps (PR1 19c4534, PR5 9b05731/3dad0ae) to SHA-pinned
upstream actions, bump test.yml's residual actions/cache@v4 to v5, and
migrate previously-skipped low-frequency Linux jobs (api-sync, mirror,
codeql, release publish/homebrew/scoop) to Blacksmith runners for
consistency.

useblacksmith/checkout is unchanged — it's a separate sticky-disk
product, not part of the deprecated cache-fork family.
avallete changed the title from "ci: replace GitHub Actions with Blacksmith for setup" to "ci: migrate CI and release workflows to Blacksmith runners" May 19, 2026
avallete merged commit aa818c1 into develop May 19, 2026
9 checks passed
avallete deleted the claude/ci-optimization-blacksmith-sQKP3 branch May 19, 2026 17:04
avallete added a commit that referenced this pull request May 20, 2026
…5312)

The release workflow's `publish` job was migrated to a Blacksmith runner
in #5300, which broke npm publish:

```
npm error 422 Unprocessable Entity - PUT https://registry.npmjs.org/@supabase%2fcli-darwin-arm64
  - Error verifying sigstore provenance bundle:
    Unsupported GitHub Actions runner environment: "self-hosted".
    Only "github-hosted" runners are supported when publishing with provenance.
```

`publish.ts` passes `--provenance` to `pnpm publish`, which has sigstore
attest the build against the runner's OIDC identity. Blacksmith runners
present as `self-hosted` to sigstore, so npm rejects the upload with
E422.

Move only the `publish` job back to `ubuntu-latest`. `build` and
`smoke-test` stay on Blacksmith; `publish-homebrew` and `publish-scoop`
don't go through npm/sigstore (they push to the tap/bucket repos via
git) and also stay on Blacksmith. The publish job is short and not
compute-bound, so the wall-clock cost of github-hosted is negligible.
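
A minimal sketch of the resulting runner split, assuming the three jobs live in one workflow. The job names and runner labels come from this PR; the step bodies are placeholders, and where the real workflow grants `id-token: write` (the permission GitHub Actions requires to issue the OIDC token used for provenance) is an assumption.

```yaml
jobs:
  publish:
    # GitHub-hosted, so the provenance check sees a "github-hosted"
    # runner environment rather than "self-hosted".
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write # OIDC token for provenance (placement assumed)
    steps:
      - run: echo "npm publish steps elided" # placeholder

  publish-homebrew:
    # Pushes to the tap repo via git, so no npm/sigstore check applies.
    runs-on: blacksmith-2vcpu-ubuntu-2404
    steps:
      - run: echo "homebrew publish steps elided" # placeholder

  publish-scoop:
    # Pushes to the bucket repo via git, so no npm/sigstore check applies.
    runs-on: blacksmith-2vcpu-ubuntu-2404
    steps:
      - run: echo "scoop publish steps elided" # placeholder
```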

Failed run that motivated this:
https://github.com/supabase/cli/actions/runs/26153946606

---
_Generated by [Claude
Code](https://claude.ai/code/session_01RDNmHeyREpf3ZBQLggK75q)_

---------

Co-authored-by: Claude <noreply@anthropic.com>