Merged
27 commits
4cc1676
emulator pull progress
BilalG1 Apr 15, 2026
a65022b
emulator fast-start via VM snapshot + live secret rotation
BilalG1 Apr 15, 2026
30dbdff
faster snapshot resume via mapped-ram + rotation opt-out
BilalG1 Apr 15, 2026
6021a04
build QEMU 10.2.2 from source in CI for mapped-ram support
BilalG1 Apr 15, 2026
0c0d726
build stack-cli's workspace deps in emulator CI
BilalG1 Apr 15, 2026
b03486e
fix emulator pull --pr/--run snapshot detection
BilalG1 Apr 15, 2026
0b3a9cf
fix sentinel marker path in docker/server entrypoint
BilalG1 Apr 15, 2026
cfdc882
Merge remote-tracking branch 'origin/dev' into local-emulator-qol-fixes
BilalG1 Apr 15, 2026
2c8ad4c
address unresolved PR review comments on snapshot resume path
BilalG1 Apr 15, 2026
76f9543
simplify emulator fast-start: tighter polls, drop dead wrappers
BilalG1 Apr 15, 2026
3586115
fix snapshot resume host fs + restore standalone run-emulator.sh path
BilalG1 Apr 15, 2026
037755b
retry tsdown migration build to survive qemu-user futex hangs
BilalG1 Apr 15, 2026
894c1ce
fix CLI artifact download + build arm64 emulator on macOS runner
BilalG1 Apr 16, 2026
54ecda8
fix colima on GHA macOS: use QEMU backend instead of VZ driver
BilalG1 Apr 16, 2026
49a20ed
split arm64 build: Docker on Linux, QEMU snapshot on macOS
BilalG1 Apr 16, 2026
11531eb
fix check_deps: skip docker requirement when SKIP_DOCKER_BUILD=1
BilalG1 Apr 16, 2026
7534637
fix lint warning + remove invalid `local` in top-level loop
BilalG1 Apr 16, 2026
288b80e
fix empty array expansion under bash 3.2 (macOS)
BilalG1 Apr 16, 2026
d94aa66
capture emulator snapshot locally during pull instead of shipping fro…
BilalG1 Apr 16, 2026
7db9fe4
fix CI verify step: use freshly-built qcow2 via STACK_EMULATOR_HOME
BilalG1 Apr 16, 2026
510ef38
fix PCI slot mismatch in snapshot capture + stale runtime ISO on dire…
BilalG1 Apr 16, 2026
39b5c08
fix smoke test: skip shell ISO regen when CLI already wrote it
BilalG1 Apr 16, 2026
7acb3ed
fix capture path: guard against set -u + preserve cmd_capture's empty…
BilalG1 Apr 16, 2026
38974ca
Merge branch 'dev' into local-emulator-qol-fixes
BilalG1 Apr 20, 2026
8f9b9c1
emulator build: split snapshot-bake from savevm capture
BilalG1 Apr 20, 2026
fbd3207
seed: bump session activity events tx timeout to 30s
BilalG1 Apr 20, 2026
c8630c6
emulator: bump Postgres statement_timeout 30s → 120s
BilalG1 Apr 20, 2026
211 changes: 169 additions & 42 deletions .github/workflows/qemu-emulator-build.yaml
@@ -22,8 +22,16 @@ concurrency:

env:
EMULATOR_IMAGE_NAME: stack-local-emulator
# Shell scripts (build-image.sh, run-emulator.sh) read these directly.
EMULATOR_IMAGE_DIR: ${{ github.workspace }}/docker/local-emulator/qemu/images
EMULATOR_RUN_DIR: ${{ github.workspace }}/docker/local-emulator/qemu/run
# The stack-cli ignores EMULATOR_IMAGE_DIR/RUN_DIR and derives its own paths
# from STACK_EMULATOR_HOME. Point it at the same workspace so `emulator
# start` finds the freshly-built qcow2 from build-image.sh and cold-boots
# it, instead of auto-pulling from a prior release. CI doesn't capture a
# savevm (EMULATOR_CAPTURE_SAVEVM defaults to 0); users capture locally
# on first `stack emulator pull`.
STACK_EMULATOR_HOME: ${{ github.workspace }}/docker/local-emulator/qemu

jobs:
build:
@@ -34,15 +42,16 @@ jobs:
fail-fast: false
matrix:
include:
# amd64 runs natively under KVM on ubicloud's amd64 runner.
# Both arches build on ubicloud's amd64 runner. amd64 uses KVM;
# arm64 runs under cross-arch TCG (slow, but only cloud-init
# provisioning has to complete — the boot/verify smoke test below
# is gated to amd64 because TCG can't boot Next.js in any
# reasonable time). Snapshots are NOT published — `stack emulator
# pull` captures one locally on first run, which is the only way
# to guarantee KVM/HVF/TCG + `-cpu max` compatibility on the
# user's machine.
- arch: amd64
runner: ubicloud-standard-8
# arm64 runs under cross-arch TCG on ubicloud's amd64 runner.
# No KVM for arm64 guests on an amd64 host; cortex-a72 + V8
# --jitless together sidestep the SIGTRAPs that cross-arch TCG
# hits on aggressive arm64 JIT code. Smoke test is still skipped
# because the backend can't come up reliably under cross-arch
# TCG within any sane window.
- arch: arm64
runner: ubicloud-standard-8

@@ -55,10 +64,60 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Install QEMU dependencies
# Node/pnpm are needed on both arches: arm64 also runs
# generate-env-development.mjs inside build-image.sh. amd64 additionally
# builds and runs the CLI for the verification steps below.
- uses: pnpm/action-setup@v4
with:
version: 10.23.0

- uses: actions/setup-node@v4
with:
node-version: 22
cache: pnpm

- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y qemu-system-x86 qemu-system-arm qemu-kvm qemu-utils genisoimage socat qemu-efi-aarch64
# qemu-utils gives us qemu-img; qemu-efi-aarch64 provides the arm64
# UEFI firmware. The actual qemu-system-* binaries come from the
# source build below — Ubuntu 24.04 ships QEMU 8.2 which predates
# the mapped-ram migration capability we rely on.
sudo apt-get install -y qemu-utils qemu-efi-aarch64 socat genisoimage zstd \
ninja-build pkg-config python3-venv \
libglib2.0-dev libpixman-1-dev libslirp-dev libepoxy-dev libgbm-dev

# QEMU 10.2.2 is required for the mapped-ram + multifd migration path
# used by the fast-resume snapshot. Cache the compiled prefix so CI
# only pays the ~5-8 min build cost once per runner image.
- name: Restore QEMU 10.2.2 cache
id: qemu-cache
uses: actions/cache@v4
with:
path: /opt/qemu
key: qemu-10.2.2-${{ runner.os }}-${{ runner.arch }}-v1

- name: Build QEMU 10.2.2 from source
if: steps.qemu-cache.outputs.cache-hit != 'true'
run: |
set -euxo pipefail
curl -fsSL https://download.qemu.org/qemu-10.2.2.tar.xz -o /tmp/qemu.tar.xz
mkdir -p /tmp/qemu-src
tar -xf /tmp/qemu.tar.xz -C /tmp/qemu-src --strip-components=1
cd /tmp/qemu-src
./configure --prefix=/opt/qemu \
--target-list=x86_64-softmmu,aarch64-softmmu \
--enable-kvm --enable-slirp --enable-tcg \
--disable-docs --disable-gtk --disable-sdl --disable-vnc \
--disable-guest-agent --disable-tools
make -j"$(nproc)"
sudo make install

- name: Put QEMU 10.2.2 on PATH
run: |
echo "/opt/qemu/bin" >> "$GITHUB_PATH"
/opt/qemu/bin/qemu-system-x86_64 --version
/opt/qemu/bin/qemu-system-aarch64 --version

- name: Enable KVM access
run: |
@@ -82,41 +141,56 @@ jobs:
- name: Generate emulator env
run: node docker/local-emulator/generate-env-development.mjs

# arm64 runs under cross-arch TCG on an amd64 runner; the backend's
# V8 TurboFan JIT re-triggers the SIGTRAPs we dodge in migrations
# with --no-opt, and even if it didn't, boot is too slow under TCG
# to verify in any sane window. amd64 KVM already exercises the
# service stack; real arm64 hosts have KVM for end-users.
- name: Start emulator and verify
# amd64 runs under KVM on the runner so we can boot the newly-built
# image to verify it works end-to-end before publishing. arm64 runs
# under cross-arch TCG on an amd64 host, which can't reliably boot
# Next.js within any sane window — skipped.
- name: Build stack-cli (for emulator CLI)
if: matrix.arch == 'amd64'
run: |
chmod +x docker/local-emulator/qemu/run-emulator.sh
EMULATOR_ARCH=${{ matrix.arch }} \
EMULATOR_READY_TIMEOUT=3200 \
docker/local-emulator/qemu/run-emulator.sh start
pnpm install --frozen-lockfile --filter '@stackframe/stack-cli...'
# Turbo's trailing `...` filter builds stack-cli AND its workspace
# deps (@stackframe/js, @stackframe/stack-shared, etc.) — stack-cli
# imports them at runtime from their dist/ outputs.
pnpm exec turbo run build --filter='@stackframe/stack-cli...'

- name: Start emulator and verify
if: matrix.arch == 'amd64'
env:
EMULATOR_ARCH: ${{ matrix.arch }}
EMULATOR_READY_TIMEOUT: 3200
EMULATOR_IMAGE_DIR: ${{ env.EMULATOR_IMAGE_DIR }}
EMULATOR_RUN_DIR: ${{ env.EMULATOR_RUN_DIR }}
run: node packages/stack-cli/dist/index.js emulator start

- name: Verify services are healthy
if: matrix.arch == 'amd64'
run: |
EMULATOR_ARCH=${{ matrix.arch }} \
docker/local-emulator/qemu/run-emulator.sh status
env:
EMULATOR_ARCH: ${{ matrix.arch }}
EMULATOR_IMAGE_DIR: ${{ env.EMULATOR_IMAGE_DIR }}
EMULATOR_RUN_DIR: ${{ env.EMULATOR_RUN_DIR }}
run: node packages/stack-cli/dist/index.js emulator status

- name: Stop emulator
if: always() && matrix.arch == 'amd64'
run: |
EMULATOR_ARCH=${{ matrix.arch }} \
docker/local-emulator/qemu/run-emulator.sh stop
env:
EMULATOR_ARCH: ${{ matrix.arch }}
EMULATOR_IMAGE_DIR: ${{ env.EMULATOR_IMAGE_DIR }}
EMULATOR_RUN_DIR: ${{ env.EMULATOR_RUN_DIR }}
run: node packages/stack-cli/dist/index.js emulator stop

- name: Package image
run: |
BASE_IMG="docker/local-emulator/qemu/images/stack-emulator-${{ matrix.arch }}.qcow2"
cp "$BASE_IMG" "stack-emulator-${{ matrix.arch }}.qcow2"
ls -lh "stack-emulator-${{ matrix.arch }}.qcow2"

- name: Upload image artifact
uses: actions/upload-artifact@v4
with:
name: qemu-emulator-${{ matrix.arch }}
path: stack-emulator-${{ matrix.arch }}.qcow2
if-no-files-found: error
retention-days: 30
compression-level: 0

@@ -134,31 +208,80 @@ jobs:
steps:
- uses: actions/checkout@v6

- name: Install QEMU dependencies
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y qemu-system-x86 qemu-utils genisoimage socat
sudo apt-get install -y qemu-utils socat zstd \
ninja-build pkg-config python3-venv \
libglib2.0-dev libpixman-1-dev libslirp-dev libepoxy-dev libgbm-dev

- name: Restore QEMU 10.2.2 cache
id: qemu-cache
uses: actions/cache@v4
with:
path: /opt/qemu
key: qemu-10.2.2-${{ runner.os }}-${{ runner.arch }}-v1

- name: Build QEMU 10.2.2 from source
if: steps.qemu-cache.outputs.cache-hit != 'true'
run: |
set -euxo pipefail
curl -fsSL https://download.qemu.org/qemu-10.2.2.tar.xz -o /tmp/qemu.tar.xz
mkdir -p /tmp/qemu-src
tar -xf /tmp/qemu.tar.xz -C /tmp/qemu-src --strip-components=1
cd /tmp/qemu-src
./configure --prefix=/opt/qemu \
--target-list=x86_64-softmmu,aarch64-softmmu \
--enable-kvm --enable-slirp --enable-tcg \
--disable-docs --disable-gtk --disable-sdl --disable-vnc \
--disable-guest-agent --disable-tools
make -j"$(nproc)"
sudo make install

- name: Put QEMU 10.2.2 on PATH
run: |
echo "/opt/qemu/bin" >> "$GITHUB_PATH"
/opt/qemu/bin/qemu-system-x86_64 --version

- uses: pnpm/action-setup@v4
with:
version: 10.23.0

- uses: actions/setup-node@v4
with:
node-version: 22
cache: pnpm

- name: Install stack-cli deps + build
run: |
pnpm install --frozen-lockfile --filter '@stackframe/stack-cli...'
# Turbo's trailing `...` filter builds stack-cli AND its workspace
# deps (@stackframe/js, @stackframe/stack-shared, etc.) — stack-cli
# imports them at runtime from their dist/ outputs.
pnpm exec turbo run build --filter='@stackframe/stack-cli...'

- name: Download built image
uses: actions/download-artifact@v4
with:
name: qemu-emulator-${{ matrix.arch }}
path: docker/local-emulator/qemu/images/
path: ${{ github.workspace }}/.stack-emulator-images/

- name: Generate emulator env
run: node docker/local-emulator/generate-env-development.mjs
- name: Place qcow2 into STACK_EMULATOR_HOME layout
run: |
mkdir -p "$STACK_EMULATOR_HOME/images"
cp "${{ github.workspace }}/.stack-emulator-images/stack-emulator-${{ matrix.arch }}.qcow2" "$STACK_EMULATOR_HOME/images/"
ls -lh "$STACK_EMULATOR_HOME/images/"

- name: Start emulator from artifact
# No savevm.zst artifact (users capture locally via `emulator pull`),
# so `emulator start` cold-boots the qcow2. Budget accordingly.
- name: Start emulator via CLI
run: |
chmod +x docker/local-emulator/qemu/run-emulator.sh docker/local-emulator/qemu/common.sh
EMULATOR_ARCH=${{ matrix.arch }} \
EMULATOR_READY_TIMEOUT=600 \
docker/local-emulator/qemu/run-emulator.sh start
node packages/stack-cli/dist/index.js emulator start

- name: Verify services are healthy
run: |
EMULATOR_ARCH=${{ matrix.arch }} \
docker/local-emulator/qemu/run-emulator.sh status
run: node packages/stack-cli/dist/index.js emulator status

- name: Smoke test — backend health
run: curl -sf http://localhost:26701/health?db=1
@@ -174,13 +297,11 @@ jobs:

- name: Stop emulator
if: always()
run: |
EMULATOR_ARCH=${{ matrix.arch }} \
docker/local-emulator/qemu/run-emulator.sh stop
run: node packages/stack-cli/dist/index.js emulator stop

- name: Print serial log on failure
if: failure()
run: tail -100 docker/local-emulator/qemu/run/vm/serial.log 2>/dev/null || true
run: tail -100 "$STACK_EMULATOR_HOME/run/vm/serial.log" 2>/dev/null || true

publish:
name: Publish to GitHub Releases
@@ -220,8 +341,14 @@ jobs:
### Images
| File | Description |
|------|-------------|
| \`stack-emulator-arm64.qcow2\` | ARM64 emulator image |
| \`stack-emulator-amd64.qcow2\` | AMD64 emulator image |
| \`stack-emulator-arm64.qcow2\` | ARM64 disk image |
| \`stack-emulator-amd64.qcow2\` | AMD64 disk image |

\`emulator pull\` downloads the qcow2 and captures a local fast-start
snapshot (~1-3 min). Subsequent \`emulator start\`s resume in ~3-8 s.
Snapshots are captured locally because QEMU migration state isn't
portable across accelerators (KVM / HVF / TCG) or \`-cpu max\`
feature sets.

### Usage
\`\`\`bash
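Reviewer note on the QEMU source build: the workflow comments say the distro QEMU (8.2 on Ubuntu 24.04) predates the mapped-ram migration capability, which is why 10.2.2 is built from source. A minimal sketch of a guard that a script could run before attempting a snapshot capture follows. It assumes the binary's first output line has the form `QEMU emulator version X.Y.Z`; `parse_qemu_version`, `version_ge`, and `MIN_QEMU` are hypothetical names and do not appear in this PR.

```shell
# Hypothetical helpers, not part of the PR.

# Read `qemu-system-* --version` output on stdin and print only X.Y.Z.
parse_qemu_version() {
  sed -n 's/^QEMU emulator version \([0-9][0-9.]*\).*/\1/p'
}

# Return 0 when dotted version $1 >= $2. Compares up to three numeric
# fields; missing fields count as 0. Uses plain globals (POSIX sh has
# no `local`).
version_ge() {
  left=$1
  right=$2
  for _field in 1 2 3; do
    l=${left%%.*}
    r=${right%%.*}
    if [ "${l:-0}" -gt "${r:-0}" ]; then return 0; fi
    if [ "${l:-0}" -lt "${r:-0}" ]; then return 1; fi
    # Drop the field just compared; pad with 0 once the fields run out.
    case $left in *.*) left=${left#*.} ;; *) left=0 ;; esac
    case $right in *.*) right=${right#*.} ;; *) right=0 ;; esac
  done
  return 0
}

# Example use (assumes qemu-system-x86_64 is on PATH):
#   v=$(qemu-system-x86_64 --version | parse_qemu_version)
#   version_ge "$v" "${MIN_QEMU:-10.2.2}" || echo "QEMU $v is too old" >&2
```

Comparing numeric fields rather than whole strings avoids the lexical trap where `10.2.2` sorts before `8.2`.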
3 changes: 3 additions & 0 deletions .gitignore
@@ -144,3 +144,6 @@ packages/stack/*
!packages/react/package.json
!packages/next/package.json
!packages/stack/package.json

# claude code
.claude/scheduled_tasks.lock
6 changes: 6 additions & 0 deletions apps/backend/src/lib/seed-dummy-data.ts
@@ -1485,6 +1485,12 @@ async function seedDummySessionActivityEvents(options: SessionActivityEventSeedO
await tx.event.createMany({
data: events,
});
}, {
// Under cross-arch arm64 TCG in the emulator qcow2 build, this batch
// takes ~10s; Prisma's default is 5s. Production (KVM/native) runs it
// in well under 1s, so the looser bound only kicks in when the DB is
// genuinely slow.
timeout: 30_000,
});

if (clickhouseClient && clickhouseRows.length > 0) {
21 changes: 18 additions & 3 deletions docker/local-emulator/Dockerfile
@@ -58,8 +58,22 @@ ENV NEXT_PUBLIC_STACK_STRIPE_PUBLISHABLE_KEY=pk_test_mock_publishable_key_for_lo
# Build the backend NextJS app
RUN pnpm turbo run docker-build --filter=@stackframe/backend... --filter=@stackframe/dashboard...

# Build the self-host seed script
RUN cd apps/backend && pnpm build-self-host-migration-script
# Build the self-host seed script.
# tsdown -> rolldown is multi-threaded Rust; under qemu-user (cross-arch
# arm64-on-amd64) its futex emulation occasionally deadlocks and the build
# hangs forever. Bound each attempt and retry to ride out the race.
RUN cd apps/backend && \
attempt=1; \
while :; do \
timeout --kill-after=30s 600s pnpm build-self-host-migration-script && break; \
rc=$?; \
if [ "$attempt" -ge 3 ]; then \
echo "build-self-host-migration-script failed after $attempt attempts (last rc=$rc)" >&2; \
exit "$rc"; \
fi; \
echo "build-self-host-migration-script attempt $attempt failed (rc=$rc); retrying..." >&2; \
attempt=$((attempt + 1)); \
done


# Prune node_modules for runtime: remove dev tools, heavy UI packages,
@@ -263,10 +277,11 @@ COPY docker/local-emulator/run-cron-jobs.sh /run-cron-jobs.sh
COPY docker/local-emulator/entrypoint.sh /entrypoint.sh
COPY docker/local-emulator/init-services.sh /init-services.sh
COPY docker/local-emulator/start-app.sh /start-app.sh
COPY docker/local-emulator/rotate-secrets.sh /usr/local/bin/rotate-secrets
COPY docker/local-emulator/clickhouse-config.xml /etc/clickhouse-server/config.xml
COPY docker/local-emulator/clickhouse-users.xml /etc/clickhouse-server/users.xml
COPY docker/server/entrypoint.sh /app-entrypoint.sh
RUN chmod +x /entrypoint.sh /init-services.sh /start-app.sh /app-entrypoint.sh /run-cron-jobs.sh
RUN chmod +x /entrypoint.sh /init-services.sh /start-app.sh /app-entrypoint.sh /run-cron-jobs.sh /usr/local/bin/rotate-secrets

# PostgreSQL: 5432, Redis: 6379, Inbucket: 2500/9001/1100,
# Svix: 8071, ClickHouse: 8123/9009, MinIO: 9090, QStash: 8080
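Reviewer note on the Dockerfile retry loop: the inline `while` loop bounds each `pnpm build-self-host-migration-script` attempt with `timeout` and gives up after three tries. The same pattern can be sketched as a reusable function. `retry_with_timeout` is a hypothetical name, not something this PR defines, and the sketch assumes a `timeout` binary that accepts `-k` (GNU coreutils does).

```shell
# Hypothetical helper (not in the PR): run a command with a per-attempt
# time limit and retry it a bounded number of times.
# Usage: retry_with_timeout MAX_ATTEMPTS SECONDS COMMAND [ARGS...]
retry_with_timeout() {
  max_attempts=$1
  limit_secs=$2
  shift 2
  attempt=1
  while :; do
    rc=0
    # -k 5: send SIGKILL if the command ignores SIGTERM for 5 more seconds.
    timeout -k 5 "$limit_secs" "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after $attempt attempts (last rc=$rc)" >&2
      return "$rc"
    fi
    echo "attempt $attempt failed (rc=$rc); retrying..." >&2
    attempt=$((attempt + 1))
  done
}

# Example use mirroring the Dockerfile values (3 attempts, 600 s each):
#   retry_with_timeout 3 600 pnpm build-self-host-migration-script
```

Capturing the status with `|| rc=$?` keeps the function safe under `set -e`. Note that `timeout` can only run external commands, not shell functions.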
8 changes: 7 additions & 1 deletion docker/local-emulator/entrypoint.sh
@@ -33,6 +33,12 @@ fi
# baked-in mock value from .env.development to be a usable credential against
# a running emulator. Overriding here propagates to both the backend and the
# run-cron-jobs.sh loop via supervisord's inherited environment.
export CRON_SECRET="$(openssl rand -hex 32)"
#
# In snapshot-build mode the VM supplies a deterministic placeholder via the
# --env-file so the baked snapshot doesn't contain a real secret; on resume,
# /usr/local/bin/rotate-secrets swaps in a fresh per-install value.
if [ -z "${CRON_SECRET:-}" ]; then
export CRON_SECRET="$(openssl rand -hex 32)"
fi

exec /usr/bin/supervisord -n -c /etc/supervisor/conf.d/supervisord.conf
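Reviewer note on the `CRON_SECRET` change: the entrypoint now keeps a value supplied through the environment (the snapshot-build placeholder) and generates a random one only when the variable is empty or unset. A minimal sketch of that behavior in isolation follows. `gen_hex_secret` and `ensure_cron_secret` are hypothetical names, and the `/dev/urandom` fallback is an addition for hosts without `openssl`; the PR itself calls `openssl rand -hex 32` directly.

```shell
# Hypothetical helpers, not part of the PR.

# Print 32 random bytes as 64 hex characters.
gen_hex_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback (an assumption, not in the PR): read from the kernel RNG.
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
  fi
}

# Keep a caller-supplied CRON_SECRET; generate one only when it is
# empty or unset, then export it for child processes.
ensure_cron_secret() {
  if [ -z "${CRON_SECRET:-}" ]; then
    CRON_SECRET="$(gen_hex_secret)"
  fi
  export CRON_SECRET
}
```

The `${CRON_SECRET:-}` expansion keeps the check safe when the script runs under `set -u`.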