
local emulator build improvements #1330

Merged
BilalG1 merged 19 commits into dev from emulator-arm64-ubicloud-jitless
Apr 14, 2026

Conversation


@BilalG1 BilalG1 commented Apr 13, 2026

… V8 --jitless

Shrinks the final image from 2.6 GB to 1.3 GB.

Flip arm64 matrix back to ubicloud-standard-8 so both arches share one runner fleet. Cross-arch TCG on an amd64 host previously SIGTRAP'd in migrations because V8's JIT emitted arm64 instructions that QEMU's cross-arch translator mis-handled; pair the existing -cpu cortex-a72 fallback with NODE_OPTIONS=--jitless on the migration docker exec to force V8 to stay on the interpreter. Does not affect amd64 migrations (KVM, no TCG).
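The pairing described above can be sketched as below. Container and file names are illustrative guesses, not necessarily the repo's actual values, and note that later commits in this PR swap --jitless for --no-opt --no-wasm-tier-up after --jitless broke Prisma's wasm query compiler.

```shell
# Hypothetical sketch: host-side CPU fallback plus guest-side interpreter-only
# migrations. Names (stack-build-init, dist/db-migrations.mjs) are illustrative.

# Host: a cross-arch TCG arm64 guest gets an armv8.0-a model instead of -cpu max,
# so V8 never emits instructions the amd64 host's translator mishandles.
QEMU_CPU="cortex-a72"
QEMU_ARGS="-machine virt -accel tcg -cpu $QEMU_CPU"

# Guest: keep V8 on the interpreter for the one-shot migration exec only,
# so the shipped image and real arm64 users are unaffected.
MIGRATE_CMD="docker exec -e NODE_OPTIONS=--jitless stack-build-init node dist/db-migrations.mjs migrate"

echo "$QEMU_ARGS"
echo "$MIGRATE_CMD"
```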

Summary by CodeRabbit

  • Chores

    • Optimized emulator images with binary stripping, compression, and preservation of standalone runtime dependencies.
    • Improved multi-architecture build matrix, added optional KVM detection/fallback, and gated certain emulator runtime steps for arm64.
    • Enhanced build scripts to generate and include env files and persist richer logs and artifacts.
  • New Features

    • Centralized provision entrypoint to streamline install → migrations → slimming sequence.
  • Tests

    • Added a fast QEMU serial boot test for architecture validation.

BilalG1 added 10 commits April 9, 2026 14:21
Provisioning used to silently wait out the full 6000s timeout on any
guest-side failure because the cleanup trap only logged the error. Now
it writes STACK_CLOUD_INIT_FAILED and shuts the VM down, and the host
waiter breaks on that marker and reports it distinctly.

Also bump smoke test timeout 120s->300s, dump docker ps / container
logs / free -m / verbose curl when it fails, log the qemu accel path,
and enable /dev/kvm on the CI runner so the VM isn't stuck in TCG.
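The fail-fast pattern in this commit can be sketched roughly as follows. The marker name comes from the commit message; the function names and the use of mktemp are assumptions for this demo (the real guest script would also power the VM off).

```shell
# Sketch of the fail-fast cleanup trap: on any guest-side error, write the
# STACK_CLOUD_INIT_FAILED marker the host-side waiter greps for, instead of
# silently idling out the full 6000s provision timeout.
provision() (
  status_file="$1"
  # EXIT trap: if the subshell leaves with a nonzero status, record the marker.
  trap 'rc=$?; if [ "$rc" -ne 0 ]; then echo "STACK_CLOUD_INIT_FAILED" > "$status_file"; fi' EXIT
  set -e
  echo "provisioning..."   # real steps: install, migrations, slimming
  false                    # stand-in for a failing guest-side step
  # real guest would also run: shutdown -P now
)

MARKER_FILE="$(mktemp)"
provision "$MARKER_FILE" || true
cat "$MARKER_FILE"   # the host waiter breaks as soon as it sees this marker
```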
The arm64 matrix entry cross-compiles on the amd64 CI runner, so the
guest runs under QEMU TCG. Under -cpu max, V8 emits armv8.5+ JIT code
that TCG mistranslates and node crashes with SIGTRAP (exit 133)
during migrations. Three changes together get it working:

- Drop to -cpu cortex-a72 for TCG arm64 guests. Limits V8 to
  armv8.0-a which TCG handles cleanly. Native paths (HVF/KVM) keep
  -cpu max for full performance.
- Run migrations with NODE_OPTIONS=--jitless as belt-and-suspenders.
  Migrations are I/O-bound so the perf hit is negligible.
- Skip the in-guest smoke test on arm64. A full Next.js backend under
  cross-arch TCG either SIGTRAPs or times out; the amd64 build still
  runs the smoke test, which covers every non-arch-specific code
  path. Arch is propagated into the guest via a new build-arch.env
  marker in the stack-bundle ISO.
The previous commit set NODE_OPTIONS=--jitless on the migration
docker exec. That was wrong for two reasons:
- --jitless disables eval and new Function, which some code in the
  migration path uses, so it broke amd64 builds that had been passing.
- --jitless is a V8 feature gate, not a TCG workaround. If it breaks
  one arch it breaks both — it could never have helped arm64 either.

Revert the --jitless flag and rely on -cpu cortex-a72 (added in the
parent commit) as the root-cause fix for the arm64 TCG SIGTRAP.

Keep the stdout/stderr capture for the migration exec so the next
failure dumps the actual node error through log-provision instead of
being swallowed by the serial-only stream.
Cross-arch TCG on ubicloud-standard-8 either SIGTRAPs during migrations
(old QEMU) or hangs in wait-for-deps with no progress. GitHub's
ubuntu-24.04-arm runner is an Azure arm64 VM — same-arch TCG, no KVM
(no nested virt exposed) — but empirically completes migrations, the
dep setup, and image packaging end-to-end (verified on the diagnostics
branch run). Only failure there was the backend smoke test hitting its
300s timeout, which the parent commit on this branch already skips on
arm64.

Keep amd64 on ubicloud-standard-8 for its KVM acceleration.
wait-for-deps used to loop forever on each service, so any single
dep that failed to start (e.g. a service crash-looping under TCG)
hung the build until the outer 6000s provision timeout.

Rewrite as a wait_for helper with:
- Hard 1500s budget across the full dep wait (overridable via
  STACK_DEPS_TIMEOUT). On timeout, dump docker ps -a, last 300 lines
  of the deps container, and per-service reachability, then exit 1
  so provision-build's cleanup trap fires and the VM shuts down fast.
- "<service> ready (Ns)" log lines on each service so successful
  runs show which service was the bottleneck.
- 30s heartbeat per service so long-running waits don't look frozen.

amd64 is unaffected — services come up in ~1s each under KVM, which
is well inside any threshold here.
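The wait_for shape described above can be reimplemented illustratively like this, with a dummy probe in place of a real service check; the names and demo timings are assumptions, not the script's actual code.

```shell
# Hard budget across the full dep wait, overridable via STACK_DEPS_TIMEOUT.
DEPS_TIMEOUT="${STACK_DEPS_TIMEOUT:-1500}"
DEPS_DEADLINE=$(( $(date +%s) + DEPS_TIMEOUT ))

wait_for() {
  name="$1"; shift
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    now=$(date +%s)
    if [ "$now" -ge "$DEPS_DEADLINE" ]; then
      # Real script: dump docker ps -a, last 300 deps-container log lines,
      # and per-service reachability before exiting 1.
      echo "timeout waiting for $name" >&2
      return 1
    fi
    waited=$(( now - start ))
    # Heartbeat roughly every 30s so long waits don't look frozen.
    if [ "$waited" -gt 0 ] && [ $(( waited % 30 )) -eq 0 ]; then
      echo "still waiting for $name (${waited}s)"
    fi
    sleep 1
  done
  echo "$name ready ($(( $(date +%s) - start ))s)"
}

wait_for postgres true   # dummy probe that succeeds immediately
```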
Same-arch TCG (e.g. arm64 guest on the arm64 ubuntu-24.04-arm runner
that has no nested virt) was falling through to -cpu cortex-a72 too.
Empirically that hangs wait-for-deps indefinitely — services never
reach a ready state — probably because QEMU's TCG emulation of named
CPU models is less well-tested than -cpu max, especially for the LSE
atomic fallback paths the dep services exercise.

The cortex-a72 workaround is only needed for cross-arch TCG, where V8
emits JIT instructions the amd64 host's TCG mistranslates. Restrict
it to that case; same-arch TCG now gets -cpu max, matching the known
working config from the diagnostics branch run on ubuntu-24.04-arm.
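The narrowed rule above can be sketched as a selection function: only the cross-arch-TCG arm64 case gets the cortex-a72 workaround, everything else keeps -cpu max. The function name and arch spellings are illustrative, not the repo's actual code.

```shell
select_cpu() {
  host_arch="$1"; guest_arch="$2"; accel="$3"
  if [ "$guest_arch" = "arm64" ] && [ "$accel" = "tcg" ] && [ "$host_arch" != "arm64" ]; then
    echo "cortex-a72"   # cross-arch TCG: limit V8 to armv8.0-a
  else
    echo "max"          # native accel or same-arch TCG: known-good config
  fi
}

select_cpu amd64 arm64 tcg   # cross-arch TCG -> cortex-a72
select_cpu arm64 arm64 tcg   # same-arch TCG  -> max
select_cpu arm64 arm64 kvm   # native KVM     -> max
```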
… V8 --jitless

Flip arm64 matrix back to ubicloud-standard-8 so both arches share one
runner fleet. Cross-arch TCG on an amd64 host previously SIGTRAP'd in
migrations because V8's JIT emitted arm64 instructions that QEMU's
cross-arch translator mis-handled; pair the existing -cpu cortex-a72
fallback with NODE_OPTIONS=--jitless on the migration docker exec to
force V8 to stay on the interpreter. Does not affect amd64 migrations
(KVM, no TCG).

coderabbitai Bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bbffe64c-ac49-429c-8b80-2219d1001b96

📥 Commits

Reviewing files that changed from the base of the PR and between 9ec08f4 and 1104023.

📒 Files selected for processing (1)
  • docker/local-emulator/qemu/cloud-init/emulator/user-data
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/local-emulator/qemu/cloud-init/emulator/user-data

📝 Walkthrough


Adds multi‑arch QEMU CI, KVM enablement, UPX compression for emulator service binaries, enhanced QEMU build/run provisioning and logging, refactors cloud-init into a centralized provision-build flow with robust dependency waiting, and adds a headless per‑arch QEMU boot test script.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/qemu-emulator-build.yaml
Matrix-driven runner selection for amd64/arm64, install qemu-kvm, add KVM enablement step, and gate emulator start/verify/stop to amd64.
Docker image build
docker/local-emulator/Dockerfile
New upx-compress stage: strip/UPX-compress selected service binaries; final image sources compressed/stripped artifacts; preserve standalone pre-migration node_modules.
QEMU build script
docker/local-emulator/qemu/build-image.sh
Per-arch -cpu selection for TCG, qcow2 options, virtfs host share, generate build env files, improved provisioning log handling and timeout/classification, persist logs and compress temp disk.
QEMU runtime script
docker/local-emulator/qemu/run-emulator.sh
Emit additional host port mappings into runtime.env (dashboard/backend/minio/inbucket host ports).
Cloud-init / provision scripts
docker/local-emulator/qemu/cloud-init/emulator/user-data
Replace inline runcmd with /usr/local/bin/provision-build; add/replace wait-for-deps, log-provision*, run-build-migrations, slim-docker-image, provision-build; hard dependency timeout, diagnostics, and writing of STACK_CLOUD_INIT_{DONE,FAILED} to host logs/console.
Test tooling
docker/local-emulator/qemu/test-serial.sh
New headless per-arch QEMU boot tester: builds seed ISO, runs QEMU, tails serial for STACK_CLOUD_INIT_DONE, prints incremental provision logs, and exits with diagnostics on timeout.

Sequence Diagram(s)

sequenceDiagram
  participant Host as Host CI / Builder
  participant Storage as Host files / qcow2
  participant QEMU as QEMU VM (guest)
  participant Guest as Guest init (/usr/local/bin/provision-build)
  participant Deps as Deps container (inside guest)

  Host->>Storage: prepare qcow2, seed ISO, build.env
  Host->>QEMU: launch VM (arch-specific CPU, virtfs mount)
  QEMU->>Guest: boot → cloud-init invokes provision-build
  Guest->>Deps: start deps container (run-build-migrations)
  Deps->>Guest: service probes report readiness
  Guest->>Guest: write provision.log and stream to serial
  Guest->>Host: write STACK_CLOUD_INIT_DONE/FAILED to hostfs/serial
  Host->>Storage: collect serial.log & provision.log (on success/failure)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • emu with a q stuff #1266: Modifies the same QEMU emulator workflow and related build/provisioning scripts; likely directly related to these emulator infra changes.

Suggested reviewers

  • N2D4

Poem

🐰 Hopping bytes and tiny logs I bring,
Two arches march while KVM takes wing,
Binaries tighten, boot tests drum,
Provision hums until it's done,
A rabbit cheers — the emulator's sprung! 🎉

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage (⚠️ Warning): coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
Title check (❓ Inconclusive): the title "local emulator build improvements" is vague and generic. Resolution: use a more specific title, such as "Reduce emulator image size and fix cross-arch TCG migrations with JIT disabling".
✅ Passed checks (1 passed)
Description check (✅ Passed): the description provides concrete details about the changes and objectives, but does not follow the repository's description template structure.



vercel Bot commented Apr 13, 2026

The latest updates on your projects.

Project | Status | Actions | Updated (UTC)
stack-auth-hosted-components | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-backend | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-dashboard | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-demo | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-docs | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-preview-backend | Ready | Preview, Comment | Apr 14, 2026 1:27am
stack-preview-dashboard | Ready | Preview, Comment | Apr 14, 2026 1:27am

Plain --jitless disables V8's Wasm support as a side effect, which breaks
Prisma 7's wasm query compiler on import (ReferenceError: WebAssembly
is not defined in decodeBase64AsWasm). --no-opt only disables the
TurboFan optimizer — the tier responsible for the aggressive arm64
instructions (PAC/BTI/LSE) that cross-arch TCG mistranslates — while
leaving Sparkplug baseline and the Wasm JIT intact, so Prisma's wasm
compiler runs at full speed.
Node's NODE_OPTIONS allowlist rejects --no-opt (unlike --jitless, which
it happens to permit). Put the flag directly on the node command line
inside the docker exec so V8 actually picks it up.
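The before/after shape of this fix can be sketched as below; the container name and script path are taken from excerpts elsewhere in this PR, but treat the exact invocations as illustrative.

```shell
# NODE_OPTIONS only accepts an allowlisted subset of V8 flags; --no-opt is
# rejected there (while --jitless happens to be permitted), so the flag has
# to live on node's own argv inside the docker exec.

# Rejected: node refuses --no-opt inside NODE_OPTIONS at startup.
BROKEN='docker exec -e NODE_OPTIONS=--no-opt stack-build-init node dist/db-migrations.mjs migrate'

# Works: V8 sees the flag because it is part of the node command line.
FIXED='docker exec stack-build-init node --no-opt dist/db-migrations.mjs migrate'

echo "$FIXED"
```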
The clickhouse binary since 22.x is a small ELF loader with a
ZSTD-compressed payload appended after the section table. strip
rewrites the ELF and can invalidate the loader's lookup of its own
trailing payload, causing it to decompress garbage and spin forever —
the exact symptom on cross-arch TCG runs where clickhouse-server
produced zero log output while postgres/redis/svix/minio/qstash
(none of them self-extracting) all started fine under identical
settings. Stripping was a no-op for size anyway; the payload bytes
live outside any section and strip can't touch them.
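An illustrative guard for the strip/UPX stage follows; binary names other than clickhouse are examples, and the real Dockerfile logic may differ.

```shell
# The clickhouse binary since 22.x is a self-extracting loader whose
# ZSTD payload sits outside any ELF section, so strip/upx must skip it:
# rewriting the ELF orphans the trailing payload and it decompresses garbage.
classify_binary() {
  case "$(basename "$1")" in
    clickhouse*) echo "skip" ;;      # self-extracting: leave untouched
    *)           echo "compress" ;;  # safe: strip --strip-unneeded + upx
  esac
}

classify_binary /opt/bin/clickhouse-server   # -> skip
classify_binary /opt/bin/minio               # -> compress
```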
With strip no longer corrupting the ClickHouse self-extractor,
clickhouse-server now reaches first-instruction execution and
immediately SIGILLs in a supervisord crash loop. Root cause: its
statically-linked LSE atomics (armv8.1) are rejected under
-cpu cortex-a72 (armv8.0).

cortex-a76 is armv8.2-a: LSE available, but no PAC (v8.3) and no BTI
(v8.5), so V8's aggressive JIT tiers still don't see the feature flags
that tripped cross-arch TCG's translator on the old -cpu max runs.
Combined with `node --no-opt` on migrations (Ignition+Sparkplug only,
no TurboFan/Maglev), this is the narrow CPU profile that should let
both V8 and ClickHouse coexist under cross-arch TCG.
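The feature ladder behind the cortex-a76 pick, per the reasoning above, summarized as an illustrative fragment (not code from the repo):

```shell
# CPU model   arch level   notes
# cortex-a72  armv8.0-a    no LSE           -> static ClickHouse SIGILLs
# cortex-a76  armv8.2-a    LSE, no PAC/BTI  -> ClickHouse runs, V8 stays tame
# max         armv8.5+     PAC/BTI exposed  -> V8 JIT trips cross-arch TCG
CROSS_TCG_ARM64_CPU="cortex-a76"
echo "$CROSS_TCG_ARM64_CPU"
```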
Two independent reasons this can't work under cross-arch TCG on the
ubicloud amd64 runner:

  1. The backend at runtime runs without --no-opt (we only apply the
     flag to the one-shot migration exec). That means TurboFan is live
     and will re-emit the aggressive arm64 JIT code the original
     -cpu max runs SIGTRAP'd on. Baking --no-opt into the runtime
     entrypoint would ship in the image and permanently degrade real
     arm64 users (who have KVM and don't need it).
  2. Even if we fixed (1), next start under cross-arch TCG is too slow
     to come up within any reasonable timeout.

amd64 verify under KVM already exercises the image's service stack;
the arm64 artifact is built from the same Dockerfile and trusted to
run on real arm64 hardware where KVM is available.
After fixing the JS-side SIGTRAP with --no-opt and getting ClickHouse
happy on cortex-a76, migrations finally ran on cross-arch TCG and
immediately hit a V8 internal assertion in Runtime_WasmTriggerTierUp:

    Check failed: it->second.Size() > offset.
    Heap::GcSafeFindCodeForInnerPointer
    InnerPointerToCodeCache::GetCacheEntry
    StackFrameIterator::Advance
    Runtime_WasmTriggerTierUp

Same class of bug as the JS SIGTRAP, just in the wasm pipeline: once
Prisma 7's wasm query compiler gets hot, V8 walks the stack to promote
from Liftoff to TurboFan, and cross-arch TCG's translated inner
pointers don't line up with V8's code-cache entries.

--no-opt only affects the JS tiers; wasm has its own. Pin wasm to
Liftoff with --no-wasm-tier-up. Liftoff is still JITed (unlike
--wasm-jitless, which would force the interpreter and tank migration
time), so Prisma speed stays reasonable.
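The flag set these commits converge on for the one-shot migration exec under cross-arch TCG can be sketched as below; the container name and sh -c wrapper mirror excerpts elsewhere in this PR, but treat the exact invocation as an assumption.

```shell
#   --no-opt           drop TurboFan/Maglev JS tiers (Ignition+Sparkplug stay)
#   --no-wasm-tier-up  pin wasm to Liftoff, no tier-up promotion stack walks
NODE_TCG_FLAGS="--no-opt --no-wasm-tier-up"

MIGRATE_CMD="docker exec stack-build-init sh -c 'node $NODE_TCG_FLAGS dist/db-migrations.mjs migrate'"
echo "$MIGRATE_CMD"
```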
@BilalG1 BilalG1 changed the base branch from emulator-arm64-ubuntu-runner to dev April 13, 2026 23:18
@BilalG1 BilalG1 changed the title from "emulator: move arm64 back to ubicloud cross-arch, run migrations with…" to "local emulator build improvements" Apr 13, 2026
@BilalG1 BilalG1 marked this pull request as ready for review April 13, 2026 23:20
@BilalG1 BilalG1 requested a review from N2D4 April 13, 2026 23:20

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR improves the local emulator build by: (1) consolidating both amd64 and arm64 CI jobs onto a single ubicloud-standard-8 (amd64) runner fleet, working around cross-arch TCG + V8 JIT SIGTRAPs via cortex-a76 CPU model and --no-opt --no-wasm-tier-up node flags; (2) cutting the final Docker image ~50% (2.6 GB → 1.3 GB) via UPX compression of service binaries, pre-baked migrations, and docker export/import flattening; (3) adding richer build-time observability through host-shared provisioning logs and structured error detection.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/robustness suggestions that do not affect correctness or the primary build path.

The cross-arch TCG workarounds (cortex-a76, --no-opt, --no-wasm-tier-up) are well-reasoned and thoroughly commented. The CI workflow correctly gates smoke tests to amd64. The Dockerfile UPX stage carefully skips ClickHouse. Remaining issues are limited to the test-serial.sh utility (macOS-only accelerator, duplicate log printing) and a missing container cleanup in slim-docker-image — none of these block the production build path.

Files with remaining P2 issues: docker/local-emulator/qemu/test-serial.sh (hvf hardcoded, Linux incompatible) and docker/local-emulator/qemu/cloud-init/emulator/user-data (flatten container not cleaned on failure).

Important Files Changed

Filename Overview
.github/workflows/qemu-emulator-build.yaml Consolidates both arch jobs onto one runner fleet; adds KVM setup step; gates smoke-test and stop steps to amd64-only — logic is sound and well-commented.
docker/local-emulator/Dockerfile Adds upx-compress stage to strip/compress service binaries; correctly skips ClickHouse (self-extracting binary); preserves standalone node_modules before migration-pruner overwrites them.
docker/local-emulator/qemu/build-image.sh Switches cross-arch arm64 to cortex-a76 CPU model; adds virtfs host-sharing for provision log; adds STACK_CLOUD_INIT_FAILED detection and guest-exited detection; improves qcow2 conversion.
docker/local-emulator/qemu/cloud-init/emulator/user-data Major refactor: pre-bakes migrations inside the VM, adds slim/flatten pipeline, new log-provision/run-build-migrations/slim-docker-image scripts; flatten container not cleaned on failure is a minor P2.
docker/local-emulator/qemu/run-emulator.sh Adds four host-port variables to runtime config ISO so the guest can construct correct browser-accessible URLs.
docker/local-emulator/qemu/test-serial.sh New developer utility for quick serial-log validation; hardcodes hvf (macOS-only) for both archs and re-prints all log lines on every iteration.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CI: ubicloud-standard-8 runner] --> B{matrix.arch}
    B -->|amd64| C[KVM acceleration]
    B -->|arm64| D[cross-arch TCG\ncortex-a76 CPU]
    C --> E[build-image.sh]
    D --> E
    E --> F[Boot QEMU VM with cloud-init]
    F --> G[install-emulator-containers]
    G --> H[run-build-migrations\nnode --no-opt --no-wasm-tier-up]
    H --> I[slim-docker-image]
    I --> J{BUILD_ARCH == arm64?}
    J -->|yes| K[Skip smoke test]
    J -->|no| L[Smoke test curl /health?db=1]
    K --> M[docker export/import flatten]
    L --> M
    M --> N[Nuke + rebuild Docker storage]
    N --> O[Zero free space + fstrim]
    O --> P[shutdown -P now]
    P --> Q[Host: qemu-img convert qcow2]
    Q --> R{matrix.arch == amd64?}
    R -->|yes| S[Start + Verify + Stop emulator]
    R -->|no| T[Skip runtime verify]

Comments Outside Diff (2)

  1. docker/local-emulator/qemu/test-serial.sh, line 930-939 (link)

    P2 hvf hardcoded — script fails on Linux

    Both arm64 and amd64 cases use hvf (macOS Hypervisor.framework), which is unavailable on Linux. Any Linux developer running this test gets a fatal QEMU error immediately. The production helpers (build-image.sh, run-emulator.sh) both check HOST_OS and fall back to kvm/tcg; this script skips that logic.

  2. docker/local-emulator/qemu/test-serial.sh, line 969-975 (link)

    P2 Duplicate log lines printed every iteration

    Every 2-second loop tick re-greps the whole serial log and reprints every STACK_PROVISION line that has ever appeared. After a few seconds of output you'll see the same lines many times over. build-image.sh solves this with last_build_lines / sed -n "..." to print only new lines; the same approach should be used here.

Suggested fix for the hvf hardcoding in docker/local-emulator/qemu/test-serial.sh, lines 930-939:
```suggestion
case "$ARCH" in
  arm64)
    firmware="$(find_aarch64_firmware)"
    if [ "$HOST_OS" = "darwin" ]; then
      qemu_base="qemu-system-aarch64 -machine virt -accel hvf -cpu max -bios $firmware"
    elif [ -w /dev/kvm ]; then
      qemu_base="qemu-system-aarch64 -machine virt -accel kvm -cpu max -bios $firmware"
    else
      qemu_base="qemu-system-aarch64 -machine virt -accel tcg -cpu cortex-a76 -bios $firmware"
    fi
    ;;
  amd64)
    if [ "$HOST_OS" = "darwin" ]; then
      qemu_base="qemu-system-x86_64 -machine q35 -accel hvf -cpu max"
    elif [ -w /dev/kvm ]; then
      qemu_base="qemu-system-x86_64 -machine q35 -accel kvm -cpu max"
    else
      qemu_base="qemu-system-x86_64 -machine q35 -accel tcg -cpu max"
    fi
    ;;
esac
```


---

Review comment on docker/local-emulator/qemu/cloud-init/emulator/user-data, lines 409-420:
**`flatten` container not removed on failure**

`docker create --name flatten` is called but `docker rm flatten` only runs on the happy path. If the pipe or anything between those two lines fails (and `set -euo pipefail` will make it exit), the named container is left behind. A subsequent re-run of `slim-docker-image` would immediately fail with `"Conflict. The container name \"/flatten\" is already in use"`.

Add a guard before the `docker create`:

```bash
# Before docker create:
docker rm -f flatten 2>/dev/null || true
docker create --name flatten stack-local-emulator-slim /bin/true
```


Reviews (1): Last reviewed commit: "emulator: add --no-wasm-tier-up to migra..."

Comment thread docker/local-emulator/qemu/cloud-init/emulator/user-data

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
docker/local-emulator/qemu/build-image.sh (1)

277-280: Verify BUILD_ENV_FILE is set before use.

BUILD_ENV_FILE is used on line 277 (cp "$BUILD_ENV_FILE" ...) but is only set later on line 393. Since build_one is called from the loop starting at line 395, which comes after line 393, this should work correctly in practice.

However, if someone were to call build_one directly or reorder the script, this would fail. Consider moving the env generation to the top of the script or passing BUILD_ENV_FILE as a parameter to build_one.

docker/local-emulator/qemu/test-serial.sh (1)

104-109: Polling loop prints duplicate STACK_PROVISION lines on every iteration.

The current logic greps all STACK_PROVISION lines every 2 seconds, so the same lines are printed repeatedly until success or timeout. This clutters the output.

🔧 Track last printed line count to show only new lines
 elapsed=0
 timeout=120
+last_provision_lines=0
 while [ "$elapsed" -lt "$timeout" ]; do
   if grep -q "STACK_CLOUD_INIT_DONE" "$TMP_DIR/serial.log" 2>/dev/null; then
     echo ""
     echo "=== SUCCESS: STACK_CLOUD_INIT_DONE received ==="
     echo ""
     echo "=== All STACK_PROVISION lines ==="
     grep "STACK_PROVISION" "$TMP_DIR/serial.log" || echo "(none found)"
     exit 0
   fi

   # Show any STACK_PROVISION lines as they appear
-  if grep -q "STACK_PROVISION" "$TMP_DIR/serial.log" 2>/dev/null; then
-    grep "STACK_PROVISION" "$TMP_DIR/serial.log" | while IFS= read -r line; do
-      echo "  [${elapsed}s] $line"
-    done
-  fi
+  current_lines=$(grep -c "STACK_PROVISION" "$TMP_DIR/serial.log" 2>/dev/null || echo 0)
+  if [ "$current_lines" -gt "$last_provision_lines" ]; then
+    grep "STACK_PROVISION" "$TMP_DIR/serial.log" | tail -n +$((last_provision_lines + 1)) | while IFS= read -r line; do
+      echo "  [${elapsed}s] $line"
+    done
+    last_provision_lines=$current_lines
+  fi

   sleep 2
Actionable inline comment:

In docker/local-emulator/qemu/test-serial.sh, around lines 65-74: the qemu invocation hardcodes accel="hvf" for both ARCH branches, which fails on non-macOS hosts. Detect the host accelerator the way build-image.sh and run-emulator.sh do (reuse their detection logic) and fall back to a portable backend (tcg) when hvf is unavailable; keep find_aarch64_firmware for arm64, set accel only when detection returns a supported backend, and include that accel variable in the qemu_base construction.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2512a391-b48e-43c0-a6ba-4b2539dfaff4

📥 Commits

Reviewing files that changed from the base of the PR and between 389199b and 44e4079.

📒 Files selected for processing (6)
  • .github/workflows/qemu-emulator-build.yaml
  • docker/local-emulator/Dockerfile
  • docker/local-emulator/qemu/build-image.sh
  • docker/local-emulator/qemu/cloud-init/emulator/user-data
  • docker/local-emulator/qemu/run-emulator.sh
  • docker/local-emulator/qemu/test-serial.sh

Comment thread docker/local-emulator/qemu/test-serial.sh
@BilalG1 BilalG1 assigned N2D4 and unassigned BilalG1 Apr 13, 2026
…pers

- wait-for-deps: table-drive wait probes via a SERVICES array (name|probe pairs)
- add /usr/local/bin/log-provision-stream for the repeated `cmd | while read line; do log prefix: $line; done` pattern
- factor write_marker_to_consoles in provision-build
- drop dead case sanitizer in build-image.sh:line_count
- tighten narrating comments in workflow and run-build-migrations

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/qemu-emulator-build.yaml (1)

83-103: Arm64 is no longer boot-validated in this workflow.

With the build-job runtime checks gated to amd64 and the test job only covering amd64, the packaged arm64 qcow2 never gets a post-build boot check. That makes first-boot regressions in the final image releasable. Please add a lightweight arm64 boot/readiness check instead of dropping validation entirely.

Also applies to: 121-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qemu-emulator-build.yaml around lines 83 - 103, The
workflow currently only starts and verifies the emulator when matrix.arch ==
'amd64', leaving arm64 qcow2 unvalidated; update the steps "Start emulator and
verify", "Verify services are healthy", and "Stop emulator" to also run for
arm64 but in a lightweight smoke-check mode: change the if conditions to include
matrix.arch == 'arm64' as well, and when matrix.arch == 'arm64' set a flag/env
var (for example EMULATOR_SMOKE_CHECK=true or EMULATOR_MODE=smoke) and reduced
checks/timeouts (e.g. EMULATOR_READY_TIMEOUT lower and fewer service probes)
before invoking docker/local-emulator/qemu/run-emulator.sh with
EMULATOR_ARCH=${{ matrix.arch }} so the arm64 image receives a minimal
boot/readiness validation without full amd64-length testing.
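A rough sketch of the reduced-settings selection this prompt describes. EMULATOR_SMOKE_CHECK, the timeout values, and the probe list are hypothetical names taken from the suggestion, not existing workflow variables:

```shell
#!/usr/bin/env sh
# Pick lighter settings when smoke-checking the arm64 image, full settings
# for amd64. All variable names and values here are illustrative.
arch="${1:-amd64}"
if [ "$arch" = "arm64" ]; then
  export EMULATOR_SMOKE_CHECK=true
  export EMULATOR_READY_TIMEOUT=120   # shorter boot wait for the smoke check
  probes="api"                        # probe a single service
else
  export EMULATOR_READY_TIMEOUT=600
  probes="api db inbucket"
fi
echo "arch=$arch timeout=$EMULATOR_READY_TIMEOUT probes=$probes"
# ...then invoke: EMULATOR_ARCH="$arch" ./docker/local-emulator/qemu/run-emulator.sh
```

The same run-emulator.sh entrypoint serves both arches; only the env-driven settings change.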
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/local-emulator/qemu/cloud-init/emulator/user-data`:
- Around line 269-279: The docker run invocation for the deps container uses
--rm which removes the container immediately on exit, preventing later
inspection; remove the --rm flag from the docker run that creates the container
named stack-build-init and instead add explicit cleanup logic that removes (and
force-removes on failure if needed) the stack-build-init container at the end of
the surrounding script or in an EXIT trap; update the start sequence (the docker
run with name stack-build-init and env STACK_DEPS_ONLY) and add a corresponding
docker rm (or docker rm -f) cleanup step so logs and docker ps -a remain
available for diagnostics until the script explicitly cleans up.
- Around line 301-315: The migration docker exec invocation uses node flags
"--no-opt --no-wasm-tier-up" which still allows baseline JIT; update the sh -c
command that runs "node --no-opt --no-wasm-tier-up dist/db-migrations.mjs
migrate" and the subsequent seed invocation to use "--jitless --no-wasm-tier-up"
instead (replace both "--no-opt" occurrences) so db-migrations.mjs runs in
interpreter-only mode; keep the surrounding docker exec, env-file usage, and
redirection to migrate_log unchanged.
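A sketch combining the two inline fixes above: start the deps container without --rm, run migrations with --jitless, and remove the container only in an EXIT trap so its logs stay inspectable after a failure. The docker commands are echoed stand-ins so the control flow is visible; the container and log names come from the review comments:

```shell
#!/usr/bin/env sh
# The echoed strings stand in for the real docker calls; IMAGE is a placeholder.
migrate_log=/tmp/migrate.log

cleanup() {
  echo "docker logs stack-build-init >> $migrate_log"   # collect logs first
  echo "docker rm -f stack-build-init"                  # then remove explicitly
}
trap cleanup EXIT   # container is removed only when the script exits

echo "docker run -d --name stack-build-init -e STACK_DEPS_ONLY=1 IMAGE   # note: no --rm"
echo "docker exec stack-build-init sh -c 'node --jitless --no-wasm-tier-up dist/db-migrations.mjs migrate' >> $migrate_log"
```

Until the trap fires, the exited container remains visible to `docker ps -a` and `docker logs` for diagnostics.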

---

Nitpick comments:
In @.github/workflows/qemu-emulator-build.yaml:
- Around line 83-103: The workflow currently only starts and verifies the
emulator when matrix.arch == 'amd64', leaving arm64 qcow2 unvalidated; update
the steps "Start emulator and verify", "Verify services are healthy", and "Stop
emulator" to also run for arm64 but in a lightweight smoke-check mode: change
the if conditions to include matrix.arch == 'arm64' as well, and when
matrix.arch == 'arm64' set a flag/env var (for example EMULATOR_SMOKE_CHECK=true
or EMULATOR_MODE=smoke) and reduced checks/timeouts (e.g. EMULATOR_READY_TIMEOUT
lower and fewer service probes) before invoking
docker/local-emulator/qemu/run-emulator.sh with EMULATOR_ARCH=${{ matrix.arch }}
so the arm64 image receives a minimal boot/readiness validation without full
amd64-length testing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca391b1b-bd15-4420-b9f6-00222b89cc62

📥 Commits

Reviewing files that changed from the base of the PR and between 44e4079 and 9ec08f4.

📒 Files selected for processing (3)
  • .github/workflows/qemu-emulator-build.yaml
  • docker/local-emulator/qemu/build-image.sh
  • docker/local-emulator/qemu/cloud-init/emulator/user-data
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/local-emulator/qemu/build-image.sh

Comment thread docker/local-emulator/qemu/cloud-init/emulator/user-data
Comment thread docker/local-emulator/qemu/cloud-init/emulator/user-data
Contributor

@N2D4 N2D4 left a comment


HYPE

@github-actions github-actions Bot assigned BilalG1 and unassigned N2D4 Apr 14, 2026
Comment thread docker/local-emulator/qemu/cloud-init/emulator/user-data
…ation exec

Migrations under cross-arch TCG were flaky with
Runtime_WasmTriggerTierUp -> StackFrameIterator check failures
(SIGTRAP/exit 133). --no-wasm-tier-up alone didn't suppress the
dynamic tier-up decision runtime call, and Wasm code-GC during a
stack walk can leave the inner-pointer-to-code cache stale.
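The interpreter-only mode this commit switches to can be sanity-checked locally (this assumes a node binary on PATH; --jitless is a standard Node/V8 flag):

```shell
# Confirm V8 accepts --jitless and still executes JS on the interpreter only.
node --jitless -e 'console.log("interpreter-only: " + (2 + 2))'
# → interpreter-only: 4
```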
@BilalG1 BilalG1 merged commit 5399142 into dev Apr 14, 2026
36 checks passed
@BilalG1 BilalG1 deleted the emulator-arm64-ubicloud-jitless branch April 14, 2026 02:21