Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughAdds UPX compression and stripping of emulator service binaries; enhances QEMU build tooling with provision log tracking, incremental output, virtfs-mounted provisioning, and direct image conversion; restructures cloud-init to run build migrations, slim the emulator image, and emit completion markers. Changes
Sequence Diagram(s)sequenceDiagram
participant Guest as QEMU Guest
participant CI as cloud-init (guest scripts)
participant Build as Build Scripts (/usr/local/bin)
participant DockerHost as Docker Host
CI->>Guest: mount runtime-config.iso (with STACK_EMULATOR_*_HOST_PORT)
CI->>Guest: exec /usr/local/bin/provision-build
provision-build->>Build: log-provision -> run-build-migrations
Build->>DockerHost: start deps-only container (uses /etc/stack-build.env + computed env)
DockerHost->>DockerHost: wait for init/deps markers
DockerHost->>DockerHost: run backend migrations & seed
DockerHost-->>Build: migrations complete
Build->>DockerHost: slim-docker-image (swap node_modules, smoke test, flatten/export)
DockerHost-->>Build: return slim image
Build->>Guest: write STACK_CLOUD_INIT_DONE to serial/console
Guest-->>CI: serial shows STACK_CLOUD_INIT_DONE (provision complete)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docker/local-emulator/qemu/cloud-init/emulator/user-data (1)
291-311: Aggressive Docker storage nuke is intentional but worth documenting.This approach (stop Docker → delete
/var/lib/docker→ restart → restore) is the most reliable way to reclaim space sincedocker image prunecan't free layers referenced by build cache. It's appropriate for a build-time operation where we control the full Docker state.Consider adding a brief comment explaining why a simple prune isn't sufficient, to prevent future maintainers from "simplifying" this.
📝 Optional: Add explanatory comment
# Save the final image and volume data, nuke ALL Docker storage # (images, build cache, overlay2 layers), then reload. This is the # only reliable way to reclaim space — the build cache holds refs # to old layers, preventing docker image prune from freeing them. + # A simple `docker system prune` is insufficient because build cache + # references keep old layers allocated even after intermediate images + # are removed. docker rm flatten🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/local-emulator/qemu/cloud-init/emulator/user-data` around lines 291 - 311, Add a short explanatory comment above the Docker storage reset sequence in the cloud-init user-data (the block that runs docker rm flatten, docker save stack-local-emulator:final, cp -a /var/lib/docker/volumes, systemctl stop docker containerd, rm -rf /var/lib/docker /var/lib/containerd, and subsequent restore) stating that a full removal of /var/lib/docker is required because build cache and image references prevent docker image prune / docker image prune -a from freeing layered storage, and that this is intentional and safe here because this is a controlled build-time operation; keep the comment concise and reference that the script restores images/volumes afterward to reassure maintainers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/local-emulator/qemu/cloud-init/emulator/user-data`:
- Around line 182-190: The wait loop for the sentinel file
(/var/run/stack-local-init-services.done) can time out silently; after the while
loop in user-data that runs `docker exec stack-build-init test -f
/var/run/stack-local-init-services.done` add an explicit check: if the sentinel
still doesn't exist, emit an error (e.g., echo with context and optionally dump
`docker logs stack-build-init`) and exit non-zero (exit 1) so the build stops
instead of proceeding to migrations; reference the timeout/elapsed variables and
the same docker exec test to perform the check.
---
Nitpick comments:
In `@docker/local-emulator/qemu/cloud-init/emulator/user-data`:
- Around line 291-311: Add a short explanatory comment above the Docker storage
reset sequence in the cloud-init user-data (the block that runs docker rm
flatten, docker save stack-local-emulator:final, cp -a /var/lib/docker/volumes,
systemctl stop docker containerd, rm -rf /var/lib/docker /var/lib/containerd,
and subsequent restore) stating that a full removal of /var/lib/docker is
required because build cache and image references prevent docker image prune /
docker image prune -a from freeing layered storage, and that this is intentional
and safe here because this is a controlled build-time operation; keep the
comment concise and reference that the script restores images/volumes afterward
to reassure maintainers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cb920ace-540a-4aed-a942-841cb08cc523
📒 Files selected for processing (4)
docker/local-emulator/Dockerfiledocker/local-emulator/qemu/build-image.shdocker/local-emulator/qemu/cloud-init/emulator/user-datadocker/local-emulator/qemu/run-emulator.sh
Greptile SummaryThis PR optimizes the local emulator image size through three main strategies: (1) adding UPX compression for Confidence Score: 5/5Safe to merge — all findings are P2 suggestions; build-time gates (UPX exit code) and the smoke test provide reasonable safeguards. All remaining comments are P2: one is a speculative runtime risk for UPX-compressed binaries (caught at build time if upx fails, and by the smoke test for the backend), one is a harmless glob-quoting style issue, and one is a silent-failure risk that would also be caught by the smoke test. No P0/P1 defects found. docker/local-emulator/Dockerfile (upx-compress stage) and docker/local-emulator/qemu/cloud-init/emulator/user-data (slim-docker-image smoke test coverage)
|
| Filename | Overview |
|---|---|
| docker/local-emulator/Dockerfile | Adds strip-clickhouse and upx-compress intermediate stages; saves standalone node_modules before overwriting with migration-pruner. UPX compression applied to four binaries without service-level smoke test coverage. |
| docker/local-emulator/qemu/cloud-init/emulator/user-data | Adds run-build-migrations and slim-docker-image scripts; performs full Docker storage wipe and reload cycle to reclaim space; includes smoke test for backend health but not for UPX-compressed services (MinIO, Svix, QStash). |
| docker/local-emulator/qemu/build-image.sh | Adds discard=on,detect-zeroes=unmap to QEMU drive (TRIM support), bundles build.env into the ISO, and generates the env file at script level before the build loop. Clean improvement. |
| docker/local-emulator/qemu/run-emulator.sh | Adds four host-side port variables (dashboard, backend, MinIO, inbucket) to the runtime config ISO so the VM can construct accurate browser-facing URLs. Straightforward and correct. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[build-image.sh] --> B[Docker buildx: stack-local-emulator]
B --> B1[upx-compress stage\nminio / mc / svix-server / qstash]
B --> B2[strip-clickhouse stage\nclickhouse binary]
B --> B3[node_modules.standalone\nsaved before migration-pruner overwrite]
B1 --> C[Docker image bundled into ISO]
B2 --> C
B3 --> C
A --> D[QEMU VM boots with cloud-init]
D --> E[install-emulator-containers\nloads Docker image + build.env]
E --> F[run-build-migrations\nstart deps-only container\nrun migrations & seed]
F --> G[slim-docker-image\nswap node_modules to standalone-traced\nbuild stack-local-emulator-slim]
G --> H[Smoke test: backend /health?db=1]
H -->|pass| I[Flatten to single layer\ndocker export + import]
H -->|fail| Z[Build fails]
I --> J[Wipe Docker storage\nsave + reload final image\nrestore volumes]
J --> K[Zero free space + fstrim]
K --> L[STACK_CLOUD_INIT_DONE emitted]
L --> M[qemu-img convert -c\ncompressed qcow2 final image]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docker/local-emulator/Dockerfile
Line: 121
Comment:
**UPX on Go binaries — smoke test doesn't cover these services**
`upx -9` is applied to `minio`, `mc`, `qstash` (Go binaries) and `svix-server` (Rust). UPX compression can succeed at build time yet produce a binary that crashes at runtime, particularly for newer statically-linked Go binaries on ARM64. The smoke test that follows only checks `http://127.0.0.1:8102/health?db=1` (backend + Postgres), so a broken MinIO, Svix, or QStash binary would pass the build gate and only surface when users try to upload files or fire webhooks.
Consider extending the smoke test to include at least one health probe per compressed binary before the flatten/wipe step:
```bash
# After smoke_passed check, verify UPX-compressed services are up
curl -sf http://127.0.0.1:9090/minio/health/live || { echo "MinIO health failed"; exit 1; }
curl -sf http://127.0.0.1:8071/api/v1/health/ || { echo "Svix health failed"; exit 1; }
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docker/local-emulator/qemu/cloud-init/emulator/user-data
Line: 264
Comment:
**Unquoted URL with glob-special character `?`**
The URL `http://127.0.0.1:8102/health?db=1` is passed unquoted to `curl`. Under `set -euo pipefail`, bash will attempt glob expansion on the `?` character (which matches any single character). This works in practice only because no file matching `http:/127.0.0.1:8102/health.db=1` exists, but it is fragile. Quote the URL for correctness:
```suggestion
code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 "http://127.0.0.1:8102/health?db=1" 2>/dev/null || true)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docker/local-emulator/Dockerfile
Line: 188
Comment:
**Silent `cp` failure creates an empty `node_modules.standalone`**
The `2>/dev/null || mkdir -p /app/node_modules.standalone` fallback means that if the `cp -a` silently fails (e.g. disk full, source missing), the build continues with an empty `node_modules.standalone`. When `slim-docker-image` later swaps it in, the container would have no node_modules and fail at runtime — likely caught only by the smoke test. Consider removing the suppression and fallback to surface the failure fast:
```suggestion
RUN cp -a /app/node_modules /app/node_modules.standalone
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "local emulator image opt" | Re-trigger Greptile
| COPY --from=minio-bin /usr/bin/minio /out/minio | ||
| COPY --from=mc-bin /usr/bin/mc /out/mc | ||
| COPY --from=qstash-bin /qstash-binary /out/qstash | ||
| RUN upx -9 /out/minio /out/svix-server /out/mc /out/qstash |
There was a problem hiding this comment.
UPX on Go binaries — smoke test doesn't cover these services
upx -9 is applied to minio, mc, qstash (Go binaries) and svix-server (Rust). UPX compression can succeed at build time yet produce a binary that crashes at runtime, particularly for newer statically-linked Go binaries on ARM64. The smoke test that follows only checks http://127.0.0.1:8102/health?db=1 (backend + Postgres), so a broken MinIO, Svix, or QStash binary would pass the build gate and only surface when users try to upload files or fire webhooks.
Consider extending the smoke test to include at least one health probe per compressed binary before the flatten/wipe step:
# After smoke_passed check, verify UPX-compressed services are up
curl -sf http://127.0.0.1:9090/minio/health/live || { echo "MinIO health failed"; exit 1; }
curl -sf http://127.0.0.1:8071/api/v1/health/ || { echo "Svix health failed"; exit 1; }Prompt To Fix With AI
This is a comment left during a code review.
Path: docker/local-emulator/Dockerfile
Line: 121
Comment:
**UPX on Go binaries — smoke test doesn't cover these services**
`upx -9` is applied to `minio`, `mc`, `qstash` (Go binaries) and `svix-server` (Rust). UPX compression can succeed at build time yet produce a binary that crashes at runtime, particularly for newer statically-linked Go binaries on ARM64. The smoke test that follows only checks `http://127.0.0.1:8102/health?db=1` (backend + Postgres), so a broken MinIO, Svix, or QStash binary would pass the build gate and only surface when users try to upload files or fire webhooks.
Consider extending the smoke test to include at least one health probe per compressed binary before the flatten/wipe step:
```bash
# After smoke_passed check, verify UPX-compressed services are up
curl -sf http://127.0.0.1:9090/minio/health/live || { echo "MinIO health failed"; exit 1; }
curl -sf http://127.0.0.1:8071/api/v1/health/ || { echo "Svix health failed"; exit 1; }
```
How can I resolve this? If you propose a fix, please make it concise.| smoke_elapsed=0 | ||
| smoke_passed=false | ||
| while [ "$smoke_elapsed" -lt "$smoke_timeout" ]; do | ||
| code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 http://127.0.0.1:8102/health?db=1 2>/dev/null || true) |
There was a problem hiding this comment.
Unquoted URL with glob-special character
?
The URL http://127.0.0.1:8102/health?db=1 is passed unquoted to curl. Under set -euo pipefail, bash will attempt glob expansion on the ? character (which matches any single character). This works in practice only because no file matching http:/127.0.0.1:8102/health.db=1 exists, but it is fragile. Quote the URL for correctness:
| code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 http://127.0.0.1:8102/health?db=1 2>/dev/null || true) | |
| code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 "http://127.0.0.1:8102/health?db=1" 2>/dev/null || true) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: docker/local-emulator/qemu/cloud-init/emulator/user-data
Line: 264
Comment:
**Unquoted URL with glob-special character `?`**
The URL `http://127.0.0.1:8102/health?db=1` is passed unquoted to `curl`. Under `set -euo pipefail`, bash will attempt glob expansion on the `?` character (which matches any single character). This works in practice only because no file matching `http:/127.0.0.1:8102/health.db=1` exists, but it is fragile. Quote the URL for correctness:
```suggestion
code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 "http://127.0.0.1:8102/health?db=1" 2>/dev/null || true)
```
How can I resolve this? If you propose a fix, please make it concise.| # Save the standalone-traced node_modules (runtime deps only) before the full | ||
| # migration-pruner copy overwrites it. The slim-docker-image step in the QEMU | ||
| # build restores this after migrations are baked in. | ||
| RUN cp -a /app/node_modules /app/node_modules.standalone 2>/dev/null || mkdir -p /app/node_modules.standalone |
There was a problem hiding this comment.
Silent
cp failure creates an empty node_modules.standalone
The 2>/dev/null || mkdir -p /app/node_modules.standalone fallback means that if the cp -a silently fails (e.g. disk full, source missing), the build continues with an empty node_modules.standalone. When slim-docker-image later swaps it in, the container would have no node_modules and fail at runtime — likely caught only by the smoke test. Consider removing the suppression and fallback to surface the failure fast:
| RUN cp -a /app/node_modules /app/node_modules.standalone 2>/dev/null || mkdir -p /app/node_modules.standalone | |
| RUN cp -a /app/node_modules /app/node_modules.standalone |
Prompt To Fix With AI
This is a comment left during a code review.
Path: docker/local-emulator/Dockerfile
Line: 188
Comment:
**Silent `cp` failure creates an empty `node_modules.standalone`**
The `2>/dev/null || mkdir -p /app/node_modules.standalone` fallback means that if the `cp -a` silently fails (e.g. disk full, source missing), the build continues with an empty `node_modules.standalone`. When `slim-docker-image` later swaps it in, the container would have no node_modules and fail at runtime — likely caught only by the smoke test. Consider removing the suppression and fallback to surface the failure fast:
```suggestion
RUN cp -a /app/node_modules /app/node_modules.standalone
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docker/local-emulator/qemu/test-serial.sh (2)
65-74: Hardcodedhvfacceleration limits portability to macOS.The accelerator is hardcoded to
hvfwhich only works on macOS. If this script needs to run on Linux (e.g., CI runners using KVM), it will fail. Consider detecting the host OS and selecting the appropriate accelerator.Also, there's no fallback case—if
$ARCHis neitherarm64noramd64,qemu_baseremains unset and the script will fail with an unclear error.♻️ Proposed fix to support both macOS and Linux
case "$ARCH" in arm64) - accel="hvf" + case "$(uname -s)" in + Darwin) accel="hvf" ;; + Linux) accel="kvm" ;; + *) accel="tcg" ;; + esac firmware="$(find_aarch64_firmware)" qemu_base="qemu-system-aarch64 -machine virt -accel $accel -cpu max -bios $firmware" ;; amd64) - qemu_base="qemu-system-x86_64 -machine q35 -accel hvf -cpu max" + case "$(uname -s)" in + Darwin) accel="hvf" ;; + Linux) accel="kvm" ;; + *) accel="tcg" ;; + esac + qemu_base="qemu-system-x86_64 -machine q35 -accel $accel -cpu max" + ;; + *) + echo "Unsupported architecture: $ARCH" >&2 + exit 1 ;; esac🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/local-emulator/qemu/test-serial.sh` around lines 65 - 74, The script hardcodes accel="hvf" and leaves qemu_base unset for unknown ARCH; update the qemu setup to detect the host OS (via uname or similar) and choose the accelerator dynamically (e.g., use "hvf" on Darwin/macOS, "kvm" on Linux, and fallback to "tcg" if none available), apply this logic for both the arm64 branch (where firmware is set via find_aarch64_firmware) and the amd64 branch (set qemu_base using the chosen $accel), and add a default case for the case statement that emits a clear error and exits if $ARCH is not supported so qemu_base cannot remain unset.
104-109: Duplicate progress lines printed on each poll iteration.The
grep "STACK_PROVISION"inside the polling loop prints all matching lines every 2 seconds, causing repeated output. Consider tracking which lines have already been shown.♻️ Proposed fix to track and show only new lines
elapsed=0 timeout=120 +last_line_count=0 while [ "$elapsed" -lt "$timeout" ]; do if grep -q "STACK_CLOUD_INIT_DONE" "$TMP_DIR/serial.log" 2>/dev/null; then echo "" echo "=== SUCCESS: STACK_CLOUD_INIT_DONE received ===" echo "" echo "=== All STACK_PROVISION lines ===" grep "STACK_PROVISION" "$TMP_DIR/serial.log" || echo "(none found)" exit 0 fi # Show any STACK_PROVISION lines as they appear - if grep -q "STACK_PROVISION" "$TMP_DIR/serial.log" 2>/dev/null; then - grep "STACK_PROVISION" "$TMP_DIR/serial.log" | while IFS= read -r line; do - echo " [${elapsed}s] $line" - done - fi + current_lines=$(grep -c "STACK_PROVISION" "$TMP_DIR/serial.log" 2>/dev/null || echo 0) + if [ "$current_lines" -gt "$last_line_count" ]; then + grep "STACK_PROVISION" "$TMP_DIR/serial.log" | tail -n +"$((last_line_count + 1))" | while IFS= read -r line; do + echo " [${elapsed}s] $line" + done + last_line_count="$current_lines" + fi sleep 2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/local-emulator/qemu/test-serial.sh` around lines 104 - 109, The polling loop currently reprints all "STACK_PROVISION" matches from "$TMP_DIR/serial.log" every iteration; modify the loop that contains grep "STACK_PROVISION" "$TMP_DIR/serial.log" | while ... to only print new lines by tracking progress (e.g., keep a shown_lines_count or last_offset variable for "$TMP_DIR/serial.log"), use that to read/grep only the unseen portion (sed -n or tail --lines=+N or tail --follow/--bytes with stored offset) and update the tracker after printing; ensure the tracker variable is initialized before polling and updated inside the while so each "STACK_PROVISION" line is shown exactly once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/local-emulator/qemu/cloud-init/emulator/user-data`:
- Around line 308-324: The current sequence moves Docker volumes to
/var/tmp/volumes-backup then nukes /var/lib/docker which can leave volumes
orphaned if any step (e.g., docker load, docker tag, systemctl) fails; add a
rollback/cleanup mechanism (use set -e and a trap ERR/EXIT or explicit error
checks) around the critical section that ensures on any failure you move
/var/tmp/volumes-backup back to /var/lib/docker/volumes and restart
docker/containerd, and only delete /var/tmp/final-image.tar and the backup after
confirming docker load, docker tag (stack-local-emulator:final ->
stack-local-emulator) and the service restarts succeeded; reference the mv
/var/tmp/volumes-backup, docker load -i /var/tmp/final-image.tar, docker tag
stack-local-emulator:final stack-local-emulator, and systemctl stop/start docker
containerd lines to locate where to add the trap and checks.
---
Nitpick comments:
In `@docker/local-emulator/qemu/test-serial.sh`:
- Around line 65-74: The script hardcodes accel="hvf" and leaves qemu_base unset
for unknown ARCH; update the qemu setup to detect the host OS (via uname or
similar) and choose the accelerator dynamically (e.g., use "hvf" on
Darwin/macOS, "kvm" on Linux, and fallback to "tcg" if none available), apply
this logic for both the arm64 branch (where firmware is set via
find_aarch64_firmware) and the amd64 branch (set qemu_base using the chosen
$accel), and add a default case for the case statement that emits a clear error
and exits if $ARCH is not supported so qemu_base cannot remain unset.
- Around line 104-109: The polling loop currently reprints all "STACK_PROVISION"
matches from "$TMP_DIR/serial.log" every iteration; modify the loop that
contains grep "STACK_PROVISION" "$TMP_DIR/serial.log" | while ... to only print
new lines by tracking progress (e.g., keep a shown_lines_count or last_offset
variable for "$TMP_DIR/serial.log"), use that to read/grep only the unseen
portion (sed -n or tail --lines=+N or tail --follow/--bytes with stored offset)
and update the tracker after printing; ensure the tracker variable is
initialized before polling and updated inside the while so each
"STACK_PROVISION" line is shown exactly once.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d7d3843-8e75-47a3-9856-02b083c3954c
📒 Files selected for processing (4)
docker/local-emulator/Dockerfiledocker/local-emulator/qemu/build-image.shdocker/local-emulator/qemu/cloud-init/emulator/user-datadocker/local-emulator/qemu/test-serial.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- docker/local-emulator/Dockerfile
- docker/local-emulator/qemu/build-image.sh
| log "Saving final image to /var/tmp..." | ||
| docker rm flatten | ||
| docker save stack-local-emulator:final -o /var/tmp/final-image.tar | ||
| mv /var/lib/docker/volumes /var/tmp/volumes-backup | ||
| log "Nuking Docker storage and reloading..." | ||
| systemctl stop docker containerd | ||
| rm -rf /var/lib/docker /var/lib/containerd | ||
| systemctl start docker containerd | ||
| until docker info >/dev/null 2>&1; do sleep 1; done | ||
| docker load -i /var/tmp/final-image.tar | ||
| docker tag stack-local-emulator:final stack-local-emulator | ||
| docker rmi stack-local-emulator:final || true | ||
| rm -f /var/tmp/final-image.tar | ||
| systemctl stop docker | ||
| rm -rf /var/lib/docker/volumes | ||
| mv /var/tmp/volumes-backup /var/lib/docker/volumes | ||
| systemctl start docker |
There was a problem hiding this comment.
Risky volume backup during Docker storage rebuild.
The sequence of moving volumes to /var/tmp, nuking Docker storage, then moving them back is fragile. If the script fails between lines 311 and 323 (e.g., docker load fails, system crashes), the volumes remain orphaned in /var/tmp and Docker storage is gone.
Consider adding error handling to restore volumes on failure:
🛡️ Proposed fix to add rollback on failure
log "Saving final image to /var/tmp..."
docker rm flatten
docker save stack-local-emulator:final -o /var/tmp/final-image.tar
mv /var/lib/docker/volumes /var/tmp/volumes-backup
+
+ restore_volumes() {
+ if [ -d /var/tmp/volumes-backup ]; then
+ log "Restoring volumes backup..."
+ rm -rf /var/lib/docker/volumes 2>/dev/null || true
+ mv /var/tmp/volumes-backup /var/lib/docker/volumes
+ fi
+ }
+ trap 'restore_volumes' ERR
+
log "Nuking Docker storage and reloading..."
systemctl stop docker containerd
rm -rf /var/lib/docker /var/lib/containerd📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| log "Saving final image to /var/tmp..." | |
| docker rm flatten | |
| docker save stack-local-emulator:final -o /var/tmp/final-image.tar | |
| mv /var/lib/docker/volumes /var/tmp/volumes-backup | |
| log "Nuking Docker storage and reloading..." | |
| systemctl stop docker containerd | |
| rm -rf /var/lib/docker /var/lib/containerd | |
| systemctl start docker containerd | |
| until docker info >/dev/null 2>&1; do sleep 1; done | |
| docker load -i /var/tmp/final-image.tar | |
| docker tag stack-local-emulator:final stack-local-emulator | |
| docker rmi stack-local-emulator:final || true | |
| rm -f /var/tmp/final-image.tar | |
| systemctl stop docker | |
| rm -rf /var/lib/docker/volumes | |
| mv /var/tmp/volumes-backup /var/lib/docker/volumes | |
| systemctl start docker | |
| log "Saving final image to /var/tmp..." | |
| docker rm flatten | |
| docker save stack-local-emulator:final -o /var/tmp/final-image.tar | |
| mv /var/lib/docker/volumes /var/tmp/volumes-backup | |
| restore_volumes() { | |
| if [ -d /var/tmp/volumes-backup ]; then | |
| log "Restoring volumes backup..." | |
| rm -rf /var/lib/docker/volumes 2>/dev/null || true | |
| mv /var/tmp/volumes-backup /var/lib/docker/volumes | |
| fi | |
| } | |
| trap 'restore_volumes' ERR | |
| log "Nuking Docker storage and reloading..." | |
| systemctl stop docker containerd | |
| rm -rf /var/lib/docker /var/lib/containerd | |
| systemctl start docker containerd | |
| until docker info >/dev/null 2>&1; do sleep 1; done | |
| docker load -i /var/tmp/final-image.tar | |
| docker tag stack-local-emulator:final stack-local-emulator | |
| docker rmi stack-local-emulator:final || true | |
| rm -f /var/tmp/final-image.tar | |
| systemctl stop docker | |
| rm -rf /var/lib/docker/volumes | |
| mv /var/tmp/volumes-backup /var/lib/docker/volumes | |
| systemctl start docker |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/local-emulator/qemu/cloud-init/emulator/user-data` around lines 308 -
324, The current sequence moves Docker volumes to /var/tmp/volumes-backup then
nukes /var/lib/docker which can leave volumes orphaned if any step (e.g., docker
load, docker tag, systemctl) fails; add a rollback/cleanup mechanism (use set -e
and a trap ERR/EXIT or explicit error checks) around the critical section that
ensures on any failure you move /var/tmp/volumes-backup back to
/var/lib/docker/volumes and restart docker/containerd, and only delete
/var/tmp/final-image.tar and the backup after confirming docker load, docker tag
(stack-local-emulator:final -> stack-local-emulator) and the service restarts
succeeded; reference the mv /var/tmp/volumes-backup, docker load -i
/var/tmp/final-image.tar, docker tag stack-local-emulator:final
stack-local-emulator, and systemctl stop/start docker containerd lines to locate
where to add the trap and checks.
| accel="hvf" | ||
| firmware="$(find_aarch64_firmware)" | ||
| qemu_base="qemu-system-aarch64 -machine virt -accel $accel -cpu max -bios $firmware" | ||
| ;; | ||
| amd64) | ||
| qemu_base="qemu-system-x86_64 -machine q35 -accel hvf -cpu max" |
Provisioning used to silently wait out the full 6000s timeout on any guest-side failure because the cleanup trap only logged the error. Now it writes STACK_CLOUD_INIT_FAILED and shuts the VM down, and the host waiter breaks on that marker and reports it distinctly. Also bump smoke test timeout 120s->300s, dump docker ps / container logs / free -m / verbose curl when it fails, log the qemu accel path, and enable /dev/kvm on the CI runner so the VM isn't stuck in TCG.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/qemu-emulator-build.yaml:
- Around line 52-63: The "Enable KVM access" step and its log are misleading
because build-image.sh only enables -accel kvm when host arch == guest arch, so
making /dev/kvm writable on an x86 runner does not speed up the arm64 matrix
leg; update the workflow to route the arm64 job to the same-arch runner (use
ubicloud-standard-8-arm for the arm64 matrix entry) or, if you prefer not to
change runners, modify the step's log to explicitly state "KVM here only helps
same-arch guests (host arch must match guest arch for -accel kvm)" and keep the
existing udev commands; reference the "Enable KVM access" step, the matrix entry
for arm64, and build-image.sh to ensure the fix is consistent.
In `@docker/local-emulator/qemu/build-image.sh`:
- Around line 367-369: The script calls node to run generate-env-development.mjs
but check_deps() doesn’t include node, so add "node" to the dependency list used
by check_deps() (the same place where other tools like docker/awk are listed) so
that check_deps() fails early if node is missing; update the dependency array or
call to check_deps() that feeds into the check_deps function and keep the
BUILD_ENV_FILE and the node
"$REPO_ROOT/docker/local-emulator/generate-env-development.mjs" invocation
unchanged.
- Around line 358-360: The qemu-img convert currently writes directly to
"$final_img" which can leave a partially overwritten corrupt file on failure;
change the workflow around the qemu-img convert invocation (the block using
tmp_img and final_img) to write the output to a temporary file (e.g.,
"$final_img.tmp" or similar) and then atomically rename/move it to "$final_img"
only after convert succeeds, and still ensure tmp_dir cleanup (rm -rf
"$tmp_dir") happens appropriately on both success and failure; additionally,
update the dependency check in the check_deps function to include "node" in the
tooling list so the script validates node is installed before invoking it later.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0da9cb3f-5870-4986-83d6-34baa6e95ce0
📒 Files selected for processing (3)
.github/workflows/qemu-emulator-build.yamldocker/local-emulator/qemu/build-image.shdocker/local-emulator/qemu/cloud-init/emulator/user-data
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/local-emulator/qemu/cloud-init/emulator/user-data
| - name: Enable KVM access | ||
| run: | | ||
| echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \ | ||
| | sudo tee /etc/udev/rules.d/99-kvm4all.rules | ||
| sudo udevadm control --reload-rules | ||
| sudo udevadm trigger --name-match=kvm || true | ||
| ls -la /dev/kvm || echo "no /dev/kvm present" | ||
| if [ -w /dev/kvm ]; then | ||
| echo "KVM is writable — hardware acceleration will be used" | ||
| else | ||
| echo "WARNING: /dev/kvm is not writable — will fall back to TCG (very slow)" | ||
| fi |
There was a problem hiding this comment.
This KVM step does not speed up the arm64 matrix leg on ubicloud-standard-8.
build-image.sh only switches to -accel kvm when the host arch matches the guest arch, so on the current x86 runner the arm64 build still falls back to TCG even if /dev/kvm is writable. If the goal is to optimize both architectures, route the arm64 job to ubicloud-standard-8-arm or make the log message explicit that KVM here only helps same-arch guests.
Based on learnings, this workflow can use both ubicloud-standard-8 and ubicloud-standard-8-arm, so there is an existing same-arch runner option for the arm64 build.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/qemu-emulator-build.yaml around lines 52 - 63, The "Enable
KVM access" step and its log are misleading because build-image.sh only enables
-accel kvm when host arch == guest arch, so making /dev/kvm writable on an x86
runner does not speed up the arm64 matrix leg; update the workflow to route the
arm64 job to the same-arch runner (use ubicloud-standard-8-arm for the arm64
matrix entry) or, if you prefer not to change runners, modify the step's log to
explicitly state "KVM here only helps same-arch guests (host arch must match
guest arch for -accel kvm)" and keep the existing udev commands; reference the
"Enable KVM access" step, the matrix entry for arm64, and build-image.sh to
ensure the fix is consistent.
| log "Compressing final image (this may take several minutes)..." | ||
| qemu-img convert -p -O qcow2 -c "$final_img" "$final_img.tmp" | ||
| mv "$final_img.tmp" "$final_img" | ||
| qemu-img convert -p -O qcow2 -c "$tmp_img" "$final_img" | ||
| rm -rf "$tmp_dir" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the file size and read lines around 358-360
wc -l docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 110
🏁 Script executed:
# Read the relevant section
sed -n '350,375p' docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 951
🏁 Script executed:
# Search for node invocations and check_deps function
rg -n "node " docker/local-emulator/qemu/build-image.sh | head -20Repository: stack-auth/stack-auth
Length of output: 137
🏁 Script executed:
# Find the check_deps function to see if node is validated
rg -n "check_deps\|qemu-img\|node" docker/local-emulator/qemu/build-image.sh | head -30Repository: stack-auth/stack-auth
Length of output: 47
🏁 Script executed:
# Search for check_deps function more broadly
rg -n "check_deps|declare -a|qemu-img" docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 274
🏁 Script executed:
# Get the full check for dependencies/requirements
sed -n '1,100p' docker/local-emulator/qemu/build-image.sh | rg -n "check|qemu|curl|docker|gzip|node"Repository: stack-auth/stack-auth
Length of output: 453
🏁 Script executed:
# Search for any validation of dependencies in the entire file
rg -B2 -A5 "command -v|which " docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 610
Avoid writing the final qcow2 in place.
Line 359 converts directly into "$final_img". If qemu-img convert fails or is interrupted, an existing image can be left partially overwritten, which makes reruns or package steps pick up a corrupt artifact. Use a temporary file with atomic rename to ensure failure doesn't corrupt the final artifact.
Additionally, node is invoked at line 368 but is not validated in the check_deps() function (line 50), creating an incomplete dependency check.
Suggested fix
- qemu-img convert -p -O qcow2 -c "$tmp_img" "$final_img"
- rm -rf "$tmp_dir"
+ local tmp_final="${final_img}.tmp"
+ rm -f "$tmp_final"
+ qemu-img convert -p -O qcow2 -c "$tmp_img" "$tmp_final"
+ mv "$tmp_final" "$final_img"
+ rm -rf "$tmp_dir"Also add node to the dependency check loop at line 50.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| log "Compressing final image (this may take several minutes)..." | |
| qemu-img convert -p -O qcow2 -c "$final_img" "$final_img.tmp" | |
| mv "$final_img.tmp" "$final_img" | |
| qemu-img convert -p -O qcow2 -c "$tmp_img" "$final_img" | |
| rm -rf "$tmp_dir" | |
| log "Compressing final image (this may take several minutes)..." | |
| local tmp_final="${final_img}.tmp" | |
| rm -f "$tmp_final" | |
| qemu-img convert -p -O qcow2 -c "$tmp_img" "$tmp_final" | |
| mv "$tmp_final" "$final_img" | |
| rm -rf "$tmp_dir" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/local-emulator/qemu/build-image.sh` around lines 358 - 360, The
qemu-img convert currently writes directly to "$final_img" which can leave a
partially overwritten corrupt file on failure; change the workflow around the
qemu-img convert invocation (the block using tmp_img and final_img) to write the
output to a temporary file (e.g., "$final_img.tmp" or similar) and then
atomically rename/move it to "$final_img" only after convert succeeds, and still
ensure tmp_dir cleanup (rm -rf "$tmp_dir") happens appropriately on both success
and failure; additionally, update the dependency check in the check_deps
function to include "node" in the tooling list so the script validates node is
installed before invoking it later.
| log "Generating emulator build env file..." | ||
| node "$REPO_ROOT/docker/local-emulator/generate-env-development.mjs" | ||
| BUILD_ENV_FILE="$REPO_ROOT/docker/local-emulator/.env.development" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the script file around lines 367-369
cat -n docker/local-emulator/qemu/build-image.sh | sed -n '360,375p'Repository: stack-auth/stack-auth
Length of output: 718
🏁 Script executed:
# Now let's find the check_deps() function
rg -A 15 "check_deps\(\)" docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 576
🏁 Script executed:
# Let's also search for all occurrences of "node" in the script to see if it's used elsewhere
rg -n "node" docker/local-emulator/qemu/build-image.shRepository: stack-auth/stack-auth
Length of output: 137
Add node to the upfront dependency check.
This script unconditionally calls node at line 368, but check_deps() doesn't verify it's available. Adding it to the existing dependency check will catch missing node early instead of failing at runtime.
Suggested fix
- for cmd in qemu-img curl docker gzip; do
+ for cmd in qemu-img curl docker gzip node; do
command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/local-emulator/qemu/build-image.sh` around lines 367 - 369, The script
calls node to run generate-env-development.mjs but check_deps() doesn’t include
node, so add "node" to the dependency list used by check_deps() (the same place
where other tools like docker/awk are listed) so that check_deps() fails early
if node is missing; update the dependency array or call to check_deps() that
feeds into the check_deps function and keep the BUILD_ENV_FILE and the node
"$REPO_ROOT/docker/local-emulator/generate-env-development.mjs" invocation
unchanged.
Summary by CodeRabbit
Chores
Tests