Optimize Firecracker snapshot resume #261
    }

    parent := filepath.Dir(path)
    fd, err := unix.InotifyInit1(unix.IN_CLOEXEC | unix.IN_NONBLOCK)
We first try to wait without polling (via inotify); if that is not possible, we fall back to polling.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b7c7c05.
      defer snapshotSourceAliasMu.Unlock()
      return withSnapshotSourceDirAlias(meta, filepath.Dir(socketPath), func() error {
    -     return hv.loadSnapshot(ctx, snapshotPath, meta.NetworkOverrides)
    +     return hv.loadSnapshot(ctx, snapshotPath, meta.NetworkOverrides, resumeOnLoad)
Guest runs before alias cleanup
Medium Severity
With resume-on-load enabled, snapshot load can start the guest while still inside withSnapshotSourceDirAlias, before the temporary source-data symlink is removed. Restore then skips the separate Resume call when RestoredResumed is set, so alias restores no longer guarantee the guest stays paused until after that teardown finishes.
Created a monitoring plan for this PR.
What this PR does: Speeds up Firecracker VM restore by resuming the VM inside the snapshot load call (instead of a separate step) and replacing 50ms-interval polling with inotify-based socket detection on Linux. Adds an env-var escape hatch (
Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses.
Closing this one intentionally. Resume-on-load saves a small Firecracker API step, but it complicates the restore invariant: when a temporary snapshot source alias is needed,

Closing intentionally; see latest comment. The dependent PRs are being rebased directly onto main without the resume-on-load layer.


Summary
resume_vm support so restored guests can resume during /snapshot/load instead of requiring a separate post-load Resume call.
hypervisor.RestoredResumed, and skips the extra resume step only in that case.

Tests
go test ./lib/hypervisor ./lib/hypervisor/firecracker ./lib/mailbox ./lib/guest ./lib/system/guest_agent ./lib/oapi -count=1
git diff --check