feat: implement sandbox orchestrator and harden Lima provider#3
Conversation
PR #3 — Sandbox Orchestrator:
- Add `internal/sandbox/` package implementing `api.SandboxManager`
- `sandbox.go`: Manager struct with JSON-backed state persistence, `Resetter` interface for snapshot support, injectable `ResourceChecker`
- `create.go`: full create workflow (validate -> resources -> runtime detection -> profile resolution -> VM provision -> network policy -> mount registration)
- `run.go`: Run workflow (start VM, exec command as user) plus a `Start` method
- `stop.go`: graceful stop workflow
- `delete.go`: destroy workflow (stop -> delete VM -> cleanup mounts -> state)
- `reset.go`: reset workflow (check snapshot -> stop -> restore -> restart)
- `list.go`: List and Status with live state reconciliation from the VM
- `sandbox_test.go`: 36 tests with faked dependencies covering all workflows, error paths, persistence, and edge cases
- Compile-time interface check: `var _ api.SandboxManager = (*Manager)(nil)`

Security fixes (from audit):
- Fix #1: Lima config file permissions 0644 -> 0600 (owner-only)
- Fix #2: Block sensitive host mount paths (`/etc`, `/root`, `/proc`, `/sys`, `/dev`, `/home`, `/var/run/docker.sock`, etc.) via a sensitive-paths blocklist
- Fix #3: Reject shell metacharacters (`;`, `&`, `|`, `$`, `` ` ``, `!`) in provision commands
- Fix #4: Individual argument escaping in `ExecAsUser` preserves argument boundaries (was joining then escaping; now escapes each arg separately)
- Fix #5: Permission masking in snapshot/restore strips SUID, SGID, and world-write bits via a `safeFilePerm()` mask to 0755

AI-assisted: this PR was generated with AI assistance. Every line has been reviewed, understood, and owned by the human who commits it.
Walkthrough
This PR implements a complete sandbox orchestration system.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Manager
    participant Detector
    participant Profiles
    participant Provider
    participant Network
    participant Mounts
    Client->>Manager: Create(ctx, spec)
    Manager->>Manager: Validate name uniqueness
    Manager->>Manager: Check available resources
    Manager->>Detector: Detect runtime from spec.Source
    Manager->>Profiles: Resolve profile (default: "cautious")
    Manager->>Manager: Compute VM spec (CPU/Memory/Disk)
    Manager->>Provider: Create VM
    Manager->>Provider: Start VM
    Manager->>Network: Apply network policy from profile
    Manager->>Network: Lock sandbox (if profile.LockAfterSetup)
    Manager->>Mounts: Register mount from spec.Source
    Manager->>Manager: Persist sandbox state (StateRunning)
    Manager->>Client: Return SandboxInfo
```

```mermaid
sequenceDiagram
    actor Client
    participant Manager
    participant Provider
    participant Storage
    Client->>Manager: List(ctx)
    Manager->>Storage: Load all sandbox entries
    loop For each sandbox
        Manager->>Provider: IsRunning(ctx, name)
        alt Provider reports state change
            Manager->>Manager: Update info.State
            Manager->>Storage: Persist updated state
        end
    end
    Manager->>Client: Return SandboxInfo[]
    Client->>Manager: Status(ctx, name)
    Manager->>Storage: Get sandbox by name
    alt Sandbox state is Errored
        Manager->>Client: Return cached info
    else
        Manager->>Provider: IsRunning(ctx, name)
        alt State mismatch detected
            Manager->>Manager: Update info.State
            Manager->>Storage: Persist updated state
        end
        Manager->>Client: Return updated SandboxInfo
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
🧹 Nitpick comments (8)
internal/vm/lima/snapshot.go (1)
82-82: Minor: redundant `.Perm()` call.
`safeFilePerm` already returns `m.Perm() & 0755`, which is a permission value. The subsequent `.Perm()` call is a no-op since the type bits are already stripped.

♻️ Optional simplification

```diff
- return os.Chmod(targetPath, safeFilePerm(info.Mode()).Perm())
+ return os.Chmod(targetPath, safeFilePerm(info.Mode()))
```

And similarly in `copyDir` at line 211.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/vm/lima/snapshot.go` at line 82, The call to .Perm() is redundant because safeFilePerm already returns a FileMode with permissions applied; remove the trailing .Perm() when passing its result to os.Chmod. Update the os.Chmod invocation in snapshot.go (the call that uses targetPath and safeFilePerm) and the analogous call inside copyDir to pass safeFilePerm(...) directly instead of safeFilePerm(...).Perm().

internal/sandbox/run.go (1)
50-77: `Start` method has consistent patterns with other lifecycle methods.
The implementation follows the same lock/check/execute/update pattern. It shares the same stale-state concerns, but the pattern is at least consistent across the codebase.
One note: when the VM is already running (lines 63-65), the method returns `nil` without updating persistence. This is correct since no state change occurred, but consider whether the persisted state should be reconciled if it somehow got out of sync (e.g., `info.State` was not `StateRunning` but the VM is actually running).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sandbox/run.go` around lines 50 - 77, The Start method returns early when provider.IsRunning reports the VM is already running but doesn't reconcile persisted state; update Start to, after detecting running == true, acquire the same lock, fetch or use the existing info, set info.State = api.StateRunning if it differs, call m.put(info) to persist, and then return nil, ensuring locking around access to info and put (use m.mu.Lock()/Unlock() similar to the rest of the method) so the stored state stays consistent with provider.IsRunning.

internal/vm/lima/config.go (2)
13-14: Import ordering: `strconv` should be grouped with standard library imports.
Go convention groups standard library imports together, separate from external packages.
♻️ Fix import grouping

```diff
 import (
 	"fmt"
 	"path/filepath"
 	"regexp"
+	"strconv"
 	"strings"
-	"strconv"
-
 	"github.com/muneebs/airlock/internal/api"
 	"gopkg.in/yaml.v3"
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/vm/lima/config.go` around lines 13 - 14, The import ordering in internal/vm/lima/config.go is incorrect: move "strconv" into the block with other standard library imports so all stdlib packages are grouped together (update the import block containing strconv and other stdlib entries in that file to follow Go convention).
19-35: Sensitive mount paths list contains redundant entries and may be overly restrictive.

Redundancy: `/etc/shadow`, `/etc/ssh`, `/etc/ssl`, `/etc/pki` are already blocked by the `/etc` prefix check in `isSensitiveMountPath`.

Usability concern: Blocking `/home` entirely prevents legitimate use cases on Linux, where users typically mount project directories from `/home/username/projects/`. The test confirms `/home/user/projects/app` is blocked while `/Users/alice/projects/app` (macOS) is allowed.

If this asymmetry between Linux and macOS is intentional (e.g., assuming macOS-only usage), consider documenting it. Otherwise, consider a more granular approach that blocks the `/home` root but allows subdirectories beyond the first level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/vm/lima/config.go` around lines 19 - 35, The sensitiveMountPaths list contains redundant entries under /etc and is overbroad for /home; remove entries "/etc/shadow", "/etc/ssh", "/etc/ssl", "/etc/pki" from the sensitiveMountPaths slice and update the isSensitiveMountPath logic so that "/etc" continues to block everything under /etc but "/home" only blocks the root "/home" itself (allowing user subdirectories like /home/user/...); reference sensitiveMountPaths and isSensitiveMountPath to locate and change the list and the matching logic accordingly.

internal/sandbox/sandbox_test.go (1)
483-483: Remove vestigial blank identifier assignment.
The `_ = resetter` assignment serves no purpose since `resetter` is already used at line 499.

Proposed fix

```diff
 	if info.State != api.StateRunning {
 		t.Errorf("expected state running after reset, got %s", info.State)
 	}
-
-	_ = resetter
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sandbox/sandbox_test.go` at line 483, Remove the vestigial blank identifier assignment "_ = resetter" — it is redundant because the variable resetter is used later (e.g., at line where resetter is referenced) so delete the "_ = resetter" statement to clean up dead code and avoid pointless no-op assignments.

internal/sandbox/sandbox.go (2)
200-227: DRY: Default values duplicated across multiple functions.
The defaults (CPU=2, Memory="4GiB", Disk="20GiB") are repeated in `newSandboxInfo`, `CheckResourcesForSpec`, and `defaultSpec()` in create.go. Consider extracting these to package-level constants or a single `defaultResources()` function.

Also applies to: 229-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sandbox/sandbox.go` around lines 200 - 227, The CPU, Memory and Disk default values are duplicated across newSandboxInfo, CheckResourcesForSpec, and defaultSpec; extract them into shared package-level constants (e.g., defaultCPU, defaultMemory, defaultDisk) or a single helper function (e.g., defaultResources() returning CPU, Memory, Disk) and update newSandboxInfo, CheckResourcesForSpec, and defaultSpec to use those constants/helper instead of hardcoding 2/"4GiB"/"20GiB" so defaults live in one place.
93-103: Consider restricting store file permissions to 0600.
The PR hardens Lima config permissions to 0600 (per PR objectives), but the sandbox store is written with 0644. While sandbox metadata may be less sensitive than VM configs, applying consistent restrictive permissions reduces attack surface.
Proposed fix
```diff
-	return os.WriteFile(m.storePath, data, 0644)
+	return os.WriteFile(m.storePath, data, 0600)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sandbox/sandbox.go` around lines 93 - 103, The sandbox store is written with world-readable permissions; in func (m *Manager) save() tighten the file permissions when persisting m.sandboxes by changing the os.WriteFile call to use 0600 for m.storePath so only the owner can read/write, and optionally tighten the directory creation (currently os.MkdirAll(dir, 0755)) to 0700 to restrict access to the store directory; update the behavior in Manager.save around m.storePath and the MkdirAll call accordingly.

internal/sandbox/list.go (1)
11-40: Holding the mutex during provider I/O calls may cause contention.
`List()` holds `m.mu` while calling `m.provider.IsRunning()` for each sandbox. If the provider call is slow (network/IPC), this blocks all other Manager operations. Consider collecting sandbox names under the lock, releasing it, then querying provider status, and finally re-acquiring to reconcile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sandbox/list.go` around lines 11 - 40, Manager.List currently holds m.mu while calling m.provider.IsRunning which can block other operations; change List to first lock m.mu and copy the necessary sandbox identifiers and a shallow copy of m.sandboxes (use m.sandboxes or their names) into a local slice, then unlock before iterating and calling m.provider.IsRunning for each entry; after collecting running states, re-acquire m.mu to reconcile state changes (updating info.State and calling m.put(info) as needed) and build the resulting []api.SandboxInfo to return; ensure you still handle api.StateErrored without calling provider and preserve existing behavior for m.put and result ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/sandbox/create.go`:
- Around line 20-25: The current Create path performs a non-atomic existence
check on m.sandboxes (m.mu.Lock / m.mu.Unlock around the lookup) but releases
the lock before resource validation and calling m.put(info), causing a TOCTOU
race that can allow duplicate sandboxes; fix by making the check-and-insert
atomic: hold m.mu for the entire create critical section or implement an atomic
check-and-set (e.g., reserve the name in m.sandboxes immediately after the
existence check and before releasing the lock, or move the m.put(info) call
inside the same lock scope). Ensure you update places referencing
m.mu.Lock/m.mu.Unlock, the existence check for spec.Name, and m.put(info) so the
name reservation and insertion happen while the mutex is held.
- Around line 91-95: The if/else in create.go currently sets info.State =
api.StateRunning in both branches (checking spec.Ephemeral), so remove the
redundant conditional and set info.State = api.StateRunning unconditionally, or
if ephemeral should behave differently implement the intended logic for
spec.Ephemeral (e.g., set a different state or perform ephemeral-specific
initialization) by updating the branch handling around spec.Ephemeral and
info.State accordingly.
- Around line 60-66: When m.provider.Start(ctx, spec.Name) fails after a
successful Create, call m.provider.Delete(ctx, spec.Name) to avoid leaving an
orphaned VM; perform the Delete attempt before marking info.State =
api.StateErrored and persisting with m.put(info), handle/log any Delete error
(but don't fail to update the sandbox state), and keep the existing mutex usage
around the state update (i.e., acquire m.mu, set info.State, call m.put(info),
release m.mu) so cleanup is attempted and the sandbox is still recorded as
errored if Start fails.
In `@internal/sandbox/delete.go`:
- Around line 17-22: The deletion logic currently ignores errors from
m.provider.IsRunning and may proceed to delete a VM that's actually running;
update the delete flow in delete.go to handle the IsRunning error explicitly:
after calling m.provider.IsRunning(ctx, name) check if err != nil and return (or
wrap) that error (e.g., "check running state: %w") instead of continuing, only
call m.provider.Stop(ctx, name) when IsRunning succeeds and running is true, and
preserve the existing Stop error wrapping ("stop VM before delete: %w").
- Line 35: The current call m.mounts.Unregister(ctx, name, name) incorrectly
uses the sandbox name for both sandbox and mount identifiers; instead, fetch all
mounts for the sandbox (e.g. call m.mounts.List(ctx, name) or the appropriate
list method to get mounts for sandbox `name`) and iterate them, calling
m.mounts.Unregister(ctx, name, mount.Name) (or the mount's identifier field) for
each entry, using the same ctx and sandbox `name` variable so every mount
registered under that sandbox is removed.
In `@internal/sandbox/list.go`:
- Around line 60-70: The code updates info.State (the sandbox entry from
m.sandboxes) outside the mutex, causing a data race; acquire m.mu before
mutating info.State and calling m.put(info) and release after both operations.
Specifically, in the branch checks around running and info.State (the block that
currently sets info.State = api.StateRunning and info.State = api.StateStopped),
move the assignment into the critical section guarded by m.mu so the sequence
becomes: m.mu.Lock(), set info.State, _ = m.put(info), m.mu.Unlock(). Keep the
initial read-only checks outside the lock if desired, but any mutation of
info.State or calls to m.put must occur while holding m.mu to avoid races.
In `@internal/sandbox/reset.go`:
- Around line 34-37: The code silently ignores errors from m.put(info) after
updating info.State = api.StateStopped inside the m.mu.Lock()/Unlock() block;
change this so persistence failures are surfaced—capture the error returned by
m.put(info), log it via the manager logger (or return it wrapped alongside the
reset result) and ensure the lock is released in all cases; specifically update
the reset path that calls m.put(info) to handle and propagate/log the error
instead of discarding it.
- Around line 13-19: The Reset method has a TOCTOU race: Manager.get grabs info
under m.mu then unlocks, so concurrent Destroy/Stop can mutate or remove the
sandbox before the subsequent calls to info.IsRunning and info.RestoreClean; fix
by either holding m.mu for the whole Reset operation or by making a deep copy of
the sandbox state returned by get (i.e., copy the fields used by Reset into a
local struct/value) and operate on that copy; specifically update Manager.Reset
to avoid using the shared info pointer after unlocking (or reacquire the lock
around calls to info.IsRunning and info.RestoreClean) to prevent races with
Destroy/Stop.
In `@internal/sandbox/run.go`:
- Around line 14-47: The Run method currently reads VM info under m.mu, releases
the lock, then later re-acquires it and modifies info.State without re-reading
the latest stored state and ignores the error returned from m.put(info); update
Run (the Manager.Run function) to re-fetch info after Start succeeds when you
re-acquire m.mu (call m.get(name) again) and only update info.State if the
current stored state still indicates not running, and handle the error returned
from m.put(info) (propagate or wrap and return it instead of ignoring); ensure
you still call provider.Start/ExecAsUser as before but protect state mutation
and persistence with the lock and proper error handling.
In `@internal/sandbox/sandbox_test.go`:
- Around line 748-758: The test TestCheckResourcesForSpec currently calls
CheckResourcesForSpec and discards the returned issues; update the test to
assert the expected behavior by checking the contents/length of the returned
issues (e.g., expect no issues for the given valid spec or specific validation
errors) or remove the test entirely if it was only for coverage; locate the test
function TestCheckResourcesForSpec and add assertions against the issues
variable (or delete the test) to ensure the result is validated.
In `@internal/sandbox/sandbox.go`:
- Around line 124-148: In resolveResources, avoid dereferencing cpu directly
since cfgDefaults.CPU may be nil; update the CPU selection to safely handle nil
(e.g., implement a helper like derefOr(p *int, def int) int or an inline nil
check) and use that to set VMSpec.CPU instead of *cpu; reference the
resolveResources function, the cpu variable, cfgDefaults.CPU, and the VMSpec CPU
assignment to locate and update the code.
In `@internal/sandbox/stop.go`:
- Around line 11-46: The Stop method currently reads info under m.mu then
releases the lock and later mutates that stale info without re-reading it;
re-acquire the lock and call m.get(name) to refresh info before each mutation
(before setting info.State to api.StateStopped or api.StateErrored) to avoid
races, and check/handle errors returned from m.put(info) (wrap/return an error
instead of ignoring) so persistence failures are surfaced; update the branches
that call provider.IsRunning and provider.Stop to refresh info under lock and to
return a wrapped error if m.put fails, referencing Manager.Stop, m.get, m.put,
provider.IsRunning, provider.Stop, and the api.StateStopped/api.StateErrored
state updates.
In `@internal/vm/lima/config.go`:
- Around line 37-39: The current dangerousShellChars regexp (variable
dangerousShellChars) misses many shell metacharacters (newline, quotes,
parentheses, redirection, backslash, etc.), so replace the blacklist approach
with a whitelist validation: create/replace the check to use a safeProvisionCmd
regexp that only allows a narrow set of characters (e.g., alphanumerics, space,
hyphen, underscore, dot, slash and a minimal set of safe punctuation) and forbid
any newline (\n/\r) or characters like ; & | $ ` \ " ' < > ( ) and backslash;
update any code paths using dangerousShellChars to validate against the new
whitelist (e.g., safeProvisionCmd.MatchString) and reject/return an error if the
provision string fails the whitelist.
In `@internal/vm/lima/provider.go`:
- Around line 161-165: The shell escaping currently performed by shellEscape
does not wrap arguments in single quotes, so arguments with spaces are split
when building shellCmd in the loop that creates escapedArgs and then joins them
for bash -c; update shellEscape to both replace single quotes inside the
argument with the standard '\'"\'"' sequence and then wrap the whole result in
surrounding single quotes (i.e., return "'" + replaced + "'"), so that shellCmd
:= fmt.Sprintf("sudo -u %s bash --login -c '%s'", user,
strings.Join(escapedArgs, " ")) preserves argument boundaries for values like
"hello world"; also update or strengthen
TestLimaProviderExecAsUserArgPreservation to assert actual argument boundaries
rather than mere substring presence.
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 34cec265-445b-44e0-8bb9-35283d58c7f4
📒 Files selected for processing (13)
internal/sandbox/create.go, internal/sandbox/delete.go, internal/sandbox/list.go, internal/sandbox/reset.go, internal/sandbox/run.go, internal/sandbox/sandbox.go, internal/sandbox/sandbox_test.go, internal/sandbox/stop.go, internal/vm/lima/config.go, internal/vm/lima/config_test.go, internal/vm/lima/provider.go, internal/vm/lima/snapshot.go, internal/vm/lima/snapshot_test.go
…haned VMs
- Create: make existence check + name reservation atomic (TOCTOU fix); add concurrent duplicate test proving only one sandbox is created
- Create: delete VM on Start failure to avoid orphaned VMs
- Create: collapse redundant Ephemeral if/else into unconditional assignment
- Create: return empty SandboxInfo on final put error (was returning *info with stale state)
- Delete: return error from IsRunning instead of silently proceeding
- Delete: unregister all mounts by listing first, not just the sandbox-named one
- Stop: re-read info under lock before each state mutation (stale pointer fix)
- Stop: surface m.put errors on success path; keep discard on error path where the Stop error takes priority
- resolveResources: nil-safe CPU dereference via derefInt helper
- TestCheckResourcesForSpec: assert non-nil return instead of discarding result
…rgs in quotes
- Replace blacklist regex (dangerousShellChars) with whitelist regex (safeProvisionCmd) that only permits alphanumerics, spaces, hyphens, underscores, dots, slashes, colons, equals, tildes, plus, comma, hash, and at-sign. Rejects newline, carriage return, quotes, parentheses, redirection, backslash, and all shell metacharacters.
- Update shellEscape to wrap each argument in single quotes so spaces and special characters preserve argument boundaries when passed to bash -c. Previously "hello world" would be split into two args.
- Replace TestShellEscape with TestShellEscapeWrapsInQuotes verifying quoting behavior; strengthen TestLimaProviderExecAsUserArgPreservation to assert a single-quoted "hello world" token instead of a substring match.
- Rename TestGenerateConfigValidationDangerousProvisionCmd to TestGenerateConfigValidationProvisionCmdWhitelist with expanded cases covering 10 safe commands and 14 injection patterns.
…rrors
- list.go (Status): move info.State mutation inside m.mu lock scope; re-read via m.get(name) to avoid a stale pointer after concurrent writes
- reset.go: re-read info via m.get(name) under lock before every state mutation; guard against nil info if the sandbox was deleted concurrently; surface m.put errors on success paths (Stopped, Running state transitions); keep discard on error paths where the primary error dominates
- run.go (Run + Start): re-read info via m.get(name) under lock before mutating State; only update if the current state is not already Running; surface m.put errors instead of discarding
Summary
- `internal/sandbox` package with full lifecycle management (create, run, start, stop, destroy, reset, list, status) following SOLID principles with injectable dependencies

Sandbox Orchestrator (PR #3 from TASKS.md)
- `sandbox.go`: Manager with compile-time `api.SandboxManager` check
- `create.go`, `run.go`, `stop.go`, `delete.go`, `reset.go`, `list.go`, `sandbox_test.go`

Key design decisions (per PRINCIPLES.md):
- Depends on `api.Provider`, `api.NetworkController`, `api.MountManager` interfaces; `Resetter` is an ISP-compliant separate interface
- `cautious` profile default, network locked after setup, read-only mounts
- `ResourceChecker` is injectable for testing; the real implementation checks CPU/memory/disk before VM creation
- `StateErrored` on failures; `Destroy` always cleans up; `StateErrored` not overwritten by live status

Security Fixes (from audit)
- Lima config file permissions 0600 (provider.go:63)
- Sensitive host mount paths blocked: `/etc`, `/root`, `/proc`, `/sys`, `/dev`, `/home`, `/var/run/docker.sock`, etc. (config.go:26-38, 160-170)
- Shell metacharacters (`` ;&|$\`! ``) rejected in provision commands (config.go:39, 209-212)
- Individual argument escaping in ExecAsUser (provider.go:156-163)
- `safeFilePerm()` masks to 0755, stripping SUID/SGID/world-write (snapshot.go:17-20)

AI-assisted: this PR was generated with AI assistance. Every line has been reviewed, understood, and owned by the human who commits it.