Skip to content

feat: implement sandbox orchestrator and harden Lima provider#3

Merged
muneebs merged 4 commits intomainfrom
feat/sandbox-orchestrator-security-fixes
Apr 16, 2026
Merged

feat: implement sandbox orchestrator and harden Lima provider#3
muneebs merged 4 commits intomainfrom
feat/sandbox-orchestrator-security-fixes

Conversation

@muneebs
Copy link
Copy Markdown
Owner

@muneebs muneebs commented Apr 16, 2026

Summary

  • PR feat: implement sandbox orchestrator and harden Lima provider #3 — Sandbox Orchestrator: Implements the internal/sandbox package with full lifecycle management (create, run, start, stop, destroy, reset, list, status) following SOLID principles with injectable dependencies
  • Security fixes: Addresses all 5 findings from the security audit of the Lima VM provider

Sandbox Orchestrator (PR #3 from TASKS.md)

File Purpose
sandbox.go Manager struct, JSON state persistence, Resetter/ResourceChecker interfaces, error types, compile-time api.SandboxManager check
create.go Create workflow: validate → resources → detect runtime → resolve profile → provision VM → apply network → register mounts
run.go Run (exec command as user) + Start workflows
stop.go Graceful stop workflow
delete.go Destroy workflow: stop VM → delete → cleanup mounts → remove state
reset.go Reset workflow: check snapshot → stop → restore clean → restart
list.go List + Status with live VM state reconciliation (skips errored sandboxes)
sandbox_test.go 36 tests with faked deps — all workflows, error paths, persistence, edge cases

Key design decisions (per PRINCIPLES.md):

  • DI/SOLID: Manager depends on api.Provider, api.NetworkController, api.MountManager interfaces. Resetter is ISP-compliant separate interface.
  • SRP: Each workflow in its own file
  • Safe defaults: cautious profile default, network locked after setup, read-only mounts
  • Resource checks: ResourceChecker is injectable for testing; real impl checks CPU/memory/disk before VM creation
  • No resource leaks: Errors set StateErrored; Destroy always cleans up; StateErrored not overwritten by live status

Security Fixes (from audit)

# Finding Fix Location
1 Insecure config file perms (0644) → 0600 (owner-only) provider.go:63
2 Sensitive host mount paths unblocked Blocklist: /etc, /root, /proc, /sys, /dev, /home, /var/run/docker.sock, etc. config.go:26-38,160-170
3 Shell injection via provision cmds Regex rejects ;&|$\!` characters config.go:39,209-212
4 Broken arg boundaries in ExecAsUser Each arg escaped individually before joining provider.go:156-163
5 SUID/world-writable preserved in snapshots safeFilePerm() masks to 0755 (strips SUID/SGID/world-write) snapshot.go:17-20

AI-assisted: this PR was generated with AI assistance. Every line has been reviewed, understood, and owned by the human who commits it.

Summary by CodeRabbit

Release Notes

  • New Features

    • Full sandbox lifecycle management: create, start, stop, destroy, and reset sandboxes
    • List all sandboxes and check their status
    • Execute commands within sandboxes
    • Configure sandboxes with profiles and resource specifications
    • Network policy enforcement via security profiles
    • Snapshot-based sandbox reset to clean state
  • Bug Fixes

    • Enhanced file permission handling to prevent unsafe permission propagation
    • Stricter validation of mount paths to block sensitive system locations
    • Improved command execution safety to prevent injection attacks

PR #3 — Sandbox Orchestrator:
- Add internal/sandbox/ package implementing api.SandboxManager
- sandbox.go: Manager struct with JSON-backed state persistence,
  Resetter interface for snapshot support, injectable ResourceChecker
- create.go: Full create workflow (validate -> resources -> runtime
  detection -> profile resolution -> VM provision -> network policy
  -> mount registration)
- run.go: Run workflow (start VM, exec command as user) + Start method
- stop.go: Graceful stop workflow
- delete.go: Destroy workflow (stop -> delete VM -> cleanup mounts -> state)
- reset.go: Reset workflow (check snapshot -> stop -> restore -> restart)
- list.go: List and Status with live state reconciliation from VM
- sandbox_test.go: 36 tests with faked dependencies covering all
  workflows, error paths, persistence, and edge cases
- Compile-time interface check: var _ api.SandboxManager = (*Manager)(nil)

Security fixes (from audit):
- Fix #1: Lima config file permissions 0644 -> 0600 (owner-only)
- Fix #2: Block sensitive host mount paths (/etc, /root, /proc,
  /sys, /dev, /home, /var/run/docker.sock, etc.) via allowlist
- Fix #3: Reject shell metacharacters (;;&|$`!) in provision commands
- Fix #4: Individual argument escaping in ExecAsUser preserves
  argument boundaries (was joining then escaping, now escapes each
  arg separately)
- Fix #5: Permission masking in snapshot/restore strips SUID, SGID,
  and world-write bits via safeFilePerm() mask to 0755

AI-assisted: this PR was generated with AI assistance. Every line has
been reviewed, understood, and owned by the human who commits it.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@muneebs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 2 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79b8207f-8350-46a1-a668-733e97d699b0

📥 Commits

Reviewing files that changed from the base of the PR and between 1e53041 and e9e4a4b.

📒 Files selected for processing (11)
  • internal/sandbox/create.go
  • internal/sandbox/delete.go
  • internal/sandbox/list.go
  • internal/sandbox/reset.go
  • internal/sandbox/run.go
  • internal/sandbox/sandbox.go
  • internal/sandbox/sandbox_test.go
  • internal/sandbox/stop.go
  • internal/vm/lima/config.go
  • internal/vm/lima/config_test.go
  • internal/vm/lima/provider.go

Walkthrough

This PR implements a complete sandbox orchestration system with a Manager that coordinates VM lifecycle (create, destroy, start, stop, reset, run) using a provider pattern, integrating runtime detection, profile-based configuration, network policies, mount management, and persistent JSON state storage, while hardening the Lima provider with mount path validation, shell injection prevention, and permission masking.

Changes

Cohort / File(s) Summary
Core Sandbox Manager
internal/sandbox/sandbox.go, internal/sandbox/create.go, internal/sandbox/delete.go
Introduces Manager orchestrating sandbox lifecycle with persistent JSON state. Create validates specs, resolves runtime/profile, provisions VM, applies network policies, registers mounts, and persists state. Destroy stops and deletes VMs, unregisters mounts, and cleans up internal state.
Sandbox State Operations
internal/sandbox/list.go, internal/sandbox/stop.go, internal/sandbox/run.go, internal/sandbox/reset.go
Implements state queries (List, Status) with provider sync, VM control (Stop, Start), command execution (Run), and snapshot-based reset with state persistence across transitions.
Lima Provider Security Hardening
internal/vm/lima/config.go, internal/vm/lima/provider.go, internal/vm/lima/snapshot.go
Adds mount path validation blocking sensitive system directories, shell metacharacter filtering in provisioning commands, restrictive file permissions (0600 for configs), per-argument shell escaping in ExecAsUser, and permission masking (0755) during snapshot creation/restore.
Test Coverage
internal/sandbox/sandbox_test.go, internal/vm/lima/config_test.go, internal/vm/lima/snapshot_test.go
Comprehensive test suite covering sandbox CRUD, lifecycle state transitions, network locking behavior, mount registration, command execution, provider integration, state persistence, validation failures, permission masking, shell injection prevention, and error formatting.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Manager
    participant Detector
    participant Profiles
    participant Provider
    participant Network
    participant Mounts
    
    Client->>Manager: Create(ctx, spec)
    Manager->>Manager: Validate name uniqueness
    Manager->>Manager: Check available resources
    Manager->>Detector: Detect runtime from spec.Source
    Manager->>Profiles: Resolve profile (default: "cautious")
    Manager->>Manager: Compute VM spec (CPU/Memory/Disk)
    Manager->>Provider: Create VM
    Manager->>Provider: Start VM
    Manager->>Network: Apply network policy from profile
    Manager->>Network: Lock sandbox (if profile.LockAfterSetup)
    Manager->>Mounts: Register mount from spec.Source
    Manager->>Manager: Persist sandbox state (StateRunning)
    Manager->>Client: Return SandboxInfo
Loading
sequenceDiagram
    actor Client
    participant Manager
    participant Provider
    participant Storage
    
    Client->>Manager: List(ctx)
    Manager->>Storage: Load all sandbox entries
    loop For each sandbox
        Manager->>Provider: IsRunning(ctx, name)
        alt Provider reports state change
            Manager->>Manager: Update info.State
            Manager->>Storage: Persist updated state
        end
    end
    Manager->>Client: Return SandboxInfo[]
    
    Client->>Manager: Status(ctx, name)
    Manager->>Storage: Get sandbox by name
    alt Sandbox state is Errored
        Manager->>Client: Return cached info
    else
        Manager->>Provider: IsRunning(ctx, name)
        alt State mismatch detected
            Manager->>Manager: Update info.State
            Manager->>Storage: Persist updated state
        end
        Manager->>Client: Return updated SandboxInfo
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hop into sandboxes, safe and secure!
With profiles and providers, the lifecycle's pure.
Mounts register, networks lock tight,
States persist through the day and the night. ✨🏗️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: implement sandbox orchestrator and harden Lima provider' directly corresponds to the two main themes of the pull request: the new sandbox orchestrator implementation and security hardening of the Lima provider.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-orchestrator-security-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (8)
internal/vm/lima/snapshot.go (1)

82-82: Minor: Redundant .Perm() call.

safeFilePerm already returns m.Perm() & 0755, which is a permission value. The subsequent .Perm() call is a no-op since the type bits are already stripped.

♻️ Optional simplification
-		return os.Chmod(targetPath, safeFilePerm(info.Mode()).Perm())
+		return os.Chmod(targetPath, safeFilePerm(info.Mode()))

And similarly in copyDir at line 211.

Also applies to: 211-211

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/vm/lima/snapshot.go` at line 82, The call to .Perm() is redundant
because safeFilePerm already returns a FileMode with permissions applied; remove
the trailing .Perm() when passing its result to os.Chmod. Update the os.Chmod
invocation in snapshot.go (the call that uses targetPath and safeFilePerm) and
the analogous call inside copyDir to pass safeFilePerm(...) directly instead of
safeFilePerm(...).Perm().
internal/sandbox/run.go (1)

50-77: Start method has consistent patterns with other lifecycle methods.

The implementation follows the same lock/check/execute/update pattern. It shares the same stale state concerns, but the pattern is at least consistent across the codebase.

One note: when the VM is already running (lines 63-65), the method returns nil without updating persistence. This is correct since no state change occurred, but consider whether the persisted state should be reconciled if it somehow got out of sync (e.g., info.State was not StateRunning but VM is actually running).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sandbox/run.go` around lines 50 - 77, The Start method returns early
when provider.IsRunning reports the VM is already running but doesn't reconcile
persisted state; update Start to, after detecting running == true, acquire the
same lock, fetch or use the existing info, set info.State = api.StateRunning if
it differs, call m.put(info) to persist, and then return nil, ensuring locking
around access to info and put (use m.mu.Lock()/Unlock() similar to the rest of
the method) so the stored state stays consistent with provider.IsRunning.
internal/vm/lima/config.go (2)

13-14: Import ordering: strconv should be grouped with standard library imports.

Go convention groups standard library imports together, separate from external packages.

♻️ Fix import grouping
 import (
 	"fmt"
 	"path/filepath"
 	"regexp"
+	"strconv"
 	"strings"
 
-	"strconv"
-
 	"github.com/muneebs/airlock/internal/api"
 	"gopkg.in/yaml.v3"
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/vm/lima/config.go` around lines 13 - 14, The import ordering in
internal/vm/lima/config.go is incorrect: move "strconv" into the block with
other standard library imports so all stdlib packages are grouped together
(update the import block containing strconv and other stdlib entries in that
file to follow Go convention).

19-35: Sensitive mount paths list contains redundant entries and may be overly restrictive.

  1. Redundancy: /etc/shadow, /etc/ssh, /etc/ssl, /etc/pki are already blocked by the /etc prefix check in isSensitiveMountPath.

  2. Usability concern: Blocking /home entirely prevents legitimate use cases on Linux where users typically mount project directories from /home/username/projects/. The test confirms /home/user/projects/app is blocked while /Users/alice/projects/app (macOS) is allowed.

If this asymmetry between Linux and macOS is intentional (e.g., assuming macOS-only usage), consider documenting this. Otherwise, consider a more granular approach that blocks /home root but allows subdirectories beyond the first level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/vm/lima/config.go` around lines 19 - 35, The sensitiveMountPaths
list contains redundant entries under /etc and is overbroad for /home; remove
entries "/etc/shadow", "/etc/ssh", "/etc/ssl", "/etc/pki" from the
sensitiveMountPaths slice and update the isSensitiveMountPath logic so that
"/etc" continues to block everything under /etc but "/home" only blocks the root
"/home" itself (allowing user subdirectories like /home/user/...); reference
sensitiveMountPaths and isSensitiveMountPath to locate and change the list and
the matching logic accordingly.
internal/sandbox/sandbox_test.go (1)

483-483: Remove vestigial blank identifier assignment.

The _ = resetter assignment serves no purpose since resetter is already used at line 499.

Proposed fix
 	if info.State != api.StateRunning {
 		t.Errorf("expected state running after reset, got %s", info.State)
 	}
-
-	_ = resetter
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sandbox/sandbox_test.go` at line 483, Remove the vestigial blank
identifier assignment "_ = resetter" — it is redundant because the variable
resetter is used later (e.g., at line where resetter is referenced) so delete
the "_ = resetter" statement to clean up dead code and avoid pointless no-op
assignments.
internal/sandbox/sandbox.go (2)

200-227: DRY: Default values duplicated across multiple functions.

The defaults (CPU=2, Memory="4GiB", Disk="20GiB") are repeated in newSandboxInfo, CheckResourcesForSpec, and defaultSpec() in create.go. Consider extracting these to package-level constants or a single defaultResources() function.

Also applies to: 229-251

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sandbox/sandbox.go` around lines 200 - 227, The CPU, Memory and Disk
default values are duplicated across newSandboxInfo, CheckResourcesForSpec, and
defaultSpec; extract them into shared package-level constants (e.g., defaultCPU,
defaultMemory, defaultDisk) or a single helper function (e.g.,
defaultResources() returning CPU, Memory, Disk) and update newSandboxInfo,
CheckResourcesForSpec, and defaultSpec to use those constants/helper instead of
hardcoding 2/"4GiB"/"20GiB" so defaults live in one place.

93-103: Consider restricting store file permissions to 0600.

The PR hardens Lima config permissions to 0600 (per PR objectives), but the sandbox store is written with 0644. While sandbox metadata may be less sensitive than VM configs, applying consistent restrictive permissions reduces attack surface.

Proposed fix
-	return os.WriteFile(m.storePath, data, 0644)
+	return os.WriteFile(m.storePath, data, 0600)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sandbox/sandbox.go` around lines 93 - 103, The sandbox store is
written with world-readable permissions; in func (m *Manager) save() tighten the
file permissions when persisting m.sandboxes by changing the os.WriteFile call
to use 0600 for m.storePath so only the owner can read/write, and optionally
tighten the directory creation (currently os.MkdirAll(dir, 0755)) to 0700 to
restrict access to the store directory; update the behavior in Manager.save
around m.storePath and the MkdirAll call accordingly.
internal/sandbox/list.go (1)

11-40: Holding mutex during provider I/O calls may cause contention.

List() holds m.mu while calling m.provider.IsRunning() for each sandbox. If the provider call is slow (network/IPC), this blocks all other Manager operations. Consider collecting sandbox names under the lock, releasing it, then querying provider status, and finally re-acquiring to reconcile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sandbox/list.go` around lines 11 - 40, Manager.List currently holds
m.mu while calling m.provider.IsRunning which can block other operations; change
List to first lock m.mu and copy the necessary sandbox identifiers and a shallow
copy of m.sandboxes (use m.sandboxes or their names) into a local slice, then
unlock before iterating and calling m.provider.IsRunning for each entry; after
collecting running states, re-acquire m.mu to reconcile state changes (updating
info.State and calling m.put(info) as needed) and build the resulting
[]api.SandboxInfo to return; ensure you still handle api.StateErrored without
calling provider and preserve existing behavior for m.put and result ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/sandbox/create.go`:
- Around line 20-25: The current Create path performs a non-atomic existence
check on m.sandboxes (m.mu.Lock / m.mu.Unlock around the lookup) but releases
the lock before resource validation and calling m.put(info), causing a TOCTOU
race that can allow duplicate sandboxes; fix by making the check-and-insert
atomic: hold m.mu for the entire create critical section or implement an atomic
check-and-set (e.g., reserve the name in m.sandboxes immediately after the
existence check and before releasing the lock, or move the m.put(info) call
inside the same lock scope). Ensure you update places referencing
m.mu.Lock/m.mu.Unlock, the existence check for spec.Name, and m.put(info) so the
name reservation and insertion happen while the mutex is held.
- Around line 91-95: The if/else in create.go currently sets info.State =
api.StateRunning in both branches (checking spec.Ephemeral), so remove the
redundant conditional and set info.State = api.StateRunning unconditionally, or
if ephemeral should behave differently implement the intended logic for
spec.Ephemeral (e.g., set a different state or perform ephemeral-specific
initialization) by updating the branch handling around spec.Ephemeral and
info.State accordingly.
- Around line 60-66: When m.provider.Start(ctx, spec.Name) fails after a
successful Create, call m.provider.Delete(ctx, spec.Name) to avoid leaving an
orphaned VM; perform the Delete attempt before marking info.State =
api.StateErrored and persisting with m.put(info), handle/log any Delete error
(but don't fail to update the sandbox state), and keep the existing mutex usage
around the state update (i.e., acquire m.mu, set info.State, call m.put(info),
release m.mu) so cleanup is attempted and the sandbox is still recorded as
errored if Start fails.

In `@internal/sandbox/delete.go`:
- Around line 17-22: The deletion logic currently ignores errors from
m.provider.IsRunning and may proceed to delete a VM that's actually running;
update the delete flow in delete.go to handle the IsRunning error explicitly:
after calling m.provider.IsRunning(ctx, name) check if err != nil and return (or
wrap) that error (e.g., "check running state: %w") instead of continuing, only
call m.provider.Stop(ctx, name) when IsRunning succeeds and running is true, and
preserve the existing Stop error wrapping ("stop VM before delete: %w").
- Line 35: The current call m.mounts.Unregister(ctx, name, name) incorrectly
uses the sandbox name for both sandbox and mount identifiers; instead, fetch all
mounts for the sandbox (e.g. call m.mounts.List(ctx, name) or the appropriate
list method to get mounts for sandbox `name`) and iterate them, calling
m.mounts.Unregister(ctx, name, mount.Name) (or the mount's identifier field) for
each entry, using the same ctx and sandbox `name` variable so every mount
registered under that sandbox is removed.

In `@internal/sandbox/list.go`:
- Around line 60-70: The code updates info.State (the sandbox entry from
m.sandboxes) outside the mutex, causing a data race; acquire m.mu before
mutating info.State and calling m.put(info) and release after both operations.
Specifically, in the branch checks around running and info.State (the block that
currently sets info.State = api.StateRunning and info.State = api.StateStopped),
move the assignment into the critical section guarded by m.mu so the sequence
becomes: m.mu.Lock(), set info.State, _ = m.put(info), m.mu.Unlock(). Keep the
initial read-only checks outside the lock if desired, but any mutation of
info.State or calls to m.put must occur while holding m.mu to avoid races.

In `@internal/sandbox/reset.go`:
- Around line 34-37: The code silently ignores errors from m.put(info) after
updating info.State = api.StateStopped inside the m.mu.Lock()/Unlock() block;
change this so persistence failures are surfaced—capture the error returned by
m.put(info), log it via the manager logger (or return it wrapped alongside the
reset result) and ensure the lock is released in all cases; specifically update
the reset path that calls m.put(info) to handle and propagate/log the error
instead of discarding it.
- Around line 13-19: The Reset method has a TOCTOU race: Manager.get grabs info
under m.mu then unlocks, so concurrent Destroy/Stop can mutate or remove the
sandbox before the subsequent calls to info.IsRunning and info.RestoreClean; fix
by either holding m.mu for the whole Reset operation or by making a deep copy of
the sandbox state returned by get (i.e., copy the fields used by Reset into a
local struct/value) and operate on that copy; specifically update Manager.Reset
to avoid using the shared info pointer after unlocking (or reacquire the lock
around calls to info.IsRunning and info.RestoreClean) to prevent races with
Destroy/Stop.

In `@internal/sandbox/run.go`:
- Around line 14-47: The Run method currently reads VM info under m.mu, releases
the lock, then later re-acquires it and modifies info.State without re-reading
the latest stored state and ignores the error returned from m.put(info); update
Run (the Manager.Run function) to re-fetch info after Start succeeds when you
re-acquire m.mu (call m.get(name) again) and only update info.State if the
current stored state still indicates not running, and handle the error returned
from m.put(info) (propagate or wrap and return it instead of ignoring); ensure
you still call provider.Start/ExecAsUser as before but protect state mutation
and persistence with the lock and proper error handling.

In `@internal/sandbox/sandbox_test.go`:
- Around line 748-758: The test TestCheckResourcesForSpec currently calls
CheckResourcesForSpec and discards the returned issues; update the test to
assert the expected behavior by checking the contents/length of the returned
issues (e.g., expect no issues for the given valid spec or specific validation
errors) or remove the test entirely if it was only for coverage; locate the test
function TestCheckResourcesForSpec and add assertions against the issues
variable (or delete the test) to ensure the result is validated.

In `@internal/sandbox/sandbox.go`:
- Around line 124-148: In resolveResources, avoid dereferencing cpu directly
since cfgDefaults.CPU may be nil; update the CPU selection to safely handle nil
(e.g., implement a helper like derefOr(p *int, def int) int or an inline nil
check) and use that to set VMSpec.CPU instead of *cpu; reference the
resolveResources function, the cpu variable, cfgDefaults.CPU, and the VMSpec CPU
assignment to locate and update the code.

In `@internal/sandbox/stop.go`:
- Around line 11-46: The Stop method currently reads info under m.mu then
releases the lock and later mutates that stale info without re-reading it;
re-acquire the lock and call m.get(name) to refresh info before each mutation
(before setting info.State to api.StateStopped or api.StateErrored) to avoid
races, and check/handle errors returned from m.put(info) (wrap/return an error
instead of ignoring) so persistence failures are surfaced; update the branches
that call provider.IsRunning and provider.Stop to refresh info under lock and to
return a wrapped error if m.put fails, referencing Manager.Stop, m.get, m.put,
provider.IsRunning, provider.Stop, and the api.StateStopped/api.StateErrored
state updates.

In `@internal/vm/lima/config.go`:
- Around line 37-39: The current dangerousShellChars regexp (variable
dangerousShellChars) misses many shell metacharacters (newline, quotes,
parentheses, redirection, backslash, etc.), so replace the blacklist approach
with a whitelist validation: create/replace the check to use a safeProvisionCmd
regexp that only allows a narrow set of characters (e.g., alphanumerics, space,
hyphen, underscore, dot, slash and a minimal set of safe punctuation) and forbid
any newline (\n/\r) or characters like ; & | $ ` \ " ' < > ( ) and backslash;
update any code paths using dangerousShellChars to validate against the new
whitelist (e.g., safeProvisionCmd.MatchString) and reject/return an error if the
provision string fails the whitelist.

In `@internal/vm/lima/provider.go`:
- Around line 161-165: The shell escaping currently performed by shellEscape
does not wrap arguments in single quotes, so arguments with spaces are split
when building shellCmd in the loop that creates escapedArgs and then joins them
for bash -c; update shellEscape to both replace single quotes inside the
argument with the standard '\'"\'"' sequence and then wrap the whole result in
surrounding single quotes (i.e., return "'" + replaced + "'"), so that shellCmd
:= fmt.Sprintf("sudo -u %s bash --login -c '%s'", user,
strings.Join(escapedArgs, " ")) preserves argument boundaries for values like
"hello world"; also update or strengthen
TestLimaProviderExecAsUserArgPreservation to assert actual argument boundaries
rather than mere substring presence.

---

Nitpick comments:
In `@internal/sandbox/list.go`:
- Around line 11-40: Manager.List currently holds m.mu while calling
m.provider.IsRunning which can block other operations; change List to first lock
m.mu and copy the necessary sandbox identifiers and a shallow copy of
m.sandboxes (use m.sandboxes or their names) into a local slice, then unlock
before iterating and calling m.provider.IsRunning for each entry; after
collecting running states, re-acquire m.mu to reconcile state changes (updating
info.State and calling m.put(info) as needed) and build the resulting
[]api.SandboxInfo to return; ensure you still handle api.StateErrored without
calling provider and preserve existing behavior for m.put and result ordering.

In `@internal/sandbox/run.go`:
- Around line 50-77: The Start method returns early when provider.IsRunning
reports the VM is already running but doesn't reconcile persisted state; update
Start to, after detecting running == true, acquire the same lock, fetch or use
the existing info, set info.State = api.StateRunning if it differs, call
m.put(info) to persist, and then return nil, ensuring locking around access to
info and put (use m.mu.Lock()/Unlock() similar to the rest of the method) so the
stored state stays consistent with provider.IsRunning.

In `@internal/sandbox/sandbox_test.go`:
- Line 483: Remove the vestigial blank identifier assignment "_ = resetter" — it
is redundant because the variable resetter is used later (e.g., at line where
resetter is referenced) so delete the "_ = resetter" statement to clean up dead
code and avoid pointless no-op assignments.

In `@internal/sandbox/sandbox.go`:
- Around line 200-227: The CPU, Memory and Disk default values are duplicated
across newSandboxInfo, CheckResourcesForSpec, and defaultSpec; extract them into
shared package-level constants (e.g., defaultCPU, defaultMemory, defaultDisk) or
a single helper function (e.g., defaultResources() returning CPU, Memory, Disk)
and update newSandboxInfo, CheckResourcesForSpec, and defaultSpec to use those
constants/helper instead of hardcoding 2/"4GiB"/"20GiB" so defaults live in one
place.
- Around line 93-103: The sandbox store is written with world-readable
permissions; in func (m *Manager) save() tighten the file permissions when
persisting m.sandboxes by changing the os.WriteFile call to use 0600 for
m.storePath so only the owner can read/write, and optionally tighten the
directory creation (currently os.MkdirAll(dir, 0755)) to 0700 to restrict access
to the store directory; update the behavior in Manager.save around m.storePath
and the MkdirAll call accordingly.

In `@internal/vm/lima/config.go`:
- Around line 13-14: The import ordering in internal/vm/lima/config.go is
incorrect: move "strconv" into the block with other standard library imports so
all stdlib packages are grouped together (update the import block containing
strconv and other stdlib entries in that file to follow Go convention).
- Around line 19-35: The sensitiveMountPaths list contains redundant entries
under /etc and is overbroad for /home; remove entries "/etc/shadow", "/etc/ssh",
"/etc/ssl", "/etc/pki" from the sensitiveMountPaths slice and update the
isSensitiveMountPath logic so that "/etc" continues to block everything under
/etc but "/home" only blocks the root "/home" itself (allowing user
subdirectories like /home/user/...); reference sensitiveMountPaths and
isSensitiveMountPath to locate and change the list and the matching logic
accordingly.

In `@internal/vm/lima/snapshot.go`:
- Line 82: The call to .Perm() is redundant because safeFilePerm already returns
a FileMode with permissions applied; remove the trailing .Perm() when passing
its result to os.Chmod. Update the os.Chmod invocation in snapshot.go (the call
that uses targetPath and safeFilePerm) and the analogous call inside copyDir to
pass safeFilePerm(...) directly instead of safeFilePerm(...).Perm().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 34cec265-445b-44e0-8bb9-35283d58c7f4

📥 Commits

Reviewing files that changed from the base of the PR and between 732dba7 and 1e53041.

📒 Files selected for processing (13)
  • internal/sandbox/create.go
  • internal/sandbox/delete.go
  • internal/sandbox/list.go
  • internal/sandbox/reset.go
  • internal/sandbox/run.go
  • internal/sandbox/sandbox.go
  • internal/sandbox/sandbox_test.go
  • internal/sandbox/stop.go
  • internal/vm/lima/config.go
  • internal/vm/lima/config_test.go
  • internal/vm/lima/provider.go
  • internal/vm/lima/snapshot.go
  • internal/vm/lima/snapshot_test.go

Comment thread internal/sandbox/create.go Outdated
Comment thread internal/sandbox/create.go
Comment thread internal/sandbox/create.go Outdated
Comment thread internal/sandbox/delete.go
Comment thread internal/sandbox/delete.go Outdated
Comment thread internal/sandbox/sandbox_test.go
Comment thread internal/sandbox/sandbox.go
Comment thread internal/sandbox/stop.go
Comment thread internal/vm/lima/config.go Outdated
Comment thread internal/vm/lima/provider.go
muneebs added 3 commits April 16, 2026 21:29
…haned VMs

- Create: make existence check + name reservation atomic (TOCTOU fix);
  add concurrent duplicate test proving only one sandbox is created
- Create: delete VM on Start failure to avoid orphaned VMs
- Create: collapse redundant Ephemeral if/else into unconditional assignment
- Create: return empty SandboxInfo on final put error (was returning *info
  with stale state)
- Delete: return error from IsRunning instead of silently proceeding
- Delete: unregister all mounts by listing first, not just the sandbox-named one
- Stop: re-read info under lock before each state mutation (stale pointer fix)
- Stop: surface m.put errors instead of discarding them
- Stop: surface m.put errors on success path; keep discard on error path where
  Stop error takes priority
- resolveResources: nil-safe CPU dereference via derefInt helper
- TestCheckResourcesForSpec: assert non-nil return instead of discarding result
…rgs in quotes

- Replace blacklist regex (dangerousShellChars) with whitelist regex
  (safeProvisionCmd) that only permits alphanumerics, spaces, hyphens,
  underscores, dots, slashes, colons, equals, tildes, plus, comma,
  hash, and at-sign. Rejects newline, carriage return, quotes,
  parentheses, redirection, backslash, and all shell metacharacters.
- Update shellEscape to wrap each argument in single quotes so spaces
  and special characters preserve argument boundaries when passed to
  bash -c. Previously "hello world" would be split into two args.
- Replace TestShellEscape with TestShellEscapeWrapsInQuotes verifying
  quoting behavior; strengthen TestLimaProviderExecAsUserArgPreservation
  to assert single-quoted "hello world" token instead of substring match.
- Rename TestGenerateConfigValidationDangerousProvisionCmd to
  TestGenerateConfigValidationProvisionCmdWhitelist with expanded cases
  covering 10 safe commands and 14 injection patterns.
…rrors

- list.go (Status): move info.State mutation inside m.mu lock scope;
  re-read via m.get(name) to avoid stale pointer after concurrent writes
- reset.go: re-read info via m.get(name) under lock before every state
  mutation; guard against nil info if sandbox deleted concurrently;
  surface m.put errors on success paths (Stopped, Running state
  transitions); keep discard on error paths where primary error dominates
- run.go (Run + Start): re-read info via m.get(name) under lock before
  mutating State; only update if current state is not already Running;
  surface m.put errors instead of discarding
@muneebs muneebs merged commit d07bfba into main Apr 16, 2026
1 check passed
@muneebs muneebs deleted the feat/sandbox-orchestrator-security-fixes branch April 16, 2026 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant