test(core): improve sandbox integration test coverage and fix OS-specific failures#25307
test(core): improve sandbox integration test coverage and fix OS-specific failures#25307
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the Linux sandbox environment by introducing comprehensive integration tests and fixing a critical bug. The bug previously prevented write access to policy-authorized paths if they did not yet exist and the command was not explicitly recognized as a write operation, leading to read-only file system errors. The changes ensure that the sandbox consistently grants appropriate write permissions for allowed paths, aligning with the defined execution policy and improving overall sandbox reliability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Size Change: +617 B (0%) Total Size: 34.1 MB
ℹ️ View Unchanged
|
There was a problem hiding this comment.
Code Review
This pull request simplifies the Linux sandbox argument building by removing the isWriteCommand flag and consistently using --bind-try for parent directories of non-existent paths. Additionally, it significantly enhances the integration test suite for the SandboxManager, introducing comprehensive coverage for virtual commands, environment sanitization, sandbox mode transitions, recursive forbidden path protection, git worktree metadata security, and network access policies. I have no feedback to provide as no issues were identified in the changes.
|
Sent with Notion Mail <https://www.notion.so/product/mail>
On Apr 13, 2026 at 11:29 AM gemini-code-assist[bot] < ***@***.***> wrote:
***@***.***[bot]* commented on this pull request.
Code Review
This pull request simplifies the Linux sandbox argument building by
removing the isWriteCommand flag and consistently using --bind-try for
parent directories of non-existent paths. Additionally, it significantly
enhances the integration test suite for the SandboxManager, introducing
comprehensive coverage for virtual commands, environment sanitization,
sandbox mode transitions, recursive forbidden path protection, git worktree
metadata security, and network access policies. I have no feedback to
provide as no issues were identified in the changes.
—
Reply to this email directly, view it on GitHub
<#25307 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A6DTLFPMTQQKAG56A5QGA7L4VUPX3AVCNFSM6AAAAACXXVS2YCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DCMBQGY4DQMBQGY>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
galz10
left a comment
There was a problem hiding this comment.
Code Review
Scope: Pull Request #25307
This pull request significantly improves the sandbox integration tests by introducing better isolation (fresh workspace per test) and increasing coverage for various sandbox features (virtual commands, environment sanitization, etc.). It also addresses platform-specific bugs, notably by ensuring "governance files" are initialized on macOS to allow the Seatbelt profile to target them effectively, and by simplifying the Linux Bubblewrap argument builder.
Metadata Review
- Title/Description: The title
test(core): improve sandbox integration test coverage and fix OS-specific failuresis accurate but slightly undersells the impact, as it also includes core logic changes in theSandboxManagerimplementations for parity and reliability. The description is clear and explains the motivation for the changes well.
Concerns (Action Required)
-
[MacOsSandboxManager.ts]: Workspace Pollution: The PR adds logic to automatically create
.git,.gitignore, and.geminiignorein the workspace on every command preparation if they don't already exist. While this ensures Seatbelt can protect these paths, it results in hidden files/folders being created in any directory where the CLI is run (e.g., a user's home directory or downloads folder).- Suggestion: Consider creating these files only if they are missing and the security policy specifically requires their presence, or document this behavior clearly. Alternatively, if Seatbelt can block creation using
(deny file-write-create ...)on the literal path even if it doesn't exist, that would be preferable to "touching" the files.
- Suggestion: Consider creating these files only if they are missing and the security policy specifically requires their presence, or document this behavior clearly. Alternatively, if Seatbelt can block creation using
-
[bwrapArgsBuilder.ts]: Linux Security Regression: By removing the
isWriteCommandcheck and always using--bind-tryfor the parent directory of non-existent allowed paths, the sandbox now grants read-write access to the parent directory even for supposedly read-only commands.- Suggestion: If a command is explicitly intended to be read-only (e.g., a
__readvirtual command), we should use--ro-bind-tryfor the parent directory instead of--bind-try. This preserves the principle of least privilege.
- Suggestion: If a command is explicitly intended to be read-only (e.g., a
-
[MacOsSandboxManager.ts]: Initialization Inefficiency: The governance file initialization loop (calling
touch) runs insideprepareCommand. For long-running sessions with many tool calls, this results in redundantlstatcalls and filesystem checks for every single command.- Suggestion: Move the initialization to the
SandboxManagerconstructor or a one-timeinitializemethod, or cache the fact that initialization has already been performed for the current workspace.
- Suggestion: Move the initialization to the
-
[sandboxManager.integration.test.ts]: Temporary Directory Cleanup: The refactor to
beforeEachfor workspace creation is excellent for isolation. However, since cleanup happens inafterAll, all temporary workspaces for every test in the file will persist on disk until the entire suite finishes.- Suggestion: Move the
rmSynclogic for the currentworkspaceinto anafterEachblock to keep the disk footprint minimal during test execution, while keepingafterAllas a fail-safe for other temporary directories created viacreateTempDir.
- Suggestion: Move the
Nits (Suggestions)
- [MacOsSandboxManager.ts]:
touchImplementation: The checkif (fs.lstatSync(filePath)) return;is slightly redundant becauselstatSyncwill either throw or return a truthyStatsobject. A simplertry { fs.lstatSync(filePath); return; } catch {}is sufficient and avoids the unnecessary truthiness check. - [MacOsSandboxManager.ts]:
touchError Handling: Thecatchblock intouchignores all errors. It would be safer to specifically catchENOENTand let other errors (likeEACCES) bubble up to help diagnose permission issues in the test environment. - [bwrapArgsBuilder.ts]: Comment Accuracy: The comment "Bind the parent directory as read-write" is correct given the implementation change, but it highlights the security concern mentioned above.
|
DavidAPierce
left a comment
There was a problem hiding this comment.
LGTM. Thanks for tackling this!
Comments addressed, David gave approval
Summary
This PR significantly improves the integration test coverage for the sandbox environments across all major operating systems (Linux, macOS, Windows). In the process of expanding this test suite to rigorously verify sandbox boundaries and policies, several OS-specific failures were identified and resolved, including initialization inefficiencies and a Bubblewrap permission bug on Linux.
Details
Test Coverage Enhancements (Primary)
sandboxManager.integration.test.tsto comprehensively exercise sandbox boundaries across Linux, macOS, and Windows.beforeAlltobeforeEachfor workspace initialization. Each test case now receives a fresh, isolated temporary directory to prevent state leakage (critical for Windows ACL checks). Added anafterEachcleanup block that specifically tracks and removes isolated test directories, drastically reducing the disk footprint during testing.bwrapArgsBuilder(Linux) andMacOsSandboxManager(macOS).Sandbox Logic Fixes (Secondary)
bwrap) Security:--bind-try) to any path inpolicyAllowedand its parent directory, unless the command is an explicit read-only virtual command (__read), in which case the parent directory is safely bound as read-only.bwrap) & Windows Optimization:ensureGovernanceFilesExist) to ensure that governance files (e.g.,.git,.gitignore) are secured and verified exactly once per session workspace, rather than redundantly on every single command execution. This reduces disk I/O and improves performance for long-running sessions.policy.allowedPathsinstead of non-existent target files, satisfying the Windows Sandbox requirement that granular access can only be granted to existing filesystem objects.ensureHelperCompiled) to be a globally static initialization, ensuring it only compiles once per Node process.Related Issues
Fixes #24932.
How to Validate
Windows, Linux, macOS
Run integration tests:
npm run test -w @google/gemini-cli-core -- src/services/sandboxManager.integration.test.tsLinux
Verify unit tests:
npm run test -w @google/gemini-cli-core -- src/sandbox/linux/bwrapArgsBuilder.test.ts