@jif-oai (Contributor) commented Sep 26, 2025

Extract and Centralize Sandboxing

  • Goal: Improve safety and clarity by centralizing sandbox planning and execution.
  • Approach:
    • Add a planner (ExecPlan) and a backend registry (Direct/Seatbelt/Linux) with run_with_plan.
    • Refactor codex.rs to plan, then execute; handle failures and escalation via the plan.
    • Delegate apply_patch to the codex binary and run it with an empty env for determinism.
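A minimal sketch of the plan-then-execute split described above, assuming planning can be a pure function of the command, the policy, and the platform. Only ExecPlan, run_with_plan, and the Direct/Seatbelt/Linux backends come from the PR; the two-variant SandboxPolicy, the plan() signature, the Launcher type, and the launchers themselves are hypothetical, and the launchers return a description instead of spawning a process.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Sandbox backends named in the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Backend {
    Direct,
    Seatbelt,
    Linux,
}

/// Hypothetical, simplified policy with just two cases.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SandboxPolicy {
    DangerFullAccess,
    WorkspaceWrite,
}

/// The decision made before anything is spawned.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ExecPlan {
    Reject { reason: String },
    Run { backend: Backend, argv: Vec<String>, cwd: PathBuf },
}

/// Planning is a pure function of command, policy, and platform (hypothetical signature).
pub fn plan(argv: Vec<String>, cwd: PathBuf, policy: SandboxPolicy, os: &str) -> ExecPlan {
    if argv.is_empty() {
        return ExecPlan::Reject { reason: "empty command".to_string() };
    }
    let backend = match (policy, os) {
        (SandboxPolicy::DangerFullAccess, _) => Backend::Direct,
        (SandboxPolicy::WorkspaceWrite, "macos") => Backend::Seatbelt,
        (SandboxPolicy::WorkspaceWrite, "linux") => Backend::Linux,
        (SandboxPolicy::WorkspaceWrite, _) => {
            return ExecPlan::Reject { reason: format!("no sandbox backend for {os}") };
        }
    };
    ExecPlan::Run { backend, argv, cwd }
}

/// Hypothetical launcher type: describes the launch instead of spawning a process.
pub type Launcher = fn(&[String], &Path) -> String;

fn launch_direct(argv: &[String], cwd: &Path) -> String {
    format!("direct: {} (cwd={})", argv.join(" "), cwd.display())
}

fn launch_seatbelt(argv: &[String], cwd: &Path) -> String {
    format!("seatbelt: {} (cwd={})", argv.join(" "), cwd.display())
}

fn launch_linux(argv: &[String], cwd: &Path) -> String {
    format!("linux-sandbox: {} (cwd={})", argv.join(" "), cwd.display())
}

/// Backend registry: exactly one launcher per backend.
pub fn registry() -> HashMap<Backend, Launcher> {
    let mut m: HashMap<Backend, Launcher> = HashMap::new();
    m.insert(Backend::Direct, launch_direct as Launcher);
    m.insert(Backend::Seatbelt, launch_seatbelt as Launcher);
    m.insert(Backend::Linux, launch_linux as Launcher);
    m
}

/// Execute a plan by looking up its backend; rejected plans never reach a launcher.
pub fn run_with_plan(plan: &ExecPlan, registry: &HashMap<Backend, Launcher>) -> Result<String, String> {
    match plan {
        ExecPlan::Reject { reason } => Err(reason.clone()),
        ExecPlan::Run { backend, argv, cwd } => {
            let launch = *registry
                .get(backend)
                .ok_or_else(|| format!("backend {backend:?} is not registered"))?;
            Ok(launch(argv.as_slice(), cwd.as_path()))
        }
    }
}
```

In real code the os argument could come from std::env::consts::OS, which reports values such as "macos" and "linux". Keeping plan() free of side effects makes it cheap to unit-test, while the registry keeps platform-specific launch code behind a single lookup.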

@jif-oai (Contributor, Author) commented Sep 26, 2025

@codex review

Contributor

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

@pakrym-oai (Collaborator)

This PR could use some integration tests for the areas it touches.

Err(e) => Err(FunctionCallError::RespondToModel(format!(
"execution error: {e:?}"
Err(ExecError::Function(err)) => Err(err),
Err(ExecError::Codex(CodexErr::Sandbox(SandboxErr::Timeout { output }))) => Err(

Collaborator

Is it worth giving ExecError more explicit variants?

Contributor Author

I kind of like this nested error scheme. We could flatten it, but that would add redundancy, and the nesting makes the structure easy to preserve. WDYT?

Collaborator

Long chains of nested errors make it hard to know which meaningful scenarios there are to handle.

In addition, we're leaking executor internals here. All a consumer of the executor needs to know is whether a timeout happened. Today we force them to understand the executor-internal CodexErr::Sandbox + SandboxErr::Timeout nesting.
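One way to act on this suggestion, sketched under the assumption that callers only need to tell timeouts and sandbox denials apart from everything else: the executor keeps its nested internal errors but converts them at its boundary into a flat ExecError with one variant per scenario. These variants and the describe() helper are hypothetical and differ from the ExecError::Function / ExecError::Codex shape in the PR.

```rust
use std::fmt;

/// Executor-internal errors (hypothetical, simplified from the names in this thread).
#[derive(Debug)]
pub enum SandboxErr {
    Timeout { output: String },
    Denied { output: String },
}

#[derive(Debug)]
pub enum CodexErr {
    Sandbox(SandboxErr),
    Io(String),
}

/// Public error surface: one variant per scenario a caller is expected to handle.
#[derive(Debug, PartialEq, Eq)]
pub enum ExecError {
    Timeout { output: String },
    SandboxDenied { output: String },
    Other(String),
}

/// The nesting is flattened once, at the executor boundary.
impl From<CodexErr> for ExecError {
    fn from(err: CodexErr) -> Self {
        match err {
            CodexErr::Sandbox(SandboxErr::Timeout { output }) => ExecError::Timeout { output },
            CodexErr::Sandbox(SandboxErr::Denied { output }) => ExecError::SandboxDenied { output },
            CodexErr::Io(msg) => ExecError::Other(msg),
        }
    }
}

impl fmt::Display for ExecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExecError::Timeout { .. } => write!(f, "command timed out"),
            ExecError::SandboxDenied { .. } => write!(f, "denied by sandbox"),
            ExecError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

/// Caller side: matches on ExecError only, never on CodexErr or SandboxErr.
pub fn describe(result: Result<String, ExecError>) -> String {
    match result {
        Ok(stdout) => stdout,
        Err(ExecError::Timeout { output }) => format!("timed out after output: {output}"),
        Err(other) => format!("execution error: {other}"),
    }
}
```

The trade-off the author raises still applies: the From impl duplicates some structure, but consumers no longer need to import or understand the executor's internal error types.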


#[derive(Clone, Debug)]
pub(crate) struct ExecutorConfig {
pub(crate) sandbox_policy: SandboxPolicy,

Collaborator

Should approval_policy be here as well?

Contributor Author

It can change per turn. I think that would make things more complicated.

Collaborator

So does the sandbox policy.

Collaborator

Look at Op::UserTurn
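A sketch of the direction this thread points toward, assuming every setting a user turn can change (cwd, approval policy, sandbox policy) is snapshotted together per turn rather than stored in a long-lived executor config. TurnContext, TurnOverrides, and the enum variants are hypothetical; TurnOverrides stands in for whatever fields Op::UserTurn actually carries.

```rust
use std::path::PathBuf;

/// Hypothetical approval policies.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApprovalPolicy {
    Never,
    OnRequest,
}

/// Hypothetical sandbox policies.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SandboxPolicy {
    ReadOnly,
    WorkspaceWrite,
}

/// Everything a user turn may change, captured together as one immutable snapshot.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TurnContext {
    pub cwd: PathBuf,
    pub approval_policy: ApprovalPolicy,
    pub sandbox_policy: SandboxPolicy,
}

/// Hypothetical per-turn overrides; None keeps the previous turn's value.
#[derive(Default)]
pub struct TurnOverrides {
    pub cwd: Option<PathBuf>,
    pub approval_policy: Option<ApprovalPolicy>,
    pub sandbox_policy: Option<SandboxPolicy>,
}

impl TurnContext {
    /// Build the next turn's context instead of mutating shared executor state.
    pub fn next(&self, overrides: TurnOverrides) -> TurnContext {
        TurnContext {
            cwd: overrides.cwd.unwrap_or_else(|| self.cwd.clone()),
            approval_policy: overrides.approval_policy.unwrap_or(self.approval_policy),
            sandbox_policy: overrides.sandbox_policy.unwrap_or(self.sandbox_policy),
        }
    }
}
```

Because next() returns a new value and leaves the previous context untouched, a command started under one turn's policies cannot observe another turn's changes midway through.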

stdout_stream,
};

sess.executor.update_environment(

Collaborator

Feels a bit risky to do this inline; in theory we shouldn't need this.

Contributor Author

Risky to do this inline? I'm not sure I understand this point.


match decision {
ReviewDecision::Approved => Ok(SandboxDecision::user_override(false)),
ReviewDecision::ApprovedForSession => Ok(SandboxDecision::user_override(true)),

Collaborator

Any reason not to record approval right here?

Contributor Author

@jif-oai, Sep 30, 2025

Not really; both are equivalent IMO. Ownership of mutations to approval_cache is slightly better defined with the current implementation.
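To make the trade-off concrete, a minimal sketch of the current approach, where the decision mapping stays pure and a single owner mutates the cache. The Denied variant, the SandboxDecision shape, the assumption that the user_override flag means "remember for the rest of the session", and ApprovalCache are all hypothetical; decide() also returns a plain value rather than the Result used in the PR.

```rust
use std::collections::HashSet;

/// The user's answer to an approval prompt (Denied is a hypothetical addition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReviewDecision {
    Approved,
    ApprovedForSession,
    Denied,
}

/// Hypothetical shape; assumes the user_override flag means "remember for this session".
#[derive(Debug, PartialEq, Eq)]
pub enum SandboxDecision {
    UserOverride { for_session: bool },
    Rejected,
}

impl SandboxDecision {
    pub fn user_override(for_session: bool) -> Self {
        SandboxDecision::UserOverride { for_session }
    }
}

/// Pure mapping with no side effects, so it can be tested in isolation.
pub fn decide(decision: ReviewDecision) -> SandboxDecision {
    match decision {
        ReviewDecision::Approved => SandboxDecision::user_override(false),
        ReviewDecision::ApprovedForSession => SandboxDecision::user_override(true),
        ReviewDecision::Denied => SandboxDecision::Rejected,
    }
}

/// Single owner of the mutable approval state.
#[derive(Default)]
pub struct ApprovalCache {
    approved: HashSet<Vec<String>>,
}

impl ApprovalCache {
    pub fn is_approved(&self, argv: &[String]) -> bool {
        self.approved.contains(argv)
    }

    /// The only place the cache is written: session-wide approvals are remembered.
    pub fn record(&mut self, argv: &[String], decision: &SandboxDecision) {
        if let SandboxDecision::UserOverride { for_session: true } = decision {
            self.approved.insert(argv.to_vec());
        }
    }
}
```

Recording inside decide() would work equally well, as the author notes; the difference is only whether the mapping function needs mutable access to the cache.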

Collaborator

@pakrym-oai left a comment

I think this PR needs integration tests.

  • Make sure you handle sandbox mode and cwd changes between turns.
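A sketch of the shape such a test could take, assuming the harness can observe the cwd and sandbox mode the executor actually used for each command. FakeSession, ExecRecord, and SandboxMode are hypothetical stand-ins: a real integration test would drive the actual session and infer these values from observable behavior, for example the output of pwd or whether a write outside the workspace fails.

```rust
use std::path::PathBuf;

/// Hypothetical sandbox modes a turn can select.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
}

/// What the executor actually used for one command (hypothetical record).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ExecRecord {
    pub argv: Vec<String>,
    pub cwd: PathBuf,
    pub mode: SandboxMode,
}

/// Hypothetical stand-in for a session: holds the current turn's settings
/// and logs every command it runs.
pub struct FakeSession {
    cwd: PathBuf,
    mode: SandboxMode,
    pub log: Vec<ExecRecord>,
}

impl FakeSession {
    pub fn new(cwd: &str, mode: SandboxMode) -> Self {
        FakeSession { cwd: PathBuf::from(cwd), mode, log: Vec::new() }
    }

    /// Begin a new user turn that may change cwd and sandbox mode.
    pub fn user_turn(&mut self, cwd: &str, mode: SandboxMode) {
        self.cwd = PathBuf::from(cwd);
        self.mode = mode;
    }

    /// Run a command with the current turn's settings and log what was used.
    pub fn exec(&mut self, argv: &[&str]) -> &ExecRecord {
        self.log.push(ExecRecord {
            argv: argv.iter().map(|s| s.to_string()).collect(),
            cwd: self.cwd.clone(),
            mode: self.mode,
        });
        self.log.last().expect("just pushed")
    }
}

/// The property the integration test should check: each command uses the
/// settings of the turn it ran in, not those of an earlier turn.
pub fn settings_follow_turns() -> bool {
    let mut session = FakeSession::new("/repo", SandboxMode::WorkspaceWrite);
    session.exec(&["ls"]);
    session.user_turn("/repo/sub", SandboxMode::ReadOnly);
    session.exec(&["ls"]);
    let (first, second) = (&session.log[0], &session.log[1]);
    first.cwd == PathBuf::from("/repo")
        && first.mode == SandboxMode::WorkspaceWrite
        && second.cwd == PathBuf::from("/repo/sub")
        && second.mode == SandboxMode::ReadOnly
}
```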

@jif-oai merged commit b8195a1 into main on Oct 1, 2025
124 of 148 checks passed
@jif-oai deleted the jif/sandbox-1 branch on October 1, 2025 at 11:05
@github-actions (bot) locked and limited conversation to collaborators on Oct 1, 2025