fix(exec): allow known safe shell builtins in allowlist mode by kinjitakabe · Pull Request #79363 · openclaw/openclaw

kinjitakabe · 2026-05-08T11:19:13Z

Summary

Treat a closed internal set of POSIX shell builtins as safe during exec allowlist evaluation. This avoids approval prompts for harmless shell segments like cd /tmp && git status when the real executable segment is already allowlisted.

No new config surface is added. The earlier tools.exec.safeBuiltins proposal was removed; the behavior is default-only and limited to :, cd, false, pwd, and true in shell-command allowlist analysis. Environment-mutating builtins such as export and unset, code-evaluating builtins such as eval, source, and ., unknown commands, and direct argv execution remain approval-gated unless separately allowlisted.
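A minimal sketch of a closed classifier like the one described above. The names SAFE_SHELL_BUILTINS and isSafeShellBuiltin are hypothetical, not the actual exports of src/infra/exec-safe-builtins.ts. It assumes the segment has already been parsed into argv, and it uses exact-string matching, so a path-qualified command such as /bin/pwd is not treated as a builtin.

```typescript
// Hypothetical names: illustrative only, not the PR's actual exports.
const SAFE_SHELL_BUILTINS: ReadonlySet<string> = new Set([":", "cd", "false", "pwd", "true"]);

// True only when the segment's command word is exactly one of the closed set.
// export, unset, eval, source, ".", path-qualified commands such as /bin/pwd,
// and unknown commands all return false, so they fall through to the normal
// allowlist/approval path.
function isSafeShellBuiltin(argv: readonly string[]): boolean {
  const command = argv[0];
  return command !== undefined && SAFE_SHELL_BUILTINS.has(command);
}

console.log(isSafeShellBuiltin(["cd", "/tmp"]));              // true
console.log(isSafeShellBuiltin(["export", "PATH=/tmp/bin"])); // false
console.log(isSafeShellBuiltin(["source", "./env.sh"]));      // false
```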

Fixes #46056.

What changed

Added src/infra/exec-safe-builtins.ts with the closed internal safe-builtin classifier.
Updated src/infra/exec-approvals-allowlist.ts so only shell-command allowlist evaluation can satisfy a segment through that classifier.
Added focused regression coverage in src/infra/exec-safe-builtins.test.ts for the reporter case, unknown binaries, environment-mutating builtins, Windows behavior, and direct argv evaluation.
Updated src/infra/exec-approvals-analysis.ts metadata typing for the new satisfaction kind.
Updated src/agents/bash-tools.exec.security-floor.test.ts to keep auto-review security-floor coverage on a non-safe command.
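A hypothetical sketch of how the classifier might plug into segment evaluation. Only shell-command evaluation opts in via allowShellBuiltins (the option name the review reports in exec-approvals-allowlist.ts), and a satisfied segment records which kind of satisfaction applied. The Satisfaction type, satisfySegment, and the exact-name allowlist are simplifying assumptions; the real allowlist matches resolved executables.

```typescript
// Illustrative types; not the PR's actual metadata shape.
type Satisfaction =
  | { kind: "allowlist"; pattern: string }
  | { kind: "safeBuiltin"; builtin: string };

interface EvalOptions {
  allowlist: readonly string[]; // simplified: exact command names
  allowShellBuiltins: boolean;  // true only for shell-command evaluation
}

const SAFE_SHELL_BUILTINS: ReadonlySet<string> = new Set([":", "cd", "false", "pwd", "true"]);

// Returns how a segment is satisfied, or null when approval is required.
function satisfySegment(argv: readonly string[], opts: EvalOptions): Satisfaction | null {
  const command = argv[0] ?? "";
  if (opts.allowlist.includes(command)) {
    return { kind: "allowlist", pattern: command };
  }
  if (opts.allowShellBuiltins && SAFE_SHELL_BUILTINS.has(command)) {
    return { kind: "safeBuiltin", builtin: command };
  }
  return null;
}

// Shell-command evaluation opts in; direct argv evaluation does not.
const shellOpts: EvalOptions = { allowlist: ["git"], allowShellBuiltins: true };
const argvOpts: EvalOptions = { allowlist: ["git"], allowShellBuiltins: false };
```

With these options, ["pwd"] is satisfied as a safe builtin under shellOpts but returns null under argvOpts, which matches the "direct argv pwd remains gated" result reported below.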

What did not change

No tools.exec.safeBuiltins config key.
No config schema, config docs, generated baseline, or changelog changes.
No safeBins behavior changes.
No Windows / PowerShell widening.

Real behavior proof

Behavior addressed: Allowlist mode previously gated shell chains like cd /tmp && git status on the pathless cd segment even when git was allowlisted.

Real environment tested: local macOS source checkout with Node via the repo's pnpm wrapper; the remote Testbox changed-files gate (pnpm check:changed) was attempted twice.

Exact steps or command run after this patch:

pnpm test src/infra/exec-safe-builtins.test.ts src/agents/bash-tools.exec.security-floor.test.ts -- --reporter=verbose
pnpm changed:lanes --json
pnpm check:no-conflict-markers
pnpm lint:core
pnpm check:changed
pnpm check:changed

Evidence after fix: Focused Vitest passed 10 tests in src/infra/exec-safe-builtins.test.ts and 13 tests in src/agents/bash-tools.exec.security-floor.test.ts. pnpm changed:lanes --json reported core/coreTests only for the three touched files. pnpm check:no-conflict-markers passed. git diff --check origin/main...HEAD passed.

Observed result after fix: cd ~/ and cd /tmp && git status now satisfy shell allowlist evaluation, with the cd segments satisfied by the internal safe-builtin set and git status by the existing allowlist match. cd /tmp && curl evil.com, export PATH=/tmp/bin:$PATH && git status, and direct argv pwd remain gated.
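The observed results can be reproduced with a deliberately naive chain check, offered as a hedged illustration rather than the PR's analysis. chainNeedsApproval is a hypothetical name; it splits only on && and whitespace, ignores quoting, pipes, and substitutions, and matches the allowlist by bare command name. Direct argv execution is not modeled because the classifier never applies there.

```typescript
// Simplified sketch: no quoting, pipes, redirections, or substitution handling.
const SAFE_SHELL_BUILTINS: ReadonlySet<string> = new Set([":", "cd", "false", "pwd", "true"]);

// A chain needs approval unless every segment is either allowlisted
// or one of the closed safe builtins.
function chainNeedsApproval(command: string, allowlist: readonly string[]): boolean {
  const segments = command.split("&&").map((s) => s.trim().split(/\s+/));
  return !segments.every(
    ([cmd = ""]) => allowlist.includes(cmd) || SAFE_SHELL_BUILTINS.has(cmd),
  );
}

const allowlist = ["git"];
console.log(chainNeedsApproval("cd /tmp && git status", allowlist));                   // false
console.log(chainNeedsApproval("cd /tmp && curl evil.com", allowlist));                // true
console.log(chainNeedsApproval("export PATH=/tmp/bin:$PATH && git status", allowlist)); // true
```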

What was not tested: pnpm check:changed did not reach checks in either Testbox attempt. Both runs failed during remote dependency install before executing the gate, with missing postinstall modules for @matrix-org/matrix-sdk-crypto-nodejs / https-proxy-agent and esbuild. pnpm lint:core, pnpm tsgo:prod, and pnpm check:test-types also failed on an unrelated current-main Discord boundary dts error in extensions/discord/src/monitor/gateway-plugin.ts, which this PR does not touch.

clawsweeper · 2026-05-08T11:22:14Z

Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 8:32 AM ET / 12:32 UTC.

Summary
The PR adds an internal POSIX safe-builtin classifier, enables it by default for shell-command allowlist evaluation, and adds focused tests for builtin, Windows, environment-mutating, and direct-argv cases.

PR surface: Source +40, Tests +148. Total +188 across 3 files.

Reproducibility: yes. Source inspection gives a high-confidence path for the merge blocker: cd changes shell cwd, while later chain segments are still resolved with the original cwd. I did not run a live patched OpenClaw flow in this read-only review.

Review metrics: 1 noteworthy metric.

Implicit safe builtins: 5 added by default. The PR changes exec allowlist behavior without a config knob, so maintainers should review the new default security boundary explicitly.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P1] Remove cd from the default safe builtin set or implement cwd-aware shell-chain analysis with regression coverage.
[P1] Add redacted configured OpenClaw exec allowlist proof after the patch; updating the PR body should trigger a fresh ClawSweeper review, or a maintainer can ask @clawsweeper re-review.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body reports focused tests and check attempts, but it does not show a real configured OpenClaw exec allowlist flow after the patch; redacted terminal output, live output, or logs are still needed, with private details removed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

[P1] Merging as-is can approve cwd-sensitive shell chains using a different directory context than the shell will use at execution time.
[P1] The PR changes default exec allowlist behavior without a config knob, so maintainers need an explicit security/product decision on which builtins are safe by default.
[P1] The PR body still lacks redacted terminal/live output or logs from a configured OpenClaw exec allowlist flow after the patch.

Maintainer options:

Make cd cwd-aware before merge (recommended)
Propagate cd effects through shell-chain allowlist analysis and add node/gateway regression coverage for cwd-relative executables, or remove cd from the safe set until that exists.
Ship stateless builtins only
Keep only :, true, false, and possibly pwd as default safe builtins if maintainers want a narrower no-config behavior change.
Defer the policy change
Pause or close this PR if default implicit approval of shell builtins is not worth expanding the exec approval model now.
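For the first option, a minimal sketch of cwd-aware chain analysis under stated assumptions: segments arrive as already-split argv arrays, bare command names are resolved through PATH elsewhere (PATH is assumed to contain no relative entries), and symlinks are ignored (lexical resolution). cd operands that depend on the environment (no operand, $VAR, ~, globs, or names that POSIX cd may look up through CDPATH or OLDPWD) make the cwd unknown, and any later cwd-relative executable then stays gated. All names here are hypothetical.

```typescript
// Hypothetical sketch; names and shapes are illustrative only.

// Simplified lexical stand-in for path.posix.resolve (no symlink handling).
function resolvePosix(cwd: string, target: string): string {
  const joined = target.startsWith("/") ? target : `${cwd}/${target}`;
  const out: string[] = [];
  for (const part of joined.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Returns the operand of `cd <dir>` only when it can be followed statically.
function staticCdTarget(argv: readonly string[]): string | null {
  const target = argv[1];
  // Bare `cd` (goes to $HOME) or extra options/operands: not tracked.
  if (argv.length !== 2 || target === undefined) return null;
  // Parameter, tilde, glob, or brace expansion would change the operand.
  if (/[$~*?[{\\`]/.test(target)) return null;
  // Other relative names (including "-") may be resolved via CDPATH or OLDPWD.
  const explicit =
    target.startsWith("/") || target === "." || target === ".." ||
    target.startsWith("./") || target.startsWith("../");
  return explicit ? target : null;
}

interface ResolvedSegment {
  argv: readonly string[];
  // Absolute path or bare PATH name; null means "location unknown, keep gated".
  executable: string | null;
}

function resolveChain(segments: readonly (readonly string[])[], startCwd: string): ResolvedSegment[] {
  let cwd: string | null = startCwd;
  const resolved: ResolvedSegment[] = [];
  for (const argv of segments) {
    const cmd = argv[0] ?? "";
    if (cmd === "cd") {
      const target = staticCdTarget(argv);
      // Once cwd is unknown it conservatively stays unknown for the rest of the chain.
      cwd = target !== null && cwd !== null ? resolvePosix(cwd, target) : null;
      resolved.push({ argv, executable: "cd" });
    } else if (!cmd.includes("/")) {
      resolved.push({ argv, executable: cmd }); // PATH lookup, not cwd-relative
    } else if (cmd.startsWith("/")) {
      resolved.push({ argv, executable: cmd }); // already absolute
    } else {
      resolved.push({ argv, executable: cwd === null ? null : resolvePosix(cwd, cmd) });
    }
  }
  return resolved;
}
```

Allowlist matching would then run against each segment's executable, and any segment whose executable is null would remain approval-gated. For example, cd /tmp && ./tool started from /home/user resolves ./tool to /tmp/tool rather than /home/user/tool.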

Next step before merge

[P1] Manual review is needed because the remaining blocker is the exec security/product boundary for default builtin allowlisting, not a narrow mechanical cleanup.

Security
Needs attention: The diff broadens exec allowlist behavior and has a concrete cwd-sensitive approval-boundary concern around cd.

Review findings

[P1] Keep cd gated until chain cwd is tracked — src/infra/exec-safe-builtins.ts:6

Review details

Best possible solution:

Keep cwd-mutating cd approval-gated until shell-chain cwd tracking and regression coverage exist, or ship only truly stateless builtins with configured real-flow proof.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection gives a high-confidence path for the merge blocker: cd changes shell cwd, while later chain segments are still resolved with the original cwd. I did not run a live patched OpenClaw flow in this read-only review.

Is this the best way to solve the issue?

No. Removing the earlier config surface is better, but default-allowing cd before cwd-aware chain evaluation is not the narrowest safe solution.

Full review comments:

[P1] Keep cd gated until chain cwd is tracked — src/infra/exec-safe-builtins.ts:6
Adding cd to the default safe builtin set lets a chain like cd /tmp && ./tool pass allowlist analysis for ./tool using the original cwd, while the shell executes it after changing directories. Remove cd from this set or carry cwd changes through chain analysis with regression coverage.
Confidence: 0.88
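To make this finding concrete, a small sketch contrasting the two resolutions for cd /tmp && ./tool. If an allowlist entry covered /home/user/project/tool, approval analysis would match that binary while the shell actually runs /tmp/tool. The paths are hypothetical, and resolvePosix is a simplified lexical stand-in for path.posix.resolve rather than resolveExecutablePathCandidate itself.

```typescript
// Simplified lexical stand-in for path.posix.resolve (no symlink handling).
function resolvePosix(cwd: string, target: string): string {
  const joined = target.startsWith("/") ? target : `${cwd}/${target}`;
  const out: string[] = [];
  for (const part of joined.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Approval analysis as described in the review: every segment of
// `cd <dir> && <exe>` is resolved against the caller's original cwd.
function analysisPath(startCwd: string, exe: string): string {
  return resolvePosix(startCwd, exe);
}

// What the shell does: <exe> runs after cd has changed the directory.
function runtimePath(startCwd: string, cdTarget: string, exe: string): string {
  return resolvePosix(resolvePosix(startCwd, cdTarget), exe);
}

const start = "/home/user/project"; // hypothetical caller cwd
console.log(analysisPath(start, "./tool"));        // /home/user/project/tool
console.log(runtimePath(start, "/tmp", "./tool")); // /tmp/tool
```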

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1e54e908e2e4.

Label changes

Label justifications:

P2: This is a normal-priority exec approval improvement with important security-boundary review needs, but it is not a shipped regression affecting users yet.
merge-risk: 🚨 security-boundary: The diff can let shell builtins satisfy host exec allowlist checks and suppress approval prompts.
rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body reports focused tests and check attempts, but it does not show a real configured OpenClaw exec allowlist flow after the patch; redacted terminal output, live output, or logs are still needed, with private details removed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +40, Tests +148. Total +188 across 3 files.


Area	Files	Added	Removed	Net
Source	2	42	2	+40
Tests	1	148	0	+148
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	3	190	2	+188

Security concerns:

[high] cd changes the cwd used by later shell segments — src/infra/exec-safe-builtins.ts:6
cd is classified as safe even though later segments are resolved with the original cwd, which can make the approval decision and shell execution disagree about relative executables.
Confidence: 0.88

What I checked:

Root policy read and applied: Root AGENTS.md was read fully; its exec approval and security-boundary guidance applies because this PR changes default allowlist evaluation for host exec. (AGENTS.md:7, 1e54e908e2e4)
PR adds cd to the default safe-builtin set: The PR's new closed builtin set includes cd, so a cwd-mutating shell builtin can satisfy allowlist evaluation without its own allowlist entry. (src/infra/exec-safe-builtins.ts:6, 9315a3415b75)
PR enables builtin satisfaction by default for shell allowlist evaluation: The PR sets allowShellBuiltins: true inside evaluateShellAllowlist, making the builtin set default behavior rather than an opt-in policy. (src/infra/exec-approvals-allowlist.ts:1138, 9315a3415b75)
Shell-chain analysis keeps the original cwd: Current evaluateShellAllowlist analyzes every chain part with params.cwd; it does not carry a prior cd into later segment resolution. (src/infra/exec-approvals-allowlist.ts:1164, 1e54e908e2e4)
Segment resolution is cwd-sensitive: parseSegmentsFromParts passes the caller cwd into resolveCommandResolutionFromArgv, so later relative commands are resolved against the original cwd during approval analysis. (src/infra/exec-approvals-analysis.ts:697, 1e54e908e2e4)
Relative executable paths resolve against cwd: resolveExecutablePathCandidate resolves relative executable tokens against the provided cwd, which can diverge from runtime shell cwd after cd. (src/infra/executable-path.ts:30, 1e54e908e2e4)

Likely related people:

steipete: Git history on the central exec approval files shows repeated allowlist, safe-bin, and realpath-bound approval work by Peter Steinberger. (role: feature-history owner; confidence: high; commits: 83bc73f4ea03, 8b4cdbb21da4, 9530c0108589; files: src/infra/exec-approvals-allowlist.ts, src/infra/exec-command-resolution.ts, src/infra/exec-safe-bin-policy.ts)
pgondhi987: Recent safe-bin argument validation hardening touched the same allowlist and node-host approval path. (role: recent exec allowlist hardening contributor; confidence: medium; commits: 9ac4272b35e9; files: src/infra/exec-approvals-allowlist.ts, src/node-host/invoke-system-run-allowlist.ts)
vincentkoc: Adjacent history includes exec approval allowlist type extraction and remote approval regression work near this policy surface. (role: adjacent exec approval contributor; confidence: medium; commits: d9a3ecd109ee, 2d53ffdec1da; files: src/infra/exec-approvals-allowlist.ts, src/infra/exec-approvals.types.ts, src/config/types.tools.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.
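A minimal sketch of the rule stated earlier in the review, "Overall follows the weaker of proof and patch quality", using the rank order listed above. This is an illustration, not ClawSweeper's code; the off-meta tidepool is left out because it marks items where the rating does not apply.

```typescript
// Ranks ordered from strongest to weakest, as listed above.
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker of the two component ranks,
// i.e. the one with the larger index in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) >= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "unranked krab")); // unranked krab
```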

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
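A hypothetical routing table that restates the commands listed above, to show the split between fresh-review commands and maintainer-only repair, merge, explain, and stop commands. It is not ClawSweeper's implementation. It assumes maintainers also hold write access and does not model the "open PR or issue" condition. A comment such as @clawsweeper recheck, which is not among the listed commands, maps to nothing in this sketch.

```typescript
// Illustrative roles and effects; names are assumptions.
type Role = "author" | "writer" | "maintainer" | "other";
type Effect = "fresh-review" | "repair-or-merge" | "explain" | "stop";

interface CommandRule {
  effect: Effect;
  allowed: readonly Role[];
}

const REVIEWERS: readonly Role[] = ["author", "writer", "maintainer"];
const MAINTAINERS: readonly Role[] = ["maintainer"];

const RULES: ReadonlyMap<string, CommandRule> = new Map<string, CommandRule>([
  // Fresh-review commands only queue a review; they never repair or merge.
  ["re-review", { effect: "fresh-review", allowed: REVIEWERS }],
  ["re-run", { effect: "fresh-review", allowed: REVIEWERS }],
  ["review", { effect: "fresh-review", allowed: MAINTAINERS }],
  // Repair and merge flows require an explicit maintainer command.
  ["autofix", { effect: "repair-or-merge", allowed: MAINTAINERS }],
  ["automerge", { effect: "repair-or-merge", allowed: MAINTAINERS }],
  ["fix ci", { effect: "repair-or-merge", allowed: MAINTAINERS }],
  ["address review", { effect: "repair-or-merge", allowed: MAINTAINERS }],
  ["explain", { effect: "explain", allowed: MAINTAINERS }],
  ["stop", { effect: "stop", allowed: MAINTAINERS }],
]);

// Returns the effect of a comment, or null when it is not a recognized
// command or the commenter's role may not issue it.
function routeComment(body: string, role: Role): Effect | null {
  const command = /^@clawsweeper\s+(.+?)\s*$/.exec(body.trim())?.[1]?.toLowerCase();
  if (command === undefined) return null;
  const rule = RULES.get(command);
  return rule !== undefined && rule.allowed.includes(role) ? rule.effect : null;
}
```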

kinjitakabe · 2026-05-08T13:36:56Z

@clawsweeper recheck

kinjitakabe · 2026-05-08T13:39:07Z

@clawsweeper re-review

kinjitakabe · 2026-05-08T13:48:28Z

@clawsweeper re-review

clawsweeper · 2026-05-12T05:57:58Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/25717433617
Updated: 2026-05-12T06:33:14.456Z

[P2 security] safeBuiltins config now fails closed against a fixed supported set. `normalizeSafeBuiltins` filters entries against `SUPPORTED_SAFE_BUILTINS` (`:`, `cd`, `export`, `false`, `pwd`, `true`, `unset`) and silently drops anything else, including code-evaluating builtins like `eval`, `source`, and `.` that the original implementation would have accepted as-is, letting them bypass the allowlist. The conservative `DEFAULT_SAFE_BUILTINS` remains the stateless subset (`:`, `false`, `pwd`, `true`).

[P3 docs] Align the documented "conservative default set" with the actual `DEFAULT_SAFE_BUILTINS` (`:`, `false`, `pwd`, `true`). Surface the wider supported set (`cd`, `export`, `unset`) as an explicit opt-in, with the cwd/env-mutation trade-off documented inline, instead of presenting all seven as the conservative default.

[P3 type comment] Drop the "from the SDK" claim: `DEFAULT_SAFE_BUILTINS` is not exported through the plugin SDK. Inline the literal supported names in the JSDoc so config writers can copy them directly without chasing a nonexistent SDK export.

Addresses clawsweeper review findings on openclaw#79363.
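A sketch of the fail-closed normalization this commit message describes, from the earlier iteration that still had a safeBuiltins config key (later removed from the PR). The two constant names and their contents come from the message; the function signature and the "no config means defaults" rule are assumptions.

```typescript
// Constant contents as listed in the commit message.
const SUPPORTED_SAFE_BUILTINS: ReadonlySet<string> = new Set([":", "cd", "export", "false", "pwd", "true", "unset"]);
const DEFAULT_SAFE_BUILTINS: readonly string[] = [":", "false", "pwd", "true"];

// Assumed behavior: no config falls back to the stateless defaults.
function normalizeSafeBuiltins(configured: readonly string[] | undefined): string[] {
  if (configured === undefined) return [...DEFAULT_SAFE_BUILTINS];
  const kept = new Set<string>();
  for (const name of configured) {
    // Anything outside the supported set (eval, source, ".", real binaries)
    // is silently dropped instead of being trusted.
    if (SUPPORTED_SAFE_BUILTINS.has(name)) kept.add(name);
  }
  return Array.from(kept);
}

const normalized = normalizeSafeBuiltins(["cd", "eval", "source", "."]); // ["cd"]
```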

clawsweeper · 2026-05-12T14:12:27Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Failed
Detail: The targeted re-review did not finish cleanly. Check the workflow run for details.
Run: https://github.com/openclaw/clawsweeper/actions/runs/25740123230
Updated: 2026-05-12T14:24:40.772Z

kinjitakabe · 2026-05-12T14:43:17Z

@clawsweeper re-review

clawsweeper · 2026-05-12T14:44:52Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/25742015062
Updated: 2026-05-12T14:52:53.177Z

barnacle-openclaw · 2026-05-31T05:17:10Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

Treat a closed set of pathless POSIX shell builtins as internally safe in exec allowlist evaluation. This keeps cd/pwd/no-op shell segments from requiring approval while leaving environment-mutating builtins and unknown binaries gated. Fixes openclaw#46056.

openclaw-barnacle Bot added the docs (Improvements or additions to documentation), agents (Agent runtime and tooling), size: M, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) labels May 8, 2026

kinjitakabe force-pushed the fix/exec-approvals-shell-builtins branch 2 times, most recently from 83471c1 to d4189a8, May 8, 2026 12:10

kinjitakabe changed the title from "fix(exec): add opt-in safeBuiltins for cd/pwd/export so allowlisted chains stop gating on builtins" to "fix(exec): add opt-in tools.exec.safeBuiltins for stateless shell builtins in allowlist mode" May 8, 2026

clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 8, 2026

openclaw-barnacle Bot removed the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 8, 2026

clawsweeper Bot mentioned this pull request May 11, 2026

Shell builtins (e.g. cd) always trigger approval gate even when allowlist is configured #46056 (Closed)

clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 12, 2026