Skip to content

fix(status): bound sandbox browser Docker audit probes#85226

Closed
ai-hpc wants to merge 1 commit into
openclaw:mainfrom
ai-hpc:fix/status-deep-docker-timeout-84984
Closed

fix(status): bound sandbox browser Docker audit probes#85226
ai-hpc wants to merge 1 commit into
openclaw:mainfrom
ai-hpc:fix/status-deep-docker-timeout-84984

Conversation

@ai-hpc
Copy link
Copy Markdown
Member

@ai-hpc ai-hpc commented May 22, 2026

Fixes #84984.

Summary

  • Pass the status timeout budget into the security audit path.
  • Bound sandbox-browser Docker audit subprocesses with an abort signal.
  • Return a partial security-audit diagnostic when Docker does not respond.
  • Add regression coverage for a silent Docker listing probe.

Verification

  • pnpm test src/security/audit-sandbox-browser.test.ts src/commands/status-runtime-shared.test.ts
  • pnpm exec oxfmt --check src/security/audit-extra.async.ts src/security/audit.ts src/commands/status-runtime-shared.ts src/security/audit-sandbox-browser.test.ts src/commands/status-runtime-shared.test.ts
  • git diff --check

Real behavior proof

After fix, a real source CLI run with an isolated OpenClaw state dir and a temporary docker command that hangs until aborted exits successfully and reports partial security-audit output instead of hanging.

Command:

env PATH=/tmp/openclaw-proof-84984/bin:/home/orin/.nvm/versions/node/v22.22.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OPENCLAW_STATE_DIR=/tmp/openclaw-proof-84984 OPENCLAW_DISABLE_BUNDLED_PLUGINS=1 OPENCLAW_NO_RESPAWN=1 timeout 20 node --import tsx src/entry.ts status --json --all --timeout 1000

Key output:

{
  "securityAudit": {
    "summary": {
      "critical": 2,
      "warn": 2,
      "info": 2
    },
    "findings": [
      {
        "checkId": "sandbox.browser_container.docker_probe_timeout",
        "severity": "info",
        "title": "Sandbox browser Docker audit timed out",
        "detail": "Docker did not respond within 1000ms while auditing sandbox browser containers. Status continues with partial security audit results."
      }
    ]
  }
}

The command exited with status 0 before the outer 20s guard fired.

What was not tested

  • Full gateway/channel E2E was not run locally; this fix is scoped to the status security-audit Docker subprocess path.

@ai-hpc ai-hpc requested a review from a team as a code owner May 22, 2026 04:48
@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 22, 2026

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded: the earlier open PR at #85046 already owns the same status/security-audit Docker hang and carries the more complete subprocess-timeout implementation, proof, tests, and maintainer-review path.

Canonical path: Keep maintainer review focused on #85046, land the stronger bounded-subprocess fix there if approved, and close the linked issue after that merge.

So I’m closing this here and keeping the remaining discussion on #85046.

Review details

Best possible solution:

Keep maintainer review focused on #85046, land the stronger bounded-subprocess fix there if approved, and close the linked issue after that merge.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection and the linked report show current main routes deep status security audit into Docker ps/inspect/port probes without a subprocess timeout; a non-settling Docker test double is a high-confidence reproduction path.

Is this the best way to solve the issue?

No for this branch as the merge target. The earlier open PR is the better current solution because it bounds execDockerRaw itself, uses a separate audit subprocess timeout, carries stronger tests/proof, and is already marked ready for maintainer review.

Security review:

Security review needs attention: The branch changes security-audit failure semantics and is superseded by a stronger PR that preserves a clearer warning-on-timeout path.

  • [medium] Skipped Docker audit is downgraded to info — src/security/audit-extra.async.ts:313
    When Docker does not respond, this branch emits an info finding even though sandbox-browser container drift and port checks were skipped; the canonical PR treats that path as a warning for operator visibility.
    Confidence: 0.82
  • [medium] Abort-only wrapper may not fully bound Docker subprocesses — src/security/audit-extra.async.ts:299
    The branch aborts via the existing Docker helper but still depends on the child close event after SIGTERM, so a process that ignores or delays close can keep the status path waiting.
    Confidence: 0.78

What I checked:

  • Same canonical bug: The PR body says it fixes the same open status hang report, and the linked issue describes openclaw status --deep hanging after spawning docker ps, with expected bounded partial output instead of an indefinite wait.
  • Current branch implementation: The branch forwards timeoutMs as deepTimeoutMs and passes it to collectSandboxBrowserHashLabelFindings, which uses an AbortController around the existing Docker calls and emits sandbox.browser_container.docker_probe_timeout as an info finding. (src/security/audit-extra.async.ts:310, 8f29efd4aefc)
  • Earlier canonical PR is open: The earlier PR is open, references the same issue, has proof-sufficient and ready-for-maintainer-look labels, and its body documents real CLI proof plus validation for the bounded Docker audit path. (06b1f825206a)
  • Earlier PR has stronger subprocess boundary: The earlier PR adds timeoutMs directly to execDockerRaw, rejects promptly with TimeoutError, destroys stdio, and schedules a SIGKILL fallback instead of relying only on caller-provided AbortSignal close behavior. (src/agents/sandbox/docker.ts:11, 06b1f825206a)
  • Earlier PR keeps timeout API narrower: The earlier PR adds a separate subprocessTimeoutMs audit option and threads that into sandbox-browser probes, avoiding this branch's broader reinterpretation of deepTimeoutMs for all audit subprocess probes. (src/security/audit.ts:56, 06b1f825206a)
  • History routing sampled: Local blame and GitHub path history show recent status/security/sandbox ownership signals around the touched files, though the checkout is shallow and local blame collapses much of current main to a boundary commit. (src/security/audit-extra.async.ts:277, 98af51748d19)

Likely related people:

  • steipete: GitHub path history shows multiple recent commits on src/security/audit-extra.async.ts and src/commands/status-runtime-shared.ts, including security audit refactors and status runtime consolidation adjacent to this path. (role: recent security/status area contributor; confidence: medium; commits: 694ca50e9775, 8a2207f8a179, 88aa81422653; files: src/security/audit-extra.async.ts, src/commands/status-runtime-shared.ts)
  • vincentkoc: GitHub path history for src/agents/sandbox/docker.ts shows recent Docker/sandbox changes, including Docker GPU passthrough, default image fallback, and Docker environment handling adjacent to the affected subprocess helper. (role: recent sandbox contributor; confidence: medium; commits: d70191f8afd4, 47dc9f7fc0b9, bd33a340fba0; files: src/agents/sandbox/docker.ts)
  • Takhoffman: GitHub path history shows prior work on Windows/non-core spawn handling across Docker, which is adjacent to the subprocess helper changed by the canonical PR. (role: adjacent Docker spawn contributor; confidence: low; commits: cd653c55d7ce; files: src/agents/sandbox/docker.ts)
  • BSG2000: The shallow local checkout blames current status/security lines to commit 98af51748d19e79390b7b7575ebd37761044fa8d; the commit subject is unrelated, so this is a weak routing signal only. (role: local shallow-blame boundary signal; confidence: low; commits: 98af51748d19; files: src/commands/status-runtime-shared.ts, src/security/audit.ts, src/security/audit-extra.async.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 229490a48924.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 22, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 22, 2026

ClawSweeper applied the proposed close for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands Command implementations merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

openclaw status --deep hangs indefinitely (no timeout on subprocess probe)

1 participant