
doctor: warn about orphaned agent dirs#65113

Merged
obviyus merged 5 commits into openclaw:main from neeravmakwana:fix-doctor-orphan-agent-dirs
Apr 12, 2026

Conversation

@neeravmakwana
Contributor

Summary

  • Problem: openclaw doctor did not warn when ~/.openclaw/agents/<id>/agent directories still existed on disk but the matching agents.list[] entries were missing from config.
  • Why it matters: a partially lost config can leave agent state on disk while routing, identity, and model selection ignore those agents, which makes recovery confusing.
  • What changed: the state-integrity doctor pass now scans on-disk agent directories and warns when they no longer have matching agents.list[] entries.
  • What did NOT change (scope boundary): this PR does not reconstruct missing agent config automatically, and it does not change the already-fixed backup/update recovery paths from #65105 ("Update 2026.4.9 → 4.11 silently wipes channels.discord config and agents.list from openclaw.json").
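The scan described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `findOrphanAgentDirs`, `listSubdirNames`, and `isDirectory` are hypothetical names, and the sketch compares raw folder names against configured ids without the id normalization the real doctor pass applies.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: true only when p exists and is a directory.
function isDirectory(p: string): boolean {
  try {
    return fs.statSync(p).isDirectory();
  } catch {
    return false;
  }
}

// Hypothetical helper: names of the immediate subdirectories of root.
function listSubdirNames(root: string): string[] {
  try {
    return fs
      .readdirSync(root, { withFileTypes: true })
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name);
  } catch {
    return []; // a missing agents root means there is nothing to report
  }
}

// Hypothetical scan: folders that contain a nested "agent" directory but
// have no matching configured id. Id normalization is deliberately omitted.
function findOrphanAgentDirs(agentsRoot: string, configuredIds: ReadonlySet<string>): string[] {
  return listSubdirNames(agentsRoot)
    .filter((name) => !configuredIds.has(name))
    .filter((name) => isDirectory(path.join(agentsRoot, name, "agent")))
    .sort((a, b) => a.localeCompare(b));
}

// Usage: a temp state dir with one configured agent, two orphans, and one
// incomplete folder that lacks the nested agent/ directory.
const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-sketch-"));
const agentsRoot = path.join(stateDir, "agents");
for (const id of ["main", "big-brain", "cerebro"]) {
  fs.mkdirSync(path.join(agentsRoot, id, "agent"), { recursive: true });
}
fs.mkdirSync(path.join(agentsRoot, "incomplete"), { recursive: true });

const orphans = findOrphanAgentDirs(agentsRoot, new Set(["main"]));
console.log(orphans);
```

The sketch is read-only by design, matching the scope boundary above: it reports mismatches and never edits config or deletes directories.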

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: doctor only checked general state/session integrity and never compared on-disk agent directories against configured agents.list[] entries.
  • Missing detection / guardrail: no doctor warning existed for orphaned agent directories, even though some runtime surfaces can still discover agents from disk.
  • Contributing context (if known): the original report in #65105 ("Update 2026.4.9 → 4.11 silently wipes channels.discord config and agents.list from openclaw.json") described agent directories remaining on disk after config loss, but doctor did not flag that mismatch.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/commands/doctor-state-integrity.test.ts
  • Scenario the test should lock in: doctor warns for real orphaned ~/.openclaw/agents/<id>/agent directories, but ignores configured agents and incomplete folders.
  • Why this is the smallest reliable guardrail: the new behavior is isolated to the doctor state-integrity scan and can be exercised deterministically with a temp state dir.
  • Existing test that already covers this (if any): none
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • openclaw doctor now warns when agent state directories exist on disk without matching agents.list[] entries in config.

Diagram (if applicable)

Before:
[config loses agents.list entries] -> [agent dirs still exist on disk] -> [doctor stays silent]

After:
[config loses agents.list entries] -> [agent dirs still exist on disk] -> [doctor warns about orphaned agent dirs]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 26.3.0
  • Runtime/container: Node 25 / pnpm workspace
  • Model/provider: N/A
  • Integration/channel (if any): doctor CLI
  • Relevant config (redacted): config with only agents.list: [{ id: "main", default: true }] plus extra on-disk agent dirs under ~/.openclaw/agents/

Steps

  1. Create ~/.openclaw/agents/big-brain/agent and ~/.openclaw/agents/cerebro/agent in a temp state dir.
  2. Run noteStateIntegrity() with config that only defines main in agents.list.
  3. Inspect the emitted doctor state-integrity text.

Expected

  • Doctor warns that on-disk agent directories exist without matching agents.list[] entries.

Actual

  • Doctor now emits that warning and includes example agent ids.
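As a rough illustration of a warning that includes example agent ids, a formatter along these lines could be used. The function name, the wording, and the `maxExamples` cap are all assumptions made for this sketch; the actual text emitted by doctor is not shown in this thread and may differ.

```typescript
// Hypothetical formatter; the real warning text in the PR may differ.
function formatOrphanAgentWarning(orphanIds: readonly string[], maxExamples = 3): string | null {
  if (orphanIds.length === 0) {
    return null; // nothing to warn about
  }
  const count = orphanIds.length;
  // Keep singular and plural grammar consistent with the count.
  const noun = count === 1 ? "directory" : "directories";
  const verb = count === 1 ? "has" : "have";
  const shown = orphanIds.slice(0, maxExamples).join(", ");
  const more = count > maxExamples ? `, and ${count - maxExamples} more` : "";
  return `${count} agent ${noun} on disk ${verb} no matching agents.list[] entry (e.g. ${shown}${more}).`;
}

console.log(formatOrphanAgentWarning(["big-brain", "cerebro"]));
```

Capping the examples keeps the note short when many directories are orphaned while still giving the user concrete ids to look for.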

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: targeted regression test for orphaned agent detection; targeted regression test for configured-agent and incomplete-folder false positives; repo-wide pnpm check.
  • Edge cases checked: default agent remains ignored; incomplete ~/.openclaw/agents/<id> folders without nested agent/ do not warn.
  • What you did not verify: full end-to-end recovery from a historical config wipe; pnpm build is currently failing in the existing tree due to an unrelated extensions/acpx/src/runtime.ts export-resolution error.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: stale non-agent directories under ~/.openclaw/agents/ could be mistaken for orphaned agents.
    • Mitigation: the warning only considers directories that contain the nested agent/ runtime dir, and tests cover the incomplete-folder false-positive case.

AI assistance: prepared with an agent, with the code path, tests, and verification reviewed before submission.

Made with Cursor


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5e0644fe4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/commands/doctor-state-integrity.ts Outdated
@greptile-apps
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

Adds orphaned-agent-directory detection to openclaw doctor: a new listOrphanAgentDirs helper scans $OPENCLAW_STATE_DIR/agents/ for directories that contain a nested agent/ subdirectory but have no matching entry in agents.list, then emits a warning via the existing state-integrity note path. The scope is well-bounded — read-only, no automatic repair, no config changes — and two targeted regression tests cover the detection and the false-positive suppression cases.

Confidence Score: 5/5

Safe to merge; the change is read-only, well-tested, and correctly scoped to the doctor state-integrity pass.

No P0 or P1 issues found. The only finding is a P2 style note about using the normalized ID for both the configuredIds lookup and the existsDir path check — a false-negative edge case that cannot occur with real OpenClaw data because agent directories are always created with already-normalized names.

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/commands/doctor-state-integrity.ts
Line: 75-80

Comment:
**Normalized name used for path lookup**

`normalizeAgentId(entry.name)` can transform the raw directory name (e.g. lowercasing or replacing `_` with `-`), and then `existsDir` is called with the transformed ID as a path segment. On a case-sensitive filesystem this means a directory named `BigBrain` would be mapped to `bigbrain`, and the `agent/` check at `agentsRoot/bigbrain/agent` would silently fail even though `agentsRoot/BigBrain/agent` exists — producing a false negative.

Safer to preserve the original name for path construction and normalize only for the `configuredIds` comparison:

```suggestion
      .map((entry) => ({ raw: entry.name, normalized: normalizeAgentId(entry.name) }))
      .filter(({ raw, normalized: agentId }) => {
        if (!agentId || configuredIds.has(agentId)) {
          return false;
        }
        return existsDir(path.join(agentsRoot, raw, "agent"));
      })
      .map(({ normalized }) => normalized)
      .toSorted((left, right) => left.localeCompare(right));
```

In practice OpenClaw creates all agent directories with already-normalized (lowercase) IDs, so this is a false-negative edge case rather than a present bug — but it's worth making the intent explicit.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "docs(changelog): note orphaned agent war..." | Re-trigger Greptile

Comment thread src/commands/doctor-state-integrity.ts Outdated
@neeravmakwana
Contributor Author

Addressed the review feedback about path probing for orphaned agent dirs.

What changed:

  • Preserved the original on-disk directory name for the filesystem agents/<dir>/agent existence check.
  • Continued normalizing only for config-id comparison and for the warning output.
  • Added a regression test that creates ~/.openclaw/agents/Research/agent and verifies doctor still reports it as an orphan.

Why this changed:

  • Both review comments correctly pointed out that probing with the normalized id could miss real orphaned directories on case-sensitive filesystems when the folder name casing differs from the normalized agent id.
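The distinction between the name used for probing and the id used for comparison can be shown with a small simulation. Everything here is a sketch under stated assumptions: `normalizeAgentIdSketch` is a hypothetical stand-in for the project's `normalizeAgentId` (its real rules are not shown in this thread), and the filesystem is replaced by an injected `hasAgentDir` predicate so the case-sensitive behavior is reproducible on any platform.

```typescript
// Hypothetical stand-in for normalizeAgentId; the real rules may differ.
function normalizeAgentIdSketch(name: string): string {
  return name.trim().toLowerCase().replace(/_/g, "-");
}

interface OrphanCandidate {
  dirName: string; // raw on-disk folder name, used for filesystem probing
  agentId: string; // normalized id, used for config comparison and display
}

function findOrphans(
  dirNames: readonly string[],
  configuredIds: ReadonlySet<string>,
  hasAgentDir: (dirName: string) => boolean,
): OrphanCandidate[] {
  return dirNames
    .map((dirName) => ({ dirName, agentId: normalizeAgentIdSketch(dirName) }))
    .filter(({ agentId }) => agentId !== "" && !configuredIds.has(agentId))
    // Probe with the raw on-disk name, never with the normalized id.
    .filter(({ dirName }) => hasAgentDir(dirName));
}

// Simulated case-sensitive filesystem: only the exact name "Research" exists.
const onDiskNames = new Set(["Research"]);
const probe = (dirName: string): boolean => onDiskNames.has(dirName);

const found = findOrphans(["Research"], new Set(["main"]), probe);
console.log(found);
console.log(probe(normalizeAgentIdSketch("Research"))); // probing with the normalized id misses the folder
```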

Verification:

  • pnpm test src/commands/doctor-state-integrity.test.ts
  • pnpm check

The follow-up is in commit c5e3ff01e0.

@aisle-research-bot

aisle-research-bot bot commented Apr 12, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

| # | Severity | Title |
|---|----------|-------|
| 1 | 🟡 Medium | Symlink-following realpath/stat on agent directories can cause local filesystem probing and DoS |
| 2 | 🟡 Medium | Terminal/log injection via unsanitized agent directory names in doctor warning output |

1. 🟡 Symlink-following realpath/stat on agent directories can cause local filesystem probing and DoS

| Property | Value |
|----------|-------|
| Severity | Medium |
| CWE | CWE-400 |
| Location | src/commands/doctor-state-integrity.ts:47-88 |

Description

The new orphan-agent-directory detection follows symlinks inside the state directory when checking .../<dirName>/agent.

  • existsDir() uses fs.statSync() which follows symlinks.
  • isReachableConfiguredAgentDir() calls fs.realpathSync.native() on .../<dirName>/agent, which resolves symlinks and may touch arbitrary paths outside stateDir.
  • If an attacker (or another local user when OPENCLAW_STATE_DIR points to a shared/writable location) can create ~/.openclaw/agents/<dir>/agent as a symlink to:
    • slow/unavailable network mounts (hang during realpath/stat),
    • special filesystems (e.g., /proc, FUSE),
    • permission-restricted locations (triggering repeated errors),

then running openclaw doctor can hang or become very slow (local DoS) and can act as a filesystem oracle via timing/error behavior.

Vulnerable code:

function existsDir(dir: string): boolean {
  return fs.existsSync(dir) && fs.statSync(dir).isDirectory();
}
...
const rawRealPath = fs.realpathSync.native(rawDir);

Recommendation

Avoid following symlinks when scanning the state directory, or strictly confine resolution to inside agentsRoot.

Option A (simplest): skip symlinks

function existsRealDirNoSymlink(p: string): boolean {
  try {
    const st = fs.lstatSync(p);
    return st.isDirectory(); // symlink -> false
  } catch {
    return false;
  }
}
// use existsRealDirNoSymlink(path.join(agentsRoot, dirName, "agent"))

Option B: allow symlinks but confine

const agentsRootReal = fs.realpathSync.native(agentsRoot);
const candidateReal = fs.realpathSync.native(candidate);
if (!candidateReal.startsWith(agentsRootReal + path.sep)) {
  return false; // points outside state dir
}

Also consider using async filesystem APIs to reduce CLI blocking and adding defensive limits (max entries processed) to prevent excessive work on attacker-controlled directory structures.
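For the confinement check in Option B, one possible variant uses `path.relative` instead of a string-prefix test. This is only a sketch: `isPathInside` is a hypothetical helper, and it assumes both arguments have already been resolved (for example with `fs.realpathSync.native`) before the comparison.

```typescript
import * as path from "node:path";

// Hypothetical helper: true when candidate is strictly inside root.
// Both paths are assumed to be absolute and already symlink-resolved.
function isPathInside(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  if (rel === "" || path.isAbsolute(rel)) {
    return false; // same directory, or a different drive on Windows
  }
  // Reject ".." and "../something" without rejecting names like "..foo".
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}

const agentsRootSketch = path.resolve("/state/agents");
console.log(isPathInside(agentsRootSketch, path.resolve("/state/agents/work/agent")));
console.log(isPathInside(agentsRootSketch, path.resolve("/state/agents-evil/work")));
```

Comparing on path segments avoids the classic prefix pitfall where a sibling such as `agents-evil` shares a string prefix with `agents`; Option B sidesteps the same pitfall by appending `path.sep` before calling `startsWith`.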

2. 🟡 Terminal/log injection via unsanitized agent directory names in `doctor` warning output
| Property | Value |
|----------|-------|
| Severity | Medium |
| CWE | CWE-117 |
| Location | src/commands/doctor-state-integrity.ts:91-93 |

Description

doctor-state-integrity lists on-disk agent directories and interpolates the raw directory name (Dirent.name) into CLI output without escaping/sanitization.

  • Input: dirName is sourced from fs.readdirSync(..., { withFileTypes: true }) (entry.name) under the state directory.
  • Propagation: formatOrphanAgentDirLabel() embeds entry.dirName directly into a human-readable label.
  • Sink: The label is included in a multi-line warning string (warnings.push([...].join("\n"))) that is rendered to the terminal via the note() system (which wraps text but does not neutralize ANSI/control characters).

If an attacker can create a directory under ~/.openclaw/agents/ (or the configured state dir) with embedded newlines, carriage returns, tabs, or ANSI escape sequences, they can manipulate the openclaw doctor output (log forging / misleading remediation text / terminal escape injection). On POSIX filesystems, such characters are generally permitted in filenames.

Vulnerable code:

function formatOrphanAgentDirLabel(entry: OrphanAgentDir): string {
  return entry.dirName === entry.agentId ? entry.agentId : `${entry.dirName} (id ${entry.agentId})`;
}

Recommendation

Sanitize untrusted filesystem-derived strings before writing them to the terminal.

Options (pick one consistent approach used across the CLI):

  1. Escape/strip control characters and ANSI sequences for display labels.

function sanitizeForTerminal(value: string): string {
  // Remove C0 controls + DEL (incl. \n, \r, \t, ESC)
  return value.replace(/[\u0000-\u001F\u007F]/g, "?");
}

function formatOrphanAgentDirLabel(entry: OrphanAgentDir): string {
  const safeDir = sanitizeForTerminal(entry.dirName);
  return safeDir === entry.agentId ? entry.agentId : `${safeDir} (id ${entry.agentId})`;
}

  2. Render with an explicit escaped representation (e.g., JSON string escaping) so newlines/escapes are visible rather than interpreted.

Also consider using normalizeAgentId() strictly for display (or displaying both normalized + escaped raw name) to avoid printing raw directory names in multi-line notes.
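The escaped-representation option could look roughly like the sketch below. `escapeForDisplay` is a hypothetical name, not something from the PR. It relies on `JSON.stringify`, which escapes C0 control characters (including newline, carriage return, tab, and ESC) but leaves DEL and the C1 range untouched, so those are escaped in a second step.

```typescript
// Hypothetical helper: returns a quoted, escaped form of an untrusted name so
// control characters are shown as visible escapes instead of being interpreted.
function escapeForDisplay(value: string): string {
  return JSON.stringify(value).replace(
    /[\u007f-\u009f]/g, // DEL and C1 controls are not escaped by JSON.stringify
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"),
  );
}

console.log(escapeForDisplay("evil\nname"));
console.log(escapeForDisplay("\u001b[31mred"));
```

Compared with replacing control characters by `?`, this keeps the original bytes recoverable from the output, which can help when the user needs to locate and remove the oddly named directory.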


Analyzed PR: #65113 at commit 58a0fdb

Last updated on: 2026-04-12T04:04:15Z


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5e3ff01e0


Comment thread src/commands/doctor-state-integrity.ts Outdated
@neeravmakwana
Contributor Author

Addressed the latest review comment.

The orphan-agent check now treats a case- or normalization-mismatched on-disk folder as configured only when the normalized configured path resolves to the same real directory. That means we now warn for folders that are truly unreachable on case-sensitive filesystems, while avoiding false positives on case-insensitive filesystems where the on-disk path and the normalized path resolve to the same directory.

I also updated the examples text to show the on-disk folder name when it differs from the normalized agent id, and added focused tests for both the unreachable and same-realpath cases.
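The same-real-directory comparison described above might be sketched like this. `realpathOrNull` and `isReachableAs` are hypothetical names chosen for illustration; the PR's own helper (referred to elsewhere in this thread as `isReachableConfiguredAgentDir`) is not reproduced here and may work differently.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: resolved real path, or null when the path is missing.
function realpathOrNull(p: string): string | null {
  try {
    return fs.realpathSync.native(p);
  } catch {
    return null;
  }
}

// Hypothetical check: the on-disk folder counts as configured only when the
// configured (normalized) path resolves to the very same real directory.
function isReachableAs(onDiskDir: string, configuredDir: string): boolean {
  const onDiskReal = realpathOrNull(onDiskDir);
  const configuredReal = realpathOrNull(configuredDir);
  return onDiskReal !== null && onDiskReal === configuredReal;
}

const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-realpath-"));
const onDiskAgentDir = path.join(tmpRoot, "Research", "agent");
fs.mkdirSync(onDiskAgentDir, { recursive: true });
const normalizedAgentDir = path.join(tmpRoot, "research", "agent");

// The result depends on whether the temp filesystem is case-sensitive.
console.log(
  isReachableAs(onDiskAgentDir, normalizedAgentDir)
    ? "same directory: not an orphan"
    : "unreachable from config: warn",
);
```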

Verification:

openclaw@2026.4.11 test /Users/neeravmakwana/Desktop/github_repos/openclaw_repo_prrefresh
node scripts/test-projects.mjs src/commands/doctor-state-integrity.test.ts

RUN v4.1.2 /Users/neeravmakwana/Desktop/github_repos/openclaw_repo_prrefresh

Test Files 1 passed (1)
Tests 14 passed (14)
Start at 22:50:52
Duration 1.33s (transform 353ms, setup 147ms, import 1.06s, tests 28ms, environment 0ms)

@obviyus obviyus self-assigned this Apr 12, 2026
@obviyus obviyus force-pushed the fix-doctor-orphan-agent-dirs branch from 26e94d0 to 58a0fdb Compare April 12, 2026 03:58
Contributor

@obviyus obviyus left a comment


Verified the doctor gap where on-disk ~/.openclaw/agents/<id>/agent directories could survive config loss while noteStateIntegrity() stayed silent, and confirmed this change adds the warning on the direct state-integrity path.

Maintainer follow-up: rebased onto latest main, fixed the changelog entry into the current Unreleased Fixes block with (#65113) Thanks @neeravmakwana, and polished the warning text/pluralization.

Local gate: pnpm test src/commands/doctor-state-integrity.test.ts.

@obviyus obviyus merged commit 33836ab into openclaw:main Apr 12, 2026
24 of 26 checks passed
@obviyus
Contributor

obviyus commented Apr 12, 2026

Landed on main.

Thanks @neeravmakwana.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58a0fdb998


Comment on lines +125 to +126
if (!configuredIds.has(agentId)) {
return true;

P2: Honor configured agentDir overrides in orphan-dir scan

The orphan-dir filter currently treats any agents/<folder>/agent directory as orphan when <folder> is not a configured agent ID, but runtime agent storage can be redirected with agents.list[].agentDir (via resolveAgentDir). In that common override case (for example, agent id main using ~/.openclaw/agents/work/agent), work is a live configured directory yet this branch always flags it as orphan, leading to a misleading doctor warning and incorrect cleanup guidance.
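One way to account for such overrides is to compare resolved directory paths rather than folder names alone. The sketch below is an assumption-laden illustration: `AgentEntrySketch`, `configuredAgentDirs`, and `isOrphanFolder` are hypothetical, the default location `<agentsRoot>/<id>/agent` is inferred from the paths quoted in this thread, and the project's `resolveAgentDir` (including any `~` expansion or symlink resolution it performs) is not modeled.

```typescript
import * as path from "node:path";

// Hypothetical config shape: only the fields this sketch needs.
interface AgentEntrySketch {
  id: string;
  agentDir?: string; // optional override, assumed to be an absolute path here
}

// Every directory that some configured agent actually uses.
function configuredAgentDirs(agentsRoot: string, list: readonly AgentEntrySketch[]): Set<string> {
  return new Set(
    list.map((entry) => path.resolve(entry.agentDir ?? path.join(agentsRoot, entry.id, "agent"))),
  );
}

// A folder is an orphan only if neither its name nor its agent/ path is configured.
function isOrphanFolder(agentsRoot: string, folder: string, list: readonly AgentEntrySketch[]): boolean {
  if (list.some((entry) => entry.id === folder)) {
    return false;
  }
  const candidate = path.resolve(agentsRoot, folder, "agent");
  return !configuredAgentDirs(agentsRoot, list).has(candidate);
}

const rootSketch = path.resolve("/state/agents");
const listSketch: AgentEntrySketch[] = [
  { id: "main", agentDir: path.join(rootSketch, "work", "agent") },
];
console.log(isOrphanFolder(rootSketch, "work", listSketch)); // folder used via override
console.log(isOrphanFolder(rootSketch, "old", listSketch)); // folder with no configured owner
```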


trudbot pushed a commit to trudbot/openclaw that referenced this pull request Apr 12, 2026
…akwana)

* doctor: warn about orphaned agent dirs

* docs(changelog): note orphaned agent warning

* doctor: preserve orphan agent dir casing

* doctor: flag unreachable agent dirs

* fix: polish orphan agent dir warning

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
TOMUIV pushed a commit to TOMUIV/openclaw that referenced this pull request Apr 14, 2026

Labels

commands Command implementations size: M
