docs(rfc-025): agent loop self-awareness + self-management (design-first)#295
vansin wants to merge 6 commits into
…ove claude bucket skip)

Vincent priority bug: "runtimes other than Claude Code CLI can't get a loop running". The actual symptom for claude-agent-sdk users was that /loop commands were silently rejected at the inbox gate and the scheduler refused to wake any goals tagged with a claude-bucket runtime.

## Root cause (corrected after design-first investigation)

The dispatch's initial audit claimed claude-agent-sdk was "mistakenly grouped with claude-code-cli". On code reading + pure-function probe, the gate was BY DESIGN per v0.4 §3.4 (commit 45c7909) on the premise that "claude-agent-sdk uses Claude Code's native /loop inside the spawned `claude` binary". That premise is FALSE for SDK-spawned claude:

- `processWithClaude` (cli.ts:1270) calls `query()` from @anthropic-ai/claude-agent-sdk, which is a one-shot Promise per task, not a persistent CC REPL.
- Native /loop machinery (CronCreate / ScheduleWakeup) requires a long-running interactive session to host its callbacks. The SDK process exits after each query() resolves.
- Result: the gate blocked anet /loop on the (incorrect) assumption that there was a native /loop to defer to; in fact there was no /loop at all.

A pure-function probe across (runtime × goal) combinations established the ground truth before any code change:

- claude-agent-sdk + empty → skip + "native /loop in use"
- codex-sdk + same-bucket → ok (scheduler ran, so codex/grok worked)
- claude bucket + ANY goal → throws via assertNonClaudeRuntime
- codex bucket + grok goal → fatal exit(1) (hostile UX)

## Fix (refined-B per 通信龙 ack)

### 1. Remove the claude-bucket skip + assertNonClaudeRuntime gate

goals/store.ts:

- `CLAUDE_RUNTIME_NAMES` kept as a CLASSIFIER (for cross-bucket detection in decideStartupAction) but no longer load-bearing for any reject/throw path.
- `assertNonClaudeRuntime` removed entirely (it was the throw in newGoal + GoalStore.upsert).
- `newGoal({runtime: "claude-agent-sdk"})` now succeeds.
- `GoalStore.upsert` accepts claude-bucket goals.

### 2. Remove inbox `RUNTIME !== "claude"` carve-out

agent-node/src/cli.ts (~L2584):

- The pre-#144 `if (RUNTIME !== "claude" && isGoalCommand(content))` let claude-bucket /loop messages fall through to the LLM as plain text (which would respond "I created a loop" but wire nothing).
- Now: `if (isGoalCommand(content))`, so all recognized runtimes route /loop and /goal to `createScheduledGoal`.

### 3. unknown bucket still skips (defensive)

If `--runtime <typo>` resolves to an unrecognized bucket, the scheduler still refuses to start (a bucket is never guessed from the name). The operator sees the log line and can correct the config.

### 4. Cross-bucket: fatal exit → archive + continue (hostile-UX fix)

decideStartupAction:

- Pre-#144, codex+grok-leftover (or vice versa) returned `fatal`, which cli.ts handled as `process.exit(1)`, silently killing the user's node with no recovery guidance. That's a real UX bug independent of the loop premise.
- Now: returns `archive` with runScheduler=true. The cli.ts dispatcher archives the foreign goals to `<path>.runtime-switched.<ts>` (recoverable) and boots the scheduler against a clean store. No process exit.
- StartupAction type pruned: removed the `fatal` variant entirely.

### 5. UX: `anet node loop` one-liner

agent-network/bin/cli.ts:

- New `anet node loop <alias> "<task>" --every 5m` command. Wraps the inbox `/loop <interval> <task>` slash command and POSTs it as a normal task via `/api/task`. Default --every = 5m. Validates interval format (s/m/h/d). Surfaces hub-down errors clearly.
- Help text + Did-you-mean suggestion updated.

## Tests (all pass)

agent-node/src/goals/store.test.ts: 35 pass, 0 fail

- claude runtime newGoal/upsert smoke
- decideStartupAction matrix incl. claude=ok, cross-bucket=archive
- cross-bucket archive sets runScheduler=true (regression pin vs the old fatal-exit behavior)
- unknown bucket still skips

agent-node/src/ full suite: 329 pass, 0 fail, 686 expect()

## Docker e2e (the load-bearing capability proof per 通信龙)

tests/docker-e2e-loop-runtime.sh runs in isolated Docker:

- Hub on :9210 with isolated COMMHUB_DB
- Registers test user → grabs network token (the report_status MCP tool requires a network-scoped token)
- claude-agent-sdk node: pre-injected 5s-interval goal with next_wake_at = past. Asserts within 25s:
  - ✅ "goals scheduler: enabled (runtime=claude)" (gate gone)
  - ✅ "[goal] wake <id>" (tick fires)
- codex-sdk node: same shape, 15s deadline
  - ✅ "goals scheduler: enabled (runtime=codex)"
  - ✅ "[goal] wake <id>"

Result: 7 pass, 0 fail. Both runtimes' schedulers verifiably FIRE the wake on the pre-injected goal. This is an observed runtime event, not just an architecture inference.

## Migration

Zero-downtime safe. Nodes that previously had a claude-bucket runtime + an empty goals.json continue to work (the scheduler is now enabled and idle instead of disabled). Anyone who had codex/grok foreign goals on a claude node will see those archived to `.runtime-switched.<ts>` on next start (was: silently no-op because the scheduler was off; now: the scheduler runs against a clean store after archive).

## Public-facing copy

Public placeholders only. No real domains, no team aliases in code.

Tracking: internal task #144. auth-sensitive (runtime behavior change). PR opens for review; 通信龙 surfaces it to Vincent before merge.
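The startup verdicts described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in goals/store.ts: the `Goal` shape, the `RUNTIME_BUCKETS` table, and the `kind` field names are assumptions; only the verdict names (`ok` / `skip` / `archive`), `runScheduler`, `foreignCount`, and the removal of `fatal` come from the commit message.

```typescript
// Hypothetical sketch; shapes and helper names are assumed for illustration.
type Bucket = "claude" | "codex" | "grok" | "unknown";

interface Goal {
  id: string;
  runtime: string;
  status: "active" | "paused" | "cancelled";
}

// No `fatal` variant: every verdict leaves the process running.
type StartupAction =
  | { kind: "ok"; runScheduler: true }
  | { kind: "skip"; runScheduler: false; reason: string }
  | { kind: "archive"; runScheduler: true; foreignCount: number; reason: string };

// Explicit lookup table: an unrecognized name is never guessed into a bucket.
const RUNTIME_BUCKETS = new Map<string, Bucket>([
  ["claude-agent-sdk", "claude"],
  ["codex-sdk", "codex"],
  ["grok-build-acp", "grok"],
]);

function bucketOf(runtime: string): Bucket {
  return RUNTIME_BUCKETS.get(runtime) ?? "unknown";
}

function decideStartupAction(currentRuntime: string, goals: Goal[]): StartupAction {
  const currentBucket = bucketOf(currentRuntime);
  if (currentBucket === "unknown") {
    // Defensive: a typo in --runtime keeps the scheduler off.
    return { kind: "skip", runScheduler: false, reason: `unrecognized runtime "${currentRuntime}"` };
  }
  // Active goals left behind by a different runtime bucket.
  const foreign = goals.filter(
    (g) => g.status === "active" && bucketOf(g.runtime) !== currentBucket,
  );
  if (foreign.length > 0) {
    return {
      kind: "archive",
      runScheduler: true,
      foreignCount: foreign.length,
      reason: `${currentBucket} runtime found ${foreign.length} foreign goal(s); archive and continue`,
    };
  }
  return { kind: "ok", runScheduler: true };
}
```

In this sketch a claude-bucket runtime with an empty store gets `ok`, which is the behavior change the commit is after.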
…I shape + full test matrix

Independent reviewer on #288 caught a real silent-fail bug between the new `anet node loop` CLI and the parser:

- CLI accepted `5m` / `2h` / `1d` (single-letter, the canonical --every shape)
- parser INTERVAL_PATTERNS only matched word-form (min/minute/hour/day), so none of the single-letter forms parsed
- Result: CLI POSTed /api/task, /api/task enqueued (j.ok=true), CLI printed `✅ Scheduled loop`, but the node's inbox handler rejected the parse and sent the failure reply back to `from:"api"`, where no human ever saw it. Silent fail.
- The original e2e wrote goals.json directly, bypassing the parser and missing this entire path (the "tested the mechanism but not the real command" lesson: introspection ≠ capability).

## Fix

### parser.ts: add single-letter `m/h/d` while keeping `5min` working

Added 3 patterns AFTER the word-form patterns (declaration order matters: longest first wins, so `5min` still picks the word rule). Lookbehind `(?<!\w)` + lookahead `(?![a-zA-Z])` prevent `5min` from matching the `m` rule (which would otherwise eat `5m` and leave `in` as leftover goal text).

- `/(?<!\w)(\d+)m(?![a-zA-Z])/i` → minutes
- `/(?<!\w)(\d+)h(?![a-zA-Z])/i` → hours
- `/(?<!\w)(\d+)d(?![a-zA-Z])/i` → days

`s` and sub-minute intervals remain rejected (MIN_INTERVAL_MS=60s). The parser and the CLI now align on the same floor.

### CLI: strict format + poll for node confirmation

- argv pattern tightened from `/^\d+[smhd]$/` to `/^[1-9]\d*[mhd]$/`. This rejects `30s` AT THE CLI LAYER so the user gets immediate feedback rather than a doomed task POST.
- After the `/api/task` enqueue, the CLI now POLLS `/api/tasks?task_id=<id>` for up to 15s, looking for status='replied' + a result containing "已创建 loop 目标" (the node's "loop goal created" reply). Only THEN does it print ✅.
- "❌ Node rejected" on a failure-status reply (surfaces the actual parser error to the user)
- "⚠ Node did not confirm within 15s" on timeout (covers an offline / crashed node)
- help text updated: drop the `30s` example, add a `1d` example, note the sub-minute rejection rationale

### `anet node --help` routing fix (cli.ts:9568)

`anet node loop --help` was hitting the generic node-help interceptor at the top-level --help router, which printed the OLD "create|start|stop|..." usage string (without `loop`). Added a delegate so `anet node loop --help` reaches `nodeLoopCommand`'s own help text with examples.

## Test matrix (per 通信龙 dispatch, following Vincent's "test repeatedly to make sure it works")

### Parser unit tests (+7)

`src/goals/parser.test.ts`:

- `5m`, `30m`, `90m`, `2h`, `1d` all parse to the correct ms
- single-letter and word-form produce the same interval (no semantic drift)
- `5min` still wins over `5m` (longest-prefix declaration order pin)
- `30s` rejected with sub-minute error (parser + CLI aligned)
- lookahead guard pins (5MOM doesn't match)

Result: 31 parser tests, 0 fail.

### Docker e2e: full user path (`tests/docker-e2e-loop-runtime.sh`)

Pre-fix, this script wrote goals.json directly and missed the parser bug. Rewritten to exercise the REAL user flow end-to-end:

- Section 1: hub bootstrap (isolated COMMHUB_DB + register user + grab network token)
- Section 2: claude-agent-sdk: `anet node loop` CLI invocation, CLI confirms goal creation, goals.json verified, restart to past next_wake_at, [goal] wake fired
- Section 3: codex-sdk: same path with `--every 2h`
- Section 4: channel `/loop` direct (raw /api/task), the same parser path that commhub_send_task hits
- Section 5: interval edge cases (30s rejected at CLI / 5x rejected / empty rejected, no silent successes)
- Section 6: `anet goal list` shows the goal, `anet goal cancel` flips status='cancelled' (verified in goals.json)
- Section 6b: grok-build-acp startup-enable + wake fire
- Section 6c: multi-cycle wake: the same goal fires 2+ times within 130s (proves it is not one-shot)
- Section 7: offline node → CLI fails clearly (no silent ✅)

Result: 20 pass, 0 fail.

### Docker npm-pack smoke (`tests/docker-e2e-loop-npm-pack.sh`, NEW)

Per 通信龙's "use a local npm pack tarball": verifies the artifact `npm publish` would upload, without needing to actually publish:

- `npm pack` agent-node + agent-network + commhub-server → .tgz
- Uninstall source-built versions (clean state)
- `npm install -g <tgz>` for all 3, which proves the published artifact correctly wires the `anet` / `agent-node` / `commhub-server` bins
- `anet node loop --help` prints the new help text
- Full e2e using ONLY installed binaries (no /app source paths): fresh hub, register, agent-node, run `anet node loop`, verify goals.json landed

Result: 12 pass, 0 fail. Catches the "source builds work but npm install broken" class of regression ([[feedback_docker_smoke_gate_before_ship]] / [[feedback_anet_node_behavior_stale_install]]).

## Summary

- parser unit tests: 31 pass
- e2e matrix: 20 pass (claude+codex+grok+channel+edges+management+multi-cycle+offline)
- npm-pack install: 12 pass

Total: 63 checks, 0 fail.

Pre-merge readiness: 通信龙 + Vincent spot-check the two user entries (CLI + channel /loop) themselves before cutting over to preview2.
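The declaration-order and guard behavior described in the parser fix can be sketched as below. The three single-letter regexes and the 60s floor are taken from the commit message; the word-form patterns, the seconds pattern, and the `parseInterval` return shape are assumptions made for illustration and may differ from parser.ts.

```typescript
const MIN_INTERVAL_MS = 60_000; // floor from the commit message
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;

interface IntervalRule {
  re: RegExp;
  toMs: (m: RegExpMatchArray) => number;
}

const INTERVAL_PATTERNS: IntervalRule[] = [
  // Word forms first (assumed shapes), so `5min` is claimed before the `m` rule runs.
  { re: /(?<!\w)(\d+)\s*(?:minutes?|mins?)\b/i, toMs: (m) => parseInt(m[1], 10) * MINUTE },
  { re: /(?<!\w)(\d+)\s*hours?\b/i, toMs: (m) => parseInt(m[1], 10) * HOUR },
  { re: /(?<!\w)(\d+)\s*days?\b/i, toMs: (m) => parseInt(m[1], 10) * DAY },
  // Single-letter forms, as quoted in the commit message.
  { re: /(?<!\w)(\d+)m(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * MINUTE },
  { re: /(?<!\w)(\d+)h(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * HOUR },
  { re: /(?<!\w)(\d+)d(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * DAY },
  // Assumption: seconds are recognized only so they can be rejected by the floor.
  { re: /(?<!\w)(\d+)s(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * 1000 },
];

type ParseResult =
  | { ok: true; intervalMs: number; rest: string }
  | { ok: false; error: string };

function parseInterval(text: string): ParseResult {
  // First rule in declaration order that matches wins.
  for (const rule of INTERVAL_PATTERNS) {
    const m = text.match(rule.re);
    if (!m) continue;
    const intervalMs = rule.toMs(m);
    if (intervalMs < MIN_INTERVAL_MS) {
      return { ok: false, error: `interval is below the ${MIN_INTERVAL_MS / 1000}s minimum` };
    }
    // Whatever is left after removing the interval token is the goal text.
    const rest = text.replace(m[0], " ").replace(/\s+/g, " ").trim();
    return { ok: true, intervalMs, rest };
  }
  return { ok: false, error: "no interval found" };
}
```

With these rules, `5min check logs` and `5m check logs` both yield a 5-minute interval with `check logs` as the goal text, while `30s` and `5MOM` are rejected.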
…re + safe_rm + glitch
通信龙 spot-check passed with 4 tighten items before merge:
1. **claude-agent-sdk ≥2 natural wakes** (was: only codex covered)
The pre-tighten 6c test ran multi-cycle on codex. Scheduler is
runtime-agnostic shared code, but claude is Vincent's personally-
tested runtime — pin its multi-cycle behaviour rather than relying
on transitive inference. Added Section 6d: claude-agent-sdk with
60s interval, asserts ≥2 `[goal] wake` lines in 130s window.
2. **Wire e2e into test-all.sh**
Added both `docker-e2e-loop-runtime.sh` and
`docker-e2e-loop-npm-pack.sh` to `tests/test-all.sh` after the
existing 4 suites. This gives automatic regression coverage: previously
they were only runnable manually, so a future regression would have
gone unnoticed. Scripts now also emit the `Results: N passed, M failed`
format that `run_suite`'s regex parses.
3. **`safe_rm_rf` for the 7 bare `rm -rf "$VAR"` sites**
lint guard caught them. Both scripts now `source lib/safe-rm.sh`
at startup (with /app/lib/ fallback path for Docker mount), check
`type safe_rm_rf` is defined, and exit 99 if missing. All 7
workdir cleanups (claude/codex/channel/grok/multi/cmulti in the
runtime script + WORKDIR in the npm-pack script) now use
`safe_rm_rf`. lib copied into image via `COPY tests/lib /app/lib`.
4. **`[: 0\n0` bash glitch** (loop-runtime.sh:588)
Old: `grep -c ... 2>/dev/null || echo 0`. When the file exists but
has no matching lines, `grep -c` prints `0` and exits with status 1,
so `|| echo 0` emitted a second `0`. The result was `0\n0`, and
`[ "0\n0" -ge 2 ]` triggered "integer expression expected" warnings.
Harmless but noisy. New: `WAKE_COUNT=$( { grep -c ... 2>/dev/null || true; } | head -1); WAKE_COUNT=${WAKE_COUNT:-0}`,
a single line that defaults to 0 if empty.
## Test results
```
src/goals/parser.test.ts: 31 pass / 0 fail
src/ (agent-node full unit suite): 329 pass / 0 fail
docker-e2e-loop-runtime.sh: 21 pass / 0 fail (was 20; +1 claude multi-cycle)
docker-e2e-loop-npm-pack.sh: 12 pass / 0 fail
──────────────
64 checks 0 fail (parser + runtime e2e + npm-pack; full unit suite counted separately)
```
`Results: N passed, M failed` lines verified — test-all.sh's
`run_suite` regex parses them correctly.
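…irst
Per Vincent priority + 通信龙 5-criterion review:
① Self-management: 6 self-scoped tools (list/edit/cancel/create/reschedule/complete_my_loop)
② Intelligence: the LLM parses intent itself; tool descriptions guide it rather than NLU rules
③ Claude Code /loop ScheduleWakeup paradigm: reschedule_my_loop + complete_my_loop
are the elegant core (dynamic self-scheduling, plus the agent deciding to stop once the goal is met, which prevents endless loops)
④ 🔴 No double scheduling: two strictly separated layers. The outer anet goals scheduler is the only scheduling source;
the inner SDK query() is a single task execution, and the SDK turn loop is not a "loop" in this sense
⑤ Elegance: reuses the goals store + parser + commhub-mcp injection pattern;
SELF_LOOP_TOOL_SPECS is defined once and shared, with each per-runtime adapter translating it in ~20 lines
11 sections + 5 open questions + 6-phase impl plan (~600 LOC total).
Design-first: awaiting 通信龙 + Vincent review; no implementation yet.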
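The "define once, translate per runtime" idea in criterion ⑤ might look something like the sketch below. Only the six tool names and the `SELF_LOOP_TOOL_SPECS` identifier come from the RFC summary; the spec shape, every parameter name, the description wording, and the `toFunctionTool` adapter target are hypothetical. Note that no tool takes a node alias, which is what keeps each tool self-scoped.

```typescript
// Hypothetical spec shape; the RFC does not publish one in this thread.
interface ToolParam {
  type: "string" | "number";
  description: string;
  required: boolean;
}

interface SelfLoopToolSpec {
  name: string;
  description: string;
  params: Record<string, ToolParam>;
}

const loopId: ToolParam = { type: "string", description: "ID of one of this node's own loops", required: true };

// Single shared definition. Parameter names here are illustrative assumptions.
const SELF_LOOP_TOOL_SPECS: SelfLoopToolSpec[] = [
  { name: "list_my_loops", description: "List the loops owned by this node.", params: {} },
  {
    name: "create_my_loop",
    description: "Create a new recurring loop for this node.",
    params: {
      interval: { type: "string", description: "Cadence such as 5m, 2h, 1d", required: true },
      task: { type: "string", description: "What to do on each wake", required: true },
    },
  },
  {
    name: "edit_my_loop",
    description: "Change the task text or cadence of an existing loop.",
    params: { loop_id: loopId, task: { type: "string", description: "New task text", required: false } },
  },
  {
    name: "cancel_my_loop",
    description: "Abandon a loop that is no longer wanted. Use complete_my_loop if the goal was reached.",
    params: { loop_id: loopId },
  },
  {
    name: "reschedule_my_loop",
    description: "Choose when this loop should wake next.",
    params: { loop_id: loopId, next_wake_in: { type: "string", description: "Delay such as 10m", required: true } },
  },
  {
    name: "complete_my_loop",
    description: "Stop a loop because its goal has been met. Use cancel_my_loop to abandon instead.",
    params: { loop_id: loopId },
  },
];

// Hypothetical adapter target: a JSON-Schema-style function tool.
interface FunctionTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

function toFunctionTool(spec: SelfLoopToolSpec): FunctionTool {
  const properties: Record<string, { type: string; description: string }> = {};
  const required: string[] = [];
  for (const key of Object.keys(spec.params)) {
    const p = spec.params[key];
    properties[key] = { type: p.type, description: p.description };
    if (p.required) required.push(key);
  }
  return { name: spec.name, description: spec.description, inputSchema: { type: "object", properties, required } };
}
```

A per-runtime adapter would then be a map over the shared list, for example `SELF_LOOP_TOOL_SPECS.map(toFunctionTool)`, with each runtime supplying its own target shape.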
…ation findings

Per 通信龙 review feedback:

1. **Renumber RFC-024 → RFC-025**: RFC-024 is already taken by dashboard config-apply (#287/#290/#294 on main). RFC-025 is the next free number.
2. **Simplify §3.1 claude-code-cli**: per Vincent, "claude-code-cli has always used the native CC /loop; it doesn't need ours". Out of scope, with a one-line note pointing to the §6 per-runtime table + §12 non-goals. Drops the long verification table (rolled into project memory instead).
3. **Fold in 5 LLM-validation findings** (an independent agent verified the ② intelligence premise with 8 sentences × 3 LLM contexts → 100% tool-choice agreement, but caught 5 design gaps):
   - 🔴 #1 **Time-of-day / cron-lite gap** (most important: Vincent's own "every day at 9" example doesn't fit the current interval model). Added a cron-lite schedule field option (A) vs explicit reject (B), preferring A. New P0a phase, ~120 LOC.
   - #2 **Confirm-back on destructive/batch ops** (3 cancels in a 30s window → refuse + ask the user to confirm).
   - #3 **Report fabricated values back** to the user (e.g. "changed to once every 30 minutes").
   - #4 **pause/cancel/complete description disambiguation**: added contrastive phrasing to each tool description.
   - #5 **Multi-loop reference resolution e2e** (resolving "this one / that one" with 3-5 loops in state, pinned in P5).
4. Updated §5 tool table (description hints per #3/#4), §6 per-runtime (claude-code-cli marked ❌ out of scope), §10 phases (added P0a cron-lite, ~720 LOC total), §11 open questions (+#6 cron-lite A/B, +#7 confirm-back threshold, +#8 timezone handling), §12 non-goals (added claude-code-cli as explicitly out).
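Finding #2 (confirm-back on batch cancels) amounts to a small sliding-window check. The sketch below is one possible reading, not the RFC's design: it assumes the cancel that would be the third inside a 30s window is the one refused, and that a refused attempt is not counted. `createCancelGuard` and its return values are hypothetical names.

```typescript
type CancelVerdict = "proceed" | "confirm";

// Sliding-window guard. Defaults follow the "3 cancels / 30s" figure in finding #2.
function createCancelGuard(threshold: number = 3, windowMs: number = 30_000) {
  let recent: number[] = []; // timestamps (ms) of cancels that were allowed

  return function check(nowMs: number): CancelVerdict {
    // Forget cancels that have aged out of the window.
    recent = recent.filter((t) => nowMs - t < windowMs);
    // Assumption: the cancel that would reach the threshold is refused,
    // and the agent is expected to ask the user to confirm first.
    if (recent.length + 1 >= threshold) return "confirm";
    recent.push(nowMs);
    return "proceed";
  };
}
```

Passing the clock in as `nowMs` keeps the guard deterministic in tests; a caller would typically pass `Date.now()`.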
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67d6c7d46e
```typescript
while (Date.now() - started < POLL_DEADLINE_MS) {
  await new Promise(r => setTimeout(r, POLL_INTERVAL_MS));
  try {
    const r: any = await fetch(`${hub}/api/tasks?task_id=${encodeURIComponent(taskId)}`, { headers: authHeaders() }).then(x => x.json());
```
Add skip_stats to loop confirmation polling
When this runs against a hub with many task rows, each 1s poll still asks /api/tasks to compute its default stats GROUP BY; the handler documents skip_stats=1 as the opt-out because that scan dominates large DBs (server/src/index.ts:1966-1991). This path only needs the single task row, so add skip_stats=1 (and ideally limit=1) to avoid repeatedly loading the hub and potentially causing the 15s confirmation wait to time out.
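A minimal sketch of the suggested poll URL follows. `buildTaskPollUrl` is a hypothetical helper; `skip_stats=1` and `limit=1` are the parameters named in the review comment above, and whether the hub handler honors `limit` is taken on the reviewer's word.

```typescript
// Hypothetical helper: builds the single-row poll URL suggested in the review.
function buildTaskPollUrl(hub: string, taskId: string): string {
  const base = hub.replace(/\/+$/, ""); // tolerate a trailing slash on the hub URL
  const query = [
    `task_id=${encodeURIComponent(taskId)}`,
    "skip_stats=1", // opt out of the stats GROUP BY on every poll
    "limit=1",      // only the one task row is needed
  ].join("&");
  return `${base}/api/tasks?${query}`;
}
```

The poll loop would then call `fetch(buildTaskPollUrl(hub, taskId), { headers: authHeaders() })` in place of the inline template string.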
```typescript
// word like `5min`). Lookahead `(?![a-zA-Z])` prevents `5min` from
// matching the `5m` rule. Lookbehind `(?<!\w)` prevents `pm5h`-like
// surprises (no digit/letter immediately before the number).
{ re: /(?<!\w)(\d+)m(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * 60_000 },
```
Prefer the CLI prefix interval over later task text
When anet node loop sends a prefixed command whose task text also contains an interval word, the parser can consume the later text instead of the leading --every value because these single-letter rules only run after daily/hourly/word-unit patterns. For example, /loop 5m summarize daily standup currently becomes a 1-day loop with text 5m summarize standup; parse the command-prefix interval first or choose the earliest match so the CLI's requested cadence is honored.
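The "choose the earliest match" option can be sketched as below. The rule list is a reduced, assumed subset (the `daily` / `hourly` regexes are guesses at what parser.ts contains); the point is only the selection step, which compares match positions instead of relying on declaration order.

```typescript
interface IntervalRule {
  re: RegExp;
  toMs: (m: RegExpExecArray) => number;
}

// Assumed subset of rules; the real list lives in parser.ts.
const RULES: IntervalRule[] = [
  { re: /\bdaily\b/i, toMs: () => 86_400_000 },
  { re: /\bhourly\b/i, toMs: () => 3_600_000 },
  { re: /(?<!\w)(\d+)m(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * 60_000 },
  { re: /(?<!\w)(\d+)h(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * 3_600_000 },
  { re: /(?<!\w)(\d+)d(?![a-zA-Z])/i, toMs: (m) => parseInt(m[1], 10) * 86_400_000 },
];

function pickEarliestInterval(text: string): { intervalMs: number; rest: string } | null {
  let best: { index: number; length: number; intervalMs: number } | null = null;
  for (const rule of RULES) {
    const m = rule.re.exec(text);
    if (!m) continue;
    // Keep the match that starts closest to the beginning of the text.
    if (best === null || m.index < best.index) {
      best = { index: m.index, length: m[0].length, intervalMs: rule.toMs(m) };
    }
  }
  if (best === null) return null;
  // Remove only the chosen token; later interval-like words stay in the task text.
  const rest = (text.slice(0, best.index) + " " + text.slice(best.index + best.length))
    .replace(/\s+/g, " ")
    .trim();
  return { intervalMs: best.intervalMs, rest };
}
```

For the reviewer's example, `5m summarize daily standup` resolves to a 5-minute loop with the task text `summarize daily standup` under this selection rule.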
```typescript
reason: `claude runtime started with ${activeGoals.length} non-claude goal(s) (${foreignBuckets.join(",")}) — archive + skip scheduler so anet doesn't double-fire alongside Claude Code's native /loop`,
foreignCount: activeGoals.length,
runScheduler: true,
reason: `${currentBucket} runtime started but goals.json holds ${foreign.length} ${foreignBuckets.join("/")} goal(s) — archiving + clearing so we don't reuse thread/session IDs across SDK boundaries. Backup recoverable on disk.`,
```
Preserve current-runtime goals during archive recovery
If goals.json contains both valid current-bucket goals and stale foreign ones, this archive verdict is handled by cli.ts with archiveAndClear(), which clears the entire store before re-enabling the scheduler. In a codex node with one codex loop and one old grok loop, the safe codex loop is stopped too even though only the grok record is unsafe; filter/archive only the foreign active records or restore same-bucket goals after archiving.
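One way to act on this suggestion is to split the store before archiving, so that only foreign active records leave it. The sketch below assumes a `Goal` shape and a runtime-to-bucket table that are not shown in the PR; `splitForArchive` is a hypothetical name, not the signature of `archiveAndClear()`.

```typescript
type Bucket = "claude" | "codex" | "grok" | "unknown";

interface Goal {
  id: string;
  runtime: string;
  status: "active" | "paused" | "cancelled";
}

// Assumed mapping from runtime name to bucket.
const RUNTIME_BUCKETS = new Map<string, Bucket>([
  ["claude-agent-sdk", "claude"],
  ["codex-sdk", "codex"],
  ["grok-build-acp", "grok"],
]);

function bucketOf(runtime: string): Bucket {
  return RUNTIME_BUCKETS.get(runtime) ?? "unknown";
}

// Only active goals from another bucket are archived; everything else stays.
function splitForArchive(goals: Goal[], currentBucket: Bucket): { keep: Goal[]; archive: Goal[] } {
  const keep: Goal[] = [];
  const archive: Goal[] = [];
  for (const g of goals) {
    const foreign = g.status === "active" && bucketOf(g.runtime) !== currentBucket;
    (foreign ? archive : keep).push(g);
  }
  return { keep, archive };
}
```

The dispatcher could write `archive` to the `.runtime-switched.<ts>` backup and persist `keep` as the new store, so a codex node keeps its codex loop while the stale grok record is set aside.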
```typescript
const gc = loadGlobal();
const hub = profile.hub || gc.hub || "http://127.0.0.1:9200";
```
Honor --hub and COMMHUB_URL for node loop
This new command ignores the existing hub override path (getHub() reads --hub and COMMHUB_URL) and only uses the node/global config. In multi-hub or test setups where the operator points the CLI at another hub via env/flag but the local profile lacks that hub (or global still points at :9200), anet node loop posts to the wrong server while the token/network may refer to the intended one, causing not-found or scheduling on the wrong hub; include the same override precedence used by the rest of the CLI.
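A possible precedence chain is sketched below. The review only states that `getHub()` reads `--hub` and `COMMHUB_URL`; the relative order of flag versus environment variable, and the `HubSources` shape, are assumptions here and should be checked against the real `getHub()`.

```typescript
// Hypothetical inputs; the real CLI would read these from argv, process.env and config files.
interface HubSources {
  flagHub?: string;                        // value of --hub, if given
  env: Record<string, string | undefined>; // e.g. process.env
  profileHub?: string;                     // node profile config
  globalHub?: string;                      // global config
}

const DEFAULT_HUB = "http://127.0.0.1:9200";

// Assumed order: explicit flag, then COMMHUB_URL, then profile, then global, then default.
function resolveHub(s: HubSources): string {
  return s.flagHub || s.env["COMMHUB_URL"] || s.profileHub || s.globalHub || DEFAULT_HUB;
}
```

With this ordering, an operator who exports `COMMHUB_URL` for a test hub is not silently redirected to the hub stored in the global config.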
All 8 design questions resolved (pending Vincent's product-level review):
1. list returns structured {goals:[...]} (agent + dashboard parseable)
2. reschedule_my_loop / edit_my_loop split: the elegant core
3. paused → active re-validates (guards against the cap or rules having changed)
4. context-injection in system prompt (agent self-state)
5. COMMHUB_MAX_GOALS_PER_NODE env, default 20
6. cron-lite (A): schedule union (interval|time_of_day|weekday)
7. confirm-back 3 cancels/30s default, env tunable
8. time_of_day uses node config flags.timezone, default Asia/Shanghai
Once Vincent signs off, P0a-P5 implementation follows. Design locked down.
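The schedule union from resolution 6 could be modeled as a discriminated union, sketched below. Only the three variant names, the `Asia/Shanghai` default, and the 60s interval floor come from this thread; every field name, the `HH:MM` format, and the 0-6 weekday encoding are assumptions for illustration.

```typescript
const DEFAULT_TIMEZONE = "Asia/Shanghai"; // default named in resolution 8
const MIN_INTERVAL_MS = 60_000;           // same floor the parser enforces

// Hypothetical field names; the RFC only names the three variants.
type Schedule =
  | { kind: "interval"; everyMs: number }
  | { kind: "time_of_day"; at: string; timezone: string }              // at = "HH:MM"
  | { kind: "weekday"; days: number[]; at: string; timezone: string }; // days: 0 (Sun) to 6 (Sat)

const HH_MM = /^([01]\d|2[0-3]):[0-5]\d$/;

// Returns null when the schedule is acceptable, otherwise a human-readable error.
function validateSchedule(s: Schedule): string | null {
  switch (s.kind) {
    case "interval":
      return s.everyMs >= MIN_INTERVAL_MS ? null : "interval is below the 60s minimum";
    case "time_of_day":
      return HH_MM.test(s.at) ? null : `invalid time of day "${s.at}"`;
    case "weekday": {
      if (!HH_MM.test(s.at)) return `invalid time of day "${s.at}"`;
      if (s.days.length === 0) return "at least one weekday is required";
      const bad = s.days.find((d) => !Number.isInteger(d) || d < 0 || d > 6);
      return bad === undefined ? null : `invalid weekday ${bad}`;
    }
  }
}

// Convenience constructor that applies the default timezone.
function dailyAt(at: string, timezone: string = DEFAULT_TIMEZONE): Schedule {
  return { kind: "time_of_day", at, timezone };
}
```

Under this model, an "every day at 9" request becomes `dailyAt("09:00")`, which the interval-only model could not express.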
Author
Agent: 通信SDK马
Why
Vincent priority: "the agent must understand its loops itself", meaning awareness plus adjusting and managing them through natural-language conversation. Design-first per 通信龙, no impl.
Scope
In: agent-node runtimes: claude-agent-sdk / codex-sdk / grok-build-acp
Out (per Vincent §3.1, §12 non-goals): claude-code-cli (manages itself with CC's native /loop; its independent session does not connect to anet)
Highlights
- (… `alias` arg: physically can't address another node)
- `list_my_loops` / `create_my_loop` / `edit_my_loop` / `cancel_my_loop`
- `reschedule_my_loop`: the Claude Code `/loop` ScheduleWakeup paradigm; the agent decides its own next wake
- `complete_my_loop`: stop once the goal is met, which prevents endless loops; semantically distinct from `cancel`
- … `query()` = single-task exec
- `【你的当前循环任务】` ("your current loop tasks") block injected before each think
- `SELF_LOOP_TOOL_SPECS` defined in one place; ~20 lines of translation per runtime
Independent LLM-validation (通信龙 spun up an independent agent to verify)
3 LLM contexts × 8 natural-language sentences → 100% tool-choice agreement (empirical support for the ② intelligence premise). But it caught 5 design gaps, all folded into the RFC.
8 open questions (pending review decision)
See §11. New since v1: #6 cron-lite A/B, #7 confirm-back threshold, #8 timezone handling (leaning toward an `Asia/Shanghai` default via node config `flags.timezone`).
ETA (impl phase, post-review): ~720 LOC
5-7h impl + 3-4h testing.
Tracking