feat(cli): CLI feature parity phase 2 - usage tracking and streaming (AI-assisted) #2352
rmorse wants to merge 19 commits into openclaw:main
Conversation
- Add resumeArgs to DEFAULT_CLAUDE_BACKEND for proper --resume flag usage
- Fix gateway not preserving cliSessionIds/claudeCliSessionId in nextEntry
- Add test for CLI session ID preservation in gateway agent handler
- Update docs with new resumeArgs default
CLI backends (claude-cli etc.) don't emit streaming assistant events, causing the TUI to show "(no output)" despite correct processing. Now emits an assistant event with the final text before the lifecycle end, so the server-chat buffer gets populated for WebSocket clients.
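For context, a minimal sketch of that fallback shape; the event type union and emit callback here are assumptions for illustration, not the actual Clawdbot API:

```ts
// Hypothetical sketch: before signalling lifecycle end, surface the final
// text as an assistant event so streaming consumers have something to render.
type AgentEvent =
  | { type: "assistant"; text: string }
  | { type: "lifecycle"; phase: "end" };

function finishCliRun(finalText: string, emit: (e: AgentEvent) => void): void {
  // CLI backends produce no incremental assistant events, so emit the
  // complete text once before the run is marked finished.
  if (finalText.length > 0) {
    emit({ type: "assistant", text: finalText });
  }
  emit({ type: "lifecycle", phase: "end" });
}
```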
- TOCTOU fix: use { flag: 'wx' } for atomic file creation
- Add session-level locking for concurrent-safe writes
- Add async API: appendMessageToTranscriptAsync, appendAssistantMessageToTranscriptAsync
- Add partial failure test (orphaned user message)
- Add concurrent write test
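For reference, a sketch of the `{ flag: 'wx' }` pattern in Node.js; the helper name and fallback behavior are illustrative, not the PR's actual code:

```ts
import { promises as fs } from "node:fs";

// Create the transcript file atomically: with { flag: "wx" } the write fails
// with EEXIST if the file already exists, so check-then-create cannot race.
async function createTranscriptFile(path: string, header: string): Promise<boolean> {
  try {
    await fs.writeFile(path, header, { flag: "wx" });
    return true; // we created it
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      return false; // another writer won the race; fall back to appending
    }
    throw err;
  }
}
```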
Adds a usageFields config option to CliBackendConfig allowing per-backend customization of token usage field names. This enables correct parsing of cache_creation_input_tokens from Anthropic's API (previously missed) while maintaining backwards compatibility through fallback defaults.
- Add usageFields type to CliBackendConfig
- Add Zod schema validation for usageFields
- Configure default fields for Claude CLI (with Anthropic's actual field names)
- Configure default fields for Codex CLI (OpenAI field names)
- Update toUsage() to use backend config with fallback to hardcoded defaults
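Roughly what such a mapping could look like; the Anthropic field names are real, but the config shape and `toUsage()` body here are assumptions:

```ts
// Hypothetical shape of the usageFields mapping: each logical usage slot
// names the backend-specific JSON field it should be read from.
interface UsageFields {
  input?: string;
  output?: string;
  cacheRead?: string;
  cacheWrite?: string;
}

// Anthropic's actual field names, as emitted by Claude CLI output.
const claudeUsageFields: UsageFields = {
  input: "input_tokens",
  output: "output_tokens",
  cacheRead: "cache_read_input_tokens",
  cacheWrite: "cache_creation_input_tokens",
};

// toUsage() can then read fields by config, falling back to defaults.
function toUsage(raw: Record<string, unknown>, fields: UsageFields = claudeUsageFields) {
  const num = (key?: string): number =>
    key !== undefined && typeof raw[key] === "number" ? (raw[key] as number) : 0;
  return {
    input: num(fields.input),
    output: num(fields.output),
    cacheRead: num(fields.cacheRead),
    cacheWrite: num(fields.cacheWrite),
  };
}
```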
Previously, CLI backend responses wrote hardcoded zeros for token usage in session transcripts (input: 0, output: 0, totalTokens: 0). This caused the UI to show incorrect token counts and status to fall back to stale accumulated values.
Changes:
- Add usage parameter to appendMessageToTranscript and related functions in session-utils.fs.ts to accept NormalizedUsage from CLI backends
- Pass result.meta.agentMeta?.usage when persisting assistant messages in agent-runner-execution.ts
- Create CliSessionManager class with SDK-aligned API for future use: static factories (open/create), accessor methods, write locking
- Add comprehensive tests for both the session-utils.fs usage parameter and the new CliSessionManager class (17 + 2 new tests)
Transcript entries now include actual input/output/cache token counts from CLI backends like claude-cli and opus.
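A sketch of the idea, with a hypothetical `NormalizedUsage` shape and writer signature; the real functions live in session-utils.fs.ts and differ in detail:

```ts
// Assumed NormalizedUsage shape carried from the CLI backend into the
// transcript writer; previously these counts were hardcoded to zero.
interface NormalizedUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// Illustrative signature change: the writer accepts optional usage and
// persists real counts instead of zeros.
function appendAssistantEntry(
  transcript: { push: (entry: object) => void },
  text: string,
  usage?: NormalizedUsage,
): void {
  transcript.push({
    role: "assistant",
    text,
    usage: usage ?? { input: 0, output: 0 },
  });
}
```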
Adds real-time streaming output support to CLI backends, enabling line-by-line parsing of NDJSON output instead of waiting for the full response. This brings CLI backends closer to the embedded/API flow by emitting events as they arrive.
Key changes:
- New streaming execution module (cli-runner/streaming.ts):
  - Uses readline to parse NDJSON lines as they arrive
  - Extracts session IDs, usage stats, and text from the stream
  - Supports event type filtering with prefix matching
  - Maps CLI-specific events to Clawdbot agent events
- Config extension:
  - Added `streaming?: boolean` to enable streaming mode
  - Added `streamingEventTypes?: string[]` to filter events
  - Claude CLI defaults: stream-json format with --verbose flag
  - Codex CLI defaults: streaming enabled with item/turn events
- Event mapping for different CLI formats:
  - Claude CLI: tool_use, tool_result, text, result events
  - Codex CLI: item.*, turn.completed, thread.completed events
- Debug logging throughout the pipeline:
  - Logs raw JSON lines, parsed types, session/usage extraction
  - Logs event emission and mapping decisions
  - Helps diagnose streaming issues in production
The streaming path is enabled by default for Claude CLI and Codex CLI. Users can disable it by setting `streaming: false` in their config. The non-streaming path via runCommandWithTimeout remains available.
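A minimal sketch of the readline-based loop under those assumptions; the function name and callback shape are illustrative, not the module's actual exports:

```ts
import { createInterface } from "node:readline";
import type { Readable } from "node:stream";

// Minimal NDJSON streaming loop: parse each stdout line as it arrives and
// hand recognized events to a callback, skipping lines that are not JSON.
async function streamNdjson(
  stdout: Readable,
  allowedTypePrefixes: string[],
  onEvent: (event: Record<string, unknown>) => void,
): Promise<void> {
  const rl = createInterface({ input: stdout });
  for await (const line of rl) {
    if (!line.trim()) continue;
    let parsed: Record<string, unknown>;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // non-JSON noise on stdout; ignore
    }
    const type = typeof parsed.type === "string" ? parsed.type : "";
    // Prefix matching mirrors streamingEventTypes filtering, e.g. "item."
    if (allowedTypePrefixes.some((p) => type === p || type.startsWith(p))) {
      onEvent(parsed);
    }
  }
}
```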
Match embedded flow's text processing: extract media URLs, clean directives, compute delta from cleaned text.
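A sketch of delta computation from cleaned text; the names here are hypothetical and the embedded flow's actual helpers may differ:

```ts
// Hypothetical delta tracking: directives and media URLs are stripped first,
// then the delta is the suffix of the cleaned text beyond what was already sent.
function makeDeltaEmitter(clean: (raw: string) => string) {
  let sent = "";
  return (rawSoFar: string): string => {
    const cleaned = clean(rawSoFar);
    // If cleaning rewrote earlier text, resend from scratch rather than diffing.
    const delta = cleaned.startsWith(sent) ? cleaned.slice(sent.length) : cleaned;
    sent = cleaned;
    return delta;
  };
}
```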
Conflicts resolved (kept ours):
- cli-backends.ts: keep stream-json format for streaming support
- agent-runner-execution.ts: keep transcript persistence + usage tracking
- claude-cli-runner.test.ts: keep streaming mock expectations
Claude CLI loses its system message context when resuming a session unless the system prompt is explicitly passed on every call. Previously, we only sent it on the first call (`systemPromptWhen: "first"`), which caused resumed sessions to lose their system prompt context.
Changes:
- Switch from `--append-system-prompt` to `--system-prompt`: the former only appends to an existing system prompt, while the latter completely replaces it (per Claude CLI docs). This ensures consistent behavior.
- Change `systemPromptWhen` from "first" to "always" so the system prompt is sent on every CLI invocation, including resumes.
- Remove the redundant `!params.useResume` guard in `buildCliArgs()`; the `resolveSystemPromptUsage()` function already handles the "when to include system prompt" logic via `systemPromptWhen`.
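An illustrative sketch of the resulting arg construction; the config keys here are assumptions mirroring the commit message, not the verified CliBackendConfig shape:

```ts
// Assumed config keys, modelled on the commit message above.
const claudeSystemPromptConfig = {
  systemPromptWhen: "always" as const, // was "first"; resumes lost the prompt
  systemPromptFlag: "--system-prompt", // replaces, unlike --append-system-prompt
};

function buildSystemPromptArgs(systemPrompt: string | undefined): string[] {
  if (!systemPrompt) return [];
  // With "always", no resume guard is needed; the prompt goes on every call.
  return [claudeSystemPromptConfig.systemPromptFlag, systemPrompt];
}
```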
Some things worth discussing / checking:
New backend args
Questions
Sorry for the tag @steipete, but I thought I'd better get your eyes on this before doing any more (see "questions" above).
CLI providers (Claude CLI) report cache_read_input_tokens as the full cached context for each turn, unlike the embedded/API flow. This change adds CLI-aware token calculation that uses the correct formula: cacheRead + cacheWrite + input = total context tokens.
- Add deriveCliContextTokens() for CLI-specific calculation
- Apply CLI detection via isCliProvider() before calculating
- Update session-usage.ts, status.ts, session-utils.fs.ts to use the CLI-aware calculation
- Add verbose logging for token flow debugging
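A sketch of that formula as code; `deriveCliContextTokens` matches the name in the commit, but the usage shape is an assumption:

```ts
// Assumed normalized usage fields for a single CLI turn.
interface CliUsage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

function deriveCliContextTokens(usage: CliUsage): number {
  // Claude CLI reports cache_read_input_tokens as the full cached context,
  // so context = cacheRead + cacheWrite + input, not an accumulated sum.
  return usage.cacheRead + usage.cacheWrite + usage.input;
}
```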
Token Calculation Challenges

While implementing CLI token display, we ran into some semantic ambiguity around what "tokens" should mean in different contexts.

The Problem

Claude CLI returns different token fields with different semantics:

```json
{
  "input_tokens": 2,
  "cache_creation_input_tokens": 3538,
  "cache_read_input_tokens": 52381,
  "output_tokens": 5
}
```

Meanwhile, the Anthropic API (embedded flow) returns:

```json
{
  "input_tokens": 1200,
  "output_tokens": 340,
  "cache_creation_input_tokens": 200,
  "cache_read_input_tokens": 50,
  "total_tokens": 1790
}
```

Semantic Differences

Claude CLI reports cache_read_input_tokens as the full cached context for each turn, unlike the embedded/API flow.

Current Implementation

For CLI providers, we now calculate context as cacheRead + cacheWrite + input. With the CLI numbers above, that gives 52381 + 3538 + 2 = 55921 context tokens. For embedded/API, we use the same formula, but the API also provides total_tokens directly.

Questions
The current fix makes CLI and embedded show similar context-based values, but I wanted to flag this architectural question for review.
The extra system prompt was unconditionally appending "Tools are disabled in this session. Do not call tools.", which prevented all CLI agents from using tools they had available.
Closing: This PR is marked as WIP/Draft and has been open without completion. Please reopen when the work is ready for review. |
# Conflicts:
#   src/agents/cli-runner.ts
#   src/auto-reply/status.ts
@sebslight do not close good PRs!
# Conflicts:
#   src/agents/usage.ts
#   src/auto-reply/reply/session-usage.ts
#   src/auto-reply/status.ts
#   src/gateway/session-utils.fs.test.ts
Resolve conflicts favoring main's lastCallUsage-based context tracking and updated resolveSessionFilePath API, while preserving HEAD's logVerbose instrumentation.
Summary
This PR continues CLI backend improvements from #1921, adding accurate token usage tracking and real-time streaming support.
Concurrency Hardening
Fixes race conditions in CLI transcript writes:
- `{ flag: 'wx' }` for atomic file creation (TOCTOU fix)
- Async API: `appendMessageToTranscriptAsync`, `appendAssistantMessageToTranscriptAsync`

Token Usage Tracking
Fixes incorrect token display in UI (showed 558k/200k when actual was ~120k):
- `usage` parameter to `appendMessageToTranscript` functions
- `result.meta.agentMeta?.usage` when persisting assistant messages
- `CliSessionManager` class with SDK-aligned API for future use

Configurable Usage Fields
Enables per-backend token field parsing:
- `usageFields` config to `CliBackendConfig`
- Parses `cache_creation_input_tokens` (previously missed)

Streaming NDJSON Support
Adds real-time output for CLI backends:
- `cli-runner/streaming.ts` module with readline-based NDJSON parsing
- `streaming?: boolean`, `streamingEventTypes?: string[]` config options
- Event mapping for Claude CLI (`text`, `tool_use`, `result`) and Codex CLI (`item.*`, `turn.*`)

Reply Directives for Streaming
Matches embedded flow's text processing:
- Applies `parseReplyDirectives` to streaming text events

Test plan
AI-assisted
This PR was developed with AI assistance (Claude). The code has been tested locally with a live Clawdbot instance. I understand what all the code does.