feat(cli): CLI feature parity phase 2 - usage tracking and streaming (AI-assisted)#2352

Draft
rmorse wants to merge 19 commits into openclaw:main from rmorse:feat/cli-feature-parity

Conversation

@rmorse
Contributor

rmorse commented Jan 26, 2026

Summary

This PR continues CLI backend improvements from #1921, adding accurate token usage tracking and real-time streaming support.

Concurrency Hardening

Fixes race conditions in CLI transcript writes:

  • Uses { flag: 'wx' } for atomic file creation (TOCTOU fix)
  • Adds session-level locking for concurrent-safe writes
  • New async API: appendMessageToTranscriptAsync, appendAssistantMessageToTranscriptAsync
  • Tests for partial failure (orphaned user message) and concurrent writes
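
Roughly, the creation path now looks like this - a minimal sketch assuming Node's fs/promises, with a hypothetical helper name (the real implementation lives in session-utils.fs.ts):

import { writeFile } from "node:fs/promises";

// Hypothetical helper illustrating the TOCTOU fix: rather than checking
// for existence and then writing (two racy steps), let the OS enforce
// atomicity - { flag: "wx" } fails with EEXIST if the file already exists.
async function createTranscriptExclusive(path: string, firstLine: string): Promise<boolean> {
  try {
    await writeFile(path, firstLine + "\n", { flag: "wx" });
    return true; // this process created the file
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      return false; // another writer won the race; fall back to append
    }
    throw err;
  }
}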

Token Usage Tracking

⚠️ WIP: Token display is improved but not fully resolved - some edge cases remain

Fixes incorrect token display in UI (showed 558k/200k when actual was ~120k):

  • Adds usage parameter to appendMessageToTranscript functions
  • Passes result.meta.agentMeta?.usage when persisting assistant messages
  • Creates CliSessionManager class with SDK-aligned API for future use
  • Transcript entries now contain actual input/output/cache token counts
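
Roughly, the write path becomes the following - the function name is from this PR, but the wrapper and exact signature here are assumed:

// Sketch only: appendAssistantMessageToTranscriptAsync is this PR's new
// async API; the surrounding names and options shape are assumed.
async function persistAssistantMessage(sessionFilePath: string, message: unknown, result: any) {
  await appendAssistantMessageToTranscriptAsync(sessionFilePath, message, {
    usage: result.meta.agentMeta?.usage, // NormalizedUsage instead of hardcoded zeros
  });
}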

Configurable Usage Fields

Enables per-backend token field parsing:

  • Adds usageFields config to CliBackendConfig
  • Handles different API response formats (Anthropic vs OpenAI field names)
  • Correctly parses cache_creation_input_tokens (previously missed)
  • Maintains backwards compatibility via fallback defaults
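
As an illustration, a minimal sketch of the fallback behaviour - the type and helper names here are assumed, not the PR's actual identifiers:

type UsageFields = {
  input?: string[];
  output?: string[];
  cacheRead?: string[];
  cacheWrite?: string[];
  total?: string[];
};

// Hardcoded defaults used when a backend config omits usageFields,
// preserving behaviour for existing configs.
const DEFAULT_USAGE_FIELDS: Required<UsageFields> = {
  input: ["input_tokens"],
  output: ["output_tokens"],
  cacheRead: ["cache_read_input_tokens"],
  cacheWrite: ["cache_creation_input_tokens"],
  total: ["total_tokens"],
};

// Return the first numeric value found under any of the candidate keys.
function firstNumber(raw: Record<string, unknown>, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "number") return value;
  }
  return undefined;
}

function toUsage(raw: Record<string, unknown>, fields?: UsageFields) {
  return {
    input: firstNumber(raw, fields?.input ?? DEFAULT_USAGE_FIELDS.input),
    output: firstNumber(raw, fields?.output ?? DEFAULT_USAGE_FIELDS.output),
    cacheRead: firstNumber(raw, fields?.cacheRead ?? DEFAULT_USAGE_FIELDS.cacheRead),
    cacheWrite: firstNumber(raw, fields?.cacheWrite ?? DEFAULT_USAGE_FIELDS.cacheWrite),
  };
}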

Streaming NDJSON Support

⚠️ TODO: Non-streaming path (streaming: false) is currently broken - needs fix before merge

Adds real-time output for CLI backends:

  • New cli-runner/streaming.ts module with readline-based NDJSON parsing
  • Emits events as they arrive instead of waiting for full response
  • Config options: streaming?: boolean, streamingEventTypes?: string[]
  • Event mapping for Claude CLI (text, tool_use, result) and Codex CLI (item.*, turn.*)
  • Debug logging throughout pipeline for production diagnostics
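
The parsing loop is roughly this shape - a sketch with the handler signature assumed, not the module's actual API:

import { createInterface } from "node:readline";
import type { Readable } from "node:stream";

// Parse NDJSON from the CLI's stdout line by line, emitting each event
// as soon as its line arrives instead of buffering the whole response.
async function readNdjsonEvents(
  stdout: Readable,
  onEvent: (event: Record<string, unknown>) => void,
): Promise<void> {
  const rl = createInterface({ input: stdout, crlfDelay: Infinity });
  for await (const line of rl) {
    const trimmed = line.trim();
    if (!trimmed) continue; // skip blank lines
    try {
      onEvent(JSON.parse(trimmed) as Record<string, unknown>);
    } catch {
      // Non-JSON noise on stdout: ignore and keep streaming rather than abort.
    }
  }
}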

Reply Directives for Streaming

Matches embedded flow's text processing:

  • Applies parseReplyDirectives to streaming text events
  • Extracts media URLs, cleans directives, computes delta from cleaned text
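
A sketch of the delta computation; parseReplyDirectives is the project's existing function, but its return shape is assumed here:

// Stand-in declaration for the project's parseReplyDirectives
// (actual return shape may differ).
declare function parseReplyDirectives(raw: string): { text: string; mediaUrls: string[] };

// Track the previously emitted cleaned text so each streaming event
// only emits the newly appended portion.
let lastCleanedText = "";

function handleStreamingText(accumulatedRaw: string, emitDelta: (delta: string) => void): void {
  const { text: cleaned } = parseReplyDirectives(accumulatedRaw);
  const delta = cleaned.startsWith(lastCleanedText)
    ? cleaned.slice(lastCleanedText.length)
    : cleaned; // cleaning changed earlier text; re-emit in full
  if (delta) emitDelta(delta);
  lastCleanedText = cleaned;
}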

Test plan

  • Unit tests for CliSessionManager (17 tests)
  • Unit tests for session-utils.fs usage parameter
  • Unit tests for CLI streaming module
  • Unit tests for agent-runner-execution CLI persistence
  • Tested locally with claude-cli backend - tokens display improved
  • Verified streaming events emit in real-time
  • TODO: Fix and test non-streaming path
  • TODO: Verify token display edge cases resolved

AI-assisted

This PR was developed with AI assistance (Claude). The code has been tested locally with a live Clawdbot instance. I understand what all the code does.

rmorse and others added 9 commits January 25, 2026 20:58
- Add resumeArgs to DEFAULT_CLAUDE_BACKEND for proper --resume flag usage
- Fix gateway not preserving cliSessionIds/claudeCliSessionId in nextEntry
- Add test for CLI session ID preservation in gateway agent handler
- Update docs with new resumeArgs default
CLI backends (claude-cli etc) don't emit streaming assistant events,
causing TUI to show "(no output)" despite correct processing. Now emits
assistant event with final text before lifecycle end so server-chat
buffer gets populated for WebSocket clients.
- TOCTOU fix: use { flag: 'wx' } for atomic file creation
- Add session-level locking for concurrent-safe writes
- Add async API: appendMessageToTranscriptAsync, appendAssistantMessageToTranscriptAsync
- Add partial failure test (orphaned user message)
- Add concurrent write test
Adds usageFields config option to CliBackendConfig allowing per-backend
customization of token usage field names. This enables correct parsing
of cache_creation_input_tokens from Anthropic's API (previously missed)
while maintaining backwards compatibility through fallback defaults.

- Add usageFields type to CliBackendConfig
- Add Zod schema validation for usageFields
- Configure default fields for Claude CLI (with Anthropic's actual field names)
- Configure default fields for Codex CLI (OpenAI field names)
- Update toUsage() to use backend config with fallback to hardcoded defaults
Previously, CLI backend responses wrote hardcoded zeros for token usage
in session transcripts (input: 0, output: 0, totalTokens: 0). This caused
the UI to show incorrect token counts and status to fall back to stale
accumulated values.

Changes:
- Add usage parameter to appendMessageToTranscript and related functions
  in session-utils.fs.ts to accept NormalizedUsage from CLI backends
- Pass result.meta.agentMeta?.usage when persisting assistant messages
  in agent-runner-execution.ts
- Create CliSessionManager class with SDK-aligned API for future use:
  static factories (open/create), accessor methods, write locking
- Add comprehensive tests for both session-utils.fs usage parameter
  and the new CliSessionManager class (17 + 2 new tests)

Transcript entries now include actual input/output/cache token counts
from CLI backends like claude-cli and opus.
Adds real-time streaming output support to CLI backends, enabling
line-by-line parsing of NDJSON output instead of waiting for the
full response. This brings CLI backends closer to the embedded/API
flow by emitting events as they arrive.

Key changes:

- New streaming execution module (cli-runner/streaming.ts):
  - Uses readline to parse NDJSON lines as they arrive
  - Extracts session IDs, usage stats, and text from stream
  - Supports event type filtering with prefix matching
  - Maps CLI-specific events to Clawdbot agent events

- Config extension:
  - Added `streaming?: boolean` to enable streaming mode
  - Added `streamingEventTypes?: string[]` to filter events
  - Claude CLI defaults: stream-json format with --verbose flag
  - Codex CLI defaults: streaming enabled with item/turn events

- Event mapping for different CLI formats:
  - Claude CLI: tool_use, tool_result, text, result events
  - Codex CLI: item.*, turn.completed, thread.completed events

- Debug logging throughout the pipeline:
  - Logs raw JSON lines, parsed types, session/usage extraction
  - Logs event emission and mapping decisions
  - Helps diagnose streaming issues in production

The streaming path is enabled by default for Claude CLI and Codex CLI.
Users can disable it by setting `streaming: false` in their config.
Non-streaming path via runCommandWithTimeout remains available.
Match embedded flow's text processing: extract media URLs, clean
directives, compute delta from cleaned text.
openclaw-barnacle bot added the docs, app: web-ui, and gateway labels on Jan 26, 2026
Conflicts resolved (kept ours):
- cli-backends.ts: keep stream-json format for streaming support
- agent-runner-execution.ts: keep transcript persistence + usage tracking
- claude-cli-runner.test.ts: keep streaming mock expectations
Claude CLI loses its system message context when resuming a session
unless the system prompt is explicitly passed on every call. Previously,
we only sent it on the first call (`systemPromptWhen: "first"`), which
caused resumed sessions to lose their system prompt context.

Changes:
- Switch from `--append-system-prompt` to `--system-prompt`: the former
  only appends to an existing system prompt, while the latter completely
  replaces it (per Claude CLI docs). This ensures consistent behavior.
- Change `systemPromptWhen` from "first" to "always" so the system
  prompt is sent on every CLI invocation, including resumes.
- Remove the redundant `!params.useResume` guard in `buildCliArgs()` -
  the `resolveSystemPromptUsage()` function already handles the
  "when to include system prompt" logic via `systemPromptWhen`.
openclaw-barnacle bot added the agents label and removed the docs label on Jan 27, 2026
@rmorse
Contributor Author

rmorse commented Jan 27, 2026

Some things worth discussing / checking:

New backend args

systemPromptArg: "--system-prompt",
usageFields: {
  input: ["input_tokens", "inputTokens"],
  output: ["output_tokens", "outputTokens"],
  cacheRead: ["cache_read_input_tokens", "cached_input_tokens", "cacheRead"],
  cacheWrite: ["cache_creation_input_tokens", "cache_write_input_tokens", "cacheWrite"],
  total: ["total_tokens", "total"],
},
streaming: true,
streamingEventTypes: ["tool_use", "tool_result", "text", "result"],
streamingFormat: {
  text: {
    eventTypes: ["assistant"],
    contentPath: "message.content",
    matchType: "text",
    textField: "text",
  },
  toolUse: {
    eventTypes: ["assistant"],
    contentPath: "message.content",
    matchType: "tool_use",
    idField: "id",
    nameField: "name",
    inputField: "input",
  },
  toolResult: {
    eventTypes: ["user"],
    contentPath: "message.content",
    matchType: "tool_result",
    idField: "tool_use_id",
    outputField: "content",
    isErrorField: "is_error",
  },
},
  • usageFields - defines which fields are used to extract context usage from the response JSON
  • streaming + streamingEventTypes - enable streaming for the backend, and define which entries count as events we want to capture
  • streamingFormat - this might need some work; I tried to figure out a CLI-agnostic way to define how to parse the stream JSON result and extract the relevant data (see the sketch after this list)
  • systemPromptArg - changed Claude's system prompt arg to "--system-prompt", which should replace any existing prompts - but, on my Max sub at least, --append-system-prompt and --system-prompt behave the same: they only append.
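
As a sketch of how such a rule could be interpreted (the rule shape is from the config above; the walker itself is illustrative, not this PR's actual code):

// Walk a dotted path like "message.content" into a parsed event.
function getPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (current, key) =>
      current && typeof current === "object"
        ? (current as Record<string, unknown>)[key]
        : undefined,
    obj,
  );
}

// Apply a text rule: match the event type, walk contentPath, keep
// blocks whose type equals matchType, and read textField from each.
function extractText(
  event: Record<string, unknown>,
  rule: { eventTypes: string[]; contentPath: string; matchType: string; textField: string },
): string[] {
  if (!rule.eventTypes.includes(String(event.type))) return [];
  const content = getPath(event, rule.contentPath);
  if (!Array.isArray(content)) return [];
  return content
    .filter((block) => block && typeof block === "object" && (block as any).type === rule.matchType)
    .map((block) => String((block as any)[rule.textField] ?? ""));
}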

Questions

  1. If the CLI backend wasn't fully implemented, do we need backwards compat?
  2. I made a CliSessionManager class (unused) to roughly match the API surface of the pi SessionManager class - but it's going to need updates across multiple files. Should we go with it, or leave things as they are with our modifications in src\gateway\session-utils.fs.ts?
  3. For streaming, we re-use parseReplyDirectives. The output is a bit inconsistent, but is it better to re-use what we already have?
  4. Token usage for display to the user - I'm having a hard time getting this right. Using the existing calculations, everything comes out way off; adding a custom calculation works (it matches Claude Code's reported context usage). Is this what we want?
    • Are the original calculations for embedded functioning/accurate? It seems they're not (or direct API usage works differently to the CC CLI with a sub, at least).

@rmorse
Contributor Author

rmorse commented Jan 27, 2026

Sorry for the tag @steipete, but I thought I'd better get your eyes on this before doing any more (see "Questions" above).

CLI providers (Claude CLI) report cache_read_input_tokens as the full
cached context for each turn, unlike the embedded/API flow. This change
adds CLI-aware token calculation that uses the correct formula:
cacheRead + cacheWrite + input = total context tokens.

- Add deriveCliContextTokens() for CLI-specific calculation
- Apply CLI detection via isCliProvider() before calculating
- Update session-usage.ts, status.ts, session-utils.fs.ts to use
  CLI-aware calculation
- Add verbose logging for token flow debugging
@rmorse
Contributor Author

rmorse commented Jan 27, 2026

Token Calculation Challenges

While implementing CLI token display, we ran into some semantic ambiguity around what "tokens" should mean in different contexts.

The Problem

Claude CLI returns different token fields with different semantics:

{
  "input_tokens": 2,
  "cache_creation_input_tokens": 3538,
  "cache_read_input_tokens": 52381,
  "output_tokens": 5
}

Meanwhile, the Anthropic API (embedded flow) returns:

{
  "input_tokens": 1200,
  "output_tokens": 340,
  "cache_creation_input_tokens": 200,
  "cache_read_input_tokens": 50,
  "total_tokens": 1790
}

Semantic Differences

Metric                   Formula                                    Purpose
API total_tokens         input + output + cacheRead + cacheWrite   Billing - all tokens consumed
"Context" for display    input + cacheRead + cacheWrite            What's in the context window (excludes output)

Current Implementation

For CLI providers, we now calculate context as cacheRead + cacheWrite + input (what's in the context window this turn).

For embedded/API, we use the same formula, but the API also provides total_tokens, which includes output.
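
In code, the CLI-side formula is essentially the following sketch - deriveCliContextTokens is named in the commits, but the field names here are assumed from NormalizedUsage:

interface NormalizedUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  totalTokens?: number;
}

// Context-window occupancy for this turn: everything the model read
// (cached or fresh input), excluding what it generated.
function deriveCliContextTokens(usage: NormalizedUsage): number {
  return (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0) + (usage.input ?? 0);
}

// With the Claude CLI sample above: 52381 + 3538 + 2 = 55921 context
// tokens, even though input_tokens alone reads as just 2.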

Questions

  1. What should the UI's "tokens" display represent?

    • Context tokens (what's in the window) = input + cacheRead + cacheWrite
    • Billing tokens (all consumed) = input + output + cacheRead + cacheWrite
  2. Should we add a separate contextTokens field to NormalizedUsage to distinguish from billing total?

  3. Are there other consumers of totalTokens that expect specific semantics?

The current fix makes CLI and embedded show similar context-based values, but I wanted to flag this architectural question for review.

The extra system prompt was unconditionally appending "Tools are
disabled in this session. Do not call tools." which prevented all
CLI agents from using tools they had available.
@sebslight
Member

Closing: This PR is marked as WIP/Draft and has been open without completion. Please reopen when the work is ready for review.

sebslight closed this Jan 28, 2026
# Conflicts:
#	src/agents/cli-runner.ts
#	src/auto-reply/status.ts
@steipete
Contributor

steipete commented Feb 2, 2026

@sebslight do not close good PRs!

steipete reopened this Feb 2, 2026
# Conflicts:
#	src/agents/usage.ts
#	src/auto-reply/reply/session-usage.ts
#	src/auto-reply/status.ts
#	src/gateway/session-utils.fs.test.ts
Resolve conflicts favoring main's lastCallUsage-based context tracking
and updated resolveSessionFilePath API, while preserving HEAD's
logVerbose instrumentation.