Improve agent caller detection across MCP, env, and process tree by codyde · Pull Request #885 · railwayapp/cli

codyde · 2026-05-08T06:40:02Z

Summary

Comprehensive rewrite of the CLI's caller detection. Replaces the old single-pass env/ps-walk with a six-layer pipeline and routes JSON-RPC clientInfo from railway mcp tool events into the telemetry payload so MCP-driven calls get tagged authoritatively from the handshake.

Validated live against 11 agent harnesses (Claude Code CLI/Desktop, Claude-in-Cursor-terminal, Cursor agent mode, OpenCode, Amp, Codex CLI/Desktop, Pi, Copilot CLI, Factory Droid). 11/11 detected correctly. Factory Droid is the canonical case the old detector would have leaked into agent_subprocess.

Pairs with railwayapp/dbt-analytics#128 which normalizes the new caller taxonomy in fct_agentic_events / dim_agent_session.

What changed

Detection pipeline (`src/telemetry.rs`)

Six detection layers plus a bucketed fallback, evaluated in order; first match wins:

1. RAILWAY_CALLER env override (existing, unchanged)
2. Strong agent env signals: CLAUDECODE, CURSOR_AGENT, CODEX_SANDBOX, OPENCODE, AMP_CURRENT_THREAD_ID, PI_CODING_AGENT, __COG_BASHRC_SOURCED (Devin), AI_AGENT, COPILOT_CLI, AIDER, FACTORY_DROID, GEMINI_CLI, REPLIT_AGENT. Also fixes a typo: the old code looked for CLAUDECODE_SESSION_ID, which doesn't exist, instead of CLAUDE_CODE_SESSION_ID.
3. Process-tree walk: a single ps -A snapshot (replacing N per-hop spawns) is parsed into a HashMap, then the detector walks up to 15 ancestors. It matches against full argv (catches node-bundled agents like node /path/cursor-agent) and uses exact-basename matching for short generic names (droid, pi, amp).
4. AI-IDE host detection: __CFBundleIdentifier (Cursor com.todesktop.230313mzl4w4u92, Windsurf, VS Code, Claude Desktop, JetBrains, Zed), TERM_PROGRAM, TERMINAL_EMULATOR=JetBrains-JediTerm. Combined with isatty(stdout) to disambiguate human vs subprocess in the same IDE: tty:cursor vs agent_unknown:cursor.
5. Cloud-IDE / sandbox env: REPL_ID, CODESPACES, CLOUD_SHELL, MONOSPACE_ENV (Firebase Studio), ANTIGRAVITY_CLI_ALIAS.
6. CI provider: GITHUB_ACTIONS, GITLAB_CI, CIRCLECI, BUILDKITE, JENKINS_URL, TRAVIS, TF_BUILD, CODEBUILD_BUILD_ID, NETLIFY, VERCEL, RAILWAY_*, etc.
7. Bucketed fallback: an interactive shell with no IDE → tty. A non-interactive subprocess is bucketed by parent interpreter (agent_unknown:python, agent_unknown:node, agent_unknown:shell, agent_unknown:ruby, ...) so even unidentified harnesses give us a useful axis.
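
For readers who want a concrete picture of the process-tree layer, here is a minimal sketch of the snapshot-then-walk idea. It is not the code in `src/telemetry.rs`: the names `ProcInfo`, `parse_ps_snapshot`, `match_argv`, `detect_from_ancestors`, and the `SIGNATURES` table are hypothetical, and the input format (`pid ppid args...` rows, as produced by something like `ps -A -o pid= -o ppid= -o args=`) is an assumption, since the PR only says a single `ps -A` snapshot is used.

```rust
use std::collections::HashMap;

/// One row of the snapshot: parent PID plus the full command line.
#[derive(Debug, Clone, PartialEq)]
struct ProcInfo {
    ppid: u32,
    argv: String,
}

/// Parse rows of the form `pid ppid args...` into a PID-keyed table.
/// Rows that do not start with two integers are skipped.
fn parse_ps_snapshot(snapshot: &str) -> HashMap<u32, ProcInfo> {
    let mut table = HashMap::new();
    for line in snapshot.lines() {
        let mut fields = line.split_whitespace();
        let (Some(pid), Some(ppid)) = (fields.next(), fields.next()) else {
            continue;
        };
        let (Ok(pid), Ok(ppid)) = (pid.parse::<u32>(), ppid.parse::<u32>()) else {
            continue;
        };
        let argv = fields.collect::<Vec<_>>().join(" ");
        table.insert(pid, ProcInfo { ppid, argv });
    }
    table
}

/// Hypothetical signature table: (needle, caller, exact_basename_only).
const SIGNATURES: &[(&str, &str, bool)] = &[
    ("cursor-agent", "cursor", false),
    ("droid", "factory_droid", true),
    ("pi", "pi", true),
    ("amp", "amp", true),
];

/// Long, distinctive names match anywhere in argv; short generic names
/// must equal the executable's basename exactly (assumed to be argv[0]).
fn match_argv(argv: &str) -> Option<&'static str> {
    let exe_basename = argv
        .split_whitespace()
        .next()
        .and_then(|exe| exe.rsplit('/').next());
    for &(needle, caller, exact) in SIGNATURES {
        let hit = if exact {
            exe_basename == Some(needle)
        } else {
            argv.contains(needle)
        };
        if hit {
            return Some(caller);
        }
    }
    None
}

const MAX_ANCESTORS: usize = 15;

/// Walk from `start_pid` toward the root, checking each ancestor's argv.
fn detect_from_ancestors(
    table: &HashMap<u32, ProcInfo>,
    start_pid: u32,
) -> Option<&'static str> {
    let mut pid = start_pid;
    for _ in 0..MAX_ANCESTORS {
        let info = table.get(&pid)?;
        let parent = table.get(&info.ppid)?;
        if let Some(caller) = match_argv(&parent.argv) {
            return Some(caller);
        }
        pid = info.ppid;
    }
    None
}
```

Because the table is built once, each hop is a HashMap lookup rather than a new `ps` spawn. The exact-basename rule is what keeps a command such as `python3 compute_pi.py` from being tagged as `pi`.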

MCP `clientInfo` is now authoritative for MCP events

src/commands/mcp/handler.rs snapshots context.peer.peer_info().client_info and threads it into a new send_mcp_tool_with_client. The clientInfo name (per the MCP JSON-RPC spec, every client must send it during initialize) maps to the canonical caller value: claude-ai → claude_code (or claude_desktop based on env), codex-mcp-client → codex, Cline → cline, Roo Code → roo_code, kilo → kilo_code, opencode → opencode, continue-client → continue_dev, Visual Studio Code(...) → vscode_copilot / vscode_insiders, windsurf → windsurf, etc. Unknown clients land on mcp_unknown so we can debug new entrants in Hex.
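
The mapping described above can be sketched as a single match over the client name. This is illustrative only: the function name `caller_from_client_info` and the `is_desktop` flag are hypothetical stand-ins for whatever env check the real code uses, exact (case-sensitive) name matching is an assumption, and the rule that an "Insiders" substring selects `vscode_insiders` is a guess at how the two VS Code values are told apart.

```rust
/// Map an MCP `clientInfo.name` to a canonical caller value.
/// `is_desktop` is a hypothetical flag standing in for the env-based
/// check that separates Claude Desktop from Claude Code.
fn caller_from_client_info(name: &str, is_desktop: bool) -> &'static str {
    match name.trim() {
        "claude-ai" => {
            if is_desktop {
                "claude_desktop"
            } else {
                "claude_code"
            }
        }
        "codex-mcp-client" => "codex",
        "Cline" => "cline",
        "Roo Code" => "roo_code",
        "kilo" => "kilo_code",
        "opencode" => "opencode",
        "continue-client" => "continue_dev",
        "windsurf" => "windsurf",
        // Assumption: VS Code reports a name starting with
        // "Visual Studio Code", with "Insiders" in it for Insiders builds.
        other if other.starts_with("Visual Studio Code") => {
            if other.contains("Insiders") {
                "vscode_insiders"
            } else {
                "vscode_copilot"
            }
        }
        // Unknown clients fall into one bucket for later inspection.
        _ => "mcp_unknown",
    }
}
```

Keeping the unknown case as a single literal value means a new client shows up as `mcp_unknown` rather than silently inheriting a heuristic guess.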

Sub-bucketed caller vocabulary

Caller is still a bounded string — backboard accepts colons, dbt-analytics normalizes downstream. New colon-prefixed forms:

| Prefix | Examples | Class |
| --- | --- | --- |
| (none) | `claude_code`, `cursor`, `codex`, `factory_droid`, `amp`, `pi`, `copilot_cli`, `aider`, `windsurf`, `gemini_cli`, ... | `agent_named` |
| `tty:` | `tty:cursor`, `tty:vscode`, `tty:zed`, `tty:jetbrains`, `tty:trae`, `tty:ghostty`, ... | `human` |
| `agent_unknown:` | `agent_unknown:vscode`, `agent_unknown:cursor`, `agent_unknown:python`, `agent_unknown:node`, `agent_unknown:shell`, ... | `agent_unknown` |
| `cloud_ide:` | `cloud_ide:codespaces`, `cloud_ide:replit`, `cloud_ide:cloud_shell`, ... | `cloud_ide` |
| `ci:` | `ci:github_actions`, `ci:gitlab`, `ci:circle`, `ci:buildkite`, `ci:railway`, ... | `ci` |
| `mcp_unknown` | (literal value) | `agent_unknown` |
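
As an illustration of how the prefixes in this table map to classes, the following sketch splits a caller string at the first colon. The `CallerClass` enum and `classify_caller` function are hypothetical names, not taken from the PR. Two behaviors are assumptions beyond what the table states: a bare `tty` (the fallback value) is classed as human, and an unrecognized prefix is classed as unknown.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CallerClass {
    AgentNamed,
    Human,
    AgentUnknown,
    CloudIde,
    Ci,
}

/// Return the class and, when present, the sub-bucket after the colon.
fn classify_caller(caller: &str) -> (CallerClass, Option<&str>) {
    match caller.split_once(':') {
        Some(("tty", sub)) => (CallerClass::Human, Some(sub)),
        Some(("agent_unknown", sub)) => (CallerClass::AgentUnknown, Some(sub)),
        Some(("cloud_ide", sub)) => (CallerClass::CloudIde, Some(sub)),
        Some(("ci", sub)) => (CallerClass::Ci, Some(sub)),
        // Assumption: a prefix not in the table is treated as unknown.
        Some((_, sub)) => (CallerClass::AgentUnknown, Some(sub)),
        None => match caller {
            "mcp_unknown" => (CallerClass::AgentUnknown, None),
            // Assumption: bare `tty` is an interactive human shell.
            "tty" => (CallerClass::Human, None),
            // Any other unprefixed value is a named agent.
            _ => (CallerClass::AgentNamed, None),
        },
    }
}
```

For example, `classify_caller("ci:github_actions")` yields the `Ci` class with sub-bucket `github_actions`, while `classify_caller("factory_droid")` yields `AgentNamed` with no sub-bucket.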

Diagnostic script

scripts/diagnose-caller.sh collects every signal the new detector reads (env vars, IDE host indicators, CI markers, TTY status, full process ancestry). Drop-in for ground-truth validation against any agent harness — used during this PR to confirm 11/11 attribution.

Live validation results

| Agent | Detected caller | Layer fired |
| --- | --- | --- |
| Claude Code (CLI) | `claude_code` | L2 env |
| Claude Code Desktop | `claude_code` | L2 env |
| Claude Code in Cursor's terminal | `claude_code` | L2 env (correctly chooses agent over IDE host) |
| Cursor (agent mode) | `cursor` | L2 env |
| OpenCode | `opencode` | L2 env |
| Amp | `amp` | L2 env |
| Codex CLI | `codex` | L2 env |
| Codex Desktop | `codex` | L2 env |
| Pi | `pi` | L2 env |
| GitHub Copilot CLI | `copilot_cli` | L2 env |
| Factory Droid | `factory_droid` | L3 process tree (the case this PR fixes) |

Notable finding: Copilot CLI is the only tested agent that hands its child a real PTY (stdout_tty=true). All others pipe stdout. This validates that the TTY check alone is not a reliable agent-vs-human discriminator — we use it only as a tiebreaker for the IDE-host bucket.
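
To make the tiebreaker role concrete, here is a small sketch of how a TTY check might be consulted only after stronger signals have failed. The function names and the `named_agent` / `ide_host` parameters are hypothetical; the real detector's structure may differ. `std::io::IsTerminal` is the standard-library trait for the check itself.

```rust
use std::io::IsTerminal;

/// Real-world TTY probe; kept separate so the decision logic is testable.
fn stdout_is_tty() -> bool {
    std::io::stdout().is_terminal()
}

/// Decide a caller value from already-collected signals.
/// `named_agent`: result of the env / process-tree layers, if any.
/// `ide_host`: IDE the process is running under, if any.
fn resolve_caller(
    named_agent: Option<&str>,
    ide_host: Option<&str>,
    stdout_tty: bool,
) -> Option<String> {
    // A named agent always wins, even with a real PTY (the Copilot CLI case).
    if let Some(agent) = named_agent {
        return Some(agent.to_string());
    }
    // Only inside the IDE-host bucket does the TTY bit break the tie.
    ide_host.map(|ide| {
        if stdout_tty {
            format!("tty:{ide}")
        } else {
            format!("agent_unknown:{ide}")
        }
    })
}
```

A caller would pass `stdout_is_tty()` as the last argument. Returning `None` here stands for "fall through to the later layers", which this sketch does not model.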

Test plan

14 telemetry unit tests covering env detection, MCP clientInfo mapping, full-argv process matching, parent-kind bucketing, sub-bucket prefix classification, and ps snapshot parsing
All 183 existing tests pass
Live validation against 11 agent harnesses via scripts/diagnose-caller.sh
Build clean on macOS / Ubuntu / Windows in CI
Once merged + released, verify in Hex that the new caller values flow through fct_agentic_events.caller and that dbt-analytics#128's caller_class / caller_agent / caller_subkind columns populate as expected

Background

RFC: Agentic Loop Telemetry: MCP + CLI. Companion PRs: railwayapp/dbt-analytics#128 (taxonomy normalization in marts).

🤖 Generated with Claude Code

Restructures `detect_caller` into a six-layer pipeline (RAILWAY_CALLER → strong env → process tree → IDE host → cloud IDE → CI provider → bucketed fallback) and routes JSON-RPC `clientInfo` from `railway mcp` tool events into a new `send_mcp_tool_with_client` so MCP-driven calls get tagged authoritatively from the handshake instead of relying on heuristics. Caller value semantics are extended with colon-suffixed sub-buckets so unattributed events still carry a useful slicing axis: `tty:cursor`, `tty:vscode`, `agent_unknown:vscode`, `agent_unknown:python`, `ci:github_actions`, `cloud_ide:codespaces`, etc. The detector also expands the env-var table (`__COG_BASHRC_SOURCED`, `AI_AGENT`, `CLAUDE_CODE_SESSION_ID` — fixing a long-standing typo where the old code looked for the non-existent `CLAUDECODE_SESSION_ID`), adds a one-shot `ps -A` snapshot in place of N per-hop spawns, matches against full argv (catching `node /path/to/cursor-agent`), increases the ancestor walk to 15 hops, and adds basename matching for short generic agent names (Factory Droid's `droid`, Pi's `pi`, Amp's `amp`). Validated live against 11 agent harnesses (Claude Code CLI/Desktop, Cursor, OpenCode, Amp, Codex CLI/Desktop, Pi, Copilot CLI, Factory Droid, Claude-in-Cursor-terminal): 11/11 detected correctly. Factory Droid is the case the old detector would have leaked into `agent_subprocess`. Diagnostic script for future ground-truth checks lives at `scripts/diagnose-caller.sh`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add local MCP tool telemetry

1af743f

codyde added the release/minor label May 8, 2026

codyde and others added 3 commits May 8, 2026 00:40

Apply rustfmt to telemetry detector

dd194ce

Wraps long || chains and assert_eq! arguments per CI's cargo fmt diff. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply remaining rustfmt wraps

86fff25

Walks ancestors signature and MCP client peer_info chain wrapped per cargo fmt. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codyde changed the title ~~Add local MCP tool telemetry~~ Improve agent caller detection across MCP, env, and process tree May 8, 2026

codyde merged commit 0065458 into master May 8, 2026
6 checks passed

codyde deleted the cody/local-mcp-tool-telemetry branch May 8, 2026 08:05

codyde mentioned this pull request May 8, 2026

Drive upgrade when cli.new --agents finds an out-of-date Railway CLI #886

Merged

codyde merged 4 commits into master from cody/local-mcp-tool-telemetry
