Fix codex / opencode env names and add persistent agent_session_id fallback #894
Merged
…llback
The agent_session_id precedence chain in TelemetryContext had two entries
checking for env vars that no agent actually exports:
CODEX_SESSION_ID → real name is CODEX_THREAD_ID (UUID v7)
OPENCODE_SESSION_ID → real name is OPENCODE_RUN_ID (UUID v4)
Both verified 2026-05-11 by capturing the env inside live codex/opencode
shell tool invocations. The wrong names explain the warehouse data:
codex was at 100% per-process fallback and opencode at ~100% because the
CLI was looking for variables that don't exist. Also adds AGENT_THREAD_ID
as a late fallback (cross-agent convention exposed by Amp and observed
in other harnesses' docs).
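
A minimal sketch of how such a precedence chain could be resolved. The variable names come from this PR; their relative order is an assumption, apart from AGENT_THREAD_ID being a late fallback, and `session_id_from_env` is a hypothetical helper, not the function in TelemetryContext.

```rust
/// Harness env vars named in this PR. The relative order is an assumption,
/// except that AGENT_THREAD_ID is described as a late fallback.
const SESSION_ENV_VARS: [&str; 6] = [
    "CLAUDE_CODE_SESSION_ID",
    "CODEX_THREAD_ID",
    "OPENCODE_RUN_ID",
    "AMP_CURRENT_THREAD_ID",
    "CURSOR_TRACE_ID",
    "AGENT_THREAD_ID",
];

/// Returns the first non-empty value together with the name of the variable
/// that supplied it. `lookup` is injected so the chain can be exercised
/// without touching the real process environment.
fn session_id_from_env<F>(lookup: F) -> Option<(&'static str, String)>
where
    F: Fn(&str) -> Option<String>,
{
    SESSION_ENV_VARS.iter().find_map(|&name| {
        lookup(name)
            .map(|value| value.trim().to_string())
            .filter(|value| !value.is_empty())
            .map(|value| (name, value))
    })
}
```

In real use the lookup would be something like `|name| std::env::var(name).ok()`; when it returns `None` for every name, the caller falls through to the persistent-file path described next.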
When no harness env var is present, the CLI now writes a persistent
session file to ~/.railway/sessions/<16-hex>.session keyed on parent
process identity (pid + boot time + argv0). Subsequent `railway`
invocations from the same parent reuse the recorded UUID, recovering
stable stitching for agents whose env doesn't propagate (notably
claude_code: 99.8% of sessions hit the per-process mint because
CLAUDE_CODE_SESSION_ID doesn't survive the Bash tool boundary).
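
A sketch of how a `<16-hex>.session` file name could be derived from the parent identity. The PR does not say which hash is used; 64-bit FNV-1a is swapped in here purely because it is stable and needs no dependencies. `ParentIdentity` and `session_file_path` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// The three identity components named in the commit message.
struct ParentIdentity {
    pid: u32,
    boot_time_secs: u64,
    argv0: String,
}

/// 64-bit FNV-1a. An assumption: the real CLI may use a different hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Maps a parent identity to `<dir>/<16-hex>.session`.
fn session_file_path(dir: &Path, parent: &ParentIdentity) -> PathBuf {
    let key = format!("{}:{}:{}", parent.pid, parent.boot_time_secs, parent.argv0);
    dir.join(format!("{:016x}.session", fnv1a64(key.as_bytes())))
}
```

Two invocations from the same parent produce the same path and so find the same recorded UUID, while a different pid or boot time yields a different file.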
File lifecycle:
- Written only for agent callers (tty/ci never get a file).
- Reused as long as parent pid is alive AND boot time matches.
- 7-day hard age cap as backstop against PID reuse.
- Stale files (parent gone or btime mismatch) deleted on every
invocation; directory capped at 100 files (oldest-by-mtime evicted).
- Override location via RAILWAY_SESSIONS_DIR for tests.
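
The lifecycle rules above can be sketched as two pure functions. This is an illustration under assumptions: `SessionRecord`, `is_reusable`, and `files_to_evict` are hypothetical names, the liveness check is passed in as a boolean, and treating a file of exactly 7 days as expired is a guess.

```rust
use std::time::Duration;

/// 7-day hard age cap and 100-file directory cap, as listed above.
const MAX_AGE: Duration = Duration::from_secs(7 * 24 * 60 * 60);
const MAX_FILES: usize = 100;

/// What a session file is assumed to record for the reuse decision.
struct SessionRecord {
    boot_time_secs: u64,
    age: Duration,
}

/// A file is reused only if the parent is alive, the boot time matches,
/// and the file is younger than the age cap.
fn is_reusable(record: &SessionRecord, parent_alive: bool, current_boot_time_secs: u64) -> bool {
    parent_alive && record.boot_time_secs == current_boot_time_secs && record.age < MAX_AGE
}

/// Given (file name, mtime in seconds) pairs, returns the names to delete,
/// oldest first, so that at most MAX_FILES remain.
fn files_to_evict(mut files: Vec<(String, u64)>) -> Vec<String> {
    files.sort_by_key(|(_, mtime)| *mtime);
    let excess = files.len().saturating_sub(MAX_FILES);
    files.into_iter().take(excess).map(|(name, _)| name).collect()
}
```

Keeping the decision logic free of filesystem calls makes each rule easy to unit test in isolation.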
UUID format chosen to match the dbt-side is_unstitched_agent_session
macro: a v4 UUID does not match the cli_<22-char-base64> regex, so
persistent IDs are treated as real stitched sessions in the warehouse,
not heuristically gap-binned.
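
To illustrate why a v4 UUID cannot be mistaken for the per-process fallback, here is a dependency-free sketch. `format_uuid_v4` and `looks_like_cli_fallback` are hypothetical helpers. The exact dbt regex is not shown in this PR, so the base64 alphabet accepted here (standard plus URL-safe) is an assumption, and the caller is assumed to supply 16 random bytes.

```rust
/// Formats 16 caller-supplied random bytes as a v4 UUID string by forcing
/// the version and variant bits.
fn format_uuid_v4(mut bytes: [u8; 16]) -> String {
    bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Approximates the `cli_<22-char-base64>` shape. The accepted alphabet is
/// an assumption, not the dbt macro's actual pattern.
fn looks_like_cli_fallback(id: &str) -> bool {
    match id.strip_prefix("cli_") {
        Some(rest) => {
            rest.len() == 22
                && rest
                    .bytes()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, b'+' | b'/' | b'-' | b'_'))
        }
        None => false,
    }
}
```

A 36-character hyphenated UUID never starts with `cli_`, so it fails the fallback check regardless of which base64 alphabet the warehouse pattern uses.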
Tightens the process-tree claude substring match to require
claude-code, claude_code, anthropic.claude-code, or a bare `claude` argv0.
The previous bare `claude` substring over-attributed Claude Desktop
helper paths, MCP server binaries with "claude" in argv, and
~/.claude/ scripts to claude_code.
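
One way the tightened match could look, as a sketch only. `is_claude_code` is a hypothetical name, and the choices to compare the argv0 basename case-sensitively and to scan every argument for the longer tokens are assumptions rather than the CLI's actual rules.

```rust
use std::path::Path;

/// Returns true when a process's argv looks like Claude Code under the
/// tightened rule: a bare `claude` argv0, or a `claude-code` / `claude_code`
/// token anywhere in argv (which also covers `anthropic.claude-code`).
fn is_claude_code(argv: &[&str]) -> bool {
    let Some(&argv0) = argv.first() else {
        return false;
    };
    // Compare only the final path component, so directories such as
    // `~/.claude/` do not count as a match.
    let basename = Path::new(argv0)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or("");
    if basename == "claude" {
        return true;
    }
    argv.iter()
        .any(|arg| arg.contains("claude-code") || arg.contains("claude_code"))
}
```

Under these assumptions a Claude Desktop helper binary or a script under `~/.claude/` no longer matches, while an installed `claude` binary or an `anthropic.claude-code` path still does.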
Verified end-to-end against a locally built binary across:
fresh write, same-parent reuse (UUID preserved), multi-parent isolation
(concurrent subshells get distinct files), env precedence (CODEX_THREAD_ID
and OPENCODE_RUN_ID win over disk), stale cleanup (dead-pid files
removed), tty caller suppression (no file written). 17/17 unit tests
passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI rustfmt check failed on three multiline-preference nits: the UUID format! args, an assert_eq! over 100 chars, and a multi-arg assert! over 100 chars. No behavioral change; cargo fmt --all auto-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 11, 2026
codyde added a commit that referenced this pull request on May 12, 2026
…diate parent (#896)

* fix(telemetry): anchor persistent session on agent ancestor, not immediate parent

#894 introduced a persistent ~/.railway/sessions/*.session fallback, but parent_identity() reads me.ppid directly, so the session is keyed on the immediate parent. For claude_code's claude_code -> bash -> railway invocation chain, the parent is the short-lived bash spawned per Bash tool call, which dies between invocations. Result: every railway call mints a fresh UUID instead of reusing the file.

Warehouse confirms: ~107k single-event UUID sessions from claude_code alone in 48h after 4.57.3 shipped, with stitching empirically worse than the prior cli_<22b64> regime because the new UUIDs aren't pattern-matchable as fallbacks (the dbt is_unstitched_agent_session macro can't catch them).

Fix: extract the ancestor walk into agent_ancestor_pid() (mirrors the 15-level walk in agent_from_process_tree) and anchor the persistent session on the recognized harness process: claude_code, codex, cursor, etc. These are long-lived and stable across the agent's many short-lived shell subprocesses. Falls back to the immediate parent only when no recognized agent ancestor exists, preserving stitching for unknown-but-long-lived parents.

Tests cover the claude_code-via-bash and codex-via-sh chains, the no-agent fallback path, and a self-referential ppid cycle guard. 4 new tests, 21/21 telemetry tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rustfmt the long claude_code argv line in the new test

CI rustfmt failed on the multiline-preference threshold for the `node(1, "...")` argument in the claude_code anchor test. Apply cargo fmt --all auto-fix; no behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
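
The ancestor walk described in that follow-up commit can be sketched over an in-memory process table. This is not the code from #896: `Proc`, `is_agent`, and `anchor_pid` are hypothetical, the agent name list is a placeholder, and folding the immediate-parent fallback into the same function is an assumption.

```rust
use std::collections::{HashMap, HashSet};

/// Depth limit taken from the "15-level walk" mentioned in the commit message.
const MAX_DEPTH: usize = 15;

/// Minimal stand-in for one row of a process table.
struct Proc {
    ppid: u32,
    name: &'static str,
}

/// Placeholder recognizer; the real matching rules live in the CLI.
fn is_agent(name: &str) -> bool {
    matches!(name, "claude" | "codex" | "cursor")
}

/// Walks up from `me` and returns the pid the session should be anchored on:
/// the nearest recognized agent ancestor, else the immediate parent.
/// Returns None only when `me` itself is not in the table.
fn anchor_pid(table: &HashMap<u32, Proc>, me: u32) -> Option<u32> {
    let immediate = table.get(&me)?.ppid;
    let mut seen = HashSet::from([me]);
    let mut pid = immediate;
    for _ in 0..MAX_DEPTH {
        if !seen.insert(pid) {
            break; // ppid cycle: stop walking
        }
        let Some(process) = table.get(&pid) else {
            break; // ran off the top of the known tree
        };
        if is_agent(process.name) {
            return Some(pid);
        }
        pid = process.ppid;
    }
    Some(immediate)
}
```

For a `claude -> bash -> railway` chain this anchors on the long-lived `claude` process rather than the per-call `bash`, which is what lets consecutive invocations find the same session file.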
Summary
- Fixes the codex / opencode env var names in the `agent_session_id` precedence chain. The CLI was checking for variables that no agent actually exports:
  - `CODEX_SESSION_ID` → real name is `CODEX_THREAD_ID` (UUID v7)
  - `OPENCODE_SESSION_ID` → real name is `OPENCODE_RUN_ID` (UUID v4)
- Adds a persistent session file at `~/.railway/sessions/<16-hex>.session` keyed on parent process identity (pid + boot time + argv0). Subsequent `railway` invocations from the same parent reuse the recorded UUID
- Adds `AGENT_THREAD_ID` to the precedence chain (cross-agent convention exposed by Amp and a few other harnesses)
- Tightens the process-tree `claude` substring match to require `claude-code` / `claude_code` / `anthropic.claude-code` / bare `claude` argv0 (previous bare substring was over-attributing Claude Desktop helper paths and `~/.claude/` scripts to `claude_code`)

Why

Audit on 2026-05-10 found `claude_code` event count 21x ahead and session count 17x ahead of any other agent in the warehouse. Hex provenance check revealed the real bug: 99.8% of claude_code's "sessions" were the per-process `cli_<base64>` fallback because `CLAUDE_CODE_SESSION_ID` doesn't survive the Bash tool boundary. Same shape for every other agent, and even worse for codex (100% fallback) and opencode (~100% fallback) because the CLI was looking for env var names that those agents don't actually export.

Verified 2026-05-11 by capturing the env inside live shell-tool invocations of each agent:

| Agent | Env var actually exported | Env var the CLI checked |
| --- | --- | --- |
| codex | `CODEX_THREAD_ID` (UUID v7) | `CODEX_SESSION_ID` (does not exist) |
| opencode | `OPENCODE_RUN_ID` (UUID v4) | `OPENCODE_SESSION_ID` (does not exist) |
| amp | `AMP_CURRENT_THREAD_ID` | `AMP_CURRENT_THREAD_ID` (correct) |
| cursor | | `CURSOR_TRACE_ID` (kept for forward compat) |
| claude_code | `CLAUDE_CODE_SESSION_ID` (UUID) | `CLAUDE_CODE_SESSION_ID` (correct) |

The persistent-file fallback is the unified fix for the last row and any future agent whose env doesn't propagate. UUID format is chosen to be a v4 UUID specifically so the dbt-side `is_unstitched_agent_session` macro treats it as a real stitched session (not subject to gap-windowing).

File lifecycle (the part worth scrutinizing in review)

- Written only when `is_agent_caller(caller) == true`. Humans typing `railway` interactively (`tty` / `tty:*`) and CI runs never get a file
- `RAILWAY_SESSIONS_DIR` env var overrides the directory location (used by tests)
- Telemetry opt-out (`DO_NOT_TRACK=1` or `RAILWAY_NO_TELEMETRY=1`) short-circuits before the file is written, so users who opted out keep nothing on disk

Test plan

- Unit tests (including `claude_substring_no_longer_overmatches`, `new_session_uuid_is_v4_format`, `new_session_uuid_does_not_match_cli_fallback_regex`)
- `CODEX_THREAD_ID` set → file NOT written (env wins)
- `OPENCODE_RUN_ID` set → file NOT written
- `tty` caller → no file written
- `agent_session_id` values that look like UUIDs (not `cli_<base64>`) for agent-attributed traffic
- `mcp_submit_tool` / `cli_*` tables stitch correctly via their new env entries

Companion PR
Analytics-side fix (gap-windowed sessionization for the unstitched fallback) lands in railwayapp/dbt-analytics#134. The two together address both producer and consumer of the problem; either can ship independently but they're best deployed close together.
🤖 Generated with Claude Code