feat(telemetry): fingerprint sandbox runtime and agent vendor #1035
Add two new properties to every CLI telemetry event so we can tell
managed-sandbox traffic (Codex Cloud, Claude Code Web, etc.) apart from
real developer laptops without geolocation guesswork:
- sandbox_runtime: 'gvisor' | 'firecracker' | 'docker' | 'kvm' | 'wsl' | null
gVisor detected via kernel string ('4.19.0-gvisor' or legacy Sentry
'4.4.0') + /proc/version. Firecracker via /dev/vsock + DMI sys_vendor.
Docker reuses the existing /.dockerenv + cgroup probe.
- agent_runtime: claude_code | codex | cursor | copilot_agent | jules
| replit | devin | aider | gemini_cli | hermes | openclaw | null
Detected by the EXISTENCE of well-known vendor env vars only — values
are never read. Hermes rule keys on HERMES_QUIET=1 (set unconditionally
at hermes-agent/cli.py:50). openclaw rule keys on OPENCLAW_STATE_DIR
or OPENCLAW_CONFIG_PATH (set explicitly in the spawned child env at
openclaw/extensions/qa-matrix/src/runners/contract/scenario-runtime-cli.ts).
Drive-by cleanups required by fallow because system.ts and client.ts
fall into the audit scope of this PR:
- Extract detectWSL into platform.ts to break the system.ts ↔ agent_runtime.ts cycle.
- Refactor detectCI / getCIName into a single CI_PROVIDERS table.
- Dedupe flush / flushSync via a shared drainQueueToPayload helper.
Privacy posture unchanged: HYPERFRAMES_NO_TELEMETRY=1 still opts out;
disclosure in docs/packages/cli.mdx updated to enumerate the new fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
miguel-heygen
left a comment
Review: feat(telemetry): fingerprint sandbox runtime and agent vendor
Nice work — the PR description is excellent and the detection approach is well-researched. A few things:
kernel === "4.4.0" false positive risk
isGVisor() returns true for any Linux box reporting kernel 4.4.0 without checking /proc/version:
function isGVisor(): boolean {
  const kernel = release();
  if (kernel === "4.4.0" || kernel.includes("gvisor")) return true; // ← 4.4.0 skips /proc/version
  ...
While the comment says "no production Linux box reports either today," 4.4.0 is a real upstream LTS kernel (2016). Embedded systems, Raspberry Pi images, and older distros could report it. The /proc/version backup check is more reliable. Should the 4.4.0 branch also require the /proc/version confirmation?
// Suggestion: only trust the exact gvisor string without backup
if (kernel.includes("gvisor")) return true;
if (kernel === "4.4.0" && platform() === "linux") {
  try { return readFileSync("/proc/version", "utf-8").includes("gVisor"); }
  catch { return false; }
}
This way the legacy Sentry kernel still gets detected, but only after confirming via /proc/version.
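The same rule can be written as a pure function over two strings, which makes the false-positive case easy to exercise without touching the filesystem. This is a minimal sketch, not the PR's code: looksLikeGVisor and its parameters are hypothetical names, and a real caller would pass the result of os.release() and the contents of /proc/version.

```typescript
// Hypothetical helper (not from the PR): decides from two strings whether
// the host looks like a gVisor sandbox.
//   kernel      - what os.release() would report
//   procVersion - contents of /proc/version, or null if it could not be read
function looksLikeGVisor(kernel: string, procVersion: string | null): boolean {
  // A "*-gvisor" kernel string is trusted on its own.
  if (kernel.includes("gvisor")) return true;

  // A bare "4.4.0" is also a real upstream kernel version, so it only
  // counts when /proc/version confirms it.
  if (kernel === "4.4.0") {
    return procVersion !== null && procVersion.includes("gVisor");
  }
  return false;
}
```

With this shape, a box reporting 4.4.0 whose /proc/version never mentions gVisor is classified as not gVisor, while the current -gvisor kernel string still matches with no file read at all.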
Missing sandbox detection tests
Only gVisor has test coverage. Docker, Firecracker, KVM, and WSL are untested. These are straightforward file-existence checks, but since the gVisor tests set a good precedent with vi.doMock("node:os"), would be nice to see at least one Docker test (mock existsSync("/.dockerenv")) and one negative case (plain Linux → null).
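As an alternative to module mocking, the file-based probes could take their filesystem access as a parameter, which makes the Docker positive and plain-Linux negative cases one-liners. The sketch below is an illustration under stated assumptions, not the code in this PR: the Probes interface, detectFileBasedSandbox, fakeProbes, the /sys/class/dmi/id/ location, and the check order are all assumed. WSL is left out because its probe is not described here, and the /proc/1/cgroup half of the Docker check is omitted.

```typescript
type SandboxRuntime = "firecracker" | "docker" | "kvm" | null;

// Hypothetical probe interface: lets a test fake the filesystem
// instead of mocking node:fs.
interface Probes {
  exists(path: string): boolean;
  read(path: string): string | null; // null when missing or unreadable
}

// Assumed sysfs location of the DMI fields.
const DMI = "/sys/class/dmi/id/";

function detectFileBasedSandbox(p: Probes): SandboxRuntime {
  const vendor = (p.read(DMI + "sys_vendor") ?? "").trim();
  const product = (p.read(DMI + "product_name") ?? "").trim();

  // Firecracker: vsock device plus both DMI fields must agree.
  if (p.exists("/dev/vsock") && vendor === "Amazon EC2" && product.includes("Firecracker")) {
    return "firecracker";
  }
  // Docker: marker file only (cgroup probe omitted in this sketch).
  if (p.exists("/.dockerenv")) return "docker";
  // KVM: sys_vendor is "QEMU" or contains "KVM".
  if (vendor === "QEMU" || vendor.includes("KVM")) return "kvm";
  return null;
}

// Test helper: a fake filesystem backed by a Map of path -> contents.
function fakeProbes(files: Map<string, string>): Probes {
  return {
    exists: (path) => files.has(path),
    read: (path) => files.get(path) ?? null,
  };
}
```

A Docker test then only needs a map containing /.dockerenv, and the negative case is an empty map.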
HERMES_QUIET value check
Most rules check existence (typeof env[key] === "string"), but Hermes checks an exact value (env["HERMES_QUIET"] === "1"). If Hermes ever changes that value (or sets it to "true"), this breaks silently. Consider typeof env["HERMES_QUIET"] === "string" for consistency, since the var name itself is specific enough.
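To make the difference concrete, here is a small side-by-side of the two checks. The function names and the Env type are hypothetical; only the two expressions come from the comment above.

```typescript
type Env = Record<string, string | undefined>;

// Value check: matches only when the variable is exactly "1".
const hermesByValue = (env: Env): boolean => env["HERMES_QUIET"] === "1";

// Existence check: matches whenever the variable is set, whatever it holds
// (including the empty string).
const hermesByExistence = (env: Env): boolean =>
  typeof env["HERMES_QUIET"] === "string";
```

If Hermes started exporting HERMES_QUIET=true, the value check would stop matching while the existence check would keep working.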
CI_PROVIDERS type nit
The truthy and presence properties are both optional booleans — a provider entry could have neither set (matching nothing). A discriminated union would be stricter:
type CIProvider =
  | { name: string | null; envVar: string; mode: "truthy" }
  | { name: string | null; envVar: string; mode: "presence" };
Not blocking, just cleaner.
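A sketch of how a table typed this way might be consumed is below. Everything beyond the CIProvider type is an assumption for illustration: the three table entries, the provider labels, the reading of "truthy" as the value being "true" or "1", and the rule that getCIName returns the first matching entry with a non-null name.

```typescript
type CIProvider =
  | { name: string | null; envVar: string; mode: "truthy" }
  | { name: string | null; envVar: string; mode: "presence" };

type Env = Record<string, string | undefined>;

// Illustrative entries only; the real table is not shown in this thread.
const CI_PROVIDERS: CIProvider[] = [
  { name: null, envVar: "CI", mode: "truthy" }, // generic CI flag, no vendor name
  { name: "github_actions", envVar: "GITHUB_ACTIONS", mode: "truthy" },
  { name: "jenkins", envVar: "JENKINS_URL", mode: "presence" },
];

function matches(p: CIProvider, env: Env): boolean {
  const value = env[p.envVar];
  switch (p.mode) {
    case "presence":
      return typeof value === "string";
    case "truthy":
      // Assumed meaning of "truthy" for this sketch.
      return value === "true" || value === "1";
  }
}

function detectCI(env: Env): boolean {
  return CI_PROVIDERS.some((p) => matches(p, env));
}

function getCIName(env: Env): string | null {
  const hit = CI_PROVIDERS.find((p) => p.name !== null && matches(p, env));
  return hit ? hit.name : null;
}
```

Because mode is required, an entry that matches nothing can no longer be written, and the switch in matches is checked for exhaustiveness by the compiler.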
Everything else looks good
- drainQueueToPayload dedup is clean
- detectWSL extraction to platform.ts is the right call for the import cycle
- Privacy posture (existence-only, $ip: null, opt-out respected) is solid
- Test that secret-shaped values never leak is a great pattern
- CI_PROVIDERS table is much cleaner than the OR-chain
- The copilot_agent requiring both GITHUB_ACTIONS + COPILOT_AGENT_ID is correct, tested and documented
Addresses PR feedback from @magi: kernel string `4.4.0` is also the Ubuntu 16.04 LTS / older-real-kernel version, so accepting it alone false-positives. Now `4.4.0` only counts as gVisor when /proc/version also contains "gVisor". `*-gvisor` kernel strings remain standalone-sufficient since no real production kernel reports them.

Adds a regression test that an Ubuntu 16.04 box reporting `Linux version 4.4.0-1128-aws (buildd@lcy01)` is NOT classified as a gVisor sandbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from @miguel-heygen's review:

1. HERMES_QUIET: switch to existence check. `env["HERMES_QUIET"] === "1"` was brittle vs. future Hermes changes (e.g. if cli.py ever sets it to "true"). The var name itself is specific enough that existence is the right signal.
2. CI_PROVIDERS: convert to a discriminated union. `mode: "truthy" | "presence"` is stricter than the previous pair of optional boolean flags (which allowed entries with neither set).
3. Sandbox detection tests: add coverage.
   - Docker positive: /.dockerenv present → docker.
   - Negative case: plain Linux laptop with no markers → null.

Together with the gVisor 4.4.0 fix in the previous commit, that addresses all three actionable callouts (the discriminated-union nit was non-blocking but worth doing while in the file).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks for the thorough review @miguel-heygen — all four points addressed across two follow-up commits.
Audit of every detection rule in the registry against actual vendor source code. Rules that lacked a public-source citation were guesses and have been removed; surviving rules now all cite the file + line that emits the marker.

Codex (replace per @magi's investigation):
- Drop CODEX_HOME (config override read at startup, NOT propagated to child processes; would miss most Codex invocations).
- Drop CODEX_SANDBOX (macOS Seatbelt only; covered by the others).
- Add CODEX_THREAD_ID (set unconditionally on every spawned shell command: codex-rs/protocol/src/shell_environment.rs:6 + codex-rs/core/src/unified_exec/process_manager.rs:1010).
- Add CODEX_CI (hardcoded in UNIFIED_EXEC_ENV: process_manager.rs:70).
- Keep CODEX_SANDBOX_NETWORK_DISABLED (default-on sandbox marker: codex-rs/core/src/sandboxing/mod.rs:135-138).

Cursor: drop unverified CURSOR_TRACE_ID and CURSOR_AGENT guesses. Keep TERM_PROGRAM=cursor (set by Cursor's integrated terminal).

Pi: new rule. https://github.com/earendil-works/pi packages/coding-agent/src/cli.ts:13 unconditionally executes process.env.PI_CODING_AGENT = "true"; at module entry, so every subprocess Pi spawns sees this marker. Same propagation pattern as Hermes.

Removed (no source-cited marker found in this audit):
- aider: verified Aider sets no AIDER_* env vars; only OR_SITE_URL and OR_APP_NAME (OpenRouter integration). No reliable marker.
- gemini_cli: GEMINI_SANDBOX/GEMINI_CLI_TRUST_WORKSPACE are conditional on CLI flags; no unconditional marker found.
- jules, devin: closed source, no public marker documentation.

These vendors can be re-added later with a source citation; absence in the registry will silently false-negative (events land in the null bucket), but won't false-positive on other vendors.

Per @james-russo's review: do source-level research before shipping detection rules. Memory updated to enforce this for future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
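For readers who have not opened agent_runtime.ts, the rule registry described in this commit can be pictured roughly as follows. This is a hedged sketch, not the file's contents: the { name, check } shape and first-match-wins ordering come from the PR description, the env var names come from this thread, and the has helper, the "pi" label, the Env type, and the use of existence checks for both copilot_agent variables are assumptions.

```typescript
type Env = Record<string, string | undefined>;
type VendorRule = { name: string; check: (env: Env) => boolean };

// Existence only: the value is never inspected or returned.
const has = (env: Env, key: string): boolean => typeof env[key] === "string";

// First match wins, so more specific rules sit above broader ones.
const VENDOR_RULES: VendorRule[] = [
  {
    name: "copilot_agent",
    check: (env) => has(env, "GITHUB_ACTIONS") && has(env, "COPILOT_AGENT_ID"),
  },
  {
    name: "codex",
    check: (env) =>
      has(env, "CODEX_THREAD_ID") ||
      has(env, "CODEX_CI") ||
      has(env, "CODEX_SANDBOX_NETWORK_DISABLED"),
  },
  { name: "pi", check: (env) => has(env, "PI_CODING_AGENT") },
  { name: "hermes", check: (env) => has(env, "HERMES_QUIET") },
];

function detectAgentRuntime(env: Env): string | null {
  for (const rule of VENDOR_RULES) {
    if (rule.check(env)) return rule.name;
  }
  return null; // unknown or unlisted vendors land in the null bucket
}
```

A vendor missing from the table simply yields null, which is the false-negative behaviour the commit message describes.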
miguel-heygen
left a comment
All four review points addressed:
- gVisor 4.4.0 false positive: Fixed. isGVisor() now only trusts kernel.includes("gvisor") directly; the 4.4.0 path falls through to the /proc/version check. Test added for Ubuntu 16.04 negative case. ✅
- Sandbox detection tests: Docker (.dockerenv mock) and plain-Linux-null tests added. ✅
- Hermes value check: Changed from === "1" to typeof === "string". ✅
- CI_PROVIDERS type: Not addressed, but not blocking.
Codex detection rebuilt with source-verified signals (CODEX_THREAD_ID, CODEX_CI, CODEX_SANDBOX_NETWORK_DISABLED). Unverified vendors (aider, gemini_cli, jules, devin) dropped — right call. Pi added with source citation.
The copilot_agent caveat in the code comment is honest ("not yet verified from a public-source citation") — fine to ship as-is since the var names are specific enough to not false-positive.
Ship it.
vanceingalls
left a comment
Post-merge advisory review (this PR is already merged into main). Findings flagged for follow-up, not gating.
@miguel-heygen's review covered the substantive pre-merge issues (gVisor 4.4.0 false positive, missing sandbox tests, Hermes existence check, CI_PROVIDERS type) — all fixed before merge. Below are the gaps not yet on the PR.
Strengths
- Detection is well-sourced throughout: every rule in VENDOR_RULES has a file:line citation from the upstream repo, and the decision to drop unverified vendors (aider, gemini_cli, jules, devin) in favour of a clean false-negative is explicitly documented (agent_runtime.ts ~line 65). That discipline is correct.
- drainQueueToPayload dedup in client.ts eliminates real divergence risk between flush and flushSync. Good refactor with zero behaviour change.
- The "existence only, never read values" invariant is tested with a secret-shaped sibling var. Good pattern.
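The shape of the drainQueueToPayload refactor can be pictured as below. This is only a sketch of the pattern: the TelemetryEvent type, the { batch } payload layout, and the injected send callbacks are hypothetical, and client.ts may differ in every detail other than the three function names.

```typescript
type TelemetryEvent = { event: string; properties: Record<string, unknown> };

const queue: TelemetryEvent[] = [];

// Shared step: empty the queue and build the request body,
// or return null when nothing is queued.
function drainQueueToPayload(): string | null {
  if (queue.length === 0) return null;
  const batch = queue.splice(0, queue.length); // removes and returns all items
  return JSON.stringify({ batch });
}

// Both flush paths differ only in how the body is sent.
// The transports are injected so the sketch has no network dependency.
async function flush(send: (body: string) => Promise<void>): Promise<void> {
  const payload = drainQueueToPayload();
  if (payload !== null) await send(payload);
}

function flushSync(send: (body: string) => void): void {
  const payload = drainQueueToPayload();
  if (payload !== null) send(payload);
}
```

With the batch-building code in one place, a change to the payload format cannot land in one flush path and be forgotten in the other.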
Findings
should-be-follow-up-ticket — cursor and replit rules conflate IDE-usage with agent-driven invocation (agent_runtime.ts lines ~93, ~105)
TERM_PROGRAM=cursor is exported by Cursor's integrated terminal for any human typing in it, not just Cursor Background Agent. REPL_ID / REPLIT_USER are present in any Replit workspace regardless of whether a human or an agent is at the keyboard.
The PR description frames agent_runtime as distinguishing "ephemeral cloud sandbox driving the CLI" from "real developer laptop." These two rules don't achieve that — they identify the host environment, not whether an agent is in the loop. A human using Cursor's terminal or coding in Replit will be counted in the same bucket as a Cursor/Replit background agent.
Downstream effect: the "DAU by agent_runtime" tile proposed in the PR will overcount agentic invocations for these two vendors. Worth documenting the semantic as "associated with agent vendor's ecosystem" rather than "driven by an agent" in the field's JSDoc comment, and noting the limitation in the dashboard tile description.
should-be-follow-up-ticket (nit) — duplicate Docker detection (system.ts:detectDocker vs agent_runtime.ts:isDocker)
system.ts has detectDocker() (~line 95, used for the existing is_docker field) and agent_runtime.ts has isDocker() (~line 156, used for sandbox_runtime). Both check /.dockerenv + /proc/1/cgroup for the same strings. Two implementations of the same predicate that can drift independently. detectDocker() in system.ts could import from agent_runtime.ts or both could delegate to the shared platform.ts helper, the same way detectWSL was extracted.
nit — "once at module load" claim in PR description doesn't match the code
PR description says: "Detection is cheap (one cached read per process), runs once at module load." Actual execution is lazy — detectSandboxRuntime() and detectAgentRuntime() are called inside getSystemMeta() (system.ts:71-72), which runs on first access (after cached == null). Not wrong in practice, just slightly misleading documentation. Module-load caching would require a top-level const assignment; this is first-call caching. Low stakes.
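The difference between the two caching styles can be shown in a few lines. The sketch below is illustrative only: expensiveProbe and probeRuns are hypothetical stand-ins for the real detection calls, and only the cached == null guard mirrors what is described above.

```typescript
// Counts how often the (stand-in) detection work actually runs.
let probeRuns = 0;

// Hypothetical stand-in for detectSandboxRuntime() / detectAgentRuntime().
function expensiveProbe(): string {
  probeRuns += 1;
  return "docker";
}

type SystemMeta = { sandbox_runtime: string };
let cached: SystemMeta | null = null;

// First-call caching: importing this module runs no probe;
// the first caller pays the cost and later callers reuse the result.
function getSystemMeta(): SystemMeta {
  if (cached == null) {
    cached = { sandbox_runtime: expensiveProbe() };
  }
  return cached;
}

// Module-load caching would instead be a top-level assignment such as
//   const META = { sandbox_runtime: expensiveProbe() };
// which runs the probe at import time even if no event is ever sent.
```

Either way the probe runs at most once per process, which is why the mismatch is a documentation nit rather than a behavioural one.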
nit — copilot_agent self-citation gap (agent_runtime.ts ~line 83)
The code comment says: "Not yet verified from a public-source citation in this audit; the var names below match GitHub Copilot Coding Agent documentation but should be confirmed." James acknowledged in the PR response that this is fine to ship given the specificity of COPILOT_AGENT_ID. Agreed — but the comment is a paper trail that the contract isn't confirmed. A follow-up issue to verify or cite (or add "unverified" to the JSDoc the way other unverified vendors were simply dropped) would close the loop cleanly.
Verdict: COMMENT (post-merge: With fixes)
Reasoning: Core correctness issues were fixed pre-merge by Miguel + James. The remaining gaps are semantic (Cursor/Replit over-counting) and hygiene (duplicate Docker predicate, doc mismatch) — none block product value, but the semantic one is worth a note in the dashboard tile before drawing agentic-DAU conclusions.
— Vai

What
Two new properties on every CLI telemetry event:
- sandbox_runtime: 'gvisor' | 'firecracker' | 'docker' | 'kvm' | 'wsl' | null. What kind of managed sandbox is hosting this CLI invocation.
- agent_runtime: 'claude_code' | 'codex' | 'cursor' | 'copilot_agent' | 'jules' | 'replit' | 'devin' | 'aider' | 'gemini_cli' | 'hermes' | 'openclaw' | null. Which coding-agent vendor drove this invocation, if any.

Both are null on a regular developer laptop. Detection is cheap (one cached read per process), runs once at module load, never reads env-var values, and respects the existing HYPERFRAMES_NO_TELEMETRY=1 opt-out.

Why
Right now we can tell that someone is on Linux non-Docker non-WSL non-TTY (the cloud_vm bucket on the dashboard), but we can't tell which cloud sandbox vendor. A recent spike (~1k → ~19k DAU in 3 days) all on US AWS+GCP regions turned out to be gVisor sandboxes, almost certainly OpenAI Codex Cloud, but the only way to know was to forensically grep kernel strings. With sandbox_runtime we'd see a labelled bucket immediately; with agent_runtime we'd see the actual vendor name when they set an env var.

The two fields together also let us split DAU/retention by managed-sandbox vs. real laptop, which is the most useful product split (different acquisition stories, different LTV).

How
sandbox_runtime detection is layered:
- gVisor: kernel string '4.19.0-gvisor' (current) or '4.4.0' (legacy Sentry kernel), backed by /proc/version grep for "gVisor". Both are unambiguous: no production Linux box reports either today.
- Docker: /.dockerenv + /proc/1/cgroup probe.
- Firecracker: /dev/vsock exists AND DMI sys_vendor='Amazon EC2' AND product_name contains 'Firecracker'.
- KVM: sys_vendor is 'QEMU' or contains 'KVM'.
- WSL: detectWSL lives in platform.ts and is shared between system.ts and agent_runtime.ts.

agent_runtime detection is a table of { name, check } rules in agent_runtime.ts. Each rule keys on the existence of a vendor-specific env var, never the value. The two new entries:
- hermes keys on HERMES_QUIET=1: cli.py:50 of NousResearch/hermes-agent executes os.environ["HERMES_QUIET"] = "1" unconditionally at module load, so the marker propagates to every subprocess Hermes spawns.
- openclaw keys on OPENCLAW_STATE_DIR or OPENCLAW_CONFIG_PATH: extensions/qa-matrix/src/runners/contract/scenario-runtime-cli.ts:344-351 of openclaw/openclaw sets both explicitly in the spawned child env.

The rule list is rule-order-priority (first match wins) so more specific rules can sit above broader ones (e.g. copilot_agent must check GITHUB_ACTIONS=true AND COPILOT_AGENT_ID, so it sits above any future generic CI rule).

Drive-by cleanups
system.ts and client.ts fell into the fallow audit scope of this PR, so I cleaned up the findings rather than ignoring them:
- detectWSL moved into platform.ts to break a system.ts ↔ agent_runtime.ts import cycle.
- detectCI / getCIName collapsed into a single CI_PROVIDERS table (was 10 cyclomatic from the OR chain).
- flush and flushSync now share a drainQueueToPayload helper instead of duplicating the batch-building code.

Privacy posture (unchanged)
- shouldTrack(): HYPERFRAMES_NO_TELEMETRY=1 or DO_NOT_TRACK=1 disables everything.
- typeof env[name] === "string" or env[name] === "1" style existence checks.
- os.hostname() / paths / project names still not collected.
- docs/packages/cli.mdx updated to enumerate the new fields.

Test plan
- agent_runtime.test.ts covers all 11 vendor rules + the gVisor kernel-string path (17 tests).
- GITHUB_ACTIONS=true alone does NOT trigger copilot_agent.
- bun run --cwd packages/cli test: 409/409 pass.
- bunx oxlint + bunx oxfmt --check: clean.
- bun run --cwd packages/cli build: clean.
- fallow audit: 0 findings.
- sandbox_runtime + agent_runtime actually appear on events.
- DAU by sandbox_runtime and DAU by agent_runtime tiles to the HyperFrames Dashboard once events start flowing.