feat(telemetry): fingerprint sandbox runtime and agent vendor #1035

Merged
jrusso1020 merged 4 commits into main from feat/telemetry-agent-runtime-fingerprint
May 22, 2026
@jrusso1020 jrusso1020 commented May 22, 2026

What

Two new properties on every CLI telemetry event:

  • sandbox_runtime: 'gvisor' | 'firecracker' | 'docker' | 'kvm' | 'wsl' | null — what kind of managed sandbox is hosting this CLI invocation.
  • agent_runtime: 'claude_code' | 'codex' | 'cursor' | 'copilot_agent' | 'jules' | 'replit' | 'devin' | 'aider' | 'gemini_cli' | 'hermes' | 'openclaw' | null — which coding-agent vendor drove this invocation, if any.

Both are null on a regular developer laptop. Detection is cheap (one cached read per process), runs once at module load, never reads env-var values, and respects the existing HYPERFRAMES_NO_TELEMETRY=1 opt-out.

Why

Right now we can tell that someone is on Linux non-Docker non-WSL non-TTY (the cloud_vm bucket on the dashboard), but we can't tell which cloud sandbox vendor. A recent spike (~1k → ~19k DAU in 3 days) all on US AWS+GCP regions turned out to be gVisor sandboxes — almost certainly OpenAI Codex Cloud — but the only way to know was to forensically grep kernel strings. With sandbox_runtime we'd see a labelled bucket immediately; with agent_runtime we'd see the actual vendor name when they set an env var.

The two fields together also let us split DAU/retention by managed-sandbox vs. real laptop, which is the most useful product split (different acquisition stories, different LTV).

How

sandbox_runtime detection is layered:

  • gVisor — kernel string '4.19.0-gvisor' (current) or '4.4.0' (legacy Sentry kernel), backed by /proc/version grep for "gVisor". Both are unambiguous: no production Linux box reports either today.
  • Docker — reuses the existing /.dockerenv + /proc/1/cgroup probe.
  • Firecracker: /dev/vsock exists AND DMI sys_vendor='Amazon EC2' AND product_name contains 'Firecracker'.
  • KVM — DMI sys_vendor is 'QEMU' or contains 'KVM'.
  • WSL — moved into the new platform.ts and shared between system.ts and agent_runtime.ts.
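
A minimal sketch of the layered order described above, not the code in this PR. The `Probes` interface and the `fakeProbes` helper are hypothetical seams introduced here so the logic can be exercised without touching the real filesystem. The DMI paths under `/sys/class/dmi/id/`, the `docker` cgroup substring, and the `microsoft` marker for WSL are assumptions. The sketch also assumes the list order is the evaluation order, and it takes the conservative reading of the gVisor rule, in which a bare `4.4.0` kernel string counts only when `/proc/version` names gVisor.

```typescript
type SandboxRuntime = "gvisor" | "firecracker" | "docker" | "kvm" | "wsl" | null;

// Hypothetical seam: the real module presumably calls node:os and node:fs directly.
interface Probes {
  kernelRelease: string; // e.g. what os.release() returns
  exists(path: string): boolean;
  readText(path: string): string | null; // null when the file is missing or unreadable
}

function detectSandboxRuntime(p: Probes): SandboxRuntime {
  const procVersion = p.readText("/proc/version") ?? "";
  const sysVendor = (p.readText("/sys/class/dmi/id/sys_vendor") ?? "").trim();
  const productName = (p.readText("/sys/class/dmi/id/product_name") ?? "").trim();

  // gVisor: a "*-gvisor" kernel string, or /proc/version naming gVisor
  // (the second condition also covers the legacy "4.4.0" Sentry kernel).
  if (p.kernelRelease.includes("gvisor") || procVersion.includes("gVisor")) return "gvisor";

  // Docker: marker file, or a docker entry in PID 1's cgroup (substring is an assumption).
  if (p.exists("/.dockerenv") || (p.readText("/proc/1/cgroup") ?? "").includes("docker")) {
    return "docker";
  }

  // Firecracker: all three signals must agree.
  if (p.exists("/dev/vsock") && sysVendor === "Amazon EC2" && productName.includes("Firecracker")) {
    return "firecracker";
  }

  // KVM: DMI vendor is QEMU or mentions KVM.
  if (sysVendor === "QEMU" || sysVendor.includes("KVM")) return "kvm";

  // WSL: assumed marker, the Microsoft string in /proc/version.
  if (procVersion.toLowerCase().includes("microsoft")) return "wsl";

  return null;
}

// Builds fake probes from a map of path -> file contents.
function fakeProbes(kernelRelease: string, files: Record<string, string>): Probes {
  return {
    kernelRelease,
    exists: (path) => files[path] !== undefined,
    readText: (path) => files[path] ?? null,
  };
}
```

With these definitions, `detectSandboxRuntime(fakeProbes("6.8.0-100-generic", { "/.dockerenv": "" }))` returns `"docker"`, and an empty probe set on the same kernel returns `null`.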

agent_runtime detection is a table of { name, check } rules in agent_runtime.ts. Each rule keys on the existence of a vendor-specific env var, never the value. The two new entries:

  • hermes keys on HERMES_QUIET=1. In NousResearch/hermes-agent, cli.py:50 executes os.environ["HERMES_QUIET"] = "1" unconditionally at module load, so the marker propagates to every subprocess Hermes spawns.
  • openclaw keys on OPENCLAW_STATE_DIR or OPENCLAW_CONFIG_PATH. In openclaw/openclaw, extensions/qa-matrix/src/runners/contract/scenario-runtime-cli.ts:344-351 sets both explicitly in the spawned child env.

The rule list is evaluated in order and the first match wins, so more specific rules can sit above broader ones (e.g. copilot_agent must check GITHUB_ACTIONS=true AND COPILOT_AGENT_ID, so it sits above any future generic CI rule).
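
As a rough illustration of the first-match-wins table (a sketch, not the contents of agent_runtime.ts): only the three rules spelled out above are included, the `has` helper and the `Env` type are names invented here, and the environment is passed in as a parameter rather than read from `process.env` so the example stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

interface VendorRule {
  name: string;
  check(env: Env): boolean;
}

// Existence check only: reports whether the variable is set, never returns its value.
const has = (env: Env, key: string): boolean => typeof env[key] === "string";

// Order matters: the first rule whose check passes wins.
const VENDOR_RULES: VendorRule[] = [
  {
    name: "copilot_agent",
    check: (env) => env["GITHUB_ACTIONS"] === "true" && has(env, "COPILOT_AGENT_ID"),
  },
  { name: "hermes", check: (env) => env["HERMES_QUIET"] === "1" },
  {
    name: "openclaw",
    check: (env) => has(env, "OPENCLAW_STATE_DIR") || has(env, "OPENCLAW_CONFIG_PATH"),
  },
];

function detectAgentRuntime(env: Env): string | null {
  for (const rule of VENDOR_RULES) {
    if (rule.check(env)) return rule.name;
  }
  return null;
}
```

Because the function returns only the rule's `name`, a secret-shaped value stored in a matched variable cannot reach the telemetry payload through this path.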

Drive-by cleanups

system.ts and client.ts fell into the fallow audit scope of this PR, so I cleaned up the findings rather than ignoring them:

  • detectWSL moved into platform.ts to break a system.ts ↔ agent_runtime.ts import cycle.
  • detectCI / getCIName collapsed into a single CI_PROVIDERS table (the OR chain had a cyclomatic complexity of 10).
  • flush and flushSync now share a drainQueueToPayload helper instead of duplicating the batch-building code.
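
A minimal sketch of the flush / flushSync dedup, under assumptions: the `TelemetryEvent` and `Payload` shapes, the module-level `queue`, and the injected `send` callbacks are all hypothetical. Only the helper name `drainQueueToPayload` comes from this PR.

```typescript
interface TelemetryEvent {
  event: string;
  properties: Record<string, unknown>;
}

interface Payload {
  batch: TelemetryEvent[];
}

const queue: TelemetryEvent[] = [];

// Empties the queue and returns the batch to send, or null when nothing is queued.
function drainQueueToPayload(): Payload | null {
  if (queue.length === 0) return null;
  return { batch: queue.splice(0, queue.length) };
}

// Both flush paths build the batch the same way and differ only in transport.
async function flush(send: (p: Payload) => Promise<void>): Promise<void> {
  const payload = drainQueueToPayload();
  if (payload) await send(payload);
}

function flushSync(send: (p: Payload) => void): void {
  const payload = drainQueueToPayload();
  if (payload) send(payload);
}
```

Keeping the batch-building step in one place means a change to the payload shape cannot be applied to one flush path and missed in the other.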

Privacy posture (unchanged)

  • All collection still gated by shouldTrack(): HYPERFRAMES_NO_TELEMETRY=1 or DO_NOT_TRACK=1 disables everything.
  • No env-var values are ever read; only typeof env[name] === "string" or env[name] === "1" style existence checks.
  • os.hostname() / paths / project names still not collected.
  • Disclosure in docs/packages/cli.mdx updated to enumerate the new fields.
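
A minimal sketch of the opt-out gate, assuming `shouldTrack()` reduces to the two env vars named above (the real function may consult more than this). `collect` and the `Env` parameter are hypothetical names, used here to show that detection is not even invoked once either flag is set.

```typescript
type Env = Record<string, string | undefined>;

// Tracking is allowed only when neither opt-out flag is set to "1".
function shouldTrack(env: Env): boolean {
  return env["HYPERFRAMES_NO_TELEMETRY"] !== "1" && env["DO_NOT_TRACK"] !== "1";
}

// Hypothetical collector: skips detection entirely and returns null when the user opted out.
function collect(
  env: Env,
  detect: () => Record<string, string | null>,
): Record<string, string | null> | null {
  return shouldTrack(env) ? detect() : null;
}
```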

Test plan

  • Unit tests added — agent_runtime.test.ts covers all 11 vendor rules + the gVisor kernel-string path (17 tests).
  • Test that vendor env-var values are never surfaced (uses a secret-shaped sibling var).
  • Test that generic GITHUB_ACTIONS=true alone does NOT trigger copilot_agent.
  • bun run --cwd packages/cli test — 409/409 pass.
  • bunx oxlint + bunx oxfmt --check — clean.
  • bun run --cwd packages/cli build — clean.
  • fallow audit — 0 findings.
  • Verify in dev sandbox post-merge that sandbox_runtime + agent_runtime actually appear on events.
  • Add DAU by sandbox_runtime and DAU by agent_runtime tiles to the HyperFrames Dashboard once events start flowing.

Add two new properties to every CLI telemetry event so we can tell
managed-sandbox traffic (Codex Cloud, Claude Code Web, etc.) apart from
real developer laptops without geolocation guesswork:

- sandbox_runtime: 'gvisor' | 'firecracker' | 'docker' | 'kvm' | 'wsl' | null
  gVisor detected via kernel string ('4.19.0-gvisor' or legacy Sentry
  '4.4.0') + /proc/version. Firecracker via /dev/vsock + DMI sys_vendor.
  Docker reuses the existing /.dockerenv + cgroup probe.

- agent_runtime: claude_code | codex | cursor | copilot_agent | jules
  | replit | devin | aider | gemini_cli | hermes | openclaw | null
  Detected by the EXISTENCE of well-known vendor env vars only — values
  are never read. Hermes rule keys on HERMES_QUIET=1 (set unconditionally
  at hermes-agent/cli.py:50). openclaw rule keys on OPENCLAW_STATE_DIR
  or OPENCLAW_CONFIG_PATH (set explicitly in the spawned child env at
  openclaw/extensions/qa-matrix/src/runners/contract/scenario-runtime-cli.ts).

Drive-by cleanups required by fallow because system.ts and client.ts
fall into the audit scope of this PR:
- Extract detectWSL into platform.ts to break the system.ts ↔ agent_runtime.ts cycle.
- Refactor detectCI / getCIName into a single CI_PROVIDERS table.
- Dedupe flush / flushSync via a shared drainQueueToPayload helper.

Privacy posture unchanged: HYPERFRAMES_NO_TELEMETRY=1 still opts out;
disclosure in docs/packages/cli.mdx updated to enumerate the new fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



@miguel-heygen miguel-heygen left a comment


Review: feat(telemetry): fingerprint sandbox runtime and agent vendor

Nice work — the PR description is excellent and the detection approach is well-researched. A few things:

kernel === "4.4.0" false positive risk

isGVisor() returns true for any Linux box reporting kernel 4.4.0 without checking /proc/version:

function isGVisor(): boolean {
  const kernel = release();
  if (kernel === "4.4.0" || kernel.includes("gvisor")) return true;  // ← 4.4.0 skips /proc/version
  ...

While the comment says "no production Linux box reports either today," 4.4.0 is a real upstream LTS kernel (2016). Embedded systems, Raspberry Pi images, and older distros could report it. The /proc/version backup check is more reliable — should the 4.4.0 branch also require the /proc/version confirmation?

// Suggestion: only trust the exact gvisor string without backup
if (kernel.includes("gvisor")) return true;
if (kernel === "4.4.0" && platform() === "linux") {
  try { return readFileSync("/proc/version", "utf-8").includes("gVisor"); }
  catch { return false; }
}

This way the legacy Sentry kernel still gets detected, but only after confirming via /proc/version.

Missing sandbox detection tests

Only gVisor has test coverage. Docker, Firecracker, KVM, and WSL are untested. These are straightforward file-existence checks, but since the gVisor tests set a good precedent with vi.doMock("node:os"), would be nice to see at least one Docker test (mock existsSync("/.dockerenv")) and one negative case (plain Linux → null).

HERMES_QUIET value check

Most rules check existence (typeof env[key] === "string"), but Hermes checks an exact value (env["HERMES_QUIET"] === "1"). If Hermes ever changes that value (or sets it to "true"), this breaks silently. Consider typeof env["HERMES_QUIET"] === "string" for consistency, since the var name itself is specific enough.

CI_PROVIDERS type nit

The truthy and presence properties are both optional booleans — a provider entry could have neither set (matching nothing). A discriminated union would be stricter:

type CIProvider =
  | { name: string | null; envVar: string; mode: "truthy" }
  | { name: string | null; envVar: string; mode: "presence" };

Not blocking, just cleaner.

Everything else looks good

  • drainQueueToPayload dedup is clean
  • detectWSL extraction to platform.ts is the right call for the import cycle
  • Privacy posture (existence-only, $ip: null, opt-out respected) is solid
  • Test that secret-shaped values never leak is a great pattern
  • CI_PROVIDERS table is much cleaner than the OR-chain
  • The copilot_agent requiring both GITHUB_ACTIONS + COPILOT_AGENT_ID is correct — tested and documented

jrusso1020 and others added 2 commits May 22, 2026 22:30
Addresses PR feedback from @magi: kernel string `4.4.0` is also the
Ubuntu 16.04 LTS / older-real-kernel version, so accepting it alone
false-positives. Now `4.4.0` only counts as gVisor when /proc/version
also contains "gVisor". `*-gvisor` kernel strings remain standalone-
sufficient since no real production kernel reports them.

Adds a regression test that an Ubuntu 16.04 box reporting
`Linux version 4.4.0-1128-aws (buildd@lcy01)` is NOT classified as
a gVisor sandbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-ups from @miguel-heygen's review:

1. HERMES_QUIET — switch to existence check.
   `env["HERMES_QUIET"] === "1"` was brittle vs. future Hermes changes
   (e.g. if cli.py ever sets it to "true"). The var name itself is
   specific enough that existence is the right signal.

2. CI_PROVIDERS — convert to a discriminated union.
   `mode: "truthy" | "presence"` is stricter than the previous pair of
   optional boolean flags (which allowed entries with neither set).

3. Sandbox detection tests — add coverage.
   - Docker positive: /.dockerenv present → docker.
   - Negative case: plain Linux laptop with no markers → null.

Together with the gVisor 4.4.0 fix in the previous commit, that addresses
all three actionable callouts (the discriminated-union nit was non-blocking
but worth doing while in the file).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020

Thanks for the thorough review @miguel-heygen — all four points addressed across two follow-up commits.

kernel === "4.4.0" false-positive — fixed in 0dc70db4

isGVisor() now only accepts 4.4.0 when /proc/version ALSO contains "gVisor". The unambiguous *-gvisor kernel string remains standalone-sufficient.

Added a regression test using a real Ubuntu 16.04 LTS-style /proc/version string (Linux version 4.4.0-1128-aws (buildd@lcy01)) to confirm legacy boxes no longer get bucketed as a gVisor sandbox.
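
The tightened check can be sketched as a pure function (an illustration, not the code from 0dc70db4). `isGVisorKernel` is a hypothetical name, and the kernel release and `/proc/version` contents are passed in as arguments so the Ubuntu 16.04 case can be checked without mocking `node:os` or `node:fs`.

```typescript
function isGVisorKernel(kernelRelease: string, procVersion: string | null): boolean {
  // A "*-gvisor" kernel string is sufficient on its own.
  if (kernelRelease.includes("gvisor")) return true;
  // A bare "4.4.0" is ambiguous, so /proc/version must name gVisor as well.
  if (kernelRelease === "4.4.0") {
    return procVersion !== null && procVersion.includes("gVisor");
  }
  return false;
}
```

Under this sketch, `isGVisorKernel("4.4.0", "Linux version 4.4.0-1128-aws (buildd@lcy01)")` returns `false`, while the same kernel string paired with a `/proc/version` that contains "gVisor" returns `true`.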

Missing sandbox detection tests — added in 9af96bb0

Two new tests in agent_runtime.test.ts:

  • Docker positive: vi.doMock("node:fs") makes existsSync("/.dockerenv") return true → expects "docker".
  • Plain-Linux negative: all existsSync returns false, /proc/version is a vanilla 6.8.0-100-generic string → expects null.

Firecracker and KVM are layered DMI reads (sys_vendor + product_name + /dev/vsock), so mocking them gets verbose; I left those uncovered for now since the plain-Linux negative case indirectly exercises the fall-through to those branches. Happy to add focused tests if you want.

HERMES_QUIET value vs existence — switched in 9af96bb0

Good catch. Now typeof env["HERMES_QUIET"] === "string" — keying on existence rather than the literal "1". If Nous ever changes the value in cli.py:50 to "true" or anything else, we still detect. The var name itself is specific enough.

CI_PROVIDERS discriminated union — refactored in 9af96bb0

type CIProvider =
  | { name: string | null; envVar: string; mode: "truthy" }
  | { name: string | null; envVar: string; mode: "presence" };

matchesProvider now switches on p.mode. Stricter than the previous pair of optional booleans (which allowed entries with neither set).
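
A sketch of how `matchesProvider` might switch on `p.mode`, assuming `truthy` means set to something other than an empty string, `0`, or `false`, and `presence` means set at all. The `CI_PROVIDERS` entries and the `detectCI` / `getCIName` bodies below are illustrative, not the table from system.ts.

```typescript
type Env = Record<string, string | undefined>;

type CIProvider =
  | { name: string | null; envVar: string; mode: "truthy" }
  | { name: string | null; envVar: string; mode: "presence" };

function matchesProvider(p: CIProvider, env: Env): boolean {
  const value = env[p.envVar];
  switch (p.mode) {
    case "presence":
      return typeof value === "string";
    case "truthy":
      return typeof value === "string" && value !== "" && value !== "0" && value !== "false";
  }
}

// Illustrative entries; a null name marks a generic CI signal with no provider label.
const CI_PROVIDERS: CIProvider[] = [
  { name: "github_actions", envVar: "GITHUB_ACTIONS", mode: "presence" },
  { name: "gitlab_ci", envVar: "GITLAB_CI", mode: "presence" },
  { name: null, envVar: "CI", mode: "truthy" },
];

function detectCI(env: Env): boolean {
  return CI_PROVIDERS.some((p) => matchesProvider(p, env));
}

function getCIName(env: Env): string | null {
  for (const p of CI_PROVIDERS) {
    if (matchesProvider(p, env)) return p.name;
  }
  return null;
}
```

Because `mode` is required on every variant, an entry that matches nothing (the failure mode of two optional booleans) no longer type-checks.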


Re Trae / Windsurf (from the Slack thread): will skim their docs/repos for env-var markers and PR a follow-up if I find one. If they're internal-adjacent and we can ask the team to set a vendor env var directly, that's the cleanest contract.

Audit of every detection rule in the registry against actual vendor
source code. Rules that lacked a public-source citation were guesses
and have been removed; surviving rules now all cite the file + line
that emits the marker.

Codex — replace per @magi's investigation:
- Drop CODEX_HOME (config override read at startup, NOT propagated to
  child processes — would miss most Codex invocations).
- Drop CODEX_SANDBOX (macOS Seatbelt only; covered by the others).
- Add CODEX_THREAD_ID (set unconditionally on every spawned shell
  command — codex-rs/protocol/src/shell_environment.rs:6 +
  codex-rs/core/src/unified_exec/process_manager.rs:1010).
- Add CODEX_CI (hardcoded in UNIFIED_EXEC_ENV — process_manager.rs:70).
- Keep CODEX_SANDBOX_NETWORK_DISABLED (default-on sandbox marker —
  codex-rs/core/src/sandboxing/mod.rs:135-138).

Cursor — drop unverified CURSOR_TRACE_ID and CURSOR_AGENT guesses.
Keep TERM_PROGRAM=cursor (set by Cursor's integrated terminal).

Pi — new rule. https://github.com/earendil-works/pi
packages/coding-agent/src/cli.ts:13 unconditionally executes
  process.env.PI_CODING_AGENT = "true";
at module entry, so every subprocess Pi spawns sees this marker.
Same propagation pattern as Hermes.

Removed (no source-cited marker found in this audit):
- aider — verified Aider sets no AIDER_* env vars; only OR_SITE_URL and
  OR_APP_NAME (OpenRouter integration). No reliable marker.
- gemini_cli — GEMINI_SANDBOX/GEMINI_CLI_TRUST_WORKSPACE are conditional
  on CLI flags; no unconditional marker found.
- jules, devin — closed source, no public marker documentation.

These vendors can be re-added later with a source citation; absence
in the registry will silently false-negative (events land in the null
bucket), but won't false-positive on other vendors.

Per @james-russo's review: do source-level research before shipping
detection rules. Memory updated to enforce this for future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@miguel-heygen miguel-heygen left a comment


All four review points addressed:

  1. gVisor 4.4.0 false positive — Fixed. isGVisor() now only trusts kernel.includes("gvisor") directly; the 4.4.0 path falls through to the /proc/version check. Test added for Ubuntu 16.04 negative case. ✅
  2. Sandbox detection tests — Docker (.dockerenv mock) and plain-Linux-null tests added. ✅
  3. Hermes value check — Changed from === "1" to typeof === "string". ✅
  4. CI_PROVIDERS type — Not addressed, but not blocking.

Codex detection rebuilt with source-verified signals (CODEX_THREAD_ID, CODEX_CI, CODEX_SANDBOX_NETWORK_DISABLED). Unverified vendors (aider, gemini_cli, jules, devin) dropped — right call. Pi added with source citation.

The copilot_agent caveat in the code comment is honest ("not yet verified from a public-source citation") — fine to ship as-is since the var names are specific enough to not false-positive.

Ship it.

@jrusso1020 jrusso1020 merged commit e2ad165 into main May 22, 2026
37 checks passed
jrusso1020 added a commit that referenced this pull request May 22, 2026
@jrusso1020 jrusso1020 deleted the feat/telemetry-agent-runtime-fingerprint branch May 22, 2026 23:10

@vanceingalls vanceingalls left a comment


Post-merge advisory review (this PR is already merged into main). Findings flagged for follow-up, not gating.

@miguel-heygen's review covered the substantive pre-merge issues (gVisor 4.4.0 false positive, missing sandbox tests, Hermes existence check, CI_PROVIDERS type) — all fixed before merge. Below are the gaps not yet on the PR.


Strengths

  • Detection is well-sourced throughout — every rule in VENDOR_RULES has a file:line citation from the upstream repo, and the decision to drop unverified vendors (aider, gemini_cli, jules, devin) in favour of a clean false-negative is explicitly documented (agent_runtime.ts ~line 65). That discipline is correct.
  • drainQueueToPayload dedup in client.ts eliminates real divergence risk between flush and flushSync — good refactor with zero behaviour change.
  • The "existence only, never read values" invariant is tested with a secret-shaped sibling var. Good pattern.

Findings

should-be-follow-up-ticket — cursor and replit rules conflate IDE-usage with agent-driven invocation (agent_runtime.ts lines ~93, ~105)

TERM_PROGRAM=cursor is exported by Cursor's integrated terminal for any human typing in it, not just Cursor Background Agent. REPL_ID / REPLIT_USER are present in any Replit workspace regardless of whether a human or an agent is at the keyboard.

The PR description frames agent_runtime as distinguishing "ephemeral cloud sandbox driving the CLI" from "real developer laptop." These two rules don't achieve that — they identify the host environment, not whether an agent is in the loop. A human using Cursor's terminal or coding in Replit will be counted in the same bucket as a Cursor/Replit background agent.

Downstream effect: the "DAU by agent_runtime" tile proposed in the PR will overcount agentic invocations for these two vendors. Worth documenting the semantic as "associated with agent vendor's ecosystem" rather than "driven by an agent" in the field's JSDoc comment, and noting the limitation in the dashboard tile description.

should-be-follow-up-ticket (nit) — duplicate Docker detection (system.ts:detectDocker vs agent_runtime.ts:isDocker)

system.ts has detectDocker() (~line 95, used for the existing is_docker field) and agent_runtime.ts has isDocker() (~line 156, used for sandbox_runtime). Both check /.dockerenv + /proc/1/cgroup for the same strings. Two implementations of the same predicate that can drift independently. detectDocker() in system.ts could import from agent_runtime.ts or both could delegate to the shared platform.ts helper, the same way detectWSL was extracted.

nit — "once at module load" claim in PR description doesn't match the code

PR description says: "Detection is cheap (one cached read per process), runs once at module load." Actual execution is lazy — detectSandboxRuntime() and detectAgentRuntime() are called inside getSystemMeta() (system.ts:71-72), which runs on first access (after cached == null). Not wrong in practice, just slightly misleading documentation. Module-load caching would require a top-level const assignment; this is first-call caching. Low stakes.
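
To make the distinction concrete, a small sketch with stubbed detectors (names other than `getSystemMeta`, `detectSandboxRuntime`, and `detectAgentRuntime` are hypothetical): first-call caching defers the work until someone asks for the metadata, whereas module-load caching would run it as a top-level assignment during import.

```typescript
interface SystemMeta {
  sandbox_runtime: string | null;
  agent_runtime: string | null;
}

// Counts how often detection actually runs, to make the caching visible.
let detectionRuns = 0;

// Stub detectors standing in for the real probes.
function detectSandboxRuntime(): string | null {
  detectionRuns += 1;
  return null;
}
function detectAgentRuntime(): string | null {
  return null;
}

// First-call caching: nothing runs until getSystemMeta() is first invoked.
let cached: SystemMeta | null = null;
function getSystemMeta(): SystemMeta {
  if (cached == null) {
    cached = {
      sandbox_runtime: detectSandboxRuntime(),
      agent_runtime: detectAgentRuntime(),
    };
  }
  return cached;
}

// Module-load caching would instead be a top-level assignment such as
//   const META: SystemMeta = { sandbox_runtime: detectSandboxRuntime(), agent_runtime: detectAgentRuntime() };
// which is evaluated once at import time, whether or not anyone reads it.
```

In this sketch `detectionRuns` stays at 0 until the first `getSystemMeta()` call and stays at 1 on every call after that.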

nit — copilot_agent self-citation gap (agent_runtime.ts ~line 83)

The code comment says: "Not yet verified from a public-source citation in this audit; the var names below match GitHub Copilot Coding Agent documentation but should be confirmed." James acknowledged in the PR response that this is fine to ship given the specificity of COPILOT_AGENT_ID. Agreed — but the comment is a paper trail that the contract isn't confirmed. A follow-up issue to verify or cite (or add "unverified" to the JSDoc the way other unverified vendors were simply dropped) would close the loop cleanly.


Verdict: COMMENT (post-merge: With fixes)
Reasoning: Core correctness issues were fixed pre-merge by Miguel + James. The remaining gaps are semantic (Cursor/Replit over-counting) and hygiene (duplicate Docker predicate, doc mismatch) — none block product value, but the semantic one is worth a note in the dashboard tile before drawing agentic-DAU conclusions.

— Vai
