feat(telemetry): schema v3 extension — env_kind, activation funnel, autostart (spec 044) #401

Merged
Dumbris merged 31 commits into main from 044-retention-telemetry-v3 on Apr 24, 2026

@Dumbris Dumbris commented Apr 24, 2026

Summary

Client-side implementation of spec 044: extends the existing v3 heartbeat payload
(spec 042) with activation, anonymity, and launch-source fields so retention
analytics can distinguish real humans from CI and measure IDE wire-up.

Companion PRs:

What ships

  • env_kind client-side detector: interactive | ci | cloud_ide | container | headless | unknown (decision tree per design §4.2, cached via sync.Once)
  • launch_source: tray | login_item | cli | installer | unknown (PPID heuristic + installer env-var one-shot)
  • env_markers: booleans only — has_ci_env, has_cloud_ide_env, is_container, has_tty, has_display
  • autostart_enabled (bool | null): macOS reads SMAppService state via tray sidecar file (~/.mcpproxy/tray-autostart.json, 1h TTL); Windows/Linux → nil today
  • activation object:
    • Monotonic first_connected_server_ever, first_mcp_client_ever, first_retrieve_tools_call_ever (BBolt-backed, persist across restarts)
    • mcp_clients_seen_ever — deduped list capped at 16 from MCP initialize.clientInfo.name, path-like values sanitized → "unknown"
    • 24h sliding counters: retrieve_tools_calls_24h, estimated_tokens_saved_24h_bucket (6-bucket: 0 / 1_100 / 100_1k / 1k_10k / 10k_100k / 100k_plus)
    • configured_ide_count
  • macOS tray first-run dialog: default ON with explicit consent language, one-click opt-out
  • Installer post-install launches the tray with MCPPROXY_LAUNCHED_BY=installer one-shot env var
  • anonymity_violations atomic counter + client-side ScanForPII pass on every outgoing payload (refuses to send if any env-var value, hostname, username, or home-dir basename leaks)
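
The six-way bucketing of estimated_tokens_saved_24h_bucket listed above can be sketched as follows. The signature BucketTokens(n int) string is the one named in this PR; which side of each boundary (100, 1k, 10k, 100k) is inclusive is an assumption made for illustration.

```go
package main

import "fmt"

// BucketTokens maps a raw token estimate onto one of six coarse labels so
// the payload never carries an exact count.
// ASSUMPTION: upper bounds are inclusive (100 falls in "1_100"); the PR
// text does not state the boundary handling.
func BucketTokens(n int) string {
	switch {
	case n <= 0:
		return "0"
	case n <= 100:
		return "1_100"
	case n <= 1_000:
		return "100_1k"
	case n <= 10_000:
		return "1k_10k"
	case n <= 100_000:
		return "10k_100k"
	default:
		return "100k_plus"
	}
}

func main() {
	for _, n := range []int{0, 42, 5_000, 250_000} {
		fmt.Printf("%d -> %s\n", n, BucketTokens(n))
	}
}
```

Reporting only the label keeps the heartbeat coarse: two installs with 4,000 and 9,000 saved tokens are indistinguishable on the wire.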

Ground rules observed

  • anonymous_id unchanged (locally-generated UUID).
  • Env markers are booleans only — never env-var values.
  • Schema version stays at 3 (spec 042 already bumped); new fields co-exist.
  • New fields respect the existing opt-out CLI / env-var escape hatch (MCPPROXY_TELEMETRY=false still works).

Verification

  • go test -race ./internal/telemetry/... — PASS (4.4s)
  • go test -race ./internal/runtime/supervisor/... — PASS (2.1s)
  • go build ./... — PASS
  • go build -tags server ./cmd/mcpproxy — PASS
  • Swift tray compile — PASS (5.9 MB binary at /tmp/MCPProxy-retention-telemetry; only pre-existing Sendable warnings on UpdateService, no errors)
  • ScanForPII integration test: full v3 payload passes; synthetic leak-injected payload rejected
  • anonymous_id stability: byte-identical across v2 → v3 upgrades (FR-017)
  • /api/v1/status now exposes env_kind, env_markers, launch_source, autostart_enabled, activation (manually curl-verified)
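
The ScanForPII check exercised above might look roughly like the sketch below. This is not the PR's implementation: scanForPII, its signature, and the minimum-length threshold are all hypothetical, and the caller is assumed to have already collected the sensitive strings (env-var values, hostname, username, home-dir basename).

```go
package main

import (
	"fmt"
	"strings"
)

// scanForPII is a hypothetical stand-in for the PR's ScanForPII. It returns
// every sensitive string that appears anywhere in the serialized payload.
// ASSUMPTION: values shorter than 4 bytes are skipped so that trivial
// strings such as "1" do not match ordinary payload content.
func scanForPII(payload []byte, sensitive []string) []string {
	var leaks []string
	body := string(payload)
	for _, s := range sensitive {
		if len(s) < 4 {
			continue
		}
		if strings.Contains(body, s) {
			leaks = append(leaks, s)
		}
	}
	return leaks
}

func main() {
	sensitive := []string{"build-host-17", "alice.example"}
	clean := []byte(`{"env_kind":"interactive","launch_source":"cli"}`)
	leaky := []byte(`{"env_kind":"interactive","note":"ran on build-host-17"}`)

	// A sender would refuse to transmit whenever the leak list is non-empty.
	fmt.Println(len(scanForPII(clean, sensitive)), len(scanForPII(leaky, sensitive)))
}
```

A non-empty result is the point where a sender would drop the payload and bump a counter such as anonymity_violations.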

Known non-blockers

  • go test -race ./internal/server/ times out at 60s — pre-existing, reproduced on base main; tracked separately.
  • Windows login-item PPID detection deferred — no Windows tray yet.

Spec artifacts

specs/044-retention-telemetry-v3/ — full speckit set (spec / plan / tasks / research / data-model / contracts / quickstart / checklists).

Design: docs/superpowers/specs/2026-04-24-retention-telemetry-hygiene-design.md

claude added 27 commits April 24, 2026 15:00
…igns

Two brainstormed designs produced from 2026-04-24 analysis of telemetry DB:

1. Retention telemetry hygiene + activation instrumentation + auto-start defaults
   - Payload schema v3 with env_kind, launch_source, autostart_enabled, activation funnel
   - Worker + dashboard + client changes across mcpproxy-go / mcpproxy-telemetry / mcpproxy-dash
   - D1 backup mandated before any migration, PII audit included

2. Diagnostics & error taxonomy deep-dive
   - Stable error-code catalog (MCPX_*) with per-code fix steps
   - Surfacing in tray, web UI, CLI; v3 telemetry extension for code counts + fix outcomes

Next steps: each design feeds a speckit.specify flow in its own worktree for
autonomous implementation + verification (unit, e2e, curl, chrome, ui-test MCP).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generated speckit spec/plan/tasks for feature 044-retention-telemetry-v3:
payload v3 extensions for env_kind, launch_source, autostart_enabled,
activation, env_markers. Scoped to mcpproxy-go client only; worker
and dashboard changes are sibling specs.

## Changes
- specs/044-retention-telemetry-v3/spec.md: 4 user stories, 18 FRs, 8 SCs
- specs/044-retention-telemetry-v3/plan.md: technical plan + constitution check
- specs/044-retention-telemetry-v3/research.md: 12 resolved design decisions
- specs/044-retention-telemetry-v3/data-model.md: entities + BBolt schema
- specs/044-retention-telemetry-v3/contracts/heartbeat-v3.json: JSON schema
- specs/044-retention-telemetry-v3/quickstart.md: end-to-end verification
- specs/044-retention-telemetry-v3/tasks.md: 70 tasks, TDD-first
- specs/044-retention-telemetry-v3/checklists/requirements.md: validated
- CLAUDE.md: appended 044 active technologies entry

## Testing
- N/A (docs-only commit). Implementation tasks begin in subsequent
  commits per tasks.md.
…itizer tests (TDD red)

Adds activation_test.go covering:
- T027(a-f): load empty, save/load round-trip, monotonic stickiness, dedup+cap,
  path-like client name → unknown, 24h window decay.
- T028: 100 concurrent IncrementRetrieveToolsCall → count=100 (race-safe).
- T029: BucketTokens bucketing table.

Plus sanitizeClientName unit tests and installer-pending flag round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itizer (TDD green)

- T032: bboltActivationStore with Load/Save/MarkFirst* methods, BBolt-backed.
- T033: BucketTokens(n int) string — 6 fixed buckets per FR-009.
- T034: 24h sliding window decay via readCounterWithDecay helper; window
  resets when (now - window_start) >= 86400s.
- T035: sanitizeClientName(raw) — regex a-z0-9._- (max 64), rejects paths,
  .., '@', whitespace; falls back to "unknown".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
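
A minimal sketch of the sanitizer rules stated for T035 (character set a-z0-9._-, max 64, reject paths, "..", '@', and whitespace, fall back to "unknown"). The exact regex and the lowercasing step are assumptions; the PR does not say whether mixed-case names are normalized or rejected.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// clientNameRE encodes the allowed alphabet and the 64-character cap.
// Slashes, backslashes, '@' and whitespace are outside the class, so
// path-like and address-like values fail the match.
var clientNameRE = regexp.MustCompile(`^[a-z0-9._-]{1,64}$`)

// sanitizeClientName returns a safe client name or "unknown".
// ASSUMPTION: input is lowercased before matching.
func sanitizeClientName(raw string) string {
	name := strings.ToLower(raw)
	// ".." is made of allowed characters, so it needs an explicit check.
	if strings.Contains(name, "..") || !clientNameRE.MatchString(name) {
		return "unknown"
	}
	return name
}

func main() {
	for _, raw := range []string{"claude-code", "/Users/alice/bin/client", "a..b", "me@host", "my client"} {
		fmt.Printf("%q -> %q\n", raw, sanitizeClientName(raw))
	}
}
```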
…nto buildHeartbeat

- T036: HeartbeatPayload.Activation *ActivationState (omitempty) per
  data-model.md.
- T037: Service.SetActivationStore(store, db) + SetConfiguredIDECountProvider;
  buildHeartbeat loads activation state (decay applied at read) and splices
  in ConfiguredIDECount from the provider. Silently omits on load error so
  heartbeat is never blocked by a bucket hiccup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
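
The "decay applied at read" behaviour can be sketched as below, using the reset rule given for T034: the window resets when (now - window_start) >= 86400s. The counter struct and the function signature are assumptions; only the helper name readCounterWithDecay comes from the PR.

```go
package main

import "fmt"

const windowSeconds = 86400 // 24h, per the T034 description

// counter is a hypothetical persisted shape: a count plus the Unix time at
// which the current window started.
type counter struct {
	Count       int64
	WindowStart int64
}

// readCounterWithDecay returns the counter as it should be observed at
// nowUnix. Once the window is 24h old or more, the count is reported as
// zero and a fresh window begins at nowUnix.
func readCounterWithDecay(c counter, nowUnix int64) counter {
	if nowUnix-c.WindowStart >= windowSeconds {
		return counter{Count: 0, WindowStart: nowUnix}
	}
	return c
}

func main() {
	start := int64(1_700_000_000)
	c := counter{Count: 7, WindowStart: start}

	fmt.Println(readCounterWithDecay(c, start+23*3600).Count) // still inside the window
	fmt.Println(readCounterWithDecay(c, start+24*3600).Count) // window expired
}
```

Applying the reset on read means a process that has been idle for more than a day never reports a stale count, even though nothing wrote to the store in the meantime.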
Table-driven test covering research R3 precedence: installer env wins over
tray handshake wins over PPID-login-item wins over TTY-cli, with unknown
fallthrough. Also asserts DetectLaunchSourceOnce caches the first result.

Tests reference undefined symbols (DetectLaunchSource, HandshakeChecker,
PPIDChecker, resetLaunchSourceOnce) — implemented in T048.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ader

T045: autostart_test.go — sidecar-based reader tests (TDD).
T048: DetectLaunchSource + DetectLaunchSourceOnce with research R3
  precedence: installer env → tray handshake → login_item (PPID) →
  cli (TTY) → unknown. Introduces HandshakeChecker + PPIDChecker
  interfaces; default impls delegate to isLoginItemParent() via
  launch_source_ppid.go (ps -o comm= on macOS/Linux; Windows stub for
  future work).
T049: AutostartReader reads ~/.mcpproxy/tray-autostart.json with 1h
  TTL cache. Pragmatic substitute for tray-side socket listener
  (design §7.3) — identical semantics: enabled true/false/nil.
  Linux short-circuits to nil (no tray today).

All tests pass: go test -race -count=1 -run
"TestDetectLaunchSource|TestReadAutostart|TestDefaultAutostartReader"
./internal/telemetry/... → ok

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
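
The R3 precedence described for T048 (installer env → tray handshake → login_item → cli → unknown) and the cache-first-result behaviour can be sketched as follows. The interface names HandshakeChecker and PPIDChecker come from the PR; their method names, the function signature, and the onceSource helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Method names on these interfaces are hypothetical.
type HandshakeChecker interface{ TrayHandshake() bool }
type PPIDChecker interface{ IsLoginItemParent() bool }

// fixed is a test double that answers both checks with a constant.
type fixed bool

func (f fixed) TrayHandshake() bool     { return bool(f) }
func (f fixed) IsLoginItemParent() bool { return bool(f) }

// detectLaunchSource applies the precedence order; the first match wins.
// launchedBy stands in for the value of MCPPROXY_LAUNCHED_BY.
func detectLaunchSource(launchedBy string, hs HandshakeChecker, pp PPIDChecker, hasTTY bool) string {
	switch {
	case launchedBy == "installer":
		return "installer"
	case hs != nil && hs.TrayHandshake():
		return "tray"
	case pp != nil && pp.IsLoginItemParent():
		return "login_item"
	case hasTTY:
		return "cli"
	default:
		return "unknown"
	}
}

// onceSource caches the first detection result, in the spirit of
// DetectLaunchSourceOnce.
type onceSource struct {
	once sync.Once
	val  string
}

func (o *onceSource) get(detect func() string) string {
	o.once.Do(func() { o.val = detect() })
	return o.val
}

func main() {
	var cache onceSource
	first := cache.get(func() string {
		return detectLaunchSource("", fixed(false), fixed(true), true)
	})
	second := cache.get(func() string { return "cli" }) // ignored: already cached
	fmt.Println(first, second)
}
```

Injecting the checkers as interfaces is what lets a table-driven test cover every branch without spawning processes or faking a parent PID.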
T046: installer-heartbeat-pending lifecycle test. Verifies SetInstallerPending(true)
  at startup is consumed on the first resolveLaunchSource() call (emitting
  launch_source=installer) and cleared so subsequent calls return the runtime
  detector's value — crash-safe one-shot contract from design §4.3.
T050: extend HeartbeatPayload with LaunchSource string + AutostartEnabled *bool.
  Pointer on AutostartEnabled preserves the JSON-null tri-state per data-model.md.
T051: wire buildHeartbeat → (a) resolveLaunchSource() with installer override
  + DetectLaunchSourceOnce fallback; (b) lazy-init DefaultAutostartReader on
  first heartbeat, read into AutostartEnabled (nil on Linux / tray absent /
  malformed sidecar). New Service.autostartReader field + SetAutostartReader
  test seam.
T052: in Runtime.SetTelemetry, when MCPPROXY_LAUNCHED_BY=installer is observed
  at startup, set installer_heartbeat_pending=true on the activation store so
  a crash before the first heartbeat still lets us emit launch_source=installer
  once recovery completes.

go test -race -count=1 ./internal/telemetry/... → ok

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
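
The tri-state mentioned for T050 relies on standard encoding/json behaviour: a nil *bool without omitempty marshals as JSON null, which is distinct from false. The struct below is a cut-down illustration, not the PR's HeartbeatPayload, and the exact struct tags are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// heartbeatFields is a hypothetical two-field slice of the payload.
// ASSUMPTION: no omitempty on autostart_enabled, so nil is emitted as null
// rather than dropped from the object.
type heartbeatFields struct {
	LaunchSource     string `json:"launch_source"`
	AutostartEnabled *bool  `json:"autostart_enabled"`
}

// encode returns the JSON form, or an empty string on a marshal error.
func encode(source string, autostart *bool) string {
	b, err := json.Marshal(heartbeatFields{LaunchSource: source, AutostartEnabled: autostart})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	on, off := true, false
	fmt.Println(encode("tray", &on))  // enabled
	fmt.Println(encode("tray", &off)) // explicitly disabled
	fmt.Println(encode("cli", nil))   // unknown: serialized as null
}
```

With a plain bool the "tray absent" and "user turned it off" cases would both serialize as false, which is exactly the ambiguity the pointer avoids.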
…h tag

T053: AutoStartService.swift already exposed register/unregister/isEnabled via
  SMAppService.mainApp — reused as-is; no changes needed.
T054: FirstRunDialog.swift — SwiftUI modal shown once (UserDefaults flag
  MCPProxy.firstRunCompleted). "Launch at login" checkbox defaults to ON
  with opt-out copy "You can turn this off anytime from the tray menu".
  On Continue, register/unregister SMAppService and refresh the sidecar.
T055: AutostartSidecarService.swift — writes ~/.mcpproxy/tray-autostart.json
  {enabled, updated_at}. Called at launch, after first-run, and after the
  tray menu "Run at Startup" toggle. Pragmatic substitute for a tray-hosted
  HTTP /autostart endpoint: identical semantics, read by the core's
  telemetry.AutostartReader with 1h TTL.
T056: MCPProxyApp.applicationDidFinishLaunching — refresh sidecar first so
  the core always has a non-null reading, then present the first-run dialog
  if needed.
T057: packaging/macos/postinstall.sh — standalone post-install launcher:
  `open -a MCPProxy --env MCPPROXY_LAUNCHED_BY=installer`. Executable bit set.
T058: scripts/postinstall.sh — existing PKG post-install extended with the
  same tray-launch step so the installer-pkg flow picks up the env var.
  packaging/macos/postinstall.sh stays as the reference artifact (research R10).

Swift tray builds cleanly via:
  swiftc -target arm64-apple-macosx13.0 -sdk $(xcrun --sdk macosx --show-sdk-path) \
         -module-name MCPProxy -emit-executable -O -o /tmp/MCPProxy-new \
         $(find MCPProxy -name "*.swift" -not -path "*/Tests/*")

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…variants)

Three unit tests that don't require a SwiftUI harness:
- testFirstRunFlagRoundTrip: UserDefaults marker isolation check.
- testFirstRunChoiceDefaultsToEnabled: spec §4 invariant — the first-run
  dialog MUST default to "Launch at login: ON". Regressing this test
  catches a dark-pattern reversal.
- testSidecarSchemaShape: ensures the tray's sidecar JSON schema matches
  what the Go autostart.go reader expects (enabled:bool, updated_at:string).

End-to-end visual verification of the modal is deferred to the
mcpproxy-ui-test MCP server per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tatus

Adds two read-only fields alongside the existing env_kind/env_markers:
- launch_source: cached DetectLaunchSourceOnce() result (string)
- autostart_enabled: tri-state pointer (true/false/null) from the
  tray-owned sidecar with 1h TTL

Read-only: the status handler never clears installer_heartbeat_pending,
only the heartbeat builder does. This lets the tray/CLI introspect the
process's classifier verdict without waiting for the next heartbeat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion funnel

Tests (mcp_activation_test.go):
- T031: handleRetrieveToolsWithMode increments retrieve_tools_calls_24h and
  marks first_retrieve_tools_call_ever=true (end-to-end via runtime).
- T030: AfterInitialize hook path records sanitized clientInfo.name and
  marks first_mcp_client_ever=true; dedup + path→unknown verified.
- T040 coverage: MarkFirstConnectedServerForActivation flips the flag.

Implementation:
- T038: MCP AfterInitialize hook calls runtime.RecordMCPClientForActivation
  with clientInfo.name (sanitization + cap happens inside the store).
- T039: handleRetrieveToolsWithMode calls
  RecordRetrieveToolsCallForActivation on entry and estimates tokens-saved
  (150 tokens × hidden tools) after filtering results.
- T040: supervisor.OnServerConnected callback calls
  MarkFirstConnectedServerForActivation before dedup-guard runs, so every
  successful connect reaches the monotonic flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
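
What the store does when a client name arrives (monotonic flag, dedup, cap of 16) can be sketched in memory as below. The type and method names are hypothetical, persistence is omitted, and dropping new names once the cap is reached is an assumption; the PR only states that the list is deduped and capped at 16.

```go
package main

import "fmt"

const maxClientsSeen = 16 // cap stated for mcp_clients_seen_ever

// activation is a hypothetical in-memory view of two activation fields.
type activation struct {
	FirstMCPClientEver bool
	MCPClientsSeenEver []string
}

// recordClient expects an already-sanitized name.
func (a *activation) recordClient(name string) {
	a.FirstMCPClientEver = true // monotonic: set once, never cleared
	for _, seen := range a.MCPClientsSeenEver {
		if seen == name {
			return // dedup
		}
	}
	// ASSUMPTION: once the cap is hit, further new names are dropped.
	if len(a.MCPClientsSeenEver) < maxClientsSeen {
		a.MCPClientsSeenEver = append(a.MCPClientsSeenEver, name)
	}
}

func main() {
	var a activation
	for _, n := range []string{"claude-code", "cursor", "claude-code", "unknown"} {
		a.recordClient(n)
	}
	fmt.Println(a.FirstMCPClientEver, a.MCPClientsSeenEver)
}
```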
…us and CLI

- T041: /api/v1/status reads the activation snapshot from the telemetry
  service's store and embeds it in the response alongside env_kind /
  env_markers. Read-only — mutation happens only on MCP/connect events.
- T042: `mcpproxy telemetry status` opens the BBolt DB read-only (with a
  200ms flock timeout so a running daemon doesn't block the CLI) and
  renders the activation funnel in both JSON/YAML (via struct field) and
  table (dedicated "Activation Funnel" section) formats. Silently omits
  when the daemon holds the lock — the same data is available via the
  REST endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Apr 24, 2026

Deploying mcpproxy-docs with Cloudflare Pages

Latest commit: 148b163
Status: ✅  Deploy successful!
Preview URL: https://96ff5076.mcpproxy-docs.pages.dev
Branch Preview URL: https://044-retention-telemetry-v3.mcpproxy-docs.pages.dev

Second-opinion critique from Gemini CLI (v0.38.1, model gemini-3.1-pro-preview)
run in --yolo mode with read access to all 4 related repos
(mcpproxy-go-diagnostics-taxonomy, mcpproxy-go-retention-telemetry,
mcpproxy-telemetry, mcpproxy-dash).

Findings: 4 P1, 4 P2, 2 P3. Notable P1s to triage:

1. ErrorPanel.vue hides Execute button on non-destructive steps — users can
   only click "Preview (dry-run)", making non-destructive fixes unreachable.
2. env_kind decision tree evaluates HasCIEnv before HasCloudIDEEnv, so
   GitHub Codespaces / Gitpod (which set CI=true) are misclassified as CI bots.
3. Activation funnel query reads server_configured from LATEST heartbeat —
   users who configured then deleted a server fall out of the lifetime funnel.
4. autostart.go caches a missing sidecar file as nil for a full hour,
   poisoning the critical first heartbeat on slow-tray startup races.

Content is verbatim Gemini output; no human edits except a header block
documenting invocation + model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude added 3 commits April 24, 2026 19:58
Gemini cross-review flagged that Codespaces + Gitpod routinely set
CI=true alongside their own markers (CODESPACES, GITPOD_WORKSPACE_ID).
With the original CI-wins ordering from design §4.2 / research.md R1,
real humans working in ephemeral cloud IDEs were being classified as
`ci`, artificially deflating Cloud IDE retention numbers and skewing
the activation funnel.

Reorder the decision tree so cloud_ide is checked before ci. Ordinary
CI runners (without CODESPACES/GITPOD/etc.) still classify as `ci`.

This intentionally deviates from the locked design doc; comment block
updated to explain why and cite the cross-review finding.

Tests:
- Add cloud-ide-codespaces-beats-ci (CI=true + CODESPACES=true -> cloud_ide)
- Add cloud-ide-gitpod-prebuild-beats-ci (CI=true + GITPOD_WORKSPACE_ID -> cloud_ide)
- Existing ci-beats-container case still passes (no cloud-IDE markers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
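
The reordered decision tree from this commit can be sketched as follows. The cloud_ide-before-ci and ci-before-container ordering comes from the commit message and its test names; the tail of the tree (interactive versus headless) and the struct layout are assumptions, and the "unknown" outcome is not modelled here.

```go
package main

import "fmt"

// envMarkers mirrors the boolean-only env_markers fields.
type envMarkers struct {
	HasCIEnv       bool
	HasCloudIDEEnv bool
	IsContainer    bool
	HasTTY         bool
	HasDisplay     bool
}

// classifyEnvKind checks cloud_ide before ci so that Codespaces and Gitpod
// sessions, which also set CI=true, are not counted as CI bots.
func classifyEnvKind(m envMarkers) string {
	switch {
	case m.HasCloudIDEEnv:
		return "cloud_ide"
	case m.HasCIEnv:
		return "ci"
	case m.IsContainer:
		return "container"
	case m.HasTTY || m.HasDisplay:
		return "interactive" // ASSUMPTION: either signal counts as interactive
	default:
		return "headless" // ASSUMPTION: no TTY and no display
	}
}

func main() {
	codespaces := envMarkers{HasCIEnv: true, HasCloudIDEEnv: true}
	runner := envMarkers{HasCIEnv: true, IsContainer: true}
	fmt.Println(classifyEnvKind(codespaces), classifyEnvKind(runner))
}
```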
Gemini cross-review found a tray-core boot race: if the core process
starts milliseconds before the tray writes tray-autostart.json, the
first Read() sees fs.ErrNotExist and (before this fix) marked
cachedOnce=true, locking autostart_enabled to null for the whole 1h
TTL window — even after the tray subsequently wrote `true`.

Fix: on ErrNotExist and other transient I/O errors, do NOT mark
cachedOnce. Return nil this call but allow the next Read() to re-probe.
Successful reads (including malformed-JSON "known unknowable") still
cache for the full TTL.

Adds TestReadAutostart_BootRaceDoesNotPoisonCache covering the race
scenario: first read on absent file -> nil; tray writes sidecar;
second read within TTL -> picks up the new value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
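
The boot-race fix can be sketched as a TTL cache that only stores results from reads that actually reached the file. The reader type, its fields, and the demo helper are hypothetical; the sidecar keys (enabled, updated_at) and the 1h TTL are taken from this PR. The sketch is not goroutine-safe.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

type sidecar struct {
	Enabled   *bool  `json:"enabled"`
	UpdatedAt string `json:"updated_at"`
}

// autostartReader is a hypothetical, single-goroutine reader.
type autostartReader struct {
	path     string
	ttl      time.Duration
	cached   bool
	cachedAt time.Time
	value    *bool
}

func (r *autostartReader) Read() *bool {
	if r.cached && time.Since(r.cachedAt) < r.ttl {
		return r.value
	}
	data, err := os.ReadFile(r.path)
	if err != nil {
		// Missing file or other I/O error: report nil for this call but do
		// NOT cache, so the next Read re-probes the sidecar.
		return nil
	}
	var s sidecar
	if json.Unmarshal(data, &s) != nil {
		s.Enabled = nil // malformed JSON: a "known unknowable", cached below
	}
	r.value, r.cached, r.cachedAt = s.Enabled, true, time.Now()
	return r.value
}

// bootRaceDemo reads before the sidecar exists, writes it, then reads again.
func bootRaceDemo() (firstNil, secondTrue bool, err error) {
	dir, err := os.MkdirTemp("", "autostart")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	r := &autostartReader{path: filepath.Join(dir, "tray-autostart.json"), ttl: time.Hour}
	firstNil = r.Read() == nil

	body := []byte(`{"enabled":true,"updated_at":"2026-04-24T15:00:00Z"}`)
	if err := os.WriteFile(r.path, body, 0o600); err != nil {
		return false, false, err
	}
	v := r.Read()
	return firstNil, v != nil && *v, nil
}

func main() {
	fmt.Println(bootRaceDemo())
}
```

Had the first miss been cached, the second Read would have returned nil until the TTL expired, which is the poisoning the commit describes.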
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

📦 Build Artifacts

Workflow Run: View Run
Branch: 044-retention-telemetry-v3

Available Artifacts

  • archive-darwin-amd64 (26 MB)
  • archive-darwin-arm64 (23 MB)
  • archive-linux-amd64 (15 MB)
  • archive-linux-arm64 (13 MB)
  • archive-windows-amd64 (26 MB)
  • archive-windows-arm64 (23 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (19 MB)
  • installer-dmg-darwin-arm64 (17 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 24903342891 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

@Dumbris Dumbris merged commit ebcbfcc into main Apr 24, 2026
40 checks passed
@Dumbris Dumbris deleted the 044-retention-telemetry-v3 branch April 24, 2026 17:53