
feat: route browser telemetry directly to the VM by default#109

Merged
rgarcia merged 3 commits into next from raf/telemetry-default-routing on Jun 3, 2026

Conversation

@rgarcia
Contributor

rgarcia commented Jun 3, 2026

What

  • Telemetry method path: change the telemetry stream method's path from /browsers/{id}/telemetry to /browsers/{id}/telemetry/stream (browsertelemetry.go). The direct-VM path rewrite is a pure passthrough, so this now yields {base_url}/telemetry/stream — the VM's SSE stream endpoint. ({base_url}/telemetry on the VM is a different, non-stream endpoint that returns JSON config.)
  • Default routing list: add telemetry to the default KERNEL_BROWSER_ROUTING_SUBRESOURCES allowlist (browser_routing.go), so the telemetry SSE stream is routed directly to the browser VM with no env var configuration.
  • Unit tests: updated the default-allowlist assertion to expect [curl, telemetry], and added TestBrowserRoutingRewritesTelemetryStreamToVM proving a telemetry call is rewritten to {base_url}/telemetry/stream with the Authorization header stripped and a jwt query param appended. Updated api.md title to get /browsers/{id}/telemetry/stream.
  • Added examples/browser-telemetry-routing-smoke — a live smoke program.

Dependency

DEPENDS ON the control-plane PR renaming the public endpoint /browsers/{id}/telemetry -> /browsers/{id}/telemetry/stream. That rename is not yet deployed to prod, so a non-routed call to the new SDK path 404s today, while the direct-routed call (rewritten to target the VM) already works. This is why the live smoke exercises the direct-routed path; it also independently proves that routing is doing the work.

Live smoke evidence (prod)

Created a telemetry-enabled headless browser (timeout_seconds=120, telemetry.enabled=true), opened client.Browsers.Telemetry.StreamStreaming, generated activity via browsers.curl, and asserted both event delivery and routing. Ran 3/3 green. Example:

created browser ej49cnww3docbzlqyz6mbv2s base_url https://proxy.sfo-intelligent-robinson.onkernel.com:8443/browser/kernel
received telemetry event: seq=5 type=api_call
recorded telemetry stream URL: https://proxy.sfo-intelligent-robinson.onkernel.com:8443/browser/kernel/telemetry/stream?jwt=<redacted>
routing verified: telemetry stream went directly to the VM
SMOKE PASS

Asserted: the recorded stream URL host matches the session base_url host (proxy.*:8443), path ends with /telemetry/stream, it is NOT api.onkernel.com, Authorization is stripped, and a jwt query param is present.
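The assertions listed above amount to a handful of checks on the recorded URL and request headers, sketched below. verifyRouted is a hypothetical helper, not code from the smoke example. It assumes the smoke program records the final stream URL and the Authorization header value that was actually sent.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// verifyRouted (hypothetical) applies the smoke test's routing assertions to a
// recorded stream URL and the Authorization header sent with that request.
func verifyRouted(recorded, baseURL, authHeader string) error {
	got, err := url.Parse(recorded)
	if err != nil {
		return err
	}
	base, err := url.Parse(baseURL)
	if err != nil {
		return err
	}
	switch {
	case got.Host != base.Host: // must hit the session's proxy.*:8443 host
		return fmt.Errorf("host %q does not match base_url host %q", got.Host, base.Host)
	case !strings.HasSuffix(got.Path, "/telemetry/stream"):
		return fmt.Errorf("path %q does not end with /telemetry/stream", got.Path)
	case got.Hostname() == "api.onkernel.com":
		return errors.New("stream went through api.onkernel.com")
	case authHeader != "":
		return errors.New("authorization header was not stripped")
	case got.Query().Get("jwt") == "":
		return errors.New("missing jwt query param")
	}
	return nil
}

func main() {
	base := "https://proxy.sfo-intelligent-robinson.onkernel.com:8443/browser/kernel"
	fmt.Println(verifyRouted(base+"/telemetry/stream?jwt=redacted", base, ""))
}
```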

Note observed during smoke

The generated ssestream.Stream[T] wrapper surfaces SSE keepalive comment frames (: ...) as an "unexpected end of JSON input" error and terminates (it tries to json.Unmarshal an empty data buffer). The smoke example reopens the stream on that error and keeps reading until a data frame arrives, so it is reliable. This is a pre-existing, codegen-owned robustness issue independent of routing and is out of scope for this PR.

🤖 Generated with Claude Code


Note

Medium Risk
Changes default request routing and JWT/header handling for telemetry streams; mis-routing could break streaming until the control-plane path rename is deployed everywhere.

Overview
Browser telemetry is now routed directly to the VM by default, alongside curl, via the KERNEL_BROWSER_ROUTING_SUBRESOURCES default allowlist, with no extra env configuration. Test coverage confirms that telemetry SSE calls are rewritten to {base_url}/telemetry/stream with the JWT in the query string and the Authorization header stripped.

The SSE client (packages/ssestream) ignores empty comment-only blocks (:\n\n keepalives), so idle telemetry streams no longer die with JSON decode errors; regression tests and an examples/browser-telemetry sample were added.

Note: Depends on the control-plane rename to /browsers/{id}/telemetry/stream for non-routed API calls; direct VM routing already targets the stream endpoint.

Reviewed by Cursor Bugbot for commit 59bd952. Bugbot is set up for automated code reviews on this repo.


Update — idle-stream SSE keepalive fix now included in this PR

The earlier note that the SSE keepalive issue was out of scope is superseded: this PR now fixes it. packages/ssestream/ssestream.go no longer dispatches empty comment-only blocks (the server's :\n\n keepalive, sent every ~15s on an idle stream and during the initial idle window), which previously surfaced as an "unexpected end of JSON input" error and killed idle telemetry streams. Added regression tests (packages/ssestream/ssestream_test.go) and simplified examples/browser-telemetry to drop the background-activity workaround so it mirrors the node/python examples.

Note: packages/ssestream/ssestream.go is Stainless codegen-owned; this is a local patch pending an upstream fix in the generator config.
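The decoder rule described above can be shown with a deliberately simplified SSE decoder. This is a sketch, not the Stainless-generated eventStreamDecoder. decodeSSE and sseEvent are hypothetical names. It keeps only the behavior relevant here: comment lines starting with ":" are ignored, and on a blank line an event is dispatched only if the block had an event type or data. That is the "skip blocks that have no event type and no data" rule from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sseEvent is a minimal decoded server-sent event (hypothetical type).
type sseEvent struct {
	Type string
	Data string
}

// decodeSSE is a simplified decoder. On a blank line it dispatches only when
// the block carried an event type or data, so ":\n\n" keepalives are skipped.
func decodeSSE(input string) ([]sseEvent, error) {
	var (
		events    []sseEvent
		eventType string
		data      []string
	)
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // end of block
			if eventType != "" || len(data) > 0 {
				events = append(events, sseEvent{Type: eventType, Data: strings.Join(data, "\n")})
			}
			eventType, data = "", nil
		case strings.HasPrefix(line, ":"):
			// Comment line, e.g. a keepalive: ignored.
		default:
			field, value, _ := strings.Cut(line, ":")
			value = strings.TrimPrefix(value, " ")
			switch field {
			case "event":
				eventType = value
			case "data":
				data = append(data, value)
			}
		}
	}
	return events, sc.Err()
}

func main() {
	// One data frame followed by a keepalive comment block.
	events, err := decodeSSE("data: {\"seq\":5,\"type\":\"api_call\"}\n\n:\n\n")
	fmt.Println(len(events), err)
	for _, ev := range events {
		fmt.Println(ev.Data)
	}
}
```

With the keepalive block skipped, the typed wrapper never sees an empty data buffer, so it never calls json.Unmarshal on empty input.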

@firetiger-agent

Created a monitoring plan for this PR.

What this PR does: Fixes the browser telemetry stream so it calls the correct API path (/telemetry/stream) and routes directly to the browser VM by default, matching how browsers.curl already works.

Intended effect:

  • /browsers/{id}/telemetry spans (old path): baseline ~3–8/hr, 0 errors; confirmed if traffic transitions to the new /telemetry/stream path or goes directly to the VM (bypassing the API entirely).
  • /browsers/{id}/telemetry/stream spans (new path): baseline 0 (no prior traffic); confirmed if new calls on this span are error-free.
  • POST /browsers error rate: baseline 0.1–0.5%/hr; confirmed if it stays within that range.

Risks:

  • 404 on new path: /browsers/{id}/telemetry/stream span errors; alert if error rate > 5% on that span (baseline 0 errors on old path).
  • Direct-VM routing failure: JWT mismatch or VM unreachable causes SSE connect failures; alert if any telemetry-related ERROR logs appear in API logs post-deploy.
  • Authorization header stripping regression: if the route cache isn't warmed before the first telemetry call, the fallback to the API path could fail with 401; alert if any HTTP 401 errors appear on /browsers/{id}/telemetry/stream.

Status updates will be posted automatically on this PR as monitoring progresses.


rgarcia and others added 3 commits June 3, 2026 11:41
Add "telemetry" to the default KERNEL_BROWSER_ROUTING_SUBRESOURCES
allowlist so the telemetry SSE stream is routed directly to the browser
VM out of the box, and change the telemetry stream method path from
/browsers/{id}/telemetry to /browsers/{id}/telemetry/stream so that the
direct-VM path rewrite yields {base_url}/telemetry/stream (the VM's SSE
endpoint; {base_url}/telemetry is a different, non-stream endpoint).

This DEPENDS ON the control-plane PR renaming the public endpoint
/browsers/{id}/telemetry -> /browsers/{id}/telemetry/stream. Until that
deploys, the non-routed path 404s in prod, so telemetry.stream works only
via direct routing today (which also independently proves routing works).

Verified with a live smoke test against prod: created a telemetry-enabled
headless browser, opened the telemetry stream, generated VM API activity
via browsers.curl, observed telemetry events, and confirmed the stream
request was rewritten to the VM (host proxy.*:8443, path
/telemetry/stream, Authorization stripped, jwt query param appended) and
never hit api.onkernel.com.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eventStreamDecoder.Next dispatched an event on every blank line, including
after a comment-only block such as the server's ":\n\n" keepalive (sent every
~15s on an idle stream, and on the initial idle window). That produced an
event with an empty Data buffer, which Stream[T].Next then fed to
json.Unmarshal -> "unexpected end of JSON input", ending the stream.

This made idle browser telemetry streams die. Skip blocks that have no event
type and no data so keepalive comments are ignored. Adds regression tests for a
keepalive following an event and a standalone keepalive.

Also simplifies examples/browser-telemetry: with the decoder fixed it no longer
needs a background goroutine generating continuous activity to dodge the
keepalive, so it now mirrors the node/python telemetry examples.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rgarcia force-pushed the raf/telemetry-default-routing branch from ac1265d to 59bd952 on June 3, 2026 15:44
rgarcia requested a review from archandatta on June 3, 2026 16:09
rgarcia merged commit 7e05757 into next on Jun 3, 2026
7 checks passed