v0.6.54: mothership tracing, db pool size increase #4264

Merged
icecrasher321 merged 3 commits into main from staging on Apr 22, 2026

Conversation

@waleedlatif1 (Collaborator) commented Apr 22, 2026

Sg312 and others added 2 commits April 22, 2026 09:06
* fix(db): raise db pool size

* Raise socket connections

* bump up connection size even more

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 22, 2026 8:54pm

cursor Bot commented Apr 22, 2026

PR Summary

Medium Risk
Touches core copilot chat streaming/stop/abort/confirm flows and adds pervasive OpenTelemetry context propagation; mistakes could break stream reconnect/stop behavior or reduce/overload telemetry export.

Overview
Adds end-to-end mothership tracing by introducing fetchGo for Sim→Go calls, withIncomingGoSpan for Go→Sim ingress, and extensive span attributes/outcomes across billing, API-key validation, chat abort/stop/confirm, and stream resume paths.

Refactors /api/mothership/chat and /api/mothership/chat/stream to propagate traceparent back to the browser and back into side-channel requests (stop/abort/replay/confirm), improve resume terminal replay to carry the latest requestId, and ensure queued follow-ups dispatch after a user Stop.
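
As a rough illustration of what propagating `traceparent` involves, the sketch below formats and parses a version-00 W3C Trace Context header. The `TraceParent` type and both function names are hypothetical and are not taken from this PR's `propagation.ts`; it only assumes the standard `00-<trace-id>-<parent-id>-<flags>` layout.

```typescript
// Hypothetical shape for the fields carried by a version-00 traceparent header.
interface TraceParent {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the trace-flags byte
}

// Serialize the fields into the version-00 header layout.
function formatTraceparent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? "01" : "00"}`;
}

// Parse a header value; returns null for anything that is not a valid
// version-00 header (this sketch does not attempt to read newer versions).
function parseTraceparent(header: string | null | undefined): TraceParent | null {
  if (!header) return null;
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  // All-zero trace and span identifiers are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

A side-channel request (stop, abort, replay, confirm) could then echo the value it received so that its server-side span joins the original trace.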

Updates persistence behavior so stopped turns can store requestId and avoids finalizing assistant turns on cancelled runs (letting /chat/stop be the sole writer), adds DB-operation spans around most async-run repository queries, and overhauls instrumentation-node.ts (service naming/origin tagging, allowlisted spans, OTLP endpoint/headers parsing, sampling config, and exporter error surfacing).
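
For readers unfamiliar with OTLP header configuration, here is a minimal sketch of parsing a comma-separated `key=value` list, the format conventionally used by the `OTEL_EXPORTER_OTLP_HEADERS` environment variable. Whether `instrumentation-node.ts` reads that variable or parses it this way is an assumption; `parseOtlpHeaders` is a hypothetical name.

```typescript
// Hypothetical helper: turn "k1=v1,k2=v2" into a header map.
// Malformed pairs (no "=" or an empty key) are skipped rather than throwing.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    // Split on the first "=" only, so values may themselves contain "=".
    const idx = pair.indexOf("=");
    if (idx <= 0) continue;
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    if (key === "") continue;
    headers[key] = value;
  }
  return headers;
}
```

This sketch leaves values as-is and does not URL-decode them; a production parser may need to.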

Reviewed by Cursor Bugbot for commit d927d8b.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread apps/sim/lib/copilot/chat/post.ts

greptile-apps Bot (Contributor) commented Apr 22, 2026

Greptile Summary

This PR adds end-to-end OTel mothership tracing across the Sim↔Go copilot path: new span utilities (otel.ts), W3C traceparent propagation (propagation.ts), an instrumented Go fetch wrapper (fetch.ts), a MothershipOriginSpanProcessor in instrumentation-node.ts, and span hooks throughout the chat POST/stream/finalize lifecycle. It also raises the DB and socket connection pool maximum from 10 to 30.

  • P1 — span leak in post.ts: When resolveBranch returns a NextResponse (feature-flag rejection, bad params, etc.), the return branch exits the otelContextApi.with() callback without calling otelRoot.finish(). The outer catch is never hit for a successful return, so the root gen_ai.agent.execute span stays open indefinitely for every pre-flight failure. The stream GET route handles this correctly (calls rootSpan.end() before each early return).
  • The default sampling ratio changed from the old hard-coded 10% (TraceIdRatioBasedSampler(0.1)) to a runtime-resolved value defaulting to 100% — confirm the OTLP backend is sized for full-sample load if TELEMETRY_SAMPLING_RATIO is not yet set in production.
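
To make the sampling concern concrete, the following is a minimal sketch of resolving a sampling ratio from an environment-style string with a 100% default. The function name `resolveSamplingRatio` and its clamping and fallback rules are assumptions for illustration; the PR's actual parsing of `TELEMETRY_SAMPLING_RATIO` is not shown in this summary.

```typescript
// Hypothetical helper: resolve a trace sampling ratio in [0, 1].
// Unset, empty, or non-numeric input falls back to the default (1.0 = 100%).
function resolveSamplingRatio(raw: string | undefined, fallback: number = 1.0): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return fallback;
  // Clamp out-of-range values instead of rejecting them.
  return Math.min(1, Math.max(0, parsed));
}
```

Under these assumptions, a deployment that leaves the variable unset exports every trace, while setting it to `0.1` would correspond to the previous hard-coded 10%.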

Confidence Score: 4/5

Functionally safe to ship; the span leak on error paths only affects telemetry completeness, not user-visible behavior.

One P1 finding (root OTel span leaked on branch-resolution failure paths in post.ts) should be fixed before or shortly after merge — it degrades trace fidelity for all rejected pre-flight requests. All other findings are P2. Core chat, SSE, and DB functionality are unaffected.

apps/sim/lib/copilot/chat/post.ts — early-return paths need otelRoot.finish() before returning the NextResponse.
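
One way to close the root span on every non-streaming exit is a `try`/`finally` guard, sketched below with stand-in types. `RootSpan`, `BranchResult`, `startRootSpan`, and `handlePost` are hypothetical and only mimic the shape described above; the real handler in post.ts is asynchronous and uses the OpenTelemetry context API.

```typescript
// Stand-in for the root span wrapper; finish() is idempotent here.
interface RootSpan {
  finished: boolean;
  finish(): void;
}

function startRootSpan(): RootSpan {
  const span: RootSpan = {
    finished: false,
    finish() {
      span.finished = true;
    },
  };
  return span;
}

// Either a pre-flight rejection (an HTTP response) or a resolved branch.
type BranchResult =
  | { kind: "response"; status: number }
  | { kind: "branch"; name: string };

function handlePost(root: RootSpan, resolveBranch: () => BranchResult): string {
  // Set once the streaming path takes ownership of finishing the span.
  let handedOff = false;
  try {
    const branch = resolveBranch();
    if (branch.kind === "response") {
      // Early return: the finally block below closes the span.
      return `early:${branch.status}`;
    }
    handedOff = true; // the stream is expected to call finish() when it ends
    return `stream:${branch.name}`;
  } finally {
    if (!handedOff) root.finish();
  }
}
```

The `handedOff` flag keeps the guard from ending the span prematurely on the success path, where the stream is expected to finish it later.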

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/chat/post.ts | Major refactor adding OTel root span lifecycle to the unified chat POST handler; root span is leaked on early-return paths (branch resolution failures) where otelRoot.finish() is never called. |
| apps/sim/instrumentation-node.ts | OTel bootstrap rewritten for mothership tracing: new MothershipOriginSpanProcessor, configurable sampling ratio (defaults to 100% vs old hard-coded 10%), W3C header parsing, and monkey-patched exporter for error surfacing. |
| apps/sim/lib/copilot/request/otel.ts | New file: comprehensive OTel utilities for copilot request lifecycle, covering root span management, tool/span helpers, message capture, and cancel-reason classification; well-documented and structured. |
| apps/sim/lib/copilot/request/go/fetch.ts | New file: OTel-instrumented wrapper for outbound Sim→Go HTTP calls with W3C traceparent injection and lazy tracer resolution to avoid Turbopack race condition. |
| packages/db/index.ts | DB connection pool max increased from 10 to 30; straightforward capacity fix. |
| apps/sim/socket/database/operations.ts | Socket DB pool max increased from 10 to 30, matching the main pool bump. |
| apps/sim/lib/copilot/request/lifecycle/run.ts | Adds cancelled discriminator to OrchestratorResult and threads OTel context; silently removes the X-Sim-Request-ID header from checkpoint-loop fetches. |
| apps/sim/app/api/copilot/chat/stream/route.ts | Resume/reconnect route refactored to extract W3C parent context and wrap the full poll loop under an OTel root span; all early-return paths correctly call rootSpan.end(). |

Sequence Diagram

sequenceDiagram
    participant Browser
    participant SimAPI as Sim API (Next.js)
    participant OtelRoot as OTel Root Span<br/>(gen_ai.agent.execute)
    participant Go as Go Mothership
    participant OTLP as OTLP Backend

    Browser->>SimAPI: POST /api/copilot/chat/stream
    SimAPI->>OtelRoot: startCopilotOtelRoot() [ROOT_CONTEXT]
    SimAPI->>SimAPI: resolveBranch() [withCopilotSpan]
    alt branch is NextResponse (error)
        SimAPI-->>Browser: early return (span NOT finished)
    else branch resolved
        SimAPI->>SimAPI: persistUserMessage [withCopilotSpan]
        SimAPI->>SimAPI: buildPayload [withCopilotSpan]
        SimAPI->>Browser: SSE Response + traceparent header
        SimAPI->>Go: fetchGo(url) + W3C traceparent injected
        Go-->>SimAPI: SSE stream
        SimAPI->>OtelRoot: finish() on stream end
    end
    OtelRoot-->>OTLP: BatchSpanProcessor export

    Browser->>SimAPI: GET /api/copilot/chat/stream (reconnect)
    SimAPI->>SimAPI: contextFromRequestHeaders() -> child of original trace
    SimAPI->>SimAPI: poll/replay loop [rootSpan]
    SimAPI-->>Browser: SSE replay
    SimAPI->>SimAPI: rootSpan.end()

Reviews (1): Last reviewed commit: "fix(db): raise db pool size (#4263)"

Comment thread apps/sim/lib/copilot/chat/post.ts
Comment thread apps/sim/instrumentation-node.ts
Comment thread apps/sim/lib/copilot/request/lifecycle/run.ts
Comment thread apps/sim/instrumentation-node.ts
@icecrasher321 icecrasher321 merged commit 64cfda5 into main Apr 22, 2026
30 checks passed

4 participants