feat(obs): add wrapper telemetry foundations (FER-10253)#15

Merged: FedeZara merged 1 commit into main from FedeZara/fer-10253-obs-foundations on May 14, 2026

FedeZara commented May 14, 2026

#14 was merged to the wrong branch by mistake — re-opening against main.

Summary

Wraps every JS action through a single telemetry pipeline that emits structured automation_run_started / automation_run_completed / wrapper_failed events to four sinks: a ::fern-telemetry::<json> log line (always), PostHog (always), Sentry (failures only), and the Lightweight API at /v1/automation/events (failures only). Single touchpoint for actions: instrumentAction(name, fn) plus injectFernToken(token) to authenticate the Lightweight API POST.
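Here is a minimal sketch of the `instrumentAction` lifecycle. It swaps the real four-sink `emit()` for an in-memory `emitted` array and shows why calling `injectFernToken` from inside the action body still lets a parse failure be reported: the failure is thrown inside `fn`, so the wrapper's catch sees it. `emitted`, `exampleBody`, and the `token_available` field are hypothetical. Completion events are left out, including `status: cancelled`, which the PR emits from the post phase.

```typescript
// Sketch only: `emitted` stands in for the real four-sink emit().
export const emitted: Array<Record<string, unknown>> = [];

// Hypothetical slot; the real injectFernToken configures Lightweight API auth.
let fernToken: string | undefined;

export function injectFernToken(token: string): void {
  fernToken = token;
}

export async function instrumentAction<T>(name: string, fn: () => Promise<T>): Promise<T> {
  emitted.push({ event: "automation_run_started", action: name });
  try {
    return await fn();
  } catch (err) {
    // Reached even when fn() threw before injectFernToken() ran,
    // e.g. while parsing inputs, so parse errors are still reported.
    emitted.push({
      event: "wrapper_failed",
      action: name,
      message: err instanceof Error ? err.message : String(err),
      token_available: fernToken !== undefined,
    });
    throw err;
  }
}

// Hypothetical action body: parse inputs first, then hand the token over.
export async function exampleBody(rawToken: string | undefined): Promise<void> {
  if (!rawToken) {
    throw new Error("input 'token' is required");
  }
  injectFernToken(rawToken);
  // ... the action's real work would follow here ...
}
```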

Linear: FER-10253

Highlights

  • New packages/shared/src/obs/ module — telemetry-client, posthog, sentry, lightweight-api, automation-context, errors, types. All free functions, no class instances exposed.
  • WrapperError(errorCode, message, originalError?) is the single way for wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown exception. Anything that isn't a WrapperError gets classified as UNKNOWN_ERROR.
  • injectFernToken(token) is called from inside instrumentAction's body, after input parsing. Because the call sits inside the wrapper, input-parsing failures that occur before the token is available are still caught and classified as wrapper_failed.
  • flushTelemetry() runs from runAction's exit path: awaits in-flight Lightweight API POSTs (Promise.allSettled), then shuts down the PostHog and Sentry SDK queues.
  • All 8 actions wired (preview, generate, upgrade, verify, sync-openapi, setup-cli, resolve-cli, verify-token). CLI invocations in generate / sync-openapi catch + re-throw as WrapperError with action-specific codes (CLI_AUTOMATIONS_GENERATE_FAILED, CLI_GHA_PULL_SPEC_FAILED, CLI_GHA_SYNC_SPECS_FAILED). setup-cli wraps installFernCli and the source-build path in WrapperError("CLI_INSTALL_*", ...).
  • Cancellation path: SIGINT/SIGTERM handler marks outcome=cancelled in saved state; the post phase emits automation_run_completed with status: cancelled. No wrapper_failed from cancellation — it's not a wrapper-side fault.
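The `WrapperError` contract and its `UNKNOWN_ERROR` fallback could look roughly like the sketch below. Only the constructor signature `(errorCode, message, originalError?)` and the `UNKNOWN_ERROR` code come from the PR. `classifyError` and `withErrorCode` are hypothetical names for the classification step and for the catch + re-throw pattern used around CLI calls.

```typescript
// Sketch of the WrapperError contract: a stable SCREAMING_SNAKE code plus
// an optional original error. Everything beyond the constructor is illustrative.
export class WrapperError extends Error {
  readonly errorCode: string;
  readonly originalError?: unknown;

  constructor(errorCode: string, message: string, originalError?: unknown) {
    super(message);
    this.name = "WrapperError";
    this.errorCode = errorCode;
    this.originalError = originalError;
    // Keeps `instanceof` working when compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical classifier: anything that is not a WrapperError is UNKNOWN_ERROR.
export function classifyError(err: unknown): string {
  return err instanceof WrapperError ? err.errorCode : "UNKNOWN_ERROR";
}

// Hypothetical helper showing the catch + re-throw pattern around CLI
// invocations (generate, sync-openapi, setup-cli).
export async function withErrorCode<T>(errorCode: string, step: () => Promise<T>): Promise<T> {
  try {
    return await step();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new WrapperError(errorCode, detail, err);
  }
}
```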

Follow-ups before this is "live"

Two things need to happen on top of this PR before any telemetry actually fires:

  1. Wire the actual build constants. POSTHOG_API_KEY, SENTRY_DSN_AUTOMATIONS, and LIGHTWEIGHT_API_URL in packages/shared/src/obs/build-constants.ts are empty strings. The PostHog and Sentry SDKs initialize as no-ops when their constant is empty; the Lightweight API short-circuits when its URL is empty. Once the automations Sentry project and the /v1/automation/events endpoint exist, hardcode the values here. (PostHog API keys and Sentry DSNs are designed to be embedded in client code — they're write-only at the project level. No CI secret needed.)

  2. Sentry release tagging + source-maps upload. Sentry.init({ release: ... }) is currently unset in packages/shared/src/obs/sentry.ts, and source maps aren't uploaded — so any captured exception today would point at the bundled dist/index.js line numbers, not the original TypeScript. Both depend on a CI-driven release pipeline that bakes a release tag at the moment dist is built. Open question: do we move release builds to CI (the release.yml header already has a TODO about this) so we can run sentry-cli releases new <tag> + sentry-cli releases files upload-sourcemaps keyed to the same tag we pass to Sentry.init? Or do we keep dist commits as-is and run a separate sourcemaps-upload workflow on tag publish?

Either way the value passed to Sentry.init({ release }) and the Sentry CLI's release identifier need to agree, otherwise deobfuscation won't resolve. Worth a separate Linear ticket.
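Follow-up 1 means every sink except the log line stays silent today. The sketch below summarizes that gating as a pure function. The constant names come from the PR. `activeSinks` and the sink labels are hypothetical. In the real code the gating lives inside each sink: the SDKs no-op on an empty key or DSN, and the Lightweight API helper short-circuits on an empty URL. It does not sit in one central switch.

```typescript
// Hypothetical stand-ins for build-constants.ts; all three are empty strings
// until follow-up 1 lands.
export const POSTHOG_API_KEY = "";
export const SENTRY_DSN_AUTOMATIONS = "";
export const LIGHTWEIGHT_API_URL = "";

export interface SinkConstants {
  posthogApiKey: string;
  sentryDsn: string;
  lightweightApiUrl: string;
}

// Summarizes which sinks can actually deliver for a given set of constants.
export function activeSinks(c: SinkConstants): string[] {
  const sinks = ["log"]; // ::fern-telemetry:: lines never depend on a constant
  if (c.posthogApiKey !== "") sinks.push("posthog");
  if (c.sentryDsn !== "") sinks.push("sentry");
  if (c.lightweightApiUrl !== "") sinks.push("lightweight-api");
  return sinks;
}

export const activeToday = activeSinks({
  posthogApiKey: POSTHOG_API_KEY,
  sentryDsn: SENTRY_DSN_AUTOMATIONS,
  lightweightApiUrl: LIGHTWEIGHT_API_URL,
});
```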

Test plan

  • pnpm typecheck && pnpm check && pnpm test && pnpm build — local clean (typecheck across 10 packages, lint clean across 82 files, 51/51 shared tests pass, all 8 dists rebuilt at ~3.7 MB)
  • Cut a pre-release tag of one action (e.g. setup-cli@v0.0.0-obs1) and trigger a workflow that uses it. Verify ::fern-telemetry:: log lines appear with the right event names and that PostHog / Sentry stay silent (constants still empty). Once constants are wired (follow-up 1), repeat and confirm events show up in PostHog / Sentry.
  • Force a non-zero CLI exit inside a test workflow that uses generate or sync-openapi, and verify that the wrapper_failed log line carries the action-specific error_code (e.g. CLI_AUTOMATIONS_GENERATE_FAILED).
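For the manual checks above, a small parser over a downloaded workflow log can pull out the telemetry payloads. This helper is hypothetical and not part of the PR. It assumes each payload is a single JSON object on the same line as the `::fern-telemetry::` prefix. It looks for the prefix anywhere in the line because downloaded Actions logs usually start each line with a timestamp.

```typescript
export const TELEMETRY_LOG_PREFIX = "::fern-telemetry::";

// Only `event` and `error_code` are fields the PR names; the rest is open.
export interface TelemetryLogEvent {
  event: string;
  error_code?: string;
  [key: string]: unknown;
}

// Hypothetical test helper: pulls every ::fern-telemetry::<json> payload out
// of raw workflow log text.
export function extractTelemetryEvents(log: string): TelemetryLogEvent[] {
  const events: TelemetryLogEvent[] = [];
  for (const line of log.split("\n")) {
    // Search anywhere in the line: downloaded logs usually prepend a timestamp.
    const idx = line.indexOf(TELEMETRY_LOG_PREFIX);
    if (idx === -1) continue;
    const json = line.slice(idx + TELEMETRY_LOG_PREFIX.length).trim();
    try {
      events.push(JSON.parse(json) as TelemetryLogEvent);
    } catch {
      // Skip malformed or truncated payloads instead of aborting the scan.
    }
  }
  return events;
}
```

For the forced-failure check, `extractTelemetryEvents(log).find((e) => e.event === "wrapper_failed")?.error_code` should equal the action-specific code.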

Wires every JS action through `instrumentAction` + `runPostCleanup` so each
run emits structured `automation_run_started` / `automation_run_completed` /
`wrapper_failed` events to four sinks: a `::fern-telemetry::<json>` log line,
PostHog (always), Sentry (`wrapper_failed` only), and the Lightweight API
(`/v1/automation/events`, `wrapper_failed` only).

- New `packages/shared/src/obs/` module — telemetry-client, posthog, sentry,
  lightweight-api, automation-context, errors, types. All free functions; no
  class instances exposed.
- `WrapperError(errorCode, message, originalError?)` is the single way for
  wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown
  exception. Errors that aren't `WrapperError` get the generic `UNKNOWN_ERROR`
  code at classification time.
- `injectFernToken(token)` (called from inside `instrumentAction`'s body
  after parsing) configures the Lightweight API auth. Because the call sits
  inside `instrumentAction`, input-parsing failures that occur before the
  token is available are still classified.
- `flushTelemetry()` (called from `runAction` before `process.exit`) awaits
  every in-flight Lightweight API request, then shuts down the PostHog and
  Sentry SDK queues.
- All 8 actions wired (preview, generate, upgrade, verify, sync-openapi,
  setup-cli, resolve-cli, verify-token). CLI invocations in `generate` and
  `sync-openapi` catch + re-throw as `WrapperError` with action-specific
  codes (`CLI_AUTOMATIONS_GENERATE_FAILED`, `CLI_GHA_PULL_SPEC_FAILED`,
  `CLI_GHA_SYNC_SPECS_FAILED`). `setup-cli` wraps `installFernCli` and
  `buildCliFromSource` likewise (`CLI_INSTALL_*`).
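The `flushTelemetry()` ordering described above can be sketched as follows. In-flight Lightweight API requests settle through `Promise.allSettled` first, then the SDK queues shut down. The injected `post` transport and the `shutdowns` parameter are assumptions that let the sketch run standalone. The real `flushTelemetry()` takes no arguments.

```typescript
// Hypothetical transport; the real module POSTs to /v1/automation/events.
export type Post = (body: string) => Promise<unknown>;

// Every outstanding Lightweight API request, tracked until it settles.
const inFlight = new Set<Promise<unknown>>();

export function sendLightweightEvent(url: string, payload: object, post: Post): void {
  if (url === "") return; // short-circuit while LIGHTWEIGHT_API_URL is empty
  const request = post(JSON.stringify(payload));
  inFlight.add(request);
  // Attaching a handler also prevents an unhandled rejection if the POST fails.
  request
    .catch(() => undefined)
    .finally(() => inFlight.delete(request));
}

// Settle every in-flight POST first, then shut down the SDK queues.
// Shutdown errors are swallowed so telemetry can never fail the run.
export async function flushTelemetry(shutdowns: Array<() => Promise<void>>): Promise<void> {
  await Promise.allSettled(Array.from(inFlight));
  for (const shutdown of shutdowns) {
    try {
      await shutdown();
    } catch {
      // ignore: a failed SDK shutdown only loses telemetry
    }
  }
}
```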

Hardcoded build constants (`POSTHOG_API_KEY`, `SENTRY_DSN_AUTOMATIONS`,
`LIGHTWEIGHT_API_URL`) are empty strings today; telemetry is a runtime no-op
until they're populated. Sentry release tagging and source-maps upload are
also pending; see the PR description for the follow-ups.

Refs FER-10253.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FedeZara requested a review from Swimburger as a code owner May 14, 2026 13:42

devin-ai-integration (bot) left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

```ts
// packages/shared/src/index.ts (excerpt under review)
export async function runAction(fn: () => Promise<void>): Promise<void> {
  try {
    await fn();
    await flushTelemetry();
```

🟡 flushTelemetry() inside try block causes successful actions to fail if flush throws

await flushTelemetry() at packages/shared/src/index.ts:59 is inside the try block, immediately after await fn(). If fn() completes successfully but flushTelemetry() throws an unexpected error, the catch block fires — calling core.setFailed() with the flush error message and then process.exit(1), turning a genuinely successful action into a failure. The flush call should be outside the try block (or in a finally with its own guard) so telemetry issues can never retroactively fail a successful run. While the individual shutdown functions (shutdownPostHog, shutdownSentry, etc.) have internal try-catch blocks that make this unlikely in practice, the structural placement is still wrong.

Prompt for agents
In packages/shared/src/index.ts, the runAction function has `await flushTelemetry()` on the success path inside the try block (line 59). Move it outside the try/catch so that a flush failure cannot convert a successful action into a failed one. One approach is to restructure as:

```ts
try {
  await fn();
} catch (err) {
  const message = err instanceof Error ? err.message : String(err);
  core.setFailed(message);
  await flushTelemetry();
  process.exit(1);
}
await flushTelemetry();
```

Alternatively, use a finally block with a flag tracking whether fn() succeeded, so flush always runs but never causes the action to be marked as failed.
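Here is a minimal sketch of the "finally with a flag" alternative. It assumes `setFailed`, `flushTelemetry`, and `exit` are injected; in the action they would be `core.setFailed`, the real `flushTelemetry()`, and `process.exit`. `RunActionDeps` is a hypothetical name. The flush gets its own try/catch on purpose. If `await flushTelemetry()` were simply moved after the try/catch, a flush rejection would escape `runAction` as a rejected promise, and depending on how the entry point calls `runAction`, Node may still end the process with a failure.

```typescript
// Hypothetical dependency bag so the sketch runs without @actions/core or Node globals.
export interface RunActionDeps {
  setFailed: (message: string) => void; // core.setFailed in the real action
  flushTelemetry: () => Promise<void>;
  exit: (code: number) => void; // process.exit in the real action
}

export async function runAction(fn: () => Promise<void>, deps: RunActionDeps): Promise<void> {
  let succeeded = false;
  try {
    await fn();
    succeeded = true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    deps.setFailed(message);
  } finally {
    // Flush always runs, but its own guard keeps a telemetry error from
    // changing the outcome recorded above.
    try {
      await deps.flushTelemetry();
    } catch {
      // swallow: telemetry must never fail (or un-fail) the run
    }
  }
  if (!succeeded) {
    deps.exit(1);
  }
}
```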

```ts
// packages/shared/src/telemetry/telemetry.ts (excerpt under review)
core.info(`${TELEMETRY_LOG_PREFIX}${JSON.stringify(logPayload)}`);

capturePostHogEvent(event, context);
captureFernAutomationsEvent(event, context);
```

🟡 Automation Event API receives all events instead of only wrapper_failed

captureFernAutomationsEvent is called unconditionally for every event at packages/shared/src/telemetry/telemetry.ts:114, but the JSDoc on emit() (line 88: "only when event === EventName.WrapperFailed") and the JSDoc on captureFernAutomationsEvent itself (packages/shared/src/telemetry/automation-event-api.ts:92: "No-op for non-wrapper_failed events") both document that only wrapper_failed events should flow to this sink. The Sentry call on line 116 correctly has the if (event.event === EventName.WrapperFailed) guard, but the Automation Event API call on line 114 is missing the same guard. Once AUTOMATION_EVENT_API_URL is configured (currently empty), this will POST automation_run_started and automation_run_completed events to the API — events the server doesn't expect from this sink.

Suggested change:

```diff
-captureFernAutomationsEvent(event, context);
+if (event.event === EventName.WrapperFailed) {
+  captureFernAutomationsEvent(event, context);
+  captureSentryEvent(event, context, opts?.originalError);
+}
```
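With the suggestion applied, routing comes down to one guard that covers both failure-only sinks. Here is a minimal sketch. It assumes injected sink callbacks and uses a string literal in place of `EventName.WrapperFailed`. `Sinks` and this `emit` signature are hypothetical and do not reflect the shape of `telemetry.ts`.

```typescript
export const TELEMETRY_LOG_PREFIX = "::fern-telemetry::";

export interface TelemetryEvent {
  event: string;
  [key: string]: unknown;
}

// Hypothetical callbacks standing in for the log, PostHog, Lightweight API, and Sentry helpers.
export interface Sinks {
  log: (line: string) => void;
  posthog: (event: TelemetryEvent) => void;
  lightweightApi: (event: TelemetryEvent) => void;
  sentry: (event: TelemetryEvent, originalError?: unknown) => void;
}

export function emit(event: TelemetryEvent, sinks: Sinks, originalError?: unknown): void {
  sinks.log(`${TELEMETRY_LOG_PREFIX}${JSON.stringify(event)}`); // always
  sinks.posthog(event); // always
  if (event.event === "wrapper_failed") {
    // Both failure-only sinks share one guard so they cannot drift apart.
    sinks.lightweightApi(event);
    sinks.sentry(event, originalError);
  }
}
```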

FedeZara merged commit 7a8f045 into main May 14, 2026
7 checks passed