feat(obs): add wrapper telemetry foundations (FER-10253) #15
Wires every JS action through `instrumentAction` + `runPostCleanup` so each run emits structured `automation_run_started` / `automation_run_completed` / `wrapper_failed` events to four sinks: a `::fern-telemetry::<json>` log line, PostHog (always), Sentry (`wrapper_failed` only), and the Lightweight API (`/v1/automation/events`, `wrapper_failed` only).

- New `packages/shared/src/obs/` module: telemetry-client, posthog, sentry, lightweight-api, automation-context, errors, types. All free functions; no class instances exposed.
- `WrapperError(errorCode, message, originalError?)` is the single way for wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown exception. Errors that aren't `WrapperError` get the generic `UNKNOWN_ERROR` code at classification time.
- `injectFernToken(token)` (called from inside `instrumentAction`'s body after parsing) configures the Lightweight API auth so input-parsing failures still get classified before the token is available.
- `flushTelemetry()` (called from `runAction` before `process.exit`) awaits every in-flight Lightweight API request, then shuts down the PostHog and Sentry SDK queues.
- All 8 actions wired (preview, generate, upgrade, verify, sync-openapi, setup-cli, resolve-cli, verify-token). CLI invocations in `generate` and `sync-openapi` catch + re-throw as `WrapperError` with action-specific codes (`CLI_AUTOMATIONS_GENERATE_FAILED`, `CLI_GHA_PULL_SPEC_FAILED`, `CLI_GHA_SYNC_SPECS_FAILED`). `setup-cli` wraps `installFernCli` and `buildCliFromSource` likewise (`CLI_INSTALL_*`).

Hardcoded build constants (`POSTHOG_API_KEY`, `SENTRY_DSN_AUTOMATIONS`, `LIGHTWEIGHT_API_URL`) are empty strings today; telemetry is a runtime no-op until they're populated. Sentry release tagging and source-maps upload are also pending; see the PR description for the follow-ups.

Refs FER-10253.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
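The error-code contract described in the commit message can be sketched roughly as follows. This is an illustration, not the code in `packages/shared/src/obs/errors`: the field layout, the `classifyError` helper name, the `runCliStep` wrapper, and the error message text are all assumptions made for the example. Only the `WrapperError(errorCode, message, originalError?)` signature, the `UNKNOWN_ERROR` fallback, and the `CLI_AUTOMATIONS_GENERATE_FAILED` code come from the PR.

```typescript
// Hypothetical sketch; the PR's errors module may differ in shape.
class WrapperError extends Error {
    readonly errorCode: string;
    readonly originalError?: unknown;

    constructor(errorCode: string, message: string, originalError?: unknown) {
        super(message);
        // Keep `instanceof` working even when compiled to older JS targets.
        Object.setPrototypeOf(this, WrapperError.prototype);
        this.name = "WrapperError";
        this.errorCode = errorCode;
        this.originalError = originalError;
    }
}

// Assumed helper name: maps any thrown value to a stable error code.
function classifyError(err: unknown): string {
    return err instanceof WrapperError ? err.errorCode : "UNKNOWN_ERROR";
}

// Assumed call-site shape: catch a CLI failure and re-throw it
// with an action-specific code, keeping the original error attached.
async function runCliStep(invokeCli: () => Promise<void>): Promise<void> {
    try {
        await invokeCli();
    } catch (err) {
        throw new WrapperError(
            "CLI_AUTOMATIONS_GENERATE_FAILED",
            "fern CLI generate step failed",
            err,
        );
    }
}
```

Keeping the code on the error object means the classification step only needs one `instanceof` check, and anything thrown by third-party code falls through to `UNKNOWN_ERROR` without special handling.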
export async function runAction(fn: () => Promise<void>): Promise<void> {
    try {
        await fn();
        await flushTelemetry();
🟡 flushTelemetry() inside try block causes successful actions to fail if flush throws
await flushTelemetry() at packages/shared/src/index.ts:59 is inside the try block, immediately after await fn(). If fn() completes successfully but flushTelemetry() throws an unexpected error, the catch block fires — calling core.setFailed() with the flush error message and then process.exit(1), turning a genuinely successful action into a failure. The flush call should be outside the try block (or in a finally with its own guard) so telemetry issues can never retroactively fail a successful run. While the individual shutdown functions (shutdownPostHog, shutdownSentry, etc.) have internal try-catch blocks that make this unlikely in practice, the structural placement is still wrong.
Prompt for agents
In packages/shared/src/index.ts, the runAction function has `await flushTelemetry()` on the success path inside the try block (line 59). Move it outside the try/catch so that a flush failure cannot convert a successful action into a failed one. One approach is to restructure as:
    try {
        await fn();
    } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        core.setFailed(message);
        await flushTelemetry();
        process.exit(1);
    }
    await flushTelemetry();
Alternatively, use a finally block with a flag tracking whether fn() succeeded, so flush always runs but never causes the action to be marked as failed.
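The `finally`-with-a-flag alternative could look something like the sketch below. To keep it self-contained, `core.setFailed`, `process.exit`, and `flushTelemetry` are passed in through a hypothetical `RunActionDeps` object, and a `warn` callback is added for reporting flush errors; none of these names are in the PR.

```typescript
// Hypothetical dependency bundle so the sketch runs without @actions/core.
type RunActionDeps = {
    flushTelemetry: () => Promise<void>;
    setFailed: (message: string) => void;
    exit: (code: number) => void;
    warn: (message: string) => void;
};

async function runAction(fn: () => Promise<void>, deps: RunActionDeps): Promise<void> {
    let succeeded = false;
    try {
        await fn();
        succeeded = true;
    } catch (err) {
        deps.setFailed(err instanceof Error ? err.message : String(err));
    } finally {
        // Flush always runs, but its own failure is only logged:
        // it must never change the outcome decided above.
        try {
            await deps.flushTelemetry();
        } catch (flushErr) {
            deps.warn(`telemetry flush failed: ${String(flushErr)}`);
        }
    }
    if (!succeeded) {
        deps.exit(1);
    }
}
```

With this shape the exit code depends only on `fn()`, and the flush runs exactly once on both paths.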
    core.info(`${TELEMETRY_LOG_PREFIX}${JSON.stringify(logPayload)}`);

    capturePostHogEvent(event, context);
    captureFernAutomationsEvent(event, context);
🟡 Automation Event API receives all events instead of only wrapper_failed
captureFernAutomationsEvent is called unconditionally for every event at packages/shared/src/telemetry/telemetry.ts:114, but the JSDoc on emit() (line 88: "only when event === EventName.WrapperFailed") and the JSDoc on captureFernAutomationsEvent itself (packages/shared/src/telemetry/automation-event-api.ts:92: "No-op for non-wrapper_failed events") both document that only wrapper_failed events should flow to this sink. The Sentry call on line 116 correctly has the if (event.event === EventName.WrapperFailed) guard, but the Automation Event API call on line 114 is missing the same guard. Once AUTOMATION_EVENT_API_URL is configured (currently empty), this will POST automation_run_started and automation_run_completed events to the API — events the server doesn't expect from this sink.
-    captureFernAutomationsEvent(event, context);
+    if (event.event === EventName.WrapperFailed) {
+        captureFernAutomationsEvent(event, context);
+        captureSentryEvent(event, context, opts?.originalError);
+    }
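As a way to check the routing the suggestion aims for, here is a minimal sketch of `emit()` with the four sinks injected as plain callbacks. The `Sinks` interface and the const-object form of `EventName` are assumptions made so the example runs on its own; the real `emit()` takes a context and options that are omitted here.

```typescript
// Stand-in for the PR's EventName; written as a const object for the sketch.
const EventName = {
    RunStarted: "automation_run_started",
    RunCompleted: "automation_run_completed",
    WrapperFailed: "wrapper_failed",
} as const;

type EventNameValue = (typeof EventName)[keyof typeof EventName];

interface TelemetryEvent {
    event: EventNameValue;
}

// Hypothetical sink bundle; each callback represents one destination.
interface Sinks {
    log: (e: TelemetryEvent) => void;
    postHog: (e: TelemetryEvent) => void;
    automationApi: (e: TelemetryEvent) => void;
    sentry: (e: TelemetryEvent) => void;
}

function emit(event: TelemetryEvent, sinks: Sinks): void {
    // Always-on sinks.
    sinks.log(event);
    sinks.postHog(event);
    // Failure-only sinks share a single guard so they cannot drift apart.
    if (event.event === EventName.WrapperFailed) {
        sinks.automationApi(event);
        sinks.sentry(event);
    }
}
```

Putting both failure-only sinks under one guard makes the documented contract ("no-op for non-`wrapper_failed` events") hold at the call site instead of relying on each sink to filter for itself.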
Summary
Wraps every JS action through a single telemetry pipeline that emits structured `automation_run_started` / `automation_run_completed` / `wrapper_failed` events to four sinks: a `::fern-telemetry::<json>` log line (always), PostHog (always), Sentry (failures only), and the Lightweight API at `/v1/automation/events` (failures only). Single touchpoint for actions: `instrumentAction(name, fn)` plus `injectFernToken(token)` to authenticate the Lightweight API POST.

Linear: FER-10253
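To illustrate why the token is injected from inside the instrumented body, here is a simplified sketch of the touchpoint. It is not the PR's implementation: the in-memory `emitted` array stands in for the four sinks, `parseToken`, `errorCodeOf`, `currentToken`, and the `fern-token` input name are hypothetical, and `automation_run_completed` is left out because the PR emits it from the post phase.

```typescript
// Hypothetical sketch of the single touchpoint; not the PR's implementation.
type EmittedEvent = { event: string; action: string; error_code?: string };

const emitted: EmittedEvent[] = []; // stand-in for the four sinks
let fernToken: string | undefined; // stand-in for the Lightweight API auth state

function injectFernToken(token: string): void {
    fernToken = token;
}

function currentToken(): string | undefined {
    return fernToken;
}

// Reads an `errorCode` property if the thrown value carries one.
function errorCodeOf(err: unknown): string {
    if (typeof err === "object" && err !== null && "errorCode" in err) {
        return String((err as { errorCode: unknown }).errorCode);
    }
    return "UNKNOWN_ERROR";
}

async function instrumentAction(name: string, fn: () => Promise<void>): Promise<void> {
    emitted.push({ event: "automation_run_started", action: name });
    try {
        await fn();
    } catch (err) {
        emitted.push({ event: "wrapper_failed", action: name, error_code: errorCodeOf(err) });
        throw err;
    }
}

// Hypothetical input parser: throws when the token input is missing.
function parseToken(inputs: Record<string, string | undefined>): string {
    const token = inputs["fern-token"];
    if (!token) {
        throw new Error("missing required input: fern-token");
    }
    return token;
}

async function generateAction(inputs: Record<string, string | undefined>): Promise<void> {
    await instrumentAction("generate", async () => {
        const token = parseToken(inputs); // may throw before any token exists
        injectFernToken(token); // only reached once parsing succeeded
        // ... the action's real work would go here ...
    });
}
```

Because parsing happens inside the callback, a missing input still passes through the catch path and is reported as `wrapper_failed`, even though no token was ever injected.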
Highlights
- `packages/shared/src/obs/` module: `telemetry-client`, `posthog`, `sentry`, `lightweight-api`, `automation-context`, `errors`, `types`. All free functions, no class instances exposed.
- `WrapperError(errorCode, message, originalError?)` is the single way for wrapper code to attach a stable SCREAMING_SNAKE error code to a thrown exception. Anything that isn't a `WrapperError` gets classified as `UNKNOWN_ERROR`.
- `injectFernToken(token)` is called from inside `instrumentAction`'s body after parsing, so input-parsing failures still get classified as `wrapper_failed` via the catch path before the token is available.
- `flushTelemetry()` runs from `runAction`'s exit path: awaits in-flight Lightweight API POSTs (`Promise.allSettled`), then shuts down the PostHog and Sentry SDK queues.
- `generate` / `sync-openapi` catch + re-throw as `WrapperError` with action-specific codes (`CLI_AUTOMATIONS_GENERATE_FAILED`, `CLI_GHA_PULL_SPEC_FAILED`, `CLI_GHA_SYNC_SPECS_FAILED`). `setup-cli` wraps `installFernCli` and the source-build path in `WrapperError("CLI_INSTALL_*", ...)`.
- `outcome=cancelled` in saved state; the post phase emits `automation_run_completed` with `status: cancelled`. No `wrapper_failed` from cancellation; it's not a wrapper-side fault.

Follow-ups before this is "live"
Two things need to happen on top of this PR before any telemetry actually fires:
1. Wire the actual build constants.
`POSTHOG_API_KEY`, `SENTRY_DSN_AUTOMATIONS`, and `LIGHTWEIGHT_API_URL` in `packages/shared/src/obs/build-constants.ts` are empty strings. The PostHog and Sentry SDKs initialize as no-ops when their constant is empty; the Lightweight API short-circuits when its URL is empty. Once the `automations` Sentry project and the `/v1/automation/events` endpoint exist, hardcode the values here. (PostHog API keys and Sentry DSNs are designed to be embedded in client code; they're write-only at the project level. No CI secret needed.)

2. Sentry release tagging + source-maps upload.
`Sentry.init({ release: ... })` is currently unset in `packages/shared/src/obs/sentry.ts`, and source maps aren't uploaded, so any captured exception today would point at the bundled `dist/index.js` line numbers, not the original TypeScript. Both depend on a CI-driven release pipeline that bakes a release tag at the moment dist is built. Open question: do we move release builds to CI (the `release.yml` header already has a TODO about this) so we can run `sentry-cli releases new <tag>` + `sentry-cli releases files upload-sourcemaps` keyed to the same tag we pass to `Sentry.init`? Or do we keep dist commits as-is and run a separate sourcemaps-upload workflow on tag publish?

Either way the value passed to `Sentry.init({ release })` and the Sentry CLI's release identifier need to agree, otherwise deobfuscation won't resolve. Worth a separate Linear ticket.

Test plan
- `pnpm typecheck && pnpm check && pnpm test && pnpm build`: local clean (typecheck across 10 packages, lint clean across 82 files, 51/51 shared tests pass, all 8 dists rebuilt at ~3.7 MB)
- `setup-cli@v0.0.0-obs1`) and trigger a workflow that uses it. Verify `::fern-telemetry::` log lines appear with the right `event` names and that PostHog / Sentry stay silent (constants still empty). Once constants are wired (follow-up 1), repeat and confirm events show up in PostHog / Sentry.
- `generate` or `sync-openapi`, verify `wrapper_failed` log line carries the action-specific `error_code` (e.g. `CLI_AUTOMATIONS_GENERATE_FAILED`).
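For the log-line checks in the test plan, a small helper along these lines could pull the telemetry payloads out of a downloaded workflow log. It is a sketch under assumptions: `parseTelemetryLines` is a hypothetical name, the JSON payload is assumed to run to the end of the line, and only the `::fern-telemetry::` prefix and the `event` / `error_code` field names are taken from the PR.

```typescript
const TELEMETRY_LOG_PREFIX = "::fern-telemetry::";

// Hypothetical helper: extracts every JSON payload that follows the
// telemetry prefix, skipping lines that are unrelated or not valid JSON.
function parseTelemetryLines(log: string): Array<Record<string, unknown>> {
    const events: Array<Record<string, unknown>> = [];
    for (const line of log.split(/\r?\n/)) {
        // Runner logs may prepend a timestamp, so search instead of startsWith.
        const idx = line.indexOf(TELEMETRY_LOG_PREFIX);
        if (idx === -1) {
            continue;
        }
        try {
            const parsed: unknown = JSON.parse(line.slice(idx + TELEMETRY_LOG_PREFIX.length));
            if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
                events.push(parsed as Record<string, unknown>);
            }
        } catch {
            // Not a JSON payload; ignore the line.
        }
    }
    return events;
}
```

A check for the failure case would then filter the result for `event === "wrapper_failed"` and compare its `error_code` against the expected action-specific code.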