feat(telemetry): per-method telemetry events for workflow runs (swamp-club#301) by keeb · Pull Request #1349 · systeminit/swamp

keeb · 2026-05-09T01:38:07Z

Summary

Workflow runs now emit one TelemetryEntry per workflow YAML step that resolves to a model method, alongside the parent CLI invocation entry. Children use the existing cli_invocation event shape (same redactions as a direct swamp model method run) and link to the parent via a new optional parentInvocationId field. A new optional workflowContext block carries workflowName / runId / jobName / stepName / modelType / driver so per-driver and per-model-type analytics are first-class without joining through the parent.

The design choice was deliberate: the issue originally proposed a new workflow_method_invocation event type. We pushed back during planning and chose additive optional fields on cli_invocation instead — the swamp-club ingest side declares properties: Record<string, unknown> so additive fields ride across with no consumer-side coordination. Analytics queries that aggregate by command/subcommand/duration immediately see workflow-internal method invocations alongside direct ones.

What's new on the wire

```jsonc
{
  "event": "cli_invocation",
  "properties": {
    "id": "<child-uuid>",
    "invocation": {
      "command": "model",
      "subcommand": "method",
      "args": ["run", "<REDACTED>", "<methodName>"],
      "optionKeys": [],
      "globalOptions": []
    },
    "result": { "status": "success", "exitCode": 0 },
    "parentInvocationId": "<parent-cli-invocation-uuid>",
    "workflowContext": {
      "workflowName": "deploy",
      "runId": "<workflow-run-uuid>",
      "jobName": "build",
      "stepName": "validate",
      "modelType": "command/shell",
      "driver": "local"
    }
    // ... existing fields (startedAt, completedAt, durationMs, swampVersion,
    //     denoVersion, platform, invocationContext) unchanged
  }
}
```

Older entries without parentInvocationId / workflowContext continue to decode (backward-compat regression test added).
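A minimal sketch of how the additive fields can stay invisible to legacy readers and writers. The names `CliInvocationProperties`, `toWireProperties` and `fromWireProperties` are hypothetical and model only a subset of the real entry; typing `driver` as optional is also an assumption. `JSON.stringify` already drops keys whose value is `undefined`, but building the bag explicitly keeps the keys absent from the in-memory object too.

```typescript
// Hypothetical DTOs covering a subset of the cli_invocation properties above.
interface WorkflowContext {
  readonly workflowName: string;
  readonly runId: string;
  readonly jobName: string;
  readonly stepName: string;
  readonly modelType: string;
  readonly driver?: string; // assumption: absent if no driver was resolved
}

interface CliInvocationProperties {
  id: string;
  result: { status: "success" | "error"; exitCode: number };
  parentInvocationId?: string;
  workflowContext?: WorkflowContext;
}

// Serialize: optional keys are added only when present, so a direct
// (non-workflow) invocation produces exactly the pre-existing shape.
function toWireProperties(p: CliInvocationProperties): Record<string, unknown> {
  const out: Record<string, unknown> = { id: p.id, result: p.result };
  if (p.parentInvocationId !== undefined) {
    out["parentInvocationId"] = p.parentInvocationId;
  }
  if (p.workflowContext !== undefined) {
    out["workflowContext"] = { ...p.workflowContext };
  }
  return out;
}

// Deserialize: legacy entries simply lack the new keys and still decode.
function fromWireProperties(raw: Record<string, unknown>): CliInvocationProperties {
  const decoded: CliInvocationProperties = {
    id: String(raw["id"]),
    result: raw["result"] as CliInvocationProperties["result"],
  };
  const parent = raw["parentInvocationId"];
  if (typeof parent === "string") {
    decoded.parentInvocationId = parent;
  }
  const ctx = raw["workflowContext"];
  if (typeof ctx === "object" && ctx !== null) {
    decoded.workflowContext = ctx as WorkflowContext;
  }
  return decoded;
}
```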

Architecture

Bridge (src/libswamp/workflows/telemetry_bridge.ts) — tracks in-flight method invocations by ${jobId}:${stepId}, maps the existing method_executing → step_completed/step_failed event pairs into success/error child entries, synthesizes durationMs = 0 entries for pre-method-executing failures (model lookup, vault expression resolution, vary-key validation, env-var validation), and finalizes any unfinished invocations on stream termination so cancellation/timeout paths don't silently drop telemetry.
Sink (WorkflowTelemetrySink in src/libswamp/workflows/run.ts) — narrow callback shape on WorkflowRunDeps. CLI binds it to TelemetryService.recordChildInvocation; non-CLI consumers pass undefined and the bridge becomes a no-op. Keeps libswamp free of direct domain.telemetry imports beyond plain DTOs.
Pre-allocated parent id — TelemetryService exposes a stable invocationId (constructor pre-allocates a TelemetryId) so children can reference it as parentInvocationId before the parent entry itself is recorded at the end of the CLI lifecycle. Module-scoped accessor (getActiveTelemetryService in src/cli/telemetry_integration.ts) is set in runCli before parse and cleared in the surrounding try/finally.
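The pre-allocated parent id and the module-scoped accessor can be sketched as follows. Every name here (`TelemetryServiceSketch`, `getActiveService`, `runWithActiveService`) is hypothetical, and the id generator is injected to keep the example deterministic, whereas the PR describes the real constructor pre-allocating a TelemetryId. A child can reference `invocationId` before any parent entry exists, and the accessor is cleared even if the body throws.

```typescript
type Status = "success" | "error";

interface ChildRecord {
  parentInvocationId: string;
  stepName: string;
  status: Status;
}

class TelemetryServiceSketch {
  // Allocated up front, long before the parent entry is recorded at CLI exit.
  readonly invocationId: string;
  readonly children: ChildRecord[] = [];

  constructor(newId: () => string) {
    this.invocationId = newId();
  }

  recordChildInvocation(stepName: string, status: Status): void {
    this.children.push({ parentInvocationId: this.invocationId, stepName, status });
  }
}

// Module-scoped slot for the service of the current CLI invocation.
let activeService: TelemetryServiceSketch | undefined;

function getActiveService(): TelemetryServiceSketch | undefined {
  return activeService;
}

// Set before the command body runs; cleared in finally so a throwing body
// cannot leak the service into a later invocation.
function runWithActiveService(service: TelemetryServiceSketch, body: () => void): void {
  activeService = service;
  try {
    body();
  } finally {
    activeService = undefined;
  }
}

// Usage: a child recorded mid-run already carries the parent's id.
let counter = 0;
const service = new TelemetryServiceSketch(() => `id-${++counter}`);
runWithActiveService(service, () => {
  getActiveService()?.recordChildInvocation("validate", "success");
});
console.log(service.children[0]?.parentInvocationId === service.invocationId); // true
```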

Domain event extensions

step_failed gains optional modelName / methodName / driver, populated only at the model-method failure site (line ~1820 in runStep's catch block). Structural failures — max-nesting-depth, cycle detection, nested-workflow throw/failed — leave them undefined so the bridge can distinguish method failures from structural failures.
method_executing gains optional driver, captured from the resolved DriverPlan. The yield is reordered to fire after DriverPlan resolution; vary-key validation failures (which happen between event start and method_executing) become pre-method-executing failures by design — more accurate categorization since the method was never invoked.
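A sketch of why yielding method_executing only after pre-method work changes the categorization: anything that throws before the yield produces a step_failed with no preceding method_executing. The event shapes and the `runModelStep` generator are illustrative assumptions, not swamp's execution service. Lookup, vault, vary-key and env-var checks plus driver resolution are folded into one hypothetical `prepare` callback, pre- and post-method failures are assumed to share one catch, and structural failures (which would omit modelName/methodName) are not modelled.

```typescript
// Hypothetical, simplified step events.
type StepEvent =
  | { kind: "method_executing"; stepId: string; driver: string }
  | { kind: "step_completed"; stepId: string }
  | { kind: "step_failed"; stepId: string; modelName: string; methodName: string; driver?: string };

interface ModelStep {
  stepId: string;
  modelName: string;
  methodName: string;
  prepare: () => string; // pre-method work; returns the resolved driver
  invoke: (driver: string) => void;
}

function* runModelStep(step: ModelStep): Generator<StepEvent> {
  let driver: string | undefined;
  try {
    const resolved = step.prepare();
    driver = resolved;
    // Yielded only after pre-method work succeeds, so the event can carry
    // the resolved driver and a prepare() failure never emits it.
    yield { kind: "method_executing", stepId: step.stepId, driver: resolved };
    step.invoke(resolved);
    yield { kind: "step_completed", stepId: step.stepId };
  } catch {
    // Model-method failure site: attach method context (driver only if known).
    yield { kind: "step_failed", stepId: step.stepId, modelName: step.modelName, methodName: step.methodName, driver };
  }
}
```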

Failure semantics

| Step outcome | Child entry |
|---|---|
| Success | `status: success`, real duration |
| Failure after `method_executing` | `status: error`, real duration |
| Failure before `method_executing` (model lookup, vault, vary, env var) | `status: error`, `durationMs = 0` (synthesized) |
| `allowFailure: true` step | `status: error` on the child (method outcome); parent records workflow `success` |
| Workflow-task / nested workflow / cycle / depth | No child entry (no method was ever invoked at this step) |
| Cancellation / timeout / mid-stream throw | In-flight invocations drained as `error` via the bridge's `finalize()` |
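The table reads as a small state machine over step events. `BridgeSketch` is a hypothetical, synchronous reduction of that idea (the real bridge is async and records full TelemetryEntry values): method_executing opens an in-flight record, a matching completion or failure closes it with the real duration, a step_failed carrying a modelName with nothing in flight becomes a synthesized zero-duration error, anything else is skipped, and `finalize(now)` drains what remains. Using a caller-supplied clock for drained durations is an assumption; the PR only says drained invocations are recorded as errors. allowFailure does not appear here because it only affects the parent's outcome.

```typescript
// Hypothetical, simplified event shapes; `at` is a millisecond timestamp.
type StepEvent =
  | { kind: "method_executing"; jobId: string; stepId: string; at: number }
  | { kind: "step_completed"; jobId: string; stepId: string; at: number }
  | { kind: "step_failed"; jobId: string; stepId: string; at: number; modelName?: string };

interface ChildEntry {
  jobName: string;
  stepName: string;
  status: "success" | "error";
  durationMs: number;
}

interface InFlight {
  jobId: string;
  stepId: string;
  startedAt: number;
}

class BridgeSketch {
  private readonly inFlight = new Map<string, InFlight>();

  // With no sink the bridge does nothing, mirroring non-CLI consumers.
  constructor(private readonly sink?: (entry: ChildEntry) => void) {}

  observe(e: StepEvent): void {
    const sink = this.sink;
    if (!sink) return;
    const key = JSON.stringify([e.jobId, e.stepId]);
    if (e.kind === "method_executing") {
      this.inFlight.set(key, { jobId: e.jobId, stepId: e.stepId, startedAt: e.at });
      return;
    }
    const started = this.inFlight.get(key);
    if (started) {
      // The method ran: success or error with the real duration.
      this.inFlight.delete(key);
      sink({
        jobName: e.jobId,
        stepName: e.stepId,
        status: e.kind === "step_completed" ? "success" : "error",
        durationMs: e.at - started.startedAt,
      });
    } else if (e.kind === "step_failed" && e.modelName !== undefined) {
      // Pre-method failure: the method never started, so durationMs = 0.
      sink({ jobName: e.jobId, stepName: e.stepId, status: "error", durationMs: 0 });
    }
    // Anything else (workflow task, nested workflow, cycle, depth): no entry.
  }

  // Drain invocations still in flight at stream end; safe to call twice.
  finalize(now: number): void {
    const sink = this.sink;
    if (!sink) return;
    for (const f of this.inFlight.values()) {
      sink({ jobName: f.jobId, stepName: f.stepId, status: "error", durationMs: now - f.startedAt });
    }
    this.inFlight.clear();
  }
}
```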

V1 limitations (documented in `design/workflow.md`)

Workflow-step granularity only. Sub-method follow-up calls inside DefaultMethodExecutionService.execute are not captured separately.
Failures before workflow validation (workflow not found, input schema validation) produce no child entry — no method was ever resolved.

Test Plan

Unit tests — TelemetryEntry round-trip with/without new fields (back-compat regression locked in); TelemetryService.recordChildInvocation success and error paths with UserError classification; WorkflowTelemetryBridge for all five branches (success, post-method failure, pre-method failure, structural skip, finalize drain) plus idempotency, sequential workflows, forEach, allowFailure semantics — 23 new test cases.
libswamp error-terminal test — mid-stream throw with an in-flight method invocation: bridge's try/finally drains it as an error child, parent stream's error event still propagates cleanly.
Integration test (integration/telemetry_workflow_method_invocations_test.ts) — end-to-end CLI invocation runs a workflow with success step + forEach iterations, asserts one parent + correct number of children with parentInvocationId linkage and full workflowContext (including driver, modelType).
Wire-shape tests — HttpTelemetrySender includes new fields at properties.parentInvocationId / properties.workflowContext.*; omitted entirely when absent (no undefined serialization).
Repository round-trip — JsonTelemetryRepository saves and reads new fields; legacy entries without them decode cleanly.
Verification gates — deno check, deno lint, deno fmt --check, deno run test (5723 passed, 0 failed), deno run compile.
Manual end-to-end — ran a throwaway workflow in ~/git/swamp-media and inspected ~/git/swamp-media/.swamp/telemetry/. Got one parent + three children (ok-step, fanout-a, fanout-b) with all workflowContext fields populated and consistent parentInvocationId / runId. Children share the redacted-args shape with direct model method run invocations. forEach iterations have distinct stepNames.
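The drain-on-termination behaviour exercised by the libswamp error-terminal test rests on a try/finally around the event loop. A minimal sketch with hypothetical `Observer` and `withTelemetry` names and a synchronous observer for brevity: the finally block runs when the stream ends, when it throws mid-stream, and when the consumer stops early, so `finalize()` always gets a chance to drain in-flight invocations while the original error still reaches the consumer.

```typescript
interface Observer<E> {
  observe(event: E): void;
  finalize(): void;
}

// Pass events through unchanged while letting the observer see each one.
async function* withTelemetry<E>(
  stream: AsyncIterable<E>,
  observer: Observer<E> | undefined, // undefined: no sink wired, nothing to do
): AsyncGenerator<E> {
  try {
    for await (const event of stream) {
      observer?.observe(event);
      yield event;
    }
  } finally {
    // Reached on normal completion, on a mid-stream throw, and on an early
    // return() by the consumer (e.g. a break out of for await).
    observer?.finalize();
  }
}
```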

Consumer side

Verified against swamp-club: services/telemetry/lib/schema.ts declares properties: Record<string, unknown> so the additive fields ride across the wire with zero coordination. Existing rollup metrics in consumers/metrics.ts already follow the "read what you need from the opaque bag" pattern. A follow-up workflowContext rollup metric (per-driver / per-model-type / per-step counts) is a separate swamp-club issue, not blocking.

🤖 Generated with Claude Code

…-club#301)

Workflow runs now emit one TelemetryEntry per workflow YAML step that resolves to a model method, alongside the parent CLI invocation entry. Children use the existing cli_invocation event shape (same redactions as a direct `swamp model method run`) and link to the parent via a new optional `parentInvocationId` field. A new optional `workflowContext` block carries workflowName/runId/jobName/stepName/modelType/driver so per-driver and per-model-type analytics are first-class without joining through the parent.

The bridge lives in src/libswamp/workflows/telemetry_bridge.ts: it tracks in-flight method invocations between method_executing and the matching step_completed/step_failed events, synthesizes durationMs=0 entries for pre-method-executing failures (model lookup, vault expression resolution, vary-key validation, env-var validation), and finalizes any unfinished invocations on stream termination so cancellation/timeout paths don't silently drop telemetry.

Domain event extensions:
- step_failed gains optional modelName/methodName/driver, populated only at the model-method failure site; structural failures (max-depth, cycle, nested-workflow) leave them undefined so the bridge can distinguish method failures from structural failures.
- method_executing gains optional driver, captured from the resolved DriverPlan; the yield is reordered to fire after DriverPlan resolution.

Wire shape is opaque on the swamp-club ingest side (properties: Record<string, unknown>) so the additive fields ride across with no consumer-side coordination; verified against services/telemetry/lib/schema.ts and consumers/metrics.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous matcher used `stepName.startsWith("fanout-") && stepName.includes("a")`, which non-deterministically aliased `"fanout-b"` to the same entry as `"fanout-a"` because the prefix `"fanout-"` itself contains the letter `"a"`. Linux CI's directory iteration order returned `"fanout-b"` first, so `find()` matched it for BOTH `fanoutA` and `fanoutB` and the distinct-stepNames assertion failed. Use exact `===` match instead: the iterations are known constants in this fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
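The aliasing is easy to reproduce in isolation. This sketch assumes the fanoutB matcher had the same shape with `includes("b")`; the list order mimics the Linux CI directory iteration described above.

```typescript
// Order as returned by directory iteration on Linux CI in the failing run.
const stepNames = ["fanout-b", "fanout-a"];

// Loose matchers: "fanout-" itself contains an "a", so the first predicate
// accepts any fanout step and find() returns whichever comes first.
const looseA = stepNames.find((n) => n.startsWith("fanout-") && n.includes("a"));
const looseB = stepNames.find((n) => n.startsWith("fanout-") && n.includes("b"));

// Exact matchers: independent of iteration order.
const exactA = stepNames.find((n) => n === "fanout-a");
const exactB = stepNames.find((n) => n === "fanout-b");

console.log(looseA, looseB); // fanout-b fanout-b
console.log(exactA, exactB); // fanout-a fanout-b
```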

The fixture uses POSIX shell built-ins (`echo`, `exit`) via the command/shell model. On Windows the shell exec exits with code -65536 because shell built-ins aren't directly resolvable as Windows binaries; this is already a known limitation handled by `keeb_shell_model_test.ts`, which uses the same pattern. The bridge logic itself is platform-independent and covered by src/libswamp/workflows/telemetry_bridge_test.ts, which runs on all platforms. This integration test verifies end-to-end CLI plumbing on POSIX only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions

CLI UX Review

Blocking

None.

Suggestions

None.

Verdict

PASS — This PR makes no user-facing changes. All modifications are internal telemetry plumbing: enriching the TelemetryEntry wire shape with parentInvocationId and workflowContext, wiring a telemetry sink into WorkflowRunDeps, and a module-scoped accessor in telemetry_integration.ts. No command flags, help text, log-mode output, JSON-mode output, or error messages were added or changed.

github-actions

Code Review

Well-architected feature with comprehensive test coverage across all layers.

Blocking Issues

None.

Suggestions

key.split(":") in finalize() is fragile (src/libswamp/workflows/telemetry_bridge.ts:197): The step key uses ${jobId}:${stepId} as the map key, then split(":") to recover the parts during drain. If a step ID ever contained a colon (e.g., from a CEL expression or template expansion), the split would misattribute the stepName in the workflow context. Step names don't currently use colons so this isn't realistic today, but a safer approach would be to store the (jobId, stepId) tuple directly on InFlightMethodInvocation rather than re-parsing the key. Low-priority since it only affects the error-drain path.

What looks good

DDD alignment: WorkflowContext is a proper value object (immutable, equality by value), the bridge acts as an application-layer service mediating between domain events and the telemetry sink, and the sink callback keeps libswamp decoupled from the domain telemetry service.
Import boundary: CLI command imports WorkflowTelemetrySink and WorkflowRunDeps from ../../libswamp/mod.ts — no direct internal imports.
Additive wire schema: New optional fields on cli_invocation event with backward-compat regression tests for legacy entries missing parentInvocationId/workflowContext. Clean zero-serialization for absent optional fields.
Failure semantics: The five-way failure categorization (success, post-method error, pre-method-executing error, structural skip, finalize drain) is well-mapped and each branch has dedicated test coverage.
Pre-allocated invocationId: Letting children reference the parent ID before the parent entry is written avoids timestamp-based join heuristics — correct design.
method_executing reordering: Moving the yield to after DriverPlan resolution is the right call — it gives the bridge the resolved driver and correctly reclassifies vary-key failures as pre-method-executing.
Test breadth: 23 new unit tests across bridge, service, entry, repository, and HTTP sender; plus an integration test that verifies the full CLI → libswamp → persistence path. The finalize() idempotency, sequential-workflows, and mid-stream-throw tests are particularly well-constructed.
License headers present on all new files.

github-actions

Adversarial Review

Critical / High

No critical or high severity issues found.

Medium

Unhandled telemetry write failure can crash workflow execution — src/libswamp/workflows/run.ts:579 and :631

await telemetryBridge.observe(mapped) inside the main for-await loop and await telemetryBridge.finalize() in the finally block both propagate any exception thrown by sink.recordChildInvocation. If the underlying JsonTelemetryRepository.save() fails (disk full, permission denied, corrupted directory), the workflow fails with a confusing telemetry error instead of completing normally.

Breaking scenario: Disk nears capacity during a long workflow run. A child telemetry entry write fails → observe() throws → the catch block yields { kind: "error", error: workflowExecutionFailed(diskError) } → the workflow appears to have failed, even though all model methods succeeded.

Additionally, if finalize() throws in the finally block, it can mask the original workflow error (the thrown error from finally replaces whatever the try/catch was doing).

Suggested fix: Wrap both the observe and finalize calls in try/catch to swallow telemetry failures gracefully:
```ts
if (telemetryBridge) {
  try { await telemetryBridge.observe(mapped); } catch { /* telemetry best-effort */ }
}
// ...
if (telemetryBridge) {
  try { await telemetryBridge.finalize(); } catch { /* telemetry best-effort */ }
}
```
Severity is medium rather than high because: (a) in practice the JSON repository writes small files to the .swamp/telemetry/ directory which is unlikely to fail in normal operation, and (b) the parent recordSuccess/recordError calls in the CLI lifecycle have the same unguarded pattern, so this isn't a regression — it's consistent with the existing design. But since child invocations fire mid-workflow (not just at CLI exit), the blast radius of a failure is larger here.
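One way to apply the suggestion without discarding the failure silently is a small helper that runs a telemetry call and reports, rather than rethrows, any error. `bestEffort` and `runStepsSketch` are hypothetical names. The second function shows that with the helper in the finally block, the original workflow error is what reaches the caller, not the telemetry error.

```typescript
// Run a telemetry side effect; report failures instead of propagating them.
async function bestEffort(
  label: string,
  op: () => Promise<void>,
  onError: (label: string, err: unknown) => void = () => {},
): Promise<void> {
  try {
    await op();
  } catch (err) {
    onError(label, err);
  }
}

// Simulates a workflow that fails while telemetry finalization also fails.
async function runStepsSketch(onError: (label: string, err: unknown) => void): Promise<void> {
  try {
    throw new Error("workflow failed");
  } finally {
    // Without bestEffort, this rejection would replace "workflow failed".
    await bestEffort("finalize", async () => {
      throw new Error("disk full");
    }, onError);
  }
}
```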
key.split(":") in finalize() is fragile when identifiers contain colons — src/libswamp/workflows/telemetry_bridge.ts:197

const [jobId, stepId] = key.split(":"); destructures only the first two segments. If a job name or step name contains a colon (e.g. "deploy:prod", or a forEach-expanded name like "step-host:port[0]"), the stepId would be truncated. The stepKey function on line 230 joins with : but the reverse split is not symmetric.

Breaking scenario: A workflow YAML names a job "deploy:us-east-1". The key becomes "deploy:us-east-1:validate". The split produces jobId = "deploy", stepId = "us-east-1" — both wrong, and "validate" is lost entirely. The workflowContext in the drained telemetry entry would have incorrect jobName and stepName.

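Suggested fix: Use indexOf for a single split: const sep = key.indexOf(":"); const jobId = key.slice(0, sep); const stepId = key.slice(sep + 1); Note that this only recovers both parts when the job ID itself contains no colon; in the scenario above it still yields jobId = "deploy". Storing the (jobId, stepId) tuple on InFlightMethodInvocation, as the Code Review suggests, avoids re-parsing the key entirely.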

Impact is limited to telemetry metadata for drained in-flight entries (the finalize path). The normal observe path uses the original event's jobId/stepId directly and is unaffected.
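Both parsing strategies can be checked against the breaking scenario. The indexOf variant only helps when the colon is in the step ID; here it is in the job name. Keeping the parts on the in-flight record sidesteps parsing altogether. The JSON-encoded lookup key is an additional assumption of this sketch: it also prevents collisions such as job "a:b" / step "c" versus job "a" / step "b:c", which a plain `${jobId}:${stepId}` key would merge.

```typescript
const jobId = "deploy:us-east-1";
const stepId = "validate";
const key = `${jobId}:${stepId}`; // "deploy:us-east-1:validate"

// Current drain path: destructuring keeps only the first two segments.
const [splitJob, splitStep] = key.split(":"); // "deploy", "us-east-1"

// Suggested fix: split at the first colon only.
const sep = key.indexOf(":");
const idxJob = key.slice(0, sep); // "deploy"
const idxStep = key.slice(sep + 1); // "us-east-1:validate"

// Alternative: never parse the key. Store the parts on the record and use an
// injective encoding purely for lookup.
interface InFlightSketch {
  jobId: string;
  stepId: string;
  startedAt: number;
}
const stepKey = (job: string, step: string): string => JSON.stringify([job, step]);

const inFlight = new Map<string, InFlightSketch>();
inFlight.set(stepKey(jobId, stepId), { jobId, stepId, startedAt: 0 });

for (const record of inFlight.values()) {
  console.log(record.jobId, record.stepId); // deploy:us-east-1 validate
}
```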

Low

new Date(0) dead write — src/libswamp/workflows/telemetry_bridge.ts:159

The synthesized InFlightMethodInvocation sets startedAt: new Date(0) but this value is never read — sameInstant on line 165 is passed to recordChildInvocation directly. The startedAt inside the synthesized object is only consumed by buildWorkflowContext, which doesn't use it. Not a bug, just a misleading dead value.
Nested workflow events forwarded to parent bridge — The execution service's runWorkflowStep forwards child workflow events to the parent stream (line 1961 in execution_service.ts). If a nested workflow emits its own method_executing / step_completed pairs, the parent bridge would observe them and create child telemetry entries attributed to the parent workflow's workflowName/runId. This is arguably correct (the parent bridge sees all events from the parent stream), but nested workflow method invocations would carry the outer workflow's name, not the inner workflow's. The PR explicitly documents nested workflows as out of scope for V1, so this is just a note for future iterations.

Verdict

PASS — The architecture is well-considered. The bridge design is clean, idempotent, and correctly handles all five documented failure branches. Event ordering is correct — method_executing fires after driver resolution, model_resolved fires before it, and step_failed carries the right context for pre/post-method-executing failures. Wire-shape tests lock the contract. The two medium findings are worth addressing in a follow-up but neither represents data loss or incorrect behavior in normal operation.

keeb and others added 3 commits May 8, 2026 18:37

github-actions Bot approved these changes May 9, 2026


github-actions Bot reviewed May 9, 2026


stack72 merged commit 16e942b into main May 9, 2026
11 checks passed

stack72 deleted the feat/per-method-workflow-telemetry branch May 9, 2026 02:10
