feat(notifications): user-defined domain notifications (type: notification) by PolyphonyRequiem · Pull Request #213 · microsoft/conductor

PolyphonyRequiem · 2026-05-19T22:33:02Z

Summary

Adds a new type: notification step that lets workflow authors publish typed, declared domain events to a dedicated JSONL stream for external tooling to hook off.

Notifications are distinct from execution events (agent_started, route_evaluated, ...): execution events describe what the runtime is doing; notifications describe what the workflow is asserting to the outside world.

Notifications are a visibility primitive, not a control-flow primitive. They do not block, do not wait for an ack, and do not gate routing. If a decision needs to flow back into the workflow, use human_gate.

YAML surface

workflow:
  notifications:
    namespace: polyphony.feature_pr   # defaults to slugified workflow name
    correlation: [apex_id]            # inherited by sub-workflows
    types:
      pr_ready:
        version: 1
        payload:
          pr_url: { type: string }
          pr_id:  { type: number }

agents:
  - name: announce_ready
    type: notification
    notification: pr_ready
    payload:
      pr_url: "{{ open_pr.output.url }}"
      pr_id:  "{{ open_pr.output.id }}"
    routes:
      - to: next_step       # emission is independent of routing

Envelope shape

emission_id = <run_id>:<dotted_step_path>:<iteration> — stable across resume/replay so the first downstream consumer can dedupe.
schema_id = <namespace>.<type>@<version> — consumers MUST pin on this, not on notification_type alone.
correlation — first-class field; auto-merged from declared keys and inherited from the parent workflow (parent wins on collision so the upstream trail survives deep nesting).
payload — Jinja2-rendered then type-validated against the declared OutputField schema.

Delivery (v1)

Every emission flows through WorkflowEventEmitter and is written by NotificationLogSubscriber to a dedicated …notifications.jsonl file next to the existing event log (path printed at end-of-run).

No webhooks, shell hooks, or model-driven emission in v1.

What's in this PR

src/conductor/config/schema.py — NotificationsConfig + NotificationTypeDef; notification and payload fields on AgentDef; strict positive/negative validators.
src/conductor/config/validator.py — cross-references undeclared types, payload field mismatches, unknown correlation keys.
src/conductor/executor/notification.py — pure envelope builder (rendering, coercion, type validation, emission-id construction).
src/conductor/engine/workflow.py — dispatch branch + correlation snapshot inherited into sub-workflows.
src/conductor/engine/event_log.py — NotificationLogSubscriber filters to notification events only.
src/conductor/cli/run.py — wires the subscriber in both run and resume paths.
examples/notifications.yaml — runnable example with namespace, correlation, mid-workflow emission.
AGENTS.md — full design + schema-version-bump policy + "visibility not control flow" callout.
Tests: tests/test_config/test_notifications.py + tests/test_executor/test_notification.py (18 focused tests).

Schema-version-bump policy (documented, not enforced)

Adding an optional field is non-breaking — do not bump version.
Removing/renaming a field, changing a field's type, or making an optional field required is breaking — bump version and emit under the new schema_id.
During a transition a workflow MAY emit both pr_ready@1 and pr_ready@2 from adjacent steps.
Consumers MUST pin on schema_id.

Validation

Undeclared type, payload field mismatch, unknown correlation key, or a type mismatch on a rendered value all fail the workflow loudly with ValidationError. The contract is the whole point.

Testing

uv run ruff check — clean
uv run ruff format --check — clean
uv run pytest tests/test_config tests/test_executor tests/test_engine tests/test_cli -m "not performance" — 1692 passed, 5 skipped
uv run conductor validate examples/notifications.yaml — clean
All other examples/*.yaml still validate clean.

Deliberately deferred

Edge-attached notifications (notify: on RouteDef) — possible v1.5 sugar reusing the same envelope.
Webhooks / shell hooks for delivery.
MCP tool for model-driven emission.
Ack / response semantics (use human_gate instead).

codecov-commenter · 2026-05-19T22:44:06Z

Codecov Report

❌ Patch coverage is 70.27027% with 88 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@75b01b5). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/conductor/engine/workflow.py	33.33%	30 Missing ⚠️
src/conductor/executor/notification.py	68.47%	29 Missing ⚠️
src/conductor/config/schema.py	77.64%	19 Missing ⚠️
src/conductor/engine/event_log.py	68.96%	9 Missing ⚠️
src/conductor/config/validator.py	96.77%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #213   +/-   ##
=======================================
  Coverage        ?   87.67%           
=======================================
  Files           ?       61           
  Lines           ?     9943           
  Branches        ?        0           
=======================================
  Hits            ?     8718           
  Misses          ?     1225           
  Partials        ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

jrob5756

Nice work. I have a few questions/comments below

jrob5756 · 2026-05-21T17:22:10Z

                            continue

+                        # Handle notification steps (fire-and-forget visibility)
+                        if agent.type == "notification":


Parallel / for-each containment is undefined. This dispatch lives only in the sequential path; behavior inside parallel: or for_each: is unspecified and get_agent_execution_count won't give per-slot iteration counts (breaking emission_id uniqueness).

Either disallow at validation time, or thread emission through the group dispatchers + add tests.

Applied. Added validator disallowance for type: notification inside both parallel groups and for_each inline agents, matching the existing guard pattern for script, wait, terminate, and workflow steps. Error messages direct authors to place the notification step in the main agent list and route to it from the group's routes: block instead. Threading dispatch through the group executors (iteration-scoped emission_id uniqueness) is a follow-up; the conservative static guard is in commit 27006af.

jrob5756 · 2026-05-21T17:22:10Z

+                            # notifications block exists when a notification step
+                            # is present, but assert defensively at runtime so
+                            # the type narrowing is explicit for callers.
+                            assert notif_config is not None  # noqa: S101


assert is stripped under python -O — this becomes a no-op and the next line dereferences None. Make it an explicit raise.

Applied. Replaced assert notif_config is not None with an explicit raise ExecutionError(...) (commit 27006af). The ExecutionError matches the pattern used for other internal invariant violations in workflow.py.

jrob5756 · 2026-05-21T17:22:10Z

+                            # The domain notification itself — this is what
+                            # external tooling tails the notifications.jsonl
+                            # subscriber for.
+                            self._emit("notification", envelope)


Dashboard rendering for the new event types is missing. Per your discussion with Daniel about a bottom-pane tab at the workflow level — worth landing alongside this PR (or as an immediate follow-up) so --web users actually see emissions. Currently the four new event types (notification_started, notification, notification_completed, notification_failed) and the new step type aren't handled in workflow-store.ts or the Cytoscape view.

Acknowledged — deferring dashboard rendering as an immediate follow-up PR per the 'or as an immediate follow-up' suggestion. This PR's scope is the backend notification pipeline (YAML surface, executor, event emission, JSONL subscriber). The four new event types (notification_started, notification, notification_completed, notification_failed) and the new step type will need handlers in workflow-store.ts and the Cytoscape view; that's a standalone frontend change.

jrob5756 · 2026-05-21T17:22:10Z

+                            # Store an empty dict so routes can fire without
+                            # exposing notification internals as agent output
+                            # (consistent with the "no response flow" contract).
+                            self.context.store(agent.name, {})


Storing {} means {{ announce_ready.output.x }} downstream silently renders empty under accumulate mode — quiet footgun for authors who confuse a notification with a regular agent. Either document the behavior in AGENTS.md, or store a sentinel that lets templates fail loudly.

Applied documentation approach. Added a note to AGENTS.md under the Notifications section documenting that notification steps store {} in the workflow context and that downstream accumulate-mode templates referencing step_name.output.field will silently render empty — this is intentional fire-and-forget behavior, and authors who need a value to flow back should use a regular agent step instead. Commit 27006af.

jrob5756 · 2026-05-21T17:22:10Z

+            # run_id stem with the event log so the two files line up.
+            from conductor.engine.event_log import NotificationLogSubscriber
+
+            notification_log_subscriber = NotificationLogSubscriber(


Quick confirmation: on resume the dashboard already replays the main event log (dashboard.replay_events_from_jsonl at line 1823), and notification events flow through the same emitter — so old emissions will render in the dashboard on resume. The separate .notifications.jsonl is fresh per resume, which is fine for external tailers given the stable emission_id dedup. No action needed; flagging only to confirm the behavior is intentional.

Confirmed — that's intentional. Notification events flow through WorkflowEventEmitter → EventLog → notifications.jsonl subscriber on the same path as all other events. The replay_events_from_jsonl call at resume replays everything from the main event log, so past emissions will be visible in the dashboard tab on resume. The separate .notifications.jsonl is a fresh-per-resume streaming target for external tailers; emission_id dedup in consumers handles the fresh start.

jrob5756 · 2026-05-21T17:22:10Z

+        return value
+    try:
+        return json.loads(value)
+    except (json.JSONDecodeError, ValueError):


Silent fallback to the raw string means a bad template (e.g. "3.14abc" for a number field) fails one frame later with "got str" rather than "failed to coerce 'X' as number". Better author UX to raise here with the offending value.

Applied. _coerce_rendered now raises ValidationError on json.loads failure instead of silently returning the raw string, and the call site in build_envelope catches ValidationError from the coerce and re-raises with the field name for context (e.g. Payload field 'pr_id' for notification step 'bad': Failed to coerce rendered value 'not-a-number' to declared type 'number'). Commit 27006af.

jrob5756 · 2026-05-21T17:22:10Z

+            "source_agent": agent.name,
+            "subworkflow_path": list(subworkflow_path),
+            "correlation": dict(correlation),
+            "workflow_metadata": dict(workflow_metadata),


Nit: workflow_metadata feels redundant in the envelope. Consumers already get run_id + workflow + subworkflow_path and can join against the event log if they need more. Splatting the whole metadata dict into every emission bloats the wire format and couples the notification schema to whatever ad-hoc keys workflows happen to use. Consider dropping it (or opt-in allowlist).

Applied. Dropped workflow_metadata from the build_envelope signature and the envelope dict. Consumers already have run_id, workflow, and subworkflow_path to join against the event log for any workflow-level metadata they need. Commit 27006af.

jrob5756 · 2026-05-21T17:22:10Z

          effort: high
    """

+    notification: str | None = None


Nit: type: notification + notification: pr_ready reads redundant. Siblings use script→command, human_gate→dialog. Something like emit: or event: would parse more naturally — easier to rename now than after it ships.

Applied, using emit:. The field on AgentDef is now emit: str | None (schema.py, validator.py, executor, workflow.py, tests, examples, AGENTS.md all updated). The YAML surface reads type: notification + emit: pr_ready which parses naturally as 'this step emits the pr_ready notification type'. Agreed that now is the right time — commit 27006af.

…ification) Introduces a new `type: notification` step that lets workflow authors publish typed, declared *domain events* to a dedicated JSONL stream for external tooling to hook off. Distinct from execution events (`agent_started`, `route_evaluated`, ...): execution events describe what the runtime is doing; notifications describe what the workflow is asserting to the outside world. Notifications are a visibility primitive, not a control-flow primitive. They do not block, do not wait for an ack, and do not gate routing. If a decision needs to flow back into the workflow, use `human_gate`. Workflow-level config (`workflow.notifications`): - `namespace` -- dotted-identifier (defaults to slugified workflow name) - `correlation` -- workflow input keys auto-surfaced on every emission; inherited by sub-workflows (parent wins on collision so the upstream trail survives deep nesting) - `types` -- per-type `version` + `payload` schema (reusing `OutputField`) Each emission ships a fixed envelope: - `emission_id` = `<run_id>:<dotted_step_path>:<iteration>` -- stable across resume/replay so the first downstream consumer can dedupe - `schema_id` = `<namespace>.<type>@<version>` -- consumers MUST pin on this, not on `notification_type` alone - `correlation` -- first-class, not buried in metadata - `payload` -- Jinja2-rendered then type-validated Delivery (v1): rides `WorkflowEventEmitter`; `NotificationLogSubscriber` writes a dedicated `...notifications.jsonl` next to the existing event log. No webhooks, shell hooks, or model-driven emission in v1. Schema-version-bump policy documented in AGENTS.md (additive = no bump; remove/rename/retype/required = bump; consumers MUST pin on schema_id). Validation failures (undeclared type, payload field mismatch, unknown correlation key, rendered-value type mismatch) fail the workflow loudly with `ValidationError` -- the contract is the point. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PolyphonyRequiem · 2026-05-28T22:58:29Z

Rebased \eat/notifications\ onto v0.1.18 for local polyphony dogfood testing. One mechanical conflict in \schema.py\ (combined \AgentDef.type\ Literal enum to include \set/terminate/wait\ from v0.1.18 plus
otification\ from this branch, plus adjacent field and validator blocks). Conflict was fully mechanical — no overlapping logic. Pushed to PolyphonyRequiem/conductor feat/notifications. New HEAD: 7d46793.

Addresses all actionable inline comments on microsoft#213. Changes applied: - Rename AgentDef.notification -> AgentDef.emit (comment 3283084228: 'type: notification' + 'notification: pr_ready' reads redundant; 'emit: pr_ready' parses more naturally and avoids the tautology) - Disallow notification steps inside parallel groups and for_each inline agents at validation time (comment 3283084198: parallel/for-each containment undefined; disallow conservatively until threaded dispatch is implemented) - Replace assert notif_config is not None with explicit raise ExecutionError (comment 3283084205: assert is stripped under python -O) - Raise ValidationError on coerce failure instead of silently returning raw value (comment 3283084219: silent fallback on _coerce_rendered gives misleading 'got str' error one frame later; raise with the offending value) - Drop workflow_metadata from build_envelope signature and envelope dict (comment 3283084222: redundant field bloats wire format; run_id + workflow + subworkflow_path already in envelope for consumers to join on) - Document empty-output ({}) behavior in AGENTS.md (comment 3283084213: downstream accumulate-mode templates silently render empty for notification output.x; this is intentional fire-and-forget behavior, now documented) Not applied (reply-only): - Dashboard rendering for new event types (comment 3283084208): deferred as immediate follow-up per reviewer suggestion; requires TypeScript work in workflow-store.ts and Cytoscape view, out of scope for this PR - Resume dashboard replay confirmation (comment 3283084216): behavior is intentional; replied to confirm Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jrob5756 reviewed May 21, 2026

View reviewed changes

PolyphonyRequiem force-pushed the feat/notifications branch from f5397cd to 7d46793 Compare May 28, 2026 22:47

PolyphonyRequiem mentioned this pull request May 28, 2026

feat: notify_then_route — error observability for all on_error: abort paths PolyphonyRequiem/polyphony#543

Open

Uh oh!

Conversation

PolyphonyRequiem commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

YAML surface

Envelope shape

Delivery (v1)

What's in this PR

Schema-version-bump policy (documented, not enforced)

Validation

Testing

Deliberately deferred

Uh oh!

codecov-commenter commented May 19, 2026

Codecov Report

Uh oh!

jrob5756 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

PolyphonyRequiem commented May 19, 2026 •

edited

Loading