feat(dialog): agent dialog mode with web dashboard support by PolyphonyRequiem · Pull Request #130 · microsoft/conductor

PolyphonyRequiem · 2026-05-01T20:52:26Z

Summary

Add dialog mode for agents, enabling multi-turn conversational interactions within workflows.

Changes

Core

Dialog gate (\gates/dialog.py) — full dialog gate implementation supporting multi-turn conversations
Dialog evaluator (\�ngine/dialog_evaluator.py) — condition evaluation for dialog transitions
Schema — new dialog-related Pydantic models in \config/schema.py\
Workflow engine — dialog execution integration in \�ngine/workflow.py\
Linkify utility (\�xecutor/linkify.py) — link rendering for dialog content

Provider Support

Both Copilot and Claude providers updated with dialog support

Web Dashboard

New React components: \DialogDetail, \DialogEngagementPrompt, \DialogOverlay\
Updated workflow store, event types, detail panel, and layout
Rebuilt static assets

Tests

\ est_dialog_schema.py\ — schema validation tests
\ est_dialog_evaluator.py\ — evaluator unit tests
\ est_dialog.py\ — dialog gate integration tests

Docs & Examples

\workflow-syntax.md\ — dialog mode documentation
\�xamples/dialog-mode.yaml\ and \�xamples/dialog-test.yaml\

Add dialog mode for agents enabling multi-turn conversational interactions within workflows. Includes dialog gate implementation, condition evaluation, provider support (Copilot + Claude), web dashboard UI components, and comprehensive tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov-commenter · 2026-05-01T20:55:44Z

Codecov Report

❌ Patch coverage is 81.28898% with 90 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@b3e3485). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/conductor/executor/linkify.py	61.11%	35 Missing ⚠️
src/conductor/gates/dialog.py	85.90%	32 Missing ⚠️
src/conductor/engine/workflow.py	77.14%	8 Missing ⚠️
src/conductor/web/server.py	22.22%	7 Missing ⚠️
src/conductor/providers/copilot.py	89.74%	4 Missing ⚠️
src/conductor/engine/dialog_evaluator.py	96.07%	2 Missing ⚠️
src/conductor/providers/base.py	50.00%	1 Missing ⚠️
src/conductor/providers/claude.py	94.11%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #130   +/-   ##
=======================================
  Coverage        ?   85.33%           
=======================================
  Files           ?       56           
  Lines           ?     8109           
  Branches        ?        0           
=======================================
  Hits            ?     6920           
  Misses          ?     1189           
  Partials        ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

jrob5756

Cool change. A couple of recommendations to consider before merging.

jrob5756 · 2026-05-04T16:29:42Z

+            if (isRelativeFileLink(href)) {
+              const handleOpenInEditor = (e: React.MouseEvent) => {
+                e.preventDefault();
+                fetch('/api/open-file', {


Missing endpoint — This POSTs to /api/open-file, but no such route is registered in src/conductor/web/server.py (grep open-file returns nothing). Clicks will silently 404.

Either:

Implement the endpoint with strict validation (reject .., resolve symlinks, enforce containment within the workflow directory), OR

Remove the link rewriting in DialogMarkdown and render plain anchors.

Additionally, even with the endpoint, isRelativeFileLink (lines 13–19) accepts paths like ../../../../etc/passwd — server-side validation is mandatory.

the missing endpoint is from another PR (the metadata enrichment on human gates)

follow-up: also tightened isRelativeFileLink in 889bb65 to reject .. segments and Windows drive letters (C:\path) as defense-in-depth. when the /api/open-file endpoint lands from the metadata PR, the client side won't be the only line of defense.

It looks like this was in #114 but that PR was closed

jrob5756 · 2026-05-04T16:29:42Z

+        try:
+            candidate = (base_dir / stripped).resolve()
+            # Security: must be within base_dir
+            if not str(candidate).startswith(str(base_dir.resolve())):


Path traversal — str(candidate).startswith(str(base_dir.resolve())) is a string-prefix check, not a path-aware containment check.

Example bypass: base_dir resolves to /foo/bar; a candidate resolving to /foo/barbaz/x.md passes the check ('/foo/barbaz/x.md'.startswith('/foo/bar') is True).

Fix (Python 3.9+):

try: candidate.relative_to(base_dir.resolve()) except ValueError: return None

or candidate.is_relative_to(base_dir.resolve()).

fixed in 889bb65 — switched to Path.is_relative_to (3.12+) against the resolved base. drops the string-prefix gotcha entirely.

jrob5756 · 2026-05-04T16:29:42Z

+                break
+
+            # Send to agent and get response
+            history.append({"role": "user", "content": user_input})


History corruption on exception (CLI mode) — User input is appended to history before execute_dialog_turn. On exception, the handler at line 280–289 logs and continues without rolling back. The next iteration appends another user message, leaving two consecutive role=user entries with no agent response between them — confusing the LLM on retry.

Fix: append after successful response, or explicitly history.pop() in the exception handler.

fixed in 889bb65. CLI loop now pops the user turn from history in the exception branch before continue. result.messages (the user-visible transcript) is left intact — orphan user msg without an agent reply is the honest representation of what happened. added test_cli_exception_pops_user_history covering the success/fail/success sequence.

jrob5756 · 2026-05-04T16:29:42Z

+        # Dialog loop
+        while True:
+            # Send to agent and get response
+            history.append({"role": "user", "content": user_input})


Same history corruption pattern in web mode — Same issue as the CLI loop above. Line 431 appends user_input before the try, but on exception the new user_input from wait_for_dialog_message (line 460) is appended again at the next iteration's line 431. The failed user message stays in history forever.

Fix: pop the failed entry in the exception handler before waiting for the next message, or restructure to append after success.

fixed in 889bb65. same pop-on-exception in the web _web_handle_dialog loop. also caught a related bug while in there — the READY-decline branch was double-appending approval to history (once explicitly, once at the loop top), creating two consecutive user turns even on the happy path. dropped the explicit append. added TestWebDialogFlow with 5 tests covering decline, happy path, exception recovery, READY-decline-no-duplicate-history, and READY-approval.

jrob5756 · 2026-05-04T16:29:42Z

+            )
+
+            # Check if agent proposed completion
+            if "[READY_TO_CONTINUE]" in agent_response:


Suggestion: [READY_TO_CONTINUE] is user-injectable — A user could include the literal string in their message; if the agent quotes it back, the dialog ends prematurely. Consider:

Stripping the marker from user input before sending to the agent, OR

Requiring it to be at start/end of the response, not anywhere in it.

fixed in 889bb65. _extract_ready_marker now uses rstrip().endswith() so the marker is only treated as terminal when it's actually at the end. mid-message marker is left as literal text, agent's reply is stored cleanly in result.messages with the marker stripped. TestReadyMarkerTerminalOnly covers both cases.

jrob5756 · 2026-05-04T16:29:42Z

  sendGateResponse: (agentName: string, selectedValue: string, additionalInput?: Record<string, string>) => void;
+  // Dialog state
+  activeDialog: { agentName: string; dialogId: string } | null;
+  dialogEngaged: boolean;


replayState doesn't reset these new fields — activeDialog and dialogEngaged are declared here but replayState (around line 395, outside this diff) doesn't reset them in its explicit reset block. The spread ...state carries them over from the prior workflow, so late-joiners or re-replays can show a stale dialog overlay.

Fix: add to the reset object in replayState:

activeDialog: null, dialogEngaged: false,

fixed in 889bb65. all three replay-state mutators (replayState, setReplayMode, setReplayPosition) now reset activeDialog: null, dialogEngaged: false. you were right that setReplayPosition was the same hazard — easy to scrub through replay and end up with a stale dialog panel.

jrob5756 · 2026-05-04T16:29:42Z

+      set((state) => {
+        const nodes = { ...state.nodes };
+        if (nodes[agentName]) {
+          nodes[agentName] = { ...nodes[agentName]!, dialog_awaiting_response: true };


Optimistic UI race — The optimistic dialog_awaiting_response: true is a separate set() from the eventual dialog_message handler that flips it back to false. If the agent response arrives before this update is committed, the UI can stick in 'awaiting' until the next message. Low likelihood, but easily fixed by either:

Removing the optimistic update (let the echo from the server set it), OR

Using a single atomic set() keyed off the send action.

adopted in 889bb65 — dropped the optimistic set in sendDialogMessage. the dialog_message event handler is now the single source of truth: sets dialog_awaiting_response = true on role === 'user' echo, false on role === 'agent'. server round-trip is fast enough that the perceived snappiness loss is negligible, and the race is just gone.

jrob5756 · 2026-05-04T16:29:42Z

+        assert result.messages == []
+        assert result.user_dismissed is False
+        assert result.user_declined is False
+        assert result.agent_proposed_continue is False


Zero coverage for web dialog flow — _web_handle_dialog() in gates/dialog.py:337-557 is 220 lines with no tests. Missing coverage:

wait_for_dialog_message happy path

dialog_decline message handling (lines 386, 457, 495, 527)

Web-mode agent approval flow (lines 494–513)

Exception/retry path (lines 439–474)

All existing tests use CLI mode with web_dashboard=None. Please add a test_web_dialog.py (or extend this file) mocking WebDashboard.wait_for_dialog_message.

fixed in 889bb65. added TestWebDialogFlow in test_dialog.py with 5 tests covering the web flow end-to-end: engagement decline, happy path, provider exception → recovery (no orphan user in history), READY-decline (asserts no duplicate history append), and READY-approval. plus a wait_for_dialog_message fix in web/server.py to filter on dialog_id not just agent_name — caught while writing the tests.

jrob5756 · 2026-05-04T16:29:42Z

+        # Verify the system prompt contains the trigger criteria
+        call_args = provider.execute_dialog_turn.call_args
+        assert "Trigger when confused" in call_args.kwargs["system_prompt"]
+        assert "my_agent" in call_args.kwargs["user_message"]


No integration test for dialog → re-execute — _handle_dialog() in workflow.py:561-638 formats the dialog transcript into re-execution guidance and re-runs the agent. The end-to-end flow (evaluate → dialog → re-execute with transcript appended → return new output) is not exercised — existing tests mock the DialogHandler.

Please add an integration test that runs a real workflow with dialog enabled and asserts the agent re-executes with the transcript present in its guidance/context.

fixed in 889bb65. new tests/test_engine/test_dialog_integration.py runs the real WorkflowEngine end-to-end with a CopilotProvider(mock_handler=...) plus mocked execute_dialog_turn and patched dialog-handler interactive prompts. asserts (a) the agent runs twice, (b) the second run's prompt contains --- DIALOG WITH USER ---, --- END DIALOG ---, and the user's actual message text.

jrob5756 · 2026-05-04T16:29:42Z

                is_retryable=False,
            ) from e

+    async def execute_dialog_turn(


No tests for execute_dialog_turn on either provider — Both providers gained ~45 lines of new code with different message-format strategies (Copilot builds a User: …\n\nAssistant: … string, Claude uses a message list). Per AGENTS.md provider parity rules, these need parallel test coverage.

Please add tests in tests/test_providers/test_copilot.py and tests/test_providers/test_claude.py covering: empty history, multi-turn history, model override, error propagation, and equivalent output for the same logical input.

fixed in 889bb65. added 4 tests each to test_claude.py and test_copilot.py (TestClaudeExecuteDialogTurn / TestCopilotExecuteDialogTurn) covering empty history, multi-turn history serialization, model override, and error wrapping. for Copilot the test mocks the create_session + event-driven flow and asserts the prompt is built correctly with User: / Assistant: blocks.

Real bugs: - linkify: use Path.is_relative_to instead of str startswith for path containment (true path-aware boundary, not string-prefix) - dialog (CLI + web): pop user turn from provider history on exception so the next iteration doesn't append a second consecutive user turn - dialog (web): drop duplicate history.append in READY-decline branch (loop top already appends the approval as the next user turn) - dialog: treat [READY_TO_CONTINUE] as a terminal token only (rstrip().endswith) so user-typed marker mid-message can't trick the agent into ending the dialog; strip the marker before storing in transcript - dialog_evaluator: only strip last code-fence line if it's actually a closing fence (unterminated fences no longer drop the JSON line) - web/server: filter wait_for_dialog_message by dialog_id as well as agent_name so messages can't cross dialog boundaries - workflow-store: reset activeDialog/dialogEngaged in setReplayPosition too (was only in replayState/setReplayMode) Suggestions adopted: - evaluator: append \n…[truncated] marker to truncated outputs so the evaluator (and future inspection) knows context was cut - workflow-store sendDialogMessage: drop optimistic dialog_awaiting_response set; the dialog_message event handler now flips it on user/agent role echo (single source of truth) - DialogDetail isRelativeFileLink: defense in depth — reject .. segments and Windows drive letters before the eventual /api/open-file lands Tests: - web dialog flow tests (engagement decline, happy path, exception recovery, READY-decline-no-duplicate-history, READY-approval) - READY marker terminal-only tests - CLI exception history-pop test - evaluator truncation marker tests - evaluator unterminated/terminated code-fence tests - Claude provider execute_dialog_turn parity tests (empty/multi-turn history, model override, error wrapping) - Copilot provider execute_dialog_turn parity tests (prompt construction via mocked session, model override, error wrapping) - end-to-end integration test: dialog triggers → user converses → agent re-executes with transcript in guidance section Note: TS source changes only — frontend static assets need a CI rebuild. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PolyphonyRequiem · 2026-05-04T21:20:07Z

review pass complete in 889bb65. summary by category:

real bugs (fixed):

linkify.py path traversal: Path.is_relative_to instead of str.startswith
dialog.py CLI + web: pop user turn from provider history on exception (no more orphan-user → consecutive-user-turn corruption)
dialog.py web READY-decline: dropped duplicate history.append (loop top already does it)
dialog_evaluator.py code-fence parser: only strip last line if it's an actual closing fence
workflow-store.ts replay: setReplayPosition now resets dialog state too (was missing)
web/server.py wait_for_dialog_message: filter on dialog_id not just agent_name (caught while writing tests)

suggestions adopted:

READY_TO_CONTINUE: terminal-only via rstrip().endswith(), marker stripped from stored transcript
evaluator truncation: append \n…[truncated] marker so the evaluator knows context was cut
optimistic UI in sendDialogMessage: dropped; dialog_message event handler is now single source of truth for dialog_awaiting_response
DialogDetail.tsx: defense-in-depth — isRelativeFileLink rejects .. and Windows drive letters

tests added:

TestWebDialogFlow (5): engagement decline, happy path, exception recovery, READY-decline-no-duplicate-history, READY-approval
TestDialogHandlerExceptionRecovery: 3-call provider sequence asserting no consecutive user turns
TestReadyMarkerTerminalOnly: mid-message vs terminal marker
TestEvaluatorTruncation: end-to-end 5500-char round-trip
test_dialog_integration.py: full WorkflowEngine end-to-end with dialog → re-execute, asserts transcript appears in re-execution guidance
TestClaudeExecuteDialogTurn + TestCopilotExecuteDialogTurn (4 each): empty/multi-turn history, model override, error wrapping
code-fence parser: terminated + unterminated cases

169 dialog/evaluator/provider/gate tests pass. ruff clean. ty has only pre-existing termios platform diagnostics.

note: TS source only — frontend static assets need a CI rebuild before this is mergeable.

thanks for the thorough pass — the history-corruption catches were genuinely subtle and the test-coverage call-outs forced the integration test that I'd been deferring.

The DialogDetail component intercepted relative-file links in agent markdown and POSTed to /api/open-file, but that endpoint was meant to land in microsoft#114 (closed). Clicks silently 404 and leave users staring at links that look interactive but do nothing. Drop the link rewriting (and the now-unused isRelativeFileLink guard) and render every link as a plain anchor with target=_blank + rel=noopener. External URLs still open in a new tab; relative paths become honest broken links instead of fake interactive ones. If/when the file-open endpoint comes back in a future PR, the rewriting can return alongside the matching server-side validation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(dialog): drop dead /api/open-file call from dialog markdown

Keep both dialog features (from PR microsoft#130) and breadcrumb/subworkflow features (from main via microsoft#113). Conflict resolution: - schema.py: keep both dialog + max_depth validations for human_gate/script - claude.py: keep _max_input_cache reset in close() AND execute_dialog_turn method - DetailPanel.tsx: import both DialogDetail and SubworkflowDetail - workflow-store.ts: include both dialog state + subworkflow state in all reset blocks - events.ts: include both Dialog* and Subworkflow* event interfaces - test_claude.py / test_copilot.py: keep both dialog turn + max prompt tokens test classes - static assets: rebuilt from merged source Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

merge: resolve conflicts with main

jrob5756

All 12 review comments addressed. Tests pass (157), lint/format clean, merged with latest main. LGTM 🚀

…, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: bump version to 0.1.11 and changelog #148 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jrob5756 reviewed May 4, 2026

View reviewed changes

PolyphonyRequiem mentioned this pull request May 4, 2026

fix(dialog): drop dead /api/open-file call from dialog markdown PolyphonyRequiem/conductor#4

Merged

PolyphonyRequiem and others added 3 commits May 4, 2026 15:47

Merge pull request #4 from PolyphonyRequiem/fix/dialog-open-file

94b446c

fix(dialog): drop dead /api/open-file call from dialog markdown

Merge pull request #5 from PolyphonyRequiem/merge/main-into-dialog

664a0c4

merge: resolve conflicts with main

jrob5756 approved these changes May 4, 2026

View reviewed changes

jrob5756 merged commit bf0e8e9 into microsoft:main May 4, 2026
7 checks passed

PolyphonyRequiem deleted the feat/agent-dialog-mode-clean branch May 5, 2026 00:03

jrob5756 mentioned this pull request May 5, 2026

docs: changelog + doc updates for unreleased PRs #147

Merged

3 tasks

Conversation

PolyphonyRequiem commented May 1, 2026

Summary

Changes

Core

Provider Support

Web Dashboard

Tests

Docs & Examples

Uh oh!

codecov-commenter commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jrob5756 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PolyphonyRequiem commented May 4, 2026

Uh oh!

jrob5756 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented May 1, 2026 •

edited

Loading