feat(dialog): agent dialog mode with web dashboard support#130
Conversation
Add dialog mode for agents enabling multi-turn conversational interactions within workflows. Includes dialog gate implementation, condition evaluation, provider support (Copilot + Claude), web dashboard UI components, and comprehensive tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #130 +/- ##
=======================================
Coverage ? 85.33%
=======================================
Files ? 56
Lines ? 8109
Branches ? 0
=======================================
Hits ? 6920
Misses ? 1189
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
jrob5756
left a comment
There was a problem hiding this comment.
Cool change. A couple of recommendations to consider before merging.
| if (isRelativeFileLink(href)) { | ||
| const handleOpenInEditor = (e: React.MouseEvent) => { | ||
| e.preventDefault(); | ||
| fetch('/api/open-file', { |
There was a problem hiding this comment.
Missing endpoint — This POSTs to /api/open-file, but no such route is registered in src/conductor/web/server.py (grep open-file returns nothing). Clicks will silently 404.
Either:
- Implement the endpoint with strict validation (reject
.., resolve symlinks, enforce containment within the workflow directory), OR - Remove the link rewriting in
DialogMarkdownand render plain anchors.
Additionally, even with the endpoint, isRelativeFileLink (lines 13–19) accepts paths like ../../../../etc/passwd — server-side validation is mandatory.
There was a problem hiding this comment.
the missing endpoint is from another PR (the metadata enrichment on human gates)
There was a problem hiding this comment.
follow-up: also tightened isRelativeFileLink in 889bb65 to reject .. segments and Windows drive letters (C:\path) as defense-in-depth. when the /api/open-file endpoint lands from the metadata PR, the client side won't be the only line of defense.
There was a problem hiding this comment.
It looks like this was in #114 but that PR was closed
| try: | ||
| candidate = (base_dir / stripped).resolve() | ||
| # Security: must be within base_dir | ||
| if not str(candidate).startswith(str(base_dir.resolve())): |
There was a problem hiding this comment.
Path traversal — str(candidate).startswith(str(base_dir.resolve())) is a string-prefix check, not a path-aware containment check.
Example bypass: base_dir resolves to /foo/bar; a candidate resolving to /foo/barbaz/x.md passes the check ('/foo/barbaz/x.md'.startswith('/foo/bar') is True).
Fix (Python 3.9+):
try:
candidate.relative_to(base_dir.resolve())
except ValueError:
return Noneor candidate.is_relative_to(base_dir.resolve()).
There was a problem hiding this comment.
fixed in 889bb65 — switched to Path.is_relative_to (3.12+) against the resolved base. drops the string-prefix gotcha entirely.
| break | ||
|
|
||
| # Send to agent and get response | ||
| history.append({"role": "user", "content": user_input}) |
There was a problem hiding this comment.
History corruption on exception (CLI mode) — User input is appended to history before execute_dialog_turn. On exception, the handler at line 280–289 logs and continues without rolling back. The next iteration appends another user message, leaving two consecutive role=user entries with no agent response between them — confusing the LLM on retry.
Fix: append after successful response, or explicitly history.pop() in the exception handler.
There was a problem hiding this comment.
fixed in 889bb65. CLI loop now pops the user turn from history in the exception branch before continue. result.messages (the user-visible transcript) is left intact — orphan user msg without an agent reply is the honest representation of what happened. added test_cli_exception_pops_user_history covering the success/fail/success sequence.
| # Dialog loop | ||
| while True: | ||
| # Send to agent and get response | ||
| history.append({"role": "user", "content": user_input}) |
There was a problem hiding this comment.
Same history corruption pattern in web mode — Same issue as the CLI loop above. Line 431 appends user_input before the try, but on exception the new user_input from wait_for_dialog_message (line 460) is appended again at the next iteration's line 431. The failed user message stays in history forever.
Fix: pop the failed entry in the exception handler before waiting for the next message, or restructure to append after success.
There was a problem hiding this comment.
fixed in 889bb65. same pop-on-exception in the web _web_handle_dialog loop. also caught a related bug while in there — the READY-decline branch was double-appending approval to history (once explicitly, once at the loop top), creating two consecutive user turns even on the happy path. dropped the explicit append. added TestWebDialogFlow with 5 tests covering decline, happy path, exception recovery, READY-decline-no-duplicate-history, and READY-approval.
| ) | ||
|
|
||
| # Check if agent proposed completion | ||
| if "[READY_TO_CONTINUE]" in agent_response: |
There was a problem hiding this comment.
Suggestion: [READY_TO_CONTINUE] is user-injectable — A user could include the literal string in their message; if the agent quotes it back, the dialog ends prematurely. Consider:
- Stripping the marker from user input before sending to the agent, OR
- Requiring it to be at start/end of the response, not anywhere in it.
There was a problem hiding this comment.
fixed in 889bb65. _extract_ready_marker now uses rstrip().endswith() so the marker is only treated as terminal when it's actually at the end. mid-message marker is left as literal text, agent's reply is stored cleanly in result.messages with the marker stripped. TestReadyMarkerTerminalOnly covers both cases.
| sendGateResponse: (agentName: string, selectedValue: string, additionalInput?: Record<string, string>) => void; | ||
| // Dialog state | ||
| activeDialog: { agentName: string; dialogId: string } | null; | ||
| dialogEngaged: boolean; |
There was a problem hiding this comment.
replayState doesn't reset these new fields — activeDialog and dialogEngaged are declared here but replayState (around line 395, outside this diff) doesn't reset them in its explicit reset block. The spread ...state carries them over from the prior workflow, so late-joiners or re-replays can show a stale dialog overlay.
Fix: add to the reset object in replayState:
activeDialog: null,
dialogEngaged: false,There was a problem hiding this comment.
fixed in 889bb65. all three replay-state mutators (replayState, setReplayMode, setReplayPosition) now reset activeDialog: null, dialogEngaged: false. you were right that setReplayPosition was the same hazard — easy to scrub through replay and end up with a stale dialog panel.
| set((state) => { | ||
| const nodes = { ...state.nodes }; | ||
| if (nodes[agentName]) { | ||
| nodes[agentName] = { ...nodes[agentName]!, dialog_awaiting_response: true }; |
There was a problem hiding this comment.
Optimistic UI race — The optimistic dialog_awaiting_response: true is a separate set() from the eventual dialog_message handler that flips it back to false. If the agent response arrives before this update is committed, the UI can stick in 'awaiting' until the next message. Low likelihood, but easily fixed by either:
- Removing the optimistic update (let the echo from the server set it), OR
- Using a single atomic
set()keyed off the send action.
There was a problem hiding this comment.
adopted in 889bb65 — dropped the optimistic set in sendDialogMessage. the dialog_message event handler is now the single source of truth: sets dialog_awaiting_response = true on role === 'user' echo, false on role === 'agent'. server round-trip is fast enough that the perceived snappiness loss is negligible, and the race is just gone.
| assert result.messages == [] | ||
| assert result.user_dismissed is False | ||
| assert result.user_declined is False | ||
| assert result.agent_proposed_continue is False |
There was a problem hiding this comment.
Zero coverage for web dialog flow — _web_handle_dialog() in gates/dialog.py:337-557 is 220 lines with no tests. Missing coverage:
wait_for_dialog_messagehappy pathdialog_declinemessage handling (lines 386, 457, 495, 527)- Web-mode agent approval flow (lines 494–513)
- Exception/retry path (lines 439–474)
All existing tests use CLI mode with web_dashboard=None. Please add a test_web_dialog.py (or extend this file) mocking WebDashboard.wait_for_dialog_message.
There was a problem hiding this comment.
fixed in 889bb65. added TestWebDialogFlow in test_dialog.py with 5 tests covering the web flow end-to-end: engagement decline, happy path, provider exception → recovery (no orphan user in history), READY-decline (asserts no duplicate history append), and READY-approval. plus a wait_for_dialog_message fix in web/server.py to filter on dialog_id not just agent_name — caught while writing the tests.
| # Verify the system prompt contains the trigger criteria | ||
| call_args = provider.execute_dialog_turn.call_args | ||
| assert "Trigger when confused" in call_args.kwargs["system_prompt"] | ||
| assert "my_agent" in call_args.kwargs["user_message"] |
There was a problem hiding this comment.
No integration test for dialog → re-execute — _handle_dialog() in workflow.py:561-638 formats the dialog transcript into re-execution guidance and re-runs the agent. The end-to-end flow (evaluate → dialog → re-execute with transcript appended → return new output) is not exercised — existing tests mock the DialogHandler.
Please add an integration test that runs a real workflow with dialog enabled and asserts the agent re-executes with the transcript present in its guidance/context.
There was a problem hiding this comment.
fixed in 889bb65. new tests/test_engine/test_dialog_integration.py runs the real WorkflowEngine end-to-end with a CopilotProvider(mock_handler=...) plus mocked execute_dialog_turn and patched dialog-handler interactive prompts. asserts (a) the agent runs twice, (b) the second run's prompt contains --- DIALOG WITH USER ---, --- END DIALOG ---, and the user's actual message text.
| is_retryable=False, | ||
| ) from e | ||
|
|
||
| async def execute_dialog_turn( |
There was a problem hiding this comment.
No tests for execute_dialog_turn on either provider — Both providers gained ~45 lines of new code with different message-format strategies (Copilot builds a User: …\n\nAssistant: … string, Claude uses a message list). Per AGENTS.md provider parity rules, these need parallel test coverage.
Please add tests in tests/test_providers/test_copilot.py and tests/test_providers/test_claude.py covering: empty history, multi-turn history, model override, error propagation, and equivalent output for the same logical input.
There was a problem hiding this comment.
fixed in 889bb65. added 4 tests each to test_claude.py and test_copilot.py (TestClaudeExecuteDialogTurn / TestCopilotExecuteDialogTurn) covering empty history, multi-turn history serialization, model override, and error wrapping. for Copilot the test mocks the create_session + event-driven flow and asserts the prompt is built correctly with User: / Assistant: blocks.
Real bugs: - linkify: use Path.is_relative_to instead of str startswith for path containment (true path-aware boundary, not string-prefix) - dialog (CLI + web): pop user turn from provider history on exception so the next iteration doesn't append a second consecutive user turn - dialog (web): drop duplicate history.append in READY-decline branch (loop top already appends the approval as the next user turn) - dialog: treat [READY_TO_CONTINUE] as a terminal token only (rstrip().endswith) so user-typed marker mid-message can't trick the agent into ending the dialog; strip the marker before storing in transcript - dialog_evaluator: only strip last code-fence line if it's actually a closing fence (unterminated fences no longer drop the JSON line) - web/server: filter wait_for_dialog_message by dialog_id as well as agent_name so messages can't cross dialog boundaries - workflow-store: reset activeDialog/dialogEngaged in setReplayPosition too (was only in replayState/setReplayMode) Suggestions adopted: - evaluator: append \n…[truncated] marker to truncated outputs so the evaluator (and future inspection) knows context was cut - workflow-store sendDialogMessage: drop optimistic dialog_awaiting_response set; the dialog_message event handler now flips it on user/agent role echo (single source of truth) - DialogDetail isRelativeFileLink: defense in depth — reject .. segments and Windows drive letters before the eventual /api/open-file lands Tests: - web dialog flow tests (engagement decline, happy path, exception recovery, READY-decline-no-duplicate-history, READY-approval) - READY marker terminal-only tests - CLI exception history-pop test - evaluator truncation marker tests - evaluator unterminated/terminated code-fence tests - Claude provider execute_dialog_turn parity tests (empty/multi-turn history, model override, error wrapping) - Copilot provider execute_dialog_turn parity tests (prompt construction via mocked session, model override, error wrapping) - end-to-end integration test: dialog triggers → user converses → agent re-executes with transcript in guidance section Note: TS source changes only — frontend static assets need a CI rebuild. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
review pass complete in 889bb65. summary by category: real bugs (fixed):
suggestions adopted:
tests added:
169 dialog/evaluator/provider/gate tests pass. ruff clean. ty has only pre-existing termios platform diagnostics. note: TS source only — frontend static assets need a CI rebuild before this is mergeable. thanks for the thorough pass — the history-corruption catches were genuinely subtle and the test-coverage call-outs forced the integration test that I'd been deferring. |
The DialogDetail component intercepted relative-file links in agent markdown and POSTed to /api/open-file, but that endpoint was meant to land in microsoft#114 (closed). Clicks silently 404 and leave users staring at links that look interactive but do nothing. Drop the link rewriting (and the now-unused isRelativeFileLink guard) and render every link as a plain anchor with target=_blank + rel=noopener. External URLs still open in a new tab; relative paths become honest broken links instead of fake interactive ones. If/when the file-open endpoint comes back in a future PR, the rewriting can return alongside the matching server-side validation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix(dialog): drop dead /api/open-file call from dialog markdown
Keep both dialog features (from PR microsoft#130) and breadcrumb/subworkflow features (from main via microsoft#113). Conflict resolution: - schema.py: keep both dialog + max_depth validations for human_gate/script - claude.py: keep _max_input_cache reset in close() AND execute_dialog_turn method - DetailPanel.tsx: import both DialogDetail and SubworkflowDetail - workflow-store.ts: include both dialog state + subworkflow state in all reset blocks - events.ts: include both Dialog* and Subworkflow* event interfaces - test_claude.py / test_copilot.py: keep both dialog turn + max prompt tokens test classes - static assets: rebuilt from merged source Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
merge: resolve conflicts with main
jrob5756
left a comment
There was a problem hiding this comment.
All 12 review comments addressed. Tests pass (157), lint/format clean, merged with latest main. LGTM 🚀
…, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: bump version to 0.1.11 and changelog #148 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Add dialog mode for agents, enabling multi-turn conversational interactions within workflows.
Changes
Core
Provider Support
Web Dashboard
Tests
Docs & Examples