
fix(engine): coerce Python literal "True"/"False"/"None" in workflow output#139

Merged
jrob5756 merged 2 commits into microsoft:main from PolyphonyRequiem:feat/parse-python-bools-and-none on May 4, 2026

Conversation

@PolyphonyRequiem (Member)

Problem

Workflow output templates pass through `_maybe_parse_json` (`engine/workflow.py:3398`) to convert JSON-shaped strings back into native Python types. The function only recognized the lowercase JSON literals (`true` / `false` / `null`).

In practice, workflow author expressions like:

```yaml
output:
  matched: "{{ left == right }}"
  done:    "{{ count >= threshold }}"
```

render their boolean results via Jinja's default `str()`, producing the strings `"True"` / `"False"`. These survived `_maybe_parse_json` unchanged, as truthy non-empty strings, so downstream route `when:` clauses comparing them against `true` / `false` silently misbehaved: the string `"False"` is truthy, and `"False" == true` evaluates to false, so the behavior was hard to reason about and harder to spot.

A `{{ none }}` expression has the same shape: it renders as `"None"` and survives as a string.
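The mismatch can be reproduced with plain `str()`, which is the same conversion Jinja applies to a non-string expression result (illustrative only, no engine code involved):

```python
# What "{{ left == right }}" renders to when the comparison is false.
rendered = str(1 == 2)

assert rendered == "False"        # Python capitalization, not JSON's "false"
assert bool(rendered) is True     # non-empty string, so truthy downstream
assert rendered != "false"        # never matches the lowercase JSON literal
```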

Fix

Recognize the three Python-literal forms ("True" / "False" / "None") explicitly before the existing JSON-literal check. Lowercase JSON literals continue to coerce as before.

Three new checks in `_maybe_parse_json` carry the behavior change:

```python
if stripped == "True":
    return True
if stripped == "False":
    return False
if stripped == "None":
    return None
```

Tests

Added 3 cases to TestWorkflowEngineOutputTemplates:

  • test_output_template_python_bool_literals — verifies {{ a == b }} / {{ a != b }} produce native bool.
  • test_output_template_python_none_literal — verifies {{ none }} produces native None.
  • test_output_template_lowercase_json_literals_still_work — regression check for true / false / null.

All 77 tests in test_engine/test_workflow.py pass; ruff clean.

Why this matters

The mismatch between Jinja's str(bool) output and the JSON-literal recognition list was a stable footgun: workflows lint clean, validate clean, run without errors, and silently take wrong route branches. Even authors who knew about it had to reach for awkward workarounds like {{ (a == b) | string | lower }} or {{ 1 if a == b else 0 }}.
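The two workarounds can be reproduced with plain `str()` standing in for Jinja rendering, which shows why they coerced while the bare expression did not (illustrative, not project code):

```python
import json

raw = str(1 == 1)                  # "True": the raw footgun
lowered = str(1 == 1).lower()      # "true", i.e. {{ (a == b) | string | lower }}
numeric = str(1 if 1 == 1 else 0)  # "1",    i.e. {{ 1 if a == b else 0 }}

assert json.loads(lowered) is True  # the old JSON-literal path accepted this...
assert json.loads(numeric) == 1     # ...and this, but not the raw "True"
```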

This patch is intentionally small and additive — no behavior change for any input that already coerced correctly. Happy to adjust scope if you'd prefer a different approach (e.g. coerce at template-render time rather than at output-map time).

Daniel Green and others added 2 commits May 2, 2026 20:32
…output

Workflow output templates pass through `_maybe_parse_json` to convert
JSON-shaped strings back into native types. Previously this only
recognized lowercase JSON literals (`true`/`false`/`null`). Jinja
expressions like `{{ a == b }}` render Python bool via `str()`, producing
`"True"` / `"False"`, which then survived as truthy non-empty strings
downstream. Route `when:` clauses comparing such values against `true` /
`false` silently misbehaved.

Add explicit handling for the three Python literal forms before the
existing JSON parse path. Lowercase JSON literals continue to work
(regression covered).

Tests: 3 new cases under `TestWorkflowEngineOutputTemplates` covering
`True`/`False` from `==` / `!=` expressions, `None` from `{{ none }}`,
and a regression check for the lowercase forms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two existing integration tests asserted the broken behavior they were
exercising:
- test_examples.py:214: `result["syntax_passed"] == "True"`
- test_parallel_workflows.py:410: `result["success"] == "True"`

Both had inline comments acknowledging the workaround
(`# Templates return strings`, `# Boolean rendered as string`). With
this PR's fix, those values now coerce to native bool. Update the
assertions accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PolyphonyRequiem (Member, Author)

The previous CI run surfaced 2 failing tests. Both were asserting the broken behavior this PR fixes (with inline comments acknowledging the workaround):

  • test_examples.py:214: `result["syntax_passed"] == "True"` # Templates return strings
  • test_parallel_workflows.py:410: `result["success"] == "True"` # Boolean rendered as string

Pushed 4a61195 updating both to assert native bool (is True).

This is also a useful signal: those tests effectively codified the bug as expected behavior. Worth flagging to anyone reviewing — the change is intentional, not an oversight in the assertions.
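The shape of the assertion change, with an illustrative result dict rather than the actual test fixtures:

```python
# Post-fix, the workflow output carries a native bool, not the string "True".
result = {"syntax_passed": True}

# Old assertion (codified the bug):
#   assert result["syntax_passed"] == "True"  # Templates return strings
# New assertion:
assert result["syntax_passed"] is True
assert result["syntax_passed"] != "True"  # the string form is gone
```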

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@45f682d). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #139   +/-   ##
=======================================
  Coverage        ?   84.68%           
=======================================
  Files           ?       53           
  Lines           ?     7255           
  Branches        ?        0           
=======================================
  Hits            ?     6144           
  Misses          ?     1111           
  Partials        ?        0           


@jrob5756 (Collaborator) left a comment

LGTM. Surgical fix with clear docstring and good test coverage (new behavior + None + regression for lowercase JSON literals). The two updated existing tests honestly correct previously-asserted-but-buggy behavior.

@jrob5756 jrob5756 merged commit 5f2f82c into microsoft:main May 4, 2026
7 checks passed
jrob5756 added a commit that referenced this pull request May 4, 2026
…gelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 4, 2026
…ion (#129)

* fix(copilot): pass streaming=True to SDK to prevent tool-call truncation

The Copilot SDK's create_session accepts a 'streaming' parameter that
defaults to false. In non-streaming mode the model must emit its entire
turn (text + tool_use blocks + arguments) under a single per-turn output
budget. For agents that issue large tool-call arguments — most commonly
'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON
and the CLI silently executes the partial tool call (path only, no
file_text). The model sees the tool succeed with no content, retries the
same broken call, and loops indefinitely until the wall-clock session
limit fires (default 1800s). The interactive 'copilot' CLI defaults to
streaming, which is why the same model + tool combination works there
but not via the SDK without this flag.
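A sketch of the one-line change this commit describes. The `create_session` name and `streaming` kwarg come from the commit message; the SDK itself is stubbed here so the snippet is self-contained, and the real signature is not verified:

```python
# Stand-in for the Copilot SDK call; it only records the kwargs it receives.
def create_session(model, streaming=False, **kwargs):
    return {"model": model, "streaming": streaming, **kwargs}

# Before the fix, streaming was left at its False default, so the model's
# whole turn (text + tool-call JSON arguments) had to fit one per-turn
# output budget. The fix passes the flag explicitly:
session = create_session(model="example-model", streaming=True)
assert session["streaming"] is True
```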

Empirically verified red→green on the same workflow + model
(claude-opus-4.7-1m-internal, single ~50 KB create tool call):
- Without streaming=True: 9m08s wall-clock failure, 0 bytes written
  (ProviderError: tool 'create' was executing).
- With streaming=True: 4m57s success, 62 KB written in a single
  create call.

Tests:
- tests/test_providers/test_copilot_streaming.py — unit test that
  verifies create_session is called with streaming=True (and that the
  existing required kwargs are preserved).
- tests/test_integration/test_copilot_large_write.py — opt-in
  (real_api marker) regression test that builds a workflow inline,
  asks the writer agent to produce a single large create call, and
  asserts the file is at least 30 KB. Skips automatically when no
  copilot CLI is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add changelog entry for streaming fix (#129)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #107 and #109 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #100, #110, #111, #139, #142, #143, #144 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 5, 2026
…, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146)

CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch.

docs/workflow-syntax.md:
  - Add metadata + instructions fields to the workflow configuration block.
  - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups.
  - Add 'Sub-workflows in for_each groups' subsection with example.
  - Add JSON stdout auto-parsing note + example to Script Steps output section.
  - Add type-appropriate zero values table to Workflow Inputs.
  - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name.
  - Update on_start hook context list to include the new workflow.dir/file vars.

docs/cli-reference.md:
  - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run.
  - Add 'Metadata and Instructions' examples block.
  - Update conductor validate to describe the new template-reference error/warning checks added in #125.

docs/providers/claude.md, docs/providers/comparison.md:
  - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144).

README.md:
  - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior.
  - Add --metadata, --workspace-instructions, --instructions to the conductor run options table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 5, 2026
* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146)

CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch.

docs/workflow-syntax.md:
  - Add metadata + instructions fields to the workflow configuration block.
  - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups.
  - Add 'Sub-workflows in for_each groups' subsection with example.
  - Add JSON stdout auto-parsing note + example to Script Steps output section.
  - Add type-appropriate zero values table to Workflow Inputs.
  - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name.
  - Update on_start hook context list to include the new workflow.dir/file vars.

docs/cli-reference.md:
  - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run.
  - Add 'Metadata and Instructions' examples block.
  - Update conductor validate to describe the new template-reference error/warning checks added in #125.

docs/providers/claude.md, docs/providers/comparison.md:
  - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144).

README.md:
  - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior.
  - Add --metadata, --workspace-instructions, --instructions to the conductor run options table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: bump version to 0.1.11 and changelog #148

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>