
fix(engine): coerce Python literal "True"/"False"/"None" in workflow output#139

Merged
jrob5756 merged 2 commits into microsoft:main from PolyphonyRequiem:feat/parse-python-bools-and-none on May 4, 2026

Conversation

@PolyphonyRequiem (Member)

Problem

Workflow output templates pass through `_maybe_parse_json` (`engine/workflow.py:3398`) to convert JSON-shaped strings back into native Python types. The function only recognized the lowercase JSON literals (`true` / `false` / `null`).

In practice, workflow author expressions like:

```yaml
output:
  matched: "{{ left == right }}"
  done:    "{{ count >= threshold }}"
```

render their boolean results via Jinja's default `str()`, producing the strings `"True"` / `"False"`. These survived `_maybe_parse_json` unchanged, as truthy non-empty strings, so downstream route `when:` clauses comparing them against `true` / `false` silently misbehaved: the string `"False"` is truthy, and `"False" == true` evaluates to false, so the behavior was hard to reason about and harder to spot.

A `{{ none }}` expression has the same shape: it renders as `"None"` and survives as a string.
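The mismatch can be reproduced with plain `str()`, which is the same conversion Jinja applies to a non-string expression result (illustrative only, no engine code involved):

```python
# What "{{ left == right }}" renders to when the comparison is false.
rendered = str(1 == 2)

assert rendered == "False"        # Python capitalization, not JSON's "false"
assert bool(rendered) is True     # non-empty string, so truthy downstream
assert rendered != "false"        # never matches the lowercase JSON literal
```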

Fix

Recognize the three Python-literal forms ("True" / "False" / "None") explicitly before the existing JSON-literal check. Lowercase JSON literals continue to coerce as before.

Three new checks in `_maybe_parse_json` carry the behavior change:

```python
if stripped == "True":
    return True
if stripped == "False":
    return False
if stripped == "None":
    return None
```

Tests

Added 3 cases to TestWorkflowEngineOutputTemplates:

  • test_output_template_python_bool_literals — verifies {{ a == b }} / {{ a != b }} produce native bool.
  • test_output_template_python_none_literal — verifies {{ none }} produces native None.
  • test_output_template_lowercase_json_literals_still_work — regression check for true / false / null.

All 77 tests in test_engine/test_workflow.py pass; ruff clean.

Why this matters

The mismatch between Jinja's str(bool) output and the JSON-literal recognition list was a stable footgun: workflows lint clean, validate clean, run without errors, and silently take wrong route branches. Even authors who knew about it had to reach for awkward workarounds like {{ (a == b) | string | lower }} or {{ 1 if a == b else 0 }}.
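The two workarounds can be reproduced with plain `str()` standing in for Jinja rendering, which shows why they coerced while the bare expression did not (illustrative, not project code):

```python
import json

raw = str(1 == 1)                  # "True": the raw footgun
lowered = str(1 == 1).lower()      # "true", i.e. {{ (a == b) | string | lower }}
numeric = str(1 if 1 == 1 else 0)  # "1",    i.e. {{ 1 if a == b else 0 }}

assert json.loads(lowered) is True  # the old JSON-literal path accepted this...
assert json.loads(numeric) == 1     # ...and this, but not the raw "True"
```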

This patch is intentionally small and additive — no behavior change for any input that already coerced correctly. Happy to adjust scope if you'd prefer a different approach (e.g. coerce at template-render time rather than at output-map time).

Daniel Green and others added 2 commits May 2, 2026 20:32
…output

Workflow output templates pass through `_maybe_parse_json` to convert
JSON-shaped strings back into native types. Previously this only
recognized lowercase JSON literals (`true`/`false`/`null`). Jinja
expressions like `{{ a == b }}` render Python bool via `str()`, producing
`"True"` / `"False"`, which then survived as truthy non-empty strings
downstream. Route `when:` clauses comparing such values against `true` /
`false` silently misbehaved.

Add explicit handling for the three Python literal forms before the
existing JSON parse path. Lowercase JSON literals continue to work
(regression covered).

Tests: 3 new cases under `TestWorkflowEngineOutputTemplates` covering
`True`/`False` from `==` / `!=` expressions, `None` from `{{ none }}`,
and a regression check for the lowercase forms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two existing integration tests asserted the broken behavior they were
exercising:
- test_examples.py:214: `result["syntax_passed"] == "True"`
- test_parallel_workflows.py:410: `result["success"] == "True"`

Both had inline comments acknowledging the workaround
(`# Templates return strings`, `# Boolean rendered as string`). With
this PR's fix, those values now coerce to native bool. Update the
assertions accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PolyphonyRequiem (Member, Author)

The previous CI run surfaced 2 failing tests. Both were asserting the broken behavior this PR fixes (with inline comments acknowledging the workaround):

  • test_examples.py:214: `result["syntax_passed"] == "True"` # Templates return strings
  • test_parallel_workflows.py:410: `result["success"] == "True"` # Boolean rendered as string

Pushed 4a61195 updating both to assert native bool (is True).

This is also a useful signal: those tests effectively codified the bug as expected behavior. Worth flagging to anyone reviewing — the change is intentional, not an oversight in the assertions.
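The shape of the assertion change, with an illustrative result dict rather than the actual test fixtures:

```python
# Post-fix, the workflow output carries a native bool, not the string "True".
result = {"syntax_passed": True}

# Old assertion (codified the bug):
#   assert result["syntax_passed"] == "True"  # Templates return strings
# New assertion:
assert result["syntax_passed"] is True
assert result["syntax_passed"] != "True"  # the string form is gone
```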

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@45f682d). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #139   +/-   ##
=======================================
  Coverage        ?   84.68%           
=======================================
  Files           ?       53           
  Lines           ?     7255           
  Branches        ?        0           
=======================================
  Hits            ?     6144           
  Misses          ?     1111           
  Partials        ?        0           


@jrob5756 (Collaborator) left a comment

LGTM. Surgical fix with clear docstring and good test coverage (new behavior + None + regression for lowercase JSON literals). The two updated existing tests honestly correct previously-asserted-but-buggy behavior.

@jrob5756 jrob5756 merged commit 5f2f82c into microsoft:main May 4, 2026
7 checks passed
jrob5756 added a commit that referenced this pull request May 4, 2026
…gelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 4, 2026
…ion (#129)

* fix(copilot): pass streaming=True to SDK to prevent tool-call truncation

The Copilot SDK's create_session accepts a 'streaming' parameter that
defaults to false. In non-streaming mode the model must emit its entire
turn (text + tool_use blocks + arguments) under a single per-turn output
budget. For agents that issue large tool-call arguments — most commonly
'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON
and the CLI silently executes the partial tool call (path only, no
file_text). The model sees the tool succeed with no content, retries the
same broken call, and loops indefinitely until the wall-clock session
limit fires (default 1800s). The interactive 'copilot' CLI defaults to
streaming, which is why the same model + tool combination works there
but not via the SDK without this flag.
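A sketch of the one-line change this commit describes. The `create_session` name and `streaming` kwarg come from the commit message; the SDK itself is stubbed here so the snippet is self-contained, and the real signature is not verified:

```python
# Stand-in for the Copilot SDK call; it only records the kwargs it receives.
def create_session(model, streaming=False, **kwargs):
    return {"model": model, "streaming": streaming, **kwargs}

# Before the fix, streaming was left at its False default, so the model's
# whole turn (text + tool-call JSON arguments) had to fit one per-turn
# output budget. The fix passes the flag explicitly:
session = create_session(model="example-model", streaming=True)
assert session["streaming"] is True
```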

Empirically verified red→green on the same workflow + model
(claude-opus-4.7-1m-internal, single ~50 KB create tool call):
- Without streaming=True: 9m08s wall-clock failure, 0 bytes written
  (ProviderError: tool 'create' was executing).
- With streaming=True: 4m57s success, 62 KB written in a single
  create call.

Tests:
- tests/test_providers/test_copilot_streaming.py — unit test that
  verifies create_session is called with streaming=True (and that the
  existing required kwargs are preserved).
- tests/test_integration/test_copilot_large_write.py — opt-in
  (real_api marker) regression test that builds a workflow inline,
  asks the writer agent to produce a single large create call, and
  asserts the file is at least 30 KB. Skips automatically when no
  copilot CLI is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add changelog entry for streaming fix (#129)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #107 and #109 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #100, #110, #111, #139, #142, #143, #144 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 5, 2026
…, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146)

CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch.

docs/workflow-syntax.md:
  - Add metadata + instructions fields to the workflow configuration block.
  - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups.
  - Add 'Sub-workflows in for_each groups' subsection with example.
  - Add JSON stdout auto-parsing note + example to Script Steps output section.
  - Add type-appropriate zero values table to Workflow Inputs.
  - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name.
  - Update on_start hook context list to include the new workflow.dir/file vars.

docs/cli-reference.md:
  - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run.
  - Add 'Metadata and Instructions' examples block.
  - Update conductor validate to describe the new template-reference error/warning checks added in #125.

docs/providers/claude.md, docs/providers/comparison.md:
  - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144).

README.md:
  - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior.
  - Add --metadata, --workspace-instructions, --instructions to the conductor run options table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request May 5, 2026
* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146)

CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch.

docs/workflow-syntax.md:
  - Add metadata + instructions fields to the workflow configuration block.
  - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups.
  - Add 'Sub-workflows in for_each groups' subsection with example.
  - Add JSON stdout auto-parsing note + example to Script Steps output section.
  - Add type-appropriate zero values table to Workflow Inputs.
  - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name.
  - Update on_start hook context list to include the new workflow.dir/file vars.

docs/cli-reference.md:
  - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run.
  - Add 'Metadata and Instructions' examples block.
  - Update conductor validate to describe the new template-reference error/warning checks added in #125.

docs/providers/claude.md, docs/providers/comparison.md:
  - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144).

README.md:
  - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior.
  - Add --metadata, --workspace-instructions, --instructions to the conductor run options table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: bump version to 0.1.11 and changelog #148

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>