fix(engine): coerce Python literal "True"/"False"/"None" in workflow output #139
Workflow output templates pass through `_maybe_parse_json` to convert
JSON-shaped strings back into native types. Previously this only
recognized lowercase JSON literals (`true`/`false`/`null`). Jinja
expressions like `{{ a == b }}` render Python bools via `str()`, producing
`"True"` / `"False"`, which then survived as truthy non-empty strings
downstream. Route `when:` clauses comparing such values against `true` /
`false` silently misbehaved.
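The failure mode described above can be reproduced in plain Python, independent of the engine:

```python
# Why "True"/"False" strings break boolean logic downstream.
value = str(True)                  # Jinja's default rendering of a bool
assert value == "True"             # a plain string, not a bool
assert bool(value) is True         # non-empty strings are truthy...
assert bool("False") is True       # ...even when they spell "False"
assert ("True" == True) is False   # and never compare equal to a real bool
```

Both branches of a `when:` comparison are therefore wrong at once: truthiness checks always pass, equality checks against a real boolean always fail.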
Add explicit handling for the three Python literal forms before the
existing JSON parse path. Lowercase JSON literals continue to work
(regression covered).
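A minimal self-contained sketch of the coercion order described above (names here are illustrative; the real `_maybe_parse_json` lives in the engine and handles more JSON shapes):

```python
import json

# Illustrative sketch only: the new Python-literal check runs before the
# existing JSON parse path, so lowercase JSON literals keep working.
_PYTHON_LITERALS = {"True": True, "False": False, "None": None}

def maybe_parse_json(value: str):
    # New: recognize Python's str() renderings of bool/None first.
    if value in _PYTHON_LITERALS:
        return _PYTHON_LITERALS[value]
    # Existing path: lowercase JSON literals and other JSON shapes.
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return value

assert maybe_parse_json("True") is True    # new Python-literal handling
assert maybe_parse_json("None") is None
assert maybe_parse_json("false") is False  # regression: JSON literal still coerces
assert maybe_parse_json("hello") == "hello"
```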
Tests: 3 new cases under `TestWorkflowEngineOutputTemplates` covering
`True`/`False` from `==` / `!=` expressions, `None` from `{{ none }}`,
and a regression check for the lowercase forms.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two existing integration tests asserted the broken behavior they were exercising:

- test_examples.py:214: `result["syntax_passed"] == "True"`
- test_parallel_workflows.py:410: `result["success"] == "True"`

Both had inline comments acknowledging the workaround (`# Templates return strings`, `# Boolean rendered as string`). With this PR's fix, those values now coerce to native bool. Update the assertions accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
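Illustratively, the assertion change amounts to the following (the dicts are hypothetical stand-ins for the real workflow results):

```python
# Stand-ins for the integration-test results (real keys come from the
# workflows exercised in test_examples.py / test_parallel_workflows.py).
result_before_fix = {"syntax_passed": "True"}  # pre-fix: bool survived as a string
result_after_fix = {"syntax_passed": True}     # post-fix: coerced to native bool

# The old assertions had to compare against the string form...
assert result_before_fix["syntax_passed"] == "True"
# ...while the updated assertions compare against the native bool.
assert result_after_fix["syntax_passed"] is True
```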
The previous CI run surfaced 2 failing tests. Both were asserting the broken behavior this PR fixes (with inline comments acknowledging the workaround).

Pushed 4a61195 updating both to assert native bools. This is also a useful signal: those tests effectively codified the bug as expected behavior. Worth flagging to anyone reviewing — the change is intentional, not an oversight in the assertions.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #139   +/-  ##
=======================================
  Coverage        ?   84.68%
=======================================
  Files           ?       53
  Lines           ?     7255
  Branches        ?        0
=======================================
  Hits            ?     6144
  Misses          ?     1111
  Partials        ?        0
```
jrob5756
left a comment
LGTM. Surgical fix with clear docstring and good test coverage (new behavior + None + regression for lowercase JSON literals). The two updated existing tests honestly correct previously-asserted-but-buggy behavior.
…ion (#129)

* fix(copilot): pass streaming=True to SDK to prevent tool-call truncation

The Copilot SDK's create_session accepts a 'streaming' parameter that defaults to false. In non-streaming mode the model must emit its entire turn (text + tool_use blocks + arguments) under a single per-turn output budget. For agents that issue large tool-call arguments — most commonly 'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON and the CLI silently executes the partial tool call (path only, no file_text). The model sees the tool succeed with no content, retries the same broken call, and loops indefinitely until the wall-clock session limit fires (default 1800s).

The interactive 'copilot' CLI defaults to streaming, which is why the same model + tool combination works there but not via the SDK without this flag.

Empirically verified red→green on the same workflow + model (claude-opus-4.7-1m-internal, single ~50 KB create tool call):

- Without streaming=True: 9m08s wall-clock failure, 0 bytes written (ProviderError: tool 'create' was executing).
- With streaming=True: 4m57s success, 62 KB written in a single create call.

Tests:

- tests/test_providers/test_copilot_streaming.py — unit test that verifies create_session is called with streaming=True (and that the existing required kwargs are preserved).
- tests/test_integration/test_copilot_large_write.py — opt-in (real_api marker) regression test that builds a workflow inline, asks the writer agent to produce a single large create call, and asserts the file is at least 30 KB. Skips automatically when no copilot CLI is available.
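The unit test described above could be sketched roughly as follows with a mocked SDK (the `start_session` wrapper is a hypothetical stand-in for the provider's real call site):

```python
from unittest.mock import MagicMock

# Hypothetical wrapper standing in for the provider code under test; the
# point is only that streaming=True must reach the SDK's create_session.
def start_session(sdk, model: str):
    return sdk.create_session(model=model, streaming=True)

sdk = MagicMock()
start_session(sdk, "some-model")

# Inspect the recorded call: streaming must be forwarded, and the
# pre-existing kwargs must still be present.
_, kwargs = sdk.create_session.call_args
assert kwargs["streaming"] is True
assert kwargs["model"] == "some-model"
```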
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add changelog entry for streaming fix (#129)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #107 and #109 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add #100, #110, #111, #139, #142, #143, #144 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146)

CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch.

docs/workflow-syntax.md:
- Add metadata + instructions fields to the workflow configuration block.
- Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups.
- Add 'Sub-workflows in for_each groups' subsection with example.
- Add JSON stdout auto-parsing note + example to Script Steps output section.
- Add type-appropriate zero values table to Workflow Inputs.
- Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name.
- Update on_start hook context list to include the new workflow.dir/file vars.

docs/cli-reference.md:
- Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run.
- Add 'Metadata and Instructions' examples block.
- Update conductor validate to describe the new template-reference error/warning checks added in #125.

docs/providers/claude.md, docs/providers/comparison.md:
- Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144).

README.md:
- Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior.
- Add --metadata, --workspace-instructions, --instructions to the conductor run options table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: bump version to 0.1.11 and changelog #148

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem

Workflow output templates pass through `_maybe_parse_json` (engine/workflow.py:3398) to convert JSON-shaped strings back into native Python types. The function only recognized lowercase JSON literals (`true`/`false`/`null`).

In practice, workflow author expressions like `{{ a == b }}` render their `bool` results via Jinja's default `str()`, producing the strings `"True"` / `"False"`. These then survived `_maybe_parse_json` unchanged — as truthy non-empty strings — so downstream route `when:` clauses comparing them against `true` / `false` silently misbehaved. The string `"False"` is truthy, and `"False" == true` is also false, so behavior was hard to reason about and harder to spot.

A `{{ none }}` expression has the same shape: it renders as `"None"` and survives as a string.

Fix
Recognize the three Python-literal forms (`"True"` / `"False"` / `"None"`) explicitly before the existing JSON-literal check. Lowercase JSON literals continue to coerce as before. The change is three lines of behavior in `_maybe_parse_json`.

Tests
Added 3 cases to `TestWorkflowEngineOutputTemplates`:

- `test_output_template_python_bool_literals` — verifies `{{ a == b }}` / `{{ a != b }}` produce native `bool`.
- `test_output_template_python_none_literal` — verifies `{{ none }}` produces native `None`.
- `test_output_template_lowercase_json_literals_still_work` — regression check for `true`/`false`/`null`.

All 77 tests in `test_engine/test_workflow.py` pass; ruff clean.

Why this matters
The mismatch between Jinja's `str(bool)` output and the JSON-literal recognition list was a stable footgun: workflows lint clean, validate clean, run without errors, and silently take wrong route branches. Even authors who knew about it had to reach for awkward workarounds like `{{ (a == b) | string | lower }}` or `{{ 1 if a == b else 0 }}`.

This patch is intentionally small and additive — no behavior change for any input that already coerced correctly. Happy to adjust scope if you'd prefer a different approach (e.g. coerce at template-render time rather than at output-map time).