feat(validate): catch template reference errors before runtime#125
feat(validate): catch template reference errors before runtime#125jrob5756 merged 4 commits intomicrosoft:mainfrom
Conversation
4744356 to
c2b107f
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #125 +/- ##
=======================================
Coverage ? 85.16%
=======================================
Files ? 53
Lines ? 7579
Branches ? 0
=======================================
Hits ? 6455
Misses ? 1124
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enhanced `conductor validate` to scan Jinja2 templates in agent prompts,
script args, input_mapping, and workflow output for reference errors:
Level 1 — Template reference resolution:
- {{ X.output.Y }} where X is not a valid agent name → error
- {{ workflow.input.X }} where X is not a declared input → error
- In explicit mode, LLM agents referencing agent outputs not in their
input: list → warning
Also wires semantic validation (validate_workflow_config) into the CLI
`conductor validate` command, which previously only ran Pydantic schema
validation.
Catches errors like: stale agent name references after renames, missing
workflow input declarations, and undeclared dependencies in explicit mode
— all before runtime, at zero cost.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c2b107f to
aaac6dd
Compare
jrob5756
left a comment
There was a problem hiding this comment.
Some testing and cleanup
| # {{ agent_name.output }} | ||
| # {% if agent_name.output.field %} | ||
| # Excludes built-in namespaces: workflow, context, item, _index, _key | ||
| _TEMPLATE_REF_PATTERN = re.compile(r"(?:\{\{|\{%)[^}%]*?\b(\w+)\.(?:output|outputs)\b") |
There was a problem hiding this comment.
Critical: false positives on valid workflows.
This regex has no awareness of Jinja2 syntax beyond the {{/{% delimiters. Two problem cases I verified:
- Jinja2
{% for %}loop variables —{% for r in researcher.outputs %}{{ r.output.text }}{% endfor %}matchesr.outputand reportsras 'unknown agent'. This is a normal Jinja2 idiom and these workflows used to validate fine. - String literals inside templates —
{{ x | replace("foo.output", "") }}reportsfooas 'unknown agent'.
Repro:
import re
p = re.compile(r'(?:\{\{|\{%)[^}%]*?\b(\w+)\.(?:output|outputs)\b')
[m.group(1) for m in p.finditer('{% for x in y %}{{ x.output }}{% endfor %}')]
# => ['x']
[m.group(1) for m in p.finditer('{{ x | replace("foo.output", "") }}')]
# => ['foo']Because these become errors (not warnings), conductor validate will hard-fail on previously-valid workflows. Either:
- Replace with a real Jinja2 AST walk (
jinja2.Environment().parse()+jinja2.meta.find_undeclared_variables), which gives string- and scope-aware variable tracking, or - Downgrade these to warnings until the AST approach lands.
There was a problem hiding this comment.
Confirmed both repros locally — good catches.
Replaced the regex with jinja2.Environment().parse(...) + meta.find_undeclared_variables, then walked nodes.Getattr chains rooted at the truly-free names. Both classes of false positive are now structurally impossible:
- The loop variable
ris bound by{% for r in ... %}and so doesn't appear infind_undeclared_variables— it's filtered out before we ever look at the chain. - String literals are
nodes.Const, notnodes.Getattr, so they never enter the walk.
Added regression tests for both in TestExtractTemplateRefs.test_for_loop_variable_is_not_a_ref and test_string_literal_is_not_a_ref, plus end-to-end versions in TestTemplateReferenceValidation.
Fix: 278b389
| r"^(?:" | ||
| r"(?P<agent>[a-zA-Z_][a-zA-Z0-9_]*)\.output(?:\.(?P<field>[a-zA-Z_][a-zA-Z0-9_]*))?|" | ||
| r"(?P<parallel>[a-zA-Z_][a-zA-Z0-9_]*)\.outputs\.(?P<pg_agent>[a-zA-Z_][a-zA-Z0-9_]*)(?:\.(?P<pg_field>[a-zA-Z_][a-zA-Z0-9_]*))?|" | ||
| r"(?P<parallel>[a-zA-Z_][a-zA-Z0-9_]*)\.(?:outputs|errors)(?:\.(?P<pg_agent>[a-zA-Z_][a-zA-Z0-9_]*)(?:\.(?P<pg_field>[a-zA-Z_][a-zA-Z0-9_]*))?)?|" |
There was a problem hiding this comment.
Needs test coverage for the new accepted forms.
This pattern was extended to accept .errors (in addition to .outputs) and to make the trailing .pg_agent[.pg_field] optional. Please add tests asserting the new accepted shapes — at minimum:
{group}.errors{group}.errors.{member}{group}.outputs(bare, no member)- Optional
?suffix on each of the above
Without tests, future refactors can silently regress this regex.
There was a problem hiding this comment.
Added — see TestInputRefPatternExtensions in tests/test_config/test_validator.py. Parametrized coverage for the new shapes:
{group}.errorsand{group}.errors?{group}.errors.{member}and{group}.errors.{member}.{field}{group}.outputs(bare) and{group}.outputs?
Plus a legacy-shape test (agent.output, workflow.input.param, etc.) so the regex can't drift backward, and a negative test that explicitly rejects bare workflow.input and other malformed shapes — so a future refactor that loosens the pattern fails a test instead of silently regressing.
Fix: 278b389
| templates.append((f"agent '{agent.name}' command", agent.command)) | ||
| for i, arg in enumerate(agent.args): | ||
| templates.append((f"agent '{agent.name}' args[{i}]", arg)) | ||
| input_mapping: dict[str, str] | None = getattr(agent, "input_mapping", None) |
There was a problem hiding this comment.
Dead code.
AgentDef has no input_mapping field — confirmed by grep across the entire repo (no occurrences in src/, tests/, or examples/). Pydantic v2 defaults to extra='ignore', so getattr(agent, 'input_mapping', None) always returns None and this whole branch is unreachable.
The PR description still claims:
Stale references in script args, input_mapping, system_prompt: All template-bearing fields are scanned
Either remove this block or, if forward-looking, gate it behind an actual schema field added in this PR.
There was a problem hiding this comment.
There was a problem hiding this comment.
Correcting myself — #101 actually closed via #109 on May 1, which added input_mapping to AgentDef. So the field DOES exist on main, the validator block in main collects from it, and dropping it from this PR would have been a regression once it merges back.
Restored the collection in 8cad8e3, kept the getattr shim so the code works on this branch (which is behind main and doesn't have the field yet) and on main (which does). Added TestInputMappingTemplateCollection exercising the path via a duck-typed agent — three tests covering the present/absent paths and stale-ref detection inside an input_mapping expression.
PR description updated to put input_mapping back in the list of scanned fields.
| def _resolve_prompt_file(agent_name: str, prompt: str, workflow_path: Path | None) -> str | None: | ||
| """If a prompt looks like it came from a !file tag, try to load the file content. | ||
|
|
||
| The !file tag is already resolved during YAML loading, so the prompt field | ||
| contains the file content. This function is for scanning prompts that may | ||
| have been loaded from files for additional template references. | ||
|
|
||
| Returns the prompt as-is (it's already resolved by the loader). | ||
| """ | ||
| return prompt if prompt else None |
There was a problem hiding this comment.
Dead code — please remove.
_resolve_prompt_file is never called anywhere (verified via grep). The body is a no-op (return prompt if prompt else None) and the docstring explicitly says "the prompt is already resolved by the loader." Either delete this entire function, or replace it with the real !file resolution it implies (and call it).
There was a problem hiding this comment.
Deleted in 278b389. No callers, no behavior — confirmed via grep. If we want real !file resolution at validate time it's a separate concern; the loader already handles it for run.
| return prompt if prompt else None | ||
|
|
||
|
|
||
| def _validate_template_references( |
There was a problem hiding this comment.
Suggestion: use Jinja2's own parser instead of regex.
Jinja2 ships with jinja2.Environment().parse(source) and jinja2.meta.find_undeclared_variables(ast) — these give you string-aware, comment-aware, and scope-aware (recognizing {% for x in y %}, {% set x = ... %}, macro params, etc.) variable tracking. This eliminates both classes of false positives flagged on _TEMPLATE_REF_PATTERN with no ongoing regex maintenance burden, and naturally handles cases like {{ workflow.input.x | filter('foo.output') }}.
Rough sketch:
from jinja2 import Environment, meta
env = Environment()
ast = env.parse(template)
undeclared = meta.find_undeclared_variables(ast) # set of top-level names
# Then resolve each name to either: builtin, agent, parallel/foreach group, workflow input, loop varAlso: find_referenced_templates(ast) for {% include %}/{% extends %} if those become relevant.
If this is too much for this PR, please at least downgrade missing-reference detection from errors to warnings so false positives don't block valid workflows.
There was a problem hiding this comment.
Done — switched to Environment().parse(...) + meta.find_undeclared_variables + a nodes.Getattr walk. See _extract_template_refs in 278b389. Both false-positive classes from the regex are now structurally impossible (loop vars are bound and so excluded from find_undeclared_variables; string literals are nodes.Const and never enter the walk).
One Jinja2 quirk worth flagging for future maintainers: meta.find_undeclared_variables actually runs the code generator over the AST, so it raises TemplateAssertionError for unknown filters/tests at find time, not just at render time. Conductor registers | json and friends on a different env at render time, which would have made every meta.find_undeclared_variables call blow up on real workflows. Worked around with a _TolerantNameMap(dict) installed onto _JINJA_ENV.filters and .tests that returns an identity function for any unknown name. Covered in TestExtractTemplateRefs.test_unknown_filter_does_not_raise and test_unknown_test_does_not_raise.
| output_console.print(f" [yellow]⚠[/yellow] {warning}") | ||
| except ConductorError as e: | ||
| display_validation_error(e, workflow_path, output_console) | ||
| return False, None |
There was a problem hiding this comment.
Missing tests for the new validation behavior.
This PR adds ~200 lines of nontrivial regex/AST-style validation logic (in validator.py) and wires it into the CLI here, but adds zero new tests. Codecov reports 16 patch lines uncovered.
At minimum, please add:
- Positive regression tests: every workflow under
examples/should still validate successfully (this would catch issues Bump cryptography from 46.0.4 to 46.0.5 #1 today). - Negative tests for each new error path: stale agent ref, stale workflow input ref, references in
system_prompt/command/args/working_dir. - Explicit-mode warning tests: undeclared agent ref + undeclared
workflow.input.Xref in explicit mode. - For-each tests: inline-agent template scanning,
for_each_namesaccepted in_validate_input_references. - Pattern-extension tests: new
INPUT_REF_PATTERNaccepts.errorsand bare{group}.outputs. - CLI integration: this branch — verify warnings render and that errors return
(False, None).
There was a problem hiding this comment.
Added a substantial test suite in 278b389:
In tests/test_config/test_validator.py:
TestExtractTemplateRefs— 12 unit tests on the new extractor (loop vars, string literals,{% set %}, macros, builtins, malformed templates, unknown filters/tests).TestInputRefPatternExtensions— parametrized regex shape tests for the new.errors/ bare.outputsforms (and a negative test for shapes that should still be rejected).TestTemplateReferenceValidation— end-to-end stale-ref detection inprompt,system_prompt, and parallel-group inputs, plus regression tests for both false-positive classes.TestExplicitModeWarnings— warnings vs no-warning paths forcontext.mode: explicit.TestOutputTemplateValidation— workflowoutput:template stale-ref detection.TestExamplesRegression— loops overexamples/*.yamlto prove no example regresses (your suggested positive regression test).
In tests/test_cli/test_validate.py:
TestSemanticValidationIntegration— 5 CLI-level tests covering exit codes, warning rendering, and the false-positive case being non-blocking.
Total: ~600 net lines of test coverage. Locally the new tests pull the patch coverage well above the 80% bar; will see what codecov reports on the next run.
… tests Address review feedback on PR microsoft#125 from @jrob5756. Replace regex-based template scanning with Jinja2's parser + AST walk. The previous `_TEMPLATE_REF_PATTERN` matched any `ident.attr.attr` chain in the raw template source, producing two classes of false positives the reviewer flagged: - Loop variables: `{% for r in pg.outputs %}{{ r.output.text }}{% endfor %}` triggered "stale agent reference: r" because the regex doesn't track Jinja2 scopes. - String literals: `{{ "agent.output.text" }}` triggered the same error because the regex doesn't distinguish strings from code. The new `_extract_template_refs` parses the template, calls `meta.find_undeclared_variables` to get only the truly free names, then walks `Getattr` chains rooted at those names. Loop variables, `{% set %}` bindings, and macro params are excluded automatically because they're declared inside the template. Also remove dead code: - The `input_mapping` collection block in `_collect_template_strings` referenced an `AgentDef` field that doesn't exist (issue microsoft#101 hasn't landed). `getattr` always returned `None`. - `_resolve_prompt_file` had no callers. - The vestigial `has_full_workflow_input` flag — bare `workflow.input` is no longer a valid input declaration after the INPUT_REF_PATTERN tightening. Add comprehensive tests: - `TestExtractTemplateRefs`: 12 unit tests covering loop vars, string literals, `{% set %}`, macros, builtins, malformed templates, and unknown filters/tests (the AST walk uses a tolerant filter map). - `TestInputRefPatternExtensions`: parametrized tests for `.errors` and bare `.outputs` that the PR added support for. - `TestTemplateReferenceValidation`: end-to-end stale-ref detection plus regression tests for both false-positive classes. - `TestExplicitModeWarnings`: warning emission for missing inputs in `context.mode: explicit`. - `TestOutputTemplateValidation`: workflow `output:` template checks. - `TestExamplesRegression`: loops over `examples/*.yaml` to prove no example breaks. - `TestSemanticValidationIntegration` in test_validate.py: 5 CLI-level tests covering exit codes, warning rendering, and false-positive non-blocking. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR microsoft#109 (closing microsoft#101) merged input_mapping onto AgentDef on main. The previous commit dropped input_mapping handling on the assumption that the field was nonexistent — true at the time of the PR's original review, but no longer. Restore the collection block (using getattr for forward-compat with branches that haven't merged microsoft#109 yet), and add tests via a duck-typed agent so coverage works regardless of whether the schema field is present on this branch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756
left a comment
There was a problem hiding this comment.
LGTM — all 6 review comments addressed substantively. Switch to Jinja2 AST is well-implemented, regression tests cover both false-positive classes, and example workflows still validate.
…, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: changelog + doc updates for unreleased PRs (#100, #109-#111, #119, #121-#123, #125, #129, #130, #131, #139, #141-#144, #146) CHANGELOG: add 6 newer PRs (#119, #121, #122, #123, #125, #113, #130, #131, #141, #146) to [Unreleased] alongside the previously documented batch. docs/workflow-syntax.md: - Add metadata + instructions fields to the workflow configuration block. - Add input_mapping and max_depth to Sub-Workflow Steps; correct stale claims that circular references are rejected and that workflow steps cannot be used in for_each groups. - Add 'Sub-workflows in for_each groups' subsection with example. - Add JSON stdout auto-parsing note + example to Script Steps output section. - Add type-appropriate zero values table to Workflow Inputs. - Add 'Workflow Metadata Variables' subsection covering workflow.dir, workflow.file, workflow.name. - Update on_start hook context list to include the new workflow.dir/file vars. docs/cli-reference.md: - Document --metadata/-m, --workspace-instructions, and --instructions flags on conductor run. - Add 'Metadata and Instructions' examples block. - Update conductor validate to describe the new template-reference error/warning checks added in #125. docs/providers/claude.md, docs/providers/comparison.md: - Replace stale 'All models support a 200K token context window' / '200K (all models)' claims with notes that the dashboard now sources context_window_max from each provider's SDK at runtime (#144). README.md: - Refresh the Features list to mention sub-workflow composition, dialog mode, workspace instructions, breadcrumb navigation, and the enhanced validate behavior. - Add --metadata, --workspace-instructions, --instructions to the conductor run options table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: bump version to 0.1.11 and changelog #148 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Enhanced
conductor validateto catch Jinja2 template reference errors before runtime. Previously,conductor validateonly checked YAML schema (Pydantic) — stale agent references, missing workflow inputs, and undeclared dependencies only surfaced at runtime after minutes of execution and real API costs.What it catches
Errors (validation fails)
{{ old_agent_name.output.field }}whereold_agent_namedoesn't exist{{ workflow.input.missing_field }}wheremissing_fieldisn't declared ininput:prompt,system_prompt,command,args,working_dir,input_mapping, parallel-group inputs, and workflowoutput:templates — all template-bearing fields are scannedWarnings (validation passes with notes)
{{ a.output.val }}but doesn't declarea.outputin itsinput:listExample
Changes
src/conductor/config/validator.py:_extract_template_refs()— parses each template viajinja2.Environment().parse()and walksnodes.Getattrchains rooted at the truly-free names returned bymeta.find_undeclared_variables. Loop variables,{% set %}bindings, and macro params are excluded automatically because they don't appear infind_undeclared_variables. String literals arenodes.Constand never enter the walk._TolerantNameMap(dict)on_JINJA_ENV.filters/.testsso workflow-specific filters (registered on a different env at render time, e.g.| json) don't causemeta.find_undeclared_variablesto raiseTemplateAssertionErrorduring validation._collect_template_strings()to gather all Jinja2 template strings from an agent. Includesinput_mappingfor sub-workflow agents (added by feat(composition): dynamic sub-workflow inputs via input_mapping (#101) #109 closing feat(composition): dynamic sub-workflow inputs via input_mapping #101) — usesgetattrso it stays a no-op on branches that haven't merged that schema field yet._validate_template_references()with agent name, workflow input, and explicit-mode checks.INPUT_REF_PATTERNto accept{group}.errors,{group}.errors.{member}, and bare{group}.outputs(with optional?suffix on each).validate_workflow_config()signature to accept optionalworkflow_path.src/conductor/cli/validate.py:validate_workflow_config) into the CLIvalidatecommand — it previously only ran Pydantic schema validation.Testing
tests/test_config+tests/test_cli/test_validate.py), no regressions.tests/test_config/test_validator.py:TestExtractTemplateRefs— unit tests on the new extractor (loop vars, string literals,{% set %}, macros, builtins, malformed templates, unknown filters/tests).TestInputRefPatternExtensions— parametrized regex shape tests for the new.errors/ bare.outputsforms, plus negative cases.TestTemplateReferenceValidation— end-to-end stale-ref detection inprompt,system_prompt, and parallel-group inputs, plus regression tests for the false-positive classes flagged in review (Jinja2 loop variables, string literals).TestExplicitModeWarnings— warnings vs. no-warning paths.TestOutputTemplateValidation— workflowoutput:template stale-ref detection.TestExamplesRegression— loops overexamples/*.yamlto prove no example regresses.TestInputMappingTemplateCollection— coversinput_mappingcollection + stale-ref detection via a duck-typed agent (so the test runs regardless of whether this branch has merged theinput_mappingschema field from feat(composition): dynamic sub-workflow inputs via input_mapping (#101) #109).TestSemanticValidationIntegrationclass intests/test_cli/test_validate.py— 5 CLI-level tests covering exit codes, warning rendering, and the false-positive case being non-blocking.examples/*.yamlvalidate cleanly.