fix: unify pre-execution validation with --validate-only (fixes #66) by spinje · Pull Request #67 · spinje/pflow

spinje · 2026-01-14T13:32:33Z

Summary

Unifies validation behavior between normal workflow execution and --validate-only mode. Previously, normal execution used weaker schema-only validation while --validate-only used full WorkflowValidator. This caused unknown node types to show ugly tracebacks instead of clean error messages.

Changes

Add _validate_before_execution() function using full WorkflowValidator with real execution params
Remove _validate_workflow_structure() and _validate_and_handle_workflow_errors() (dead code)
Remove unused validate_ir import from main.py
Add validation call after _prepare_execution_environment() but before execute_workflow()
Update test fixtures to patch WorkflowValidator.validate instead of removed validate_ir
Add new test file test_validation_before_execution.py with 5 tests

Explanation

The CLI had two different validation paths:

Mode	Before	After
Normal execution	`validate_ir()` (schema only)	`WorkflowValidator.validate()` (full)
`--validate-only`	`WorkflowValidator.validate()` (full)	`WorkflowValidator.validate()` (full)
`--auto-repair`	`WorkflowValidator.validate()` (full)	`WorkflowValidator.validate()` (full)

The key insight was that validation should happen AFTER _prepare_execution_environment() creates enhanced_params, so we can validate with real params instead of dummy placeholders. This provides better template validation than --validate-only mode.

Before (ugly traceback):

ERROR: Node instantiation failed
Traceback (most recent call last):
  ...
pflow.runtime.compiler.CompilationError: compiler: Node type 'nonexistent-node' not found
...
❌ Planning failed: compiler: Node type 'nonexistent-node' not found

After (clean error):

✗ Static validation failed:
  • Unknown node type: 'nonexistent-node'

Testing

All 3985 tests pass
make check passes (linting, type checking)
New test file verifies:
- Unknown nodes are caught before any node executes
- Clean error messages (no tracebacks)
- JSON format works correctly
- Valid workflows still execute normally
- Validation is consistent between --validate-only and normal execution

Run make test to verify all tests pass.

🤖 Implemented by Claude

Previously, normal workflow execution used weaker validation (schema-only via validate_ir) while --validate-only used full WorkflowValidator. This caused unknown node types to show ugly tracebacks instead of clean error messages. Changes: - Add _validate_before_execution() using full WorkflowValidator with real params - Remove _validate_workflow_structure() and _validate_and_handle_workflow_errors() - Remove unused validate_ir import from main.py - Add validation call after _prepare_execution_environment() in execute_json_workflow() - Update tests to patch WorkflowValidator.validate instead of removed validate_ir Benefits: - Consistent validation across all modes (normal, --validate-only, --auto-repair) - Clean error messages for unknown node types - Validation happens BEFORE any nodes execute - Removed ~65 lines of duplicate validation logic Fixes #66

- Remove unused metrics_collector parameter from _validate_before_execution() - Remove overly broad except Exception block that was hiding potential bugs (let WorkflowValidator exceptions propagate naturally)

claude · 2026-01-14T13:35:16Z

Code Review

This PR successfully unifies validation behavior between normal execution and --validate-only mode. The implementation is clean, well-tested, and solves a real user experience problem. However, there are a few areas that need attention before merge.

Critical — Must Fix Before Merge

1. Registry Instantiation in Validation Path

Location: src/pflow/cli/main.py:2092

registry = Registry()

Issue: Creating a fresh Registry() instance on every validation call triggers a full rescan of the registry from disk. This is wasteful and could cause performance issues, especially if validation is called multiple times or in loops.

Fix: Follow the existing pattern in _perform_validation() (line 1966) and other parts of the codebase - instantiate the registry once and reuse it. Consider:

Pass registry as a parameter to _validate_before_execution() (preferred - maintains functional purity)
Cache registry in context object
Use a module-level cached registry instance

Recommendation: Add an optional registry parameter with lazy instantiation:

def _validate_before_execution(
    ctx: click.Context,
    ir_data: dict[str, Any],
    execution_params: dict[str, Any],
    output_format: str,
    verbose: bool,
    metrics_collector: Any | None,
    registry: Registry | None = None,  # Allow caller to provide
) -> None:
    ...
    if registry is None:
        registry = Registry()
    ...

Then pass the registry from the caller if available.

2. Unused `metrics_collector` Parameter

Location: src/pflow/cli/main.py:2073

def _validate_before_execution(
    ...
    metrics_collector: Any | None,  # <-- Never used
) -> None:

Issue: The metrics_collector parameter is documented and passed but never actually used in the function body. The comment at line 2119 even acknowledges this: "Not including metrics since validation fails before execution starts"

Impact:

Misleading function signature
Future maintainers may try to use it
Violates principle of minimal interfaces

Fix: Either:

Remove the parameter entirely (preferred - YAGNI principle)
If you anticipate needing it for validation metrics in the future, add a TODO comment explaining the plan

Recommendation: Remove it. The function calls ctx.exit(1) on validation failure, so metrics would never be collected anyway.

def _validate_before_execution(
    ctx: click.Context,
    ir_data: dict[str, Any],
    execution_params: dict[str, Any],
    output_format: str,
    verbose: bool,
) -> None:

Update the call site at line 2174 accordingly.

Warnings — Should Be Addressed

3. Error Handling Could Be More Specific

Location: src/pflow/cli/main.py:2101-2107

except Exception as e:
    # Handle unexpected validation errors
    if output_format == "json":
        click.echo(json.dumps({"success": False, "error": f"Validation error: {e}"}))
    else:
        click.echo(f"✗ Validation error: {e}", err=True)
    ctx.exit(1)

Issue: Catching all Exceptions is too broad. If WorkflowValidator.validate() has a bug (e.g., KeyError, AttributeError), this will silently convert it to a user-facing validation error instead of exposing the actual bug.

Impact:

Makes debugging harder
Hides potential bugs in the validator
Users see "Validation error: 'nodes'" instead of a proper traceback

Recommendation: Only catch expected exceptions or let unexpected ones bubble up (they'll be caught by Click's error handler):

# Remove the try-except entirely - let WorkflowValidator raise its own errors
errors, warnings = WorkflowValidator.validate(
    workflow_ir=ir_data,
    extracted_params=execution_params,
    registry=registry,
    skip_node_types=False,
)

If you need to handle specific exceptions from the validator, catch those specifically. But a bare Exception catch is almost always wrong.

4. Test Helper Function Has Overly Complex Setup

Location: tests/test_cli/test_validation_before_execution.py:17-66

The invoke_cli() helper manually manages sys.argv, sys.stdout, sys.stderr, and sys.stdin. While functional, this is fragile and duplicates logic.

Issue:

Manual stream management is error-prone
If an exception occurs before the finally block, state could leak
The custom Result class duplicates Click's testing infrastructure

Recommendation: Use Click's built-in CliRunner for testing:

from click.testing import CliRunner
from pflow.cli.main_wrapper import cli_main

def invoke_cli(args: list[str]) -> Any:
    runner = CliRunner(mix_stderr=False)
    return runner.invoke(cli_main, args)

This is more maintainable and follows Click's recommended testing patterns. The CliRunner handles all stream capture and restoration automatically.

5. Missing Test: Warnings Display in Verbose Mode

Location: src/pflow/cli/main.py:2126-2128

# Warnings are non-blocking, just display them in verbose mode
if warnings and output_format != "json" and verbose:
    for warning in warnings:
        click.echo(f"⚠️  {warning}", err=True)

Issue: This code path is not tested. The test file has 5 tests, but none verify that warnings are displayed correctly in verbose mode.

Recommendation: Add a test case:

def test_warnings_displayed_in_verbose_mode(self, tmp_path: Path) -> None:
    """Warnings should be shown in verbose mode."""
    # Create workflow that triggers a warning (e.g., runtime-validated template)
    workflow = {...}
    
    result = invoke_cli(["--verbose", str(workflow_path)])
    
    assert result.exit_code == 0
    assert "⚠️" in result.stderr  # Verify warning is shown

6. Inconsistent Comment Style

Location: src/pflow/cli/main.py:2171-2174

# Validate before execution (if not using auto-repair)
# Auto-repair mode handles validation inside execute_workflow() with repair capability
if not auto_repair:
    _validate_before_execution(ctx, ir_data, enhanced_params, output_format, verbose, metrics_collector)

Issue: The comment is accurate but verbose. The function name _validate_before_execution already conveys intent. The auto-repair explanation is valuable but could be a docstring or more concise.

Recommendation: Simplify:

# Skip validation if auto-repair enabled (executes inside execute_workflow with repair)
if not auto_repair:
    _validate_before_execution(...)

Or move the explanation to _validate_before_execution's docstring since it explains the relationship between validation and repair.

Suggestions — Optional Improvements

7. Consider Extracting JSON Error Output to Helper

Locations: Lines 2103-2107, 2110-2120

Both error paths (exception and validation errors) manually construct JSON error output. Consider extracting this pattern:

def _create_validation_error_json(error_msg: str, validation_errors: list[str] | None = None, 
                                   metadata: dict | None = None, verbose: bool = False) -> str:
    output = {"success": False, "error": error_msg}
    if validation_errors:
        output["validation_errors"] = validation_errors
    if metadata:
        output["metadata"] = metadata
    return json.dumps(output, indent=2 if verbose else None)

This would reduce duplication and ensure consistent JSON error format across the codebase.

8. Test Coverage: Edge Case for Empty Workflows

The tests don't cover edge cases like:

Workflows with zero nodes
Workflows with nodes but no edges
Workflows with circular dependencies (though data flow validation should catch this)

Consider adding negative test cases to ensure validation handles degenerate inputs gracefully.

9. Performance: Could Cache Registry in Context

Since the registry is needed for both validation and execution, consider caching it in ctx.obj after first instantiation:

registry = ctx.obj.get("registry")
if registry is None:
    registry = Registry()
    ctx.obj["registry"] = registry

This would avoid multiple rescans during a single CLI invocation.

Positive Observations

✅ Excellent test quality - The "proof file" pattern in test_unknown_node_caught_before_any_execution is clever and validates the core contract clearly

✅ Clear documentation - The PR description, code comments, and test docstrings explain the "why" behind the change

✅ Proper separation of concerns - Validation logic lives in WorkflowValidator, CLI just orchestrates

✅ Consistent error formatting - Uses existing format_validation_failure formatter for text output

✅ Dead code removal - Removing _validate_and_handle_workflow_errors and _validate_workflow_structure is the right call

✅ Backwards compatibility - Auto-repair mode continues to work as before

Summary

This is a solid PR that addresses a real usability issue. The critical fixes are straightforward:

Fix registry instantiation (reuse instead of recreate)
Remove unused metrics_collector parameter

The warnings are quality improvements that would prevent future issues. The suggestions are optional optimizations.

After addressing the critical issues, this is ready to merge. Great work on the comprehensive test coverage!

Estimated fix time: 10-15 minutes for critical issues, 30 minutes for warnings.

Risk level: Low - changes are isolated to validation flow, well-tested.

spinje · 2026-01-14T13:45:45Z

Response to Review

Thanks for the thorough review! I've addressed the critical issues.

Fixed ✅

Critical #2: Unused metrics_collector parameter

Removed the parameter from _validate_before_execution() and its call site
Commit: 4fb2657

Critical #3: Bare except Exception

Removed the try-except block entirely
WorkflowValidator.validate() exceptions now propagate naturally, making debugging easier
Commit: 4fb2657

Not Addressed (by design)

Critical #1: Registry instantiation

Valid optimization, but _perform_validation() (used by --validate-only) follows the same pattern
Registry has internal caching (_cached_nodes), so file reads are minimized
Would require broader refactoring to pass registry through the call chain
Recommendation: Address in a separate optimization PR if performance becomes an issue

Warning #4: Test helper function

The invoke_cli() pattern is consistent with test_validate_only.py
Changing it would create inconsistency with existing tests
Both work correctly, so this is a style preference

Warnings #5-9: Low priority, can be addressed in future cleanup PRs

The PR is ready for merge after CI passes.

spinje added 2 commits January 14, 2026 12:00

fix: address review feedback for validation unification [skip review]

4fb2657

- Remove unused metrics_collector parameter from _validate_before_execution() - Remove overly broad except Exception block that was hiding potential bugs (let WorkflowValidator exceptions propagate naturally)

spinje merged commit 243e4dc into main Jan 14, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: unify pre-execution validation with --validate-only (fixes #66)#67

fix: unify pre-execution validation with --validate-only (fixes #66)#67
spinje merged 2 commits into
mainfrom
fix/unify-pre-execution-validation

spinje commented Jan 14, 2026

Uh oh!

claude Bot commented Jan 14, 2026

Uh oh!

spinje commented Jan 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

spinje commented Jan 14, 2026

Summary

Changes

Explanation

Testing

Uh oh!

claude Bot commented Jan 14, 2026

Code Review

Critical — Must Fix Before Merge

1. Registry Instantiation in Validation Path

2. Unused metrics_collector Parameter

Warnings — Should Be Addressed

3. Error Handling Could Be More Specific

4. Test Helper Function Has Overly Complex Setup

5. Missing Test: Warnings Display in Verbose Mode

6. Inconsistent Comment Style

Suggestions — Optional Improvements

7. Consider Extracting JSON Error Output to Helper

8. Test Coverage: Edge Case for Empty Workflows

9. Performance: Could Cache Registry in Context

Positive Observations

Summary

Uh oh!

spinje commented Jan 14, 2026

Response to Review

Fixed ✅

Not Addressed (by design)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

2. Unused `metrics_collector` Parameter