Skip to content

fix: unify pre-execution validation with --validate-only (fixes #66)#67

Merged
spinje merged 2 commits into
mainfrom
fix/unify-pre-execution-validation
Jan 14, 2026
Merged

fix: unify pre-execution validation with --validate-only (fixes #66)#67
spinje merged 2 commits into
mainfrom
fix/unify-pre-execution-validation

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Jan 14, 2026

Summary

Unifies validation behavior between normal workflow execution and --validate-only mode. Previously, normal execution used weaker schema-only validation while --validate-only used full WorkflowValidator. This caused unknown node types to show ugly tracebacks instead of clean error messages.

Changes

  • Add _validate_before_execution() function using full WorkflowValidator with real execution params
  • Remove _validate_workflow_structure() and _validate_and_handle_workflow_errors() (dead code)
  • Remove unused validate_ir import from main.py
  • Add validation call after _prepare_execution_environment() but before execute_workflow()
  • Update test fixtures to patch WorkflowValidator.validate instead of removed validate_ir
  • Add new test file test_validation_before_execution.py with 5 tests

Explanation

The CLI had two different validation paths:

Mode Before After
Normal execution validate_ir() (schema only) WorkflowValidator.validate() (full)
--validate-only WorkflowValidator.validate() (full) WorkflowValidator.validate() (full)
--auto-repair WorkflowValidator.validate() (full) WorkflowValidator.validate() (full)

The key insight was that validation should happen AFTER _prepare_execution_environment() creates enhanced_params, so we can validate with real params instead of dummy placeholders. This provides better template validation than --validate-only mode.

Before (ugly traceback):

ERROR: Node instantiation failed
Traceback (most recent call last):
  ...
pflow.runtime.compiler.CompilationError: compiler: Node type 'nonexistent-node' not found
...
❌ Planning failed: compiler: Node type 'nonexistent-node' not found

After (clean error):

✗ Static validation failed:
  • Unknown node type: 'nonexistent-node'

Testing

  • All 3985 tests pass
  • make check passes (linting, type checking)
  • New test file verifies:
    • Unknown nodes are caught before any node executes
    • Clean error messages (no tracebacks)
    • JSON format works correctly
    • Valid workflows still execute normally
    • Validation is consistent between --validate-only and normal execution

Run make test to verify all tests pass.


🤖 Implemented by Claude

Previously, normal workflow execution used weaker validation (schema-only
via validate_ir) while --validate-only used full WorkflowValidator.
This caused unknown node types to show ugly tracebacks instead of clean
error messages.

Changes:
- Add _validate_before_execution() using full WorkflowValidator with real params
- Remove _validate_workflow_structure() and _validate_and_handle_workflow_errors()
- Remove unused validate_ir import from main.py
- Add validation call after _prepare_execution_environment() in execute_json_workflow()
- Update tests to patch WorkflowValidator.validate instead of removed validate_ir

Benefits:
- Consistent validation across all modes (normal, --validate-only, --auto-repair)
- Clean error messages for unknown node types
- Validation happens BEFORE any nodes execute
- Removed ~65 lines of duplicate validation logic

Fixes #66
- Remove unused metrics_collector parameter from _validate_before_execution()
- Remove overly broad except Exception block that was hiding potential bugs
  (let WorkflowValidator exceptions propagate naturally)
@claude
Copy link
Copy Markdown

claude Bot commented Jan 14, 2026

Code Review

This PR successfully unifies validation behavior between normal execution and --validate-only mode. The implementation is clean, well-tested, and solves a real user experience problem. However, there are a few areas that need attention before merge.


Critical — Must Fix Before Merge

1. Registry Instantiation in Validation Path

Location: src/pflow/cli/main.py:2092

registry = Registry()

Issue: Creating a fresh Registry() instance on every validation call triggers a full rescan of the registry from disk. This is wasteful and could cause performance issues, especially if validation is called multiple times or in loops.

Fix: Follow the existing pattern in _perform_validation() (line 1966) and other parts of the codebase - instantiate the registry once and reuse it. Consider:

  1. Pass registry as a parameter to _validate_before_execution() (preferred - maintains functional purity)
  2. Cache registry in context object
  3. Use a module-level cached registry instance

Recommendation: Add an optional registry parameter with lazy instantiation:

def _validate_before_execution(
    ctx: click.Context,
    ir_data: dict[str, Any],
    execution_params: dict[str, Any],
    output_format: str,
    verbose: bool,
    metrics_collector: Any | None,
    registry: Registry | None = None,  # Allow caller to provide
) -> None:
    ...
    if registry is None:
        registry = Registry()
    ...

Then pass the registry from the caller if available.


2. Unused metrics_collector Parameter

Location: src/pflow/cli/main.py:2073

def _validate_before_execution(
    ...
    metrics_collector: Any | None,  # <-- Never used
) -> None:

Issue: The metrics_collector parameter is documented and passed but never actually used in the function body. The comment at line 2119 even acknowledges this: "Not including metrics since validation fails before execution starts"

Impact:

  • Misleading function signature
  • Future maintainers may try to use it
  • Violates principle of minimal interfaces

Fix: Either:

  1. Remove the parameter entirely (preferred - YAGNI principle)
  2. If you anticipate needing it for validation metrics in the future, add a TODO comment explaining the plan

Recommendation: Remove it. The function calls ctx.exit(1) on validation failure, so metrics would never be collected anyway.

def _validate_before_execution(
    ctx: click.Context,
    ir_data: dict[str, Any],
    execution_params: dict[str, Any],
    output_format: str,
    verbose: bool,
) -> None:

Update the call site at line 2174 accordingly.


Warnings — Should Be Addressed

3. Error Handling Could Be More Specific

Location: src/pflow/cli/main.py:2101-2107

except Exception as e:
    # Handle unexpected validation errors
    if output_format == "json":
        click.echo(json.dumps({"success": False, "error": f"Validation error: {e}"}))
    else:
        click.echo(f"✗ Validation error: {e}", err=True)
    ctx.exit(1)

Issue: Catching all Exceptions is too broad. If WorkflowValidator.validate() has a bug (e.g., KeyError, AttributeError), this will silently convert it to a user-facing validation error instead of exposing the actual bug.

Impact:

  • Makes debugging harder
  • Hides potential bugs in the validator
  • Users see "Validation error: 'nodes'" instead of a proper traceback

Recommendation: Only catch expected exceptions or let unexpected ones bubble up (they'll be caught by Click's error handler):

# Remove the try-except entirely - let WorkflowValidator raise its own errors
errors, warnings = WorkflowValidator.validate(
    workflow_ir=ir_data,
    extracted_params=execution_params,
    registry=registry,
    skip_node_types=False,
)

If you need to handle specific exceptions from the validator, catch those specifically. But a bare Exception catch is almost always wrong.


4. Test Helper Function Has Overly Complex Setup

Location: tests/test_cli/test_validation_before_execution.py:17-66

The invoke_cli() helper manually manages sys.argv, sys.stdout, sys.stderr, and sys.stdin. While functional, this is fragile and duplicates logic.

Issue:

  • Manual stream management is error-prone
  • If an exception occurs before the finally block, state could leak
  • The custom Result class duplicates Click's testing infrastructure

Recommendation: Use Click's built-in CliRunner for testing:

from click.testing import CliRunner
from pflow.cli.main_wrapper import cli_main

def invoke_cli(args: list[str]) -> Any:
    runner = CliRunner(mix_stderr=False)
    return runner.invoke(cli_main, args)

This is more maintainable and follows Click's recommended testing patterns. The CliRunner handles all stream capture and restoration automatically.


5. Missing Test: Warnings Display in Verbose Mode

Location: src/pflow/cli/main.py:2126-2128

# Warnings are non-blocking, just display them in verbose mode
if warnings and output_format != "json" and verbose:
    for warning in warnings:
        click.echo(f"⚠️  {warning}", err=True)

Issue: This code path is not tested. The test file has 5 tests, but none verify that warnings are displayed correctly in verbose mode.

Recommendation: Add a test case:

def test_warnings_displayed_in_verbose_mode(self, tmp_path: Path) -> None:
    """Warnings should be shown in verbose mode."""
    # Create workflow that triggers a warning (e.g., runtime-validated template)
    workflow = {...}
    
    result = invoke_cli(["--verbose", str(workflow_path)])
    
    assert result.exit_code == 0
    assert "⚠️" in result.stderr  # Verify warning is shown

6. Inconsistent Comment Style

Location: src/pflow/cli/main.py:2171-2174

# Validate before execution (if not using auto-repair)
# Auto-repair mode handles validation inside execute_workflow() with repair capability
if not auto_repair:
    _validate_before_execution(ctx, ir_data, enhanced_params, output_format, verbose, metrics_collector)

Issue: The comment is accurate but verbose. The function name _validate_before_execution already conveys intent. The auto-repair explanation is valuable but could be a docstring or more concise.

Recommendation: Simplify:

# Skip validation if auto-repair enabled (executes inside execute_workflow with repair)
if not auto_repair:
    _validate_before_execution(...)

Or move the explanation to _validate_before_execution's docstring since it explains the relationship between validation and repair.


Suggestions — Optional Improvements

7. Consider Extracting JSON Error Output to Helper

Locations: Lines 2103-2107, 2110-2120

Both error paths (exception and validation errors) manually construct JSON error output. Consider extracting this pattern:

def _create_validation_error_json(error_msg: str, validation_errors: list[str] | None = None, 
                                   metadata: dict | None = None, verbose: bool = False) -> str:
    output = {"success": False, "error": error_msg}
    if validation_errors:
        output["validation_errors"] = validation_errors
    if metadata:
        output["metadata"] = metadata
    return json.dumps(output, indent=2 if verbose else None)

This would reduce duplication and ensure consistent JSON error format across the codebase.


8. Test Coverage: Edge Case for Empty Workflows

The tests don't cover edge cases like:

  • Workflows with zero nodes
  • Workflows with nodes but no edges
  • Workflows with circular dependencies (though data flow validation should catch this)

Consider adding negative test cases to ensure validation handles degenerate inputs gracefully.


9. Performance: Could Cache Registry in Context

Since the registry is needed for both validation and execution, consider caching it in ctx.obj after first instantiation:

registry = ctx.obj.get("registry")
if registry is None:
    registry = Registry()
    ctx.obj["registry"] = registry

This would avoid multiple rescans during a single CLI invocation.


Positive Observations

Excellent test quality - The "proof file" pattern in test_unknown_node_caught_before_any_execution is clever and validates the core contract clearly

Clear documentation - The PR description, code comments, and test docstrings explain the "why" behind the change

Proper separation of concerns - Validation logic lives in WorkflowValidator, CLI just orchestrates

Consistent error formatting - Uses existing format_validation_failure formatter for text output

Dead code removal - Removing _validate_and_handle_workflow_errors and _validate_workflow_structure is the right call

Backwards compatibility - Auto-repair mode continues to work as before


Summary

This is a solid PR that addresses a real usability issue. The critical fixes are straightforward:

  1. Fix registry instantiation (reuse instead of recreate)
  2. Remove unused metrics_collector parameter

The warnings are quality improvements that would prevent future issues. The suggestions are optional optimizations.

After addressing the critical issues, this is ready to merge. Great work on the comprehensive test coverage!


Estimated fix time: 10-15 minutes for critical issues, 30 minutes for warnings.

Risk level: Low - changes are isolated to validation flow, well-tested.

@spinje
Copy link
Copy Markdown
Owner Author

spinje commented Jan 14, 2026

Response to Review

Thanks for the thorough review! I've addressed the critical issues.

Fixed ✅

Critical #2: Unused metrics_collector parameter

  • Removed the parameter from _validate_before_execution() and its call site
  • Commit: 4fb2657

Critical #3: Bare except Exception

  • Removed the try-except block entirely
  • WorkflowValidator.validate() exceptions now propagate naturally, making debugging easier
  • Commit: 4fb2657

Not Addressed (by design)

Critical #1: Registry instantiation

  • Valid optimization, but _perform_validation() (used by --validate-only) follows the same pattern
  • Registry has internal caching (_cached_nodes), so file reads are minimized
  • Would require broader refactoring to pass registry through the call chain
  • Recommendation: Address in a separate optimization PR if performance becomes an issue

Warning #4: Test helper function

  • The invoke_cli() pattern is consistent with test_validate_only.py
  • Changing it would create inconsistency with existing tests
  • Both work correctly, so this is a style preference

Warnings #5-9: Low priority, can be addressed in future cleanup PRs


The PR is ready for merge after CI passes.

@spinje spinje merged commit 243e4dc into main Jan 14, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant