feat: add smart output display for registry run (closes #3)#4
Conversation
Some MCP servers (e.g., klavis-youtube-transcripts) incorrectly return str(dict) instead of json.dumps(dict), producing Python repr format with single quotes that breaks JSON tools like jq. Changes: - Add ast.literal_eval() fallback in _safe_parse_json() when JSON parsing fails for dict/list-like strings - Add type checks to preserve pre-parsed dict/list in content.text - Add dict/list preservation in _extract_unknown_content() and _extract_result() fallback - Add 9 new tests for pre-parsed data and Python repr handling - Add bugfix log entry documenting the fix The fix is backward compatible: JSON is always tried first, and ast.literal_eval() is only used as a fallback for non-compliant servers.
Code Review - PR #4: Smart Output Display for Registry RunSummaryThis PR extends Task 89's structure-only mode with configurable output modes. The implementation is well-structured and follows the project's patterns. I've identified a few critical issues and several suggestions for improvement. ✅ Strengths
🔴 Critical — Must fix before merge1. Missing truncation indicator in string formattingFile: The string truncation logic is inconsistent. When truncating, you add Current: if len(value) > SMART_MAX_STRING_LENGTH:
truncated = value[: SMART_MAX_STRING_LENGTH - 3] + "..."
return f'"{truncated}" (truncated)', TrueIssue: The calculation is off - you're taking 197 chars ( Fix: if len(value) > SMART_MAX_STRING_LENGTH:
# Reserve space for "..." (3) + " (truncated)" (12) = 15 chars
max_content = SMART_MAX_STRING_LENGTH - 15
truncated = value[:max_content] + "..."
return f'"{truncated}" (truncated)', TrueOr if the 200-char limit should apply to the displayed output (not the original value): if len(value) > SMART_MAX_STRING_LENGTH:
truncated = value[:SMART_MAX_STRING_LENGTH - 3] + "..."
return f'"{truncated}" (truncated)', True
return f'"{value}"', False2. Inconsistent truncation tracking for collectionsFile: def _format_collection_smart(value: dict | list) -> tuple[str, bool]:
"""Format dict or list for smart display."""
is_dict = isinstance(value, dict)
count = len(value)
threshold = SMART_MAX_DICT_KEYS if is_dict else SMART_MAX_LIST_ITEMS
summary = f"{{...{count} keys}}" if is_dict else f"[...{count} items]"
if count > threshold:
return summary, False # ❌ Should return True (was truncated/summarized)Issue: When a collection is summarized (e.g., Fix: if count > threshold:
return summary, True # Mark as truncated so hint appears
|
| Category | Score | Notes |
|---|---|---|
| Code Quality | 7/10 | Several bugs in truncation logic |
| Test Coverage | 8/10 | Good coverage, missing edge cases |
| Performance | 9/10 | No major concerns |
| Security | 9/10 | Handles untrusted input safely |
| Documentation | 9/10 | Excellent spec and manual tests |
| Overall | 8/10 | Strong PR with a few critical fixes needed |
Recommended Actions
- Fix truncation calculation (issues docs: Set up Mintlify documentation (Task 93) #1, LLM node swallows exception details; trace shows incorrect success status #6) - ensures 200-char limit is respected
- Fix collection truncation flag (issue feat: unify LLM usage via Simon Willison's llm library (Task 95) #2, docs: separate parallel execution into Task 96 (batch) and Task 39 (fan-out) #9) - ensures hint appears when needed
- Add edge case tests - particularly for
Nonevalues and empty collections - Extract OUTPUT_MODES constant (issue Model suggestions don't work for API keys stored in environment variables or pflow settings #7) - DRY principle
- Improve hint message (issues feat(claude-code): migrate to Claude Agent SDK and streamline parameters #10, feat: add batch processing with sequential and parallel execution #11) - better UX
Once these critical issues are addressed, this PR is ready to merge. Great work on the comprehensive testing and documentation! 🎉
Extends Task 89's structure-only mode with configurable output modes:
- smart (default): Shows template paths WITH values, truncates large values
(>200 chars for strings, >5 items for collections), applies smart filtering
- structure: Original Task 89 behavior - paths only, no values
- full: Shows all paths with full values, no filtering or truncation
Changes:
- Add output_mode field to RegistrySettings with validator
- Add CLI commands: pflow settings registry output-mode [smart|structure|full]
- Implement format_value_for_smart_display() with truncation tracking
- Implement format_smart_paths_with_values() and format_full_paths_with_values()
- Update format_structure_output() to route based on output_mode
- Update CLI and MCP to read setting and pass output_mode
- Add 10 new tests for smart output mode and settings
Output format examples:
- smart: ${stdout} (str) = "hello world"
- smart truncated: ${content} (str) = "first 197 chars..." (truncated)
- smart dict: ${result.data} (dict) = {...5 keys}
- structure: ${stdout} (str)
Code review fixes: - Remove default from _store_output() method signature (warning #4) - Add explicit strip_newline=False for stderr call - Fix type hints: str | bytes with isinstance narrowing (warning #5) - Update docstrings to clarify stdout-only behavior (warning #3) - Add test for binary data + strip_newline interaction (critical #1) - Add test for empty stdout edge case (critical #2) - Update agent instructions to clarify stderr is never modified All code review items addressed except Windows CRLF (documented as Unix-first).
…eview] Review feedback addressed: 1. Identity check pattern fragility (Warning #1): - Changed _coerce_provided_input() to return (value, was_coerced) tuple - Makes intent explicit and prevents subtle bugs from identity assumptions 2. Type alias normalization (Warning #4): - _normalize_type() now preserves case for unknown types - Supports future custom/user-defined types 3. dict/list to string uses Python repr (Review #2): - _coerce_to_string() now uses json.dumps() for containers - Produces valid JSON strings consistent with coerce_to_declared_type() 4. Env/defaults values not coerced (Review #2): - Apply coercion to values from os.environ, settings_env, and defaults - Ensures "respect declared types" works for all input sources 5. None value handling: - Preserve None values (don't coerce to "None" string) - Optional inputs without defaults correctly resolve to None Also: - Added 4 new tests for env/settings coercion - Documented lenient coercion behavior in docstrings - Created Task 120 for future strict input type validation
…49 follow-up) Adversarial verification of the Task 149 refactor found three real regressions the test suite didn't catch (CliRunner intercepts stderr differently from a real subprocess, so logging-based corruption was invisible to the mocked tests). This commit fixes them. Fix #2 — Nested workflow rendering (OutputController partial-line tracking): - OutputController now tracks _partial_line_open so every _handle_node_* path self-terminates any open partial line before writing a fresh line. - _handle_node_complete / _cached / _warning now accept node_id + indent and re-emit a fresh " node_id" lead-in if the partial line was closed between node_start and node_complete (e.g. by a sub-workflow's own progress events or a logger write). - Before: ` nested_call... inner_a... ✓ 0.5s` (parent's partial line concatenated with child's first start, parent's completion orphaned). - After: clean two-level indented rendering, parent re-emitted on its own line as ` nested_call ✓ 1.0s`. Fix #3 — Drop redundant logger.warning("Command failed with exit code N") from shell.py's post() method: - That call wrote directly to stderr via Python logging, bypassing OutputController, and concatenated onto the partial `node_id...` line as `will_fail...WARNING: Command failed with exit code 1\n ✗ Failed`. - The information is redundant — pflow's diagnostic pipeline already surfaces the exact same error via call_completion_callback (which reads shared["exit_code"] and builds the message) and via the post-execution diagnostic block that prints shared["error"]. - Updated test_auto_handling.py's TestLoggingBehavior to assert on the canonical behavior (action="error" + shared["exit_code"] + shared["error"]) instead of the removed logger.warning. Fix #1 — JSON-encode structured outputs in safe_output: - Before: `{'key': 'value', 'n': 42}` (Python repr, single quotes — jq cannot parse). - After: `{"key": "value", "n": 42}` (valid JSON, `pflow foo | jq` works). - Strings still pass through verbatim. Non-JSON-serializable objects fall back to str() so safe_output never raises from the output path. - Updated test_complex_output_types to verify JSON tokens instead of str(value) — Python True/None were pre-existing never-worked cases exposed by the GH #194 routing fix (data now actually lands on stdout where consumers can pipe it). Fix #4 — Real subprocess regression tests: - New tests/test_cli/test_progress_streaming_subprocess.py with two tests that spawn `uv run pflow` in a real subprocess (not CliRunner) so the captured stderr reflects what an agent or CI system actually sees: * test_failing_shell_node_progress_line_is_clean — asserts no `will_fail...WARNING:` or `will_fail...Command failed` corruption. * test_nested_workflow_progress_lines_are_not_concatenated — asserts no `nested_call... inner_a` concatenation + that the parent's completion line is re-emitted attached to its node id. - Guards against both Fix #2 and Fix #3 regressing. Verification: - `make test`: 4641 passed, 9 skipped. - `make check`: ruff + mypy + deptry clean. - Manual subprocess: failing shell, nested workflow, structured outputs, and the original #194 fix all clean.
…kip review] PR #378 review finding #4. Closes the synthetic-builder ↔ production-analyze() fidelity gap in tests/test_core/test_cache_analysis_renderers.py. What landed - _BUILDER_DOCUMENTED_DEFAULTS frozenset (6 entries) names the AnalysisSummary fields the synthetic builder cannot faithfully model. Tests asserting on these MUST drive analyze() end-to-end. - TestMakeAnalysisShapeParity class with two methods: * test_builder_field_set_matches_dataclass_minus_documented_defaults — uses dataclasses.fields() introspection to fail noisily when a new field is added without builder coverage or allowlist documentation. * test_documented_defaults_get_overwritten_by_production — drives analyze() against a contrived IR + trace that triggers each documented-default overwrite, catching the case where production's overwrite logic is deleted while the allowlist stays stale. - Three renderer tests migrated from synthetic to e2e analyze() calls: * test_json_partial_trace_exposes_evidence_scope_and_observed_models * test_json_summary_exposes_projection_exclusions_and_delta_reason * test_render_json_includes_rollup_workflow_paths_and_unavailable_models_by_workflow Reviewer's two non-issues confirmed via inspection - test_text_summary_renders_blocking_errors_categorically (line 272 in PR baseline) only asserts on builder-populated fields. Kept synthetic. - test_json_emits_root_and_sub_workflow_llm_node_counts already drove analyze() end-to-end against the committed 3-deep fixture. No migration needed. Verification - 5 mutation contracts checked by reverting production code: * Add new AnalysisSummary field → parity method 1 fails naming the field. * Delete observed_models_in_trace overwrite → migration #2 + parity method 2 fail with documented diagnostics. * Delete unavailable_models_by_workflow overwrite → migration #4 + parity method 2 fail. - 6,335 tests passing on default suite. - make check clean (ruff + ruff-format + mypy + deptry). - test_golden_baseline_hashes_match (DD#19) green; test_plan_drift.py 34/34. Plan + progress log - Atomic plan at .taskmaster/tasks/task_159/implementation/fix-plans/ renderer-test-fidelity-shape-parity-plan.md. - Consolidated PR #378 review-fix sweep entry appended to implementation-progress-log.md, documenting all four phases (Phase 1 easy bundle / Phase 2 medium bundle / Phase 3 cohort-key correctness / Phase 4 this commit), the five disputed findings (with citations), the GH #380 follow-up filed, and the cross-cutting insights from the sweep. GH issue #380 filed for the deferred test-bloat parametrize-collapse work.
Summary
Extends Task 89's structure-only mode with configurable output modes for
pflow registry run. Users can now see actual values directly in the output, with intelligent truncation for large data.Closes #3
Changes
output_modefield toRegistrySettings(smart|structure|full)pflow settings registry output-mode [mode]to view/set modeformat_value_for_smart_display()- formats values with(truncated)indicatorformat_smart_paths_with_values()- paths with values, showsread-fieldshint when truncatedformat_full_paths_with_values()- all values without truncationformat_structure_output()now routes based onoutput_modeoutput_modeOutput Modes
smartstructurefullExample Output (Smart Mode)
Files Changed
Testing
output_mode="structure"Documentation
scratchpads/smart-output-display/implementation-spec.md- Full specificationscratchpads/smart-output-display/manual-test-plan.md- Manual testing guide