fix: allow arrays/dicts to serialize to JSON in string contexts #19
Fixes #13

Shell nodes always output text (stdout is a string). When that text is valid JSON like `["a","b","c"]`, batch processing now auto-parses it to a Python list, matching the behavior already in node_wrapper.py for node parameters.

Changes:
- Add JSON auto-parsing in batch_node.py `prep()` method
- Add type coercion helpers for batch config (defense-in-depth)
- Add 25 new tests (10 for JSON parsing, 15 for type coercion)

The JSON parsing follows the proven pattern from node_wrapper.py:
- Strip whitespace (handles shell output newlines)
- 10MB size limit for security
- Quick check for `[` prefix before parsing
- Graceful fallback if parsing fails
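The parsing pattern above can be sketched roughly as follows. This is a hedged illustration, not the actual batch_node.py code; the function name `maybe_parse_json_list` and constant are assumptions:

```python
import json

# Illustrative sketch of the auto-parsing step; names are assumptions,
# not the actual batch_node.py implementation.
MAX_JSON_SIZE = 10 * 1024 * 1024  # 10MB security limit

def maybe_parse_json_list(text):
    """Parse shell stdout into a list when it looks like JSON, else return it unchanged."""
    if not isinstance(text, str):
        return text
    stripped = text.strip()  # shell output commonly ends with a newline
    if len(stripped) > MAX_JSON_SIZE:
        return text  # refuse to parse oversized payloads
    if not stripped.startswith("["):
        return text  # quick prefix check avoids json.loads on non-list text
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        return text  # graceful fallback: keep the original string
    return parsed if isinstance(parsed, list) else text
```

The quick `[` prefix check keeps the common non-JSON case cheap; the fallback path means malformed JSON never raises into the workflow.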
Fixes #15

Three related issues fixed:
1. Batch item alias (`${item}` or custom via `batch.as`) now recognized
   - Added batch alias extraction to `_extract_node_outputs()`
   - Added batch alias recognition to `validate_data_flow()`
2. Batch outputs (`${node.results}`, `${node.count}`, etc.) now recognized
   - Added `_BATCH_OUTPUTS` constant with all batch output definitions
   - Batch nodes register batch outputs instead of inner node outputs
3. Templates in `batch.items` now extracted for validation
   - Extended `_extract_all_templates()` to scan `node.batch.items`

Added 14 new tests covering batch template validation scenarios.
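Point 2 above can be sketched as follows. The `_BATCH_OUTPUTS` name comes from the commit text, but the entries and the extraction function are illustrative assumptions:

```python
# Hedged sketch of batch-output registration; the constant name follows the
# commit text, the entries and node shape are assumptions.
_BATCH_OUTPUTS = {
    "results": "list",  # per-item results collected by the batch wrapper
    "count": "int",     # number of items processed
}

def extract_node_outputs(node):
    """Batch nodes register batch outputs instead of the inner node's outputs."""
    if node.get("batch"):
        return {f"{node['id']}.{name}": typ for name, typ in _BATCH_OUTPUTS.items()}
    return {f"{node['id']}.{name}": typ
            for name, typ in node.get("outputs", {}).items()}
```

This mirrors the lesson noted later in this thread: a runtime wrapper that changes output structure must be matched in compile-time validation.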
Clarified comments in the cli-agent-instructions.md file to enhance understanding of shell command usage. Updated the example command to demonstrate the use of pflow variables directly, ensuring they resolve before the shell execution. This change aims to prevent common mistakes and improve the clarity of instructions for users.
Documents lesson learned from batch validation bug fix (GitHub #15): When adding runtime wrappers that change node output structure, compile-time validation systems must also be updated.
Introduces a new JSON file to demonstrate the current workaround for passing arrays via stdin in shell commands, addressing the limitation of embedding arrays directly in command strings. Additionally, adds a design issue report outlining the challenges faced when combining multiple arrays in workflows, along with reproduction steps and potential solutions. Includes minimal reproduction and real-world problem examples to illustrate the issue.
Fixes #18

- Update type checker to allow dict/list/array → str compatibility
- Add shell-unsafe JSON warning for apostrophes, backticks, $(...)
- Document limitation in template-variables.md
- Update tests for new behavior
- Fix pre-existing mypy error in template_validator.py

Behavior:
- Complex templates like `"echo '${items}'"` auto-serialize arrays to JSON
- Simple templates like `"${items}"` still blocked at runtime (preserves guardrail)
- Warning emitted if JSON contains shell-unsafe characters
- Fix ReDoS vulnerability in `_warn_shell_unsafe_json()` by limiting regex input to 500 chars and removing the DOTALL flag
- Clarify bidirectional type compatibility in type_checker.py comments
- Improve shell warning message clarity (mention multiple issues may exist)
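The mitigation pattern described above can be sketched like this. The pattern and helper name are assumptions modeled on the commit text, not the project's actual code:

```python
import re

# Hedged sketch of the ReDoS mitigation: cap the input the regex sees so a
# pathological payload cannot blow up matching time, and avoid DOTALL.
_SHELL_UNSAFE = re.compile(r"'|`|\$\(")  # apostrophe, backtick, $( substitution

def warn_shell_unsafe_json(json_text):
    """Return a warning if the (truncated) JSON contains shell-unsafe characters."""
    sample = json_text[:500]  # bound regex work to the first 500 chars
    if _SHELL_UNSAFE.search(sample):
        return ("JSON value contains shell-unsafe characters "
                "(apostrophes, backticks, or $(...)); multiple issues may exist")
    return None
```

Truncating before matching bounds worst-case regex time regardless of payload size, which is the standard defense when the pattern itself cannot be proven backtracking-safe.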
Code Review for PR #19: Array/Dict to JSON String Serialization

Summary

This PR enables arrays and dictionaries to be embedded in string parameters by auto-serializing them to JSON. This solves a real problem where shell nodes output JSON strings that need to flow into batch processing or be embedded in commands. The implementation is well-structured with proper guardrails, but some areas need attention before merge.

✅ Strengths
Review Fixes Applied

Thanks for the thorough review! I've addressed the critical issues in commit

✅ Fixed

#1 ReDoS Vulnerability (Critical)
#2 Type Checker Comment Clarity (Critical)
#5 Warning Message Clarity (Warning)
❌ Not Addressed (Pre-existing, out of scope)

#3 Batch Node 10MB Limit
#4 Batch Validation Inconsistency
Answers to Questions
In the webpage-to-markdown workflow, batch items are image descriptions - typically a few KB each. 10MB would be extreme. However, since this limit is pre-existing, I'd prefer to address it in a separate PR if we want to change it.
Yes, documented in pitfalls.md (lines 360+). It's a fundamental limitation of static validation vs runtime wrappers.
Good suggestion for follow-up. The shell.py file is getting large (760 lines).
…kip review]

PR #378 review finding #4. Closes the synthetic-builder ↔ production-analyze() fidelity gap in tests/test_core/test_cache_analysis_renderers.py.

What landed
- `_BUILDER_DOCUMENTED_DEFAULTS` frozenset (6 entries) names the AnalysisSummary fields the synthetic builder cannot faithfully model. Tests asserting on these MUST drive analyze() end-to-end.
- `TestMakeAnalysisShapeParity` class with two methods:
  * test_builder_field_set_matches_dataclass_minus_documented_defaults — uses dataclasses.fields() introspection to fail noisily when a new field is added without builder coverage or allowlist documentation.
  * test_documented_defaults_get_overwritten_by_production — drives analyze() against a contrived IR + trace that triggers each documented-default overwrite, catching the case where production's overwrite logic is deleted while the allowlist stays stale.
- Three renderer tests migrated from synthetic to e2e analyze() calls:
  * test_json_partial_trace_exposes_evidence_scope_and_observed_models
  * test_json_summary_exposes_projection_exclusions_and_delta_reason
  * test_render_json_includes_rollup_workflow_paths_and_unavailable_models_by_workflow

Reviewer's two non-issues confirmed via inspection
- test_text_summary_renders_blocking_errors_categorically (line 272 in PR baseline) only asserts on builder-populated fields. Kept synthetic.
- test_json_emits_root_and_sub_workflow_llm_node_counts already drove analyze() end-to-end against the committed 3-deep fixture. No migration needed.

Verification
- 5 mutation contracts checked by reverting production code:
  * Add new AnalysisSummary field → parity method 1 fails naming the field.
  * Delete observed_models_in_trace overwrite → migration #2 + parity method 2 fail with documented diagnostics.
  * Delete unavailable_models_by_workflow overwrite → migration #4 + parity method 2 fail.
- 6,335 tests passing on default suite.
- make check clean (ruff + ruff-format + mypy + deptry).
- test_golden_baseline_hashes_match (DD#19) green; test_plan_drift.py 34/34.

Plan + progress log
- Atomic plan at .taskmaster/tasks/task_159/implementation/fix-plans/renderer-test-fidelity-shape-parity-plan.md.
- Consolidated PR #378 review-fix sweep entry appended to implementation-progress-log.md, documenting all four phases (Phase 1 easy bundle / Phase 2 medium bundle / Phase 3 cohort-key correctness / Phase 4 this commit), the five disputed findings (with citations), the GH #380 follow-up filed, and the cross-cutting insights from the sweep.

GH issue #380 filed for the deferred test-bloat parametrize-collapse work.
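The shape-parity pattern from the commit above can be sketched as follows. The dataclass, its fields, and the allowlist contents here are invented stand-ins for AnalysisSummary, not the real definitions:

```python
import dataclasses

# Hedged sketch of the parity check; the dataclass and allowlist are
# illustrative stand-ins, not the actual AnalysisSummary.
@dataclasses.dataclass
class AnalysisSummary:
    total_cost: float = 0.0
    observed_models: list = dataclasses.field(default_factory=list)

# Fields the synthetic builder cannot faithfully model (must be e2e-tested).
BUILDER_DOCUMENTED_DEFAULTS = frozenset({"observed_models"})
# Fields the synthetic builder does populate.
BUILDER_FIELDS = {"total_cost"}

def check_shape_parity():
    """Fail noisily if a dataclass field is neither built nor allowlisted."""
    all_fields = {f.name for f in dataclasses.fields(AnalysisSummary)}
    uncovered = all_fields - BUILDER_FIELDS - BUILDER_DOCUMENTED_DEFAULTS
    assert not uncovered, (
        f"new fields need builder coverage or allowlist documentation: {uncovered}")
```

Because the check introspects the dataclass itself, adding a new field without updating either set fails with the field's name, which is the "fail noisily" property the commit describes.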
…le [skip review]
Adds `.taskmaster/tasks/task_159/baseline/` as a byte-stable regression
oracle for the upcoming Task 160 cache_analysis refactor. Per-case folder:
workflow.pflow.md + command.sh + README.md (mutation contract) +
expected-{stdout,stderr,exit-code}.txt. Harness wraps each case in
`HOME=$BASELINE_HOME` redirect for filesystem isolation; normalize.py
redacts non-deterministic content (timestamps, paths, hashes, sub-cent
costs) without touching warning IDs / severity / JSON ordering.
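The redaction step above can be sketched roughly like this. The exact patterns and placeholder tokens are assumptions, not the committed normalize.py:

```python
import re

# Hedged sketch of baseline normalization; patterns and placeholders are
# illustrative, not the committed script's.
REDACTIONS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\S*"), "<TIMESTAMP>"),
    (re.compile(r"/home/\S+"), "<PATH>"),
    (re.compile(r"\b[0-9a-f]{40}\b"), "<HASH>"),
    (re.compile(r"\$0\.00\d+"), "<SUBCENT>"),
]

def normalize(text):
    """Replace non-deterministic content; warning IDs, severity, and JSON
    ordering pass through untouched because no pattern matches them."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

Redacting by substitution rather than deletion keeps line counts stable, so `verify.sh`-style diffs stay byte-comparable across runs.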
Surfaces (63 cases total):
01 parser errors 10
02 validator errors 8
03 analyze-cache modes 8
04 warning catalog (one per ID) 20 (15/20 trigger target ID; 5 in F-02)
05 advisory cases 5
12 real-world lyrics-generator 5 (committed snapshot of music workflow tree)
13 happy-path interactions 4 (batch+cache+prewarm, sub-workflow with cache, 3-level nesting)
14 Pitfall #19 defenses 3 (dotted-path through NamespacedSharedStore, file-resolve boundary)
`./regenerate.sh` writes expected-* files; `./verify.sh` re-runs and
diffs (drift = stop signal). Exit 0 = clean. Trailing-newline + per-line
trailing-whitespace handling aligned with pre-commit's
trailing-whitespace + end-of-file-fixer hooks so `make check` stays clean.
Findings logged in `FINDINGS.md`:
- F-01: parser silently splits two-vars-on-one-line (spec promises error)
- F-02: 5 catalog warning IDs need more elaborate fixtures than minimal
- F-03: `pflow guide <workflow>` auto-detect misses `caching` topic
- F-04: `cache.below-min-tokens` false-positive on greenfield when chunks
resolve to LLM responses
- F-05: `pflow visualize` validates before rendering — blocks on unknown
node types in unrelated sub-workflows
Audit handoff: `NEXT-AGENT-AUDIT.md` (mission, priorities, 7 worked
examples) + `starting-context/braindump-2026-05-08-baseline-audit-handoff.md`
(tacit knowledge transfer).
Out of scope per user direction: trace format 2.0.0 backcompat
(2.0.0 fixture removed); MCP server cases; surfaces 06-11 (deferred).
`pflow analyze-cache --from-trace` previously rendered three delta lines
(Actual savings + First-run + Rerun) in trace mode. The first-run-with-
cache projection assumes a fresh run with no memo cache hits and no
provider implicit caching — neither modeled. When those fire (the common
case), the projection diverges from actual cost by an order of magnitude
(lyrics-generator: projection said "saves 1%"; actual was 49%) and the
two competing numbers anchor agents on the smaller, misleading figure.
Option B: drop the first-run delta line in trace mode entirely. Show only
the measured number plus the steady-state forward-looking projection.
- `_render_summary_deltas` split into 4-line dispatch + two named helpers
`_render_trace_deltas` and `_render_greenfield_deltas`. Dispatch on
`evidence_scope ∈ {"complete_trace", "truncated_trace_executed_subset"}`
rather than `actually_paid_usd is not None` to handle the all-unpriced-
trace edge case correctly.
- Trace mode renders `Actual savings (this run):` (or `unavailable
(projection excludes …)`) + `Rerun delta (projected):`. The
`(projected)` suffix on rerun signals "model, not measurement" right
at the line.
- Greenfield mode unchanged shape: `First-run delta:` + `Rerun delta:`,
no `(projected)` suffix (both are projections by construction; the
absence of an actual-savings line carries the signal).
- The truncated-mode `(executed)` qualifier on rerun was retired —
the suppression note already conveys executed-subset context.
- `_format_delta` actual-delta label simplified `"actual vs no-cache"`
→ `"vs no-cache"` (the row label "Actual savings (this run):" already
says "actual"; doubled word was a Stage A artifact).
- Dead branch `if not in_trace_mode and actual_line:` removed
(production-unreachable per searcher #2; only synthetic fixtures hit it).
- `_make_analysis` test builder upgraded to set `evidence_scope` based
on `actually_paid` (matches production shape; closes a Pitfall #19
gap that searcher #2 specifically warned about under the new dispatch).
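The dispatch described above can be sketched as follows. Helper names follow the commit text (underscore prefixes dropped), but the bodies, summary shape, and dollar formatting are illustrative assumptions:

```python
# Hedged sketch of the Option B dispatch; helper names follow the commit
# text, bodies and the summary dict shape are assumptions.
TRACE_SCOPES = {"complete_trace", "truncated_trace_executed_subset"}

def render_summary_deltas(summary):
    """Dispatch on evidence scope, not on whether an actual cost exists,
    so the all-unpriced-trace edge case still takes the trace path."""
    if summary["evidence_scope"] in TRACE_SCOPES:
        return render_trace_deltas(summary)
    return render_greenfield_deltas(summary)

def render_trace_deltas(summary):
    actual = summary.get("actually_paid_usd")
    actual_line = (f"Actual savings (this run): ${actual:.2f} vs no-cache"
                   if actual is not None
                   else "Actual savings (this run): unavailable (projection excludes …)")
    # "(projected)" marks the rerun number as a model, not a measurement.
    return [actual_line,
            f"Rerun delta (projected): ${summary['rerun_delta_usd']:.2f}"]

def render_greenfield_deltas(summary):
    # Both lines are projections by construction, so no "(projected)" suffix.
    return [f"First-run delta: ${summary['first_run_delta_usd']:.2f}",
            f"Rerun delta: ${summary['rerun_delta_usd']:.2f}"]
```

The key property is that a trace whose calls are all unpriced (`actually_paid_usd is None`) still renders trace-mode output instead of falling back to greenfield projections.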
JSON shape unchanged — all three deltas still on `AnalysisSummary`.
Tests: 2 new regression tests with verified mutation contracts; 1
existing test renamed and rewritten for Option B truncated semantics;
1 new mutation guard on the `vs no-cache` label simplification. Three
mutation contracts verified by reverting production code and observing
the matching test fail.
6,396 default-suite tests pass. 65/65 baselines pass (no drift — the
parallel L-12/L-2/L-1 commit had already regenerated the 3 affected
baselines anticipating Option B's final shape). Manual smoke confirms
trace mode shows actual + rerun (projected) only; greenfield shows
first-run + rerun.
Closes L-3 from BASELINE-AUDIT Section F.
Five deferred BASELINE-AUDIT findings shipped together as the post-merge-block polish round. Single-render-format changes; no architectural shift. Bundled with the prior in-flight B-9 blocking-errors split work in this commit.

- B-6: drop "(ordered by impact)" qualifier when no action has positive savings (render_text.py).
- B-11: heterogeneous suffix uses "; plus " separator instead of bare "+" so it doesn't parse as "model X PLUS model Y" (render_text.py).
- B-8 + L-7: per-call tokens/cacheable columns widened from :>5 to :>7 (thousands separator + 6-digit alignment) (render_text.py).
- L-4: collect skipped sub-workflow paths and emit ONE aggregated note via new _format_skipped_workflows_note helper. Lyrics-generator collapses 15 near-identical "predicted-key matching skipped" notes (~4KB) to one summary line (analyze.py).

Tests: 12 new tests including a production-shape integration test (test_predict_cache_keys_aggregates_skip_notes_via_production_path) that drives _predict_cache_keys end-to-end — caught a Pitfall #19 where helper-only tests passed against reverted production code. Each fix has a verified mutation contract.

Doc fix: docs/reference/cli/analyze-cache.mdx now reflects the conditional "ordered by impact" claim.

Verification: make check clean (ruff + ruff-format + mypy + deptry); make test 6,415 passed, 1 skipped (+11 from baseline); 65/65 baseline cases pass after re-capture of 46 affected cases. Strict-improvement audit confirms all baseline drifts fall into expected classes.

Closes B-6, B-8, B-9, B-11, L-4, L-7 from BASELINE-AUDIT.md.
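The L-4 aggregation can be sketched as follows. The helper name comes from the commit text; the message format is an assumption:

```python
# Hedged sketch of the L-4 skip-note aggregation; helper name per the commit
# text (underscore prefix dropped), message wording is an assumption.
def format_skipped_workflows_note(skipped_paths):
    """Collapse per-workflow skip notes into one aggregated summary line."""
    if not skipped_paths:
        return None
    unique = sorted(set(skipped_paths))  # dedupe repeated sub-workflow paths
    return (f"Note: predicted-key matching skipped for {len(unique)} "
            f"sub-workflow(s): {', '.join(unique)}")
```

Collecting paths first and rendering once is what turns 15 near-identical notes into a single line in the lyrics-generator output.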
…licates
Restructured analyze-cache header for AI-agent readability. The 39-word
run-on scale line splits into 3 cohesive lines (`Workflow:` count +
invocation status; `Models:` for 2+ models; `Heterogeneous:` for batch
sub-workflows declaring `model: ${item.model}`). Single-model workflows
keep `using X` inline (pragmatic density).
The duplicate `Observed models:` line is dropped — `models_in_use` is
a superset of `observed_models_in_trace` in complete trace mode. The
sub-workflow breakdown line drops its trailing 15-name CSV; names
already appear in `## Per-call cache report` headings and
`## Per-child analyze-cache commands`.
Latent bug fixed as bonus: static-mode workflows on the no-model-resolved
branch previously suppressed the heterogeneous suffix. The static-mode
lyrics-generator capture now correctly surfaces `Heterogeneous:
generate-chorus-options (model varies per batch item)`.
`_format_heterogeneous_line` collapses the old 1-vs-N branches via
plain `', '.join`; `_format_heterogeneous_suffix` deleted as dead code.
Files: render_text.py (+50/-65 LOC, net -15 simpler), test_cache_analysis_renderers.py
(3 helper-direct tests migrated to end-to-end render assertions per
Pitfall #19; 2 N-9 CSV literal updates; 1 multi-model test rewritten;
1 new N-2 regression test). 4 mutation contracts verified. 5 baselines
regenerated as strict-improvement diffs.
6,430 tests pass (+4). `make check` clean. 65/65 baselines pass.
Closes N-2 + N-3 + N-9 from POLISH-PLAN.md Cluster D.
Summary
Fixes #18 - Arrays and dicts can now be embedded in shell command strings (and other string parameters) by auto-serializing to JSON.
Changes
- Add `str`/`string` to compatible types for `dict`, `list`, `array`, `object`
- New `_warn_shell_unsafe_json()` method warns when JSON contains characters that could break shell parsing (`'`, `` ` ``, `$()`)
- Update `architecture/reference/template-variables.md` with new type matrix and "Shell Command Limitations" section

Behavior
"echo '${items}'"["a", "b"]echo '["a", "b"]'"${items}"["a", "b"]"echo '${data}'"{"msg": "it's"}Guardrails Preserved
- Simple templates (`${items}` alone) → runtime check still blocks dict/list→str
- stdin workaround for passing arrays still works

File Changes
Testing
Run
make testto verify all tests pass (3612 passed).Run
make checkto verify linting and type checks pass.
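The behavior table above can be sketched roughly like this. `render_template` and its rules are illustrative assumptions, not the project's actual template engine:

```python
import json
import re

# Hedged sketch of the serialization rules; names and regex are assumptions.
def render_template(template, values):
    """JSON-serialize dict/list values only when embedded in a larger string."""
    match = re.fullmatch(r"\$\{(\w+)\}", template)
    if match:  # simple template: the whole string is one variable reference
        value = values[match.group(1)]
        if isinstance(value, (dict, list)):
            raise TypeError("dict/list -> str blocked for simple templates")
        return str(value)

    def substitute(m):  # complex template: auto-serialize complex values
        value = values[m.group(1)]
        return json.dumps(value) if isinstance(value, (dict, list)) else str(value)

    return re.sub(r"\$\{(\w+)\}", substitute, template)
```

The asymmetry is the guardrail: a bare `${items}` almost always signals a wiring mistake and stays blocked, while an embedded `${items}` inside a command string gets the JSON form the shell needs.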