Skip to content

fix: allow arrays/dicts to serialize to JSON in string contexts#19

Merged
spinje merged 9 commits into
mainfrom
fix/template-variable-mismatch
Dec 29, 2025
Merged

fix: allow arrays/dicts to serialize to JSON in string contexts#19
spinje merged 9 commits into
mainfrom
fix/template-variable-mismatch

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Dec 29, 2025

Summary

Fixes #18 - Arrays and dicts can now be embedded in shell command strings (and other string parameters) by auto-serializing to JSON.

Changes

  • Type checker update: Added str, string to compatible types for dict, list, array, object
  • Shell-unsafe JSON warning: New _warn_shell_unsafe_json() method warns when JSON contains characters that could break shell parsing (', `, $()
  • Documentation: Updated architecture/reference/template-variables.md with new type matrix and "Shell Command Limitations" section
  • Tests updated: Modified tests that expected dict→str to fail, added new tests for list→str

Behavior

Template Data Result
"echo '${items}'" ["a", "b"] ✅ Works → echo '["a", "b"]'
"${items}" ["a", "b"] ❌ Blocked (simple template, runtime check preserved)
"echo '${data}'" {"msg": "it's"} ⚠️ Warning about shell-unsafe chars

Guardrails Preserved

  1. Simple templates (${items} alone) → runtime check still blocks dict/list→str
  2. Shell-unsafe JSON → warning emitted, suggests using stdin
  3. Type mismatches for non-string targets (dict→int) → still caught

File Changes

 architecture/reference/template-variables.md       | 62 ++++++++++-
 src/pflow/nodes/shell/shell.py                     | 44 ++++++++
 src/pflow/runtime/template_validator.py            |  2 +-
 src/pflow/runtime/type_checker.py                  |  8 +-
 tests/test_runtime/test_template_validator_types.py| 73 ++++++++++---
 tests/test_runtime/test_type_checker.py            | 38 ++++---
 6 files changed, 190 insertions(+), 37 deletions(-)

Testing

Run make test to verify all tests pass (3612 passed).
Run make check to verify linting and type checks pass.

Fixes #13

Shell nodes always output text (stdout is string). When that text is
valid JSON like `["a","b","c"]`, batch processing now auto-parses it
to a Python list, matching the behavior already in node_wrapper.py
for node parameters.

Changes:
- Add JSON auto-parsing in batch_node.py prep() method
- Add type coercion helpers for batch config (defense-in-depth)
- Add 25 new tests (10 for JSON parsing, 15 for type coercion)

The JSON parsing follows the proven pattern from node_wrapper.py:
- Strip whitespace (handles shell output newlines)
- 10MB size limit for security
- Quick check for '[' prefix before parsing
- Graceful fallback if parsing fails
Fixes #15

Three related issues fixed:

1. Batch item alias (`${item}` or custom via `batch.as`) now recognized
   - Added batch alias extraction to `_extract_node_outputs()`
   - Added batch alias recognition to `validate_data_flow()`

2. Batch outputs (`${node.results}`, `${node.count}`, etc.) now recognized
   - Added `_BATCH_OUTPUTS` constant with all batch output definitions
   - Batch nodes register batch outputs instead of inner node outputs

3. Templates in `batch.items` now extracted for validation
   - Extended `_extract_all_templates()` to scan `node.batch.items`

Added 14 new tests covering batch template validation scenarios.
Clarified comments in the cli-agent-instructions.md file to enhance understanding of shell command usage. Updated the example command to demonstrate the use of pflow variables directly, ensuring they resolve before the shell execution. This change aims to prevent common mistakes and improve the clarity of instructions for users.
Documents lesson learned from batch validation bug fix (GitHub #15):
When adding runtime wrappers that change node output structure,
compile-time validation systems must also be updated.
Introduces a new JSON file to demonstrate the current workaround for passing arrays via stdin in shell commands, addressing the limitation of embedding arrays directly in command strings. Additionally, adds a design issue report outlining the challenges faced when combining multiple arrays in workflows, along with reproduction steps and potential solutions. Includes minimal reproduction and real-world problem examples to illustrate the issue.
Fixes #18

- Update type checker to allow dict/list/array → str compatibility
- Add shell-unsafe JSON warning for apostrophes, backticks, $(...)
- Document limitation in template-variables.md
- Update tests for new behavior
- Fix pre-existing mypy error in template_validator.py

Behavior:
- Complex templates like "echo '${items}'" auto-serialize arrays to JSON
- Simple templates like "${items}" still blocked at runtime (preserves guardrail)
- Warning emitted if JSON contains shell-unsafe characters
- Fix ReDoS vulnerability in _warn_shell_unsafe_json() by limiting regex
  to 500 chars and removing DOTALL flag
- Clarify bidirectional type compatibility in type_checker.py comments
- Improve shell warning message clarity (mention multiple issues may exist)
@claude
Copy link
Copy Markdown

claude Bot commented Dec 29, 2025

Code Review for PR #19: Array/Dict to JSON String Serialization

Summary

This PR enables arrays and dictionaries to be embedded in string parameters by auto-serializing them to JSON. This solves a real problem where shell nodes output JSON strings that need to flow into batch processing or be embedded in commands. The implementation is well-structured with proper guardrails, but there are some areas that need attention before merge.


Strengths

  1. Solves a Real Problem: The webpage-to-markdown example clearly demonstrates the value - shell output is always strings, and needing to parse JSON is a common pattern.

  2. Pattern Reuse: JSON auto-parsing follows the proven pattern from node_wrapper.py (lines 746-781), ensuring consistency across the codebase.

  3. Comprehensive Documentation: The updates to template-variables.md clearly explain the shell command limitations and provide safe alternatives (using stdin).

  4. Safety Guardrails Preserved:

    • Simple templates (${items} alone) still block dict/list→str at runtime
    • Shell-unsafe JSON triggers warnings
    • Type mismatches for non-string targets still caught
  5. Thorough Testing: 25 new tests for batch processing, updated type checker tests - solid coverage.

  6. Excellent Bug Documentation: The bugfix-log entry is exemplary - clear repro steps, root cause analysis, and lessons learned.


⚠️ Critical Issues - Must Fix Before Merge

1. Shell Injection Risk in _warn_shell_unsafe_json() ⚠️ SECURITY

Location: src/pflow/nodes/shell/shell.py:306-345

Issue: The regex pattern r"\[.*?\]|\{.*?\}" with re.DOTALL can match across thousands of lines, causing:

  • ReDoS (Regular Expression Denial of Service) on malicious input
  • False positives matching unrelated brackets in long commands

Current Code:

json_pattern = r"\[.*?\]|\{.*?\}"
json_matches = re.findall(json_pattern, command, re.DOTALL)

Fix: Limit the pattern to single lines and add size constraints:

# Only match JSON-like patterns on a single line (no DOTALL)
# Limit to reasonable JSON snippet size (500 chars)
json_pattern = r"\[.{0,500}?\]|\{.{0,500}?\}"
json_matches = re.findall(json_pattern, command)

Why This Matters: Without this fix, a malicious workflow with a very long command could hang the validator.


2. Type Checker Matrix Inconsistency

Location: src/pflow/runtime/type_checker.py:30-58

Issue: The type compatibility matrix allows dict → str and list → str, but the comment says "dict can serialize to JSON string" when it should say "dict/list auto-parse FROM JSON strings when coming FROM str".

Current Comment (line 11):

# Note on str → dict/list compatibility (auto-parse feature):

The matrix (lines 54-57):

"dict": ["any", "dict", "object", "str", "string"],  # dict can serialize to JSON string
"list": ["any", "list", "array", "str", "string"],   # list can serialize to JSON string

Issue: This creates confusion - the matrix allows BOTH directions:

  • str → dict/list (auto-parse JSON strings) ✅ Correct
  • dict/list → str (serialize to JSON) ✅ Also correct now

But the comments don't clarify this is bidirectional. This needs clearer documentation.

Fix: Update the comments to be explicit:

"dict": ["any", "dict", "object", "str", "string"],  # dict ↔ str bidirectional (serialize/parse JSON)
"list": ["any", "list", "array", "str", "string"],   # list ↔ str bidirectional (serialize/parse JSON)

3. Batch Node JSON Parsing Size Limit Too High

Location: src/pflow/runtime/batch_node.py:247

Issue: 10MB limit for JSON parsing is very generous and could cause memory issues or slow parsing on malicious input.

Current Code:

MAX_JSON_SIZE = 10 * 1024 * 1024  # 10MB limit

Recommendation: Consider 1MB (1024 * 1024) instead. Batch items with 10MB of JSON are likely doing something wrong. If there's a legitimate use case for 10MB arrays, document it in a comment.

Suggested Change:

# Reasonable limit for batch items JSON
# Larger arrays should be processed from files, not inline JSON
MAX_JSON_SIZE = 1 * 1024 * 1024  # 1MB limit

⚠️ Warnings - Should Be Addressed

4. Inconsistent Validation Between Template Validator and Runtime

Location: Multiple files

Issue: The PR adds runtime JSON auto-parsing but the template_validator.py static validation doesn't know about batch wrappers. This was documented in the pitfalls file, but it's not fixed.

Evidence: From .taskmaster/knowledge/pitfalls.md:360+:

Template validation happens at compile-time using static analysis. It has no knowledge of runtime wrappers that transform output structures.

Impact: Valid batch workflows might fail validation with confusing errors like:

  • ${item} reported as "undefined input"
  • ${node.results} fails with suggestion to use ${node.response}

Status: Documented in pitfalls but not fixed in this PR.

Recommendation: Either:

  1. Fix the validation in this PR (add batch-aware output detection to template_validator.py)
  2. File a follow-up issue and link it in the PR description

5. Shell Node Warning Is Too Noisy

Location: src/pflow/nodes/shell/shell.py:336-345

Issue: The warning triggers on the FIRST unsafe char and returns immediately. This is good for avoiding spam, but the warning message doesn't indicate this is just one of potentially many issues.

Current Code:

logger.warning(
    f"Command contains JSON with {name} ('{char}') which may break shell parsing. "
    f"Consider using 'stdin' parameter instead for reliable data passing.",
    ...
)
return  # Only warn once per command

Suggestion: Make it clear this is a general check:

logger.warning(
    f"Command contains JSON with shell-unsafe characters (found: {name} '{char}'). "
    f"This may break shell parsing. Consider using 'stdin' parameter instead.",
    ...
)
return  # Only warn once per command (multiple issues may exist)

6. Test Naming Inconsistency

Location: tests/test_runtime/test_batch_node.py

Issue: New test classes use different naming patterns:

  • TestItemsJsonAutoParsing (10 tests) - Clear
  • TestConfigTypeCoercion (15 tests) - Clear

But existing tests use:

  • TestBatchExecution
  • TestInputOutputFormats

Minor Issue: Not critical, but consistency helps navigation. Consider prefixing all test classes with TestBatch for easy filtering.


7. Boolean Coercion Edge Case

Location: src/pflow/runtime/batch_node.py:144-157

Issue: The boolean coercion handles "yes"/"no" but not "y"/"n" or "1"/"0" (numeric). This might surprise users.

Current Code:

if lower in ("true", "1", "yes"):
    return True
if lower in ("false", "0", "no", ""):
    return False

Observation: This is actually fine - it handles the common cases. Just noting it for future reference if users request "y"/"n" support.


💡 Suggestions - Optional Improvements

8. Consider Extracting Shell Safety Checks to Utility Module

Location: src/pflow/nodes/shell/shell.py

Observation: The shell node now has THREE safety check methods:

  • _check_command_template_safety() (lines 257-305)
  • _warn_shell_unsafe_json() (lines 306-345)
  • _adapt_stdin_to_string() (lines 347-421)

Suggestion: These could be extracted to src/pflow/nodes/shell/safety_utils.py for:

  • Better testability (unit test the safety checks in isolation)
  • Reusability if other nodes need similar checks
  • Cleaner shell.py file (currently 760 lines)

Not Blocking: This can be a follow-up refactor.


9. Add Integration Test for Real-World Workflow

Location: tests/test_integration/

Suggestion: The webpage-to-markdown-parallel.json workflow is a great real-world example. Consider adding an integration test that:

  1. Mocks the HTTP fetch to return known HTML
  2. Validates the shell→batch→LLM flow works
  3. Verifies JSON array parsing happens correctly

Value: Ensures the real use case that motivated this PR doesn't regress.


10. Document the Bidirectional Nature in Architecture Docs

Location: architecture/reference/template-variables.md

Observation: The doc update explains shell command limitations well, but doesn't explicitly call out that the type system is now bidirectional for dict/list ↔ str.

Suggested Addition (after line 602):

#### Bidirectional Type Compatibility

The type system now supports bidirectional conversion between structured types and strings:

- **dict/list → str**: Automatically serializes to JSON (e.g., for embedding in commands)
- **str → dict/list**: Automatically parses JSON (e.g., shell output into batch items)

Both directions are validated at compile-time and executed at runtime.

📊 Test Quality Assessment

Test Coverage: ✅ Excellent

  • 25 new batch processing tests
  • Updated type checker tests to reflect bidirectional compatibility
  • Edge cases covered (None values, empty arrays, large JSON)

Test Clarity: ✅ Good

  • Test names clearly describe what they verify
  • Good mix of positive and negative cases

Potential Gaps:

  1. No test for the ReDoS scenario in _warn_shell_unsafe_json() (see Critical Issue docs: Set up Mintlify documentation (Task 93) #1)
  2. No test verifying the 10MB limit in batch JSON parsing actually works
  3. No integration test for the real-world webpage-to-markdown workflow

🎯 Summary & Recommendation

Overall Assessment: This is a well-thought-out feature with good documentation and testing. The core design is sound, but there are security and clarity issues that must be addressed.

Before Merge:

  1. Must Fix: ReDoS vulnerability in _warn_shell_unsafe_json() (Critical docs: Set up Mintlify documentation (Task 93) #1)
  2. Must Fix: Clarify bidirectional nature in type_checker.py comments (Critical feat: unify LLM usage via Simon Willison's llm library (Task 95) #2)
  3. ⚠️ Should Fix: Decide on batch JSON size limit (10MB vs 1MB) and document reasoning (Critical feat: Smart output display for registry run #3)
  4. ⚠️ Should Fix: Either fix batch validation or file follow-up issue (Warning feat: add smart output display for registry run (closes #3) #4)

After Addressing Critical Issues: ✅ Approve and Merge

The suggestions and warnings are valuable for follow-up work but shouldn't block the PR.


💬 Questions for Author

  1. What's the largest batch items JSON you've seen in practice? Does 10MB limit have a real use case?
  2. Is the batch validation issue (Warning feat: add smart output display for registry run (closes #3) #4) being tracked separately, or should we fix it in this PR?
  3. Did you consider extracting shell safety checks to a separate module (Suggestion feat: detect API keys from env vars and pflow settings #8)?

Great work on solving a real problem with a thoughtful approach! 🚀

@spinje
Copy link
Copy Markdown
Owner Author

spinje commented Dec 29, 2025

Review Fixes Applied

Thanks for the thorough review! I've addressed the critical issues in commit 1027630:

✅ Fixed

#1 ReDoS Vulnerability (Critical)

  • Changed regex from r"\[.*?\]|\{.*?\}" with re.DOTALL to r"\[.{0,500}?\]|\{.{0,500}?\}"
  • Removed DOTALL flag - now only matches within single lines
  • Added 500 char limit to prevent catastrophic backtracking
  • Verified: 10,000 char command completes in <1ms

#2 Type Checker Comment Clarity (Critical)

  • Added explicit "BIDIRECTIONAL JSON compatibility" header
  • Documented both directions with examples:
    • str → dict/list (auto-parse)
    • dict/list → str (auto-serialize)
  • Updated inline comments to use notation

#5 Warning Message Clarity (Warning)

  • Changed to: "shell-unsafe characters (found: {name} '{char}')"
  • Added comment: # Only warn once per command (multiple issues may exist)

❌ Not Addressed (Pre-existing, out of scope)

#3 Batch Node 10MB Limit

  • This is pre-existing code, not introduced in this PR
  • Would require separate discussion about appropriate limits

#4 Batch Validation Inconsistency

  • Already documented in .taskmaster/knowledge/pitfalls.md
  • Tracked as known limitation, not a regression from this PR

Answers to Questions

What's the largest batch items JSON you've seen in practice?

In the webpage-to-markdown workflow, batch items are image descriptions - typically a few KB each. 10MB would be extreme. However, since this limit is pre-existing, I'd prefer to address it in a separate PR if we want to change it.

Is the batch validation issue being tracked separately?

Yes, documented in pitfalls.md (lines 360+). It's a fundamental limitation of static validation vs runtime wrappers.

Did you consider extracting shell safety checks?

Good suggestion for follow-up. The shell.py file is getting large (760 lines). A safety_utils.py extraction would improve testability.

@spinje spinje merged commit 1c3759c into main Dec 29, 2025
7 checks passed
spinje added a commit that referenced this pull request May 8, 2026
…kip review]

PR #378 review finding #4. Closes the synthetic-builder ↔ production-analyze()
fidelity gap in tests/test_core/test_cache_analysis_renderers.py.

What landed
- _BUILDER_DOCUMENTED_DEFAULTS frozenset (6 entries) names the AnalysisSummary
  fields the synthetic builder cannot faithfully model. Tests asserting on
  these MUST drive analyze() end-to-end.
- TestMakeAnalysisShapeParity class with two methods:
  * test_builder_field_set_matches_dataclass_minus_documented_defaults — uses
    dataclasses.fields() introspection to fail noisily when a new field is
    added without builder coverage or allowlist documentation.
  * test_documented_defaults_get_overwritten_by_production — drives analyze()
    against a contrived IR + trace that triggers each documented-default
    overwrite, catching the case where production's overwrite logic is
    deleted while the allowlist stays stale.
- Three renderer tests migrated from synthetic to e2e analyze() calls:
  * test_json_partial_trace_exposes_evidence_scope_and_observed_models
  * test_json_summary_exposes_projection_exclusions_and_delta_reason
  * test_render_json_includes_rollup_workflow_paths_and_unavailable_models_by_workflow

Reviewer's two non-issues confirmed via inspection
- test_text_summary_renders_blocking_errors_categorically (line 272 in PR
  baseline) only asserts on builder-populated fields. Kept synthetic.
- test_json_emits_root_and_sub_workflow_llm_node_counts already drove
  analyze() end-to-end against the committed 3-deep fixture. No migration
  needed.

Verification
- 5 mutation contracts checked by reverting production code:
  * Add new AnalysisSummary field → parity method 1 fails naming the field.
  * Delete observed_models_in_trace overwrite → migration #2 + parity
    method 2 fail with documented diagnostics.
  * Delete unavailable_models_by_workflow overwrite → migration #4 + parity
    method 2 fail.
- 6,335 tests passing on default suite.
- make check clean (ruff + ruff-format + mypy + deptry).
- test_golden_baseline_hashes_match (DD#19) green; test_plan_drift.py 34/34.

Plan + progress log
- Atomic plan at .taskmaster/tasks/task_159/implementation/fix-plans/
  renderer-test-fidelity-shape-parity-plan.md.
- Consolidated PR #378 review-fix sweep entry appended to
  implementation-progress-log.md, documenting all four phases (Phase 1
  easy bundle / Phase 2 medium bundle / Phase 3 cohort-key correctness /
  Phase 4 this commit), the five disputed findings (with citations), the
  GH #380 follow-up filed, and the cross-cutting insights from the sweep.

GH issue #380 filed for the deferred test-bloat parametrize-collapse work.
spinje added a commit that referenced this pull request May 8, 2026
…le [skip review]

Adds `.taskmaster/tasks/task_159/baseline/` as a byte-stable regression
oracle for the upcoming Task 160 cache_analysis refactor. Per-case folder:
workflow.pflow.md + command.sh + README.md (mutation contract) +
expected-{stdout,stderr,exit-code}.txt. Harness wraps each case in
`HOME=$BASELINE_HOME` redirect for filesystem isolation; normalize.py
redacts non-deterministic content (timestamps, paths, hashes, sub-cent
costs) without touching warning IDs / severity / JSON ordering.

Surfaces (63 cases total):
  01 parser errors                  10
  02 validator errors                8
  03 analyze-cache modes             8
  04 warning catalog (one per ID)   20  (15/20 trigger target ID; 5 in F-02)
  05 advisory cases                  5
  12 real-world lyrics-generator     5  (committed snapshot of music workflow tree)
  13 happy-path interactions         4  (batch+cache+prewarm, sub-workflow with cache, 3-level nesting)
  14 Pitfall #19 defenses            3  (dotted-path through NamespacedSharedStore, file-resolve boundary)

`./regenerate.sh` writes expected-* files; `./verify.sh` re-runs and
diffs (drift = stop signal). Exit 0 = clean. Trailing-newline + per-line
trailing-whitespace handling aligned with pre-commit's
trailing-whitespace + end-of-file-fixer hooks so `make check` stays clean.

Findings logged in `FINDINGS.md`:
- F-01: parser silently splits two-vars-on-one-line (spec promises error)
- F-02: 5 catalog warning IDs need more elaborate fixtures than minimal
- F-03: `pflow guide <workflow>` auto-detect misses `caching` topic
- F-04: `cache.below-min-tokens` false-positive on greenfield when chunks
        resolve to LLM responses
- F-05: `pflow visualize` validates before rendering — blocks on unknown
        node types in unrelated sub-workflows

Audit handoff: `NEXT-AGENT-AUDIT.md` (mission, priorities, 7 worked
examples) + `starting-context/braindump-2026-05-08-baseline-audit-handoff.md`
(tacit knowledge transfer).

Out of scope per user direction: trace format 2.0.0 backcompat
(2.0.0 fixture removed); MCP server cases; surfaces 06-11 (deferred).
spinje added a commit that referenced this pull request May 10, 2026
`pflow analyze-cache --from-trace` previously rendered three delta lines
(Actual savings + First-run + Rerun) in trace mode. The first-run-with-
cache projection assumes a fresh run with no memo cache hits and no
provider implicit caching — neither modeled. When those fire (the common
case), the projection diverges from actual cost by an order of magnitude
(lyrics-generator: projection said "saves 1%"; actual was 49%) and the
two competing numbers anchor agents on the smaller, misleading figure.

Option B: drop the first-run delta line in trace mode entirely. Show only
the measured number plus the steady-state forward-looking projection.

- `_render_summary_deltas` split into 4-line dispatch + two named helpers
  `_render_trace_deltas` and `_render_greenfield_deltas`. Dispatch on
  `evidence_scope ∈ {"complete_trace", "truncated_trace_executed_subset"}`
  rather than `actually_paid_usd is not None` to handle the all-unpriced-
  trace edge case correctly.
- Trace mode renders `Actual savings (this run):` (or `unavailable
  (projection excludes …)`) + `Rerun delta (projected):`. The
  `(projected)` suffix on rerun signals "model, not measurement" right
  at the line.
- Greenfield mode unchanged shape: `First-run delta:` + `Rerun delta:`,
  no `(projected)` suffix (both are projections by construction; the
  absence of an actual-savings line carries the signal).
- The truncated-mode `(executed)` qualifier on rerun was retired —
  the suppression note already conveys executed-subset context.
- `_format_delta` actual-delta label simplified `"actual vs no-cache"`
  → `"vs no-cache"` (the row label "Actual savings (this run):" already
  says "actual"; doubled word was a Stage A artifact).
- Dead branch `if not in_trace_mode and actual_line:` removed
  (production-unreachable per searcher #2; only synthetic fixtures hit it).
- `_make_analysis` test builder upgraded to set `evidence_scope` based
  on `actually_paid` (matches production shape; closes a Pitfall #19
  gap that searcher #2 specifically warned about under the new dispatch).

JSON shape unchanged — all three deltas still on `AnalysisSummary`.

Tests: 2 new regression tests with verified mutation contracts; 1
existing test renamed and rewritten for Option B truncated semantics;
1 new mutation guard on the `vs no-cache` label simplification. Three
mutation contracts verified by reverting production code and observing
the matching test fail.

6,396 default-suite tests pass. 65/65 baselines pass (no drift — the
parallel L-12/L-2/L-1 commit had already regenerated the 3 affected
baselines anticipating Option B's final shape). Manual smoke confirms
trace mode shows actual + rerun (projected) only; greenfield shows
first-run + rerun.

Closes L-3 from BASELINE-AUDIT Section F.
spinje added a commit that referenced this pull request May 10, 2026
Five deferred BASELINE-AUDIT findings shipped together as the
post-merge-block polish round. Single-render-format changes; no
architectural shift. Bundled with the prior in-flight B-9 blocking-
errors split work in this commit.

- B-6: drop "(ordered by impact)" qualifier when no action has positive
  savings (render_text.py).
- B-11: heterogeneous suffix uses "; plus " separator instead of bare
  "+" so it doesn't parse as "model X PLUS model Y" (render_text.py).
- B-8 + L-7: per-call tokens/cacheable columns widened from :>5 to
  :>7, (thousands separator + 6-digit alignment) (render_text.py).
- L-4: collect skipped sub-workflow paths and emit ONE aggregated note
  via new _format_skipped_workflows_note helper. Lyrics-generator
  collapses 15 near-identical "predicted-key matching skipped" notes
  (~4KB) to one summary line (analyze.py).

Tests: 12 new tests including a production-shape integration test
(test_predict_cache_keys_aggregates_skip_notes_via_production_path)
that drives _predict_cache_keys end-to-end — caught a Pitfall #19
where helper-only tests passed against reverted production code. Each
fix has a verified mutation contract.

Doc fix: docs/reference/cli/analyze-cache.mdx now reflects the
conditional "ordered by impact" claim.

Verification: make check clean (ruff + ruff-format + mypy + deptry);
make test 6,415 passed, 1 skipped (+11 from baseline); 65/65 baseline
cases pass after re-capture of 46 affected cases. Strict-improvement
audit confirms all baseline drifts fall into expected classes.

Closes B-6, B-8, B-9, B-11, L-4, L-7 from BASELINE-AUDIT.md.
spinje added a commit that referenced this pull request May 10, 2026
…licates

Restructured analyze-cache header for AI-agent readability. The 39-word
run-on scale line splits into 3 cohesive lines (`Workflow:` count +
invocation status; `Models:` for 2+ models; `Heterogeneous:` for batch
sub-workflows declaring `model: ${item.model}`). Single-model workflows
keep `using X` inline (pragmatic density).

The duplicate `Observed models:` line is dropped — `models_in_use` is
a superset of `observed_models_in_trace` in complete trace mode. The
sub-workflow breakdown line drops its trailing 15-name CSV; names
already appear in `## Per-call cache report` headings and
`## Per-child analyze-cache commands`.

Latent bug fixed as bonus: static-mode workflows on the no-model-resolved
branch previously suppressed the heterogeneous suffix. The static-mode
lyrics-generator capture now correctly surfaces `Heterogeneous:
generate-chorus-options (model varies per batch item)`.

`_format_heterogeneous_line` collapses the old 1-vs-N branches via
plain `', '.join`; `_format_heterogeneous_suffix` deleted as dead code.

Files: render_text.py (+50/-65 LOC, net -15 simpler), test_cache_analysis_renderers.py
(3 helper-direct tests migrated to end-to-end render assertions per
Pitfall #19; 2 N-9 CSV literal updates; 1 multi-model test rewritten;
1 new N-2 regression test). 4 mutation contracts verified. 5 baselines
regenerated as strict-improvement diffs.

6,430 tests pass (+4). `make check` clean. 65/65 baselines pass.

Closes N-2 + N-3 + N-9 from POLISH-PLAN.md Cluster D.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Template variables: arrays cannot be embedded in shell command strings

1 participant