Batch processing fails when items come from shell node JSON output #13

@spinje

Summary

When batch.items is set to the output of a shell node that prints a JSON array, the batch processor fails with:

TypeError: Batch items must be an array, got str.
Template '${create-array.stdout}' resolved to: '["item1", "item2", "item3"]'

Root Cause

Shell nodes always output text (stdout is a string). When that text is a valid JSON array like ["a", "b", "c"], the batch processor receives it as a string, not a parsed Python list.

The node parameter system (node_wrapper.py) already handles this case: it auto-parses JSON strings when the target parameter expects dict or list. batch.items has no equivalent auto-parsing.
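
To make the type mismatch concrete, here is a minimal standalone illustration. It does not use pflow; the `stdout` value simply stands in for what a shell node would capture from an `echo` command, trailing newline included.

```python
import json

# Stand-in for the text a shell node captures from `echo '["a", "b", "c"]'`.
stdout = '["a", "b", "c"]\n'

# The captured output is text, even though it looks like an array.
print(type(stdout).__name__)  # str

# Parsing it explicitly yields the list that batch processing needs.
items = json.loads(stdout)
print(type(items).__name__)  # list
```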

Reproduction

{
  "nodes": [
    {
      "id": "create-array",
      "type": "shell",
      "params": {"command": "echo '[\"item1\", \"item2\", \"item3\"]'"}
    },
    {
      "id": "process-items",
      "type": "shell",
      "batch": {"items": "${create-array.stdout}"},
      "params": {"command": "echo Processing: ${item}"}
    }
  ],
  "edges": [{"from": "create-array", "to": "process-items"}]
}

Expected: Batch processes each item (item1, item2, item3)
Actual: TypeError because ${create-array.stdout} is a string, not a list

Real-World Use Case

This blocks a useful parallelization pattern:

  1. Shell node with jq splits HTML into sections: cat file.html | jq -R -s 'split("<section>")'
  2. Batch LLM node processes each section in parallel
  3. Results aggregated

Currently this fails because jq outputs a JSON array as text.

Proposed Solution

Add JSON auto-parsing to the prep() method in batch_node.py, following the same pattern as node_wrapper.py:746-781:

  1. Check if resolved items is a string
  2. Strip whitespace (shell outputs often have trailing \n)
  3. If it starts with [, attempt json.loads()
  4. If parsing succeeds and result is a list, use it
  5. Otherwise, keep the value as a string and let the existing error handling report it

This makes batch.items consistent with how node params already handle JSON strings.
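
A rough sketch of the five steps above, written as a standalone helper. The function name `coerce_batch_items` is hypothetical and is not taken from batch_node.py or node_wrapper.py; how it would be wired into prep() is left open.

```python
import json
from typing import Any


def coerce_batch_items(items: Any) -> Any:
    """Return a list if `items` is a JSON-array string, else return it unchanged.

    Hypothetical helper for illustration; not the actual pflow implementation.
    """
    # Step 1: only strings are candidates for auto-parsing.
    if not isinstance(items, str):
        return items

    # Step 2: shell output often ends with a trailing newline.
    stripped = items.strip()

    # Step 3: only attempt parsing when the text looks like a JSON array.
    if not stripped.startswith("["):
        return items

    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Step 5: leave the string as-is so the existing TypeError still fires.
        return items

    # Step 4: accept the result only if it really is a list.
    return parsed if isinstance(parsed, list) else items
```

With this in place, `coerce_batch_items('["item1", "item2", "item3"]\n')` would return a three-element list, while malformed input such as `'[not json'` passes through untouched and still reaches the existing "Batch items must be an array" error.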

Files to Modify

  • src/pflow/runtime/batch_node.py - Add JSON auto-parsing in prep() method
  • tests/test_runtime/test_batch_node.py - Add tests for JSON string items
