Parser silently truncates multi-line - key: | values at the first blank line #387

@spinje

Summary

When a node parameter uses a multi-line YAML block scalar (- prompt: |, - command: |, - body: |, - description: |, etc.) with blank lines inside the block, pflow's parser silently truncates the value at the first blank line. Worse: the post-blank-line content is silently absorbed into the entity's purpose (description prose), so the bug is invisible unless the user inspects the parsed IR or trace.

Affected workflows still validate, execute, and produce output. The billed cost reflects only the truncated value. Following the prompt-caching example in the official pflow guide verbatim produces a silently truncated prompt in which ${item.text} never reaches the LLM.

Severity

High — silent data loss, no warning at validate-time or runtime.

Reproduction

# Repro

## Steps

### check

Checks that the prompt survives blank lines.

- type: llm
- prompt: |
    First line of the prompt.

    Second line, after a blank line. This whole block SHOULD reach the LLM.

    Third line.

Parsed result before fix:

PROMPT FIELD:  'First line of the prompt.'
PURPOSE FIELD: 'Second line, after a blank line. This whole block SHOULD reach the LLM.\nThird line.'

The LLM receives only 'First line of the prompt.' Everything after the first blank line is silently captured as the node's purpose (description).

Scope

The bug affects all multi-line YAML values, not only prompt: parameters:

  • - prompt: | (literal block) — broken
  • - prompt: > (folded block) — broken
  • - prompt: followed by indented plain scalar — broken
  • Any other multi-line parameter: body: (HTTP), command: (shell), description: (Inputs/Outputs), etc.

Not affected: fenced code blocks ( ```prompt, ```yaml batch, etc. — different parse path), single-line values, and file references (./foo.prompt.md).

Why this matters

  1. The official pflow guide prompt-caching example uses this exact pattern. A user following the guide verbatim gets a silently truncated prompt.

  2. Silent — no warning, no error. --validate-only passes. Runtime executes. The LLM produces plausible-sounding output from the partial prompt. The user has no signal that anything is wrong.

  3. Hard to diagnose downstream. Symptoms look like a caching/sequencing bug; the actual cause is the parser.

Root cause

src/pflow/core/markdown_parser.py:493-519 — the YAML-continuation state machine treats blank lines as item terminators:

if in_yaml_continuation:
    if line and line.strip() != "":           # blank → falls through
        content_start = len(line) - len(line.lstrip())
        if content_start >= yaml_indent_level:
            yaml_current_item_lines.append(line)
            continue
    _flush_yaml_item()                         # blank ENDS the item

The parser does not inspect the | or > sigil at all. Block-scalar handling is an emergent side effect: lines are collected by the state machine, joined, then yaml.safe_load'd. So PyYAML never sees the blank-line content — by the time the joined string reaches it, post-blank lines are already gone (captured as prose instead).
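
To make the failure mode concrete, here is a minimal, self-contained model of the continuation rule quoted above. It is a sketch, not the code in markdown_parser.py: the function name collect_item_buggy and the list-of-lines interface are hypothetical, and only the "non-blank AND indented" condition is taken from the excerpt.

```python
def collect_item_buggy(lines, start, yaml_indent_level):
    """Collect the lines belonging to the YAML item that starts at lines[start].

    Hypothetical model of the current rule: a continuation line must be
    non-blank AND indented at least yaml_indent_level columns.
    Returns (item_lines, index_of_first_line_not_consumed).
    """
    item = [lines[start]]
    i = start + 1
    while i < len(lines):
        line = lines[i]
        if line.strip() != "":
            content_start = len(line) - len(line.lstrip())
            if content_start >= yaml_indent_level:
                item.append(line)
                i += 1
                continue
        break  # a blank line (or a dedent) ends the item
    return item, i


doc = [
    "- prompt: |",
    "    First line of the prompt.",
    "",
    "    Second line, after a blank line.",
]

item, next_index = collect_item_buggy(doc, 0, yaml_indent_level=2)
leftover = doc[next_index:]  # these lines never reach the YAML loader
```

In this model, item holds only the bullet line and the first prompt line, while leftover holds the blank line and everything after it, which is the content the issue reports as being reclassified as prose.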

Suggested fix

Relax the continuation rule so that a blank line continues the item instead of terminating it:

| Rule | Before | After |
| --- | --- | --- |
| Continuation | Non-blank AND indented ≥ block indent | Blank OR indented ≥ block indent |
| Termination | Blank line, dedent, new bullet, heading, code-fence, EOF | Dedent, new bullet, heading, code-fence, EOF |

_flush_yaml_item should strip trailing blank lines before joining, so that single-line items stay on the _coerce_yaml_scalar fast path (preserving the intentional PyYAML divergences for octal/hex/dates already documented in core/CLAUDE.md).

This matches YAML's actual block-scalar termination semantics, removes a wrong rule rather than adding special cases, and applies uniformly to every multi-line YAML value in every section (Inputs, Steps, Outputs).
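
The proposed rule can be sketched in the same simplified form. This is an illustration under assumptions, not a patch: collect_item_fixed and its list-of-lines interface are hypothetical, and the real change would live inside the existing state machine and _flush_yaml_item.

```python
def collect_item_fixed(lines, start, yaml_indent_level):
    """Collect the lines belonging to the YAML item that starts at lines[start].

    Hypothetical model of the proposed rule: a line continues the item if it
    is blank OR indented at least yaml_indent_level columns.
    Returns (item_lines, index_of_first_line_not_consumed).
    """
    item = [lines[start]]
    i = start + 1
    while i < len(lines):
        line = lines[i]
        is_blank = line.strip() == ""
        content_start = len(line) - len(line.lstrip())
        if is_blank or content_start >= yaml_indent_level:
            item.append(line)
            i += 1
            continue
        break  # only a dedented, non-blank line ends the item

    # Drop trailing blank lines so that a single-line item followed by a
    # blank line is still seen as a single line by later processing.
    while len(item) > 1 and item[-1].strip() == "":
        item.pop()
    return item, i


doc = [
    "- prompt: |",
    "    First line of the prompt.",
    "",
    "    Second line, after a blank line.",
    "",
    "Prose that follows the item.",
]
block_item, block_next = collect_item_fixed(doc, 0, yaml_indent_level=2)

single = ["- type: llm", "", "- prompt: x"]
single_item, single_next = collect_item_fixed(single, 0, yaml_indent_level=2)
```

Here block_item keeps the interior blank line and the second prompt line, the trailing blank is dropped, and the dedented prose line is left for the caller. The single-line case still yields exactly one line, which is the property the trailing-blank stripping is meant to protect.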

Workaround until fixed

Use one of:

  • Fenced code block: ```prompt ... ``` (preferred per the guide)
  • File reference: - prompt: ./my.prompt.md
  • Collapse blank lines: keep multi-line content with single newlines only
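
Until a fix lands, a small script can flag workflow files that are likely to hit this bug. The sketch below is a heuristic, not part of pflow: find_risky_items and BLOCK_START are hypothetical names, and it only looks for a bullet parameter with no inline value (optionally ending in | or >) whose indented block contains a blank line followed by more indented content.

```python
import re

# A bullet parameter with no inline value, e.g. "- prompt: |", "- body: >", "- prompt:".
BLOCK_START = re.compile(r"^(\s*)-\s+([\w-]+):\s*([|>][+-]?\d*)?\s*$")


def find_risky_items(text):
    """Return (line_number, key) pairs for multi-line values that contain a blank line."""
    lines = text.splitlines()
    risky = []
    for idx, line in enumerate(lines):
        match = BLOCK_START.match(line)
        if not match:
            continue
        base_indent = len(match.group(1))
        seen_blank = False
        for follow in lines[idx + 1:]:
            if follow.strip() == "":
                seen_blank = True
                continue
            indent = len(follow) - len(follow.lstrip())
            if indent <= base_indent:
                break  # dedent, next bullet, or heading: the block is over
            if seen_blank:
                # Indented content after a blank line: this is what gets truncated.
                risky.append((idx + 1, match.group(2)))
                break
    return risky


sample = "\n".join([
    "- type: llm",
    "- prompt: |",
    "    First line of the prompt.",
    "",
    "    Second line, after a blank line.",
])
hits = find_risky_items(sample)
```

For the reproduction above, hits contains one entry pointing at the prompt bullet on line 2. Any flagged value can then be moved to a fenced code block or a file reference.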
