fix: allow field headers and content on same line in parser #7955

laitifranz · 2025-03-13T12:13:41Z

Problem

The current parser fails to extract field content when it appears on the same line as the field header. This occurs with model outputs in the format:
[[ ## field ## ]] content here

The parser only works when the content starts on a new line after the header:

[[ ## field ## ]]
content here
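
The failure mode can be reproduced with a small line-based parser. The sketch below is an assumption about how such a parser might behave, not the dspy source; `HEADER` and `parse_naive` are hypothetical names used only for illustration. It detects a header with `re.match`, which anchors only at the start of the line, and then discards whatever follows the header on that line.

```python
import re

# Hypothetical header pattern for illustration; matches "[[ ## name ## ]]".
HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_naive(completion):
    """Line-based parser that ignores any text following a header on its line."""
    fields = {}
    current = None
    for line in completion.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # A new field starts here; text after the header is dropped.
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}


inline = parse_naive("[[ ## field ## ]] content here")
multiline = parse_naive("[[ ## field ## ]]\ncontent here")
```

In this sketch `inline["field"]` comes back empty while `multiline["field"]` holds the content, which is the same symptom as the empty fields described in this report.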

How I Encountered This

I ran the following code:

import dspy

lm = dspy.LM("openai/llava-hf/llava-onevision-qwen2-7b-ov-hf",
             api_base="http://localhost:8000/v1",
             api_key="local", model_type='chat')
dspy.configure(lm=lm)

class CaptioningSignature(dspy.Signature):
    """Generate a caption for an image."""
    image: dspy.Image = dspy.InputField()
    question: str = dspy.InputField()
    description: str = dspy.OutputField()

class Captioning(dspy.Module):
    def __init__(self) -> None:
        super().__init__()  # initialize dspy.Module state before adding predictors
        self.predictor = dspy.ChainOfThought(CaptioningSignature)

    def forward(self, **kwargs):
        # dspy.Module.__call__ dispatches to forward()
        return self.predictor(**kwargs)

captioning = Captioning()
image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
example = dspy.Example(
    image=dspy.Image.from_url(image_url),
    question="Describe this image in one sentence.",
).with_inputs("image", "question")
print(captioning(**example.inputs()))

Expected output:

Prediction(
    reasoning='The image depicts the Statue of Liberty ...',
    description='The Statue of Liberty, a colossal neoclassical ...'
)

Actual output:

Prediction(
    reasoning='',
    description=''
)

Solution

Modified the parser to:

Extract remaining content after detecting a field header pattern
Handle content that appears on the same line as the header
Maintain compatibility with the existing newline-separated format
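
The three points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `HEADER` and `parse_fields` are hypothetical names, and the header regex is inferred from the `[[ ## field ## ]]` format shown in this description.

```python
import re

# Hypothetical header pattern for illustration; matches "[[ ## name ## ]]".
HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_fields(completion):
    """Collect field contents, accepting content on or after the header line."""
    fields = {}
    current = None
    for line in completion.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            current = match.group(1)
            # Keep whatever follows the header on the same line.
            remainder = stripped[match.end():].strip()
            fields[current] = [remainder] if remainder else []
        elif current is not None:
            # Newline-separated content is still appended as before.
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}


same_line = parse_fields("[[ ## description ## ]] The Statue of Liberty stands tall")
next_line = parse_fields("[[ ## description ## ]]\nThe Statue of Liberty stands tall")
mixed = parse_fields("[[ ## reasoning ## ]] Because.\n[[ ## description ## ]]\nA statue.")
```

Both `same_line` and `next_line` yield the same `description` value, and `mixed` shows the two styles coexisting in one completion.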

Example

Before: Failed to parse

INPUT

[[ ## description ## ]] The Statue of Liberty stands tall

OUTPUT

Prediction(
    description=''
)

After: Successfully parses both formats

INPUT

[[ ## description ## ]] The Statue of Liberty stands tall

or

[[ ## description ## ]]
The Statue of Liberty stands tall

OUTPUT

Prediction(
    description='The Statue of Liberty stands tall'
)

okhat · 2025-03-13T21:29:49Z

Thank you @laitifranz! Technically, separators being on a line by themselves was intentional, but we weren't matching the full line, so the behavior is indeed odd.

I do think we should relax that constraint, so I do like this PR. I'm not entirely comfortable with .strip(), though: we probably only want to strip [at most] one whitespace token after the field separator. But that may end up making the code oddly complicated, so I'll probably just merge this.
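
To make the `.strip()` concern concrete, here is a small sketch of the alternative. It reads "one whitespace token" as a single whitespace character, which is an assumption, and `drop_one_leading_space` is a hypothetical helper rather than anything in dspy.

```python
def drop_one_leading_space(text):
    """Remove at most one leading whitespace character, keeping the rest."""
    return text[1:] if text[:1].isspace() else text


# Text that follows a header when the content itself begins with a space.
remainder = "  indented content"

conservative = drop_one_leading_space(remainder)  # keeps the second space
aggressive = remainder.strip()                    # removes all surrounding whitespace
```

The difference matters only when leading whitespace is meaningful in the field value, for example indented code or preformatted text; for ordinary prose both approaches give the same result.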

fix: allow field headers and content on same line in parser

cb269dc

okhat merged commit 4311483 into stanfordnlp:main Mar 13, 2025
4 checks passed

laitifranz deleted the fix/parser-inline-headers branch March 16, 2025 13:32

laitifranz mentioned this pull request Mar 16, 2025

enhance parser to handle various model output scenarios #7969

Closed
