Skip to content

Conversation

@baughmann
Copy link

@baughmann baughmann commented Nov 26, 2025

Summary

Fixes #9085 - is_last_chunk was never being set to True in StreamResponse objects in DSPy 3.0.4, breaking stream completion detection. I noticed this behaviour both with Qwen 3 Next 80b and with GPT OSS 120b with vLLM. Given that @chenmoneygithub fixed #8890 it would be really nice to get their two cents on this PR if possible. Not sure if the model they used behaved differently to these ones.

Background

I discovered that is_last_chunk was always False when using StreamListener with the built-in ReAct module. This made it impossible to reliably detect when streaming had completed for a given field. My original goal was straightforward: make sure is_last_chunk gets sent.

Why This Fix Couldn't Be More Surgical

The root cause is in _default_handle_stream_chunk():

Original code (line 295)

if token:
    return StreamResponse(...)

The method only returned a StreamResponse when token was non-empty. Since the final chunk is detected by seeing the next field's marker (or end of stream), the token buffer is often empty at that point - so no final chunk was ever emitted.

I considered several approaches:

  1. Minimal fix: Only emit when self.stream_end=True - This would still end up not emitting the last chunk most of the time
  2. Separate final chunk: Basically keep the existing if token: condition as-is, but add logic to track whether we've already emitted a final chunk and emit it if we haven't. This would've added more state to keep track of in the stream listener.
  3. Current approach: Change condition to if token or self.stream_end: - Simpler but changes behavior

I went with option 3 because it ensures is_last_chunk=True is always emitted, makes the behavior predictable, and aligns with the existing finalize() method behavior added in #8890.

Behavioral Changes

This changes user-facing behavior in a significant way.

Before (v3.0.4):

chunks = [
    StreamResponse(chunk='To', is_last_chunk=False),
    StreamResponse(chunk=' get to', is_last_chunk=False),
    StreamResponse(chunk=' the other side', is_last_chunk=False),
]
# No chunk with is_last_chunk=True was emitted

After (this PR):

chunks = [
    StreamResponse(chunk='To', is_last_chunk=False),
    StreamResponse(chunk=' get to', is_last_chunk=False),
    StreamResponse(chunk=' the other side', is_last_chunk=False),
    StreamResponse(chunk='', is_last_chunk=True),  # New: empty final chunk
]

What's different:

  1. Every field now gets a final chunk with chunk="" and is_last_chunk=True
  2. Users will receive one additional StreamResponse per field
  3. The empty string in chunk is intentional - it's an end-of-stream marker

Why I Think This Is Correct

Looking at the history:

  1. Commit ab7ac0b9 (July 2025) initially added is_last_chunk with the same if token: condition
  2. Commit a0b2155a (October 2025) added the finalize() method specifically to handle missing completion markers by emitting empty final chunks
  3. Several tests were updated to expect all_chunks[-2] instead of all_chunks[-1] because of these empty chunks

This suggests empty final chunks were already the intended behavior (maybe?), but the implementation was incomplete. This PR makes it consistent across all streaming scenarios, not just when completion markers are missing.

Changes Made

Meat-and-potatoes

  • Modified dspy/streaming/streaming_listener.py::_default_handle_stream_chunk() (1 line change)
  • Changed condition from if token: to if token or self.stream_end:

Tests

  • Fixed test_stream_listener_returns_correct_chunk_chat_adapter_untokenized_stream to expect empty final chunks
  • Added 8 tests in tests/streaming/test_streaming_is_last_chunk.py (willing to append these to test_streaming.py if desired:
    • Coverage for ChatAdapter, JSONAdapter, XMLAdapter
    • Multiple listeners scenario
    • allow_reuse=True and allow_reuse=False behavior
    • Edge cases (few tokens, varied chunk sizes, order invariance)

Documentation

  • Enhanced StreamResponse docstring to document all 4 fields including is_last_chunk
  • Updated StreamListener docstring with usage examples
  • Added comprehensive section in streaming tutorial explaining when is_last_chunk=True occurs, why the final chunk is empty, and how to use it

Testing

All 41 streaming tests pass (36 passed, 3 skipped). The new tests ensure exactly one is_last_chunk=True per field per streaming session and consistent behavior across all adapters. I think I am unable to run some of the other tests locally, so I am hoping that CI picks up on this and gives me some more answers.

Questions for Reviewers

I'm specifically looking for feedback on:

  1. Is this the right approach? The behavioral change affects all users. Are there backward compatibility concerns I should address?

  2. Alternative solutions? Are there better ways to fix is_last_chunk that would be less invasive? For example:

    • Emitting final chunk only from finalize() instead of in the main flow?
    • Making empty final chunks opt-in via a flag?
    • Different signaling mechanism entirely?
  3. Documentation: Is the explanation of empty final chunks clear enough? Should we add migration guidance?

  4. Edge cases: Are there scenarios where this change could break existing usage patterns that I haven't considered?

Compatibility Notes

I think the impact is minimal because:

  1. The bug made the feature unusable - is_last_chunk was always False, so users couldn't rely on it anyway
  2. Empty chunks are filterable - Users can if chunk.chunk: to skip them if needed
  3. Predictable behavior - The guarantee of exactly one is_last_chunk=True per field makes completion detection reliable

That said, users who iterate over all chunks will see additional entries. If this is a concern, I'm open to exploring other approaches.

Thanks in advance, and please be gentle I'm new here!

Fixes stanfordnlp#9085

- Modified _default_handle_stream_chunk to always set is_last_chunk=self.stream_end
- Ensures every StreamListener emits exactly one chunk with is_last_chunk=True
- Updated test to expect empty final chunks with is_last_chunk=True
- Added 8 comprehensive tests for is_last_chunk behavior across all adapters
- Updated StreamResponse and StreamListener docstrings
- Enhanced streaming tutorial documentation with is_last_chunk explanation
@chenmoneygithub
Copy link
Collaborator

duplicates #9089

Anyway, thanks for the PR! The fix is mostly correct, but it's incorrect to claim that the last chunk is always empty.

@baughmann
Copy link
Author

@chenmoneygithub oops yep good catch thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] is_last_chunk never set

2 participants