Summary
Fixes #9085 - `is_last_chunk` was never being set to `True` in `StreamResponse` objects in DSPy 3.0.4, breaking stream completion detection. I noticed this behaviour both with Qwen 3 Next 80b and with GPT OSS 120b with vLLM. Given that @chenmoneygithub fixed #8890, it would be really nice to get their two cents on this PR if possible. Not sure if the model they used behaved differently to these ones.
Background
I discovered that `is_last_chunk` was always `False` when using `StreamListener` with the built-in `ReAct` module. This made it impossible to reliably detect when streaming had completed for a given field. My original goal was straightforward: make sure `is_last_chunk` gets sent.
Why This Fix Couldn't Be More Surgical
The root cause is in `_default_handle_stream_chunk()`:
Original code (line 295)
The method only returned a `StreamResponse` when `token` was non-empty. Since the final chunk is detected by seeing the next field's marker (or end of stream), the token buffer is often empty at that point, so no final chunk was ever emitted.
I considered several approaches:
1. `self.stream_end=True` - This would still end up not emitting the last chunk most of the time
2. Keep the `if token:` condition as-is, but add logic to track whether we've already emitted a final chunk and emit it if we haven't. This would've added more state to keep track of in the stream listener.
3. `if token or self.stream_end:` - Simpler but changes behavior
I went with option 3 because it ensures `is_last_chunk=True` is always emitted, makes the behavior predictable, and aligns with the existing `finalize()` method behavior added in #8890.
Behavioral Changes
This changes user-facing behavior in a significant way.
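As a rough illustration of the difference, here is a toy sketch that applies the old and new gate to the same sequence of events. It is not the actual DSPy code: `FakeStreamResponse`, `emit`, and the `events` list are hypothetical stand-ins, and only the two conditions (`if token:` and `if token or self.stream_end:`) come from this PR.

```python
from dataclasses import dataclass


@dataclass
class FakeStreamResponse:
    """Hypothetical stand-in for StreamResponse; only two fields are modeled."""
    chunk: str
    is_last_chunk: bool


def emit(events, gate):
    """Apply a gate to (token, stream_end) pairs and collect what gets emitted."""
    emitted = []
    for token, stream_end in events:
        if gate(token, stream_end):
            emitted.append(FakeStreamResponse(chunk=token, is_last_chunk=stream_end))
    return emitted


# Assumed event sequence: two tokens, then the end of the field is detected
# while the token buffer is empty.
events = [("Hello", False), (" world", False), ("", True)]

# Old gate: `if token:` drops the empty end-of-field event.
before = emit(events, lambda token, stream_end: bool(token))
# New gate: `if token or self.stream_end:` lets it through.
after = emit(events, lambda token, stream_end: bool(token) or stream_end)

print([(r.chunk, r.is_last_chunk) for r in before])
print([(r.chunk, r.is_last_chunk) for r in after])
```

With the old gate nothing carries `is_last_chunk=True`; with the new gate the sequence ends with one empty chunk that does.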
Before (v3.0.4):
After (this PR):
What's different:
- The final chunk has `chunk=""` and `is_last_chunk=True`
- One additional `StreamResponse` per field
- The empty `chunk` is intentional - it's an end-of-stream marker
Why I Think This Is Correct
Looking at the history:
- `ab7ac0b9` (July 2025) initially added `is_last_chunk` with the same `if token:` condition
- `a0b2155a` (October 2025) added the `finalize()` method specifically to handle missing completion markers by emitting empty final chunks
- `all_chunks[-2]` is used instead of `all_chunks[-1]` because of these empty chunks
This suggests empty final chunks were already the intended behavior (maybe?), but the implementation was incomplete. This PR makes it consistent across all streaming scenarios, not just when completion markers are missing.
Changes Made
Meat-and-potatoes
- `dspy/streaming/streaming_listener.py::_default_handle_stream_chunk()` (1 line change)
  - Changed `if token:` to `if token or self.stream_end:`
Tests
- Updated `test_stream_listener_returns_correct_chunk_chat_adapter_untokenized_stream` to expect empty final chunks
- Added `tests/streaming/test_streaming_is_last_chunk.py` (willing to append these to `test_streaming.py` if desired):
  - `allow_reuse=True` and `allow_reuse=False` behavior
Documentation
- Updated `StreamResponse` docstring to document all 4 fields including `is_last_chunk`
- Updated `StreamListener` docstring with usage examples
- Explained when `is_last_chunk=True` occurs, why the final chunk is empty, and how to use it
Testing
All 41 streaming tests pass (36 passed, 3 skipped). The new tests ensure exactly one `is_last_chunk=True` per field per streaming session and consistent behavior across all adapters. I think I am unable to run some of the other tests locally, so I am hoping that CI picks up on this and gives me some more answers.
Questions for Reviewers
I'm specifically looking for feedback on:
Is this the right approach? The behavioral change affects all users. Are there backward compatibility concerns I should address?
Alternative solutions? Are there better ways to fix `is_last_chunk` that would be less invasive? For example:
- Emitting the final chunk in `finalize()` instead of in the main flow?
Documentation: Is the explanation of empty final chunks clear enough? Should we add migration guidance?
Edge cases: Are there scenarios where this change could break existing usage patterns that I haven't considered?
Compatibility Notes
I think the impact is minimal because:
- `is_last_chunk` was always `False`, so users couldn't rely on it anyway
- Empty chunks are easy to filter: use `if chunk.chunk:` to skip them if needed
- Exactly one `is_last_chunk=True` per field makes completion detection reliable
That said, users who iterate over all chunks will see additional entries. If this is a concern, I'm open to exploring other approaches.
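For consumers, a minimal sketch of what handling the new chunks could look like. It uses a stand-in dataclass rather than importing DSPy: `FakeStreamResponse` and `collect_fields` are hypothetical, and the four field names are my assumption about `StreamResponse`, so please check them against the real class.

```python
from dataclasses import dataclass


@dataclass
class FakeStreamResponse:
    """Hypothetical stand-in; field names are assumed to mirror StreamResponse."""
    predict_name: str
    signature_field_name: str
    chunk: str
    is_last_chunk: bool


def collect_fields(responses):
    """Return (accumulated text per field, set of fields that have completed)."""
    text = {}
    completed = set()
    for r in responses:
        field = r.signature_field_name
        if r.chunk:  # skip the empty end-of-stream marker
            text[field] = text.get(field, "") + r.chunk
        if r.is_last_chunk:  # completion signal, even when the chunk is empty
            completed.add(field)
    return text, completed


# Assumed stream for a single field, ending with the empty final chunk.
responses = [
    FakeStreamResponse("predict", "answer", "Hello", False),
    FakeStreamResponse("predict", "answer", " world", False),
    FakeStreamResponse("predict", "answer", "", True),
]

text, completed = collect_fields(responses)
print(text)
print(completed)
```

Code that only concatenates `chunk` values is unaffected by the empty marker. Code that counts or indexes chunks is where the extra entry shows up.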
Thanks in advance, and please be gentle, I'm new here!