Conversation

TomeHirata (Collaborator)

Currently, StreamListener always buffers up to 10 chunks of tokens while looking for the end token. This behavior has two issues: 1) it causes unnecessary delay in token streaming, and 2) it may change the chunk order for native response chunks. This PR updates the buffer condition so that chunks are buffered only when the accumulated content could form the end boilerplate.

@TomeHirata TomeHirata requested a review from Copilot October 6, 2025 14:29
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR optimizes the StreamListener buffering logic to reduce unnecessary delays in token streaming by implementing more precise buffer conditions. Instead of always buffering up to 10 chunks, the system now intelligently determines when buffering is needed based on whether the current content could potentially form an end identifier pattern.

  • Adds _could_form_end_identifier() method to detect when buffering is necessary
  • Updates buffering logic to yield tokens immediately when they cannot form end patterns
  • Introduces adapter-specific pattern configuration for precise end identifier detection
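To make the idea concrete, here is a minimal sketch of such a predicate. This is an illustration, not dspy's actual implementation: the function name, the suffix/prefix strategy, and the example end identifier are all assumptions. The core idea is that buffering is only warranted while some suffix of the accumulated text could still grow into the end marker.

```python
def could_form_end_identifier(text: str, end_identifier: str) -> bool:
    """Return True if some suffix of `text` is a prefix of `end_identifier`,
    meaning the next incoming chunks could still complete the end marker
    and the token must stay buffered. Otherwise it is safe to yield."""
    # Only the last len(end_identifier) characters can matter.
    tail = text[-len(end_identifier):]
    for i in range(1, len(tail) + 1):
        if end_identifier.startswith(tail[-i:]):
            return True
    return False

# Example with a ChatAdapter-style marker (marker text assumed for illustration):
marker = "[[ ## completed ## ]]"
could_form_end_identifier("some output [[ #", marker)  # True: keep buffering
could_form_end_identifier("plain prose so far", marker)  # False: yield immediately
```

Under this scheme, ordinary prose is streamed out with no artificial delay, and buffering kicks in only when a partial end marker appears at the tail of the stream.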

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files changed:

  • dspy/streaming/streaming_listener.py — Implements the smart buffering logic with new pattern-matching capabilities and updates the receive method
  • tests/streaming/test_streaming.py — Adds comprehensive test coverage for the new _could_form_end_identifier method across all adapter types


@kenctrl commented Oct 6, 2025

LGTM, as long as the method matches all potential cases. Thanks Tomu!

@chenmoneygithub (Collaborator) left a comment


Looks great!

```python
elif not self._could_form_end_identifier(concat_message, adapter_name):
    # Buffer cannot form end identifier, safe to yield the oldest token.
    # Keep at least 1 token in buffer in case the next token creates the end pattern.
    if self.field_end_queue.qsize() > 1:
```

Shall we just call flush() here? This won't affect the use case we are tackling, but technically a direct flush() fits here.
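For context on the flush() suggestion, here is a sketch of what draining the token queue directly could look like. The class, method body, and usage are assumptions for illustration; only the attribute name field_end_queue mirrors the PR's code.

```python
from queue import Queue

class StreamBufferSketch:
    """Illustrative stand-in for the listener's buffer; not dspy's code."""

    def __init__(self) -> None:
        self.field_end_queue: Queue[str] = Queue()

    def flush(self) -> str:
        # Drain every buffered token and return them joined, which is
        # what a direct flush() call at that branch would accomplish.
        tokens = []
        while not self.field_end_queue.empty():
            tokens.append(self.field_end_queue.get())
        return "".join(tokens)

buf = StreamBufferSketch()
buf.field_end_queue.put("foo")
buf.field_end_queue.put("bar")
buf.flush()  # returns "foobar" and leaves the queue empty
```

The trade-off the reviewer notes: yielding one token at a time preserves fine-grained streaming, while a direct flush() empties the buffer in one step; for this branch either is correct since the buffer provably cannot contain an end identifier.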

@TomeHirata TomeHirata merged commit 6224eb3 into stanfordnlp:main Oct 7, 2025
10 checks passed