Make the buffer condition more precise #8907
Pull Request Overview
This PR optimizes the StreamListener buffering logic to reduce unnecessary delays in token streaming by implementing more precise buffer conditions. Instead of always buffering up to 10 chunks, the system now intelligently determines when buffering is needed based on whether the current content could potentially form an end identifier pattern.
- Adds `_could_form_end_identifier()` method to detect when buffering is necessary
- Updates buffering logic to yield tokens immediately when they cannot form end patterns
- Introduces adapter-specific pattern configuration for precise end identifier detection
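The check described in the list above can be illustrated with a minimal sketch. This is not the PR's actual implementation: the standalone function `could_form_end_identifier` and the marker string `[[ ## end ## ]]` are hypothetical, and the real method takes an adapter name and uses adapter-specific patterns. The idea shown is only that buffering is needed when the buffered text already contains the marker or ends with a prefix of it.

```python
def could_form_end_identifier(buffered: str, end_marker: str) -> bool:
    """Return True if `buffered` might still grow into text containing `end_marker`.

    Hypothetical stand-in for the PR's method; the marker is passed in
    directly instead of being looked up per adapter.
    """
    # Case 1: the complete marker is already present in the buffer.
    if end_marker in buffered:
        return True
    # Case 2: some suffix of the buffer is a proper prefix of the marker,
    # so the next tokens could complete it.
    max_len = min(len(buffered), len(end_marker) - 1)
    return any(buffered.endswith(end_marker[:k]) for k in range(1, max_len + 1))


MARKER = "[[ ## end ## ]]"  # hypothetical end identifier, for illustration only

print(could_form_end_identifier("Hello world", MARKER))  # no overlap with the marker
print(could_form_end_identifier("Hello [[ ##", MARKER))  # ends with a marker prefix
```

With a check like this, text such as `Hello world` can be yielded right away, while text ending in `[[ ##` is held back until later tokens confirm or rule out the marker.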
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
dspy/streaming/streaming_listener.py | Implements smart buffering logic with new pattern matching capabilities and updates receive method |
tests/streaming/test_streaming.py | Adds comprehensive test coverage for the new _could_form_end_identifier method across all adapter types |
LGTM, as long as the method matches all potential cases. Thanks Tomu!
Looks great!
dspy/streaming/streaming_listener.py
elif not self._could_form_end_identifier(concat_message, adapter_name):
    # Buffer cannot form end identifier, safe to yield the oldest token
    # Keep at least 1 token in buffer in case next token creates end pattern
    if self.field_end_queue.qsize() > 1:
Shall we just call `flush()` here? This won't affect the use case we are tackling, but technically a direct `flush()` can fit here.
Currently, StreamListener always buffers up to 10 chunks while looking for the end token. This has two problems: (1) it delays token streaming unnecessarily, and (2) it may change the order of native response chunks. This PR changes the buffer condition so that chunks are buffered only when they could still form the end boilerplate.
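As a rough illustration of the behavior described above, the sketch below streams chunks through a conditional buffer. It is not the StreamListener code: `stream_field`, `_may_end`, and the `<<END>>` marker are hypothetical names chosen for this example. Chunks are held back only while the buffered text could still turn into the end marker; otherwise they are released at once and in their original order.

```python
from collections import deque
from typing import Iterable, Iterator

END_MARKER = "<<END>>"  # hypothetical end boilerplate, for illustration only


def _may_end(text: str, marker: str) -> bool:
    """True if `text` contains `marker` or ends with a proper prefix of it."""
    if marker in text:
        return True
    longest = min(len(text), len(marker) - 1)
    return any(text.endswith(marker[:k]) for k in range(1, longest + 1))


def stream_field(chunks: Iterable[str], marker: str = END_MARKER) -> Iterator[str]:
    """Yield chunks as early as possible, stopping at the end marker."""
    buffer: deque[str] = deque()
    for chunk in chunks:
        buffer.append(chunk)
        joined = "".join(buffer)
        if marker in joined:
            # Marker found: emit whatever preceded it, then stop streaming.
            head = joined.split(marker, 1)[0]
            if head:
                yield head
            return
        if not _may_end(joined, marker):
            # The buffered text cannot become the marker: release it in order.
            while buffer:
                yield buffer.popleft()
    # The stream ended without the marker: release anything still held back.
    while buffer:
        yield buffer.popleft()


print(list(stream_field(["Hel", "lo ", "<<", "END", ">>", "ignored"])))
# → ['Hel', 'lo ']
```

In this sketch the whole buffer is released once it cannot form the marker, which corresponds to the `flush()` suggestion in the review thread; the PR's diff instead keeps at least one token queued.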