server: improve speed of speculative decoding #17808
Server tests passed locally; this should be ready for review @ggerganov
Just a cosmetic bug: in the
@theo77186 You mean just the stdout/stderr log, right? (Not the stats returned by the API.) Edit: I think I need more details on the bug, as well as step-by-step reproduction instructions. Feel free to open a dedicated issue.
Both of them (logs and UI) were broken when the draft batch was always accepted (e.g. "count from 1 to 100"). I fixed it in f74d1ee.
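To make the "draft batch always accepted" case concrete, here is a minimal sketch of greedy draft verification with acceptance counters. It assumes greedy (argmax) sampling, and the names `DraftStats` and `verify_draft` are hypothetical: this is not the llama.cpp server's implementation, nor the change made in f74d1ee. It only illustrates why the fully-accepted case is an edge case for stats: every draft token matches, so the step emits `len(draft) + 1` tokens and no rejection ever happens.

```python
from dataclasses import dataclass


@dataclass
class DraftStats:
    """Hypothetical running counters for speculative decoding (illustration only)."""
    n_drafted: int = 0   # draft tokens proposed so far
    n_accepted: int = 0  # draft tokens confirmed by the target model

    def acceptance_rate(self) -> float:
        # Guard against division by zero before any draft has been made.
        return self.n_accepted / self.n_drafted if self.n_drafted else 0.0


def verify_draft(draft: list[int], target: list[int], stats: DraftStats) -> list[int]:
    """Greedy verification (assumed): accept the longest prefix of `draft` that
    matches the target model's argmax picks, then append one target token.

    `target` holds the target model's pick at each draft position plus one
    extra position after the last draft token, so len(target) == len(draft) + 1.
    """
    assert len(target) == len(draft) + 1
    n_ok = 0
    while n_ok < len(draft) and draft[n_ok] == target[n_ok]:
        n_ok += 1
    stats.n_drafted += len(draft)
    stats.n_accepted += n_ok
    # Accepted prefix plus the target's token at the first mismatch, or the
    # extra "bonus" token when every draft token matched (fully accepted).
    return target[: n_ok + 1]
```

With `draft = [5, 6, 7]` and `target = [5, 6, 7, 8]` all three draft tokens are accepted and four tokens are emitted; with `target = [5, 9, 7, 8]` only the first draft token is accepted and the step emits `[5, 9]`. Counting code that assumes a rejection always occurs would mishandle the first case.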
Fix #12968
I'm testing with:
So far the results are coherent.
How it works: