Noticeable for autocomplete, but all streaming endpoints are affected.

Cody Gateway doesn't flush messages as soon as it receives them; instead it seems to buffer (for a bit over a second) before it starts sending the first payload. This causes several issues:
- Multiline completions are slower because of it (we always have to wait for the first n lines, even if we only end up using the first line, etc.).
- The dynamic multiline experiment is blocked (it would need to remove the `\n` stop character, which leads to the same issue as the point above).
- Chat requires workarounds like the "typewriter effect" to make it appear that we're typing word by word rather than several lines at once (a rough sketch of that workaround follows after this list).

This can impact time-to-first-token for every streaming endpoint.
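For reference, the typewriter workaround boils down to re-chunking a large buffered payload on the client so the UI still updates incrementally. A minimal sketch, assuming a hypothetical `typewriter` helper (not Cody's actual implementation):

```typescript
// Minimal sketch of a "typewriter" smoother: instead of rendering a large
// buffered chunk at once, reveal it a few characters at a time so the UI
// still feels like it is streaming. Hypothetical helper, not Cody's code.
function typewriter(
    onUpdate: (visibleText: string) => void,
    charsPerTick = 3,
    tickMs = 15
): (chunk: string) => void {
    let pending = ''
    let shown = ''
    let timer: ReturnType<typeof setInterval> | undefined

    const tick = () => {
        if (pending.length === 0) {
            clearInterval(timer)
            timer = undefined
            return
        }
        shown += pending.slice(0, charsPerTick)
        pending = pending.slice(charsPerTick)
        onUpdate(shown)
    }

    return (chunk: string) => {
        pending += chunk
        if (timer === undefined) {
            timer = setInterval(tick, tickMs)
        }
    }
}

// Usage: feed buffered SSE chunks in; the callback fires with smoothed text.
const push = typewriter(text => console.log(text))
push('a whole paragraph that arrived in one buffered chunk')
```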
Expected behavior
The backend relays the first SSE chunk as soon as the inference provider sends it
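The Gateway itself isn't TypeScript, but the expected relay behavior is easy to illustrate with a small Node sketch: write every upstream SSE chunk to the client the moment it arrives, with no accumulation step. The upstream URL below is a placeholder, not the real provider endpoint.

```typescript
import http from 'node:http'

// Minimal sketch of the expected relay behavior: every SSE chunk coming from
// the inference provider is written to the client immediately, with no
// intermediate buffering. The upstream URL is a placeholder.
const UPSTREAM = 'https://inference.example.invalid/v1/stream'

http.createServer(async (_req, res) => {
    res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
    })

    const upstream = await fetch(UPSTREAM)
    const reader = upstream.body!.getReader()
    const decoder = new TextDecoder()

    // Relay chunk-by-chunk: the first token reaches the client as soon as the
    // provider emits it, instead of after a ~1s accumulation window.
    for (;;) {
        const { done, value } = await reader.read()
        if (done) break
        res.write(decoder.decode(value, { stream: true }))
    }
    res.end()
}).listen(4000)
```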
Closes #2205
This PR adds a new feature flag and experimental config option to enable
_hot streak mode_.

The idea is simple: when generating a completion (this mostly targets
single-line completions and works beautifully with dynamic multiline), we
let the LLM continue generating past the current line and use the
follow-up lines to seed a cache.
Then, when the user accepts a completion and moves to the next line by
pressing enter, we can instantly show another completion (and thus avoid
another LLM request and the latency it incurs). The result is a UX where
monotonous, single-line completions appear a lot faster, and you can fall
into a tab+enter, tab+enter, … rhythm in which you review a longer
completion line by line. It's also going to be much faster than multiline
completions because we can show the first line as soon as it is ready.
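Conceptually, the cache seeding works roughly like the sketch below (hypothetical names and a simplified key, not the PR's actual data structures): the first line of the LLM response is shown immediately, and each follow-up line is stored under the cursor position it would complete.

```typescript
// Simplified sketch of hot streak caching (hypothetical names, not the PR's
// actual implementation). The LLM is allowed to generate past the current
// line; the first line is shown now, the rest seeds a cache for the lines
// the cursor will reach after each accept + enter.
interface CacheKey {
    uri: string
    line: number // line the cursor will be on after accepting
}

const hotStreakCache = new Map<string, string>()
const keyOf = (k: CacheKey) => `${k.uri}:${k.line}`

function handleLLMResponse(uri: string, cursorLine: number, completion: string): string {
    const [firstLine, ...rest] = completion.split('\n')

    // Seed the cache: line N+1 of the completion becomes the instant
    // suggestion for cursor line N+1, and so on.
    rest.forEach((line, i) => {
        hotStreakCache.set(keyOf({ uri, line: cursorLine + 1 + i }), line)
    })

    return firstLine // shown to the user right away
}

// On the next completion request (user accepted and pressed enter), check the
// cache before issuing a new LLM request.
function getCompletion(uri: string, cursorLine: number): string | undefined {
    return hotStreakCache.get(keyOf({ uri, line: cursorLine }))
}
```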
⚠️ This is currently limited by #2180
## Test plan
- Enable hot streak mode
- Observe that after accepting a completion, pressing enter will show
the continuation of that completion ✨ instantly ✨
https://github.com/sourcegraph/cody/assets/458591/5f70ab45-18a2-4a4c-8d02-61a45328df47
After the changes on the client and the backend, this indeed seems fixed now! Here's a screenshot of incoming SSE event packets and the timestamps (in ms) between them. Previously, a bunch of messages arrived at the same time; now there's a reasonable delay between messages.

Huge props to @rafax for fixing this! I'm really curious whether we'll see a drop in latency in our E2E metrics now 👀
Version
latest
Additional context
c.f. https://sourcegraph.slack.com/archives/C05AGQYD528/p1701791953787829