bug: Cody Gateway batches streaming requests #2180

Closed
philipp-spiess opened this issue Dec 7, 2023 · 1 comment
Labels: bug (Something isn't working), clients/vscode
@philipp-spiess (Contributor)

Version

latest

Describe the bug

Noticeable for autocomplete, but all streaming endpoints are affected.

Cody Gateway doesn't flush messages as soon as it receives them but instead seems to buffer (for a bit over a second) before it starts sending the first payload. This causes several issues:

  • Multiline completions are slower because of it (we always have to wait for the first n lines, even if we end up using only the first line).
  • The dynamic multiline experiment is blocked (it would need to remove the \n stop character, which leads to the same issue as the point above).
  • Chat requires workarounds like the "typewriter effect" to make it appear that we're typing word by word rather than multiple lines at once.
  • This impacts time-to-first-token for every streaming endpoint.

Expected behavior

The backend relays the first SSE chunk as soon as the inference provider sends it.
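
For illustration, here is a minimal sketch of what such a pass-through relay could look like. This is not the actual Cody Gateway code; the upstream URL, request shape, and port are hypothetical:

```ts
// Sketch of an SSE relay that forwards every upstream chunk immediately,
// with no intermediate buffering. Requires Node 18+ (global fetch).
import http from 'node:http'

const UPSTREAM_URL = 'https://inference-provider.example/v1/stream' // hypothetical

http.createServer(async (req, res) => {
    // Open the streaming request to the inference provider.
    const upstream = await fetch(UPSTREAM_URL, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({ prompt: '...' }), // placeholder payload
    })

    res.writeHead(200, {
        'content-type': 'text/event-stream',
        'cache-control': 'no-cache',
    })

    // Forward each chunk to the client as soon as it arrives.
    for await (const chunk of upstream.body!) {
        res.write(chunk)
    }
    res.end()
}).listen(9992)
```

The key property is that the write to the client happens inside the same loop iteration that reads from the provider, so time-to-first-token is bounded by the provider, not by any gateway-side batching.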

Additional context

c.f. https://sourcegraph.slack.com/archives/C05AGQYD528/p1701791953787829

@philipp-spiess philipp-spiess added bug Something isn't working clients/vscode labels Dec 7, 2023
philipp-spiess added a commit that referenced this issue Dec 11, 2023
Closes #2205

This PR adds a new feature flag and experimental config option to enable
_hot streak mode_.

The idea is simple: when generating a completion (this is mostly for
single-line and works beautifully with dynamic-multiline), we let the LLM
continue generating beyond the current line and use the follow-up lines to
seed a cache.

Then, when a user accepts a completion and moves to the next line by
pressing enter, we can instantly show another completion (and thus avoid
another LLM request and the latency it incurs). The result is a UX where
monotonous, single-line completions appear a lot faster, and it allows you
to fall into a tab+enter, tab+enter, … rhythm in which you review a longer
completion line by line. It's also much faster than multiline completions
because we can show the first line as soon as it is ready.
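
A rough sketch of the idea, with hypothetical types and names (not the actual Cody implementation): the first generated line is shown as the completion, and each remaining line seeds a cache entry keyed by the position the cursor will reach after accept + enter.

```ts
// Hot-streak cache sketch: follow-up lines of one generation become instant
// completions for the lines below.
interface Position {
    line: number
    character: number
}

const hotStreakCache = new Map<string, string>()

function cacheKey(uri: string, position: Position): string {
    return `${uri}:${position.line}`
}

// Returns the line to show now; caches the rest for subsequent lines.
function seedHotStreakCache(uri: string, position: Position, generatedText: string): string {
    const [firstLine, ...followUpLines] = generatedText.split('\n')

    followUpLines.forEach((line, index) => {
        const nextPosition = { line: position.line + 1 + index, character: 0 }
        hotStreakCache.set(cacheKey(uri, nextPosition), line)
    })

    return firstLine
}

// Called when the cursor lands on a new line after accept + enter.
function getCachedCompletion(uri: string, position: Position): string | undefined {
    return hotStreakCache.get(cacheKey(uri, position))
}
```

Keying purely by cursor line keeps the lookup synchronous, so the accept-then-enter case can be answered without another network round trip.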

⚠️ This is currently limited by #2180

## Test plan

- Enable hot streak mode
- Observe that after accepting a completion, pressing enter will show
the continuation of that completion ✨ instantly ✨

https://github.com/sourcegraph/cody/assets/458591/5f70ab45-18a2-4a4c-8d02-61a45328df47
@philipp-spiess (Contributor, Author)

After the changes on the client and the backend, this indeed seems fixed now! Here's a screenshot of incoming SSE events and the timestamps (in ms) between them. Previously, a bunch of messages arrived at the same time. Now, there's a reasonable delay between messages.

(Screenshot: SSE event timings, 2023-12-11 at 19:47:56)
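
For reference, a minimal sketch (assumed endpoint and response shape) of how such inter-chunk timing can be captured on the client:

```ts
// Log the delay between incoming SSE chunks to check that the gateway is no
// longer batching them. Requires Node 18+ (global fetch, performance).
async function logChunkTimings(url: string): Promise<void> {
    const response = await fetch(url, { headers: { accept: 'text/event-stream' } })
    const reader = response.body!.getReader()
    let last = performance.now()

    while (true) {
        const { done } = await reader.read()
        if (done) break
        const now = performance.now()
        console.log(`chunk after ${Math.round(now - last)}ms`)
        last = now
    }
}
```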

Huge props to @rafax for fixing this! I'm really curious whether we'll see a drop in latency in our E2E metrics now 👀
