Autocomplete: Use streaming to early-terminate Anthropic requests #723

Merged: 15 commits into main from ps/autocomplete-anthropic-streaming, Aug 18, 2023

Conversation

philipp-spiess (Contributor) commented Aug 17, 2023

This adds backward-compatible support for using the new streaming backend added in sourcegraph/sourcegraph#55967.

It currently only works in Node.js because the node-fetch polyfill we use returns native Node streams and not unified browser streams. However, given that browser support is not official right now, I decided that for now (especially until we know this experiment is giving us wins) it's not worth adding a separate implementation.

This all works via the new stream boolean option on the request. When it is set and the server has the changes from sourcegraph/sourcegraph#55967, the server returns an SSE event stream. On the client, we can check the response headers to see whether we got a streaming response and handle it accordingly.
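As a rough, hypothetical sketch (the function name, endpoint, and params shape are stand-ins, not code from this PR), the client side of that handshake could look like this:

// Ask for the streaming variant and report whether the server honored it by
// inspecting the response headers.
async function requestStreamingCompletion(
    completionsEndpoint: string,
    params: Record<string, unknown>
): Promise<{ response: Response; isStreamingResponse: boolean }> {
    const response = await fetch(completionsEndpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ ...params, stream: true }),
    })
    // Older servers ignore the stream flag and reply with a plain JSON body,
    // so callers fall back to the non-streaming code path in that case.
    const isStreamingResponse =
        response.headers.get('content-type')?.startsWith('text/event-stream') ?? false
    return { response, isStreamingResponse }
}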

Architecture-wise this is not super nice yet and duplicates some of the post-processing code to find out whether the request is already finished. There are three conditions right now for detecting whether a streaming response is "fulfilled" (a rough sketch follows the list):

  1. A single-line completion can be terminated once we have one non-empty, full line of response.
  2. A multi-line completion can be terminated once the multi-line truncation logic (which currently uses indentation but might later use tree-sitter) starts to truncate the answer.
  3. A multi-line completion can be terminated if a line inside the new completion matches one that is already inside the document.
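A loose sketch of those three stop conditions (not the code in this PR; the real logic lives in the shared post-processing, and the truncation helper is passed in here only to keep the example self-contained):

// Returns true once a partially streamed completion is already usable.
function isStreamedCompletionFulfilled(
    partial: string,
    isMultiline: boolean,
    documentLines: string[],
    // Stand-in for the indentation-based multi-line truncation logic.
    truncate: (text: string) => string
): boolean {
    const lines = partial.split('\n')

    if (!isMultiline) {
        // 1. Single-line: one complete, non-empty line has arrived.
        return lines.length > 1 && lines[0].trim() !== ''
    }

    // 2. Multi-line: the truncation logic already cuts the completion short.
    if (truncate(partial).length < partial.length) {
        return true
    }

    // 3. Multi-line: a fully received line matches one already in the document.
    return lines.slice(0, -1).some(line => line.trim() !== '' && documentLines.includes(line))
}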

Test plan

Tested in combination with sourcegraph/sourcegraph#55967 and sourcegraph.com (which is behind this change)

@philipp-spiess philipp-spiess requested review from valerybugakov and a team August 17, 2023 14:00
@philipp-spiess philipp-spiess self-assigned this Aug 17, 2023
@philipp-spiess philipp-spiess marked this pull request as ready for review August 17, 2023 14:00
data: string
}

async function* createSSEDecoder(iterator: AsyncIterableIterator<BufferSource>): AsyncGenerator<SSEMessage> {
Contributor Author
Unimportant information: I think this is the first time I’m using an async generator function 😅

Member
Going advanced!

Member
@valerybugakov valerybugakov left a comment

Took some time to understand the details of the new streaming logic. Looks great, and it should significantly reduce the latency for multiline autocomplete! 💪


if (isStreamingResponse) {
try {
const iterator = createSSEDecoder(response.body as any)
Member
Based on a couple of related discussions, it looks like we can do the following to avoid any casting:

import type { ReadableStream } from 'node:stream/web'

const iterator = createSSEDecoder(response.body as ReadableStream<Uint8Array>)
async function* createSSEDecoder(iterator: ReadableStream<Uint8Array>): AsyncGenerator<SSEMessage> {}

Contributor Author

Oh nice! I also think I need to add some feature checking here. I just thought of this: if Node 18 has native fetch, does that mean we'll get a proper web stream here instead of a Node stream, and would that mean that e.g. in Agent or in a future VS Code release we'll have issues?

This polyfill is such a mess haha
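For what it's worth, one way such a feature check could look (a hypothetical sketch, not code from this PR) is to gate the streaming path on whether the body is actually async-iterable:

// Node streams implement Symbol.asyncIterator; web ReadableStreams only do so
// in some runtimes, so checking for the protocol avoids depending on which
// fetch implementation is in play.
function isAsyncIterable(body: unknown): body is AsyncIterable<Uint8Array> {
    return typeof body === 'object' && body !== null && Symbol.asyncIterator in body
}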

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valerybugakov I think there's some confusion here: the web ReadableStream is not available to us in Node 16 (VS Code), and the node-fetch polyfill returns a Node stream instead of a compatible ReadableStream (https://github.com/node-fetch/node-fetch#streams). That's why I went with the iterator pattern, which apparently is available on Node streams but not on native web streams.

The type cast is necessary because our type environment assumes that fetch is a standards-compliant implementation, which it is not. That's also why I'm gating this feature to only work in Node for now.

I changed the code to not rely on the global fetch but to get fetch from isomorphic-fetch instead. In Node environments, this will now always resolve to node-fetch and its incompatible buffer polyfill, so we only have to handle this one case for now. (I was worried that not monkey-patching global.fetch could lead to issues in newer Node.js versions, since they have added proper fetch support in the meantime, but I tested with the completions generation script using the local Node 20 and it worked fine.)

I'll add a comment.
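As background (a sketch under the assumption that node-fetch's response.body is a Node Readable, as its docs describe; not code from this PR): Node streams are async-iterable, which is what the decoder relies on.

import { Readable } from 'node:stream'

// Consume a Node Readable chunk by chunk; Node streams implement
// Symbol.asyncIterator, so `for await` works directly on the body.
async function* chunksOf(body: Readable): AsyncGenerator<string> {
    const decoder = new TextDecoder()
    for await (const chunk of body) {
        yield typeof chunk === 'string' ? chunk : decoder.decode(chunk as Uint8Array, { stream: true })
    }
}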

let index: number
while ((index = buffer.indexOf('\n\n')) >= 0) {
const message = buffer.slice(0, index)
buffer = buffer.slice(index + 2)
Member
Why do we add two here? 🤔

Contributor Author

We check for two newlines and then slice past them. I can add a const SSE_BREAK and use SSE_BREAK.length so it's clearer :)

Member

Nah, I figured it's related to new lines, but my brain processed them as '\n\n'.length === 4, so I was confused 😛
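For reference, a minimal version of the decoder with that SSE_BREAK constant might look like this (a sketch rather than the exact code in the PR; the event field on SSEMessage is an assumption):

interface SSEMessage {
    event: string
    data: string
}

// SSE messages are separated by a blank line, i.e. two consecutive newlines.
const SSE_BREAK = '\n\n'

async function* createSSEDecoder(iterator: AsyncIterable<BufferSource>): AsyncGenerator<SSEMessage> {
    const decoder = new TextDecoder()
    let buffer = ''
    for await (const chunk of iterator) {
        buffer += decoder.decode(chunk, { stream: true })

        let index: number
        while ((index = buffer.indexOf(SSE_BREAK)) >= 0) {
            const message = buffer.slice(0, index)
            buffer = buffer.slice(index + SSE_BREAK.length)

            // Each message is a set of `field: value` lines; we only care
            // about the event and data fields here.
            let event = ''
            let data = ''
            for (const line of message.split('\n')) {
                if (line.startsWith('event: ')) {
                    event = line.slice('event: '.length)
                } else if (line.startsWith('data: ')) {
                    data = line.slice('data: '.length)
                }
            }
            yield { event, data }
        }
    }
}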

vscode/src/completions/providers/anthropic.ts (resolved)
// eslint-disable-next-line @typescript-eslint/no-misused-promises, no-async-promise-executor
return new Promise(async (resolve, reject) => {
try {
const abortController = forkSignal(abortSignal)
Member
We already have a network controller one level higher in the call stack:

// We forward a different abort controller to the network request so we
// can cancel the network request independently of the user cancelling
// the completion.
const networkRequestAbortController = new AbortController()

Would it make sense to pass it down the call tree and rely on networkRequestAbortController.abort() to reduce the number of entities instead of forking it here?

Contributor Author

Yeah, I thought about this. The issue is that for the request manager there is just one multi-line request with n results, whereas for this logic we only want to cancel one of the n requests, hence the forking. If one request resolves early, the others might still need to sample a few more tokens.
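For context, forkSignal roughly does the following (a sketch of the helper, not necessarily its exact source): it derives a child controller that aborts with the parent but can also be aborted on its own, which is what lets one of the n streamed requests terminate early without cancelling its siblings.

// Derive a child AbortController from a parent signal. Aborting the parent
// aborts the child, but the child can also be aborted independently
// (e.g. when its streamed completion is already fulfilled).
function forkSignal(signal: AbortSignal): AbortController {
    const controller = new AbortController()
    if (signal.aborted) {
        controller.abort()
    } else {
        signal.addEventListener('abort', () => controller.abort(), { once: true })
    }
    return controller
}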

Member

If the other requests are helpful, they will be accessible via the UI — that makes sense. Let's add your comment to sources then 👍

Contributor Author

Yep. There's also a follow-up consideration to not wait for all three results for the UI (where we only show one). That was a bit out of scope for this PR, but I'll make sure to add a comment.

Comment on lines +5 to +16
/**
* Evaluates a partial completion response and returns true when we can already use it. This is used
* to terminate any streaming responses where we can get a token-by-token access to the result and
* want to terminate as soon as stop conditions are triggered.
*
* Right now this handles two cases:
* 1. When a single line completion is requested, it terminates after the first full line was
* received.
* 2. For a multi-line completion, it terminates when the completion will be truncated based on the
* multi-line indentation logic or an eventual match with a line already in the editor.
*/
export function canUsePartialCompletion(
Member
Comments 💜

Comment on lines +81 to +83
const isFeatureFlagEnabled = this.featureFlagProvider
? await this.featureFlagProvider.evaluateFeatureFlag(FeatureFlag.CodyAutocompleteStreamingResponse)
: false
Contributor Author
I validated that this works with some console.log statements. We still have to create the test on dotcom, but we'll have to wait for the server to support streaming anyway (that PR is already merged). cc @taras-yemets

@philipp-spiess philipp-spiess merged commit f469aad into main Aug 18, 2023
9 checks passed
@philipp-spiess philipp-spiess deleted the ps/autocomplete-anthropic-streaming branch August 18, 2023 16:31