Autocomplete: Add Experimental hot streak mode #2118
Conversation
Approving to unblock. This is very cool.
The code will be more robust over time if we tested .length === 0 instead of <= 0, and if we threw when the caller is obviously buggy (for example, onChunk after onEnd, noted inline).
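A minimal sketch of the "throw when the caller is obviously buggy" suggestion; the class and method names here are illustrative, not the PR's actual API:

```typescript
// Hypothetical stream handler: fail loudly on an impossible call order
// instead of silently ignoring it.
class CompletionStream {
    private ended = false
    private chunks: string[] = []

    onChunk(chunk: string): void {
        if (this.ended) {
            // Receiving a chunk after the stream ended is a caller bug.
            throw new Error('onChunk called after onEnd')
        }
        this.chunks.push(chunk)
    }

    onEnd(): string {
        this.ended = true
        return this.chunks.join('')
    }
}
```

Throwing here surfaces the bug at its source rather than letting a stale chunk corrupt later state.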
Some grammar nits inline. Kudos for cleaning up the spelling of multiline in many places.
The most confusing part for me is the document context updating "position." What that is and means isn't clear to me, the casual reader.
Changes are behind a flag anyway.

maxTokensToSample: MAX_RESPONSE_TOKENS,
stopSequences: undefined,
timeoutMs: 15_000,
stopSequences: ['\n\n', '\n\r\n'],
@valerybugakov Was there a specific reason that previously, in the dynamic multiline branch, we would continue generating even after two newlines?
Yep, there were cases where two newlines were present in function/class bodies. Keeping these stop sequences cut them off too early.
Can you give a concrete example? I feel like multi-line completions that try to implement more than one method in a class are always a bit too long for my personal preference, but maybe I'm missing something?
e.g. this case:
class Foo {
    bar() {
        // ...
    }
    baz() {
        // ...
    }
}
Feels like it's too much for a single completion, wdyt?
PR to revert this change for dynamic multiline completions: #2326
const requestParams: CodeCompletionsParams = {
    ...(useExtendedGeneration ? MULTI_LINE_COMPLETION_ARGS : SINGLE_LINE_COMPLETION_ARGS),
Nice
params.onCompletionReady({ ...completionItem, stopReason: 'streaming-truncation' })
resolve()
Do we use `onCompletionReady` instead of `resolve` to return the completion item to the caller only to make the naming explicit, or are there other reasons?
It's also so we can keep the `Promise` running while the network request is still running. It's not great - I was thinking of switching to streaming interfaces - but it does feel simpler this way. The Promise will run until the request is either cancelled or completed; however, the completion will be yielded as soon as it is ready to be displayed.
> It's also so we can keep the Promise running while the network request is still running
It's not apparent why we want this behavior. Could you elaborate?
@@ -90,21 +109,21 @@ export async function fetchAndProcessDynamicMultilineCompletions(
    }
} else {
    /**
     * This completion was started without the multline trigger at the end of current line.
     * Check if the the first completion line ends with the multline trigger. If that's the case
     * This completion was started without the multiline trigger at the end of current line.
ty 🙃
obj: {
    multiline,
},
})

if (completedCompletion) {
    hotStreakExtractor?.extract(rawCompletion, false)
Having an object as the only argument here would be helpful to explain the meaning of the second parameter.
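As a sketch of that suggestion (the interface and function body below are hypothetical, not the PR's actual `extract` signature):

```typescript
// At the call site, `extract(rawCompletion, false)` leaves the reader guessing
// what `false` means; a single options object makes the flag self-describing.
interface ExtractOptions {
    rawCompletion: string
    isRequestEnd: boolean
}

// Hypothetical body: keep only complete lines unless the request has ended.
function extract({ rawCompletion, isRequestEnd }: ExtractOptions): string {
    if (isRequestEnd) {
        return rawCompletion
    }
    return rawCompletion.slice(0, rawCompletion.lastIndexOf('\n') + 1)
}
```

Call sites then read `extract({ rawCompletion, isRequestEnd: false })`, which documents itself.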
const prompt = this.createPrompt(snippets)

const model =
    this.model === 'starcoder-hybrid'
        ? MODEL_MAP[multiline ? 'starcoder-16b' : 'starcoder-7b']
        : MODEL_MAP[this.model]

const timeoutMs: number = multiline
const useExtendedGeneration = multiline || dynamicMultilineCompletions || hotStreak
We can extract the shared bits from `provider.generateCompletions` into a separate function.
Great work, @philipp-spiess! I left some comments mainly focused on understanding the updated data flow.
const completion = canUsePartialCompletion(unprocessedCompletion, eventualDynamicMultilineProviderOptions)
if (completion) {
@philipp-spiess, should we continue sampling tokens only if the next sampled line is not empty? Consider the following scenario:

const value = █ 'foo'
function veryLongFunction() {
    ...
}

We continue sampling tokens until `canUsePartialCompletion` returns a completion (or the request ends, but we're not interested in this case). This means that after generating `'foo'`, we will sample the whole function declaration, which might be too much. It won't hurt the UX but can affect our spending significantly. WDYT?
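One way to encode the empty-line check being discussed, as a hypothetical helper (assuming the raw streamed text is available; this is not code from the PR):

```typescript
// Stop extending a completion once the model emits an empty line, which
// usually marks a scope boundary; this caps token spend during hot streak.
function shouldKeepSampling(unprocessedCompletion: string): boolean {
    const lines = unprocessedCompletion.split('\n')
    // The last element may be a partial line still being streamed; inspect
    // the most recent *complete* line instead.
    const lastCompleteLine = lines.length > 1 ? lines[lines.length - 2] : undefined
    return lastCompleteLine === undefined || lastCompleteLine.trim().length > 0
}
```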
I think in this case the `stopSequence` of `\n\n` should stop it. 😅 Do you think it's better if we add explicit checks for empty lines in the code?
I missed the part where you changed stop sequences for dynamic multiline completions. I think we should remove stop sequences for dynamic multiline; then my concern there would be valid if we have both hot-streak and dynamic multiline enabled.
providers.map(provider => {
    const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
        provider
            .generateCompletions(
A named object argument would be helpful to understand the meaning of the params.
        )
)
providers.map(provider => {
    const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
We must create another promise wrapper to make this call compatible with `onCompletionReady`. I would like to understand why relying on the native promise resolve mechanism is insufficient here. E.g., `await provider.generateCompletions(...)`.
Hmm, I guess you're right. I thought that conceptually it would be better if we could communicate to the caller how long the network request stays open. I had a future optimization in mind where we keep generating until one of the completions is rejected. If that's the case and the request is still running, we know that we're emitting hot streaks only and the generation can be terminated.
Another option is to return two promises: one for the main completion and one for the network request. WDYT?
> Another option is to return two promises: one for the main completion and one for the network request. WDYT?
I like this option more because it's more explicit. It's unclear why this mechanism (long-running promise + callback) was chosen without comments explaining the rationale you shared above. Having two promises would be more self-explanatory.
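The two-promise alternative could look roughly like this (a sketch with illustrative names and a fake in-memory stream, not the PR's actual types):

```typescript
// Separate the "first displayable completion" from the request lifetime,
// instead of one long-running promise plus an onCompletionReady callback.
interface GenerationHandle {
    completion: Promise<string> // resolves as soon as something is displayable
    requestFinished: Promise<void> // resolves when the network request ends
}

function generate(stream: string[]): GenerationHandle {
    let resolveCompletion!: (value: string) => void
    const completion = new Promise<string>(resolve => {
        resolveCompletion = resolve
    })
    const requestFinished = (async () => {
        const [first, ...rest] = stream
        resolveCompletion(first) // displayable as soon as the first line arrives
        for (const _line of rest) {
            // ...later lines could seed a hot-streak cache while the
            // request keeps running.
        }
    })()
    return { completion, requestFinished }
}
```

A caller can `await handle.completion` to display something early and still observe `handle.requestFinished` to know when the network request is done.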
Closes #2205
This PR adds a new feature flag and experimental config option to enable hot streak mode.
The idea is simple: when generating a completion (this is mostly for single-line completions and works beautifully with dynamic multiline), we let the LLM continue generating more than just the current line and use the follow-up lines to seed a cache.
Then, when a user accepts a completion and moves to the next line by pressing enter, we can instantly show another completion (and thus avoid doing another LLM request and incurring its latency). The result is a UX where monotonous, single-line completions appear a lot faster and let you fall into a tab+enter, tab+enter, … mode in which you can review a longer completion line by line. It's also going to be much faster than multiline completions because we can show the first line as soon as it is ready.
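Conceptually, the hot-streak cache works something like the sketch below (a simplified illustration with invented names, not the PR's actual implementation):

```typescript
// Simplified hot-streak sketch: the first generated line is shown
// immediately, and the remaining lines seed a cache keyed by the line
// number they would next be requested from, so pressing enter can hit
// the cache instead of issuing another LLM request.
const hotStreakCache = new Map<number, string>()

function processGeneration(startLine: number, generatedLines: string[]): string {
    const [firstLine, ...rest] = generatedLines
    rest.forEach((line, i) => {
        // Cache each follow-up line under the line where the cursor will be
        // after the user accepts everything above it.
        hotStreakCache.set(startLine + 1 + i, line)
    })
    return firstLine // displayed right away
}

function completionFor(line: number): string | undefined {
    return hotStreakCache.get(line) // instant, no LLM round trip
}
```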
Test plan
Screen.Recording.2023-12-08.at.13.00.52.mov