
Autocomplete: Add Experimental hot streak mode #2118

Merged
merged 13 commits into from
Dec 11, 2023

Conversation

philipp-spiess
Contributor

@philipp-spiess philipp-spiess commented Dec 5, 2023

Closes #2205

This PR adds a new feature flag and experimental config option to enable hot streak mode.

The idea is simple: when generating a completion (this mostly applies to single-line completions and works beautifully with dynamic-multiline), we let the LLM continue generating beyond the current line and use the follow-up lines to seed a cache.

Then, when a user accepts a completion and moves to the next line by pressing enter, we can instantly show another completion (avoiding another LLM request and the latency it incurs). The result is a UX where monotonous, single-line completions appear much faster, and it allows you to fall into a tab+enter, tab+enter, … rhythm in which you can review a longer completion line by line. It is also much faster than multiline completions because we can show the first line as soon as it is ready.

⚠️ This is currently limited by #2180
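As a rough sketch of the mechanism described above (all names here are illustrative, not the extension's actual API): the LLM's multi-line generation is split so that the first line is shown immediately, while the remaining lines seed a cache keyed by the line the cursor lands on after each accept.

```typescript
// Hypothetical sketch of hot streak caching. The first generated line is
// the immediate completion; each follow-up line becomes an instant
// completion for the line the cursor reaches after pressing enter.
interface HotStreakCache {
  [lineNumber: number]: string
}

function seedHotStreakCache(
  generation: string,
  acceptedAtLine: number
): { firstLine: string; cache: HotStreakCache } {
  const lines = generation.split('\n')
  const firstLine = lines[0]
  const cache: HotStreakCache = {}
  // Line N+1 of the generation serves the cursor position N lines below
  // the accepted completion, so no new LLM request is needed there.
  lines.slice(1).forEach((line, i) => {
    cache[acceptedAtLine + 1 + i] = line
  })
  return { firstLine, cache }
}
```

On accept, the provider would look up the new cursor line in the cache before issuing a network request.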

Test plan

  • Enable hot streak mode
  • Observe that after accepting a completion, pressing enter will show the continuation of that completion ✨ instantly ✨
Screen.Recording.2023-12-08.at.13.00.52.mov

@philipp-spiess philipp-spiess changed the title Experimental hot streak Autocomplete: Add Experimental hot streak mode Dec 8, 2023
@philipp-spiess philipp-spiess requested review from valerybugakov and a team December 8, 2023 17:49
@philipp-spiess philipp-spiess self-assigned this Dec 8, 2023
@philipp-spiess philipp-spiess marked this pull request as ready for review December 8, 2023 17:53
Contributor

@dominiccooney dominiccooney left a comment

Approving to unblock. This is very cool.

The code will be more robust over time if we tested .length === 0 instead of <= 0, and if we threw when the caller is obviously buggy (for example, onChunk after onEnd, noted inline).

Some grammar nits inline. Kudos for cleaning up the spelling of multiline in many places.

The most confusing part for me is the document context updating "position." What that is and what it means isn't clear to me, the casual reader.
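The defensive style suggested in this review might look like the following minimal sketch (`ChunkCollector` and its methods are hypothetical names, not code from the PR):

```typescript
// Sketch of the suggested defensive style: test the exact empty case
// rather than `<= 0`, and throw when a caller is obviously buggy
// (onChunk called after onEnd).
class ChunkCollector {
  private chunks: string[] = []
  private ended = false

  onChunk(chunk: string): void {
    if (this.ended) {
      // Surfacing the caller's bug loudly beats silently accepting data.
      throw new Error('onChunk called after onEnd')
    }
    this.chunks.push(chunk)
  }

  onEnd(): string {
    this.ended = true
    // `=== 0` states the intent precisely; a negative length is impossible.
    if (this.chunks.length === 0) {
      return ''
    }
    return this.chunks.join('')
  }
}
```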

vscode/src/completions/get-current-doc-context.test.ts Outdated Show resolved Hide resolved
vscode/src/completions/get-current-doc-context.test.ts Outdated Show resolved Hide resolved
vscode/src/completions/get-current-doc-context.ts Outdated Show resolved Hide resolved
vscode/src/completions/get-current-doc-context.ts Outdated Show resolved Hide resolved
vscode/src/completions/providers/hot-streak.ts Outdated Show resolved Hide resolved
vscode/src/completions/providers/hot-streak.ts Outdated Show resolved Hide resolved
vscode/src/completions/text-processing/utils.ts Outdated Show resolved Hide resolved
vscode/src/completions/text-processing/utils.ts Outdated Show resolved Hide resolved
vscode/src/completions/text-processing/utils.ts Outdated Show resolved Hide resolved
@philipp-spiess
Copy link
Contributor Author

Changes are behind a flag anyways. :shipit:

maxTokensToSample: MAX_RESPONSE_TOKENS,
stopSequences: undefined,
timeoutMs: 15_000,
stopSequences: ['\n\n', '\n\r\n'],
Contributor Author

@valerybugakov Was there a specific reason that previously, in the dynamic multiline branch, we would continue generating even after two newlines?

Member

Yep, there were cases where two newlines were present in function/class bodies. Keeping these stop sequences cut them off too early.

Contributor Author

Can you give a concrete example? I feel like multi-line completions that try to implement more than one method in a class are always a bit too long for my personal preference but maybe I miss something?

e.g. this case:

class Foo {
  bar() {
    // ...
   }

  baz() {
    // ...
   }
}

Feels like it's too much for a single completion, wdyt?

Member

I should have noticed in my review that this change applies to dynamic multiline completions. I think we should revert it because it leads to multiline completions being truncated too early.

Screenshot 2023-12-13 at 14 32 16

There's a \n\n after this line, and the function implementation continues.

Member

PR to revert this change for dynamic multiline completions: #2326

@philipp-spiess philipp-spiess merged commit a0a665e into main Dec 11, 2023
15 checks passed
@philipp-spiess philipp-spiess deleted the ps/ac-hotstreak-mode branch December 11, 2023 15:44

const requestParams: CodeCompletionsParams = {
...(useExtendedGeneration ? MULTI_LINE_COMPLETION_ARGS : SINGLE_LINE_COMPLETION_ARGS),
Member

Nice

Comment on lines +50 to +51
params.onCompletionReady({ ...completionItem, stopReason: 'streaming-truncation' })
resolve()
Member

Do we use onCompletionReady instead of resolve to return the completion item to the caller only to make the naming explicit, or are there other reasons?

Contributor Author

It's also so we can keep the Promise running while the network request is still in flight. It's not great (I was considering switching to streaming interfaces), but it does feel simpler this way. The Promise runs until the request is either cancelled or completed; however, the completion is yielded as soon as it is ready to be displayed.

Member

It's also so we can keep the Promise running while the network request is still running

It's not apparent why we want this behavior. Could you elaborate?

@@ -90,21 +109,21 @@ export async function fetchAndProcessDynamicMultilineCompletions(
}
} else {
/**
* This completion was started without the multline trigger at the end of current line.
* Check if the the first completion line ends with the multline trigger. If that's the case
* This completion was started without the multiline trigger at the end of current line.
Member

ty 🙃

obj: {
multiline,
},
})

if (completedCompletion) {
hotStreakExtractor?.extract(rawCompletion, false)
Member

Having an object as the only argument here would be helpful to explain the meaning of the second parameter.
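The suggestion could look roughly like this (the `ExtractOptions` shape and its field names are hypothetical, chosen only to illustrate the pattern):

```typescript
// Sketch of replacing a bare boolean positional parameter with a single
// options object so the call site documents itself.
interface ExtractOptions {
  rawCompletion: string
  isRequestEnd: boolean
}

function extract({ rawCompletion, isRequestEnd }: ExtractOptions): string {
  // `extract({ rawCompletion, isRequestEnd: false })` explains itself,
  // unlike `extract(rawCompletion, false)`.
  return isRequestEnd ? rawCompletion.trimEnd() : rawCompletion
}
```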

const prompt = this.createPrompt(snippets)

const model =
this.model === 'starcoder-hybrid'
? MODEL_MAP[multiline ? 'starcoder-16b' : 'starcoder-7b']
: MODEL_MAP[this.model]

const timeoutMs: number = multiline
const useExtendedGeneration = multiline || dynamicMultilineCompletions || hotStreak
Member

We can extract the shared bits from provider.generateCompletions into a separate function.

Member

@valerybugakov valerybugakov left a comment

Great work, @philipp-spiess! I left some comments mainly focused on understanding the updated data flow.

Comment on lines +66 to +67
const completion = canUsePartialCompletion(unprocessedCompletion, eventualDynamicMultilineProviderOptions)
if (completion) {
Member

@philipp-spiess, should we continue sampling tokens only if the next sampled line is not empty? Consider the following scenario:

const value =  'foo'

function veryLongFunction() {
  ...
}

We continue sampling tokens until canUsePartialCompletion returns a completion (or the request ends, but we're not interested in this case). This means that after generating 'foo', we will sample the whole function declaration, which might be too much. It won't hurt the UX but can affect our spending significantly. WDYT?
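The reviewer's idea (stop sampling once the next generated line is empty, so a short completion is not followed by a whole function body) could be sketched like this; the helper name is hypothetical:

```typescript
// Sketch: stop sampling hot-streak tokens when the latest complete line
// is empty, instead of continuing into the next declaration and paying
// for tokens we are unlikely to show.
function shouldStopSampling(generatedSoFar: string): boolean {
  const lines = generatedSoFar.split('\n')
  // An empty line after at least one complete line signals a block
  // boundary (e.g. between `const value = 'foo'` and the next function).
  return lines.length > 1 && lines[lines.length - 1].trim() === ''
}
```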

Contributor Author

I think in this case the stopSequence of \n\n should stop it. 😅 Do you think it's better if we add explicit checks for empty lines in the code?

Member

I missed the part where you changed stop sequences for dynamic multiline completions. I think we should remove stop sequences for dynamic multiline; then my concern there would be valid if we have both hot-streak and dynamic multiline enabled.

providers.map(provider => {
const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
provider
.generateCompletions(
Member

Named object argument would be helpful to understand the meaning of the params.

)
)
providers.map(provider => {
const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
Member

We must create another promise wrapper to make this call compatible with onCompletionReady. I would like to understand why relying on the native promise resolve mechanism is insufficient here. E.g., await provider.generateCompletions(...)

Contributor Author

Hmm, I guess you're right. I thought that conceptually it would be better if we could communicate to the caller how long the network request stays open. I had a future optimization in mind where we keep generation running until one of the completions gets rejected. If that happens while the request is still running, we know we're only emitting hot streaks and the generation can be terminated.

Another option is to return two promises: one for the main completion and one for the network request. WDYT?

Member

Another option is to return two promises: one for the main completion and one for the network request. WDYT?

I like this option more because it's more explicit. It's unclear why this mechanism (long-running promise + callback) was chosen without comments explaining the rationale you shared above. Having two promises would be more self-explanatory.
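A minimal sketch of the two-promise shape being discussed, under the assumption that the first emitted item is the displayable completion (the types and names here are hypothetical, not the merged implementation):

```typescript
// Sketch: return two promises instead of one promise plus a callback.
// `completion` resolves as soon as an item is displayable; `requestDone`
// resolves when the underlying network request finishes.
interface GenerationHandle<T> {
  completion: Promise<T>
  requestDone: Promise<void>
}

function generate(
  produce: (emit: (item: string) => void) => Promise<void>
): GenerationHandle<string> {
  let emitFirst: (item: string) => void = () => {}
  const completion = new Promise<string>(resolve => {
    // The executor runs synchronously, so emitFirst is set before produce.
    emitFirst = resolve
  })
  const requestDone = produce(emitFirst)
  return { completion, requestDone }
}
```

A caller can then `await handle.completion` to display the item while separately observing `handle.requestDone` to know when the request (and any hot-streak sampling) is over.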

valerybugakov added a commit that referenced this pull request Dec 13, 2023 (#2326)

- Stop sequences were intentionally removed from the dynamic multiline
completion request params.
- This PR reverts relevant changes from
#2118 to preserve this change.
Successfully merging this pull request may close these issues.

Pre-caching follow-up completions