Autocomplete: Add Experimental hot streak mode #2118
Conversation
Approving to unblock. This is very cool.
The code will be more robust over time if we tested .length === 0 instead of <= 0, and if we threw when the caller is obviously buggy (for example, onChunk after onEnd, noted inline).
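A minimal sketch of the "throw when the caller is obviously buggy" suggestion; the class and method names here are illustrative, not the PR's actual API:

```typescript
// Hypothetical stream handler: fail loudly on an impossible call order
// instead of silently ignoring it.
class CompletionStream {
    private ended = false
    private chunks: string[] = []

    onChunk(chunk: string): void {
        if (this.ended) {
            // Receiving a chunk after the stream ended is a caller bug.
            throw new Error('onChunk called after onEnd')
        }
        this.chunks.push(chunk)
    }

    onEnd(): string {
        this.ended = true
        return this.chunks.join('')
    }
}
```

Throwing here surfaces the bug at its source rather than letting a stale chunk corrupt later state.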
Some grammar nits inline. Kudos for cleaning up the spelling of multiline in many places.
The most confusing part for me is the document context updating "position." What that is and means isn't clear to me, the casual reader.
Changes are behind a flag anyway.

maxTokensToSample: MAX_RESPONSE_TOKENS,
stopSequences: undefined,
timeoutMs: 15_000,
stopSequences: ['\n\n', '\n\r\n'],
@valerybugakov Was there a specific reason that previously, in the dynamic multiline branch, we would continue generating even after two newlines?
Yep, there were cases where two newlines were present in function/class bodies. Keeping these stop sequences cut them off too early.
Can you give a concrete example? I feel like multi-line completions that try to implement more than one method in a class are always a bit too long for my personal preference, but maybe I'm missing something?
e.g. this case:
class Foo {
    bar() {
        // ...
    }
    baz() {
        // ...
    }
}
Feels like it's too much for a single completion, wdyt?
PR to revert this change for dynamic multiline completions: #2326
const requestParams: CodeCompletionsParams = {
    ...(useExtendedGeneration ? MULTI_LINE_COMPLETION_ARGS : SINGLE_LINE_COMPLETION_ARGS),
Nice
params.onCompletionReady({ ...completionItem, stopReason: 'streaming-truncation' })
resolve()
Do we use `onCompletionReady` instead of `resolve` to return the completion item to the caller only to make the naming explicit, or are there other reasons?
It's also so we can keep the `Promise` running while the network request is still running. It's not great - I was thinking of switching to streaming interfaces - but it does feel simpler this way. The Promise will run until the request is either cancelled or completed; however, the completion will be yielded as soon as it is ready to be displayed.
> It's also so we can keep the Promise running while the network request is still running
It's not apparent why we want this behavior. Could you elaborate?
@@ -90,21 +109,21 @@ export async function fetchAndProcessDynamicMultilineCompletions(
    }
} else {
    /**
     * This completion was started without the multline trigger at the end of current line.
     * Check if the the first completion line ends with the multline trigger. If that's the case
     * This completion was started without the multiline trigger at the end of current line.
ty 🙃
obj: {
    multiline,
},
})

if (completedCompletion) {
    hotStreakExtractor?.extract(rawCompletion, false)
Having an object as the only argument here would be helpful to explain the meaning of the second parameter.
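As a sketch of that suggestion (the interface and function body below are hypothetical, not the PR's actual `extract` signature):

```typescript
// At the call site, `extract(rawCompletion, false)` leaves the reader guessing
// what `false` means; a single options object makes the flag self-describing.
interface ExtractOptions {
    rawCompletion: string
    isRequestEnd: boolean
}

// Hypothetical body: keep only complete lines unless the request has ended.
function extract({ rawCompletion, isRequestEnd }: ExtractOptions): string {
    if (isRequestEnd) {
        return rawCompletion
    }
    return rawCompletion.slice(0, rawCompletion.lastIndexOf('\n') + 1)
}
```

Call sites then read `extract({ rawCompletion, isRequestEnd: false })`, which documents itself.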
const prompt = this.createPrompt(snippets)

const model =
    this.model === 'starcoder-hybrid'
        ? MODEL_MAP[multiline ? 'starcoder-16b' : 'starcoder-7b']
        : MODEL_MAP[this.model]

const timeoutMs: number = multiline
const useExtendedGeneration = multiline || dynamicMultilineCompletions || hotStreak
We can extract the shared bits from `provider.generateCompletions` into a separate function.
Great work, @philipp-spiess! I left some comments mainly focused on understanding the updated data flow.
const completion = canUsePartialCompletion(unprocessedCompletion, eventualDynamicMultilineProviderOptions)
if (completion) {
@philipp-spiess, should we continue sampling tokens only if the next sampled line is not empty? Consider the following scenario:

const value = █ 'foo'
function veryLongFunction() {
    ...
}

We continue sampling tokens until `canUsePartialCompletion` returns a completion (or the request ends, but we're not interested in this case). This means that after generating `'foo'`, we will sample the whole function declaration, which might be too much. It won't hurt the UX but can affect our spending significantly. WDYT?
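One way to encode the empty-line check being discussed, as a hypothetical helper (assuming the raw streamed text is available; this is not code from the PR):

```typescript
// Stop extending a completion once the model emits an empty line, which
// usually marks a scope boundary; this caps token spend during hot streak.
function shouldKeepSampling(unprocessedCompletion: string): boolean {
    const lines = unprocessedCompletion.split('\n')
    // The last element may be a partial line still being streamed; inspect
    // the most recent *complete* line instead.
    const lastCompleteLine = lines.length > 1 ? lines[lines.length - 2] : undefined
    return lastCompleteLine === undefined || lastCompleteLine.trim().length > 0
}
```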
I think in this case the `stopSequence` of `\n\n` should stop it. 😅 Do you think it's better if we add explicit checks for empty lines in the code?
I missed the part where you changed stop sequences for dynamic multiline completions. I think we should remove stop sequences for dynamic multiline; then my concern there would be valid if we have both hot-streak and dynamic multiline enabled.
providers.map(provider => {
    const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
        provider
            .generateCompletions(
A named object argument would be helpful to understand the meaning of the params.
        )
)
providers.map(provider => {
    const completionReadyPromise = new Promise<InlineCompletionItemWithAnalytics[]>((resolve, reject) => {
We must create another promise wrapper to make this call compatible with `onCompletionReady`. I would like to understand why relying on the native promise resolve mechanism is insufficient here. E.g., `await provider.generateCompletions(...)`.
Hmm, I guess you're right. I thought that conceptually it would be better if we could communicate to the caller how long the network request stays open. I had a future optimization in mind where we keep generating until one of the completions is rejected. If that's the case and the request is still running, we know that we're emitting hot streaks only and the generation can be terminated.
Another option is to return two promises: one for the main completion and one for the network request. WDYT?
> Another option is to return two promises: one for the main completion and one for the network request. WDYT?
I like this option more because it's more explicit. It's unclear why this mechanism (long-running promise + callback) was chosen without comments explaining the rationale you shared above. Having two promises would be more self-explanatory.
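The two-promise alternative could look roughly like this (a sketch with illustrative names and a fake in-memory stream, not the PR's actual types):

```typescript
// Separate the "first displayable completion" from the request lifetime,
// instead of one long-running promise plus an onCompletionReady callback.
interface GenerationHandle {
    completion: Promise<string> // resolves as soon as something is displayable
    requestFinished: Promise<void> // resolves when the network request ends
}

function generate(stream: string[]): GenerationHandle {
    let resolveCompletion!: (value: string) => void
    const completion = new Promise<string>(resolve => {
        resolveCompletion = resolve
    })
    const requestFinished = (async () => {
        const [first, ...rest] = stream
        resolveCompletion(first) // displayable as soon as the first line arrives
        for (const _line of rest) {
            // ...later lines could seed a hot-streak cache while the
            // request keeps running.
        }
    })()
    return { completion, requestFinished }
}
```

A caller can `await handle.completion` to display something early and still observe `handle.requestFinished` to know when the network request is done.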
Closes #2205
This PR adds a new feature flag and experimental config option to enable hot streak mode.
The idea is simple: when generating a completion (this is mostly for single-line completions and works beautifully with dynamic multiline), we let the LLM continue generating more than just the current line and use the follow-up lines to seed a cache.
Then, when a user accepts a completion and moves to the next line by pressing enter, we can instantly show another completion (and thus avoid doing another LLM request and incurring its latency). The result is a UX where monotonous, single-line completions appear a lot faster and let you fall into a tab+enter, tab+enter, … mode in which you can review a longer completion line by line. It's also going to be much faster than multiline completions because we can show the first line as soon as it is ready.
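Conceptually, the hot-streak cache works something like the sketch below (a simplified illustration with invented names, not the PR's actual implementation):

```typescript
// Simplified hot-streak sketch: the first generated line is shown
// immediately, and the remaining lines seed a cache keyed by the line
// number they would next be requested from, so pressing enter can hit
// the cache instead of issuing another LLM request.
const hotStreakCache = new Map<number, string>()

function processGeneration(startLine: number, generatedLines: string[]): string {
    const [firstLine, ...rest] = generatedLines
    rest.forEach((line, i) => {
        // Cache each follow-up line under the line where the cursor will be
        // after the user accepts everything above it.
        hotStreakCache.set(startLine + 1 + i, line)
    })
    return firstLine // displayed right away
}

function completionFor(line: number): string | undefined {
    return hotStreakCache.get(line) // instant, no LLM round trip
}
```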
Test plan
Screen.Recording.2023-12-08.at.13.00.52.mov