LLM-enhanced keyword context #52815
Conversation
```diff
@@ -13,10 +15,17 @@ const DEFAULT_CHAT_COMPLETION_PARAMETERS: Omit<CompletionParameters, 'messages'>
 export class ChatClient {
     constructor(private completions: SourcegraphCompletionsClient) {}

-    public chat(messages: Message[], cb: CompletionCallbacks): () => void {
+    public chat(messages: Message[], cb: CompletionCallbacks, params?: Partial<ChatParameters>): () => void {
```
Pass through the underlying completions parameters, so we can set things like temperature.
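A minimal sketch of what that pass-through looks like (the `ChatParameters` field names and default values below are assumptions for illustration, not the actual Cody API):

```typescript
// Hypothetical sketch: merge caller-supplied parameters over the defaults.
// Field names and defaults are illustrative assumptions.
interface ChatParameters {
    temperature: number
    maxTokensToSample: number
}

const DEFAULT_CHAT_COMPLETION_PARAMETERS: ChatParameters = {
    temperature: 0.2,
    maxTokensToSample: 1000,
}

function resolveParameters(params?: Partial<ChatParameters>): ChatParameters {
    // Caller overrides (e.g. a low temperature for keyword generation)
    // take precedence over the defaults.
    return { ...DEFAULT_CHAT_COMPLETION_PARAMETERS, ...params }
}
```

A caller that only cares about temperature can pass `{ temperature: 0 }` and keep every other default.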
```diff
@@ -20,18 +21,19 @@ export interface TranscriptJSON {
 }

 /**
  * A transcript of a conversation between a human and an assistant.
+ * The "model" class that tracks the call and response of the Cody chat box.
+ * Any "controller" logic belongs outside of this class.
```
We simplify the `Transcript` class to be a pure model class and avoid having seemingly non-mutative methods like `toPrompt` (below) trigger surprising mutations of internal state.
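As a rough illustration of the "pure model" shape being described (class and method names here are a sketch, not the actual `Transcript` API):

```typescript
// Illustrative only: a model class whose read methods never mutate state.
class TranscriptSketch {
    constructor(private readonly messages: { speaker: string; text: string }[]) {}

    public addMessage(message: { speaker: string; text: string }): TranscriptSketch {
        // Mutation is explicit: return a new transcript rather than
        // silently changing internal caches.
        return new TranscriptSketch([...this.messages, message])
    }

    public toPrompt(): string {
        // Pure read: computes the prompt without touching internal state,
        // so calling it twice always yields the same result.
        return this.messages.map(m => `${m.speaker}: ${m.text}`).join('\n')
    }
}
```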
```diff
@@ -1,97 +1,68 @@
 import { ContextMessage, ContextFile } from '../../codebase-context/messages'
 import { PromptMixin } from '../../prompt/prompt-mixin'
 import { Message } from '../../sourcegraph-api'

 import { ChatMessage, InteractionMessage } from './messages'
```
Likewise, we also make the `Interaction` class a simple model class, without the need to hackily trigger the computation of `cachedContextFiles` on creation.
```diff
 import { ChatMessage, InteractionMessage } from './messages'

 export interface InteractionJSON {
     humanMessage: InteractionMessage
     assistantMessage: InteractionMessage
-    context: ContextMessage[]
+    fullContext: ContextMessage[]
+    usedContextFiles: ContextFile[]
```
Instead of having `context` and `cachedContextFiles`, where it's confusing when to use either field, we have the following fields:
- `fullContext`: the complete set of context messages we'd read if we had an infinite context window. This is set when the interaction is first created, before the prompt is computed.
- `usedContextFiles`: the set of context files we actually read into the finite context window. This is set after we've computed the prompt (and therefore determined how many context files fit into the prompt context window).
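A hedged sketch of the relationship between the two fields (the character-budget packing below is invented for illustration; the real code budgets prompt tokens):

```typescript
// Illustrative types; the real ContextMessage/ContextFile are richer.
interface ContextFileSketch { fileName: string }
interface ContextMessageSketch { text: string; file: ContextFileSketch }

// Hypothetical: pack fullContext messages into a finite budget and record
// which files actually made it in as usedContextFiles.
function packContext(
    fullContext: ContextMessageSketch[],
    charBudget: number
): { used: ContextMessageSketch[]; usedContextFiles: ContextFileSketch[] } {
    const used: ContextMessageSketch[] = []
    let remaining = charBudget
    for (const message of fullContext) {
        if (message.text.length > remaining) break
        remaining -= message.text.length
        used.push(message)
    }
    return { used, usedContextFiles: used.map(m => m.file) }
}
```

The UI then shows only `usedContextFiles`, never the full candidate list.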
```diff
@@ -21,7 +21,9 @@ export class CodebaseContext {
     private codebase: string | undefined,
     private embeddings: EmbeddingsSearch | null,
     private keywords: KeywordContextFetcher | null,
+    private filenames: FilenameContextFetcher | null,
     private unifiedContextFetcher?: UnifiedContextFetcher | null
```
Introducing a new type of local context fetcher that looks at the filename.
```diff
-    private unifiedContextFetcher?: UnifiedContextFetcher | null
     private filenames: FilenameContextFetcher | null,
+    private unifiedContextFetcher?: UnifiedContextFetcher | null,
+    private rerank?: (query: string, results: ContextResult[]) => Promise<ContextResult[]>
```
Introduce a reranking mechanism, to rerank results from different context providers.
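The `rerank` hook's signature suggests something like the following (the term-overlap scoring here is a stand-in for illustration; the PR's version asks the fast LLM to judge relevance):

```typescript
interface ContextResult { fileName: string; content: string }

// Stand-in reranker matching the hook's shape: orders results by naive
// term overlap with the query. The real implementation would call the
// fast LLM to score relevance instead.
async function overlapRerank(query: string, results: ContextResult[]): Promise<ContextResult[]> {
    const terms = query.toLowerCase().split(/\W+/).filter(Boolean)
    const score = (r: ContextResult): number =>
        terms.filter(t => r.content.toLowerCase().includes(t)).length
    // Sort a copy so the caller's array is untouched.
    return [...results].sort((a, b) => score(b) - score(a))
}
```

Because it is injected as a callback, the `CodebaseContext` stays agnostic of how reranking is done.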
```diff
     '--max-filesize',
-    '1M',
+    '10K',
```
Limit to smaller files.
```diff
     '10K',
+    '--max-depth',
+    '10',
```
Limit search depth to 10.
```diff
     return messagePairs.reverse().flat()
 }

+private async userQueryToExpandedKeywords(query: string): Promise<Map<string, Term>> {
```
We've modified keyword search to use an LLM to generate a keyword query, rather than stemming/lemmatizing every word in the user query. This has the following benefits:
- The LLM can include synonyms if it deems appropriate
- The number of keywords is restricted to 3-5. Given how the keyword search is implemented (a regex OR query with all keywords), this greatly reduces the cost of the search for long user queries.
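Given that implementation, capping the keyword count matters directly: each keyword becomes one alternative in a single OR regex. A sketch of building that pattern (the escaping helper is ours, not from the PR):

```typescript
// Build a ripgrep-style OR pattern from LLM-expanded keywords.
// With 3-5 keywords the alternation stays small no matter how long
// the original user query was.
function keywordsToPattern(keywords: string[]): string {
    // Escape regex metacharacters so keywords match literally.
    const escape = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    return keywords.map(escape).join('|')
}
```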
General question: how does perf look? We're adding a few LLM roundtrips here (generate keywords, generate filename fragments, rerank). These should all be using the fast model, but they're also on the critical path.
```go
if completionsConfig.FastChatModel == "" {
	completionsConfig.FastChatModel = completionsConfig.ChatModel
}
```
Should we actually default to the slow chat model? This adds a few roundtrips to the LLM, and with most slow models, that is >2 seconds each. If a fast model is not configured, I expect the UX to be too slow to be usable and we should fall back to the current, single-LLM-call pattern.
The performance impact is mixed. Here are the factors at play:
- Slower: 2 additional "fast" LLM round trips (one for generating the keyword query, the other for reranking)
- Faster: For longer user queries, the keyword query generated by the LLM is smaller and therefore faster to execute given our ripgrep-based implementation.
Data points

"What does this project do?"

Time-to-first-character:
- Previous keyword search: 6s
- With claude: 12s
- With claude-instant: 9s

Response quality: (before/after screenshots not shown)
"Which files implement saml auth?"

Time-to-first-character:
- Previous keyword search: 21s
- With claude: 14s
- With claude-instant: 12s

Response quality: (before/after screenshots not shown)
"which files should I edit to create a new cody recipe?"

Time-to-first-character:
- Previous keyword search: 19s
- With claude: 17s
- With claude-instant: 15s

Response quality: (before/after screenshots not shown)
I didn't realize ripgrep was this slow; ~10-15 sec still feels really long to be waiting for context!
As a data point, the search "Which files implement saml auth?" takes ~500 ms using S2 keyword search. It feels like App should eventually power keyword search (using Zoekt or some simple inverted index?) so we can really speed this up.
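A toy version of the "simple inverted index" idea (purely illustrative; Zoekt actually uses trigram indexing, while this just maps terms to file names):

```typescript
// Toy inverted index: term -> set of file names containing it.
// Lookups become map hits instead of a full-repo regex scan.
class InvertedIndex {
    private index = new Map<string, Set<string>>()

    public add(fileName: string, content: string): void {
        for (const term of content.toLowerCase().split(/\W+/)) {
            if (!term) continue
            let files = this.index.get(term)
            if (!files) {
                files = new Set()
                this.index.set(term, files)
            }
            files.add(fileName)
        }
    }

    // OR-query: files matching any keyword, mirroring the regex OR search.
    public search(keywords: string[]): string[] {
        const hits = new Set<string>()
        for (const kw of keywords) {
            for (const f of this.index.get(kw.toLowerCase()) ?? []) {
                hits.add(f)
            }
        }
        return [...hits]
    }
}
```

Even this naive structure turns each keyword lookup into O(1) map access, which is where the ~500 ms vs ~10 s gap comes from.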
```go
// When Fast is true, then it is used as a hint to prefer a model
// that is faster (but probably "dumber").
Fast bool
```
Isn't this usually determined by the model name (which is included in `CompletionRequestParameters`)?
Our API actually throws away the client-specified model name currently:
I think I agree with this decision currently. The client should be agnostic of the underlying LLM. The server should provide the endpoints for the different LLMs that are used, which currently are:
- Chat endpoint
- Code completion endpoint
- (added in this PR) Fast chat endpoint
Ah nice. I agree that seems ideal. In that case, I'd think we should remove the model name from the params entirely then. Will log it as a followup.
Responded inline on the perf questions: #52815 (comment)
I really like the LLM-based reranker approach, it's something I've been wanting to test more generally to improve result quality. Using an LLM for "query rewriting" also makes sense to me, but I'm really curious what it ends up doing in practice. Does it just pick out 3-5 key words? Does it add synonyms too? I wonder how much the output differs from our approach here where we just use an aggressive stopwords list: #52233. What are our plans to test this? It'd be great to figure out what parts are most helpful, and should be integrated into Sourcegraph keyword search (or even embeddings, if we find reranking is super useful!)
This PR is best reviewed commit by commit. Explanatory comments have been added to the diff.

Summary of changes:

- Adds `fast` parameter to completions endpoint and `fastChatModel` param to site config. This is intended for faster chat models that are useful for simple generations.
- Use the fast chat model to generate a local keyword search. This replaces the old keyword search mechanism, which stemmed/lemmatized every word in the user query.
- Use the fast chat model to generate a small set of file fragments to search for. This is mainly useful for surfacing READMEs for questions like "What does this project do?"
- Update the set of "files read" presented in the UI to include only those files actually read into the context window. Previously, we were showing all files returned by the context fetcher, but in reality, only a subset of these would fit into the context window.

Pre-merge TODO

Post-merge TODO

- `"fastChatModel": "claude-instant-v1"`
Test plan
Updated unit and integration tests. Tested locally on repositories of several sizes.