LLM-enhanced keyword context #52815

Merged: 12 commits merged into main from bl/cody-expanded-keyword on Jun 12, 2023
Conversation

@beyang (Member) commented Jun 2, 2023

This PR is best reviewed commit by commit. Explanatory comments have been added to the diff.

Summary of changes:

  • Adds `fast` parameter to the completions endpoint and a `fastChatModel` param to site config. This is intended for faster chat models that are useful for simple generations (see the request sketch after this list).
  • Use the fast chat model to generate a local keyword search. This replaces the old keyword search mechanism, which stemmed/lemmatized every word in the user query.
  • Use the fast chat model to generate a small set of file fragments to search for. This is mainly useful for surfacing READMEs for questions like "What does this project do?"
  • Update the set of "files read" presented in the UI to include only those files actually read into the context window. Previously, we were showing all files returned by the context fetcher, but in reality, only a subset of these would fit into the context window.
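For illustration, here is a minimal sketch of how a client might set the new `fast` hint on a completions request. The endpoint path, header format, and exact field names are assumptions for the sketch, not taken from this PR.

```typescript
// Sketch only: endpoint path and field names are assumptions, not this PR's code.
interface CompletionRequestParameters {
    messages: { speaker: 'human' | 'assistant'; text: string }[]
    temperature: number
    maxTokensToSample: number
    fast?: boolean // hint to route the request to the configured fastChatModel
}

async function requestFastCompletion(endpoint: string, token: string, prompt: string): Promise<Response> {
    const params: CompletionRequestParameters = {
        messages: [{ speaker: 'human', text: prompt }],
        temperature: 0, // deterministic output for keyword/filename generation
        maxTokensToSample: 200,
        fast: true, // prefer the faster (but probably "dumber") chat model
    }
    return fetch(`${endpoint}/.api/completions/stream`, {
        method: 'POST',
        headers: { Authorization: `token ${token}`, 'Content-Type': 'application/json' },
        body: JSON.stringify(params),
    })
}
```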

Pre-merge TODO

  • Verify temperature is 0 on all chat requests that should be deterministic

Post-merge TODO

  • Update config for sourcegraph.com to set "fastChatModel": "claude-instant-v1"

Test plan

Updated unit and integration tests. Tested locally on repositories of several sizes.

cla-bot added the cla-signed label on Jun 2, 2023
@beyang changed the title from "Bl/cody expanded keyword" to "LLM-enhanced keyword context" on Jun 2, 2023
@beyang force-pushed the bl/cody-expanded-keyword branch 7 times, most recently from 5c05cfc to 858e798, on June 6, 2023 00:17
@beyang requested a review from a team on June 6, 2023 15:16
@beyang force-pushed the bl/cody-expanded-keyword branch 7 times, most recently from 404aafa to 8b096d9, on June 9, 2023 23:43
@@ -13,10 +15,17 @@ const DEFAULT_CHAT_COMPLETION_PARAMETERS: Omit<CompletionParameters, 'messages'>
export class ChatClient {
constructor(private completions: SourcegraphCompletionsClient) {}

public chat(messages: Message[], cb: CompletionCallbacks): () => void {
public chat(messages: Message[], cb: CompletionCallbacks, params?: Partial<ChatParameters>): () => void {
@beyang (Member, Author):

Pass through the underlying completions parameters, so we can set things like temperature.
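For example, a caller can now request a deterministic generation by passing a temperature override. The type shapes below are a minimal sketch inferred from this diff, not the actual cody-shared definitions.

```typescript
// Minimal sketch: these local type shapes are assumptions, not the real cody-shared types.
interface Message { speaker: 'human' | 'assistant'; text: string }
interface CompletionCallbacks {
    onChange: (text: string) => void
    onComplete: () => void
    onError: (error: Error) => void
}
interface ChatParameters { temperature: number; maxTokensToSample: number }

declare const chatClient: {
    chat(messages: Message[], cb: CompletionCallbacks, params?: Partial<ChatParameters>): () => void
}

// Request a deterministic, short generation (e.g. for keyword expansion).
const abort = chatClient.chat(
    [{ speaker: 'human', text: 'List 3-5 search keywords for: how does SAML auth work?' }],
    {
        onChange: partial => console.log(partial),
        onComplete: () => console.log('[done]'),
        onError: err => console.error(err),
    },
    { temperature: 0 }
)
// Calling abort() cancels the in-flight request.
```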

@@ -20,18 +21,19 @@ export interface TranscriptJSON {
}

/**
* A transcript of a conversation between a human and an assistant.
* The "model" class that tracks the call and response of the Cody chat box.
* Any "controller" logic belongs outside of this class.
@beyang (Member, Author):

We simplify the Transcript class to be a pure model class and avoid having seemingly non-mutative methods like toPrompt (below) trigger surprising mutations of internal state.

@@ -1,97 +1,68 @@
import { ContextMessage, ContextFile } from '../../codebase-context/messages'
import { PromptMixin } from '../../prompt/prompt-mixin'
import { Message } from '../../sourcegraph-api'

import { ChatMessage, InteractionMessage } from './messages'

@beyang (Member, Author):

Likewise, we also make the Interaction class a simple model class, without the need to hackily trigger the computation of cachedContextFiles on creation.


import { ChatMessage, InteractionMessage } from './messages'

export interface InteractionJSON {
humanMessage: InteractionMessage
assistantMessage: InteractionMessage
context: ContextMessage[]
fullContext: ContextMessage[]
usedContextFiles: ContextFile[]
@beyang (Member, Author):

Instead of having both context and cachedContextFiles (where it was unclear which field to use when), we now have the following fields (sketched below):

  • fullContext: the complete set of context messages we would read if we had an infinite context window. This is set when the interaction is first created, before the prompt is computed.
  • usedContextFiles: the set of context files we actually read into the finite context window. This is set after we've computed the prompt (and therefore determined how many context files fit into the prompt's context window).
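A rough sketch of the intended flow (the names below are illustrative, not the PR's actual code): context messages are consumed against the prompt's token budget, and only the files that actually fit are recorded as usedContextFiles.

```typescript
interface ContextFile { fileName: string }
interface ContextMessage { speaker: 'human' | 'assistant'; text: string; file?: ContextFile }

// Illustrative only: walk fullContext against a token budget and record
// which files actually made it into the prompt.
function buildPrompt(
    fullContext: ContextMessage[],
    tokenBudget: number,
    countTokens: (text: string) => number
): { prompt: ContextMessage[]; usedContextFiles: ContextFile[] } {
    const prompt: ContextMessage[] = []
    const usedContextFiles: ContextFile[] = []
    let used = 0
    for (const message of fullContext) {
        const cost = countTokens(message.text)
        if (used + cost > tokenBudget) {
            break // the remaining context does not fit into the window
        }
        used += cost
        prompt.push(message)
        if (message.file) {
            usedContextFiles.push(message.file)
        }
    }
    return { prompt, usedContextFiles }
}
```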

@@ -21,7 +21,9 @@ export class CodebaseContext {
private codebase: string | undefined,
private embeddings: EmbeddingsSearch | null,
private keywords: KeywordContextFetcher | null,
private unifiedContextFetcher?: UnifiedContextFetcher | null
private filenames: FilenameContextFetcher | null,
@beyang (Member, Author):

Introducing a new type of local context fetcher that looks at the filename.

private unifiedContextFetcher?: UnifiedContextFetcher | null
private filenames: FilenameContextFetcher | null,
private unifiedContextFetcher?: UnifiedContextFetcher | null,
private rerank?: (query: string, results: ContextResult[]) => Promise<ContextResult[]>
@beyang (Member, Author):

Introduce a reranking mechanism to rerank results from different context providers.
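For a sense of what a fast-model reranker matching this signature could look like, here is a hedged sketch; the prompt wording, the fastChat helper, and the ContextResult fields are assumptions, not this PR's implementation.

```typescript
interface ContextResult { fileName: string; content: string }

// Assumed helper that sends one prompt to the fast chat model and resolves with its full reply.
declare function fastChat(prompt: string): Promise<string>

// Illustrative reranker matching the injected (query, results) => Promise<results> signature.
async function rerank(query: string, results: ContextResult[]): Promise<ContextResult[]> {
    if (results.length <= 1) {
        return results
    }
    const listing = results.map((r, i) => `${i}: ${r.fileName}`).join('\n')
    const prompt =
        `Rank these files by relevance to the question "${query}". ` +
        `Reply with a comma-separated list of indices, most relevant first.\n${listing}`
    const reply = await fastChat(prompt) // e.g. "2,0,1"
    const order: number[] = []
    const seen = new Set<number>()
    for (const part of reply.split(',')) {
        const i = parseInt(part.trim(), 10)
        if (Number.isInteger(i) && i >= 0 && i < results.length && !seen.has(i)) {
            seen.add(i)
            order.push(i)
        }
    }
    // Keep anything the model failed to mention, in original order, at the end.
    const leftovers = results.filter((_, i) => !seen.has(i))
    return order.map(i => results[i]).concat(leftovers)
}
```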

'--max-filesize',
'1M',
'10K',
@beyang (Member, Author):

Limit to smaller files.

'1M',
'10K',
'--max-depth',
'10',
@beyang (Member, Author):

Limit search depth to 10.
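For context, here is a hedged sketch of the kind of ripgrep invocation these flags belong to; only the two flags above come from the diff, while the surrounding argument list and pattern construction are assumptions.

```typescript
import { spawn } from 'child_process'

// Sketch only: the PR's actual rg argument list and pattern construction may differ.
function searchKeywords(cwd: string, keywords: string[]): Promise<string> {
    const escaped = keywords.map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    const args = [
        '--count-matches',       // per-file match counts rather than full match lines
        '--max-filesize', '10K', // skip large files (was 1M before this change)
        '--max-depth', '10',     // limit directory recursion depth
        '-e', escaped.join('|'), // regex OR over all keywords
    ]
    return new Promise((resolve, reject) => {
        const rg = spawn('rg', args, { cwd })
        let out = ''
        rg.stdout.on('data', chunk => (out += chunk))
        rg.on('error', reject)
        rg.on('close', () => resolve(out))
    })
}
```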

return messagePairs.reverse().flat()
}

private async userQueryToExpandedKeywords(query: string): Promise<Map<string, Term>> {
@beyang (Member, Author):

We modified keyword search to use an LLM to generate the keyword query rather than stemming/lemmatizing every word in the user query (see the sketch after this list). This has the following benefits:

  • The LLM can include synonyms if it deems appropriate
  • The number of keywords is restricted to 3-5. Given how the keyword search is implemented (a regex OR query with all keywords), this greatly reduces the cost of the search for long user queries.
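A rough sketch of what the LLM-driven expansion might look like; the Term shape, prompt wording, and fastChat helper are assumptions, not the PR's code.

```typescript
interface Term {
    term: string
    synonyms: string[]
}

// Assumed helper that sends one prompt to the fast chat model and resolves with its reply.
declare function fastChat(prompt: string): Promise<string>

// Illustrative only: ask the fast model for 3-5 keywords (plus optional synonyms) and parse them.
async function userQueryToExpandedKeywords(query: string): Promise<Map<string, Term>> {
    const prompt =
        `Extract 3-5 search keywords from this question, one per line, ` +
        `formatted as "keyword: synonym1, synonym2":\n${query}`
    const reply = await fastChat(prompt)
    const terms = new Map<string, Term>()
    for (const line of reply.split('\n')) {
        const [rawTerm, rawSynonyms] = line.split(':')
        const term = rawTerm?.trim()
        if (!term) {
            continue
        }
        terms.set(term, {
            term,
            synonyms: (rawSynonyms ?? '')
                .split(',')
                .map(s => s.trim())
                .filter(Boolean),
        })
    }
    return terms
}
```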

@beyang marked this pull request as ready for review on June 10, 2023 05:22
@sourcegraph-bot (Contributor) commented Jun 10, 2023

📖 Storybook live preview

@camdencheek (Member) left a comment:

General question: how does perf look? We're adding a few LLM roundtrips here (generate keywords, generate filename fragments, rerank). These should all be using the fast model, but they're also on the critical path.

Comment on lines +82 to +84
if completionsConfig.FastChatModel == "" {
completionsConfig.FastChatModel = completionsConfig.ChatModel
}
@camdencheek (Member):

Should we actually default to the slow chat model? This adds a few roundtrips to the LLM, and with most slow models, that is >2 seconds each. If a fast model is not configured, I expect the UX to be too slow to be usable and we should fall back to the current, single-LLM-call pattern.

@beyang (Member, Author) replied Jun 12, 2023:

The performance impact is mixed. Here are the factors at play:

  • Slower: 2 additional "fast" LLM round trips (one for generating the keyword query, the other for reranking)
  • Faster: For longer user queries, the keyword query generated by the LLM is smaller and therefore faster to execute given our ripgrep-based implementation.

Data points

"What does this project do?"

Time-to-first-character:

  • Previous keyword search: 6s
  • With claude: 12s
  • With claude-instant: 9s

Response quality: before/after screenshots (not reproduced here)

"Which files implement saml auth?"

Time-to-first-character:

  • Previous keyword search: 21s
  • With claude: 14s
  • With claude-instant: 12s

Response quality: before/after screenshots (not reproduced here)

"which files should I edit to create a new cody recipe?"

Time-to-first-character:

  • Previous keyword search: 19s
  • With claude: 17s
  • With claude-instant: 15s

Response quality: before/after screenshots (not reproduced here)

@camdencheek (Member):

I didn't realize ripgrep was this slow; ~10-15 sec still feels really long to be waiting for context!

As a data point, the search "Which files implement saml auth?" takes ~500 ms using S2 keyword search. It feels like App should eventually power keyword search (using Zoekt or some simple inverted index?) so we can really speed this up.

Comment on lines +43 to +45
// When Fast is true, then it is used as a hint to prefer a model
// that is faster (but probably "dumber").
Fast bool
@camdencheek (Member):

Isn't this usually determined by the model name (which is included in CompletionRequestParameters)?

@beyang (Member, Author):

Our API actually throws away the client-specified model name currently:

I think I agree with this decision currently. The client should be agnostic of the underlying LLM. The server should provide the endpoints for the different LLMs that are used, which currently are:

  • Chat endpoint
  • Code completion endpoint
  • (added in this PR) Fast chat endpoint

@camdencheek (Member):

Ah nice. I agree that seems ideal. In that case, I'd think we should remove the model name from the params entirely then. Will log it as a followup.

@beyang (Member, Author) commented Jun 12, 2023

Responded inline on the perf questions: #52815 (comment)

@beyang merged commit dd528f3 into main on Jun 12, 2023
25 of 26 checks passed
@beyang deleted the bl/cody-expanded-keyword branch on June 12, 2023 18:56
@jtibshirani (Member):

I really like the LLM-based reranker approach; it's something I've been wanting to test more generally to improve result quality.

Using an LLM for "query rewriting" also makes sense to me, but I'm really curious what it ends up doing in practice. Does it just pick out 3-5 key words? Does it add synonyms too? I wonder how much the output differs from our approach here where we just use an aggressive stopwords list: #52233.

What are our plans to test this? It'd be great to figure out what parts are most helpful, and should be integrated into Sourcegraph keyword search (or even embeddings, if we find reranking is super useful!)

ErikaRS pushed a commit that referenced this pull request Jun 22, 2023