LLM-enhanced keyword context #52815
Conversation
```diff
@@ -13,10 +15,17 @@ const DEFAULT_CHAT_COMPLETION_PARAMETERS: Omit<CompletionParameters, 'messages'>
 export class ChatClient {
     constructor(private completions: SourcegraphCompletionsClient) {}

-    public chat(messages: Message[], cb: CompletionCallbacks): () => void {
+    public chat(messages: Message[], cb: CompletionCallbacks, params?: Partial<ChatParameters>): () => void {
```
Pass through the underlying completions parameters, so we can set things like temperature.
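A minimal sketch of what that pass-through looks like (the `ChatParameters` field names and default values below are assumptions for illustration, not the actual Cody API):

```typescript
// Hypothetical sketch: merge caller-supplied parameters over the defaults.
// Field names and defaults are illustrative assumptions.
interface ChatParameters {
    temperature: number
    maxTokensToSample: number
}

const DEFAULT_CHAT_COMPLETION_PARAMETERS: ChatParameters = {
    temperature: 0.2,
    maxTokensToSample: 1000,
}

function resolveParameters(params?: Partial<ChatParameters>): ChatParameters {
    // Caller overrides (e.g. a low temperature for keyword generation)
    // take precedence over the defaults.
    return { ...DEFAULT_CHAT_COMPLETION_PARAMETERS, ...params }
}
```

A caller that only cares about temperature can pass `{ temperature: 0 }` and keep every other default.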
```diff
@@ -20,18 +21,19 @@ export interface TranscriptJSON {
 }

 /**
  * A transcript of a conversation between a human and an assistant.
+ * The "model" class that tracks the call and response of the Cody chat box.
+ * Any "controller" logic belongs outside of this class.
```
We simplify the `Transcript` class to be a pure model class and avoid having seemingly non-mutative methods like `toPrompt` (below) trigger surprising mutations of internal state.
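As a rough illustration of the "pure model" shape being described (class and method names here are a sketch, not the actual `Transcript` API):

```typescript
// Illustrative only: a model class whose read methods never mutate state.
class TranscriptSketch {
    constructor(private readonly messages: { speaker: string; text: string }[]) {}

    public addMessage(message: { speaker: string; text: string }): TranscriptSketch {
        // Mutation is explicit: return a new transcript rather than
        // silently changing internal caches.
        return new TranscriptSketch([...this.messages, message])
    }

    public toPrompt(): string {
        // Pure read: computes the prompt without touching internal state,
        // so calling it twice always yields the same result.
        return this.messages.map(m => `${m.speaker}: ${m.text}`).join('\n')
    }
}
```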
```diff
@@ -1,97 +1,68 @@
 import { ContextMessage, ContextFile } from '../../codebase-context/messages'
 import { PromptMixin } from '../../prompt/prompt-mixin'
 import { Message } from '../../sourcegraph-api'

 import { ChatMessage, InteractionMessage } from './messages'
```
Likewise, we also make the `Interaction` class a simple model class, without the need to hackily trigger the computation of `cachedContextFiles` on creation.
```diff
 import { ChatMessage, InteractionMessage } from './messages'

 export interface InteractionJSON {
     humanMessage: InteractionMessage
     assistantMessage: InteractionMessage
-    context: ContextMessage[]
+    fullContext: ContextMessage[]
+    usedContextFiles: ContextFile[]
```
Instead of having `context` and `cachedContextFiles`, where it's confusing when to use either field, we have the following fields:
- `fullContext`: the complete set of context messages we'd read if we had an infinite context window. This is set when the interaction is first created, before the prompt is computed.
- `usedContextFiles`: the set of context files we actually read into the finite context window. This is set after we've computed the prompt (and therefore determined how many context files fit into the prompt context window).
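A hedged sketch of the relationship between the two fields (the character-budget packing below is invented for illustration; the real code budgets prompt tokens):

```typescript
// Illustrative types; the real ContextMessage/ContextFile are richer.
interface ContextFileSketch { fileName: string }
interface ContextMessageSketch { text: string; file: ContextFileSketch }

// Hypothetical: pack fullContext messages into a finite budget and record
// which files actually made it in as usedContextFiles.
function packContext(
    fullContext: ContextMessageSketch[],
    charBudget: number
): { used: ContextMessageSketch[]; usedContextFiles: ContextFileSketch[] } {
    const used: ContextMessageSketch[] = []
    let remaining = charBudget
    for (const message of fullContext) {
        if (message.text.length > remaining) break
        remaining -= message.text.length
        used.push(message)
    }
    return { used, usedContextFiles: used.map(m => m.file) }
}
```

The UI then shows only `usedContextFiles`, never the full candidate list.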
```diff
@@ -21,7 +21,9 @@ export class CodebaseContext {
     private codebase: string | undefined,
     private embeddings: EmbeddingsSearch | null,
     private keywords: KeywordContextFetcher | null,
+    private filenames: FilenameContextFetcher | null,
     private unifiedContextFetcher?: UnifiedContextFetcher | null
```
Introducing a new type of local context fetcher that looks at the filename.
```diff
-    private unifiedContextFetcher?: UnifiedContextFetcher | null
     private filenames: FilenameContextFetcher | null,
+    private unifiedContextFetcher?: UnifiedContextFetcher | null,
+    private rerank?: (query: string, results: ContextResult[]) => Promise<ContextResult[]>
```
Introduce a reranking mechanism, to rerank results from different context providers.
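The `rerank` hook's signature suggests something like the following (the term-overlap scoring here is a stand-in for illustration; the PR's version asks the fast LLM to judge relevance):

```typescript
interface ContextResult { fileName: string; content: string }

// Stand-in reranker matching the hook's shape: orders results by naive
// term overlap with the query. The real implementation would call the
// fast LLM to score relevance instead.
async function overlapRerank(query: string, results: ContextResult[]): Promise<ContextResult[]> {
    const terms = query.toLowerCase().split(/\W+/).filter(Boolean)
    const score = (r: ContextResult): number =>
        terms.filter(t => r.content.toLowerCase().includes(t)).length
    // Sort a copy so the caller's array is untouched.
    return [...results].sort((a, b) => score(b) - score(a))
}
```

Because it is injected as a callback, the `CodebaseContext` stays agnostic of how reranking is done.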
```diff
     '--max-filesize',
-    '1M',
+    '10K',
```
Limit to smaller files.
```diff
     '10K',
+    '--max-depth',
+    '10',
```
Limit search depth to 10.
```diff
     return messagePairs.reverse().flat()
 }

+private async userQueryToExpandedKeywords(query: string): Promise<Map<string, Term>> {
```
We've modified keyword search to use an LLM to generate a keyword query, rather than stemming/lemmatizing every word in the user query. This has the following benefits:
- The LLM can include synonyms if it deems appropriate
- The number of keywords is restricted to 3-5. Given how the keyword search is implemented (a regex OR query with all keywords), this greatly reduces the cost of the search for long user queries.
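Given that implementation, capping the keyword count matters directly: each keyword becomes one alternative in a single OR regex. A sketch of building that pattern (the escaping helper is ours, not from the PR):

```typescript
// Build a ripgrep-style OR pattern from LLM-expanded keywords.
// With 3-5 keywords the alternation stays small no matter how long
// the original user query was.
function keywordsToPattern(keywords: string[]): string {
    // Escape regex metacharacters so keywords match literally.
    const escape = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    return keywords.map(escape).join('|')
}
```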
General question: how does perf look? We're adding a few LLM roundtrips here (generate keywords, generate filename fragments, rerank). These should all be using the fast model, but they're also on the critical path.
```go
if completionsConfig.FastChatModel == "" {
	completionsConfig.FastChatModel = completionsConfig.ChatModel
}
```
Should we actually default to the slow chat model? This adds a few roundtrips to the LLM, and with most slow models, that is >2 seconds each. If a fast model is not configured, I expect the UX to be too slow to be usable and we should fall back to the current, single-LLM-call pattern.
The performance impact is mixed. Here are the factors at play:
- Slower: 2 additional "fast" LLM round trips (one for generating the keyword query, the other for reranking)
- Faster: For longer user queries, the keyword query generated by the LLM is smaller and therefore faster to execute given our ripgrep-based implementation.
Data points

"What does this project do?"

Time-to-first-character:
- Previous keyword search: 6s
- With claude: 12s
- With claude-instant: 9s

Response quality: (before/after screenshots not shown)
"Which files implement saml auth?"

Time-to-first-character:
- Previous keyword search: 21s
- With claude: 14s
- With claude-instant: 12s

Response quality: (before/after screenshots not shown)
"which files should I edit to create a new cody recipe?"

Time-to-first-character:
- Previous keyword search: 19s
- With claude: 17s
- With claude-instant: 15s

Response quality: (before/after screenshots not shown)
I didn't realize ripgrep was this slow; ~10-15 sec still feels really long to be waiting for context!
As a data point, the search "Which files implement saml auth?" takes ~500 ms using S2 keyword search. It feels like App should eventually power keyword search (using Zoekt or some simple inverted index?) so we can really speed this up.
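A toy version of the "simple inverted index" idea (purely illustrative; Zoekt actually uses trigram indexing, while this just maps terms to file names):

```typescript
// Toy inverted index: term -> set of file names containing it.
// Lookups become map hits instead of a full-repo regex scan.
class InvertedIndex {
    private index = new Map<string, Set<string>>()

    public add(fileName: string, content: string): void {
        for (const term of content.toLowerCase().split(/\W+/)) {
            if (!term) continue
            let files = this.index.get(term)
            if (!files) {
                files = new Set()
                this.index.set(term, files)
            }
            files.add(fileName)
        }
    }

    // OR-query: files matching any keyword, mirroring the regex OR search.
    public search(keywords: string[]): string[] {
        const hits = new Set<string>()
        for (const kw of keywords) {
            for (const f of this.index.get(kw.toLowerCase()) ?? []) {
                hits.add(f)
            }
        }
        return [...hits]
    }
}
```

Even this naive structure turns each keyword lookup into O(1) map access, which is where the ~500 ms vs ~10 s gap comes from.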
```go
// When Fast is true, then it is used as a hint to prefer a model
// that is faster (but probably "dumber").
Fast bool
```
Isn't this usually determined by the model name (which is included in `CompletionRequestParameters`)?
Our API actually throws away the client-specified model name currently:
I think I agree with this decision currently. The client should be agnostic of the underlying LLM. The server should provide the endpoints for the different LLMs that are used, which currently are:
- Chat endpoint
- Code completion endpoint
- (added in this PR) Fast chat endpoint
Ah nice. I agree that seems ideal. In that case, I'd think we should remove the model name from the params entirely then. Will log it as a followup.
Responded inline on the perf questions: #52815 (comment)
I really like the LLM-based reranker approach, it's something I've been wanting to test more generally to improve result quality. Using an LLM for "query rewriting" also makes sense to me, but I'm really curious what it ends up doing in practice. Does it just pick out 3-5 key words? Does it add synonyms too? I wonder how much the output differs from our approach here where we just use an aggressive stopwords list: #52233. What are our plans to test this? It'd be great to figure out what parts are most helpful, and should be integrated into Sourcegraph keyword search (or even embeddings, if we find reranking is super useful!)
This PR is best reviewed commit by commit. Explanatory comments have been added to the diff.

Summary of changes:

- Adds `fast` parameter to completions endpoint and `fastChatModel` param to site config. This is intended for faster chat models that are useful for simple generations.
- Use the fast chat model to generate a local keyword search. This replaces the old keyword search mechanism, which stemmed/lemmatized every word in the user query.
- Use the fast chat model to generate a small set of file fragments to search for. This is mainly useful for surfacing READMEs for questions like "What does this project do?"
- Update the set of "files read" presented in the UI to include only those files actually read into the context window. Previously, we were showing all files returned by the context fetcher, but in reality, only a subset of these would fit into the context window.

Pre-merge TODO

Post-merge TODO

- `"fastChatModel": "claude-instant-v1"`
Test plan
Updated unit and integration tests. Tested locally on repositories of several sizes.