Add basic github lexical search #312228
Conversation
Fixes microsoft#312210. This is using the old search endpoint; we'll start switching over to the new one once it's ready.

Co-authored-by: Copilot <copilot@github.com>
Pull request overview
Adds a first-party GitHub lexical (keyword) code search path to the Copilot extension, alongside the existing semantic repo search, to enable out-of-the-box text search over GitHub repos/orgs.
Changes:
- Introduces `lexicalSearch` to `IGithubCodeSearchService` and implements it via GitHub's REST `search/code` endpoint.
- Splits remote search results into `SemanticCodeSearchResult` vs `LexicalCodeSearchResult`, and threads the semantic type through existing semantic callers.
- Adds a new `github_text_search` tool (and updates/renames the semantic repo tool wiring + localization).
Summary per file:
| File | Description |
|---|---|
| extensions/copilot/test/base/simuliationWorkspaceChunkSearch.ts | Updates simulation code search service to new semantic/lexical API surface. |
| extensions/copilot/src/platform/workspaceChunkSearch/node/codeSearch/codeSearchRepo.ts | Narrows repo search result type to semantic results. |
| extensions/copilot/src/platform/workspaceChunkSearch/node/codeSearch/codeSearchChunkSearch.ts | Propagates semantic result typing through chunk search flow. |
| extensions/copilot/src/platform/remoteCodeSearch/common/remoteCodeSearch.ts | Splits result types into semantic vs lexical and adds optional repo/ref metadata. |
| extensions/copilot/src/platform/remoteCodeSearch/common/githubCodeSearchService.ts | Adds lexicalSearch implementation using GitHub REST search; extends semantic parsing with remote URL + ref extraction. |
| extensions/copilot/src/platform/remoteCodeSearch/common/adoCodeSearchService.ts | Updates ADO code search typing to semantic result type. |
| extensions/copilot/src/platform/github/common/githubAPI.ts | Extends REST request helper to support custom Accept and extra headers. |
| extensions/copilot/src/extension/tools/node/githubTextSearchTool.tsx | New tool that exposes GitHub lexical search to prompts and renders results. |
| extensions/copilot/src/extension/tools/node/githubRepoSemanticSearchTool.tsx | Renames/updates semantic repo tool and uses returned remote URL/ref to build links. |
| extensions/copilot/src/extension/tools/node/allTools.ts | Registers new tool modules for runtime discovery. |
| extensions/copilot/src/extension/tools/common/toolNames.ts | Adds tool names for GitHub text search and renames semantic repo tool enum entries. |
| extensions/copilot/package.nls.json | Updates localized tool names/descriptions; adds text-search strings. |
| extensions/copilot/package.json | Contributes the new copilot_githubTextSearch tool and exposes it in the web tool set. |
Copilot's findings
Comments suppressed due to low confidence (1)
extensions/copilot/src/extension/tools/node/githubRepoSemanticSearchTool.tsx:72
`URI.joinPath(…, entry.chunk.file.path)` is unsafe here because `entry.chunk.file.path` is an absolute path (starts with `/`). `URI.joinPath` will treat that as an absolute segment and discard the preceding `tree/{ref}` segments, producing an incorrect GitHub URL. Strip the leading slash (or join using path segments) before calling `URI.joinPath`.
- Files reviewed: 13/13 changed files
- Comments generated: 8
```tsx
const renderedChunks = this.props.chunks
	.filter(x => x.chunk.text)
	.map(chunk => {
		const fileKey = chunk.chunk.file.toString();
		if (!seenFiles.has(fileKey)) {
			seenFiles.add(fileKey);
			references.push(new PromptReference(chunk.chunk.file));
		}

		const githubInfo = parseGithubFileUrl(chunk.chunk.file);
		const displayPath = githubInfo?.path ?? chunk.chunk.file.toString();
		const nwoLabel = githubInfo?.nwo;

		const lineInfo = ` starting at line ${chunk.chunk.range.startLineNumber}`;

		const headerText = nwoLabel
			? `Text match excerpt from \`${nwoLabel}\` in \`${displayPath}\`${lineInfo}:`
			: `Text match excerpt in \`${displayPath}\`${lineInfo}:`;

		return <TextChunk>
			{headerText}<br />
			{createFencedCodeBlock(getLanguageId(chunk.chunk.file), chunk.chunk.text)}<br /><br />
		</TextChunk>;
	});
```
`references` (and therefore `toolResultMessage`) is derived only from chunks with non-empty `chunk.text` because of `.filter(x => x.chunk.text)`. But `parseLexicalSearchResponse` can produce `isFullFile: true` entries with `text: ''` when the API omits `text_matches`, which would make the tool report “no results” even though files matched. Consider rendering (or at least referencing) those file-only matches so the tool result accurately reflects hits.
Suggested change:

```tsx
const renderedChunks = this.props.chunks.map(chunk => {
	const fileKey = chunk.chunk.file.toString();
	if (!seenFiles.has(fileKey)) {
		seenFiles.add(fileKey);
		references.push(new PromptReference(chunk.chunk.file));
	}
	const githubInfo = parseGithubFileUrl(chunk.chunk.file);
	const displayPath = githubInfo?.path ?? chunk.chunk.file.toString();
	const nwoLabel = githubInfo?.nwo;
	if (!chunk.chunk.text) {
		if (!chunk.chunk.isFullFile) {
			return undefined;
		}
		const headerText = nwoLabel
			? l10n.t("Match found in `{0}` in `{1}`. GitHub did not provide a text excerpt.", nwoLabel, displayPath)
			: l10n.t("Match found in `{0}`. GitHub did not provide a text excerpt.", displayPath);
		return <TextChunk>
			{headerText}<br /><br />
		</TextChunk>;
	}
	const lineInfo = ` starting at line ${chunk.chunk.range.startLineNumber}`;
	const headerText = nwoLabel
		? `Text match excerpt from \`${nwoLabel}\` in \`${displayPath}\`${lineInfo}:`
		: `Text match excerpt in \`${displayPath}\`${lineInfo}:`;
	return <TextChunk>
		{headerText}<br />
		{createFencedCodeBlock(getLanguageId(chunk.chunk.file), chunk.chunk.text)}<br /><br />
	</TextChunk>;
});
```
```json
"github.copilot.tools.githubRepo.name": "Semantic Search GitHub Repository",
"github.copilot.tools.githubRepo.userDescription": "Semantic Search a GitHub repository for relevant source code snippets. You can specify a repository using `owner/repo`",
```
User-facing text: “Semantic Search a GitHub repository …” is grammatically incorrect. Consider changing it to “Semantically search a GitHub repository …” (the same applies to the tool `name` string).
Suggested change:

```json
"github.copilot.tools.githubRepo.name": "Semantically Search GitHub Repository",
"github.copilot.tools.githubRepo.userDescription": "Semantically search a GitHub repository for relevant source code snippets. You can specify a repository using `owner/repo`",
```
```ts
import './getNotebookCellOutputTool';
import './getSearchViewResultsTool';
import './githubRepoTool';
import './githubRepoSemanticSearchTool.tsx';
```
`import './githubRepoSemanticSearchTool.tsx'` will fail to compile under the extension's tsconfig because `allowImportingTsExtensions` is not enabled (TypeScript disallows importing `.ts`/`.tsx` extensions by default). Drop the explicit `.tsx` extension and import the module the same way as the other tool imports in this file.
Suggested change:

```ts
import './githubRepoSemanticSearchTool';
```
```ts
if (parts.length >= 3) {
	const nwo = `${parts[0]}/${parts[1]}`;
	const relativePath = parts.slice(2).join('/');
	file = URI.joinPath(URI.parse('https://github.com'), nwo, 'tree', 'main', '/' + relativePath).with({
```
`URI.joinPath` uses `path.posix.join`, so passing a segment that starts with `/` resets the path. Using `'/' + relativePath` here will drop the `owner/repo/tree/main` segments and produce an incorrect GitHub URL. Pass `relativePath` without a leading slash (or split it into path segments) when calling `URI.joinPath`.
Suggested change:

```ts
file = URI.joinPath(URI.parse('https://github.com'), nwo, 'tree', 'main', relativePath).with({
```
```ts
const relativePath = parts.slice(2).join('/');
file = URI.joinPath(URI.parse('https://github.com'), nwo, 'tree', 'main', '/' + relativePath).with({
	fragment: entry.range.startLineNumber > 0
		? `L${entry.range.startLineNumber}-L${entry.range.endLineNumber}`
		: undefined,
});
```
This tool hard-codes the branch name to `main` when building GitHub URLs. That will generate broken links for repos whose default branch is not `main` (commonly `master`, release branches, or protected default branches). Consider using the `html_url` returned by GitHub's search API (or fetching the repo default branch once) instead of hardcoding `main`.
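One way to apply this advice is to thread the ref in as a parameter so the URL builder never assumes a branch. A small sketch (function and parameter names are assumptions, not from the PR; the ref would come from e.g. the `default_branch` field of `GET /repos/{owner}/{repo}` or be parsed out of the search result's `html_url`):

```typescript
// Hypothetical link builder: the caller supplies the ref instead of the
// builder hardcoding 'main'. Also only attaches a #L…-L… fragment when a
// real (1-based) line range is known, matching the fragment logic above.
function buildFileLink(nwo: string, ref: string, relativePath: string, startLine?: number, endLine?: number): string {
	const base = `https://github.com/${nwo}/tree/${ref}/${relativePath}`;
	if (startLine !== undefined && startLine > 0) {
		return `${base}#L${startLine}-L${endLine ?? startLine}`;
	}
	return base;
}
```

Usage: `buildFileLink('octo/repo', 'release-1.2', 'src/a.ts', 10, 20)` links to the correct branch and line range, while omitting the line arguments yields a plain file link.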
```tsx
const lineInfo = ` starting at line ${chunk.chunk.range.startLineNumber}`;

const headerText = nwoLabel
	? `Text match excerpt from \`${nwoLabel}\` in \`${displayPath}\`${lineInfo}:`
	: `Text match excerpt in \`${displayPath}\`${lineInfo}:`;

return <TextChunk>
	{headerText}<br />
	{createFencedCodeBlock(getLanguageId(chunk.chunk.file), chunk.chunk.text)}<br /><br />
```
`lineInfo` is always rendered as `starting at line ${chunk.chunk.range.startLineNumber}`. For lexical search results the range is currently `new Range(0, 0, 0, 0)`, which will display “starting at line 0” (and also prevents fragments from being added). Consider omitting the line info (and fragment) when the range is unknown/0 so the output is not misleading.
Suggested change:

```tsx
const hasKnownRange = chunk.chunk.range.startLineNumber > 0;
const lineInfo = hasKnownRange ? ` starting at line ${chunk.chunk.range.startLineNumber}` : '';
const headerText = nwoLabel
	? hasKnownRange
		? `Text match excerpt from \`${nwoLabel}\` in \`${displayPath}\`${lineInfo}:`
		: `Text match from \`${nwoLabel}\` in \`${displayPath}\`:`
	: hasKnownRange
		? `Text match excerpt in \`${displayPath}\`${lineInfo}:`
		: `Text match in \`${displayPath}\`:`;
return <TextChunk>
	{headerText}<br />
	{hasKnownRange ? <>{createFencedCodeBlock(getLanguageId(chunk.chunk.file), chunk.chunk.text)}<br /></> : undefined}
	<br />
```
```ts
const scopeQualifier = scope.kind === 'org' ? `org:${scope.org}` : `repo:${toGithubNwo(scope.githubRepoId)}`;
const searchQuery = `${query} ${scopeQualifier}`;
const routeSlug = `search/code?q=${encodeURIComponent(searchQuery)}&per_page=${maxResults}`;
```
GitHub’s REST code search API only supports `per_page` up to 100. `maxResults` is passed through directly, so callers can accidentally generate invalid requests by providing values > 100. Consider clamping `maxResults` to 100 (and/or paging) before building the `routeSlug`.
Suggested change:

```ts
const perPage = Math.min(maxResults, 100);
const routeSlug = `search/code?q=${encodeURIComponent(searchQuery)}&per_page=${perPage}`;
```
```ts
async searchRepo(authOptions: { silent: boolean }, embeddingType: EmbeddingType, repo: GithubCodeSearchRepoInfo, query: string, maxResults: number, options: WorkspaceChunkSearchOptions, _telemetryInfo: TelemetryCorrelationId, token: CancellationToken): Promise<CodeSearchResult> {
async lexicalSearch(_authOptions: { silent: boolean }, _scope: GithubCodeSearchScope, _query: string, _maxResults: number, _options: WorkspaceChunkSearchOptions, _telemetryInfo: TelemetryCorrelationId, _token: CancellationToken): Promise<LexicalCodeSearchResult> {
	throw new Error('Method not implemented.');
```
The simulation implementation of `IGithubCodeSearchService.lexicalSearch` throws `Method not implemented`. With lexical search now used by a first-party tool, this makes the simulation workspace brittle (any future test/tool usage will crash at runtime). Consider returning an empty `LexicalCodeSearchResult` (or a simple simulated response) instead of throwing.
Suggested change:

```ts
this._logService.trace('SimulationGithubCodeSearchService::lexicalSearch returning empty simulated result');
return {} as LexicalCodeSearchResult;
```
Fixes #312210

This is using the old search endpoint, which is limited. We'll start switching over to the new one once it's ready. We should be able to support this for external ingest workspaces too.