Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search Provider extension API #47058

Closed
roblourens opened this issue Apr 1, 2018 · 20 comments
Closed

Search Provider extension API #47058

roblourens opened this issue Apr 1, 2018 · 20 comments
Assignees
Labels
search Search widget and operation issues under-discussion Issue is under discussion for relevance, priority, approach

Comments

@roblourens
Copy link
Member

roblourens commented Apr 1, 2018

I looked into what it will take to run search in the extension host using extension APIs.

  • Ripgrep search
    • Just need to port ripgrepTextSearch.ts and ripgrepFileSearch.ts
    • Currently it downloads the correct ripgrep binary at build time. A builtin extension in the core repo can still do that. A marketplace extension would have to ship all binaries and pick one at runtime.
    • Dependencies
      • Has lots of trivial dependencies from vs/base/common. It's ok to copy them, move to native node versions, or native ES versions
      • glob.ts, but we should get rid of that dependency. It's only used for sibling clauses and those should be handled on the frontend
    • API
      • Doesn't expose all features that builtin search needs
        • Whether to use gitignore file
        • Follow symlinks
        • Max filesize
        • Encoding (can be read from files.encoding, do we expect all
          extension authors to think of that?)
        • Result limit count (handled by frontend using CancellationToken?)
        • Searching outside of the workspace
        • Smart case (front end)
        • wordSeparators (front end)
        • Cache search (for file search)
        • Sort by score (for file search)
        • Multiroot search (change it to search per single folder)
      • Need to read RelativePattern, instead of the current IFolderQuery
    • Logging
      • Use extensionhost logger and output pane
    • Frontend changes - SearchService just need to talk to search provider API
  • Non-rg search is harder
    • Need glob.ts, and all of its dependencies
    • Mime
    • Iconv and the stuff we have built around it
    • IPC stuff from core for spawning worker processes
    • All of the dependencies would have to be moved to a separate node module so they can be shared with search. We probably don't want to do that.
@roblourens roblourens self-assigned this Apr 1, 2018
@roblourens roblourens added search Search widget and operation issues under-discussion Issue is under discussion for relevance, priority, approach labels Apr 1, 2018
@roblourens roblourens added this to the April 2018 milestone Apr 1, 2018
@roblourens
Copy link
Member Author

roblourens commented Apr 17, 2018

  • Add extension activation event for search
  • SearchModel caches results per file uri so only one is shown per file. Make it merge IFileMatches
  • For running in dev, download all ripgrep builds and pick the correct one for the platform. Publish a second vscode-ripgrep package to do this?
    • We can also just run 'yarn' from the host machine every time we set up
  • Invoke the search provider once per folder in a multiroot workspace
  • Handle some features on the frontend
    • Sibling clauses in exclude patterns
    • Result count limit
    • Smart case
    • wordSeparators (rg search didn't respect these before...)
  • I think that we should address the search API now and change it to handle all the cases that it needs to for builtin search. What I'm working with currently:
export interface TextSearchQuery {
	pattern: string;
	isRegExp?: boolean;
	isCaseSensitive?: boolean;
	isWordMatch?: boolean;
}

export interface TextSearchOptions {
	folder: Uri;
	includes: GlobPattern[];
	excludes: GlobPattern[];
	maxFileSize?: number;
	disregardIgnoreFiles?: boolean;
	ignoreSymlinks?: boolean;
	encoding?: string;
}

export interface TextSearchResult {
	uri: Uri;
	range: Range;
	preview: { leading: string, matching: string, trailing: string };
}
  • Make file search work
  • Figure out how cacheKey and sortByScore fit into the extension API

@jrieken
Copy link
Member

jrieken commented Apr 18, 2018

In the TextSearchOptions how does folder and includes work together? Is the former an implicit prefix for all include-patterns?

@roblourens
Copy link
Member Author

Yes that's what I'm imagining, and I think includes/excludes maybe should actually just be strings of globs relative to folder, rather than GlobPatterns, which can be relative to some other absolute path, since I'm not sure how to interpret that along with folder.

Was the original intention to have the extension read workspaceFolders, or to include the folder in includes? I think it's easier for implementors to separate "folder to search in" and "glob patterns to apply" and to have all the relevant info on the query object.

This also doesn't handle searching in extra open files. I'm not sure how to express that.

folder?: Uri;
files?: Uri[];

where just one of those is set.

Or maybe they don't need to be differentiated.

searchTargets?: Uri|Uri[];

@jrieken jrieken modified the milestones: April 2018, May 2018 Apr 23, 2018
@roblourens
Copy link
Member Author

roblourens commented May 3, 2018

export interface SearchProvider {
	provideFileSearchResults?(options: SearchOptions, progress: Progress<Uri>, token: CancellationToken): Thenable<void>;
	provideTextSearchResults?(query: TextSearchQuery, options: TextSearchOptions, progress: Progress<TextSearchResult>, token: CancellationToken): Thenable<void>;
}

export interface TextSearchQuery {
	pattern: string;
	isRegExp?: boolean;
	isCaseSensitive?: boolean;
	isWordMatch?: boolean;
}

export interface SearchOptions {
	folder: Uri;
	includes: string[]; // paths relative to folder
	excludes: string[];
	useIgnoreFiles?: boolean;
	followSymlinks?: boolean;
    previewOptions: TBD; // total length? # of context lines? leading and trailing # of chars?
}

export interface TextSearchOptions extends SearchOptions {
	maxFileSize?: number;
	encoding?: string;
}

export interface TextSearchResult {
	uri: Uri;
	range: Range;

    // For now, preview must be a single line of text
	preview: { text: string, match: Range };
}
  • How to express searching files vs folders?
  • Long lines
    • This API lets us handle long lines with one result more efficiently, but if there are many matches on a line, we will send lots of duplicate line texts
    • Could allow multiple matches on a line in one TextSearchResult
  • File search - extHost will handle fuzzy matching with the search string, sorting, etc. The extension just returns all files in the folder.
  • Need extension activation event for search

@roblourens
Copy link
Member Author

Need to decide whether the search provider API options include maxResults.

When maxResults is hit, the expensive search should terminate as quickly as possible, so it makes sense for the provider to know about it. Especially important in the workspaceContains case where maxResults = 1.

But the search service needs to know whether maxResults would have been exceeded, so it can show a warning in the text search viewlet. (What else do we use limitHit for?)

We could pass maxResult+1 to the provider so we know whether the provider has exactly 10,000 results, or would have produced more. Not good for the maxResults=1 case because some providers may have an optimization for just finding whether a pattern exists.

So, we could pass maxResults=1 in that case and maxResults+1 in all other cases. This is assuming limitHit doesn't really matter when maxResults=1.

@bolinfest
Copy link
Contributor

As someone in the FileSearchProvider camp, I would also be concerned about performance for those who implement the FileIndexProvider API. Alternatively, if there were a good client library that made it possible to implement the FileSearchProvider API relatively easily (that handled the file watching, ranking, etc.), then perhaps everyone could use the FileSearchProvider API (avoiding the dual APIs).

@bolinfest
Copy link
Contributor

From e69e4d3, it appears that it was a recent+deliberate change to no longer make provideFileSearchResults() a streaming API. The asymmetry with provideTextSearchResults() seems a little strange.

@roblourens
Copy link
Member Author

That's correct, but this is because provideFileSearchResults is expected to return results in order sorted by relevance, and are only displayed when the full result set has loaded, but text results are always sorted by file path and displayed incrementally. I think that this makes the intent of how these are supposed to use more clear, even though the asymmetry bugs me too.

@roblourens
Copy link
Member Author

re: FileIndexProvider - Did some basic perf testing and realized that URI-ifying every file in the workspace is... very expensive. I've always thought of this as a string-based API.

Can we change it to return strings of relative paths? I think it's justifiable even if we usually use URIs to refer to files. We are in a clear context of asking for files in a particular folder.

@jrieken
Copy link
Member

jrieken commented Aug 22, 2018

Can we change it to return strings of relative paths?

No - the API uses Uri

@roblourens
Copy link
Member Author

roblourens commented Aug 22, 2018

Then FileIndexProvider is ☠️ 😢

@sebastian-lenz
Copy link

Playing around with the insider build, I just noted that the results returned by a FileSearchProvider are filtered by VS Code against the include and exclude patterns. Also open text editors seem to emit search results on their own bypassing the implementation in a custom search provider. I think both behaviors are quite limiting when implementing a custom search.

In my current project the language internally organizes files into virtual folders, it does this as the language allows files to appear at multiple locations within a virtual tree. I was hoping to implement a custom search that allows filtering by include / exclude, but as VS Code always filters the results based on the uri which does not reflect this structure, I always end up with empty result sets.

@roblourens
Copy link
Member Author

This issue is old and a bit stale, I am closing it and opening more specific issues.

@roblourens roblourens removed this from the October 2018 milestone Oct 3, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
search Search widget and operation issues under-discussion Issue is under discussion for relevance, priority, approach
Projects
None yet
Development

No branches or pull requests

5 participants