Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize TextSearchProvider API #59921

Open
roblourens opened this issue Oct 3, 2018 · 70 comments
Open

Stabilize TextSearchProvider API #59921

roblourens opened this issue Oct 3, 2018 · 70 comments
Assignees
Labels
api-finalization feature-request Request for new features or functionality on-testplan search Search widget and operation issues

Comments

@roblourens
Copy link
Member

roblourens commented Oct 3, 2018

Master issue to track stabilizing the TextSearchProvider extension API...

Forked from #47058

Depends on

@roblourens roblourens added search Search widget and operation issues api-finalization labels Oct 3, 2018
@roblourens roblourens self-assigned this Oct 3, 2018
@roblourens roblourens added the feature-request Request for new features or functionality label Nov 9, 2018
@suntobright
Copy link

looking forward to progress

@mostafaeweda
Copy link

Any progress on moving this to the stable API?

@gjsjohnmurray
Copy link
Contributor

Is there anything preventing this from happening? Both of the issues it's shown as depending on are already closed.

@roblourens
Copy link
Member Author

This still isn't ready to be stabilized, I don't have any ETA, sorry.

@eamodio
Copy link
Contributor

eamodio commented Jun 7, 2019

There needs to be a 😢 reaction

@gjsjohnmurray
Copy link
Contributor

Any progress to report on getting this finalized?

@roblourens roblourens added this to the Backlog milestone Oct 27, 2019
@gjsjohnmurray
Copy link
Contributor

Please put this on the 2020 roadmap, preferably sooner rather than later.

@gjsjohnmurray
Copy link
Contributor

@roblourens 🙏 can this get some love soon? Those of us building extensions that implement FileSystemProvider can only offer search if we get our users to (a) use Insiders, (b) download and install a VSIX, and (c) launch Insiders with the correct --enable-proposed-api argument.

It is a small relief that (c) can now be achieved using argv.json, but what we really need is the API finalized.

@NightRa
Copy link

NightRa commented Aug 21, 2020

I'm interested in the stabilization of this as well.
What's blocking this, and how can we help?

@roblourens
Copy link
Member Author

Sorry, it still needs a lot of thinking and I won't get to it in the near future

@gjsjohnmurray
Copy link
Contributor

Sorry, it still needs a lot of thinking and I won't get to it in the near future

@roblourens this is disheartening news for me, and likely for others trying to leverage FileSystemProvider and take VSCode into new domains. Is there any way we can help? What do you see as the outstanding problems with the proposed API?

@CompuIves
Copy link

Heyo! I've been following this issue for a while, and I was wondering if by now there's a better idea on when this API will be stabilized. Seeing that there are already quite some extensions using it, even including Microsoft extensions in the marketplace.

@gjsjohnmurray
Copy link
Contributor

@roblourens please can we blow the dust off this and get it into Stable? Or else get an understanding of what's holding it up? IMO as more and more FileSystemProvider implementations show up (aka virtual filesystems) the more important it becomes to resolve this.

@andreamah
Copy link
Contributor

Hmm, that's a good argument to keep some type of base Uri around if the original call had it. If you currently call findTextInFiles with your base URI with an alternative authority and query in its include, does it appear correctly at the provider? I can always test this later too.

@isc-bsaviano
Copy link

@andreamah I've never tried that API but I can confirm that when using the Search view in a multi-root workspace folder and adding a "files to include" glob that only matches a single folder, the TextSearchProvider is only called for that folder, and that the glob pattern is normalized to have the workspace folder root as the base. This applies to the duplicate entry with the /** suffix as well.

@andreamah
Copy link
Contributor

This applies to the duplicate entry with the /** suffix as well.

I'm not too sure what you mean by this statement?

@isc-bsaviano
Copy link

Sorry that I wasn't clear. When you have a glob pattern present in the UI, the options.includes array includes two elements for each pattern, one with an one without a globstar suffix:

Screenshot 2024-06-24 at 12 02 19 PM

@andreamah
Copy link
Contributor

After a bit more of API review, a question came up: how important is message?: TextSearchCompleteMessage on TextSearchComplete? To some team members, it seemed like it was more of a UI add-on that we usually don't see being used in built-in text search. We use it in vscode.dev, but we can have another conversation about whether we can just change the behavior to not use the completion message.

@ the people who are using this API, do you use the message field on TextSearchComplete, and what is your use case?

If not many people are using it, we might consider keeping this part in particular as proposed to simplify the overall finalization process.

@isc-bsaviano
Copy link

@andreamah I use it to notify the user that there were errors for a certain number of documents and to check our Output channel for more info. I could easily replace this with a call to vscode.window.showErrorMessage(), or by just showing the Output channel.

@andreamah
Copy link
Contributor

andreamah commented Jul 16, 2024

An update. These are the latest changes to the API via diagram:

Image

Image

Some major-ish changes compared to the last time i shared:

  • Overall, a more flat approach with more duplication. This is preferred, since the options are clearer upon first glance and they are easier to modify if necessary later on.
  • Previously, provideTextSearchResults/provideFileSearchResults would be called once per workspace folder for a particular scheme. Now, there will be only one call with folder-specific fields being part of an object that can be put in an array.
    • There will be one folder URI per array element. This is meant to be what all includes patterns are positioned relative to. However, the excludes can each have different baseURI to act upon. By doing so, we can support multiple 'include' patterns if applicable, but it keeps it clear which workspace folder we are 'running' from.
  • Finders have a custom enum for whether to follow search.exclude and files.exclude settings.
  • FindTextInFiles has an AsyncIterable instead of a callback to get results
  • TextSearchComplete's message customization will be a part of a different proposal that will not be finalized with these APIs.
  • session for FileSearchProvider will be an unknown that we encourage users to put in a WeakMap to track for garbage collection. Its lifespan will match that of the cache.

As usual, the changes are here. I'm making all the changes at once due to the intertwined nature of the changes between the search APIs.

Also, please note that these changes might take some time to clean up. I am finalizing the shape of the API, but I also need to make appropriate internal changes and do testing. So this might take a while to actually finalize.

This was referenced Jul 16, 2024
@isc-bsaviano
Copy link

Thanks for the update @andreamah! I have a few questions:

  1. When TextSearchProvider called in a multi-root workspace and the "files to include" string only matches a single workspace folder, will the folderOptions array contain only one element (similar to the current behavior where the provider is only called for the one folder)?
  2. Is it guaranteed that GlobPatterns in the folderOptions.excludes array will have a baseURI that is in workspace folder folderOptions.folder?

@andreamah
Copy link
Contributor

@isc-bsaviano thanks for reviewing my blurb!

  1. When TextSearchProvider called in a multi-root workspace and the "files to include" string only matches a single workspace folder, will the folderOptions array contain only one element (similar to the current behavior where the provider is only called for the one folder)?

Yup!

  1. Is it guaranteed that GlobPatterns in the folderOptions.excludes array will have a baseURI that is in workspace folder folderOptions.folder?

No, this is not guaranteed. You will need to see whether, if the exclude has a baseURI, whether it actually changes the search (or whether it's invalid). When you get the search call from the UI, it should be a valid baseURI. However, when you get arguments from a findFiles or findTextInFiles API call, we can't guarantee that the includes/excludes make sense.

@isc-bsaviano
Copy link

Thanks @andreamah, that's all good to know! Is there a reason why VS Code will no longer be "pre-processing" the exclude globs (filtering out ones that don't apply and turning the rest into strings relative to the workspace folder root)? It's still being done for includes. I will keep my eye out for when this is available so I can update my implementations.

@andreamah
Copy link
Contributor

@isc-bsaviano do you mean having all of the strings relative to one workspace root? The reason was because findFiles/findTextInFiles can possibly return multiple includes and excludes, with different baseURI. If we were to normalize all of these to be relative to the same URI, we'd need to modify the glob patterns, which becomes hard in complicated globs (if the baseURI is very different for the different includes/excludes, should we have them all be patterns relative to the root of the filesystem in the worst case?).

In the case of doing a search from the UI, we will try to keep the options as clean as possible, which likely will involve filtering out and modifying exclude patterns to be friendly with the current folder (since the includes should only be relative to one URI if there is any). However, once we get arguments from an API call (findFiles/findTextInFiles), we can't guarantee anything.

With findTextInFiles, we currently erroneously ignored the exclude baseURI, which is wrong. However, to make this info flow correctly through to the provider, we needed to change things up.

@isc-bsaviano
Copy link

Thanks @andreamah, that makes sense

@andreamah andreamah modified the milestones: July 2024, August 2024 Jul 22, 2024
@isc-bsaviano
Copy link

Hi @andreamah, one more quick question about the new APIs. Does maxResults apply to each folder, or to the entire workspace?

@andreamah
Copy link
Contributor

Hi @andreamah, one more quick question about the new APIs. Does maxResults apply to each folder, or to the entire workspace?

Good question! maxResults should apply to the entire workspace.

@isc-bsaviano
Copy link

@andreamah In version 1.92 I see new proposed APIs textSearchProviderNew and fileSearchProviderNew. Are these ready for extension authors to adopt so we can evaluate the new API surface?

@caleb-allen
Copy link

caleb-allen commented Aug 2, 2024

Hey @andreamah, thanks for your great work on this.

I've been developing a "semantics" search system called TSearch, and I'm curious if it would qualify as a TextSearchProvider, or if it perhaps falls outside the scope of this API. Would love to hear your thoughts.

Rather than taking the "code words" and generating an index of text, it constructs an index where each code word is encoded with its semantic "context".

Take, for example, this javascript snippet:

function hello(name) {
   console.log("Hello", name);
}

In addition to including the word hello, the index also encodes the semantic use of hello—in this case, it is a function name. The point of it all is to let somebody search not just for textual instances of hello, but specifically for functions called hello. Or for variables, string literals, etc., and to do so across very large projects.

Anyways, the reason I'm describing all this is because the "query language" used to search this index isn't exactly text, it's got a few simple rules and operators in order to distinguish which parts of a query are for context ("function") and which are for content ("hello"). It's not as gnarly as regex, not even close, but it's definitely not "just" text.

I see that TextSearchQuery has the property isRegExp, explicitly including at least one instance of search which isn't strictly text. My question is this: is this API seen as something which supports a more general function of "pattern matching", where text and regex are simply two implementations? Or if not, is such a design something that might be desired? As a (rough, uninformed) idea, perhaps indicating a patternEngine property in TextSearchQuery would suffice to open up the API to a much larger feature set. patternEngine would default to ["text", "regex"], and support the same behavior and API that exists today, e.g. isRegExp = (patternEngine == "regex"), but would explicitly tease out regex as simply a default implementation of a more general concept—that of a pattern engine.

Regardless, I don't think my question needs an answer before the TextSearchProvider API can be stabilized (I don't think it'd be a breaking change anyways). I'm primarily interested in the conversation about whether this lower-level part of search is of interest as a surface for extension.

Let me know if any of this needs clarification. Thanks!

@andreamah
Copy link
Contributor

@andreamah In version 1.92 I see new proposed APIs textSearchProviderNew and fileSearchProviderNew. Are these ready for extension authors to adopt so we can evaluate the new API surface?

This isn't ready to be consumed- I'm just creating these so that I can start changing the internal implementation without affecting the existing proposed APIs (since some internal extensions currently use it).

@isc-bsaviano
Copy link

Thanks for the clarification and for all of your hard work to help make this a reality!

@andreamah
Copy link
Contributor

andreamah commented Aug 2, 2024

@caleb-allen Great question!

For the most part, this API is meant to be consumed to simply search for 'text' as-is. For example, if you create a custom filesystem, this API helps with actually understanding what it takes to get search results from your project. This being said, it was not necessarily created to facilitate a special or 'intelligent' search that requires custom options. The reason why the options have things like isRegex is because the UI (aka the search view on the sidebar) will have a button for that, which will drive what info we send to the API. If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

@caleb-allen
Copy link

If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

I see, this clarifies things a lot, thank you!

It seems that the behavior I'm trying to construct may be better achieved with other APIs, an "enhancement" on search, rather than modifying the deeper plumbing of all search.

Thanks for your answer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-finalization feature-request Request for new features or functionality on-testplan search Search widget and operation issues
Projects
None yet
Development

No branches or pull requests