Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add !-prefix parsing to glob util #134415

Open
JacksonKearl opened this issue Oct 5, 2021 · 17 comments
Open

Add !-prefix parsing to glob util #134415

JacksonKearl opened this issue Oct 5, 2021 · 17 comments
Assignees
Labels
feature-request Request for new features or functionality file-glob File glob engine
Milestone

Comments

@JacksonKearl
Copy link
Contributor

JacksonKearl commented Oct 5, 2021

It'd be potentially useful to have ! as a prefix added to our glob.ts utils. This could function similarly to how we already handle .gitignore files in the webworker search here:

  1. if a file matches a normal line and doesn't match an !-line, exclude it (we do this today)
  2. If it matches both a normal line and a !-line, include it
  3. Otherwise, include it (we do this today)

This isn't quite as powerful as a real.gitignore as there are only two "layers" of include/exclude, whereas in gitignore each line introduces a new "layer", but I believe it would handle the 99% of cases well enough, for instance every example given to-date in #869.

To handle the many-layer case requires either:

  • migrating to an array to configure files.exclude and search.exclude, which is undesirable from a backwards/forwards compatibility perspective
  • taking care to ensure the existing *.exclude objects are handled in a key-order sensitive way, which is potentially undesirable from a JSON/JS object semantics compatibility perspective

cc @bpasero for any and all input on this

@JacksonKearl JacksonKearl self-assigned this Oct 5, 2021
@JacksonKearl JacksonKearl added the under-discussion Issue is under discussion for relevance, priority, approach label Oct 5, 2021
@bpasero bpasero added the file-glob File glob engine label Oct 5, 2021
@bpasero
Copy link
Member

bpasero commented Oct 5, 2021

Somewhat related: #869

I think the biggest challenge is how to support conflicting includes / excludes. Maybe some inspiration could be taken from how search handles this where you can configure both include and exclude patterns at the same time.

@JacksonKearl
Copy link
Contributor Author

JacksonKearl commented Oct 7, 2021

Ah, so search is exclude-dominated, in search:

  • If the path matches an exclude it's excluded, no exceptions
  • Otherwise, if there are includes and the path matches an include it's included
  • Otherwise, if there are includes and the path doesn't match an include it's excluded
  • Otherwise, it's included

Versus what I described above and what our .gitignore parser does is more include-dominated:

  • If the path matches an include it's included, no exceptions
  • Otherwise, if the path matches an exclude it's excluded
  • Otherwise, it's included

Concretely, given the config:

"files.exclude": {
   "node_modules/**": true,
   "!node_modules/@types": true,
}

The Search approach would exclude all of node_modules, even the @types directory, because it matches the exclude, and include everything else.
The .gitignore approach would include node_modules/@types because it matches the exclude, exclude the other entries in node_modules/** because they match the exclude, and include everything else.

On the other hand, given this config:

"files.exclude": {
   "foo.ts": true,
   "!*.ts": true,
}

The Search approach would exclude foo.ts, include the other *.ts files, and exclude everything else, because if an include is present all files must match the include
The .gitignore approach would exclude foo.ts, and include everything else.

Not really sure what the best option is here. The Search strategy makes sense when you want to exclude the bulk of paths but include some specific ones, whereas the .gitignore strategy makes more sense when you want to include the bulk of paths but exclude some specific ones.

I lean towards using the .gitignore strategy because the search.exclude and files.exclude settings are historically more about excluding a specific set of files and including the rest (and the ask in #869 is to allow including files that match existing excludes), whereas search is more about including a specific set of files and excluding the rest (and we already support excluding files that match existing includes)

cc @roblourens for any thoughts here

@roblourens
Copy link
Member

I would definitely not expect a negative exclude to work the same as an include. If I put *.ts in "files to include", I definitely don't want to see .js files, but a negative exclude for ts files is different.

The .gitignore approach would exclude foo.ts, and include everything else.

Does that mean that if I also have an exclude for *.js, it would be overridden by !*.ts? I can't remember how this works in gitignore except that I find it hard to get right.

Also, do you expect that I'd be able to have negated patterns in settings that override patterns in the gitignore? Last I looked into this area, ripgrep makes gitignore always take precedence.

@JacksonKearl JacksonKearl modified the milestones: Backlog, On Deck Oct 25, 2021
@bpasero
Copy link
Member

bpasero commented Mar 6, 2022

Hm, I think maybe this issue needs to split into two issues:

  • one that allows to use ! in the beginning of a pattern to negate it
  • another one for how to handle this when multiple patterns apply

I am not sure the intent of this issue, so asking @JacksonKearl to clarify: is this about making glob patterns aware of a leading ! to negate the entire pattern? We already have limited support for ! in character ranges it seems:

/**
* Simplified glob matching. Supports a subset of glob patterns:
* * `*` to match one or more characters in a path segment
* * `?` to match on one character in a path segment
* * `**` to match any number of path segments, including none
* * `{}` to group conditions (e.g. *.{ts,js} matches all TypeScript and JavaScript files)
* * `[]` to declare a range of characters to match in a path segment (e.g., `example.[0-9]` to match on `example.0`, `example.1`, …)
* * `[!...]` to negate a range of characters to match in a path segment (e.g., `example.[!0-9]` to match on `example.a`, `example.b`, but not `example.0`)
*/

@JacksonKearl
Copy link
Contributor Author

is this about making glob patterns aware of a leading ! to negate the entire pattern?

yes

@bpasero
Copy link
Member

bpasero commented Mar 7, 2022

It is somewhat easy to add global negate support for glob, see efa5122 for a first stab at it.

However, there are a few questions:

  • splitGlobAware probably needs to change to account for the leading ! (e.g. change the return type to { segments: string[]; negated: boolean; }) [1]
  • how to properly handle glob expressions that are not just a simple pattern [2]

[1]

Not sure the ripples or adoption cost where it is used, @roblourens in search. Does RipGrep even support these patterns?

[2]

Take this example:

let globExpression = {
	'!*.js': true,
	'!*.ts': true
};

Since glob expressions are basically a OR combination of all patterns, the above pattern will match on everything, even though it may look as if this will match every file that is not .js or .ts. Not sure how realistic that scenario is though, but wanted to bring it up anyway.

@roblourens
Copy link
Member

roblourens commented Mar 7, 2022

Ripgrep supports them, I think the support there will be the same as in gitignore files

@bpasero
Copy link
Member

bpasero commented Mar 8, 2022

So how is our glob util used in search component then? Are we even requiring it there or are we just passing on patterns to RipGrep and let it deal with it?

@roblourens
Copy link
Member

We pass the patterns to ripgrep, and for editors that are open, we interpret the glob patterns. For patterns with sibling clauses, we interpret the patterns.

@bpasero
Copy link
Member

bpasero commented Mar 29, 2022

After some discussion with Christoph, I concluded that support for ! to negate glob patterns does not work well with our support for glob expressions that have the form of pattern: enablement in an object.

I see no practical way how to parse an expression such as:

{
    "!workspace/*.js": true,
    "!workspace/*.ts": true,
    "workspace/*.html": true,
    "workspace/*.xml": true
}

Our normal algorithm is to report a match overall when any of the enabled patterns match. With negated patterns, that doesn't really work anymore because negated patterns match on almost everything and as such, 2 negated patterns in an expression will match everything.

Then my naive solution to this was to group all negated patterns together and require them ALL to match and then check the remaining patterns for matches, basically for the above example: <*.html> || <*.xml> || (<not *.js> && <not *.ts>).

Can someone come up with a meaningful expression that combines negated patterns and normal patterns that makes any sense with this rule? So far I could not come up with any example...

@roblourens
Copy link
Member

From the gitignore manpage,

any matching file excluded by a previous pattern will become included again

In that example, I don't think the negated patterns should change the overall result since they don't intersect with the html/xml patterns. Another example would be like

{
  "foo/*": true,
  "bar/*": true,
  "!foo/file.txt": true
}

So I think you would check the positive patterns, then check the negative patterns, like (<foo/*> || <bar/*>) && <not foo/file.txt>

The : true feels confusing here too...

In gitignore, order matters, and I'm not sure whether we would run into trouble by assuming that the negated patterns are always resolved after the positive ones.

@bpasero
Copy link
Member

bpasero commented Mar 30, 2022

If I understand correctly a path matches the expression when it matches any of the non-negated patterns and then all of the negated patterns?

In the example above:

  • foo/test.html: match
  • bar/test.html: match
  • foo/file.txt: no match
  • bar/file.txt: match

That could work. It is probably not very intuitive because we do not really support processing the expression in an ordered way like .gitignore does...

@bpasero
Copy link
Member

bpasero commented Mar 31, 2022

@JacksonKearl can you maybe chime in as stake holder, you mention it would solve #869 so we should maybe look at examples and build this feature around the needs. If I understand correctly, users would configure files.exclude with the new support for negated patterns?

@roblourens
Copy link
Member

If I understand correctly a path matches the expression when it matches any of the non-negated patterns and then all of the negated patterns?

I guess, but it would probably be worth looking at ripgrep or git source before we get bit by edge cases

@JacksonKearl
Copy link
Contributor Author

If I understand correctly, users would configure files.exclude with the new support for negated patterns?

Yes exactly.

I guess, but it would probably be worth looking at ripgrep or git source before we get bit by edge cases

We mostly implement their logic here:

const isFileIncluded = this.gitignoreLinesToExpression(fileIncludeLines, dirPath, false);
but it'd require some significant changes to get that working for excludes, for instance handling filtering specifically directories vs files and following parent pointers differently from how we merge the settings today. Technically we may need to handle ordering as well, but the code above doesn't, and that wouldn't really work with our object based model anyways.

Ideally, the following would work:

"files.exclude": {
    "node_modules/*": true,
    "!node_modules/@types": true,
    "node_modules/@types/*": true,
    "!node_modules/@types/*.js": true,
}

However that requires some understanding of the ordering that we may not want to rely on, especially given things like settings scope merging.

Absent that, it'd be helpful to have this:

"files.exclude": {
    "node_modules/*": true,
    "!node_modules/@types": true,
}

where if a file matches some bare expression and no negated expression, it matches the glob and is included. This way ordering isn't important, a simple 2 pass approach works. This is actually what we do already in the ignore file parser.

@dbagley1
Copy link

dbagley1 commented Aug 3, 2023

Support for negate patterns is essential to fixing a bug with the "Explorer: Exclude Git Ignore" setting. Negate patterns are ignored when hiding files in the explore pane. As mentioned in #175066. That issue was marked as a duplicate of this issue and closed.

@lramos15 lramos15 added feature-request Request for new features or functionality and removed under-discussion Issue is under discussion for relevance, priority, approach labels Dec 15, 2023
@Thaina
Copy link

Thaina commented May 28, 2024

I try to use

{
    ...,
    "files.exclude" : {
        "Library/" : true,
        ...
    },
    "search.exclude" : {
        "Library/PackageCache/**" : false,
        ...
    },
    ...
}

And it's not work

This below work for some reason

{
    ...,
    "files.exclude" : {
        "Library/[!(PackageCache)]*/": true,
        "library/[!(PackageCache)]*/": true,
        "Library/PackageCache/": true,
        "library/PackageCache/": true,
        ...
    },
    "search.exclude" : {
        "Library/PackageCache/": false,
        "library/PackageCache/": false
    },
    ...
}

Also though it seem the negate syntax only work with one character only

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Request for new features or functionality file-glob File glob engine
Projects
None yet
Development

No branches or pull requests

6 participants