Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

negative matches problem for .min.js like paths #65

Closed
gera2ld opened this issue May 20, 2015 · 11 comments
Closed

negative matches problem for .min.js like paths #65

gera2ld opened this issue May 20, 2015 · 11 comments

Comments

@gera2ld
Copy link

gera2ld commented May 20, 2015

I am trying to filter files by extnames:

var minimatch = require('minimatch');
console.log(minimatch('foo/bar.min.js', '*.!(js|css)', {matchBase: true})); // true
console.log(minimatch('foo/bar.min.js', '!*.+(js|css)', {matchBase: true}));    // false

Are the results expected?

@mik01aj
Copy link

mik01aj commented Jul 2, 2015

I have a silimar issue, possibly related:

> var m = require('minimatch')
> m('foo-integration-test.js', '*-integration-test.js')
true
> m('foo-integration-test.js', '!(*-integration-test.js)')
true

So the second pattern is a negation of the first one, but they both match! This is madness.

@mik01aj
Copy link

mik01aj commented Jul 2, 2015

I spotted this in your unit tests:

['*.!(js)', ['foo.bar', 'foo.', 'boo.js.boo'] ],

I checked this pattern and it found some problems:

> m('asd.jss', '*.!(js)')
false
> m('foo.jszzz.js', '*.!(js).js')
false

This is NOT expected behaviour.

@mik01aj
Copy link

mik01aj commented Jul 2, 2015

This is also very, very strange:

> m('a-integration-test.js', '*-!(integration-)test.js')
true
> m('a-integration-test.js', '*-!(integration)-test.js')
false
> m('a-integration-test.js', '*!(-integration)-test.js')
true
> m('a-integration-test.js', '*!(-integration-)test.js')
true
> m('a-integration-test.js', '*!(integration)-test.js')
true
> m('a-integration-test.js', '*!(integration-test).js')
true
> m('a-integration-test.js', '*-!(integration-test).js')
true
> m('a-integration-test.js', '*-!(integration-test.js)')
true
> m('a-integration-test.js', '*-!(integra)tion-test.js')
false
> m('a-integration-test.js', '*-integr!(ation)-test.js')
false
> m('a-integration-test.js', '*-integr!(ation-t)est.js')
false
> m('a-integration-test.js', '*-i!(ntegration-)test.js')
false
> m('a-integration-test.js', '*i!(ntegration-)test.js')
true
> m('a-integration-test.js', '*te!(gration-te)st.js')
true
> m('a-integration-test.js', '*-!(integration)?test.js')
false
> m('a-integration-test.js', '*?!(integration)?test.js')
true

They should all return false.

@gera2ld
Copy link
Author

gera2ld commented Jul 2, 2015

👍

@Burnett01
Copy link

Same issues here. Please fix

@isaacs
Copy link
Owner

isaacs commented Jul 5, 2015

There are three things going on here.

First of all, I think that "negated patterns" is just really problematic to reason about. I'd like to deprecate the feature, for the same reason that it was deprecated in node-glob, and replaced with an explicit set of "ignore" patterns. (You can of course do glob("**", {ignore:["foo.js"]}) instead of glob("!foo.js"), but it's easier to reason about.)

Specifically, this is problematic because it means that extglob patterns like !(foo|bar) at the start of a pattern is read as a negated form of (foo|bar), which is not the same thing at all. In this case, (foo|bar) is not a valid extglob pattern, so it would only match a file named literally "(foo|bar)", and thus a file "foo" will not match (foo|bar), and the negated value is true. If you wanted to use it as a negated pattern, then everything after the leading ! needs to be a valid glob, so you could do something like !@(foo|bar), which is just weird. It's easier to always set {nonegate:true} and use globs as Glob intended.

The first tricky thing about doing that here is that it means reviewing how that is used in fstream-ignore, which is important because it's how npm parses .npmignore files. Since it is already using flipNegate and some other weird options, we can just as easily make this a breaking change, and have it use nonegate:false (or imply nonegate:false if flipNegate is turned on).

The second tricky thing about deprecating negation is that, like we did with node-glob, we'd have to allow someone to pass in a set of ignore or exclude patterns, and fail on any files that match any of those patterns. Its doable, but more work.

When running without negation enabled (that is, with {nonegate:true} as the options param), most of the cases are identical to the intended bash 4.3 behavior. Observe: https://gist.github.com/isaacs/f29fe3c07b8ccee66c3a

The second issue is that we're being a little overly aggressive in the negative extglob patterns. So, for example, a pattern like foo.!(j) is failing to match foo.js. Every failing test is this class of error, so that's the bona fide bug.

The third issue I'm seeing is that negative extglobs like !(foo|bar) used in combination with a * will be easy to get wrong. For example, the pattern *!(.js|.css) will match foo.js. Try it:

$ [[ foo.js == *!(.js|.css) ]] && echo yup
yup

The reason is that the * matches f and then the oo.js is neither .js nor .css. So, it's important to always have some unambiguous character in there. *.!(js|css) will do the right thing.

Additionally, a negated pattern can match the empty string! So, for example:

$ [[ foo. == *.!(js|css) ]] && echo yup
yup

The * matches foo and then the . matches the ., and then the '' is not equal to either js or css, so it's a match. This is relevant in cases like *!(.min).js.

$ [[ foo.min.js == *!(.min).js ]] && echo yup
yup

foo.min matches the *, the empty string matches the !(min), and then the .js matches .js. In that case, I think what we really want is an explicit way to say "exclude any matches that also match any of these other patterns", which leads back to the first issue in this comment: we should deprecate negation, in favor of explicit exclusion. It's a good idea here for all the same reasons why it was a good idea for node-glob.

@mik01aj
Copy link

mik01aj commented Jul 6, 2015

@isaacs, thanks for your explanation :)

Every failing test is this class of error, so that's the bona fide bug.

Imho not. This doesn't explain the difference between these 2 examples:

> m('a-integration-test.js', '*-!(integration-)test.js')
true
> m('a-integration-test.js', '*-!(integration)-test.js')
false

It seems to me that minus has some special meaning here.

Anyway, I totally agree with deprecating the whole negation "feature" in minimatch. It causes more problems than benefit.

@gera2ld
Copy link
Author

gera2ld commented Jul 6, 2015

Thanks a lot for @isaacs 's explaination, I think it's a good idea to use explicit rules instead of the negated patterns too.

@mik01aj ,

  1. a-integration matches *, the empty string matches !(integration-), thus true.
  2. The two dashes (-) have limited the matches, thus false.

@mik01aj
Copy link

mik01aj commented Jul 6, 2015

Ah damn, you're right. Thanks @gera2ld :)

@isaacs
Copy link
Owner

isaacs commented Jul 15, 2015

Found the problem. The way that negative lookaheads are being used is not sufficient. If the negated part of the pattern appears at the start of a longer pattern, it's failing because it does match the negative lookahead.

For example, the pattern *.!(js).!(xy) turns into the regexp /^(?!\.)(?=.)[^\/]*?\.(?:(?!js)[^\/]*?)\.(?:(?!xy)[^\/]*?)$/.

Annotated:

  Not starting with dot
  |
  |     positive lookahead, must contain some value
  |     |
  |     |    any number of not-/ chars (non-greedy)
  |     |    |
  |     |    |      a literal .
  |     |    |      |
  |     |    |      | start negative lookahead group
  |     |    |      | |
  |     |    |      | |  negative lookahead pattern (js)
  |     |    |      | |  |
  |     |    |      | |  |     some number of not-/ chars
  |     |    |      | |  |     |
  |     |    |      | |  |     |       another literal .
  |     |    |      | |  |     |       |
  |     |    |      | |  |     |       |    neg LA pattern (xy)
  |     |    |      | |  |     |       |    |
  |     |    |      | |  |     |       |    |     more non-/
  |     |    |      | |  |     |       |    |     |
  |     |    |      | |  |     |       |    |     |       end
  |     |    |      | |  |     |       |    |     |       |
  v     v    v      v v  v     v       v    v     v       v
/^(?!\.)(?=.)[^\/]*?\.(?:(?!js)[^\/]*?)\.(?:(?!xy)[^\/]*?)$/

The mistake here is that the negative lookahead portion needs to cover the entire pattern after the lookahead pattern, not just the bit within it's capture group. That means, effectively, any negative group needs to include the rest of the regexp after that point within the (?!...) section.

If so, then this is the regexp that it should produce in that case:

  Not starting with dot
  |
  |     positive lookahead, must contain some value
  |     |
  |     |    any number of not-/ chars (non-greedy)
  |     |    |
  |     |    |      a literal .
  |     |    |      |
  |     |    |      | start negative lookahead group
  |     |    |      | |
  |     |    |      | |  negative lookahead pattern (js)
  |     |    |      | |  |
  |     |    |      | |  |    the rest of the pattern after the neg LA group
  |     |    |      | |  |    |
  |     |    |      | |  |    |                    some number of not-/ chars
  |     |    |      | |  |    |                    |
  |     |    |      | |  |    |                    |       another literal .
  |     |    |      | |  |    |                    |       |
  |     |    |      | |  |    |                    |       |    neg LA pattern (xy)
  |     |    |      | |  |    |                    |       |    |
  |     |    |      | |  |    |                    |       |    |    rest of the pattern after the neg LA
  |     |    |      | |  |    |                    |       |    |    |
  |     |    |      | |  |    |                    |       |    |    | more non-/
  |     |    |      | |  |    |                    |       |    |    | |
  |     |    |      | |  |    |                    |       |    |    | |       end
  |     |    |      | |  |    |                    |       |    |    | |       |
  v     v    v      v v  v    vvvvvvvvvvvvvvvvvvvv v       v    v    v v       v
/^(?!\.)(?=.)[^\/]*?\.(?:(?!js\.(?:(?!xy)[^\/]*?)$)[^\/]*?)\.(?:(?!xy$)[^\/]*?)$/

That means keeping track of the start and stop of each negative group, and some string slicing for each of them, moving from last to first, before parsing the regular expression. Totally doable, just some new JavaScript to type.

@isaacs
Copy link
Owner

isaacs commented Jul 15, 2015

Can one of you try doing npm install isaacs/minimatch#fix-gh-65 and see if that resolves your issue? It should behave identical to bash in these cases now.

I do still want to deprecate nonegate, but that's a major version bump. Also, I worry about the performance impact of using negative extglob sections, because the size of the regexp does kind of explode if you have more than a few.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants