Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob: Double asterisk is not matching zero directories #185

Closed
eWert-Online opened this issue Mar 4, 2021 · 5 comments · Fixed by #192
Closed

Glob: Double asterisk is not matching zero directories #185

eWert-Online opened this issue Mar 4, 2021 · 5 comments · Fixed by #192

Comments

@eWert-Online
Copy link
Contributor

Using the latest (unreleased) version with #172 already merged, I would expect ** to match zero or more directories.

Given the following two paths

/project/path/test/lib/test.ml
/project/path/test/lib/deep/test2.ml

with the pattern test/lib/**/*.ml, I would expect both to be true.
Right now however, only the second (deep) one matches.

Doing the same without the lib like this test/**/*.ml returns both paths as true.

This leads me to the conclusion, that ** right now means one or more directories.

(Related to #171)

@rgrinberg
Copy link
Member

@sliquister do you agree that this is a bug?

@v-gb
Copy link
Contributor

v-gb commented Jun 29, 2021

@rgrinberg did you actually mean to ask me?

I don't know how globs are documented right now, but both git and hg would interpret ** as matching >= 0 number of directories, so in terms of expectations, it's likely less surprising to interpret ** similarly, as @eWert-Online is requesting. (I also looked at fnmatch but it doesn't have ** or equivalent, so no comparison possible)

@rgrinberg
Copy link
Member

I wrongly recalled that it was you who contributed many improvements to the glob api. It was actually another JaneStreet employee. In any case, I assume you are still using Re.Glob, in which case we don't want to change the behavior and subtly break things for you.

@eWert-Online since we all seem agree with your assessment, would you like to contribute a fix?

@eWert-Online
Copy link
Contributor Author

My OCaml isn't that good, but I can at least try 🙂

@eWert-Online
Copy link
Contributor Author

I looked into this, but I think solving this is currently a bit too much for me 😄

I think the problem is, that the parser currently sees the /'s as explicitly set characters that need to be present in the path.
In my understanding it should however "swallow" the path segments recursively until it finds its match or runs out of characters. If it encounters a .. or . it should also fail.

I did write some tests, which test all of the above mentioned.

  assert (re_match    (glob ~anchored "foo/**/bar/**/baz") "foo/bar/baz");
  assert (re_match    (glob ~anchored "foo/**/bar/**/baz") "foo/bar/x/y/z/baz");
  assert (re_match    (glob ~anchored "foo/**/bar/**/baz") "foo/x/y/z/bar/baz");
  assert (re_match    (glob ~anchored "foo/**/bar/**/baz") "foo/bar/x/bar/x/baz");
  assert (re_mismatch (glob ~anchored "foo/**/bar/**/baz") "foo/bar/../x/baz");
  assert (re_mismatch (glob ~anchored "foo/**/bar/**/baz") "foo/bar/./x/baz");

rgrinberg added a commit to rgrinberg/opam-repository that referenced this issue Aug 25, 2021
CHANGES:

* Add the `[:alpha:]` character class in `Re.Perl` (ocaml/ocaml-re#169)
* Double asterisk (`**`) in `Re.Glob` (ocaml/ocaml-re#172)
  Like `*` but also match `/` characters when `pathname` is set.
* Double asterisk should match 0 or more directories unless in trailing
  position. (ocaml/ocaml-re#192, fixes ocaml/ocaml-re#185)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants