Support string alternative syntax (curly braces pattern) #2

michaelsproul · 2014-09-01T12:41:18Z

Is there a plan to support string alternatives with curly braces, like {foo,bar}program matching fooprogram and barprogram?

To avoid code duplication in the matcher, I was thinking it might be a good idea to use the regex crate to implement this (as well as the rest of glob).

Support in other languages:

C/C++: yes using the GLOB_BRACE option.
Python: no
Ruby: yes
Java: yes

The text was updated successfully, but these errors were encountered:

blaenk · 2015-02-12T02:56:14Z

/cc @alexcrichton (nothing important, no need to read/reply; just if you're interested)

TL;DR: A while back I attempted a complete rewrite of glob. I called it glob-prime, as in glob'. It parsed glob patterns into the AST we have now and translated the AST to regular expressions. It also had a different 'selector-based' selection algorithm, so that each pattern type (e.g. recursive, exact, wildcard) would define its own globbing behavior. The goal was extensibility and maintainability, but it was much slower, so I gave up on it and proposed the remaining things I thought were applicable in #25/#26/#27/#28.

I'll probably take a shot at implementing this alternation syntax whenever I'm bored if no one else gets around to it.

I actually rewrote this glob package from scratch with that goal, to translate globs to regular expressions in order to make the code thinner and easier to expand/maintain. This is what Python does, and it wouldn't surprise me if others did the same. Adding this alternative syntax would've been very straightforward, as it would more or less map to a regex alternation, foo|bar. You can see the translation code here, so that creating a Pattern was essentially a two-step process of parsing the text into the glob AST we already have and then translating that to regex.

I also re-implemented the actual globbing algorithm to be one which, as this one currently does, splits the patterns on the path separator and then translates each one into a given token which has 'globbing' behavior defined for it. The goal was to make things much easier to expand: adding the alternative syntax would've basically consisted of creating an AST token for it, defining how it translates to a regex, and then defining a selector variant for it with its selection behavior, which would have been, for example, something like Alternative(Vec<Box<Selector>>), since AFAIK glob alternations can consist of sub-glob patterns, which means that something similar would have to be done for the translation into regex, which would've perhaps simply consisted of a recursive call to the translation function.

When all was said and done, however, I found it to be much slower.

First, the new selection algorithm I used, which I actually got inspiration from python's glob for, wasn't exactly optimal because on each iteration I would have to re-trace through each of the selectors. Python has the yield keyword which makes all of this very natural, but the iterator as implemented essentially would return a result, then on the next iteration would have to go through the top-most selector, then the next one, and so forth until it got back to its original position.

Perhaps I could've optimized this by storing a reference to the old position or something, but I didn't get around to it because I also discovered that the regular expression approach in general was much slower. I already kind of expected this because regular expressions are much more general and have more machinery to handle other use cases, which can hardly compete against a hand-tuned specific parser like the one we have now. But I didn't expect it to be that much slower.

I don't remember the numbers, but it was on the order of something like, glob would take maybe 50ns max with the benchmarking test set, and mine would take like 200,000ns. I figured I'd cut my losses and stick with the current globbing mechanism while keeping my regular expression back-end, but it was still something like 80,000ns to the current one's 50ns max. With a difference like that I figured my redesigns would never be accepted, even if it was still orders of magnitude faster than for example python's, and even though globbing isn't something that's typically done in a hot loop.

So I figured I'd cut my losses once more and proposed what remained, which was relative path support (#25), error reporting (#26), recursive wildcard simplification (#27), and I/O error propagation (#28).

I didn't really bother too much with attempting to optimize it further because it was mostly a temporary diversion for me, since I need this package for one of mine. But more importantly what we have now works fine enough, and globbing isn't necessarily very extensible, so it's not like we really required that extensibility; the gains would've mostly been in the area of maintainability, but this package shouldn't need to be touched too much for the aforementioned reason.

jszwedko mentioned this issue Nov 12, 2020

Curly braces in the include directive, file pattern vectordotdev/vector#5003

Closed

JohnTitor changed the title ~~Support string alternative syntax.~~ Support string alternative syntax (curly braces pattern) Jan 8, 2023

JohnTitor mentioned this issue Jan 8, 2023

{a/**} returns error #123

Closed

mike-code mentioned this issue Jan 11, 2023

glob pattern matching does not allow alternate "or" pksunkara/cargo-workspaces#85

Closed

armichaud mentioned this issue Sep 4, 2023

deno task - Support brace expansion denoland/deno_task_shell#91

Open

mtsmfm mentioned this issue Jan 2, 2024

Glob with conditional file names mtsmfm/split-test#5

Closed

igor-ramazanov mentioned this issue Jan 4, 2024

No support for braces in globPatterns kakoune-lsp/kakoune-lsp#708

Open

Sec-ant mentioned this issue Apr 27, 2024

fix(glob): correctly handle relative patterns biomejs/biome#2620

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support string alternative syntax (curly braces pattern) #2

Support string alternative syntax (curly braces pattern) #2

michaelsproul commented Sep 1, 2014

blaenk commented Feb 12, 2015

Support string alternative syntax (curly braces pattern) #2

Support string alternative syntax (curly braces pattern) #2

Comments

michaelsproul commented Sep 1, 2014

blaenk commented Feb 12, 2015