Implement the glob parser with an re2c-based lexer.
- Fixes IndexError with the single-character globs [ and \. Added unit tests to cover these cases.
- But I broke the cases [[] and []]. I'm leaving them out for now because they seem to be inconsistent with regard to [^[] and [^]] and so forth. I wrote failing spec tests in spec/glob.test.sh and added a comment in _GlobParser about this.
- We can now flag 3 different syntax warnings, and there could be more. For example, [[:space is a syntax warning. Warnings are ignored for now, but they could be surfaced in a future 'strict' mode.
- I think the new representation emits strict output? For example, [[:space should be converted to the regex \[\[:space and not rely on the regex engine to do the same ambiguous parsing.

re2c helps with:

- The \ handling and other "lookahead".
- Reasoning about the bad trailing \ case (Id.Glob_BadBackslash).
- It preserves the "lossless syntax tree" invariant, in case we ever get around to adding syntax errors for globs in strict mode. We could statically parse globs in some cases. See issue #151.
- In theory it should be faster than iterating char-by-char in Python.

Other details:

- The glob_part type is now in osh/osh.asdl. A glob is composed of operators like * and ?, (opaque) character classes, and literals. The literals are composed of several different token types.

Addresses issue #125.
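To make the lexer-based approach concrete, here is a minimal Python sketch of the idea described above: split a glob into operators, opaque character classes, and literals, then emit a regex in which every ambiguous piece is explicitly escaped. This is not the code in core/glob_.py or the re2c lexer in native/fastlex.c. It uses Python's `re` module as a stand-in lexer, and the names `glob_to_regex`, `_GLOB_TOKEN`, and every token name except the idea behind `bad_backslash` are hypothetical. It also assumes Python 3.7+ (for the `re.escape` behavior) and, as a simplification, does not allow backslash escapes or a bare [ inside a character class, so the [[] and []] cases fall through to literals.

```python
import re

# Hypothetical token kinds, tried in order at each position.
# Together they match any character, so lexing never gets stuck.
_GLOB_TOKEN = re.compile(r'''
    (?P<star>\*)
  | (?P<qmark>\?)
  | (?P<char_class>
        \[ (?: [!^] | (?![!^]) )          # optional negation marker
        (?: \[:[a-z]+:\] | [^\]\[\\] )+   # POSIX class or plain member
        \]
    )
  | (?P<lone_bracket>\[)                  # '[' that opens no valid class
  | (?P<escaped>\\.)                      # backslash followed by any char
  | (?P<bad_backslash>\\)                 # trailing backslash, nothing to escape
  | (?P<literal>[^*?\[\\]+)
''', re.VERBOSE | re.DOTALL)


def glob_to_regex(glob):
    """Return (regex, warnings) for a glob pattern.

    Warnings are the names of tokens that a 'strict' mode might reject.
    """
    parts = []
    warnings = []
    pos = 0
    while pos < len(glob):
        m = _GLOB_TOKEN.match(glob, pos)
        kind, text = m.lastgroup, m.group()
        if kind == 'star':
            parts.append('.*')
        elif kind == 'qmark':
            parts.append('.')
        elif kind == 'char_class':
            # The class body is opaque; only normalize '!' negation to '^'.
            if text[1] in '!^':
                text = '[^' + text[2:]
            parts.append(text)
        elif kind == 'escaped':
            parts.append(re.escape(text[1]))
        elif kind == 'literal':
            parts.append(re.escape(text))
        else:
            # lone_bracket or bad_backslash: emit an escaped literal
            # instead of leaving the ambiguity to the regex engine.
            parts.append(re.escape(text))
            warnings.append(kind)
        pos = m.end()
    return ''.join(parts), warnings


regex, warnings = glob_to_regex('[[:space')
print(regex)     # \[\[:space
print(warnings)  # ['lone_bracket', 'lone_bracket']
```

Because every token is consumed and recorded, the token stream still covers the whole input, which is the property a lossless syntax tree would need. The single-character globs [ and \ each produce one escaped literal plus a warning rather than an exception.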
Showing 14 changed files with 402 additions and 303 deletions.
- +0 −8 build/dev.sh
- +181 −158 core/glob_.py
- +54 −59 core/glob_test.py
- +9 −0 core/id_kind.py
- +6 −5 core/lexer_gen.py
- +6 −5 core/libstr.py
- +28 −3 native/fastlex.c
- +0 −32 osh/glob.asdl
- +28 −0 osh/lex.py
- +15 −7 osh/match.py
- +0 −20 osh/meta.py
- +15 −4 osh/osh.asdl
- +59 −0 spec/glob.test.sh
- +1 −2 test/spec.sh
0 comments on commit a4c178a