Implement the glob parser with an re2c-based lexer.

- Fixes IndexError with the single-character globs [ and \.  Added unit
  tests to cover these cases.
- However, I broke the cases [[] and []].  I'm leaving them out for now
  because their behavior seems inconsistent compared to [^[], [^]], and
  so forth.  I wrote failing spec tests in spec/glob.test.sh and added a
  comment about this in _GlobParser.
- We can now flag 3 different syntax warnings, and there could be more.
  For example, [[:space triggers a syntax warning.  For now, warnings are
  ignored, but they could be surfaced in a future 'strict' mode.
- I think the new representation produces stricter output.  For example,
  [[:space should be translated to the regex \[\[:space rather than
  relying on the regex engine to repeat the same ambiguous parse.
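A minimal Python sketch of the stricter translation described above. glob_to_regex() is a hypothetical helper, not the code in core/glob_.py: it bounds-checks every index, so a lone [ or a trailing \ becomes an escaped literal plus a warning rather than an IndexError, and [[:space comes out as \[\[:space.  For brevity it scans char-by-char, ignores path separators and POSIX classes such as [:alpha:], and does not try to settle the [[] and []] cases.

```python
import re
from typing import List, Tuple


def glob_to_regex(glob: str) -> Tuple[str, List[str]]:
    """Translate a shell glob to a Python regex, collecting syntax warnings.

    Hypothetical helper: every index is bounds-checked, so '[' and '\\'
    on their own produce escaped literals instead of an IndexError.
    """
    out: List[str] = []
    warnings: List[str] = []
    i, n = 0, len(glob)
    while i < n:
        c = glob[i]
        if c == '*':
            out.append('.*')
            i += 1
        elif c == '?':
            out.append('.')
            i += 1
        elif c == '\\':
            if i + 1 < n:
                # Escaped char: match the next char literally.
                out.append(re.escape(glob[i + 1]))
                i += 2
            else:
                # Trailing backslash: keep it as a literal, but warn.
                warnings.append('trailing backslash')
                out.append(re.escape('\\'))
                i += 1
        elif c == '[':
            end = glob.find(']', i + 1)
            body = glob[i + 1:end] if end != -1 else ''
            negated = body[:1] in ('!', '^')
            if negated:
                body = body[1:]
            if end == -1 or not body:
                # Unterminated or empty bracket: emit an explicit literal '['
                # instead of letting the regex engine guess.  (POSIX's
                # "leading ] is literal" rule is not implemented here.)
                warnings.append('literal [ at offset %d' % i)
                out.append(re.escape('['))
                i += 1
            else:
                # Escape chars that are special inside a Python regex set.
                body = body.replace('\\', '\\\\').replace('[', '\\[')
                out.append('[' + ('^' if negated else '') + body + ']')
                i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return ''.join(out), warnings
```

For example, glob_to_regex('*.py') yields .*\.py, and glob_to_regex('[!ab]c') yields [^ab]c.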

re2c helps with:

- The \ handling and other "lookahead".
  - Reasoning about the bad trailing \ case (Id.Glob_BadBackslash)
- It preserves the "lossless syntax tree" invariant, in case we ever get
  around to adding syntax errors for globs in strict mode.  We could also
  parse globs statically in some cases.  See issue #151.
- In theory it should be faster than iterating char-by-char in Python.
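To illustrate the backslash lookahead and the lossless invariant, here is a Python analogue of an re2c rule list built from a regex alternation.  The token ids are hypothetical, loosely modeled on Id.Glob_BadBackslash; the commit's actual rules are presumably in osh/lex.py.  One difference: Python's re tries alternatives in order, while re2c takes the longest match, so rule order matters here.

```python
import re
from typing import List, Tuple

# Hypothetical token ids, loosely modeled on names like Id.Glob_BadBackslash.
# Python's re tries alternatives left to right (re2c instead takes the
# longest match), so EscapedChar must come before BadBackslash.
GLOB_RULES = [
    ('EscapedChar', r'\\.'),        # backslash + any char (DOTALL below)
    ('BadBackslash', r'\\'),        # only reached when \ is the last char
    ('Star', r'\*'),
    ('QMark', r'\?'),
    ('LBracket', r'\['),
    ('RBracket', r'\]'),
    ('Literal', r'[^\\*?\[\]]+'),   # run of chars with no special meaning
]

_GLOB_LEXER = re.compile(
    '|'.join('(?P<%s>%s)' % (name, pat) for name, pat in GLOB_RULES),
    re.DOTALL)


def lex_glob(s: str) -> List[Tuple[str, str]]:
    """Split a glob into (token_id, text) pairs without dropping any input."""
    tokens = []
    pos = 0
    while pos < len(s):
        # Every character is covered by some rule, so match() never fails,
        # and no rule matches the empty string, so pos always advances.
        m = _GLOB_LEXER.match(s, pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Joining the token texts reproduces the input exactly, and a trailing \ comes out as a BadBackslash token rather than an error.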

Other details:

- The glob_part type is now in osh/osh.asdl.  A glob is composed of
  operators like * and ?, (opaque) character classes, and literals.  The
  literals are composed of several different token types.
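A sketch of the glob_part shape using Python dataclasses.  Class names, fields, and token kinds are hypothetical stand-ins for the ASDL definitions in osh/osh.asdl.  Character classes are kept as opaque text and literals hold typed tokens; literal_value() shows one possible use of that split, recovering the plain string that an operator-free glob denotes.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional, Union


class Token(NamedTuple):
    kind: str   # hypothetical kinds: 'Clean', 'Escaped', 'BadBackslash'
    text: str   # exact source text, e.g. 'foo' or '\\*'


@dataclass
class Operator:
    op: str     # '*' or '?'


@dataclass
class CharClass:
    text: str   # whole bracket expression, e.g. '[!a-z]'; kept opaque


@dataclass
class Literal:
    tokens: List[Token]


GlobPart = Union[Operator, CharClass, Literal]


def unparse(parts: List[GlobPart]) -> str:
    """Rebuild the original glob text; the parts lose no characters."""
    out = []
    for part in parts:
        if isinstance(part, Operator):
            out.append(part.op)
        elif isinstance(part, CharClass):
            out.append(part.text)
        else:
            out.extend(tok.text for tok in part.tokens)
    return ''.join(out)


def literal_value(parts: List[GlobPart]) -> Optional[str]:
    """If the glob has no operators or classes, return the string it denotes."""
    if not all(isinstance(p, Literal) for p in parts):
        return None
    chars = []
    for part in parts:
        for tok in part.tokens:
            # An escaped token like '\\*' stands for the char after the backslash.
            chars.append(tok.text[1:] if tok.kind == 'Escaped' else tok.text)
    return ''.join(chars)
```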

Addresses issue #125.
Andy Chu committed Jun 24, 2018
Parent: 1ad997d.  Commit: a4c178ab0db26f19fba68a47e223266f3d610def
Showing 14 changed files with 402 additions and 303 deletions.
  1. +0 −8 build/dev.sh
  2. +181 −158 core/glob_.py
  3. +54 −59 core/glob_test.py
  4. +9 −0 core/id_kind.py
  5. +6 −5 core/lexer_gen.py
  6. +6 −5 core/libstr.py
  7. +28 −3 native/fastlex.c
  8. +0 −32 osh/glob.asdl
  9. +28 −0 osh/lex.py
  10. +15 −7 osh/match.py
  11. +0 −20 osh/meta.py
  12. +15 −4 osh/osh.asdl
  13. +59 −0 spec/glob.test.sh
  14. +1 −2 test/spec.sh
build/dev.sh
@@ -50,13 +50,6 @@ gen-runtime-asdl() {
   echo "Wrote $out"
 }
 
-gen-glob-asdl() {
-  local out=_devbuild/gen/glob_asdl.py
-  local import='from osh.meta import GLOB_TYPE_LOOKUP as TYPE_LOOKUP'
-  PYTHONPATH=. asdl/gen_python.py osh/glob.asdl "$import" > $out
-  echo "Wrote $out"
-}
-
 # TODO: should fastlex.c be part of the dev build? It means you need re2c
 # installed? I don't think it makes sense to have 3 builds, so yes I think we
 # can put it here for simplicity.
@@ -109,7 +102,6 @@ minimal() {
   gen-types-asdl
   gen-osh-asdl
   gen-runtime-asdl
-  gen-glob-asdl
   pylibc
 }