Currently the lexer produces `$unsupported` for various constructs (e.g. lookahead and lookbehind in PCRE), and we then error about it.
My suggestion is instead that we lex these correctly and then error about them in the parser. This moves the concept of unsupportedness up one layer, from the lexer to the parser.
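A minimal sketch of that split, assuming a toy lexer that only distinguishes PCRE lookaround openers from everything else. The token names and the `lex()`/`parse()` functions are hypothetical and are not libfsm's actual lexer or parser interface. The point is that the lexer reports *what* it saw, and the parser decides that it is unsupported, so the error can name the construct and its position.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds. Instead of one catch-all "unsupported" token,
 * the lexer gives each lookaround form its own token. */
enum tok {
    TOK_CHAR,            /* anything else, treated as a literal in this sketch */
    TOK_LOOKAHEAD_POS,   /* (?=  */
    TOK_LOOKAHEAD_NEG,   /* (?!  */
    TOK_LOOKBEHIND_POS,  /* (?<= */
    TOK_LOOKBEHIND_NEG,  /* (?<! */
    TOK_EOF
};

/* Lex one token starting at *p and advance *p past it. */
static enum tok lex(const char **p)
{
    static const struct { const char *s; enum tok t; } opens[] = {
        { "(?<=", TOK_LOOKBEHIND_POS },
        { "(?<!", TOK_LOOKBEHIND_NEG },
        { "(?=",  TOK_LOOKAHEAD_POS  },
        { "(?!",  TOK_LOOKAHEAD_NEG  },
    };
    size_t i;

    if (**p == '\0')
        return TOK_EOF;
    for (i = 0; i < sizeof opens / sizeof *opens; i++) {
        size_t n = strlen(opens[i].s);
        if (strncmp(*p, opens[i].s, n) == 0) {
            *p += n;
            return opens[i].t;
        }
    }
    (*p)++;
    return TOK_CHAR;
}

/* The parser, not the lexer, decides what is unsupported.
 * Returns 0 on success, -1 with a message in err on error. */
static int parse(const char *re, char *err, size_t errsz)
{
    const char *p = re;

    assert(re != NULL && err != NULL);
    for (;;) {
        const char *start = p;
        switch (lex(&p)) {
        case TOK_EOF:
            return 0;
        case TOK_CHAR:
            break;  /* a real parser would build the AST here */
        case TOK_LOOKAHEAD_POS:
        case TOK_LOOKAHEAD_NEG:
            snprintf(err, errsz, "lookahead at offset %d is not supported",
                (int) (start - re));
            return -1;
        case TOK_LOOKBEHIND_POS:
        case TOK_LOOKBEHIND_NEG:
            snprintf(err, errsz, "lookbehind at offset %d is not supported",
                (int) (start - re));
            return -1;
        }
    }
}
```

For example, `parse("a(?=b)", ...)` fails with "lookahead at offset 1 is not supported", a more specific message than a lexer-level "unsupported token" can easily give.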
Eventually I'd also like to construct AST nodes for these, and only error about their unsupportedness when we come to the AST -> NFA conversion. That way these features would also be supported for e.g. AST -> regexp rendering (where no FSM is involved), and there may be opportunities to deal with them by AST rewriting.
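A sketch of deferring the error, assuming a tiny hypothetical AST with just literals, concatenation, and lookahead (the node kinds and function names are illustrative, not libfsm's AST). Rendering back to regexp syntax handles lookahead without complaint; only the stand-in for the AST -> NFA conversion, `ast_to_nfa_check()`, rejects it. The NFA construction itself is elided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST node kinds; not libfsm's actual AST. */
enum ast_kind { AST_LITERAL, AST_CONCAT, AST_LOOKAHEAD };

struct ast {
    enum ast_kind kind;
    char c;                   /* AST_LITERAL: the character */
    const struct ast *a, *b;  /* AST_CONCAT: a then b; AST_LOOKAHEAD: body in a */
    int negated;              /* AST_LOOKAHEAD: 1 for (?!...), 0 for (?=...) */
};

/* AST -> regexp rendering involves no FSM, so lookahead renders fine.
 * Appends to buf, which must be NUL-terminated and large enough. */
static void render(const struct ast *n, char *buf)
{
    assert(n != NULL);
    switch (n->kind) {
    case AST_LITERAL: {
        size_t len = strlen(buf);
        buf[len] = n->c;
        buf[len + 1] = '\0';
        break;
    }
    case AST_CONCAT:
        render(n->a, buf);
        render(n->b, buf);
        break;
    case AST_LOOKAHEAD:
        strcat(buf, n->negated ? "(?!" : "(?=");
        render(n->a, buf);
        strcat(buf, ")");
        break;
    }
}

/* Stand-in for the AST -> NFA conversion: the only place lookahead is
 * reported as unsupported. Returns 0 if the tree could be compiled,
 * -1 with *err set otherwise. */
static int ast_to_nfa_check(const struct ast *n, const char **err)
{
    assert(n != NULL);
    switch (n->kind) {
    case AST_LITERAL:
        return 0;
    case AST_CONCAT:
        if (ast_to_nfa_check(n->a, err) != 0)
            return -1;
        return ast_to_nfa_check(n->b, err);
    case AST_LOOKAHEAD:
        *err = "lookahead is not supported when building an NFA";
        return -1;
    }
    return -1;
}
```

An AST rewriting pass would slot in between parsing and `ast_to_nfa_check()`, replacing nodes it knows how to express without lookaround before the check runs.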
For the PCRE dialect, `$unsupported` currently falls into four buckets:

1. Word boundaries, match-start control, and anchors that libfsm could potentially support: `\b` and `\B` (word boundaries), `\K` (reset the start of the reported match), `\Z` (end of subject, or before a final newline), and `\G` (the first matching position in the subject).
2. Back references. It makes sense to include these in the AST; simple forms like `(foo)\1` can be transformed into `(foo)(?:foo)`, which is compatible with DFAs and linear scanning.
3. Positive and negative lookahead and lookbehind assertions. We may be able to transform these into something compatible with linear scanning.
4. Ways to control backtracking: atomic groups (`(?>...)`) and `(*VERB)` forms like `(*COMMIT)` and `(*PRUNE)`. These are so specific to backtracking matchers (and PCRE) that I'm not sure we want to include them. On the other hand, there aren't many such forms, so it may make sense.
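The back-reference rewrite in bucket 2 can be sketched as an AST pass, assuming a hypothetical AST restricted to literals, concatenation, capture groups, and back references (no alternation or repetition). In that subset every group matches exactly once and captures exactly its literal body, so a back reference that follows the group can be replaced by a non-capturing copy of that body. The node layout and function names are illustrative only. Forward references (`\1(foo)`) and groups with non-literal bodies are left alone and counted, since they would still need to be rejected.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 10

/* Hypothetical AST: literals, concatenation, groups, back references. */
enum node_kind { N_LITERAL, N_GROUP, N_BACKREF, N_CONCAT };

struct node {
    enum node_kind kind;
    const char *lit;       /* N_LITERAL: the literal text */
    int group;             /* N_GROUP: capture index (0 = non-capturing);
                              N_BACKREF: referenced index (1-based) */
    struct node *sub[2];   /* N_GROUP: sub[0] is the body; N_CONCAT: left, right */
};

/* Walk left to right, remembering each capture group's body once the group
 * has closed. Returns the number of back references left unrewritten. */
static int rewrite(struct node *n, struct node *seen[MAX_GROUPS])
{
    int left;

    if (n == NULL)
        return 0;
    switch (n->kind) {
    case N_LITERAL:
        return 0;
    case N_CONCAT:
        left = rewrite(n->sub[0], seen);   /* visit the left side first */
        return left + rewrite(n->sub[1], seen);
    case N_GROUP:
        left = rewrite(n->sub[0], seen);
        if (n->group > 0 && n->group < MAX_GROUPS)
            seen[n->group] = n->sub[0];     /* group is now closed */
        return left;
    case N_BACKREF: {
        struct node *body = NULL;

        assert(n->group > 0);               /* back references are 1-based */
        if (n->group < MAX_GROUPS)
            body = seen[n->group];
        if (body == NULL || body->kind != N_LITERAL)
            return 1;   /* forward reference or non-literal body: keep it */
        n->kind = N_GROUP;                  /* (foo)\1 -> (foo)(?:foo) */
        n->group = 0;
        n->sub[0] = body;
        n->sub[1] = NULL;
        return 0;
    }
    }
    return 0;
}

/* Rewrites what it can; returns how many back references remain. */
static int rewrite_backrefs(struct node *root)
{
    struct node *seen[MAX_GROUPS] = { NULL };
    return rewrite(root, seen);
}
```

The restriction to literal bodies matters: `(a|b)\1` must not become `(a|b)(?:a|b)`, because the latter also matches `ab`, which the back reference forbids.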