Precedence #2
Comments
Hi, precedence with PEGs is often a bit surprising. The original pikaparser basically compiles the precedence integers into the "base grammar" using the usual failthrough construction with a base case, which removes much ambiguity and also lets you nicely specify the fixity of whatever operators you have. In your case, there's no telling whether the "glue chars together" in
I succeeded with implementing the failthrough gadget manually:
using PikaParser
const P = PikaParser
r = Dict(
    :regex => P.seq(:expr1), # toplevel
    :expr1 => P.first(:or => P.seq(:expr1, P.token('|'), :expr2), :expr2), # precedence level 1
    :expr2 => P.one_or_more(:basic), # precedence level 2
    :basic => P.first( # easily decidable single-item base cases
        :char => P.satisfy(isletter),
        :parens => P.seq(P.token('('), :expr1, P.token(')')),
    )
)
g = P.make_grammar([:regex], P.flatten(r))
input = collect("abc|d(e|fg|h)ij|kl")
p = P.parse(g, input)
println(P.traverse_match(g, p, P.find_match_at(g, p, :regex, 1), :regex))
(At precedence level 1, you may also try other combinations of
On the string
regex(
    expr1(
        or(
            expr1(
                or(
                    expr1(expr2(basic(char()), basic(char()), basic(char()))),
                    var"or-2"(),
                    expr2(
                        basic(char()),
                        basic(
                            parens(
                                var"parens-1"(),
                                expr1(
                                    or(
                                        expr1(
                                            or(
                                                expr1(expr2(basic(char()))),
                                                var"or-2"(),
                                                expr2(basic(char()), basic(char())),
                                            ),
                                        ),
                                        var"or-2"(),
                                        expr2(basic(char())),
                                    ),
                                ),
                                var"parens-3"(),
                            ),
                        ),
                        basic(char()),
                        basic(char()),
                    ),
                ),
            ),
            var"or-2"(),
            expr2(basic(char()), basic(char())),
        ),
    ),
)
I don't see the immediate issue why your parser would produce the "bad" parsetree, but I expect the problem to be a bad interplay of greedy matching and left-recursion problems, which always gives surprises. In this case, if you'd parse with normal recursive descent, you would expect the
You might be able to quickfix your grammar by instead using
Also, I should probably add some support for easily creating the precedence chains :D |
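For readers following along, the layering idea from this comment can be sketched without PikaParser at all. The block below is a hand-written recursive-descent illustration in plain Julia, not PikaParser's implementation: the function names (`parse_basic`, `parse_sequence`, `parse_alternation`, `parse_regex`) and the tuple-based tree shape are made up for this sketch. It only shows why putting `|` on the outer level and "one or more basic items" on the inner level gives `|` the largest possible sequences on both sides.

```julia
# Sketch of the layered precedence idea, Base Julia only (all names are hypothetical).
# Level 1: alternation := sequence ('|' sequence)*
# Level 2: sequence    := basic+
# Base:    basic       := letter | '(' alternation ')'
# Every parse function returns (tree, next_position).

function parse_basic(s::Vector{Char}, i::Int)
    i <= length(s) || error("unexpected end of input")
    c = s[i]
    if isletter(c)
        return (:char, c), i + 1
    elseif c == '('
        inner, j = parse_alternation(s, i + 1)   # parentheses restart at level 1
        (j <= length(s) && s[j] == ')') || error("expected ')' at position $j")
        return (:parens, inner), j + 1
    else
        error("unexpected '$c' at position $i")
    end
end

function parse_sequence(s::Vector{Char}, i::Int)
    items = Any[]
    item, i = parse_basic(s, i)                  # at least one item is required
    push!(items, item)
    # keep consuming while the next character can start a basic item
    while i <= length(s) && (isletter(s[i]) || s[i] == '(')
        item, i = parse_basic(s, i)
        push!(items, item)
    end
    return (:seq, items), i
end

function parse_alternation(s::Vector{Char}, i::Int)
    left, i = parse_sequence(s, i)
    while i <= length(s) && s[i] == '|'
        right, i = parse_sequence(s, i + 1)
        left = (:or, left, right)                # fold to the left
    end
    return left, i
end

function parse_regex(str::AbstractString)
    s = collect(str)
    tree, i = parse_alternation(s, 1)
    i == length(s) + 1 || error("trailing input at position $i")
    return tree
end

tree = parse_regex("abc|de")
println(tree)
```

Here `abc|de` comes out as a single `:or` node whose left child is a three-item sequence and whose right child is a two-item sequence. The loop in `parse_alternation` plays the role of the left-recursive `:expr1` rule above; a recursive-descent parser cannot use left recursion directly, so the sketch iterates instead.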
Bonus: parsing the regex suffixes with normal grammars is a major source of PITA but I guess here you'll have luck with just
|
Also, with #3 merged you can write the ruleset from above roughly as:
r = Dict(
    P.@precedences (i -> Symbol(:r, i)) c n begin
        :regex => P.seq(n, P.token('|'), c)
        :sequence => P.one_or_more(n)
        P.first(
            :group => P.seq(P.token('('), n, P.token(')')),
            :char => P.satisfy(isletter),
        )
    end
)
(untested) |
Cool, I will play around with this. It already looked like your previous suggestions worked, and I understood better why it didn't work before, so thanks! |
OK. Feel free to report your final grammar, might be useful to add some extra bits from it to tests. |
If you want, you can have a look here: https://github.com/jkrumbiegel/ReadableRegex.jl/blob/explain-regex/explain_regex.jl
I wanted to do the reverse of what the package is doing: translate a given regex into function code that should hopefully be easier to understand. For example,
either(
    BEGIN *
    maybe('-') *
    between(0, 2, char_in('0':1:'9')) *
    maybe(capture('.' * between(1, 2, char_in('0':1:'9')))) *
    END,
    BEGIN *
    maybe('-') *
    capture("100") *
    maybe(capture('.' * between(1, 2, char_in('0')))) *
    END,
) |
Aaaah nice. Regex is a bit write-only tbh, but this should work. If I get it right, you are able to reconstruct actual ReadableRegex structure from any regex pattern? Anyway, let me know in case you hit any difficulties! (also, how new is the |
Yes, that's the goal. And yeah it's mostly read only, but it's very hard to understand as it is. Like spotting a |
Thinking about that, is there any regular language matcher in Julia? This way we could have a completely regex-less regular language matching. (Technically you could translate the regular expressions to pikaparser and match them as is, but compared to plain DFA/NFA required for regexes there's an ugly lot of overhead here...) |
Closing this because the precedence helpers gonna get released asap. Thanks for input! |
Hi, I was trying to parse regexes with this parser and couldn't get the | operator to work.
For abc|de I get
Because I don't know how to specify that | should have the largest possible sequences on left and right.
The relevant grammar parts of this are
:sequence => P.one_or_more(:expr)
, then
:expr => P.first(:either, :negative_lookahead, :positive_lookahead, :positive_lookbehind, :negative_lookbehind, :zero_or_more, :one_or_more, :repetition, :repetition_at_least, :repetition_from_to, :maybe, :noncapturing_group, :capturing_group, :not_set, :set, :specialized_char, :dot, :normalized_char, :char, :_begin, :_end)
and
:either => P.seq(:sequence, :pipe, :sequence)
.
I saw that https://github.com/lukehutch/pikaparser has precedence integers for clauses but I didn't see anything here, is that a missing piece of the puzzle?