Greedy repeat parser problem #9
Comments
Hi, yeah, the RepeatParser is certainly greedy. You do have options for this scenario, though. First, you can enforce the rule that a '.' must follow another character, such as:
Or, if performance is a big concern (i.e. you are parsing millions of these things in a server scenario) you can use the RepeatParser's separator to allow an optional period in-between each character:
Hope this helps!
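As an illustration of the idea behind these suggestions (this is not Eto.Parse code and not the snippets from the comment), here is a minimal hand-written Python matcher in which every repetition step must end on a `PN_CHARS` character. The helper names `is_pn_chars_base`, `is_pn_chars` and `match_pn_prefix` are hypothetical, and the character classes are deliberately simplified stand-ins for the much larger W3C ranges.

```python
from typing import Optional


def is_pn_chars_base(c: str) -> bool:
    # Assumption: any Unicode letter stands in for the W3C PN_CHARS_BASE ranges.
    return c.isalpha()


def is_pn_chars(c: str) -> bool:
    # Assumption: letters, digits, '-' and '_' stand in for PN_CHARS.
    return c.isalnum() or c in "-_"


def match_pn_prefix(text: str, pos: int = 0) -> Optional[int]:
    """Return the end index of a PN_PREFIX starting at pos, or None.

    Grammar used here: pn_chars_base { {'.'} pn_chars }
    Each repetition consumes optional dots and then one pn_chars, so the
    match can never end on a '.' and no backtracking is needed.
    """
    if pos >= len(text) or not is_pn_chars_base(text[pos]):
        return None
    end = pos + 1
    while True:
        i = end
        # Tentatively skip a run of dots.
        while i < len(text) and text[i] == ".":
            i += 1
        # Commit only if a pn_chars character follows the dots.
        if i < len(text) and is_pn_chars(text[i]):
            end = i + 1
        else:
            return end


if __name__ == "__main__":
    for sample in ["rdf:type", "foo.bar:x", "a.:x", ":x"]:
        print(sample, "->", match_pn_prefix(sample))
```

Because a step is only committed once its trailing character has been seen, the colon in `rdf:type` simply ends the loop instead of invalidating what was already matched.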
Hm, thinking about that for a second, this will also work and is a little cleaner:
Thanks, it works of course. I haven't written any grammars since my studies. However, the W3C syntax makes me wonder. I understand that the semantics of
mean that the matching string contains a […]. The question, though, is whether this is valid EBNF notation and whether such parsers are actually used. Or is this syntax just more human-readable and intended to be easier to comprehend in written form?
This is valid EBNF notation and works with LALR parsers, though it does not work with LL parsers like Eto.Parse. Being recursive descent, the repeat parser knows nothing about what should come after it. With LALR parsers, a huge 'table of possibilities' is typically created, which allows them to handle patterns like this. I've pondered the concept of adding look-ahead to Eto.Parse and it might be doable; however, it may degrade performance, which is not what I'd like to see.
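To make the greedy behaviour concrete, here is a minimal Python sketch of recursive-descent combinators. It is not Eto.Parse's API; `char_if`, `seq`, `alt`, `repeat` and `optional` are hypothetical helpers, and the character classes are simplified assumptions. The repeat never gives characters back, so a direct transcription of the W3C rule only matches the first letter of `rdf:type`.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def char_if(pred: Callable[[str], bool]) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and pred(text[pos]) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        cur = pos
        for p in parsers:
            nxt = p(text, cur)
            if nxt is None:
                return None  # the whole sequence fails, nothing is retried
            cur = nxt
        return cur
    return parse


def alt(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            nxt = p(text, pos)
            if nxt is not None:
                return nxt
        return None
    return parse


def repeat(p: Parser) -> Parser:
    # Greedy zero-or-more: consumes as long as p succeeds, never backtracks.
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse


def optional(p: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse


# Assumption: simplified stand-ins for the W3C character classes.
pn_chars_base = char_if(str.isalpha)
pn_chars = char_if(lambda c: c.isalnum() or c in "-_")
dot = char_if(lambda c: c == ".")

# Direct transcription of: PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
pn_prefix_naive = seq(
    pn_chars_base,
    optional(seq(repeat(alt(pn_chars, dot)), pn_chars)),
)

if __name__ == "__main__":
    end = pn_prefix_naive("rdf:type", 0)
    print(repr("rdf:type"[:end]))  # prints 'r'
```

The repeat swallows `d` and `f`, the trailing `pn_chars` then sees the colon and fails, and the optional group falls back to matching nothing, which leaves only `r`.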
Thanks for clarifying this for me. I certainly won't need the enhanced functionality, given that I was able to achieve the desired result by adjusting my productions.
Hi,

I'm trying to create a grammar to parse SPARQL property paths, as defined here. I only need part of that vocabulary. Unfortunately, W3C uses its own EBNF syntax, so instead of translating it to vanilla EBNF I decided to try rewriting the relevant rules using your shortcut syntax, which I find quite neat.

However, I've bumped into problems with the rule `PN_PREFIX`.

In short, `PN_PREFIX` should match the prefix of a QName URI. For example, given the QName `rdf:type`, it would match the `rdf` part. As per the rule, the first character must be a letter, and then additional characters are allowed.

I rewrote PN_PREFIX as

`pn_chars_base` matches the `r` and then the RepeatParser matches `df`, which is unfortunate because `pn_chars` then fails, since it doesn't match the colon, thus failing the entire optional pattern.

The intent is that `pn_chars_base`, the `pn_chars` inside the repeat, and the last `pn_chars` match `r`, `d` and `f` respectively, so that the entire `pn_prefix` matches `rdf`.

Any idea what's not right with my grammar?