JS Grammar Returns Ambiguous Parsings #44
Comments
Yeah, see Robin's comment about keyword/name separation.
As I was commenting on a closed PR, I figured I would leave a pointer to it here (#41) and continue the discussion here.

I spent some time with PEGs in general and OMeta a couple of years back. One of the ideas in OMeta is that strings are not the only thing it will parse: think of an AST for which you would like to codify source-to-source translation rules. But I digress... OMeta will parse arbitrary objects (strings being one particular type of object), and one of the ideas that supports this is the use of generalized predicates.

I believe that, fundamentally, boolean grammars have stronger theoretical underpinnings than 'let's add semantic predicates'. And yet, having said that, Alessandro's pragmatic approach of generalized matching/predicates seems to work well in practice.

So, you might consider a rule of the form:

Let's say ρ is represented as {? ... ?}; the Lua grammar for Name might look like:

Predicates don't consume any input, so they are invoked when 'B' completes, and the completion code advances the parse only if the predicate succeeds.

Thoughts?
Actually, for almost all cases we don't need this stuff. JS has negative lookahead built into regexes, so we can just allow :!? to be added to strings, or something similar, to make them negative lookahead assertions. Alternatively, we can change to a more complicated regex parser (I'm sure there is one out there, or we could write one ourselves... with nearley, of course). As for the OMeta-like idea, as far as I can tell it allows adding a function as an additional constraint on the grammar. My concern is that it might be heavy-handed, and it may be better to try to implement more advanced features. Possibly we could add arbitrary EBNF-like tags (prefixed with :) and make it easy to create functions and assign symbols to them, so adding something like a not-keyword EBNF tag would be easy? Just a thought.
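For reference, here is a minimal sketch of the negative-lookahead idea in plain JavaScript, independent of nearley. The keyword list and the `isName` helper are assumptions made up for illustration; a real Lua grammar reserves more words.

```javascript
// Hypothetical keyword list for illustration only.
const KEYWORDS = ["false", "true", "nil", "return"];

// (?!...) is a negative lookahead: it consumes no input and makes the match
// fail if a keyword followed by a word boundary appears at this position.
const NAME = new RegExp(
  "^(?!(?:" + KEYWORDS.join("|") + ")\\b)[A-Za-z_][A-Za-z0-9_]*$"
);

// Returns true when the text is identifier-shaped and not a reserved word.
function isName(text) {
  return NAME.test(text);
}

console.log(isName("v"));      // true
console.log(isName("false"));  // false
console.log(isName("falsey")); // true
```

The `\b` matters: without it, any name that merely starts with a keyword (such as `falsey`) would be rejected too.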
Ah, Robin, that's really cool stuff. As it happens, nearley already supports parsing a list of arbitrary objects! In fact, we're cheating a bit by using the subscript syntax ( The problem with using these features in compiled parsers is that you'll have to run a tokenizer first, which is sort of painful. It's the reason I shy away from (J | B)ison. Anyhow. Your proposal at the end is pretty exciting—I suggested something similar myself to Jacob on IRC. I'm going to look into implementing it this weekend. I'm not convinced of the
Here, (What interests me is that if I get negation working right, I'll have a parser that gracefully handles CFG intersection, because by De Morgan's law we have !(!a || !b) = a && b.)
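A small sketch of the De Morgan point, modelling recognizers as plain boolean functions on strings. That modelling is an assumption for illustration and is not how nearley represents rules; the combinator names `not`, `or`, and `and` are likewise made up here.

```javascript
// Recognizers are modelled as predicates: string -> boolean.
const not = (a) => (s) => !a(s);
const or = (a, b) => (s) => a(s) || b(s);

// Intersection built only from negation and union: a && b == !(!a || !b).
const and = (a, b) => not(or(not(a), not(b)));

// Two toy languages: identifier-shaped strings, and a few Lua keywords.
const isIdentifierShaped = (s) => /^[A-Za-z_]\w*$/.test(s);
const isKeyword = (s) => ["false", "true", "nil"].includes(s);

// "Name" is the intersection of "identifier-shaped" and "not a keyword".
const isName = and(isIdentifierShaped, not(isKeyword));

console.log(isName("v"));     // true
console.log(isName("false")); // false
```

So once negation and alternation are available, intersection comes for free, which is exactly what keyword/name separation needs.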
I've been experimenting in post-process-as-predicate: if the postprocess function returns null, that is treated as a failure and the subsequent parse state is not generated. In predicates-in-parse, I've added a predicate type of symbol. This is different from the .test idea in that it inspects the post-processed result of the preceding token. Cheers
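To make the post-process-as-predicate idea concrete, here is a toy sketch that is not taken from either branch. The `rules` table, the `matches`/`postprocess` field names, and the `complete` function are all hypothetical; the only thing borrowed from the description above is that a null postprocess result suppresses the completion.

```javascript
// Hypothetical keyword set for illustration only.
const KEYWORDS = new Set(["false", "true", "nil"]);

// Two candidate rules for a single token, each with a matcher and a postprocessor.
const rules = [
  {
    name: "Boolean",
    matches: (tok) => tok === "true" || tok === "false",
    postprocess: (tok) => ({ boolean: tok === "true" }),
  },
  {
    name: "Name",
    matches: (tok) => /^[A-Za-z_]\w*$/.test(tok),
    // Returning null acts as the predicate: keywords are rejected as names.
    postprocess: (tok) => (KEYWORDS.has(tok) ? null : { name: tok }),
  },
];

// Complete every rule that matches, but drop completions whose postprocessor
// returned null, so no subsequent parse state is generated for them.
function complete(tok) {
  return rules
    .filter((rule) => rule.matches(tok))
    .map((rule) => rule.postprocess(tok))
    .filter((result) => result !== null);
}

console.log(complete("false")); // [ { boolean: false } ]
console.log(complete("v"));     // [ { name: 'v' } ]
```

One consequence of this design is that a rule can no longer return null as a legitimate value, since null is now reserved to mean "reject".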
k - predicates-in-parse now uses the syntax
Application by way of example.

Before:

    >node bin/nearleythere.js examples/js/lua.js --input "v = false"
    Table length: 10
    Number of parses: 2
    Parse results:
    [ { Block: [ { statement: 'assignment', body: [ [ { name: 'v' } ], [ { boolean: false } ] ] } ], Return: [] },
      { Block: [ { statement: 'assignment', body: [ [ { name: 'v' } ], [ { name: 'false' } ] ] } ], Return: [] } ]

After:

    >node bin/nearleythere.js examples/js/lua.js --input "v = false"
    Table length: 10
    Number of parses: 1
    Parse results:
    [ { Block: [ { statement: 'assignment', body: [ [ { name: 'v' } ], [ { boolean: false } ] ] } ], Return: [] } ]
Once we get the JS parser working well, we can get rid of {% and %} and just allow native JS. We'll be able to tell where the statements begin and end. This should work at least for what it returns. It might be easier to implement by just replacing {% and %} with { and }.
I'm for
For the purpose of this experiment I was using
I agree, folks may very well decide to use null in spite of what theory says.
Did you want to think it over some more before I send a PR?
Can't you just use an empty object? Feel free to file a PR for
Recent pushes rectify this issue; now it's just a matter of carefully patching up
Some grammars will have ambiguous parsings due to things like
return [2];
being parseable either as "return the list [2]" or as "the element at index 2 of an array named return".
Nearley may need new features to circumvent this problem easily.