JS Grammar Returns Ambiguous Parsings #44

YafahEdelman · 2014-11-12T22:04:38Z

Some grammars will have ambiguous parsings due to things like
return [2];
being parseable either as returning the list [2] or as returning the element at index 2 of an array named return.
Nearley may need new features to circumvent this problem easily.

kach · 2014-11-13T00:13:19Z

Yeah, see Robin's comment about keyword/name separation.

rwindelz · 2014-11-13T05:33:28Z

as i was commenting on a closed PR i figured i would leave a ptr to it here: #41 and continue the discussion here

i spent some time with PEGs in general and OMeta a couple of years back - one of the ideas in OMeta is that strings are not the only thing it will parse - think of an AST that you would like to codify source to source translation rules - but i digress...

OMeta will parse arbitrary objects (strings being one particular type of object) and one of the ideas that supports this is the use of generalized predicates.
so, instead of predicates restricted to 'match this character', 'match this regex', 'match this nonterminal'; you can say 'match this object such that it satisfies this condition' . . . negation drops out of this as a special case as in 'match Var where Var not_a_member_of: keywords' . . . et voila

i believe fundamentally, boolean grammars have stronger theoretical underpinnings than 'let's add semantic predicates' . . . and yet having said that, Alessandro's pragmatic approach of generalized matching/predicates seems to work well in practice
(where's Two Face when you need him :-) )

so, you might consider a rule of the form:
A → αBρβ
where α, β are sequences, possibly empty;
B is a symbol (terminal or non terminal)
ρ is a predicate that takes a sequence of parse results representing the parse of A up to B - which is already available as the partially (or completely if B is the final token in the sequence) constructed post-process array representing the results of the parse so far

let's say ρ is represented as {? ... ?}, the Lua grammar for Name might look like:
Name -> _name {? function (d) { return !isKeyword(d[0]); } ?} {% function(d) {return {'name': d[0]}; } %}

predicates don't consume any input so they are invoked when 'B' completes and the completion code advances the parse only if the predicate succeeds
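
A minimal sketch of that completion step, to make the idea concrete. Everything here is hypothetical and for illustration only: the `advance` function, the `{ predicate: ... }` symbol shape, and the keyword list are assumptions, not nearley's actual internals.

```javascript
// Hypothetical sketch; none of these names come from nearley itself.
const KEYWORDS = new Set(["false", "true", "nil", "return"]);
const isKeyword = (s) => KEYWORDS.has(s);

// Name -> _name {? function (d) { return !isKeyword(d[0]); } ?}
const nameRule = {
  name: "Name",
  symbols: ["_name", { predicate: (d) => !isKeyword(d[0]) }],
};

// Called when the symbol at state.dot has just completed with `result`.
// Returns the advanced state, or null if a following predicate rejects it.
function advance(state, result) {
  const data = state.data.concat([result]);
  let dot = state.dot + 1;
  const symbols = state.rule.symbols;
  // Predicates consume no input: evaluate them in place and step over them.
  while (dot < symbols.length && symbols[dot].predicate) {
    if (!symbols[dot].predicate(data)) return null;
    dot++;
  }
  return { rule: state.rule, dot: dot, data: data };
}

const start = { rule: nameRule, dot: 0, data: [] };
console.log(advance(start, "v"));     // advanced past _name and the predicate
console.log(advance(start, "false")); // null: the predicate rejected a keyword
```

The predicate sees the partial result array `d`, so it can inspect anything parsed so far in the rule, not just the symbol right before it.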

thoughts?

YafahEdelman · 2014-11-13T22:28:20Z

Actually, for almost all cases we don't need this stuff. JS has negative lookahead built in to regexes, so we can just allow something like :!? to be appended to strings to turn them into negative lookahead assertions. Alternatively, we can switch to a more complicated regex parser (I'm sure there is one out there, or we could write one ourselves... in nearley, of course). As for the OMeta-like idea, as far as I can tell it allows adding a function as an additional constraint on the grammar. My concern is that it might be heavy-handed, and it may be better to implement more advanced features instead. Possibly we could add arbitrary EBNF-like tags (prefixed with :) and make it easy to create functions and assign symbols to them, so adding something like a not-keyword EBNF tag would be easy? Just a thought.
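
For reference, a rough sketch of the negative-lookahead idea using a plain JS regex. The keyword list and the `isName` helper are made up for illustration, and the check assumes the whole word has already been collected into one string, which is not necessarily how a character-by-character parse would see it.

```javascript
// Hypothetical keyword list, for illustration only.
const KEYWORDS = ["return", "false", "true", "nil"];

// (?!...) is a negative lookahead: the match fails when the entire
// input is exactly one of the keywords, and succeeds otherwise as long
// as the input looks like an identifier.
const NAME = new RegExp(
  "^(?!(?:" + KEYWORDS.join("|") + ")$)[A-Za-z_][A-Za-z0-9_]*$"
);

function isName(s) {
  return NAME.test(s);
}

console.log(isName("v"));      // true
console.log(isName("false"));  // false, it is a keyword
console.log(isName("falsey")); // true, it only starts with a keyword
```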

kach · 2014-11-14T04:10:32Z

Ah, Robin, that's really cool stuff.

As it happens, nearley already supports parsing a list of arbitrary objects! In fact, we're cheating a bit by using the subscript syntax (something[5]) to get the nth character of a string. :-) Furthermore, if a symbol in a rule is an object with a .test field, then instead of checking for equality, it runs .test with the token as input. That's essentially how regex/charset tokens work: the JavaScript RegExp object has a built-in .test function.
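
Roughly, the matching described above could look like the sketch below. This is an assumption about the shape of the check, not a copy of nearley's source; the `matches` helper and the `evenNumber` matcher are hypothetical.

```javascript
// Hypothetical helper: does `symbol` accept `token`?
// Objects exposing a .test function are asked directly; anything else
// is compared by strict equality.
function matches(symbol, token) {
  if (symbol !== null && typeof symbol === "object" &&
      typeof symbol.test === "function") {
    return symbol.test(token);
  }
  return symbol === token;
}

// A RegExp already has .test, so it works as a charset token.
const digit = /[0-9]/;

// Any other object with .test works too, so tokens need not be strings.
const evenNumber = { test: (t) => typeof t === "number" && t % 2 === 0 };

console.log(matches("a", "a"));      // true, literal equality
console.log(matches(digit, "7"));    // true, via RegExp.prototype.test
console.log(matches(evenNumber, 4)); // true, via the custom .test
console.log(matches(evenNumber, 3)); // false
```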

The problem with using these features in compiled parsers is that you'll have to run a tokenizer first, which is sort of painful. It's the reason I shy away from (J | B)ison.

Anyhow.

Your proposal at the end is pretty exciting—I suggested something similar myself to Jacob on IRC. I'm going to look into implementing it this weekend. I'm not convinced of the {? ?} syntax, because for the sake of uniformity I want all included JS to be enclosed in {% %}. Perhaps

a -> word &{% isNotKeyword %} {% … %}

Here, & would be a pseudo-mnemonic for "it's a word and it follows this rule!". Thoughts?

(What interests me is that if I get negation working right, I'll have a parser that gracefully handles CFG intersection, because by DeMorgan we have !(!a || !b) -> a && b.)
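
The De Morgan step can be checked on toy recognizers. This sketch treats each "grammar" as a plain string-to-boolean function, which is a simplification and not how a parser would implement intersection; all names are hypothetical.

```javascript
// Treat each "grammar" as a recognizer: string -> boolean.
const not = (p) => (s) => !p(s);
const or = (p, q) => (s) => p(s) || q(s);

// Intersection built only from negation and union:
// a && b is the same as !(!a || !b).
const and = (p, q) => not(or(not(p), not(q)));

// Two toy recognizers, for illustration only.
const isIdentifier = (s) => /^[A-Za-z_][A-Za-z0-9_]*$/.test(s);
const isKeyword = (s) => ["return", "false", "true"].includes(s);

// name = identifier AND NOT keyword
const isName = and(isIdentifier, not(isKeyword));

console.log(isName("v"));      // true
console.log(isName("return")); // false
console.log(isName("2x"));     // false
```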

rwindelz · 2014-11-14T06:09:56Z

i've been experimenting
in my fork of nearley https://github.com/rwindelz/nearley i've got two branches: https://github.com/rwindelz/nearley/tree/post-process-as-predicate and https://github.com/rwindelz/nearley/tree/predicates-in-parse

in post-process-as-predicate, if the post process function returns null it considers that as a fail and does not generate the subsequent parse state
this is evaluated once the rule is complete
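
a rough sketch of the idea, not the code in the branch: the `rules` table and the `complete` helper below are hypothetical, and the keyword list is made up.

```javascript
// Hypothetical sketch of "post-process returns null => drop the state".
const KEYWORDS = new Set(["false", "true", "nil"]);

const rules = {
  // A name is rejected (null) when the matched text is a keyword.
  Name: (d) => (KEYWORDS.has(d[0]) ? null : { name: d[0] }),
  // A boolean is rejected (null) for anything but "true" / "false".
  Boolean: (d) =>
    d[0] === "true" || d[0] === "false" ? { boolean: d[0] === "true" } : null,
};

// Run every candidate rule over the same matched data and keep only
// the completions whose post-processor did not return null.
function complete(candidates, data) {
  return candidates
    .map((ruleName) => rules[ruleName](data))
    .filter((result) => result !== null);
}

console.log(complete(["Boolean", "Name"], ["false"])); // one result, the boolean
console.log(complete(["Boolean", "Name"], ["v"]));     // one result, the name
```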

in predicates-in-parse, i've added a predicate type of symbol - this is different from the .test idea in that it inspects the post-processed result of the preceding token
changing it to your suggested &{% p %} is a simple matter of changing one line in the grammar (i may fix that in the next couple of minutes anyways)
this is evaluated immediately after the preceding symbol is completed

cheers

rwindelz · 2014-11-14T06:31:37Z

k - predicates-in-parse now uses the syntax &{% js %}

rwindelz · 2014-11-14T06:48:11Z

application by way of example,

Before:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 2
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] },
  { Block:
     [ { statement: 'assignment',
         body: [ [ { name: 'v' } ], [ { name: 'false' } ] ] } ],
    Return: [] } ]
After:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 1
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] } ]

YafahEdelman · 2014-11-14T15:12:59Z

Once we get the JS parser working well we can get rid of {% and %} and just allow native JS. We'll be able to tell where the statements begin and end. This should work at least for what it returns. It might be easier to implement by just replacing {% and %} with { and }.

kach · 2014-11-15T03:41:45Z

I'm for post-process-as-predicates. My only concern is that somewhere, in either existing or soon-to-be-written grammar, null will inadvertently be returned and that bug will be pretty hard to track down. Is there a way to return a unique or at least sufficiently obscure value?

rwindelz · 2014-11-15T05:06:46Z

for the purpose of this experiment i was using null as bottom - to indicate that the parse cannot return anything, ie parse fails . . . which is different than returning the empty set/empty array as the result of a successful parse - i'm pretty sure i'm abusing the math notion of bottom but it's convenient

i agree, folks may very well decide to use null in spite of what theory says
so, perhaps the thing to do is to have a special value in the Parser object - eg.
Parser.fail = function () { return "fail"; }
nb. you never apply Parser.fail, just check for returnValue === Parser.fail

did you want to think it over some more before i send a PR?

kach · 2014-11-15T06:15:31Z

Can't you just use an empty object? Parser.fail = {};

Feel free to file a PR for post-process-as-predicates, we can discuss further on there.
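
A small sketch of why an empty object works as the marker: it is compared by identity, so neither null nor some other empty object is mistaken for it. The `Parser` object and `finish` helper here are stand-ins for illustration, not the real implementation.

```javascript
// Stand-in Parser object, for illustration only.
const Parser = {};

// The sentinel: a fresh object is unique by identity, so no value a
// grammar author returns by accident can be === to it.
Parser.fail = {};

// Hypothetical helper: run a post-processor and report whether the
// completed rule should be kept.
function finish(postprocess, data) {
  const result = postprocess(data);
  if (result === Parser.fail) {
    return { ok: false };
  }
  return { ok: true, value: result };
}

console.log(finish(() => Parser.fail, [])); // { ok: false }
console.log(finish(() => null, []));        // kept: null is an ordinary value
console.log(finish(() => ({}), []));        // kept: a different empty object
```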

kach · 2014-11-16T21:44:47Z

Recent pushes rectify this issue—now it's just a matter of carefully patching up javascript.ne. Closing.

kach closed this as completed Nov 16, 2014
