Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS Grammar Returns Ambiguous Parsings #44

Closed
YafahEdelman opened this issue Nov 12, 2014 · 12 comments
Closed

JS Grammar Returns Ambiguous Parsings #44

YafahEdelman opened this issue Nov 12, 2014 · 12 comments

Comments

@YafahEdelman
Copy link
Contributor

Some grammars will have ambigous parsings due to things like
return [2];
being parseable as both return the list [2] or the element at index 2 of the array return.
Nearley may need new features to circmvent this problem easily.

@kach
Copy link
Owner

kach commented Nov 13, 2014

Yeah, see Robin's comment about keyword/name separation.

@rwindelz
Copy link
Contributor

as i was commenting on a closed PR i figured i would leave a ptr to it here: #41 and continue the discussion here

i spent some time with PEGs in general and OMeta a couple of years back - one of the ideas in OMeta is that strings are not the only thing it will parse - think of an AST that you would like to codify source to source translation rules - but i digress...

OMeta will parse arbitrary objects (strings being one particular type of object) and one of the ideas that supports this is the use of generalized predicates.
so, instead of predicates restricted to 'match this character', 'match this regex', 'match this nonterminal'; you can say 'match this object such that it satisfies this condition' . . . negation drops out of this as a special case as in 'match Var where Var not_a_member_of: keywords' . . . et voila

i believe fundamentally, boolean grammars have stronger theoretical underpinnings than 'let's add semantic predicates' . . . and yet having said that, Alessandro's pragmatic approach of generalized matching/predicates seems to work well in practice
(where's Two Face when you need him :-) )

so, you might consider a rule of the form:
A → αBρβ
where α, β are sequences, possible empty;
B is a symbol (terminal or non terminal)
ρ is a predicate that takes a sequence of parse results representing the parse of A up to B - which is already available as the partially (or completely if B is the final token in the sequence) constructed post-process array representing the results of the parse so far

lets say ρ is represented as {? ... ?}, the lua grammar for Name might look like:
Name -> _name {? function (d) { return !isKeyword(d[0]); } ?} {% function(d) {return {'name': d[0]}; } %}

predicates don't consume any input so they are invoked when 'B' completes and the completion code advances the parse only if the predicate succeeds

thoughts?

@YafahEdelman
Copy link
Contributor Author

Actually, for almost all cases we don't need this stuff. JS has look ahead negation builtin to regexes so we can just allow :!? to be added to strings or something that makes them negative lookahead assertions. Alternatively we can change to a more complicated regex parser (I'm sure there is one out there or we could right one ourselves... nearley of course). As far as the OMeta like idea as far as it look it seems allow adding a function as additional constriants to the grammar. My concern with that would be that it might be heavy handed and it may be better to try to implement more advanced features. Possible we could add arbitary ebnf like tags (with the : prefixing) and make it easy to creat functions and assign symbols to them so adding something like a not keywork ebnf would be easy? Just a thought.

@kach
Copy link
Owner

kach commented Nov 14, 2014

Ah, Robin, that's really cool stuff.

As it happens, nearley already supports parsing a list of arbitrary objects! In fact, we're cheating a bit by using the subscript syntax (something[5]) to get the nth character of a string. :-) Furthermore, if a rule's nonterminal is an object with a .test field, then instead of checking for equality, it runs the .test with the token as input. That's essentially how regex/charset tokens work—the JavaScript RegEx object has a built-in .test function.

The problem with using these features in compiled parsers is that you'll have to run a tokenizer first, which is sort of painful. It's the reason I shy away from (J | B)ison.

Anyhow.

Your proposal at the end is pretty exciting—I suggested something similar myself to Jacob on IRC. I'm going to look into implementing it this weekend. I'm not convinced of the {? ?} syntax, because for the sake of uniformity I want all included JS to be enclosed in {% %}. Perhaps

a -> word &{% isNotKeyword %} {% … %}

Here, & would be a pseudo-mnemonic for "it's a word and it follows this rule!". Thoughts?

(What interests me is that if I get negation working right, I'll have a parser that gracefully handles CFG intersection, because by DeMorgan we have !(!a || !b) -> a && b.)

@rwindelz
Copy link
Contributor

i've been experimenting
in my fork of nearley https://github.com/rwindelz/nearley i've got two branches: https://github.com/rwindelz/nearley/tree/post-process-as-predicate and https://github.com/rwindelz/nearley/tree/predicates-in-parse

in post-process-as-predicate, if the post process function returns null it considers that as a fail and does not generate the subsequent parse state
this is evaluated once the rule is complete

in predicates-in-parse, i've added a predicate type of symbol - this is diffferent from the .test idea in that it inspects the post-processed result of the preceding token
changing it to your suggested &{% p %} is a simple matter of changing one line in the grammar (i may fix that in the next couple of minutes anyways)
this is evaluated immediately following the immediately preceding symbol is completed

cheers

@rwindelz
Copy link
Contributor

k - predicates-in-parse now uses the syntax &{% js %}

@rwindelz
Copy link
Contributor

application by way of example,

Before:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 2
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] },
  { Block:
     [ { statement: 'assignment',
         body: [ [ { name: 'v' } ], [ { name: 'false' } ] ] } ],
    Return: [] } ]
After:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 1
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] } ]

@YafahEdelman
Copy link
Contributor Author

Once we get the JS parser working well we can get rid of {% and %} and just allow native js. Well be able to tell where there statements began and end. This should work at least for what it returns. It might be easier to implement by just replacing {% and %} with { and }.

@kach
Copy link
Owner

kach commented Nov 15, 2014

I'm for post-process-as-predicates. My only concern is that somewhere, in either existing or soon-to-be-written grammar, null will inadvertently be returned and that bug will be pretty hard to track down. Is there a way to return a unique or at least sufficiently obscure value?

@rwindelz
Copy link
Contributor

for the purpose of this experiment i was using nullas bottom - to indicate that the parse can not return anything, ie parse fails . . . which is different than returning the empty set/empty array as the result of a successful parse - i'm pretty sure i'm abusing the math notion of bottom but it's convenient

i agree, folks may very well decide to use null in spite of what theory says
so, perhaps the thing to do is to have a special value in the Parser object - eg.
Parser.fail = function () { return "fail"; }
nb. you never apply Parser.fail, just check for returnValue === Parser.fail

did you want to think it over some more before i send a PR?

@kach
Copy link
Owner

kach commented Nov 15, 2014

Can't you just use an empty object? Parser.fail = {};

Feel free to file a PR for post-process-as-predicates, we can discuss further on there.

@kach
Copy link
Owner

kach commented Nov 16, 2014

Recent pushes rectify this issue—now it's just a matter of carefully patching up javascript.ne. Closing.

@kach kach closed this as completed Nov 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants