Limiting Parsings for Ambiguous Grammars (Bocages) #53

YafahEdelman · 2015-03-24T22:23:55Z

I know its a long shot and not really something earley was meant for, but are there any methods or optimizations we can implement to deal with ambiguous grammars with exponential parsings? For instance:

num -> num "+" num | [0-9]:+

This seems simple, but nearley can't deal well with large statements of this type because the amount of possible parsings is exponential with the number of plus signs. Could we implement the option of limiting the number of parsings?

The text was updated successfully, but these errors were encountered:

kach · 2015-03-24T22:29:10Z

That's not a well-written grammar, though. You should write num -> num "+" [0-9]:+ | [0-9]:+. If there's an ambiguous grammar, the assumption is that you want all possible parsings (I can think of use-cases where this is a legitimate assumption).

YafahEdelman · 2015-03-25T01:22:04Z

But it would be nice to have there be an option rather than have nearley just make this assumption. Do you have any ideas on how to use this?

kach · 2015-03-25T01:40:32Z

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous. In fact, that would be really nice! Also, algorithmically nontrivial, because I can't quickly think of a way to statically determine if a grammar is ambiguous.

YafahEdelman · 2015-03-25T12:01:27Z

After a bit of searching around because making such an algorithm did seem quiet interesting I discovered that it was extremely non-trivial, aka, not computable. Although maybe it will be possible to make a heuristic that works in a lot of basic cases, I'll have to research some more.
Edit: Maybe its only not computable for general grammars? I'm not sure, I'll keep researching.

tjvr · 2015-04-10T13:08:54Z

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous.

You know that's undecidable in general, right? As JacobEdelman points out above.

YafahEdelman · 2015-04-10T14:29:28Z

Although its undecidable in general it is possible to add a mode that allows precedence and will only find one possible parsing. I'm still not sure exactly what kind of speed increase this would give you but I've implemented a (toy) parser that can do it.

tjvr · 2015-04-10T14:56:48Z

Because Earley builds a parse table with all possible states, how can you avoid computing all possible parsings?

I'm not sure what you mean by adding precedence, but that sounds like a bad idea. @Hardmath123 is a fan of using BNF to express operator precedence :-)

YafahEdelman · 2015-04-10T15:32:20Z

A general outline is that you can selectively compute parts of the parse table in an attempt to compute the final parsing. If your parsing was not successful you compute more of the parse table. I'll look at doing that but I do agree that if it does end up causing a dramatic speed increase it won't be worth it.

patrickhuber · 2015-04-14T20:42:40Z

You could build the shared packed parse forest (SPPF) as you go. Elizabeth Scott noted this in her 2008 paper "SPPF-Style Parsing From Earley Recognisers" .

Marpa's Jeffery Kegler uses the term Bocage to denote this structure as you can see here .

kach · 2015-04-14T22:11:15Z

Hmm, @patrickhuber, that looks pretty interesting. From what I can tell, this should allow us to create some sort of lazy iterable data structure where each iteration results in one valid parsing. The obvious advantage is that for very ambiguous grammars with exponential parses, it would be more memory-efficient. On the other hand, building an SPPF takes time, and so there's a speed tradeoff. This sounds like a fun experiment.

(Again, my first choice is that people just learn to write unambiguous grammars—in 99.9% of cases, it's very obvious how to write your grammar unambiguously and you don't actually want to enumerate all possible parsings anyway.)

patrickhuber · 2015-04-14T22:44:18Z

Near the bottom of the paper she creates a inline representation that runs at the same time complexity as the Earley algorithm. Jeffery Kegler stated in the marpa forums that building the parse tree using binary, reusable nodes follows the same complexity as recognition. On trivial samples, you may notice a very small bit of overhead, but as the size of the parse grows, the overhead will be inconsequential.

I agree that the easy option is to rewrite the grammar, but part of the appeal of the Earley algorithm is its ability to handle ambiguity with ease.

kach · 2015-04-14T22:48:51Z

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart enough to teach me how to implement it (@JacobEdelman ? @blob8108 ? @rwindelz ?).

YafahEdelman · 2015-04-14T22:50:01Z

I'll look it over later today/tomorrow.
On Apr 14, 2015 6:48 PM, "Hardmath123" notifications@github.com wrote:

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart
enough to teach me how to implement it (@JacobEdelman
https://github.com/JacobEdelman ? @blob8108
https://github.com/blob8108 ? @rwindelz https://github.com/rwindelz
?).

—
Reply to this email directly or view it on GitHub
#53 (comment).

patrickhuber · 2015-04-15T03:49:40Z

If it helps, I have a implementation here in C#. https://github.com/patrickhuber/Pliant/tree/shared-packed-parse-forest/libraries/Pliant. It doesn't currently work for Leo Items. Main body of code is in the PulseRecognizer.cs.

kach · 2015-04-15T04:03:10Z

Ah, nice to know. This will be a handy reference (the collection of links in your README is also very useful!).

(We haven't implemented Leo optimization yet, so that's not an issue. I should really go read about that soon.)

erezsh · 2016-06-13T13:42:03Z

Personally, I would be very happy if there was some boolean parameter I could give the parser (say, "disable_ambiguity"), that would throw an exception if detecting an ambiguous input.

Obviously we shouldn't write ambiguous grammars if we don't want them, but we also shouldn't write bugs in general, and that still happens some time. And some grammars can be pretty big. It'd be nice to have a graceful exit, instead of a hang / memory error.

kach · 2016-06-13T16:33:11Z

See, the problem with this is that since nearley is streaming, you never know if the input is "ambiguous" because the ambiguity might be resolved later. Let me give you an example:

main -> cow_or_colorado | cone_or_colorado
cow_or_colorado -> "CO" "W":?
cone_or_colorado -> "CO" "NE":?

Suppose I gave this parser "C" and "O". Now, it thinks you could either have a cow_or_colorado or a cone_or_colorado—but it can't throw an error yet! This is because if you feed the parser "W" or "NE", the ambiguity will get resolved in favor of the former or latter, respectively. Nearley never knows when you're done giving it input.

This means, by the way, that you could always detect ambiguity on your end by looking at the results array and ensuring that its length is exactly 1.

erezsh · 2016-06-22T13:49:04Z

Well, the reason I ask is that I wrote a python version of nearley (just the parser) for my parsing library, and I want to be able to protect my users from writing ambiguous grammars that practically never end. Right now it's very hard to discover exactly which rule is ambiguous.

You can see it here btw: https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py
It works very well, but unfortunately it's still very slow (10x time compared to PLY).

tjvr · 2016-06-22T14:11:08Z

@erezsh On the subject of optimisation: I wrote a python version a while ago, it might have some useful optimisations

kach · 2016-06-22T17:01:04Z

@erezsh pearley looks really cool! Very exciting.

I notice most of the code seems to be a direct port of nearley's—in this case, I think I can answer your comment on line 67 (https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py#L67). Let me know if you'd like that.

Again, as I've said, you can't detect ambiguity automatically. Not possible. The solution is to inspect the output with small test cases—then it becomes really easy to see, intuitively, what is getting parsed in multiple ways. Perhaps you can figure out a clever way to automate this process—if you do, let me know and I would be happy to include it in the nearley package. :-)

wizzard0 · 2017-02-24T16:33:39Z

My 5 cents: being able to enumerate exponential number of parses is very important and is what makes nearley, um, the parser of choice (for NLP and for generated grammars at least).
Of course it makes it slower at times... but hey, universality is almost always at odds with performance.

btw the grammar num -> num "+" num | [0-9]:+ does not in fact generate ambiguous parses in playground at least, how can get my 2^n parses there? :)

kach · 2017-02-24T22:01:07Z

It's certainly supposed to show all parses… I think the author is using an outdated version of nearley, a version which had a bug that suppressed ambiguous output in some cases (including your case). Not sure, though, because the client JS is packed.

kach · 2017-08-24T18:28:45Z

Closing because #163 can continue this discussion.

kach added the enhancement label Mar 24, 2015

kach changed the title ~~Limiting Parsings for Ambiguous Grammars~~ Limiting Parsings for Ambiguous Grammars (Bocages) Apr 15, 2015

kach added the optimization label Apr 15, 2015

tjvr mentioned this issue Feb 16, 2017

Use SPPF-style structure #163

Open

kach mentioned this issue Feb 19, 2017

Notes on ambiguity #168

Open

kach closed this as completed Aug 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limiting Parsings for Ambiguous Grammars (Bocages) #53

Limiting Parsings for Ambiguous Grammars (Bocages) #53

YafahEdelman commented Mar 24, 2015

kach commented Mar 24, 2015

YafahEdelman commented Mar 25, 2015

kach commented Mar 25, 2015

YafahEdelman commented Mar 25, 2015

tjvr commented Apr 10, 2015

YafahEdelman commented Apr 10, 2015

tjvr commented Apr 10, 2015

YafahEdelman commented Apr 10, 2015

patrickhuber commented Apr 14, 2015

kach commented Apr 14, 2015

patrickhuber commented Apr 14, 2015

kach commented Apr 14, 2015

YafahEdelman commented Apr 14, 2015

patrickhuber commented Apr 15, 2015

kach commented Apr 15, 2015

erezsh commented Jun 13, 2016

kach commented Jun 13, 2016

erezsh commented Jun 22, 2016

tjvr commented Jun 22, 2016

kach commented Jun 22, 2016

wizzard0 commented Feb 24, 2017 •

edited

Loading

kach commented Feb 24, 2017

kach commented Aug 24, 2017

Limiting Parsings for Ambiguous Grammars (Bocages) #53

Limiting Parsings for Ambiguous Grammars (Bocages) #53

Comments

YafahEdelman commented Mar 24, 2015

kach commented Mar 24, 2015

YafahEdelman commented Mar 25, 2015

kach commented Mar 25, 2015

YafahEdelman commented Mar 25, 2015

tjvr commented Apr 10, 2015

YafahEdelman commented Apr 10, 2015

tjvr commented Apr 10, 2015

YafahEdelman commented Apr 10, 2015

patrickhuber commented Apr 14, 2015

kach commented Apr 14, 2015

patrickhuber commented Apr 14, 2015

kach commented Apr 14, 2015

YafahEdelman commented Apr 14, 2015

patrickhuber commented Apr 15, 2015

kach commented Apr 15, 2015

erezsh commented Jun 13, 2016

kach commented Jun 13, 2016

erezsh commented Jun 22, 2016

tjvr commented Jun 22, 2016

kach commented Jun 22, 2016

wizzard0 commented Feb 24, 2017 • edited Loading

kach commented Feb 24, 2017

kach commented Aug 24, 2017

wizzard0 commented Feb 24, 2017 •

edited

Loading