Wrong(?) result on some grammars with infinite results #35
Hey, and thanks for your report! You're right: unless the current behaviour is documented, this should probably be considered a bug.

The reason for the current behaviour is that the epsilon derivations of a rule …

The first idea that might spring to mind, then, is to tie the knot when computing epsilon derivations and assume that …

So instead I'm experimenting with an idea of productivity: while traversing the production looking for epsilon derivations, keep track of whether we've produced any results yet. When encountering …

A problem with this is that it's potentially order-dependent:
… would return …

… would return …

Maybe we could reorder the alternatives to make this one also return an infinite list, but I'm worried that that might not work in general, because you don't know what order to use if there are several productive alternatives.

Do you have any input or ideas? Do you have any more good test cases or important use cases?

Also, what do you mean by "I don't think it's possible to have this work and ... not produce an infinite list of results"? If the first one should return …

Cheers!
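The order-dependence is easy to see with plain lazy lists standing in for the parser (a sketch of the problem, not the library's actual mechanism): modelling a grammar like `x ::= 1 | x + 1` as a knot-tied list, the order of the alternatives decides whether anything is ever produced.

```haskell
-- "x ::= 1 | x + 1" with the base alternative first: productive.
xs :: [Int]
xs = 1 : map (+ 1) xs          -- 1, 2, 3, ...

-- The same grammar with the recursive alternative first: the first
-- element of ys depends on itself, so forcing ys diverges.
ys :: [Int]
ys = map (+ 1) ys ++ [1]

main :: IO ()
main = print (take 5 xs)       -- [1,2,3,4,5]
```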
That's what I meant. It should be possible to expose multiple variants of `rule`:
I'm working on a rewrite of the parser where all but 2 should be easy to implement. Any ideas for other variants?

My rewrite doesn't compute epsilon derivations or interpret …, including `Monad` and a … So far, computing epsilon derivations has been avoidable, but I think it might be unavoidable for infinite results.
Hmm, I'm not too fond of the idea of exposing several variants of `rule`.

I think I've come up with something better than what I outlined in my previous post. What do you think of the following?
This scheme supports left-recursion without looping provided the rule doesn't have any epsilon derivations, and additionally makes your two grammars produce infinite results. It makes …

So to me it seems promising --- I just have to convince myself that this is the right thing to do in general. I have an implementation here, and the commit before it has an implementation of the scheme from my previous message.

Your rewrite sounds intriguing. I'd be interested to hear how you're handling things like left-recursion in it! :)
A WIP version of my rewrite is at https://gist.github.com/fread2281/256e47aff8903d7da98d9ea6b4cff63f. A couple of things:
Here's a problem(?) with my implementation that I'm trying to figure out how to fix. I think Earley has the same problem? If not, it'd be very helpful if you could try to explain how Earley avoids it.

I've benchmarked my implementation and Earley, and they both have the same order of magnitude of function calls (of the most commonly called functions, but it's still within …

We can merge results at the same rule with different start positions and the same end position! But this still adds too many callbacks to …

Practical, General Parser Combinators (Meerkat) avoids this by using a HashSet of Conts in rules. However, this might not cause worst-case complexity to go over cubic with fully binarized grammars. According to the Meerkat paper, …

I think it still affects non-worst-case complexity though :( I don't know if this is avoidable without using StableNames or similar to compare Conts.
If I understand you correctly, the question is whether we can merge the continuations of invocations of the same rule that start at different positions and end at the same position. Is this what you're asking?

This library doesn't do that, but it doesn't seem to me to be a sound optimisation. Maybe I misunderstood what you meant, though.
There are two ways to deal with the issue in the example:
In the example, 2 would be merging the continuations after they get added to …

The Meerkat paper does 2, but I think 1 would work too. I think they should have the same asymptotics, except with a `Monad` instance, where 1 can do better.
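Option 2 might be sketched like this, under the assumption (made up for illustration, since the library's continuations are plain functions, hence the StableNames question above) that every continuation is tagged with an `Int` id so continuations become comparable; `ContId` and `register` are hypothetical names:

```haskell
import qualified Data.IntSet as IntSet

-- Hypothetical: each continuation carries an Int id, making it comparable.
-- Per (rule, end position) slot we remember the ids already registered and
-- drop duplicates, so no continuation is invoked twice with the same
-- result --- in the spirit of Meerkat's HashSet of Conts.
type ContId = Int

register :: ContId -> IntSet.IntSet -> (Bool, IntSet.IntSet)
register c seen
  | IntSet.member c seen = (False, seen)                -- duplicate: skip
  | otherwise            = (True, IntSet.insert c seen)

main :: IO ()
main = print ( fst (register 1 IntSet.empty)            -- True: first time
             , fst (register 1 (IntSet.singleton 1)) )  -- False: duplicate
```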
Maybe you could tie the knot, but then do a breadth-first search instead of a depth-first search. That way, both …
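To illustrate why breadth-first enumeration sidesteps the ordering problem, again with plain lazy lists standing in for the parser (a sketch, not either implementation): if results are enumerated level by level, alternative order only permutes results within a level, so it can no longer starve productivity.

```haskell
-- levels !! n holds the results of "x ::= 1 | x + 1" that use the
-- recursive alternative exactly n times.
levels :: [[Int]]
levels = [1] : map (map (+ 1)) levels

-- Concatenating the levels gives the breadth-first result order.
results :: [Int]
results = concat levels

main :: IO ()
main = print (take 4 results)  -- [1,2,3,4]
```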
Yeah, that might actually work!
Also, an idea on how to do that efficiently: first you create a value representing a lazy binary tree, and then you search that binary tree using iterative deepening search. Then, if the compiler fuses the binary tree, it would be equivalent to an imperative iterative deepening search. If it chooses to keep the binary tree in memory instead, it would be equivalent to an imperative breadth-first search (except that you are scanning the tree itself in an iterative deepening manner). An optimizing compiler (like GHC) can choose which one is better based on the cost of recalculating the values in the tree, I think.
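That idea might look something like this (a sketch with made-up names, not code from either implementation): a lazily built tree of results, enumerated one depth at a time, which yields breadth-first order whether or not the tree is kept in memory.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Values sitting at exactly depth d.
atDepth :: Int -> Tree a -> [a]
atDepth 0 (Leaf x)   = [x]
atDepth _ (Leaf _)   = []
atDepth 0 (Node _ _) = []
atDepth d (Node l r) = atDepth (d - 1) l ++ atDepth (d - 1) r

-- Iterative deepening: one scan of the (possibly infinite, lazily built)
-- tree per depth, producing results in breadth-first order.  On a finite
-- tree this keeps scanning past the deepest level, which is acceptable in
-- the infinite-results setting discussed here.
deepening :: Tree a -> [a]
deepening t = concatMap (`atDepth` t) [0 ..]

example :: Tree Int
example = Node (Leaf 1) (Node (Leaf 2) (Leaf 3))

main :: IO ()
main = print (take 3 (deepening example))  -- [1,2,3]
```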
produces just `1`, not `[1,2,3...]`.

I don't think it's possible to have this work and not produce an infinite list of results.
I think this is a sensible choice because it currently can't deal with infinite results, but it should be documented somewhere. Theoretically it should be possible to return an infinite lazy list.