out-of-order results #15
Agreed, this would be nice to have. I will think about it some more, but I don't think it can be done easily given the current implementation. That's especially true now that the fix for #14 has been pushed: it means we're sort of partitioning rules into nullable and non-nullable parts, so we would have to undo that partitioning after the fact.
If we're not able to do this, we should update the documentation to make it clear that there are no guarantees about the order of the results.
I was thinking about taking a look at this today (though I doubt I'll make any progress). It seems that (in … (I know this is vague, but) would it be better to (1) actually compute the results in the correct order, or (2) manually thread an alternative's "list of indices" from its position in an …? (I've been working off of 0.10.0.1.)
I would prefer (1). Setting aside the complication of partitioning into nullables and non-nullables, I think you're on the right track. Continuations are added to a list at various points in time and later consumed in the order they appear in the list, which means that they're processed last-first. They're in stacks, but they should have been in queues. The list of states for the next position has the same problem. A few well-placed …
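To make the stack-versus-queue point concrete, here is a minimal Haskell sketch. Continuations are modelled as plain values, and the function names are hypothetical rather than taken from the library: consing onto a list and reading from the head visits items last-first, while snocing onto a Data.Sequence (or reversing the list once before consuming it) keeps registration order.

```haskell
import Data.Foldable (toList)
import qualified Data.Sequence as Seq

-- Continuations are stand-in values here; the parser's real continuation
-- type is not shown in this thread.

-- Stack discipline: each newly registered continuation is consed onto the
-- front of a list, so walking the list later visits them last-first.
consumeAsStack :: [k] -> [k]
consumeAsStack = foldl (flip (:)) []

-- Queue discipline: append (snoc) to a Data.Sequence and read from the
-- front, so continuations are visited in registration order.
consumeAsQueue :: [k] -> [k]
consumeAsQueue = toList . foldl (Seq.|>) Seq.empty

-- Keeping the cheap cons-based list but reversing it once per batch also
-- restores registration order, at the cost of an O(n) reversal each time.
consumeReversed :: [k] -> [k]
consumeReversed = reverse . consumeAsStack
```

The Data.Sequence variant avoids the per-batch reversal; whether either option is cheaper inside the real parser loop would have to be measured.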
I tried blindly reordering the state list a while back (master...sboosali:master) but couldn't get it to work. If we want this, (having learned more about chart parsing) I think we have to ensure that the expansion of the nonterminals (Earley's "predict" rule) is performed in a top-down and left-to-right way. I think it's top-down already, but not left-to-right. The same goes for terminals (Earley's "scan" rule). I think predict must be applied before scan, but I'm not sure yet. As you say, keeping separate sets of states complicates this, and I can't wrap my head around how this works with the "build all the edges, then advance a token" approach of this library. On paper, it seemed like you might need to keep track of earlier states to "pop" back to them, but I could be wrong. Another benefit of the depth-first-search approach / in-order results for ambiguous grammars is that we can return the first result long before the rest of the results (unless there's enough sharing, in which case it doesn't matter), and this first result is likely to be what the user expects. (Sorry for the possibly incoherent braindump; I wanted to get this down before I forgot.)
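The depth-first, in-order behaviour described here is what a naive list-of-successes parser gives for free. The following is a sketch of such a parser in Haskell, not the Earley library's internals; all names (P, str, andThen, orElse, completeParses) are hypothetical. Alternatives are tried left to right and their results are appended lazily, so asking for only the first result never runs the later alternatives.

```haskell
-- A minimal list-of-successes parser: each parser returns every way it can
-- match a prefix of the input, paired with the unconsumed rest.
newtype P a = P { runP :: String -> [(a, String)] }

-- Succeed without consuming input.
pureP :: a -> P a
pureP x = P (\s -> [(x, s)])

-- Match a literal string.
str :: String -> P String
str t = P (\s -> [ (t, drop (length t) s) | take (length t) s == t ])

-- Sequence depth-first: exhaust every continuation of the first result of p
-- before moving on to p's second result.
andThen :: P a -> (a -> P b) -> P b
andThen (P p) k = P (\s -> [ r | (a, rest) <- p s, r <- runP (k a) rest ])

-- Every result of the left alternative comes before any result of the right
-- one. (++) is lazy in its second argument, so the right alternative only
-- runs if its results are actually demanded.
orElse :: P a -> P a -> P a
orElse (P p) (P q) = P (\s -> p s ++ q s)

-- Keep only the parses that consume the whole input.
completeParses :: P a -> String -> [a]
completeParses (P p) s = [ a | (a, "") <- p s ]

-- An ambiguous grammar: "ab" either as one token or as "a" followed by "b".
ambiguous :: P String
ambiguous =
  str "ab" `orElse`
    (str "a" `andThen` \x ->
       str "b" `andThen` \y ->
         pureP (x ++ "+" ++ y))
```

The catch is that this style loops on left-recursive rules such as draw ::= draw "XX", which Earley parsing handles; getting the same ordering out of a chart parser is the hard part discussed in this thread.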
This isn't possible in every case. Consider: …
The first result would be …
I think the ordering that we're (somewhat implicitly) talking about here is what we might call "left-to-right grammar order" (basically that the results from parsing …). Of course, there may be other orderings that would sometimes be convenient. It looks like you'd like to order the results by recursion depth. I don't quite understand what you're referring to by calling the grammar order bad or impossible, though. I did do some experiments with ordered results a while back, and I think I have a working implementation, but it hurt performance badly because I couldn't find a way to do it without a bunch of list reversals.
Olle, I want that so much! Can you share it? I don't care if it's slow or buggy because:
fread2281: yes, that's exactly the behavior I want (but for less contrived grammars, whose recursive productions consume input).
The reason I call that behavior bad is that it makes the parser library non-total. I want to use it in Agda and in my toy Agda-like language, so that's a problem. Total Parser Combinators forbids that grammar with the type system; I think it's possible to do something similar in Haskell.
Only as an option. The fact that Earley parses any CFG is also why I use it.
(this message was composed with dictation: charitably interpret typos) Sam
@sboosali: I pushed it here. Feel free to play around with it. There may be bugs. @fread2281: Regarding totality: it's still up to the user of the library to force the result, just like it's up to the user to force the whole result list in case it's infinite. The parser library, if written correctly, should still be terminating. Basically, if I feed in a grammar that has infinite/circular solutions, I would expect a correct parser library to return those solutions. So I don't see this as bad behaviour, but I can see that it's not always convenient. :) The situation in Agda is different since it doesn't allow normal data to be circular.
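The point that forcing is up to the consumer can be sketched with a hypothetical cyclic grammar X ::= X | "a", whose input "a" has infinitely many derivations. In Haskell, both an infinite result list and a single circular value can be constructed lazily; the types and names below (Tree, allDerivations, circular, depthAtMost) are illustrative only and say nothing about what the library actually returns for such a grammar.

```haskell
-- Parse trees for the hypothetical cyclic grammar  X ::= X | "a".
data Tree = Leaf | Wrap Tree
  deriving (Show, Eq)

-- Every derivation of "a", ordered by recursion depth. The list is infinite,
-- but lazy: no element is built until the consumer demands it.
allDerivations :: [Tree]
allDerivations = iterate Wrap Leaf

-- A single circular value standing for "X derived from itself forever".
-- Haskell permits this cyclic data; Agda does not.
circular :: Tree
circular = Wrap circular

-- A total observer: inspect at most n constructors, then give up.
depthAtMost :: Int -> Tree -> Maybe Int
depthAtMost n t = go 0 t
  where
    go d Leaf = Just d
    go d (Wrap t')
      | d >= n    = Nothing
      | otherwise = go (d + 1) t'
```

Calling length on allDerivations, or measuring the depth of circular without a bound, would not terminate; that is the non-totality concern, moved from the parser to its consumer.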
Thanks!!
Just found out from Guillaum on #haskell that the tests on that branch are failing, so it might need some work still. :/
Yeah, my own tests failed too. You said that epsilons were expanded separately, right?
Yeah, epsilons are calculated separately to avoid loops in the more general parsing machinery. That's what the …
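For reference, a textbook way to find nullable nonterminals without looping is a least-fixed-point iteration. The following Haskell sketch uses a hypothetical first-order grammar type (Symbol, Grammar, nullables); the library works with its own combinator-based representation, and its actual mechanism isn't shown in this thread.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A hypothetical first-order grammar: each nonterminal maps to its list of
-- alternatives, and each alternative is a sequence of symbols.
data Symbol = T Char | N String

type Grammar = Map.Map String [[Symbol]]

-- Least fixed point: start with no nullable nonterminals and repeatedly add
-- every nonterminal that has an alternative made only of known-nullable
-- nonterminals (an empty alternative qualifies immediately). The set only
-- grows and is bounded by the number of nonterminals, so this terminates
-- even for left-recursive or cyclic rules.
nullables :: Grammar -> Set.Set String
nullables g = go Set.empty
  where
    go known
      | next == known = known
      | otherwise     = go next
      where
        next = Set.fromList
          [ n | (n, alts) <- Map.toList g, any (all (nullableIn known)) alts ]

    nullableIn _ (T _) = False
    nullableIn known (N m) = m `Set.member` known

-- Example: S ::= A B | 'x';  A ::= (empty) | 'a';  B ::= A A;  C ::= C 'c'
example :: Grammar
example = Map.fromList
  [ ("S", [[N "A", N "B"], [T 'x']])
  , ("A", [[], [T 'a']])
  , ("B", [[N "A", N "A"]])
  , ("C", [[N "C", T 'c']])
  ]
```

On the example, the iteration finds A first, then B, then S, and never marks the left-recursive C.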
Does this branch work for you?
For …
I'll run it on my (more complicated/ambiguous) dictation grammars too, but …
I see a few more swapped insertions; can you explain the changes? I'm looking at OrderedResults...OrderedResults2.
Nice to hear that it works. :) Take how the list of … The list of continuations is similar: the processing order gets reversed if we just cons to it.
Cool. I'd tried blindly reversing the order months ago, but that hadn't worked.
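Hello. I did some tests, and I don't know whether my understanding is wrong or something is broken. The following code:

```haskell
{-# LANGUAGE RecursiveDo #-}

import Control.Applicative (many, (<|>))
import Data.Char (isSpace)
import qualified Text.Earley as E

data Test = T0 Test | T1 Test | T2 deriving (Show)

simpleParser :: E.Grammar r (E.Prod r String Char Test)
simpleParser = mdo
  draw <- E.rule $
        T0 <$> (sym "rot" *> draw)
    <|> T1 <$> (draw <* sym "XX")
    <|> T2 <$ sym "ident"
  -- Skip leading whitespace before each token.
  let tok p = many (E.satisfy isSpace) *> p
      sym x = tok $ E.list x
  return draw

parse' = E.fullParses (E.parser simpleParser) "rot ident XX"
```

For me, it should return as the first result: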
I'm still getting …
I should add that, in this context, the result order is not affected by any change in the ordering.
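As a reference point for comparing outputs, the following brute-force Haskell sketch enumerates every parse of the tokens rot ident XX under the grammar above, trying the alternatives in the order they are written. It is one reading of "left-to-right grammar order" (outermost alternative first), not what the Earley branch computes, and it is exponential in general; parsesInGrammarOrder is a hypothetical name, and it works on whitespace-split tokens rather than characters.

```haskell
data Test = T0 Test | T1 Test | T2
  deriving (Show, Eq)

-- Enumerate every parse of a token list under
--   draw ::= "rot" draw | draw "XX" | "ident"
-- trying the alternatives in grammar order (T0, then T1, then T2).
-- Every recursive call is on a strictly shorter list, so this terminates
-- despite the left-recursive T1 alternative.
parsesInGrammarOrder :: [String] -> [Test]
parsesInGrammarOrder toks =
     [ T0 t | ("rot" : rest) <- [toks], t <- parsesInGrammarOrder rest ]
  ++ [ T1 t | not (null toks), last toks == "XX"
            , t <- parsesInGrammarOrder (init toks) ]
  ++ [ T2 | toks == ["ident"] ]
```

For words "rot ident XX" this yields T0 (T1 T2) before T1 (T0 T2), because the T0 rule comes first in the grammar; an ordering that favours left association would presumably put T1 (T0 T2) first instead.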
Okay, I ported my actual grammar to https://github.com/sboosali/Earley/blob/master/examples/OrderedExtremelyAmbiguous.hs#L170. I've got to go eat, but I (1) still need to double-check that my messy grammar is written in a "left to right" way, i.e. that the finite productions come to the left of the wildcard productions, and (2) will work on a minimal reproduction.
@guibou: I think your expectations are correct, so it seems that recursive rules are not ordered properly yet. @sboosali: Maybe you've run into the same problem as @guibou, but a minimal repro would help immensely just in case it's not. Thanks to both of you! I ran some measurements on the few benchmarks we have with an optimised version of the OrderedResults2 branch, and it seems that it takes roughly 50% longer to parse in order.
If it's relevant, my grammar is non-recursive (…). And no, thank you, Olle!
OK, here are the results of some more investigations on the branch: I think there is a well-defined ordering that's being followed in the case of recursive rules, and I'm trying to pinpoint what it is. A problem is that the order of the results of an ambiguous grammar doesn't seem to be considered in the literature, so there's no established nomenclature for it (as there is for, e.g., leftmost/rightmost derivations). We seem to be favouring associating lexically to the left, and then the left-to-right grammar order. That explains @guibou's result, where we have the following grammar: …
When parsing the input …
Parsing the string … I don't currently see a way to change this ordering. Note that it's possible to write unambiguous versions of both of the above grammars, i.e. something like: …
and …
In my opinion this is a better option, at least when dealing with, e.g., programming languages, which generally have simple grammars. The exception is when you want the ambiguity or when you just have an exceptionally tricky grammar. Unambiguous grammars can be parsed faster. Anyway, this ordering needs more investigation, tests, and documentation.
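One possible unambiguous rewrite of @guibou's grammar (an assumption, not a recovery of the elided grammars above) layers the rules so that "XX" binds tighter than "rot": prefixed ::= "rot" prefixed | postfixed, and postfixed ::= postfixed "XX" | "ident". The Haskell sketch below parses that layered grammar by hand on whitespace-split tokens, with the same Test type as in the example above; parsePrefixed and parsePostfixed are hypothetical names. Every input has at most one parse.

```haskell
data Test = T0 Test | T1 Test | T2
  deriving (Show, Eq)

-- prefixed ::= "rot" prefixed | postfixed
-- A postfixed phrase never starts with "rot", so consuming a leading "rot"
-- is the only choice.
parsePrefixed :: [String] -> [Test]
parsePrefixed ("rot" : rest) = T0 <$> parsePrefixed rest
parsePrefixed toks           = parsePostfixed toks

-- postfixed ::= postfixed "XX" | "ident"
-- A trailing "XX" must come from the T1 rule; otherwise only "ident" fits.
parsePostfixed :: [String] -> [Test]
parsePostfixed toks
  | not (null toks), last toks == "XX" = T1 <$> parsePostfixed (init toks)
  | toks == ["ident"]                  = [T2]
  | otherwise                          = []
```

With Earley's rule combinator, the same layering would presumably be one rule per level; which operator binds tighter is a design choice, and the opposite layering is equally unambiguous.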
I've added info about the order of results to the docs, so this is no longer a bug. 😁
In the grammar below, edit has nullable rules and insertion has a wildcard. The parse results are all present, but some alternatives that appear before other alternatives ("lexically" in the grammar) appear later in the results. E.g., given (full script is below):
we get:
rather than:
I don't know whether Earley parsers specify the order of the results for an ambiguous grammar, but it's how other parser libraries behave, and either way it would be nice to have. My hack is to postprocess the set of results and rank them. This is messier (as ranking can be "nonlocal") and maybe slower than having the first result be the expected one.
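The post-processing workaround might look like the following Haskell sketch. Command, rank, and bestFirst are hypothetical stand-ins for the real (elided) result type and ranking; Data.List.sortOn is stable, so results with equal rank keep the parser's order.

```haskell
import Data.List (sortOn)

-- Hypothetical parse result for a dictation command.
data Command = Edit String | Insert [String]
  deriving (Show, Eq)

-- Hypothetical ranking: prefer specific commands over wildcard insertions,
-- and among insertions prefer the one whose wildcard swallowed fewer words.
rank :: Command -> (Int, Int)
rank (Edit _)    = (0, 0)
rank (Insert ws) = (1, length ws)

-- Stable sort by rank: ties keep the order the parser produced them in.
bestFirst :: [Command] -> [Command]
bestFirst = sortOn rank
```

Any sort has to see every result before it can return the first one, so head (bestFirst results) forces the whole result list; that is the laziness cost compared with the parser emitting the expected result first.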
to reproduce: