Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_lazy has imprecise semantics #95

Closed
chengsun opened this issue Mar 30, 2017 · 8 comments
Closed

parse_lazy has imprecise semantics #95

chengsun opened this issue Mar 30, 2017 · 8 comments

Comments

@chengsun
Copy link
Contributor

So on attempting to fix a couple of other parser combinators' behaviours on error, I've come to realise that parse_lazy doesn't have the right semantics to correctly support what we want.

Let's look at a motivating example.

let mut emptyok_then_emptyerr = (value(()).message("not expected"), char('x'));

assert_eq!(emptyok_then_emptyerr.parse(State::new("hi")),
    Err(ParseError {
       position: SourcePosition { line: 1, column: 1 },
       errors: vec![Error::Unexpected('h'.into()),
                    Error::Expected('x'.into())],
    }));

// assert fails: actual error contains the message "not expected"

Here we sequentially chain two parsers, the first of which returns EmptyOk, and the second returns EmptyErr. Now this combined tuple parser will return EmptyErr, meaning "no input was consumed, and one of my child parsers failed". However due to the semantics of parse_lazy, this will also imply "the first of my child parsers failed". Hence, when add_error is called the message "not expected" will be added, which is clearly wrong.

What's the fix? parse_lazy should not conflate "no input was consumed" with "first parser failed." So it seems like we have two options:

  • Make parse_lazy semantics tighter. This will probably entail returning a separate flag "was actually lazy", which becomes the thing that specifies whether add_error should be called or not. (One thing which I'm not sure about in this scenario is whether the Consumed flag is still useful or not? Is it currently important anywhere other than for determining laziness?)

  • Keep the semantics of parse_lazy but rewrite some current parsers so they are not lazy. This loses us some amount of speed. Notably the tuple and then combinators will have to become non-lazy.

@Marwes
Copy link
Owner

Marwes commented Mar 30, 2017

Ouch, that is a case I failed to consider :(.

An idea, if we add a method on Parser can_return_empty_ok(&self) -> bool. This would return true for any parser which could return EmptyOk. When add_error is called the sequencing parsers could then check if their first parser can_return_empty_ok and if its true then they call add_error on the second parser (and if the second parser can_return_empty_ok call add_error on the third parser etc. It's a bit boilerplate-y but It might work.

@chengsun
Copy link
Contributor Author

That wouldn't work for parsers that can return both EmptyOk and EmptyErr.

@chengsun
Copy link
Contributor Author

chengsun commented Mar 30, 2017

I like the idea of making parse_lazy return (ConsumedResult<...>, bool), where the errors were deferred (and add_error needs to be called) iff the boolean is true.

Then, in this case tuple(empty_ok, empty_err).parse_lazy() would add the second subparser's error immediately, and return false. tuple(empty_err, empty_err).parse_lazy() would still be lazy and return true, requiring add_error to be called.

@chengsun
Copy link
Contributor Author

Rather, parse_lazy should return the following:

pub enum LazyResult<T, E> {
    ConsumedOk(T),
    EmptyOk(T),
    ConsumedErr(E),
    EmptyErr(E, bool),
}

Where EmptyErr(E, true) indicates that add_error should be used.

@Marwes
Copy link
Owner

Marwes commented Mar 30, 2017

That wouldn't work for parsers that can return both EmptyOk and EmptyErr.

Could you give an example how that wont work?

While I believe your idea might work it would be a breaking change which is unfortunate.

@chengsun
Copy link
Contributor Author

I mean, it would work, just not in the way that you described.

By default all parsers would have to claim can_return_empty_ok() == true for safety. Sequencing parsers can only be lazy if their first child claims false.

Assume a sequencing parser returns EmptyErr. If the sequencing parser's first child says it can return EmptyOk, we have no idea whether it was the first or the second child which caused the error, because either

  1. the first child returned EmptyErr;
  2. the first child returned EmptyOk and then the second child EmptyErr.

@Marwes
Copy link
Owner

Marwes commented Mar 31, 2017

Assume a sequencing parser returns EmptyErr. If the sequencing parser's first child says it can return EmptyOk, we have no idea whether it was the first or the second child which caused the error, because either...

Yep, that's right, it wasn't very thought through idea by me :).

So basically we need some extra state somewhere to say what parser caused the error since parsers returning EmptyOk means that it is possible for the second (or third, etc) parser in a sequence to cause an EmptyErr to be returned (breaking my implicit assumption of EmptyErr only being generated from the first parser). Due to the nature of the library adding this state is definitely going to be breaking regardless of the solution.

@Marwes
Copy link
Owner

Marwes commented Aug 5, 2017

A fix for this is coming in #105. It turned out to have a lot more edge cases than I initially had thought of but I am pretty sure the approach is sound. Only drawback of it is that it can result in slowdowns in some uses of parsers returning EmptyOk

@Marwes Marwes closed this as completed in 54fecc6 Aug 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants