Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors are often suppressed #160

Closed
asajeffrey opened this issue Jan 5, 2016 · 11 comments
Closed

Errors are often suppressed #160

asajeffrey opened this issue Jan 5, 2016 · 11 comments
Milestone

Comments

@asajeffrey
Copy link

If I create a parser which uses any of the choice combinators (alt!, opt!, many0!, etc.) then errors are suppressed from the child parser. For example:

named!(pub foo< bool >,
    chain!(
        tag!("a") ~
        error!(nom::ErrorKind::Custom(42),tag!("b")) ,
        || { true }
    )
);

named!(pub foos< Vec<bool> >,
    delimited!(
        tag!("("),
        many0!(foo),
        tag!(")")
    )
);

If you feed this the input "(ac)" then the error 42 from foo is suppressed, and instead you get the error when tag!(")") fails to match "ac)". Here are tests I'd expect to pass, but which don't:

#[test]
fn test_ok() {
    match foos(b"(abab)") {
        nom::IResult::Done(_,result) => assert_eq!(result,vec![true,true]),
        res => panic!("Oops {:?}.",res)
    }
}

#[test]
fn test_err() {
    match foos(b"(ac)") {
        nom::IResult::Error(nom::Err::Position(kind,_)) => assert_eq!(kind,nom::ErrorKind::Custom(42)),
        res => panic!("Oops, {:?}",res)
    }
}

This is a problem with backtracking: there's no way to stop backtracking outside of the current named! function. The program behaves properly if we inline foo into foos, but this doesn't scale too well.

@Geal
Copy link
Collaborator

Geal commented Jan 13, 2016

Hi!

Backtracking is an essential feature of parser combinators, and the error! combinator is actually a non standard one (I borrowed the idea from the cut operator in Prolog).

In the reduced test case you provide, many0! uses an error in the underlying parser as stopping condition, but cannot discriminate the error type.

I'm thinking of adding a new combinator that could work like that:

named!(pub foos< Vec<bool> >,
    delimited!(
        tag!("("),
        many0!(
          cut_if!(ErrorKind::Custom(42) | ErrorKind::Custom(0), foo)
        ),
        tag!(")")
    )
);

The idea is to provide a list of patterns (so things like Error::Custom(_) should work), and if the returned error matches the pattern, stop backtracking there. It is a bit cumbersome, but that's the only way I see right now.

There's also the problem of the format of patterns. Here, I match on specific error codes, but nom parsers can return an error chain instead. As an example, in the second test case, the error returned by foo is Error(NodePosition(Custom(42), [99, 41], Position(Tag, [99, 41]))), because error! wraps the underlying error. I could make the new combinator match on complete error chains, but it can be annoying to write.

BTW, I saw the parser combinators library you're developing, it looks really cool! The use of associated types is tricky :)
Using types instead of macros was my first approach with nom, but it was too complex to maintain, since the slightest modification could trigger hundreds of errors everywhere. I'd be happy to see where your approach is going :)

@asajeffrey
Copy link
Author

Indeed, controlling backtracking is one of the trickier areas of parser libraries. Something like prolog cut is very useful to indicate commitment points, especially for error reporting.

Your proposal looks interesting, but I'm a bit concerned that it'll end up being everywhere, since the default is that errors are ignored, so if you want them propagated they need a cut. Perhaps errors could be divided into backtracking ones (suppressed by default) and aborting ones (propagated by default).

Over in https://github.com/asajeffrey/wasm/blob/master/src/parser/combinators.rs#L46, I made failure explicitly say whether backtracking was allowed or not, but this would be quite a significant rewrite to the code.

The way lifetime polymorphism was being used was rewritten by @eddyb, who managed to remove all the associated types, but still make everything lifetime polymorphic, and allow polymorphic functions like map. The library was heavily influenced by nom, as you can probably tell!

@asajeffrey
Copy link
Author

I split off my parser combinator library, and released it as a crate (https://crates.io/crates/parsell/).

It was interesting seeing a slightly different bit of the design space to nom, it seems like there's a trade-off for streaming parsers between minimizing memory allocation and allowing backtracking. It looks to me like parsell went down the route of reducing backtracking by only handling LL(1) parsers, whereas nom allows arbitrary backtracking at the cost of more space usage. Does this seem like a fair summary to you?

@Geal
Copy link
Collaborator

Geal commented Feb 23, 2016

Hey, I have given some thought to your problem, and found a way. It is a bit hackish, and I know you got your own library to worry about now, but you might find it interesting.

Take a look at the gist

The basic idea is to change the result type of parsers from nom::IResult<I,O,E> to std::result::Result<nom::IResult<I,O,E>, nom::Err<I,E>>. If the parser returns Ok(something) continue as usual, backtrack normally if there's an error. But if it is std::result::Result::Err(nom::Err(something)) then stop backtracking right there.

It is doable by replacing named! with n! which will replace the result types and wrap everything with Ok. That's a large part of the code because named! has so many syntaxes :/

Then you have the cut! macro, which works like nom's error! combinator, but will wrap with Ok for Incomplete or Done, and wrap with Err and cut if there's an error.

At last, you have c!, used to wrap any sub function call. This will check the result, and cut backtracing if there's an Err. You can see it used at line 106.

So, as I said, it is a bit hackish, but the ugly macros can be stored away from the eyes, it does not change the code much, and the overhead of Result will not be too big.

What do you think?

@lyze lyze mentioned this issue Jun 6, 2016
59 tasks
@sam701
Copy link

sam701 commented Jul 31, 2016

I have the same issue as @asajeffrey. The solution proposed by @Geal works well but since you have to wrap into cut! almost every option in alt!, opt!, many0!, etc., it might become tedious for deep grammars.

What do you think of introducing a kind of ErrorKind::Fatal that will cause alt!, opt!, many0! to propagate such a fatal error to the root?

@Yamakaky
Copy link

Yamakaky commented Sep 18, 2016

Same problem. error! is pretty useless if you can't use it with a combinator. At least it should be written in the doc.

@Geal
Copy link
Collaborator

Geal commented Nov 1, 2016

in the context of #356, if I changed the error type, I could probably fit in there a "non backtrackable error" that can contain the same thing as a regular error.

@shepmaster
Copy link
Contributor

shepmaster commented Nov 15, 2016

Over in peresil, I believe I've had similar issues and ideas. Two main points of interest:

  1. I had a trait Recoverable that indicated whether backtracking was feasible.
  2. I keep and reused a Failures struct. Every encountered failure would be stashed in a Vec. If a new error came along that occurred later in the input, that would replace all existing errors. Errors that occurred at the same place would add to the list, and errors that occurred earlier would be ignored. This produced reasonable error messages, as well as giving a way to produce "expected X or Y or Z at position W" style messages.

@Geal
Copy link
Collaborator

Geal commented Nov 15, 2016

there is an error accumulation feature in nom, but it's now optional, since a lot of formats won't use it and it makes parsers slower. From what I saw, defining whether an error is recoverable depends on where it was generated in the parsing tree.

@shepmaster
Copy link
Contributor

an error accumulation feature in nom

That's wonderful! Perhaps this is just an inability of me to find the appropriate documentation or macros. I can only find references to keeping around a single error (as part of an error chain, perhaps) when reading the error management. Searching through master's API docs for accum doesn't seem to yield anything either.

Given a parser like this:

named!(part<&str, &str>, recognize!(do_parse!(
    add_error!(ErrorKind::Custom(10), tag!("b")) >>
    add_error!(ErrorKind::Custom(11), tag!("c")) >>
    ()
)));

named!(example<&str, &str>, recognize!(do_parse!(
    opt!(add_error!(ErrorKind::Custom(1), tag!("a"))) >>
    opt!(add_error!(ErrorKind::Custom(2), part)) >>
    add_error!(ErrorKind::Custom(3), tag!("z")) >>
    ()
)));

assert!(example("az").is_done());
assert!(example("bcz").is_done());
assert!(example("abcz").is_done());
assert!(example("z").is_done());

When given the input t, I'd like to be able to generate something that I can then display to the user as expected "a", "bc", or "z" (I'd also be ok with "b" instead of "bc"). Right now, it generates the error Error(Custom(3)), which maps to the tag("z"), just as mentioned in the original post.

When given the input bt, however, the parser knows that it has descended into a rule, and should be able to report expected "c"

Would you kindly point me to some examples or implementation code I could read to achieve this? I'd be happy to provide a draft of what I implement for the error handling document.

defining whether an error is recoverable depends on where it was generated in the parsing tree

That's believable. I might have "solved" that by making separate error variants for errors that occur in one location or the other, trading off annotating the grammar in one way for another.

@Geal Geal added this to the 4.0 milestone Sep 6, 2017
@Geal
Copy link
Collaborator

Geal commented Oct 3, 2017

so now there are two ways to return an error, one of those is called "failure", and will stop going through other branches in alt, and prevent many* from returning a value when they should actually fail. And that error version is propagated from one function call to the next. This can be tested in the "result" branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants