Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread state through nom combinators #1419

Open
joshrule opened this issue Sep 30, 2021 · 11 comments
Open

Thread state through nom combinators #1419

joshrule opened this issue Sep 30, 2021 · 11 comments
Milestone

Comments

@joshrule
Copy link

Prerequisites

Here are a few things you should provide to help me understand the issue:

  • Rust version : rustc 1.55.0 (c8dfcfe04 2021-09-06)
  • nom version : 7.0.0
  • nom compilation features used: None

Test case

I'm updating a parser from nom 4 to nom 7. I'm parsing term rewriting systems, which involves parsing a lot of first-order terms. These first-order terms can contain function symbols and constants, collectively called operators, as well as variables. The identity of each operator and variable is maintained by assigning it a numeric id, and these ids are established while parsing. The parser maintains a map between strings and ids.

In nom 4, I created a parser struct and used combinations of the method! and call_m! macros to build up the id map. In nom 7, these macros no longer exist, nor do corresponding functions.

My first attempt was to translate something like:

method!(application<Parser<'a>, CompleteStr, Term>, mut self,
        alt!(call_m!(self.standard_application) |
             call_m!(self.binary_application))
);

into

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt((
        |x: &'a str| self.standard_application(x),
        |x: &'a str| self.binary_application(x),
    ))(s)
}

but the compiler helpfully reminds me that this requires simultaneous unique access to self in both closures.

Perhaps nom could provide the option to fold some sort of mutable state type through the combinators so that I could write:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt_with_state(self, (
        |x: &'a str, this /*: &mut Self */| this.standard_application(x),
        |x: &'a str, this| this.binary_application(x),
    ))(s)
}
@joshrule
Copy link
Author

Additional discussion on the Rust Discourse.

@Stargateur
Copy link
Contributor

I'm think you are doing it wrong, a parser should not require an external state like that. That doesn't make sense that if a parser fail it mutate the state. You would mutate the state in the top level parser in function of what your subparser return.

I'm opposite to this feature, FnMut already allow state and sharing state between parser in alt is bad design.

@joshrule
Copy link
Author

joshrule commented Oct 1, 2021

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

@Stargateur
Copy link
Contributor

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

Unfortunately, I didn't understand much, first-order terms doesn't mean much without context. If possible could you make me a little example of how you would use alt_with_state with your parser ? I believe I already explain a solution, your parser should return result and you should use this result to build your state but maybe with a code example it would be more clear for both of us. I try to look at the code of your crate but that was too hard to follow without a lot of time to read.

@Geal
Copy link
Collaborator

Geal commented Oct 10, 2021

could the mutable state be carried through a Rc<RefCell<>> from one parser to the next?

@joshrule
Copy link
Author

@Geal - That's basically the solution I've landed on for now. I just wanted to bring up the friction point here in case either:

  1. there's a known idiom or tutorial that I've missed somewhere

    The potential to mutate some sort of parser state seemed to a common use case in, e.g., nom 4, so at the very least, it would be nice to have some sort of discussion in the documentation of how to handle this situation or why it should be avoided.

  2. there's an alternate interface that nom could consider introducing to support mutable parser state.

@Stargateur
Copy link
Contributor

Stargateur commented Oct 21, 2021

why it should be avoided

Cause it's become very hard to trace side effect, nom changed parser type for FnMut allowing some mutable state, that generally should be avoided cause... side effect should always be avoided if possible. But here you introduce a share side effect over multiple parser. This mean that in some of your parser inside alt you mutate the state by mistake or whatever, it's become very hard to debug the code. Limiting the ability for the user to make mistake is also a part of any good library. I'm not strongly opposite to add such feature just a little but I think you should think twice before use it.

how to handle this situation

Some pseudo code:

enum App {
    Standard(StandardApp),
    Binary(BinaryApp),
}

fn application(state: &HashMap<Id, Foo>) -> impl Parser<&'a str, App> {
    |input: &'a str|
    alt((
        map(|x: &'a str| self.standard_app(state)(x), App::Standard),
        map(|x: &'a str| self.binary_app(state)(x), App::Binary),
    ))(input)
}

fn handle_state(input: &'a str) -> IResult<&'a str, HashMap<Id, Foo>> {
  let mut state = HashMap::new();
  // use fold or while let or whatever and update the state
  while let Ok((input, app)) = application(&state)(input) {
    match app {
       // do thing like state.entry(id).or_insert_with(app);
    }
  }
}

Probably missed thing but the general idea is here, this have pro & con:

  • allow reuse
  • limit side effect to one parser
  • more verbose
  • may have a runtime cost (depend on opti or if you need or not the enum)

@Geal Geal added this to the 8.0 milestone Mar 14, 2022
@przygienda
Copy link

there is another very good use case for at least to carry read only data through the parser. Variants. Sometimes complex parsers have "well, accept this but not that for compatibility reasons, leads to bunch of alts driven by an if possibly". Completely stateless parsers are cute, like perfect FSMs. practically speaking, most realistic complex code does extended FSM with state or parsers that are nob'ed for behavior

@przygienda
Copy link

a possible solution is to actually implement the whole Input trait zoo for something that is (configuration, &[u8]) e.g. which leads quickly into a nice lifetime hell ;-) But in a sense this is the clean solution to parametrize the parser.

@epage
Copy link
Contributor

epage commented Oct 30, 2023

For state

@przygienda
Copy link

przygienda commented Oct 30, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants