Thread state through nom combinators #1419

joshrule · 2021-09-30T03:32:59Z

Prerequisites

Here are a few things you should provide to help me understand the issue:

Rust version : rustc 1.55.0 (c8dfcfe04 2021-09-06)
nom version : 7.0.0
nom compilation features used: None

Test case

I'm updating a parser from nom 4 to nom 7. I'm parsing term rewriting systems, which involves parsing a lot of first-order terms. These first-order terms can contain function symbols and constants, collectively called operators, as well as variables. The identity of each operator and variable is maintained by assigning it a numeric id, and these ids are established while parsing. The parser maintains a map between strings and ids.

In nom 4, I created a parser struct and used combinations of the method! and call_m! macros to build up the id map. In nom 7, these macros no longer exist, nor do corresponding functions.

My first attempt was to translate something like:

method!(application<Parser<'a>, CompleteStr, Term>, mut self,
        alt!(call_m!(self.standard_application) |
             call_m!(self.binary_application))
);

into

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt((
        |x: &'a str| self.standard_application(x),
        |x: &'a str| self.binary_application(x),
    ))(s)
}

but the compiler helpfully reminds me that this requires simultaneous unique access to self in both closures.

Perhaps nom could provide the option to fold some sort of mutable state type through the combinators so that I could write:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt_with_state(self, (
        |x: &'a str, this /*: &mut Self */| this.standard_application(x),
        |x: &'a str, this| this.binary_application(x),
    ))(s)
}

The text was updated successfully, but these errors were encountered:

joshrule · 2021-09-30T03:33:47Z

Additional discussion on the Rust Discourse.

Stargateur · 2021-10-01T07:39:09Z

I'm think you are doing it wrong, a parser should not require an external state like that. That doesn't make sense that if a parser fail it mutate the state. You would mutate the state in the top level parser in function of what your subparser return.

I'm opposite to this feature, FnMut already allow state and sharing state between parser in alt is bad design.

joshrule · 2021-10-01T22:52:05Z

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

Stargateur · 2021-10-01T23:44:17Z

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

Unfortunately, I didn't understand much, first-order terms doesn't mean much without context. If possible could you make me a little example of how you would use alt_with_state with your parser ? I believe I already explain a solution, your parser should return result and you should use this result to build your state but maybe with a code example it would be more clear for both of us. I try to look at the code of your crate but that was too hard to follow without a lot of time to read.

Geal · 2021-10-10T21:12:47Z

could the mutable state be carried through a Rc<RefCell<>> from one parser to the next?

joshrule · 2021-10-11T22:32:51Z

@Geal - That's basically the solution I've landed on for now. I just wanted to bring up the friction point here in case either:

there's a known idiom or tutorial that I've missed somewhere

The potential to mutate some sort of parser state seemed to a common use case in, e.g., nom 4, so at the very least, it would be nice to have some sort of discussion in the documentation of how to handle this situation or why it should be avoided.
there's an alternate interface that nom could consider introducing to support mutable parser state.

Stargateur · 2021-10-21T09:49:01Z

why it should be avoided

Cause it's become very hard to trace side effect, nom changed parser type for FnMut allowing some mutable state, that generally should be avoided cause... side effect should always be avoided if possible. But here you introduce a share side effect over multiple parser. This mean that in some of your parser inside alt you mutate the state by mistake or whatever, it's become very hard to debug the code. Limiting the ability for the user to make mistake is also a part of any good library. I'm not strongly opposite to add such feature just a little but I think you should think twice before use it.

how to handle this situation

Some pseudo code:

enum App {
    Standard(StandardApp),
    Binary(BinaryApp),
}

fn application(state: &HashMap<Id, Foo>) -> impl Parser<&'a str, App> {
    |input: &'a str|
    alt((
        map(|x: &'a str| self.standard_app(state)(x), App::Standard),
        map(|x: &'a str| self.binary_app(state)(x), App::Binary),
    ))(input)
}

fn handle_state(input: &'a str) -> IResult<&'a str, HashMap<Id, Foo>> {
  let mut state = HashMap::new();
  // use fold or while let or whatever and update the state
  while let Ok((input, app)) = application(&state)(input) {
    match app {
       // do thing like state.entry(id).or_insert_with(app);
    }
  }
}

Probably missed thing but the general idea is here, this have pro & con:

allow reuse
limit side effect to one parser
more verbose
may have a runtime cost (depend on opti or if you need or not the enum)

przygienda · 2023-10-29T17:01:40Z

there is another very good use case for at least to carry read only data through the parser. Variants. Sometimes complex parsers have "well, accept this but not that for compatibility reasons, leads to bunch of alts driven by an if possibly". Completely stateless parsers are cute, like perfect FSMs. practically speaking, most realistic complex code does extended FSM with state or parsers that are nob'ed for behavior

przygienda · 2023-10-29T17:25:21Z

a possible solution is to actually implement the whole Input trait zoo for something that is (configuration, &[u8]) e.g. which leads quickly into a nice lifetime hell ;-) But in a sense this is the clean solution to parametrize the parser.

epage · 2023-10-30T12:55:50Z

For state

nom_locate has an extra field
pori has the Stateful input decorator

przygienda · 2023-10-30T14:37:23Z

unfortunately 0.0 crate and zero doc. Having had a quick look I basically implemented the pori Stateful stuff myself now

…

On Mon, Oct 30, 2023 at 1:56 PM Ed Page ***@***.***> wrote: For state - nom_locate has an extra field <https://docs.rs/nom_locate/4.2.0/nom_locate/struct.LocatedSpan.html#structfield.extra> - pori has the Stateful input decorator <https://docs.rs/pori/latest/pori/struct.Stateful.html> — Reply to this email directly, view it on GitHub <#1419 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AANDIC5BCCJGN5SSP7CTEJDYB6PWDAVCNFSM5FBKA6D2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZYGUYTGNRYGI3Q> . You are receiving this because you commented.Message ID: ***@***.***>

Geal added this to the 8.0 milestone Mar 14, 2022

epage mentioned this issue Feb 1, 2023

Post-fork documentation audit winnow-rs/winnow#100

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread state through nom combinators #1419

Thread state through nom combinators #1419

joshrule commented Sep 30, 2021

joshrule commented Sep 30, 2021

Stargateur commented Oct 1, 2021

joshrule commented Oct 1, 2021

Stargateur commented Oct 1, 2021

Geal commented Oct 10, 2021

joshrule commented Oct 11, 2021

Stargateur commented Oct 21, 2021 •

edited

Loading

przygienda commented Oct 29, 2023

przygienda commented Oct 29, 2023

epage commented Oct 30, 2023

przygienda commented Oct 30, 2023 via email

Thread state through nom combinators #1419

Thread state through nom combinators #1419

Comments

joshrule commented Sep 30, 2021

Prerequisites

Test case

joshrule commented Sep 30, 2021

Stargateur commented Oct 1, 2021

joshrule commented Oct 1, 2021

Stargateur commented Oct 1, 2021

Geal commented Oct 10, 2021

joshrule commented Oct 11, 2021

Stargateur commented Oct 21, 2021 • edited Loading

przygienda commented Oct 29, 2023

przygienda commented Oct 29, 2023

epage commented Oct 30, 2023

przygienda commented Oct 30, 2023 via email

Stargateur commented Oct 21, 2021 •

edited

Loading