Add an OutputMode type parameter to drive parser results #1631

Geal · 2023-01-29T20:33:01Z

this work takes inspiration from chumsky's exploration of generic associated types. In chumsky's PR, output value creation depends on a Mode type parameter to either produce the value, or return (), and the call site decides if it needs the value or not. That mode can be transmitted from parent to child parser, and prevent the compiler from generating the code for an entire chain of results.
As an example, in the delimited combinator, we test tree parsers in sequence, but we only care about the result of the second one. For the other two parsers, we only need to know that they succeeded, we do not need their result, so they might as well not be generated.

This PR extends this solution further, by having a mode parameter for errors too: in a lot of cases, nom combinators just need to know that a parser failed, but do not want to process the error. The OutputMode type also carries a parameter that will drive streaming VS complete behaviour.

TODO:

validate conversion of multi combinators
validate conversion of branch combinators
validate conversion of number combinators
rewrite some tests and examples to avoid the .parse syntactic noise
test the effect on benchmarks
convert the rest of the parsers
docs

The Mode trait will be used to adapt the production of output or error values depending on the call site. If we do not care about the actual value or error, and just want to know if a parser was succssful or not, then the Mode trait allows us to signal it without producng the actual type. in Err, the Error and Failure variants get different types, because usually want to get the failure variant, while the error variant might change independently. The OutputMode trait also carries information about the streaming or complete status, and in the same way define which type of parser we want, directly at the call site

a new `process` method is used to handle the new OutputMode trait. The `OutputM` supporting structure is used to carry which modes we want to use, and depending on the case, we can call an inner parser using directly the mode we're in (example: Map), or we can convert it to Emit for Output (ex: MapRes because we have to apply the mapping function on the parser output and check for errors). We keep the `parse` method with `Emit` for both output and error, which allows us to convert combinators gradually: the ones using `parse` will work but won't benefit directly from these traits also parent and child combinators in a serie of parsers may support it

they are good candidates for this conversion, since some of them ignore parser results

coveralls · 2023-01-29T20:37:57Z

Pull Request Test Coverage Report for Build 5231826950

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

738 of 1136 (64.96%) changed or added relevant lines in 19 files are covered.
145 unchanged lines in 7 files lost coverage.
Overall coverage decreased (-0.3%) to 62.127%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/character/complete.rs	10	12	83.33%
src/character/streaming.rs	10	12	83.33%
benchmarks/benches/http.rs	0	4	0.0%
src/branch/mod.rs	18	22	81.82%
src/sequence/mod.rs	13	18	72.22%
src/bytes/complete.rs	29	35	82.86%
src/number/complete.rs	18	24	75.0%
src/bytes/streaming.rs	29	36	80.56%
src/number/streaming.rs	16	24	66.67%
src/internal.rs	26	41	63.41%

Files with Coverage Reduction	New Missed Lines	%
src/number/complete.rs	1	65.22%
src/error.rs	2	16.21%
src/lib.rs	3	33.33%
src/multi/mod.rs	3	81.01%
src/internal.rs	11	51.43%
benchmarks/benches/json.rs	21	0.0%
src/traits.rs	104	60.71%

Totals
Change from base Build 4725732019:	-0.3%
Covered Lines:	1811
Relevant Lines:	2915

💛 - Coveralls

epage · 2023-01-30T15:16:09Z

src/internal.rs

+  /// TODO
+  fn incomplete<E, F: FnOnce() -> E>(needed: Needed, err_f: F) -> Err<E>;
+  /// TODO
+  fn is_streaming() -> bool;


Something I've been wondering about is if this should have a &self parameter. In most cases, it'll be

#[inline] fn is_streaming(&self) -> bool { false // or `true` }

and I hope the compiler can optimize away the branch.

But taking &self would allow a BufReader to choose to change states from streaming to complete when the source has been exhausted which might help with some of the gotchas around streaming input.

Granted, that approach is easier when its tied to Input; I haven't thought through how it works with modes.

I'm not decided on that yet. Right now my thinking is that having that at runtime would generate a lot of code in all cases, while having it static would depend only on the call site: if you want both, you get both implementations, but probably with some code duplication

epage · 2023-01-30T15:21:23Z

src/internal.rs

+  fn parse(&mut self, input: Input) -> IResult<Input, Self::Output, Self::Error> {
+    self.process::<OutputM<Emit, Emit, Streaming>>(input)
+  }


Parser::parse is being hard coded to Streaming? That seems odd. I assume I am I missing something about this streaming design

yes, that part is not written yet. That might end up with two methods, like parse and parse_streaming

epage · 2023-01-30T15:35:43Z

src/internal.rs

+  fn process<OM: OutputMode>(
+    &mut self,
+    input: Input,
+  ) -> PResult<OM, Input, Self::Output, Self::Error>;


test the effect on benchmarks

I'm curious to see if you are able to see the dramatic savings that chumsky did or if some of that is more inherent to their design. My gut feeling is that in a lot of cases, the overhead from a lot of these outputs and error should be pretty low with nom's design as-is and that this wouldn't offer much improvement unless people were going heavy on complex preceded / termianted parsers which json doesn't do and I've not done in my various parsers.

Besides runtime performance, it might also be good to check the impact to binary size or release binaries as this might cause more instances of the generic functions to be generated. I'd also be curious about compile times but I'm assuming that won't be big enough to notice.

currently most of nom's overhead is in creating values and errors and discarding them, and that will be even more present with heavy error types. So far I have not measured it though, because I have not converted enough code to affect the benchmarks. I'm worried about compilation times, though. That might be a mix of increasing compilation times due to heavy use of impl traits (returning closures was erasing type information), but decreasing it due to less code being generated.

currently most of nom's overhead is in creating values and errors and discarding them

Can you qualify this with some more details?

For example, if you are doing delimited(char('['), <something>, char(']')), then I would expect the value wouldn't have much overhead. From the nom examples I've looked at and in my own parsers, this is a fairly common case. Is the overhead more than I expect or is this for more complex usage patterns? chumsky saw the gains on json but I see these more simple cases in nom's json benchmark.

One possible gain is people not discover the _count variant of many parsers. I know I make that mistake. However, that also ties into needing to also pay attention to using terminated / preceded rather than tuples Something I was considering was an alternative trait to Extend that (1) allowed pre-allocation and (2) was implemented for usize and (), replacing _count variants. This is a bottom-up driven approach to not creating expensive values, rather than the Mode's top-down approach. One benefit to top-down is when reusing parsers between throw-away and non-throwaway contexts. So far, I've not needed to do it too much and I would be interested in looking at cases for those that do and how much it impacts performance.

However, I can see it in errors with alt. With combine, I did see performance overhead from their version of alt with heavy errors. On the other hand, I suspect it'd be better to replace it with combine's dispatch. It is a macro that lets you write a match with parsers which cuts down on the number of parse attempts made, reducing overhead further. I can only speak for switching choice / alt to dispatch but it dramatically sped up my code. While this can be done with an imperative parser in nom, I found duplicating dispatch made it more ergonomic which makes it more likely to be used. I suspect some of my speed gains over combine were from finding another place I could insert a dispatch call.

Even with that, I've noticed my performance is sensitive to changes in my error type and I've intentionally kept it simple, putting all of the expensive operations when converting to the public error type for my crate. An opt that doesn't apply can still easily do a single allocation that is thrown away. Dropping that allocation and dropping any of the panic handling around allocations can be beneficial. I was considering a non-GAT approach here as well by making the backtracking error type accepted by opt to be ()). GATs do simplify the process.

epage · 2023-01-30T15:42:43Z

src/internal.rs

@@ -242,7 +361,16 @@ pub trait Parser<Input> {

  /// A parser takes in input type, and returns a `Result` containing
  /// either the remaining input and the output value, or an error
-  fn parse(&mut self, input: Input) -> IResult<Input, Self::Output, Self::Error>;
+  fn parse(&mut self, input: Input) -> IResult<Input, Self::Output, Self::Error> {


Won't modal parsers wrapping non-modal parsers lose some of the benefit?

Example: If I have a preceded((parser1, cut(parser2)), parser3). preceded does a Check on (parser1, parser2). Regardless of what the tuple impl does, cut just calls parse which switches the mode from Check to Emit

Depending on how concerned you are about this problem, it almost seems like the existence of parse makes it easy for people to call the wrong thing and people should call process but processs API is more complicated for end-users who don't want to worry about this stuff.

yes, to work properly, the entire chain, at least in what nom provides, should support it. I'd like to see if converting the existing way of writing parsers (functions returning closures) to this would not be too noisy

epage · 2023-01-30T15:45:35Z

src/internal.rs

+
+      fn process<OM: OutputMode>(&mut self, i: I) -> PResult<OM, I, Self::Output, Self::Error> {
        let ($(ref mut $parser),+,) = *self;

-        $(let(i, $output) = $parser.parse(i)?;)+
+        // FIXME: is there a way to avoid producing the output values?
+        $(let(i, $output) = $parser.process::<OutputM<Emit, OM::Error, OM::Incomplete>>(i)?;)+

-        Ok((i, ($($output),+,)))
+        // ???
+        Ok((i, OM::Output::bind(|| ($($output),+,))))


Naively, I would assume you could pass the mode along which would end up with ((), (), ()) for Check and then bind would turn that into ().

However, i assume you tried that. What problem did you run into?

I haven't fixed that one yet, passing the mode along should be enough, but ideally I'd like a way to selectively discard some tuple elements as you said in winnow-rs/winnow#82

epage · 2023-01-30T15:49:06Z

src/internal.rs

@@ -343,12 +477,15 @@ macro_rules! impl_parser_for_tuple {
    {
      type Output = ($($output),+,);
      type Error = E;
-      fn parse(&mut self, i: I) -> IResult<I, ($($output),+,), E> {
+
+      fn process<OM: OutputMode>(&mut self, i: I) -> PResult<OM, I, Self::Output, Self::Error> {


Side note: I feel like adding Check support further highlights the problem in winnow-rs/winnow#82.

Geal · 2023-01-30T18:08:30Z

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

epage · 2023-01-30T18:29:04Z

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

I understand and am commenting now in case any of my thoughts are helpful for that exploratory process as I've been thinking about parallel problems. If its not helpful at this stage, I can back off.

epage · 2023-01-30T23:13:02Z

src/multi/mod.rs

 pub fn many0<I, F>(
-  mut f: F,
-) -> impl FnMut(I) -> IResult<I, Vec<<F as Parser<I>>::Output>, <F as Parser<I>>::Error>
+  f: F,
+) -> impl Parser<I, Output = Vec<<F as Parser<I>>::Output>, Error = <F as Parser<I>>::Error>
 where
  I: Clone + InputLength,
  F: Parser<I>,
 {
-  move |mut i: I| {
-    let mut acc = crate::lib::std::vec::Vec::with_capacity(4);
+  Many0 { parser: f }
+}
+
+/// Prser implementation for the [many0] combinator
+pub struct Many0<F> {
+  parser: F,
+}


aside: In winnow-rs/winnow#6, I gave up on moving the parsers to be structs in part because of type inference issues. Looking back at the errors, it looks like it all came down to my PhantomData:

error[E0282]: type annotations needed Error: --> src/number/complete.rs:1411:9 | 1411 | map(tuple((digit1, opt(pair(char('.'), opt(digit1))))), |_| ()), | ^^^ cannot infer type of the type parameter `PO` declared on the function `map` | help: consider specifying the generic arguments | 1411 | map::<_, PO, _, I, O, E>(tuple((digit1, opt(pair(char('.'), opt(digit1))))), |_| ()), | +++++++++++++++++++++

(PO, I, O, and E were PhantomData fields)

I suspect #1626 would help with this type inference problem and, now looking at it, I think it does, making life easier on this PR than it was for me on winnow-rs/winnow#6.

Geal · 2023-02-05T15:17:18Z

it's able to run the JSON parser now (not fully, still missing the float parsing), and so far it looks slower (between 3% and 16%). I do not know yet where that comes from, I suspect that either there's an issue with recursin, or that some parts are not properly inlined. The UX looks ok, it requires adding a .parse() here and there but it would not be too traumatic to convert a nom 7 parser, and it's possible to convert selected parts of a parser as needed, while keeping the functions everywhere else.

Writing an implementation of Parser::process is a bit annoying though, I'll spend more time on UX

epage · 2023-02-05T15:31:48Z

I've been finding the json benchmark is too trivial to get reasonable numbers from it, so I've added a larger one: winnow-rs/winnow@1a48893

FYI the dispatch! macro dropped json parse time from 12ms to 9ms for the large file: winnow-rs/winnow#119

Geal · 2023-02-05T16:55:18Z

Yes, I've been testing with the canada file too, that's where I get a 3% difference.
dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back.
FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Geal · 2023-02-05T18:54:41Z

flamegraph of execution from main:

flamegraph of execution from this PR:

epage · 2023-02-06T15:27:21Z

dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back.

Ah, I should have clarified. As dispatch! is probably the closest to the idealized result of GAT, the benchmark I ran provides an theoretical lower limit on parse times for the alt calls I replaced.

FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Yes, I've kept a naive version which is what I'd highlight in winnow's cookbook. I also am considering writing an optimization guide that show cases the various stages of optimization and report the benchmark results so people can see the relative impact of various changes.

epage · 2023-02-06T15:28:13Z

Are your benchmarks with Error or VerboseError? While I'm surprised by a slow down, I would expect more of a speed up with an error type that does more processing.

tisonkun · 2023-06-23T12:10:12Z

It seems customized parser become harder after this patch.

Previously works:

fn comma_separated_list1<'a, T>(
    f: impl FnMut(Input<'a>) -> ParseResult<'a, T>,
) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
    separated_list1(Comma, f)
}

Now breaks:

error[E0277]: expected a `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
   |
67 | ) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
68 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here
   |
   = help: the trait `FnMut<(&'a [Token<'a>],)>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`

epage · 2023-06-23T13:11:35Z

If you switch the impl FnMut to impl Parser, I'm assuming it'll be resolved

tisonkun · 2023-06-23T15:33:26Z

@epage you're right. I actually change the return type when debugging, but not for the argument.

After changing the argument also, it seems work now:

fn comma_separated_list1<'a, T>(
    f: impl Parser<Input<'a>, Output = T, Error = ParseError>,
) -> impl Parser<Input<'a>, Output = Vec<T>, Error = ParseError> {
    separated_list1(Comma, f)
}

... while I may introduce a local trait bound to reduce the longy type name.

UPDATE - Here is the trait alias:

pub(self) trait ParserBound<'a, O> = Parser<Input<'a>, Output = O, Error = ParseError>;

... but a trait bound doesn't work:

pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}

error[E0277]: the trait bound `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>: ParserBound<'a, Vec<T>>` is not satisfied
  --> src/database/src/sql/parser/parse/mod.rs:69:64
   |
69 | fn comma_separated_list1<'a, T>(f: impl ParserBound<'a, T>) -> impl ParserBound<'a, Vec<T>> {
   |                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `ParserBound<'a, Vec<T>>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
70 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here

epage · 2023-06-23T15:41:47Z

For trait ParserBound, did you add a blanket impl for Parser<Input<'a>, Output = O, Error = ParseError>?

tisonkun · 2023-06-23T16:21:50Z

blanket impl

I think I should do something like that. But actually I don't know what does it mean and how to do 🤣

epage · 2023-06-23T16:26:51Z

pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}

impl<'a, P, O> ParserBound<'a, O> for P
where
    P: Parser<Input<'a>, Output = O, Error = ParseError>
{
}

tisonkun · 2023-06-24T02:05:24Z

@epage Thank you! It works.

Geal added 4 commits January 24, 2023 22:20

convert some combinators

a2145cb

convert sequence parsers

1cec982

they are good candidates for this conversion, since some of them ignore parser results

fix doctests

ed6c905

epage reviewed Jan 30, 2023

View reviewed changes

Geal added 2 commits January 30, 2023 22:11

convert the alt combinator

30a129f

convert many0 and separated_list0

b5da7f3

epage reviewed Jan 30, 2023

View reviewed changes

Geal added 5 commits January 31, 2023 23:27

recursive parsers can work with an explicit implementation

2afb163

convert tag and take, with streaming behaviour decided by the Mode

11d25d6

convert verify

7029107

convert character parsers and split_at_position

a72de13

improve benchmarks

bd38093

port the HTTP bench

7d198e2

profiling target

f350eeb

Geal added 3 commits February 13, 2023 08:49

port double and recognize_float

116c4bf

Merge branch 'main' into gat

488208a

alt: only create and combine error when needed

044b426

epage mentioned this pull request Feb 15, 2023

Should Parser use associated types, rather than generics? winnow-rs/winnow#163

Closed

support opt and recognize

6c1b4d3

Geal added 12 commits February 28, 2023 21:48

convert combinators

60cc737

convert multi combinators

e349d66

Merge branch 'main' into gat

adda2ae

convert the escaped parsers

f675c88

convert the permutation combinator

7b5dc82

port satisfy

370163d

raise minimal version to 1.65

25af30d

fix build in no-default-features

3c2e18e

fix docs and warnings

c4cb53f

fix tests and benchmarks

a84ee03

number parsers

f60dda5

fix doc tests

d403217

epage mentioned this pull request Jun 5, 2023

Feature request: Errors should not have the same type as Failure #1669

Closed

Geal added 5 commits June 10, 2023 22:23

Merge branch 'main' into gat

8b1e822

warning

721048e

fix doc link

d9f643a

cleanup

a55d6c6

Merge branch 'main' into gat

d390a64

Geal marked this pull request as ready for review June 10, 2023 21:15

Geal merged commit 90d78d6 into main Jun 10, 2023
28 checks passed

Geal deleted the gat branch June 10, 2023 21:16

Geal restored the gat branch June 10, 2023 21:16

krtab mentioned this pull request Oct 20, 2023

fix: Recipes had drifted from the implementation and weren't compiling #1695

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an OutputMode type parameter to drive parser results #1631

Add an OutputMode type parameter to drive parser results #1631

Geal commented Jan 29, 2023

coveralls commented Jan 29, 2023 •

edited

epage Jan 30, 2023 •

edited

Geal Jan 30, 2023

epage Jan 30, 2023

Geal Jan 30, 2023

epage Jan 30, 2023

Geal Jan 30, 2023

epage Jan 30, 2023

epage Jan 30, 2023

Geal Jan 30, 2023

epage Jan 30, 2023

Geal Jan 30, 2023

epage Jan 30, 2023

Geal commented Jan 30, 2023

epage commented Jan 30, 2023

epage Jan 30, 2023 •

edited

Geal commented Feb 5, 2023

epage commented Feb 5, 2023

Geal commented Feb 5, 2023

Geal commented Feb 5, 2023

epage commented Feb 6, 2023

epage commented Feb 6, 2023

tisonkun commented Jun 23, 2023

epage commented Jun 23, 2023

tisonkun commented Jun 23, 2023 •

edited

epage commented Jun 23, 2023

tisonkun commented Jun 23, 2023

epage commented Jun 23, 2023

tisonkun commented Jun 24, 2023

Add an OutputMode type parameter to drive parser results #1631

Add an OutputMode type parameter to drive parser results #1631

Conversation

Geal commented Jan 29, 2023

coveralls commented Jan 29, 2023 • edited

Pull Request Test Coverage Report for Build 5231826950

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

epage Jan 30, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Geal commented Jan 30, 2023

epage commented Jan 30, 2023

epage Jan 30, 2023 • edited

Choose a reason for hiding this comment

Geal commented Feb 5, 2023

epage commented Feb 5, 2023

Geal commented Feb 5, 2023

Geal commented Feb 5, 2023

epage commented Feb 6, 2023

epage commented Feb 6, 2023

tisonkun commented Jun 23, 2023

epage commented Jun 23, 2023

tisonkun commented Jun 23, 2023 • edited

epage commented Jun 23, 2023

tisonkun commented Jun 23, 2023

epage commented Jun 23, 2023

tisonkun commented Jun 24, 2023

coveralls commented Jan 29, 2023 •

edited

epage Jan 30, 2023 •

edited

epage Jan 30, 2023 •

edited

tisonkun commented Jun 23, 2023 •

edited