Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usability: address changes introduced with Complete* and many* combinators #839

Closed
Geal opened this issue Oct 6, 2018 · 9 comments
Closed
Milestone

Comments

@Geal
Copy link
Collaborator

Geal commented Oct 6, 2018

references:

There are especially issues around transforming to and from the Complete* types, when reusing parsers or passing the results through flat_map! and other combinators.

There's also the problem of making sure basic parser functions work with all types. As an example, the hex_u32, be_* and le_* only work on &[u8], not CompleteByteSlice

@zaphar
Copy link

zaphar commented Oct 10, 2018

Just want to say that this particular issue is a blocker for my moving a project over to nom4. I was really excited to see the improvements in error handling in nom4 but this change and the invasiveness of the fix are giving me pause to finishing the conversion.

Most of the other changes are pretty mechanical and easy to do. This one does not fall in that category though.

@Geal
Copy link
Collaborator Author

Geal commented Oct 13, 2018

@zaphar can you provide examples of code that is affected by the changes? This will help me think about the issue

@leshow
Copy link

leshow commented Oct 14, 2018

Since my issue is mentioned, I'll add I'm blocked form upgrading from nom 2 as well. @khernyo appears to have written a PR to fix my particular problem (referenced in the issue)

@Geal Geal added this to the 5.0 milestone Dec 14, 2018
@Geal Geal pinned this issue Dec 26, 2018
@Geal
Copy link
Collaborator Author

Geal commented Feb 6, 2019

I experimented a bit in snack with edition 2018, to see if the edition 2018 could change things.
My first idea was to make different submodules, one with parsers that work in streaming (ex: nom::streaming::bytes), one for parsers that don't nom::complete::bytes) and let the user choose which macro to import with use nom::complete::bytes::tag.
Unfortunately, the #[macro_export] attribute puts macro at the top level, not at the module level: rust-lang/rust#54112

I have another idea though. It is possible to rename macros when importing them. So we could have use nom::many0 if we want to use the streaming version, but we could do use nom::many0_complete as many0 if we wanted a version that stops once it reaches the end of the buffer.
It makes imports a bit more annoying, but will keep the parsers more readable (at the cost of a lot of code duplication). More importantly, we would not need the CompleteStr and CompleteByteSlice types anymore.

I could go further with this and making nom a very small crates with only the core types, and separate the streaming or not streaming macros in different crates. But that might be a bit more annoying to maintain.

@athei
Copy link

athei commented Feb 6, 2019

That is nice! With that change is there even a use case for many0 and similar macros? They always return Incomplete and everyone just tries to come up with hacks to make them behave like the many0_complete you proposed. When you think about it they make only sense as complete version because how would a streaming parser know when many0 is done: The amount or length was transmitted and they are usually called through something like delimited! or length_bytes! through which we know for sure that the input must me complete.

That means the Incomplete is always returned by a caller of many0 but never by many0 itself and therefore there is no reason to have non-complete versions of them.

@Geal
Copy link
Collaborator Author

Geal commented Feb 7, 2019

the current behaviour of many0! is actually correct. It should stop and return the Vec of values once its underlying parser fails, but the issue is in defining what makes that parser fails.
Imagine you have the following: named!(parser<Vec<&[u8]>>, many0!(tag!("abc")))
if you pass abcabcabc! the parser will return ["abc", "abc", "abc"] as expected, because the tag will fail on the ! character.
But if you pass abcabc, the parser has no way of knowing if that's the whole input, or if it's missing data and the whole input is actually abcabcabc!.
But as you mentioned, often when working in streaming mode with length delimited format, we get fields for which we know the exact length (and probably have everything in the buffer already) so we should be able to switch to another behaviour.

@athei
Copy link

athei commented Feb 7, 2019

Ahh okay I was under the impression that many0! returns an error when the underlying parser fails. My mistake. It makes much more sense now in my head. Yes, complete versions of the macros would be much appreciated.

@leshow
Copy link

leshow commented Mar 11, 2019

I noticed my issue was closed, but doesn't work on 4.2. These changes will be included in 5?

@Geal
Copy link
Collaborator Author

Geal commented Jun 17, 2019

closing this now, as nom 5 will fix those usability issues with CompleteByteSlice and CompleteStr by check notes REMOVING THEM ENTIRELY :D

Now there are specialized versions of parsers for streaming or complete input data in the streaming or complete submodules for some parser categories (see for example nom::character or nom::number`)

@Geal Geal closed this as completed Jun 17, 2019
@Geal Geal unpinned this issue Apr 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants