IResult VS Result: making Incomplete part of errors? #356
Comments
From my point of view, it would remove a difficulty from nom usage. Now I have no idea what will be the result but it seems very interesting! |
Without knowing more details about what the final form of this would take, it would also mean a significant engineering effort for some of my own projects. For my own selfish reasons, I would be opposed to the change unless there is a material benefit for me as well. I do use
Yes. Many, but not all, of my usage of nom is with sockets. |
It's worth noting that Also I don't see how hiding |
@vcsjones how much code have you written using nom? If I decide to do the change, it might be something I can help with (I'd try to make the changes sed-able too). |
@keeperofdakeys the producers and consumers are awkward to use, and when you implement streaming protocols, you like to manage your sockets and buffers yourself, so that's why |
I don't have enough understanding of other nom use cases to make a call here, but as you mentioned, this would probably allow us to use real nom rather than a fork in |
A few to several thousand lines directly using Nom, much of which is not open :-(. The gist of it is an IPSec and SSH implementation.
That's good. I suppose if you do go through with the change, I would hope to be able to automate as much as I could. |
To be honest, I'd actually suggest that
pub type Poll<T, E> = Result<Async<T>, E>;
/// Return type of future, indicating whether a value is ready or not.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Async<T> {
    /// Represents that a value is immediately ready.
    Ready(T),
    /// Represents that a value is not ready yet, but may be so later.
    NotReady,
}
Rather than putting A user could simply do an |
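To make the "Incomplete on the Ok side" shape concrete, here is a minimal sketch modelled on the futures `Poll`/`Async` pattern quoted above. Everything in it is hypothetical and not nom's API: the names `State`, `ParseError`, `PResult`, `tag` and `complete` are invented for illustration, and `Needed::Size` is assumed to carry the total length the parser wants to see.

```rust
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

// Success side: either a finished parse or a request for more input.
#[derive(Debug, PartialEq)]
pub enum State<I, O> {
    Done(I, O),
    Incomplete(Needed),
}

// Error side: only genuine parse failures (hypothetical variants).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Tag,
    UnexpectedEof,
}

pub type PResult<I, O> = Result<State<I, O>, ParseError>;

/// Hypothetical parser: match a literal prefix of the input.
pub fn tag<'a>(input: &'a [u8], expected: &[u8]) -> PResult<&'a [u8], &'a [u8]> {
    if input.len() < expected.len() {
        if expected.starts_with(input) {
            // What we have so far matches; more bytes could complete it.
            Ok(State::Incomplete(Needed::Size(expected.len())))
        } else {
            Err(ParseError::Tag)
        }
    } else if input.starts_with(expected) {
        let (matched, rest) = input.split_at(expected.len());
        Ok(State::Done(rest, matched))
    } else {
        Err(ParseError::Tag)
    }
}

/// For callers holding the whole input: fold Incomplete into an error.
pub fn complete<I, O>(res: PResult<I, O>) -> Result<(I, O), ParseError> {
    match res? {
        State::Done(rest, out) => Ok((rest, out)),
        State::Incomplete(_) => Err(ParseError::UnexpectedEof),
    }
}

fn main() {
    println!("{:?}", tag(b"hello world", b"hello"));
    println!("{:?}", tag(b"he", b"hello"));
    println!("{:?}", complete(tag(b"he", b"hello")));
}
```

With this shape, code that only ever sees complete buffers can call a single adapter such as `complete` and never match on `Incomplete` itself, while streaming code keeps the full information.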
Hi there, As told in the IRC channel, I would like to work on the issue (though I promise nothing, I just investigate a bit for now). Currently, I am wondering whether we want to keep the
Thoughts?
I wonder if this couldn't be a separate crate, as (I guess) streaming is a specific need. What do you think? Florent |
@fflorent Well, part of my question is "Does In particular, I feel that scrutinizing the Even better, this could be done compatibly once such a method is added - |
If I am not wrong, one of the main points of this thread is the difficulty of using And I think @Geal is willing to separate some parts of nom into different crates (please @Geal correct me if I am wrong :)). … At least, of course, when that's feasible (I have to admit that I don't know how to handle streaming in a project using nom, if anyone can point me to some I would be grateful :)).
This sounds reasonable to me to add this in nom too. |
mmmh, so, a few things here:
|
Thanks for your answer Geal!
Sounds wise.
So if I get your point correctly, you state that handling incomplete data doesn't mean we do streaming (but that we just may need to load in memory some partial data)? Florent |
I have been thinking about this proposal:
I fear that having Also, is it worth introducing such a breaking change? With my understanding (I emphasize these three previous words :)), I wonder if it wouldn't be wiser to keep IResult unchanged. Also, if I am missing key points of the idea proposed here, I would be happy to get explanations or to let someone else do the work and see its virtues :). Cheers, |
So I'm currently testing this, and pushed the code there: 6a15807 I'm putting |
I agree with @eternaleye (#356 (comment)): Put Using For For This RFC might also break the public API of software using nom. It's not a good idea to expose Despite the negative feedback about the amount of work to update the code, I am willing to go in this direction. If you fear that the user base will not be happy with this second BC (this one is much more important), publish a blog post explaining why it is important, apologise, and go through with it. People will understand. |
I am trying to compare the old
// common
pub enum ErrorKind<E=u32> {
    Custom(E),
    Tag,
    MapRes
}
pub enum Needed {
    Unknown,
    Size(usize)
}
// new
pub enum State<I, O> {
    Ok(I, O),
    Incomplete(Needed)
}
type Result_new<I, O, E=u32> = Result<State<I, O>, ErrorKind<E>>;
// old
enum Result_old<I, O, E=u32> {
    Done(I, O),
    Error(ErrorKind<E>),
    Incomplete(Needed)
}
fn main() {
    println!("{}", std::mem::size_of::<Result_new<&[u8], &[u8]>>()); // 48
    println!("{}", std::mem::size_of::<Result_old<&[u8], &[u8]>>()); // 40
}
This is due to the You might find a better way to represent it than I did, but the size of |
From what I have seen, moving from About the enum size, here's an updated version of your code example, with the pattern I'm currently testing (Incomplete on the error side):
pub enum ErrorKind<E=u32> {
    Custom(E),
    Tag,
    MapRes
}
pub enum Needed {
    Unknown,
    Size(usize)
}
// new
pub enum State<I, O> {
    Ok(I, O),
    Incomplete(Needed)
}
type Result_new<I, O, E=u32> = Result<State<I, O>, ErrorKind<E>>;
// old
enum Result_old<I, O, E=u32> {
    Done(I, O),
    Error(ErrorKind<E>),
    Incomplete(Needed)
}
// Incomplete in error
type Result_err<I, O, E=u32> = Result<(I, O), Err<E>>;
pub enum Err<E=u32> {
    Error(ErrorKind<E>),
    Incomplete(Needed),
}
fn main() {
    println!("IResult: {}", std::mem::size_of::<Result_old<&[u8], &[u8]>>()); // 40
    println!("Incomplete in Ok: {}", std::mem::size_of::<Result_new<&[u8], &[u8]>>()); // 48
    println!("Incomplete in Err: {}", std::mem::size_of::<Result_err<&[u8], &[u8]>>()); // 40
}
So it stays at 40 bytes. There's another reason to put it on the error side: I'll add an unrecoverable error, so it would be a bit like this (I'm not decided on the name yet):
pub enum Err<E=u32> {
    Error(ErrorKind<E>),
    UnrecoverableError(ErrorKind<E>),
    Incomplete(Needed),
}
This is for errors indicating that we should not backtrack and try other branches, but instead just bubble up the error. It does not increase the enum size and will make some parsers easier to manage. Afterwards, transform a nom result to the futures |
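The backtracking rule described above can be sketched as follows. This is an illustration under stated assumptions, not nom's implementation: `alt2`, `kw_let`, `kw_if` and `cut_fail` are hypothetical names, `ErrorKind` is reduced to two variants without the `E` parameter, and the `UnrecoverableError` name is the tentative one from the comment.

```rust
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Tag,
    MapRes,
}

#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

#[derive(Debug, PartialEq)]
pub enum Err {
    Error(ErrorKind),
    UnrecoverableError(ErrorKind),
    Incomplete(Needed),
}

pub type PResult<'a, O> = Result<(&'a [u8], O), Err>;

/// Hypothetical two-branch alternation: only a recoverable Error
/// falls through to the second branch; everything else bubbles up.
pub fn alt2<'a, O>(
    input: &'a [u8],
    first: impl Fn(&'a [u8]) -> PResult<'a, O>,
    second: impl Fn(&'a [u8]) -> PResult<'a, O>,
) -> PResult<'a, O> {
    match first(input) {
        Err(Err::Error(_)) => second(input),
        other => other,
    }
}

/// Hypothetical keyword parser for "let".
pub fn kw_let(input: &[u8]) -> PResult<'_, &'static str> {
    if input.starts_with(b"let") {
        Ok((&input[3..], "let"))
    } else if b"let".starts_with(input) {
        // Input is a strict prefix of the keyword: ask for more data.
        Err(Err::Incomplete(Needed::Size(3)))
    } else {
        Err(Err::Error(ErrorKind::Tag))
    }
}

/// Hypothetical keyword parser for "if".
pub fn kw_if(input: &[u8]) -> PResult<'_, &'static str> {
    if input.starts_with(b"if") {
        Ok((&input[2..], "if"))
    } else if b"if".starts_with(input) {
        Err(Err::Incomplete(Needed::Size(2)))
    } else {
        Err(Err::Error(ErrorKind::Tag))
    }
}

/// Hypothetical branch that always fails without allowing backtracking.
pub fn cut_fail(_input: &[u8]) -> PResult<'_, &'static str> {
    Err(Err::UnrecoverableError(ErrorKind::Tag))
}

fn main() {
    println!("{:?}", alt2(b"if x", kw_let, kw_if));
    println!("{:?}", alt2(b"le", kw_let, kw_if));
    println!("{:?}", alt2(b"if x", cut_fail, kw_if));
}
```

Because all three failure modes live in one `Err` enum, the alternation needs a single match arm to decide whether to try the next branch.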
Yes, we have the same result when keeping We can probably rename |
Yes I don't like the name |
another interesting thing, while you got me worrying about struct size, I have a way to make Basically, instead of:
type Result_err<I, O, E=u32> = Result<(I, O), Err<E>>;
pub enum Err<E=u32> {
    Error(ErrorKind<E>),
    Incomplete(Needed),
}
I would have:
type Result_err<I, O, E=u32> = Result<(I, O), Err<I,E>>;
pub enum Err<I,E=u32> {
    Error(Context<I,E>),
    Incomplete(Needed),
}
pub enum Context<I,E> {
    Code(I,ErrorKind<E>),
}
and in verbose mode:
pub enum Context<I,E> {
    Code(I,ErrorKind<E>),
    List(Vec<(I,ErrorKind<E>)>),
}
That way, the
|
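As a rough illustration of how the verbose `Context` variant might accumulate information while an error bubbles up, here is a small sketch. It is an assumption-laden example, not nom's code: the `push` helper is hypothetical, and `ErrorKind` is simplified by dropping the `E` parameter.

```rust
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Tag,
    MapRes,
}

// Verbose-mode shape from the comment above: a single (position, kind)
// pair, or a list of them collected across nested parsers.
#[derive(Debug, PartialEq)]
pub enum Context<I> {
    Code(I, ErrorKind),
    List(Vec<(I, ErrorKind)>),
}

impl<I> Context<I> {
    /// Hypothetical helper: an outer parser appends its own input
    /// position and error kind to the context it received.
    pub fn push(self, input: I, kind: ErrorKind) -> Context<I> {
        match self {
            Context::Code(i, k) => Context::List(vec![(i, k), (input, kind)]),
            Context::List(mut entries) => {
                entries.push((input, kind));
                Context::List(entries)
            }
        }
    }
}

fn main() {
    // The inner parser failed at "bc"; the outer one started at "abc".
    let inner = Context::Code("bc", ErrorKind::Tag);
    let outer = inner.push("abc", ErrorKind::MapRes);
    println!("{:?}", outer);
}
```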
Sounds great! |
it's getting interesting: https://twitter.com/gcouprie/status/906186641706012672 |
FYI this is now done, with an additional element in the |
Any notable impacts on benchmarks? |
adding |
There have been a lot of demands that I change nom's basic type from IResult to std::result::Result. IResult has the following definition:

This was originally inspired from attoparsec's IResult in which the Partial branch contained a closure to be called when more data is available. For various reasons, I was not able to make the closure idea work (note that Rust was very far from 1.0 at the time), so I chose to show how much data was needed to ask the user to parse again.

I open this issue to study what a change from IResult to Result would entail. I make no promise to do that change, and I will not put the issue to a vote. I will however take into account the responses I get.

The proposal is to make the Incomplete branch part of the Error branch, which would allow employing Result. I do not know yet what the end type would look like. I see two possibilities for the Err type: containing either a Needed or the other error branches, or flattening Needed at the same level.

So, to detail the arguments:
pro:
- Result methods and the code relying on it
- Incomplete usage, parsers are easier to write
- Done and Error. In some combinators, there will be only three of them: Done, Error or Incomplete(Needed::Unknown), Incomplete::Needed::Size(sz), because the calculation of needed data must still happen
- Incomplete confusing, since they work on complete data (like a file completely read in memory)

con:
- Incomplete, and this is a big breaking change for them
- Incomplete would need to be updated to use Result
- alt! currently return on Incomplete instead of testing the next branch. So, instead of alt! and alt_complete!, do a alt! and alt_incomplete!?
- Incomplete will hide this benefit of the library

also, I am not sure about the timeline here. I am doing nom 2.0 very soon and it introduces some breaking changes, but I'm worried this change might be too big and drive users away. On the other hand, a 3.0 would likely happen far in the future, and there would be even more code (in nom and in code relying on nom) to update.

so, what do people think?
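A minimal sketch of what the proposal buys in practice, assuming the first of the two shapes mentioned above (an error enum with an `Incomplete(Needed)` branch next to the ordinary error branch). The names `PResult`, `take` and `length_prefixed` are hypothetical, the `Error(u32)` payload is a placeholder, and `Needed::Size` is assumed to be the number of bytes the failing parser wanted from its own input.

```rust
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

// Incomplete lives on the error side, so the whole type is a plain Result.
#[derive(Debug, PartialEq)]
pub enum Err {
    Error(u32),
    Incomplete(Needed),
}

pub type PResult<'a, O> = Result<(&'a [u8], O), Err>;

/// Hypothetical parser: take exactly `n` bytes, or report Incomplete.
pub fn take<'a>(input: &'a [u8], n: usize) -> PResult<'a, &'a [u8]> {
    if input.len() < n {
        Err(Err::Incomplete(Needed::Size(n)))
    } else {
        let (out, rest) = input.split_at(n);
        Ok((rest, out))
    }
}

/// Hypothetical parser: one length byte followed by that many payload bytes.
/// The `?` operator propagates both Error and Incomplete without a match.
pub fn length_prefixed<'a>(input: &'a [u8]) -> PResult<'a, &'a [u8]> {
    let (rest, len) = take(input, 1)?;
    take(rest, len[0] as usize)
}

fn main() {
    let input = [3u8, b'a', b'b', b'c', b'd'];
    println!("{:?}", length_prefixed(&input));
    // Standard Result methods apply directly to the parser output.
    let sizes = length_prefixed(&input).map(|(rest, payload)| (rest.len(), payload.len()));
    println!("{:?}", sizes);
    println!("{:?}", length_prefixed(&input[..2]));
}
```

The sequencing in `length_prefixed` is where the "Result methods and the code relying on it" argument shows up: `?` and `map` work unchanged, while a caller that cares about streaming can still match on `Err::Incomplete`.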