-
Notifications
You must be signed in to change notification settings - Fork 803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow types to be both stream and complete parsers #1535
Comments
|
What if we had
This would
We'd need to have |
The following is a idea I have while doing my own parser. This is maybe not possible to do in nom or would require a lot of change but I don't see why not share ideas. The idea was to not have streaming and complete parser like nom. I wanted to remove the problem of duplicate code in nom but I also wanted to have a way to make "streaming" parser like nom, currently nom have a major problem you can't "resume" where you stop. I saw two solutions:
For example: struct Buf<Reader, const N: usize> {
buf: Vec<u8>,
reader: Reader,
}
#[derive(Clone)]
pub(crate) enum Position {
RangeFrom(RangeFrom<usize>),
Range(Range<usize>),
}
#[derive(Clone)]
pub struct ReaderStream<Reader, const N: usize> {
buf: Rc<RefCell<Buf<Reader, N>>>,
position: Position,
}
This design is quite, MUTABLE, but for the user it's hidden, then it has a lot of advantage, you don't resume when you need more data, you just read for more on the Reader. The Reader itself could be implemented to allow "sort of" async code, for example in tokio you could spawn a blocking thread for the parsing, that will wait the data read by an async task. This is not perfect but compared to resume from start the parsing it's quite nice. It's also remove the need of duplicate code and remove one branch in the outcome of a parser. There is another problem, present in nom too, this buffer could become very large, that why I also think to a trait, while the implementation of this trait must be safe, this could allow bug in runtime if wrongly used, the purpose is simple, call the method if you have "finish" with previous data. Thus I still didn't work much on it: pub trait Consume {
fn consume(self) -> Self;
} For now it's basic I didn't try this yet. The point would be to flush the buffer, making any use of previously used data an error. Thus I don't think a lot of parser could make use such trait. |
I've been playing with the Currently, almost all of the parsers are either To put this in more concrete terms, we need to turn pub fn tag<I, O, C, E: ParseError<(I, usize)>>(
pattern: O,
count: C,
) -> impl Fn((I, usize)) -> IResult<(I, usize), O, E>
where
I: Slice<RangeFrom<usize>> + InputIter<Item = u8> + InputLength + Clone,
C: ToUsize,
O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O> + PartialEq,
{
let count = count.to_usize();
move |input: (I, usize)| {
let inp = input.clone();
take(count)(input).and_then(|(i, o)| {
if pattern == o {
Ok((i, o))
} else {
Err(Err::Error(error_position!(inp, ErrorKind::TagBits)))
}
})
}
} into pub fn tag<O, C>(pattern: O, count: C) -> Tag<O, C> {
Tag { pattern, count }
}
/// Implementation of [`tag`]
pub struct Tag<O, C> {
pattern: O,
count: C,
}
impl<I, O, C, E> Parser<(I, usize), O, E> for Tag<O, C>
where
I: Slice<RangeFrom<usize>> + InputIter<Item = u8> + InputLength + Clone,
C: ToUsize,
O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O> + PartialEq,
E: ParseError<(I, usize)>,
{
fn parse(&mut self, input: (I, usize)) -> IResult<(I, usize), O, E> {
let count = self.count.to_usize();
let inp = input.clone();
take(count).parse(input).and_then(|(i, o)| {
if self.pattern == o {
Ok((i, o))
} else {
Err(Err::Error(error_position!(inp, ErrorKind::TagBits)))
}
})
}
} With this change, |
The compiler is not always happy with the above conversion Take pub fn bits<I, O, E1, E2, P>(mut parser: P) -> impl FnMut(I) -> IResult<I, O, E2>
where
E1: ParseError<(I, usize)> + ErrorConvert<E2>,
E2: ParseError<I>,
I: Slice<RangeFrom<usize>>,
P: FnMut((I, usize)) -> IResult<(I, usize), O, E1>,
{
move |input: I| match parser((input, 0)) {
Ok(((rest, offset), result)) => {
// If the next byte has been partially read, it will be sliced away as well.
// The parser functions might already slice away all fully read bytes.
// That's why `offset / 8` isn't necessarily needed at all times.
let remaining_bytes_index = offset / 8 + if offset % 8 == 0 { 0 } else { 1 };
Ok((rest.slice(remaining_bytes_index..), result))
}
Err(Err::Incomplete(n)) => Err(Err::Incomplete(n.map(|u| u.get() / 8 + 1))),
Err(Err::Error(e)) => Err(Err::Error(e.convert())),
Err(Err::Failure(e)) => Err(Err::Failure(e.convert())),
}
} When I transform this as described above, you get: pub fn bits<P>(parser: P) -> Bits<P> {
Bits { parser }
}
/// Implementation of [`bits`]
pub struct Bits<P> {
parser: P,
}
impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P>
where
E1: ParseError<(I, usize)> + ErrorConvert<E2>,
E2: ParseError<I>,
I: Slice<RangeFrom<usize>>,
P: Parser<(I, usize), O, E1>,
{
fn parse(&mut self, input: I) -> IResult<I, O, E2> {
match self.parser((input, 0)) {
Ok(((rest, offset), result)) => {
// If the next byte has been partially read, it will be sliced away as well.
// The parser functions might already slice away all fully read bytes.
// That's why `offset / 8` isn't necessarily needed at all times.
let remaining_bytes_index = offset / 8 + if offset % 8 == 0 { 0 } else { 1 };
Ok((rest.slice(remaining_bytes_index..), result))
}
Err(Err::Incomplete(n)) => Err(Err::Incomplete(n.map(|u| u.get() / 8 + 1))),
Err(Err::Error(e)) => Err(Err::Error(e.convert())),
Err(Err::Failure(e)) => Err(Err::Failure(e.convert())),
}
}
} but this fails because of $ cargo check
Checking nom v7.1.1 (/home/epage/src/personal/nom-experimental)
error[E0207]: the type parameter `E1` is not constrained by the impl trait, self type, or predicates
--> src/bits/mod.rs:50:12
|
50 | impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P>
| ^^ unconstrained type parameter
For more information about this error, try `rustc --explain E0207`.
error: could not compile `nom` due to previous error We cannot constrain Normally, this would be handled by adding Running
While |
General wisdom is that generics are used for input types and associated types are used for outputs. In our case, we are trying to make the compiler happy for rust-bakery/nom#1535
Switching to associated types didn't cause any problems: winnow-rs/winnow#5 Switching Still fighting |
Thinking back through the latest problems I'm having and the first, I think I'm going to capture |
Got it past the constraint issue: winnow-rs/winnow#6 |
So I've gotten further with winnow-rs/winnow#6 and am looking at going with a different direction. As a recap, my goals are
With winnow-rs/winnow#6, I'm trying to solve that by forking the Problems I've run into include:
✅ The goals are being met. Some things I think I like (not fully decided on all of them)
Overall, I think I should step back and evaluate this compared to what I think makes
In considering alternatives designs, something I keep coming back to is that complete vs stream is an attribute of the
As an alternative, what if we had: pub trait Stream {
fn is_complete() -> bool;
}
pub struct Complete<Input>(Input);
// TODO: Also implement all relevant input traits
impl<Input> Stream for Complete<Input> {
#[inline]
fn is_complete() -> bool {
true
}
}
pub struct Incomplete<Input>(Input);
// TODO: Also implement all relevant input traits
impl<Input> Stream for Incomplete<Input> {
#[inline]
fn is_complete() -> bool {
false
}
}
// TODO: Also implement all relevant input traits
impl<Token> Stream for &[Token] {
#[inline]
fn is_complete() -> bool {
false
}
} Now, each of the parsers that can be either complete or streaming will add an additional constraint ( Evaluating this compared to what makes
Compared to problems with winnow-rs/winnow#6
Compared to my goals
I think this strikes a balance worth exploring and I'll a crack at this next. |
winnow-rs/winnow#28 seems to be working well! I think it can be used to resolve this Issue and unblock other work (like implementing |
btw since I did my write up, I ended up going in the direction of making the The most obvious downside is that it requires an extra trait parameter. When implementing |
Right now,
nom
has distinct types for complete and streaming parsers.Problems with this
streaming
/complete
is in the root #1414 would move them around)&str
because one of the two has to be offered (Reduce syntactic noise #1408)Finish::finish
will panic on IncompleteWhat if a type could implement have both types of parsing on it?
The text was updated successfully, but these errors were encountered: