
Consumer.run() panics when consume() requests more data than available #78

Closed
ngrewe opened this issue Aug 30, 2015 · 5 comments
ngrewe commented Aug 30, 2015

The run() method currently doesn't correctly check whether the producer has supplied enough data to satisfy what the consume() method requested. You drop out of the data collection loop because you hit eof (the same thing could happen because of a ProducerError), still with needed > acc.len(), and then try to take a slice that extends beyond the end of the buffer.

This test case demonstrates the problem:

struct TestConsumer {
    done: bool
}

impl Consumer for TestConsumer {
    fn end(&mut self) {
    }

    fn consume(&mut self, input: &[u8]) -> ConsumerState {
        if self.done {
            ConsumerState::ConsumerDone
        } else if input.len() < 2 {
            // Ask the producer for at least 2 bytes.
            ConsumerState::Await(0, 2)
        } else {
            self.done = true;
            ConsumerState::ConsumerDone
        }
    }

    fn failed(&mut self, error_code: u32) {
        println!("failed with error code: {}", error_code);
    }
}

#[test]
fn overrun() {
    // The producer only ever holds a single byte, so the Await(0, 2)
    // above can never be fulfilled.
    let mut p = MemProducer::new(&b"a"[..], 1);
    let mut c = TestConsumer{ done: false };
    c.run(&mut p);
    assert_eq!(c.done, false);
}
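(For illustration only, not nom's actual code: stripped of the consumer machinery, the crash boils down to an out-of-bounds slice, reusing the acc/needed names from above.)

#[test]
#[should_panic(expected = "out of range")]
fn slice_past_end() {
    let acc: Vec<u8> = vec![b'a'];  // everything the producer delivered
    let needed = 2;                 // what consume() requested via Await(0, 2)
    let _chunk = &acc[..needed];    // panics: range end index 2 out of range for slice of length 1
}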

The right thing would probably be to call failed(), but that method usually takes error codes produced by the consumer itself as an argument, so I'm not sure what to do here.

Geal commented Aug 30, 2015

There are two points of view here:

  • the consumer needed that many bytes, and returning less than what was requested is an error, so it should fail
  • the consumer can decide whether there is enough data or not; sometimes you request more data without knowing exactly how much you will need (e.g. when scanning for byte separators)

I would be more inclined towards the second option, since it helps more parsers. In both cases, though, it should not panic while taking the slice. I even think run() should always hand over the largest slice possible (all of the accumulated bytes) and let consume() decide where to stop.
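(As an illustrative sketch, not nom's actual code: a hypothetical next_chunk helper that clamps the requested length to what has actually been accumulated.)

use std::cmp;

// Hand the consumer at most `needed` bytes, but never more than what has
// been accumulated, so the slice can't reach past the end of the buffer.
fn next_chunk(acc: &[u8], needed: usize) -> &[u8] {
    &acc[..cmp::min(needed, acc.len())]
}

#[test]
fn never_panics() {
    let acc = b"a";                              // only one byte accumulated
    assert_eq!(next_chunk(acc, 2), &b"a"[..]);   // consumer asked for two, gets one
}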

What do you think?

ngrewe commented Aug 31, 2015

I agree: the second option is clearly superior in terms of flexibility. The issue I see with it is that existing code might do something like if input.len() < bytes_needed { ConsumerState::Await(0, bytes_needed) } (or the equivalent by pulling the size from an IResult::Incomplete()). If run() repeatedly returns the maximum available input even when it can't gather enough new data to fulfill the consumer's request, such code would be thrown into an endless loop (as opposed to panicking, as it does now). So the consumer would have to track whether it has already asked for that input and not gotten it, which is clumsy – especially since run() already knows about that.

In short, run() would need a way to inform consume() about whether the last Await/Seek operation could be fulfilled. I can't think of a non-API-breaking way to do that, but what might work would be to wrap the input in an enum that tells consume() how the new input came about. It could look something like the following (off the top of my head, probably nonsense and/or needs some 'a sprinkled around). That would also be a nice opportunity to pass ProducerErrors down into consume():

enum InputChunk<'a> {
  /// The entire requested data was produced
  Complete(&'a [u8]),
  /// None or only a part of the requested data was produced
  Incomplete(&'a [u8]),
  /// Encountered an error while producing data; some data
  /// might still have been produced
  Error(&'a [u8], ProducerError)
}
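(Purely as a hypothetical sketch, not real nom API: a consumer could then distinguish the three cases instead of re-issuing the same Await forever. The ConsumerState and ProducerError definitions below are stand-ins so the sketch is self-contained.)

// Stand-ins for the real nom types.
struct ProducerError(u32);

enum ConsumerState {
    Await(usize, usize),
    ConsumerDone,
    ConsumerError(u32),
}

enum InputChunk<'a> {
    Complete(&'a [u8]),
    Incomplete(&'a [u8]),
    Error(&'a [u8], ProducerError),
}

fn consume(done: &mut bool, input: InputChunk) -> ConsumerState {
    match input {
        // Everything that was requested arrived and is long enough: proceed.
        InputChunk::Complete(data) if data.len() >= 2 => {
            *done = true;
            ConsumerState::ConsumerDone
        }
        // Not enough data yet, but the producer can still deliver more.
        InputChunk::Complete(_) => ConsumerState::Await(0, 2),
        // The request could not be fulfilled: fail instead of re-issuing
        // the same Await and spinning forever.
        InputChunk::Incomplete(_) => ConsumerState::ConsumerError(0),
        // A producer error; partial data might still be attached.
        InputChunk::Error(_, ProducerError(code)) => ConsumerState::ConsumerError(code),
    }
}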

Alternatively, you could have callbacks about incomplete input, but I really can't come up with an idea that feels optimal.

Geal commented Sep 7, 2015

FYI I fixed the crash in 9f5128c, but I won't modify the current design of producers and consumers much, since I plan to change them completely for the 1.0 version. They're rather slow and not very flexible, and I want to change that. The good news is that your suggestions for a new design are very welcome :)

Geal commented Sep 7, 2015

Please indicate your suggestions in #80 :)

Geal added this to the 4.0 milestone Sep 6, 2017
Geal commented Oct 3, 2017

The stream feature will be removed in the next nom version (4.0), as of commit 564a934. If someone wants to extract the code and maintain it, feel free to do so; it lives in src/stream.rs. Otherwise, I would recommend checking out the (currently nightly-only) generators feature, which is much nicer to use. Here's an example of using nom with generators.

Geal closed this as completed Oct 3, 2017