
Consumer.run() panics when consume() requests more data than available #78

Closed
ngrewe opened this issue Aug 30, 2015 · 5 comments
ngrewe commented Aug 30, 2015

The run() method currently doesn't correctly check whether the producer has supplied enough data to satisfy what the consume() method requested. You drop out of the data collection loop because you hit eof (the same thing could happen because of a ProducerError), still with needed > acc.len(), and then try to take a slice that extends beyond the end of the buffer.

This test case demonstrates the problem:

struct TestConsumer {
    done: bool
}

impl Consumer for TestConsumer {
    fn end(&mut self) {
    }

    fn consume(&mut self, input: &[u8]) -> ConsumerState {
        if self.done {
            ConsumerState::ConsumerDone
        } else if input.len() < 2 {
            // Ask the producer for at least 2 bytes.
            ConsumerState::Await(0, 2)
        } else {
            self.done = true;
            ConsumerState::ConsumerDone
        }
    }

    fn failed(&mut self, error_code: u32) {
        println!("failed with error code: {}", error_code);
    }
}

#[test]
fn overrun() {
    // The producer only ever holds a single byte, so the Await(0, 2)
    // above can never be fulfilled.
    let mut p = MemProducer::new(&b"a"[..], 1);
    let mut c = TestConsumer{ done: false };
    c.run(&mut p);
    assert_eq!(c.done, false);
}
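(For illustration only, not nom's actual code: stripped of the consumer machinery, the crash boils down to an out-of-bounds slice, reusing the acc/needed names from above.)

#[test]
#[should_panic(expected = "out of range")]
fn slice_past_end() {
    let acc: Vec<u8> = vec![b'a'];  // everything the producer delivered
    let needed = 2;                 // what consume() requested via Await(0, 2)
    let _chunk = &acc[..needed];    // panics: range end index 2 out of range for slice of length 1
}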

The right thing would probably be to call failed(), but that method usually takes error codes produced by the consumer itself as an argument, so I'm not sure what to do here.

Geal commented Aug 30, 2015

There are two points of view here:

  • the consumer needed that many bytes, and returning less than what was requested is an error, so it should fail
  • the consumer can decide whether there is enough data or not; sometimes you request more data without knowing exactly how much you will need (e.g. when scanning for byte separators)

I would be more inclined towards the second option, since it helps more parsers. In both cases, though, it should not panic while taking the slice. I even think run() should always hand over the largest slice possible (all of the accumulated bytes) and let consume() decide where to stop.
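(As an illustrative sketch, not nom's actual code: a hypothetical next_chunk helper that clamps the requested length to what has actually been accumulated.)

use std::cmp;

// Hand the consumer at most `needed` bytes, but never more than what has
// been accumulated, so the slice can't reach past the end of the buffer.
fn next_chunk(acc: &[u8], needed: usize) -> &[u8] {
    &acc[..cmp::min(needed, acc.len())]
}

#[test]
fn never_panics() {
    let acc = b"a";                              // only one byte accumulated
    assert_eq!(next_chunk(acc, 2), &b"a"[..]);   // consumer asked for two, gets one
}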

What do you think?

ngrewe commented Aug 31, 2015

I agree: the second option is clearly superior in terms of flexibility. The issue I see with it is that existing code might do something like if input.len() < bytes_needed { ConsumerState::Await(0, bytes_needed) } (or the equivalent by pulling the size from an IResult::Incomplete()). If run() repeatedly returns the maximum available input even when it can't gather enough new data to fulfill the consumer's request, such code would be thrown into an endless loop (as opposed to panicking, as it does now). So the consumer would have to track whether it has already asked for that input and not gotten it, which is clumsy – especially since run() already knows about that.

In short, run() would need a way to inform consume() about whether the last Await/Seek operation could be fulfilled. I can't think of a non-API-breaking way to do that, but what might work would be to wrap the input in an enum that tells consume() how the new input came about. It could look something like the following (off the top of my head, probably nonsense and/or needs some 'a sprinkled around). That would also be a nice opportunity to pass ProducerErrors down into consume():

enum InputChunk<'a> {
  /// The entire requested data was produced
  Complete(&'a [u8]),
  /// None or only a part of the requested data was produced
  Incomplete(&'a [u8]),
  /// Encountered an error while producing data; some data
  /// might still have been produced
  Error(&'a [u8], ProducerError)
}
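(Purely as a hypothetical sketch, not real nom API: a consumer could then distinguish the three cases instead of re-issuing the same Await forever. The ConsumerState and ProducerError definitions below are stand-ins so the sketch is self-contained.)

// Stand-ins for the real nom types.
struct ProducerError(u32);

enum ConsumerState {
    Await(usize, usize),
    ConsumerDone,
    ConsumerError(u32),
}

enum InputChunk<'a> {
    Complete(&'a [u8]),
    Incomplete(&'a [u8]),
    Error(&'a [u8], ProducerError),
}

fn consume(done: &mut bool, input: InputChunk) -> ConsumerState {
    match input {
        // Everything that was requested arrived and is long enough: proceed.
        InputChunk::Complete(data) if data.len() >= 2 => {
            *done = true;
            ConsumerState::ConsumerDone
        }
        // Not enough data yet, but the producer can still deliver more.
        InputChunk::Complete(_) => ConsumerState::Await(0, 2),
        // The request could not be fulfilled: fail instead of re-issuing
        // the same Await and spinning forever.
        InputChunk::Incomplete(_) => ConsumerState::ConsumerError(0),
        // A producer error; partial data might still be attached.
        InputChunk::Error(_, ProducerError(code)) => ConsumerState::ConsumerError(code),
    }
}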

Alternatively, you could have callbacks about incomplete input, but I really can't come up with an idea that feels optimal.

Geal commented Sep 7, 2015

FYI I fixed the crash in 9f5128c, but I won't modify the current design of producers and consumers much, since I plan to change them completely for the 1.0 version. They're rather slow and not very flexible, and I want to change that. The good news is that your suggestions for a new design are very welcome :)

Geal commented Sep 7, 2015

Please indicate your suggestions in #80 :)

Geal added this to the 4.0 milestone Sep 6, 2017
Geal commented Oct 3, 2017

The stream feature will be removed in the next nom version (4.0), as of commit 564a934. If someone wants to extract the code and maintain it, feel free to do so; it lives in src/stream.rs. Otherwise, I would recommend checking out the (currently nightly-only) generators feature, which is much nicer to use. Here's an example of using nom with generators.

Geal closed this as completed Oct 3, 2017