
Unexpected test result for test encoding of empty input #16

Closed
cmcqueen opened this issue Jun 17, 2022 · 4 comments

@cmcqueen

What should the COBS encoding of an empty input be? You could argue for either an empty output, or an output consisting of the single byte [ 0x01 ]. In my work on COBS encoding for Python and C, [ 0x01 ] seems best: in a comms protocol it allows an "empty frame" to be distinguished from "nothing at all", which can be useful for some protocols.

So it's worth adding a test case for empty input. I tried adding this to tests/test.rs:

#[test]
fn test_encode_0() {
    test_pair(vec![], vec![1])
}

But that unexpectedly failed, returning an encoding of [ 128 ].
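For reference, the [ 0x01 ] convention can be illustrated with a minimal from-scratch encoder. This is a sketch only; `cobs_encode` is a hypothetical name and is not the cobs.rs API:

```rust
// Minimal from-scratch COBS encoder sketch illustrating the
// "[0x01] for empty input" convention. NOT the cobs.rs API.
fn cobs_encode(input: &[u8]) -> Vec<u8> {
    let mut out = vec![0x01u8]; // placeholder for the first code byte
    let mut code_idx = 0;       // index of the code byte being filled in
    let mut code: u8 = 1;       // block length so far, including code byte
    for &byte in input {
        if byte == 0 {
            out[code_idx] = code; // close the current block
            code_idx = out.len();
            out.push(0x01);
            code = 1;
        } else {
            out.push(byte);
            code += 1;
            if code == 0xFF {
                out[code_idx] = code; // max-length block: start a new one
                code_idx = out.len();
                out.push(0x01);
                code = 1;
            }
        }
    }
    out[code_idx] = code;
    out
}

fn main() {
    // Empty input encodes to a single 0x01 under this convention.
    assert_eq!(cobs_encode(&[]), vec![0x01]);
    assert_eq!(cobs_encode(&[0x11, 0x22, 0x00, 0x33]),
               vec![0x03, 0x11, 0x22, 0x02, 0x33]);
    println!("ok");
}
```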

@jamesmunns
Owner

Hey @cmcqueen, thanks for opening this! At least for now, cobs.rs chooses the "empty output" option. What you saw in your test above is a limitation of the test_pair function, which doesn't seem to handle empty inputs correctly.

I've added a test that verifies the current behavior:

#[test]
fn test_encode_0() {
    // An empty input is encoded as no characters.
    let mut output = [0xFFu8; 16];
    let used = encode(&[], &mut output);
    assert_eq!(used, 0);
    assert_eq!(output.as_slice(), &[0xFFu8; 16]);
}

If you'd like to propose switching to the other behavior, I'm open to it! It would either need to be a new method, or it would be a breaking change to the public API, but I'm not against that if someone feels strongly about it. Please feel free to open a new issue to discuss that change if you'd like.

@cmcqueen
Author

> If you'd like to propose switching to the other behavior, I'm open to it!

The choice is somewhat philosophical, somewhat pragmatic.

Pragmatically, I have used COBS (or COBS/R) to send binary packets over an RS-232 serial interface. In a steady stream of packets, I put a single 0x00 byte between COBS-encoded packets. But after some idle time, I like the TX side to send a 0x00 byte at the start of a packet also. That reduces the chance of the packet being corrupted at the RX side by line noise during the idle time. So in that case, the RX side considers adjacent 0x00 bytes to have literally "no packet" between them.

On the other hand, I have also wanted to send a high-level lightweight heartbeat packet which is simply an "empty packet", and the RX end responds likewise with an "empty packet" as a heartbeat response.

Putting these two goals together, it makes sense to be able to encode an "empty packet" and distinguish it from "no packet". So, on the TX side the COBS encoder encodes an empty packet as a single byte 0x01. On the RX side it receives the 0x01 demarcated by 0x00, and decodes that as an empty packet.

If an empty packet were encoded to an empty output, then the receiver couldn't distinguish between "empty packet" and "no packet". Pragmatically and philosophically, it's useful to be able to make that distinction.
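The receive-side scheme described above can be sketched like this. `split_frames` and `cobs_decode` are hypothetical helpers, not the cobs.rs API, and the decoder assumes well-formed input:

```rust
// RX-side sketch: 0x00 bytes delimit frames, adjacent 0x00 bytes mean
// "no packet", and a [0x01] gap decodes to an empty packet (heartbeat).
fn split_frames(stream: &[u8]) -> Vec<Vec<u8>> {
    stream
        .split(|&b| b == 0x00)
        .filter(|chunk| !chunk.is_empty()) // adjacent 0x00s: no packet here
        .map(cobs_decode)
        .collect()
}

// Minimal COBS decoder sketch (assumes well-formed, zero-free input).
fn cobs_decode(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < encoded.len() {
        let code = encoded[i] as usize;
        out.extend_from_slice(&encoded[i + 1..i + code]);
        i += code;
        if code < 0xFF && i < encoded.len() {
            out.push(0x00); // zero implied between blocks
        }
    }
    out
}

fn main() {
    // Idle-line 0x00s yield no packet; [0x01] is an empty heartbeat packet.
    let frames = split_frames(&[0x00, 0x00, 0x01, 0x00, 0x03, 0x11, 0x22, 0x00]);
    assert_eq!(frames, vec![vec![], vec![0x11, 0x22]]);
    println!("ok");
}
```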

@jamesmunns
Owner

That's fair! My point above is mostly that the change you are proposing would be a breaking change, because it changes the existing behavior. However, it might be worth doing for 0.3!

As a note, on the DECODE side you can see the difference if you use the _report functions, which provide the number of output AND input bytes used during decode. This test case passes, for example:

#[test]
fn test_decode_0() {
    let mut output = [0xFFu8; 16];
    let output_used = decode(&[0x01, 0x00], &mut output).unwrap();
    assert_eq!(output_used, 0);
    assert_eq!(output.as_slice(), &[0xFFu8; 16]);

    // NOTE: "src_used" is the number of source bytes used,
    // NOT INCLUDING the sentinel byte, if there was one.

    let mut buffer = [0x01, 0x00];
    let report_1 = decode_in_place_report(&mut buffer).unwrap();
    assert_eq!(report_1.src_used, 1);
    assert_eq!(report_1.dst_used, 0);

    let mut buffer = [0x00];
    let report_2 = decode_in_place_report(&mut buffer).unwrap();
    assert_eq!(report_2.src_used, 0);
    assert_eq!(report_2.dst_used, 0);
}

@jamesmunns
Owner

You could decide to "manually" send the [0x01, 0x00] sequence, though you would then need to handle that case manually (or with the _report functions) on the receive side to tell the different cases apart.
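A TX-side sketch of that manual workaround, under stated assumptions: `encode_frame` is a hypothetical wrapper, and the encoder is passed in as a closure so nothing here assumes a particular cobs.rs signature:

```rust
// TX-side sketch: since the crate encodes an empty input to nothing,
// emit the [0x01, 0x00] "empty packet" frame by hand for a heartbeat,
// and delegate non-empty payloads to an encoder. Hypothetical helper,
// not the cobs.rs API.
fn encode_frame(payload: &[u8], encode: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let mut wire = Vec::new();
    if payload.is_empty() {
        wire.push(0x01); // manual "empty packet" code byte
    } else {
        wire.extend(encode(payload));
    }
    wire.push(0x00); // frame delimiter
    wire
}

fn main() {
    // Stub encoder, valid only for short, zero-free payloads.
    let stub = |data: &[u8]| {
        let mut v = vec![(data.len() + 1) as u8];
        v.extend_from_slice(data);
        v
    };
    assert_eq!(encode_frame(&[], stub), vec![0x01, 0x00]);
    assert_eq!(encode_frame(&[0x11, 0x22], stub), vec![0x03, 0x11, 0x22, 0x00]);
    println!("ok");
}
```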
