
Unexpected test result for test encoding of empty input #16

Closed
cmcqueen opened this issue Jun 17, 2022 · 4 comments

@cmcqueen

What should the COBS encoding of an empty input be? You could argue for either an empty output, or an output consisting of the single byte [ 0x01 ]. In my work on COBS encoding for Python and C, [ 0x01 ] seems best: in a comms protocol it allows an "empty frame" to be distinguished from "nothing at all", which can be useful for some protocols.

So it's worth adding a test case for empty input. I tried adding this to tests/test.rs:

#[test]
fn test_encode_0() {
    test_pair(vec![], vec![1])
}

But that unexpectedly failed, returning an encoding of [ 128 ].
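For reference, the [ 0x01 ] convention can be illustrated with a minimal from-scratch encoder. This is a sketch only; `cobs_encode` is a hypothetical name and is not the cobs.rs API:

```rust
// Minimal from-scratch COBS encoder sketch illustrating the
// "[0x01] for empty input" convention. NOT the cobs.rs API.
fn cobs_encode(input: &[u8]) -> Vec<u8> {
    let mut out = vec![0x01u8]; // placeholder for the first code byte
    let mut code_idx = 0;       // index of the code byte being filled in
    let mut code: u8 = 1;       // block length so far, including code byte
    for &byte in input {
        if byte == 0 {
            out[code_idx] = code; // close the current block
            code_idx = out.len();
            out.push(0x01);
            code = 1;
        } else {
            out.push(byte);
            code += 1;
            if code == 0xFF {
                out[code_idx] = code; // max-length block: start a new one
                code_idx = out.len();
                out.push(0x01);
                code = 1;
            }
        }
    }
    out[code_idx] = code;
    out
}

fn main() {
    // Empty input encodes to a single 0x01 under this convention.
    assert_eq!(cobs_encode(&[]), vec![0x01]);
    assert_eq!(cobs_encode(&[0x11, 0x22, 0x00, 0x33]),
               vec![0x03, 0x11, 0x22, 0x02, 0x33]);
    println!("ok");
}
```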

@jamesmunns
Owner

Hey @cmcqueen, thanks for opening this! At least for now, cobs.rs chooses the "empty output" option. What you saw in your test above is a limitation of the test_pair function, which doesn't seem to handle empty inputs correctly.

I've added a test that verifies the current behavior:

#[test]
fn test_encode_0() {
    // An empty input is encoded as no characters.
    let mut output = [0xFFu8; 16];
    let used = encode(&[], &mut output);
    assert_eq!(used, 0);
    assert_eq!(output.as_slice(), &[0xFFu8; 16]);
}

If you'd like to propose switching to the other behavior, I'm open to it! It would either need to be a new method, or it would be a breaking change to the public API, but I'm not against that if someone feels strongly about it. Please feel free to open a new issue to discuss that change if you'd like.

@cmcqueen
Author

> If you'd like to propose switching to the other behavior, I'm open to it!

The choice is somewhat philosophical, somewhat pragmatic.

Pragmatically, I have used COBS (or COBS/R) to send binary packets over an RS-232 serial interface. In a steady stream of packets, I put a single 0x00 byte between COBS-encoded packets. But after some idle time, I like the TX side to send a 0x00 byte at the start of a packet also. That reduces the chance of the packet being corrupted at the RX side by line noise during the idle time. So in that case, the RX side considers adjacent 0x00 bytes to have literally "no packet" between them.

On the other hand, I have also wanted to send a high-level lightweight heartbeat packet which is simply an "empty packet", and the RX end responds likewise with an "empty packet" as a heartbeat response.

Putting these two goals together, it makes sense to be able to encode an "empty packet" and distinguish it from "no packet". So, on the TX side the COBS encoder encodes an empty packet as a single byte 0x01. On the RX side it receives the 0x01 demarcated by 0x00, and decodes that as an empty packet.

If an empty packet were encoded to an empty output, then the receiver couldn't distinguish between "empty packet" and "no packet". Pragmatically and philosophically, it's useful to be able to make that distinction.
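The receive-side scheme described above can be sketched like this. `split_frames` and `cobs_decode` are hypothetical helpers, not the cobs.rs API, and the decoder assumes well-formed input:

```rust
// RX-side sketch: 0x00 bytes delimit frames, adjacent 0x00 bytes mean
// "no packet", and a [0x01] gap decodes to an empty packet (heartbeat).
fn split_frames(stream: &[u8]) -> Vec<Vec<u8>> {
    stream
        .split(|&b| b == 0x00)
        .filter(|chunk| !chunk.is_empty()) // adjacent 0x00s: no packet here
        .map(cobs_decode)
        .collect()
}

// Minimal COBS decoder sketch (assumes well-formed, zero-free input).
fn cobs_decode(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < encoded.len() {
        let code = encoded[i] as usize;
        out.extend_from_slice(&encoded[i + 1..i + code]);
        i += code;
        if code < 0xFF && i < encoded.len() {
            out.push(0x00); // zero implied between blocks
        }
    }
    out
}

fn main() {
    // Idle-line 0x00s yield no packet; [0x01] is an empty heartbeat packet.
    let frames = split_frames(&[0x00, 0x00, 0x01, 0x00, 0x03, 0x11, 0x22, 0x00]);
    assert_eq!(frames, vec![vec![], vec![0x11, 0x22]]);
    println!("ok");
}
```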

@jamesmunns
Owner

That's fair! My point above is mostly that the change you are proposing would be a breaking change, because it changes the existing behavior. However, it might be worth doing for 0.3!

As a note, on the DECODE side you can see the difference if you use the _report functions, which provide the number of output AND input bytes used during decode. This test case passes, for example:

#[test]
fn test_decode_0() {
    let mut output = [0xFFu8; 16];
    let output_used = decode(&[0x01, 0x00], &mut output).unwrap();
    assert_eq!(output_used, 0);
    assert_eq!(output.as_slice(), &[0xFFu8; 16]);

    // NOTE: "src_used" is the number of source bytes used,
    // NOT INCLUDING the sentinel byte, if there was one.

    let mut buffer = [0x01, 0x00];
    let report_1 = decode_in_place_report(&mut buffer).unwrap();
    assert_eq!(report_1.src_used, 1);
    assert_eq!(report_1.dst_used, 0);

    let mut buffer = [0x00];
    let report_2 = decode_in_place_report(&mut buffer).unwrap();
    assert_eq!(report_2.src_used, 0);
    assert_eq!(report_2.dst_used, 0);
}

@jamesmunns
Owner

You could decide to "manually" send the [0x01, 0x00] sequence, though you would then need to handle that case manually (or with the _report functions) on the receive side to tell the different cases apart.
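A TX-side sketch of that manual workaround, under stated assumptions: `encode_frame` is a hypothetical wrapper, and the encoder is passed in as a closure so nothing here assumes a particular cobs.rs signature:

```rust
// TX-side sketch: since the crate encodes an empty input to nothing,
// emit the [0x01, 0x00] "empty packet" frame by hand for a heartbeat,
// and delegate non-empty payloads to an encoder. Hypothetical helper,
// not the cobs.rs API.
fn encode_frame(payload: &[u8], encode: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let mut wire = Vec::new();
    if payload.is_empty() {
        wire.push(0x01); // manual "empty packet" code byte
    } else {
        wire.extend(encode(payload));
    }
    wire.push(0x00); // frame delimiter
    wire
}

fn main() {
    // Stub encoder, valid only for short, zero-free payloads.
    let stub = |data: &[u8]| {
        let mut v = vec![(data.len() + 1) as u8];
        v.extend_from_slice(data);
        v
    };
    assert_eq!(encode_frame(&[], stub), vec![0x01, 0x00]);
    assert_eq!(encode_frame(&[0x11, 0x22], stub), vec![0x03, 0x11, 0x22, 0x00]);
    println!("ok");
}
```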
