Truncating behavior is confusing and forces allocations #77

matklad · 2021-07-07T11:14:26Z

I expect the following test to pass:

#[test]
fn append() {
    let mut buf = "hello world".to_string();
    bs58::encode(&[92]).into(&mut buf).unwrap();
    assert_eq!("hello world2b", buf.as_str());
}

Instead, it fails, as the buf contains just "2b". That is, encoding discards existing data, rather than appending to it.

There are two problems with this:

- It is surprising behavior. Standard library APIs like read_line always append. If overwriting is desired, the caller can call .clear() first.
- It can force an allocation if the user actually wants to append data to some existing buffer. This comes up when, for example, building SRI-style hashes: "<algo-name>-<base58 encoded bytes>".
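To make the append semantics concrete, here is a minimal sketch of an appending encoder written against the standard library only. It is not bs58's implementation: `encode_append` and `sri_string` are hypothetical names introduced for illustration, and the alphabet is assumed to be the Bitcoin base58 alphabet.

```rust
// Bitcoin-style base58 alphabet (assumed; other alphabets exist).
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper: base58-encodes `input` and appends the result to
/// `out`, leaving whatever `out` already holds untouched.
fn encode_append(input: &[u8], out: &mut String) {
    // Base-58 digits of the input, least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // Multiply the number accumulated so far by 256 and add `byte`.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is represented by the first alphabet character.
    for _ in input.iter().take_while(|&&b| b == 0) {
        out.push(ALPHABET[0] as char);
    }
    // Emit the most significant digit first, directly after the existing text.
    for &d in digits.iter().rev() {
        out.push(ALPHABET[d as usize] as char);
    }
}

/// Hypothetical usage: build "<algo-name>-<base58 encoded bytes>" in a single
/// String, without encoding into a temporary buffer first.
fn sri_string(algo: &str, hash: &[u8]) -> String {
    let mut s = String::from(algo);
    s.push('-');
    encode_append(hash, &mut s);
    s
}
```

With an appending encoder, the prefix and the encoded bytes share one buffer; with a truncating one, the caller has to encode into a second String and copy it over.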

Nemo157 · 2021-07-07T14:49:07Z

Agreed. Should be a pretty easy change and I think worth a breaking release.

Nemo157 · 2021-07-07T15:04:10Z

Would you expect the same when decoding into a &mut Vec<u8> (vs &mut [u8])?

#[test]
fn append() {
    // to_vec() yields a Vec<u8>; to_owned() on a byte-string literal gives a fixed-size array.
    let mut buf = b"hello world".to_vec();
    bs58::decode("a").into(&mut buf).unwrap();
    assert_eq!(b"hello world!", buf.as_slice());
}

#[test]
fn no_append() {
    let mut buf = b"hello world".to_owned();
    bs58::decode("a").into(buf.as_mut()).unwrap();
    assert_eq!(b"!ello world", buf.as_ref());
}

matklad · 2021-07-07T15:14:11Z

For Vec<u8>, I'd expect the same behavior as for String: append to the end.

For &mut [u8], I'd expect the same behavior as char::encode_utf8 -- overwrite the prefix, return the str slice of the data actually written.
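For reference, the char::encode_utf8 behavior being pointed to can be seen with the standard library alone. This is only a small sketch; `write_char_prefix` is a hypothetical wrapper introduced for illustration.

```rust
/// Hypothetical wrapper: writes `c` at the start of `buf` and returns how many
/// bytes it occupied. Bytes after that prefix are left as they were.
/// Note that `encode_utf8` panics if `buf` is shorter than `c.len_utf8()`.
fn write_char_prefix(c: char, buf: &mut [u8]) -> usize {
    // `encode_utf8` returns a `&mut str` borrowing exactly the written prefix.
    let written: &mut str = c.encode_utf8(buf);
    written.len()
}
```

Called with '!' on a buffer holding b"hello world", this overwrites only the first byte, which is the same shape as the no_append case above.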

Nemo157 · 2021-07-07T15:17:19Z

Returning an &str would require checking or asserting UTF-8 validity at that point. If you're doing more ASCII-only processing on the buffer (or never actually treating it as a string), you might want to delay that.

matklad · 2021-07-07T15:23:17Z

Hm, I think base58 guarantees that the encoded result is UTF-8, so no additional validation is necessary? If this assumption is correct, then returning &mut str allows the calling code to avoid UTF-8 validation and bounds checking. In any case, returning just a usize giving the number of bytes written would be fine as well. Maybe returning usize is even better: I wager that the main benefit of char's return type is not actually eliding the check, but just basic convenience for cases where you'd want to encode a char to a local [u8; 4], and then do something with the resulting string.

Nemo157 · 2021-07-07T15:24:55Z

Yeah, it wouldn't need validating since the API guarantees it's ASCII, but I want to minimize the unsafe code here (currently the only unsafe code used is the bare minimum necessary to actually work with &mut str).
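The trade-off in this exchange can be sketched without bs58 itself. In the sketch below, `write_ascii_prefix` is a hypothetical stand-in for an encoder that returns the number of bytes written, and `checked_view` is a hypothetical caller-side helper; neither is part of the crate.

```rust
/// Hypothetical stand-in for an encoder: writes ASCII output into the prefix
/// of `out` and returns the number of bytes written.
/// Panics if `out` is too short for the output.
fn write_ascii_prefix(out: &mut [u8]) -> usize {
    let encoded = b"2b";
    out[..encoded.len()].copy_from_slice(encoded);
    encoded.len()
}

/// Hypothetical caller-side helper: turns the written prefix into a `&str`
/// only when a string is actually needed. The safe conversion re-validates
/// the bytes, so no `unsafe` is required in either the library or the caller.
fn checked_view(buf: &[u8], written: usize) -> &str {
    std::str::from_utf8(&buf[..written]).expect("encoder only emits ASCII")
}
```

A caller that never needs a string skips `checked_view` and pays for no validation at all. A caller that wants to skip the check can use `std::str::from_utf8_unchecked`, which moves the `unsafe` block to the call site rather than into the library.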

matklad mentioned this issue Jul 7, 2021

feat: Make CurveType accessible within Base58PublicKey near/near-sdk-rs#453

Merged

Nemo157 mentioned this issue Jul 7, 2021

Append data onto resizeable output buffers instead of truncating them #79

Closed

bors bot closed this as completed in d545e0e Jul 19, 2021

Nemo157 mentioned this issue Aug 14, 2022

encode: introduce EncodeBuilder::apply_to method #85

Closed

Nemo157 mentioned this issue May 23, 2023

crates release? #100

Closed
