Truncating behavior is confusing and forces allocations #77

Closed
matklad opened this issue Jul 7, 2021 · 6 comments

Comments

@matklad

matklad commented Jul 7, 2021

I expect the following test to pass:

#[test]
fn append() {
    let mut buf = "hello world".to_string();
    bs58::encode(&[92]).into(&mut buf).unwrap();
    assert_eq!("hello world2b", buf.as_str());
}

Instead, it fails, as the buf contains just "2b". That is, encoding discards existing data, rather than appending to it.

There are two problems with it:

  • It is surprising behavior. Standard library APIs like read_line always append. If overwriting is desired, the caller can call .clear() first.
  • It can force an allocation if the user actually wants to append data to some existing buffer. This comes up when, for example, SRI-encoding hashes: "<algo-name>-<base58 encoded bytes>".
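
To make the allocation point concrete, here is a minimal, std-only sketch of the appending behavior requested above. `encode_append` is a hypothetical helper, not part of the bs58 API, and the encoder is a plain textbook base58 loop over the Bitcoin alphabet rather than the crate's implementation.

```rust
/// Bitcoin base58 alphabet (assumed here; it is the commonly used default).
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper: base58-encodes `input` and appends the result to
/// `out`, leaving whatever `out` already contains untouched.
fn encode_append(input: &[u8], out: &mut String) {
    // Base-58 digits of the input, least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // Multiply the running number by 256 and add the new byte.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is represented by one leading '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    out.reserve(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
}
```

With an API shaped like this, the SRI-style case becomes `let mut s = String::from("sha256-"); encode_append(hash, &mut s);`, with no intermediate String for the encoded part.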
@Nemo157
Member

Nemo157 commented Jul 7, 2021

Agreed. Should be a pretty easy change and I think worth a breaking release.

@Nemo157
Member

Nemo157 commented Jul 7, 2021

Would you expect the same when decoding into a &mut Vec<u8> (vs &mut [u8])?

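#[test]
fn append() {
    // to_vec() yields a Vec<u8>; to_owned() on a byte-string literal would yield a [u8; 11] array.
    let mut buf = b"hello world".to_vec();
    bs58::decode("a").into(&mut buf).unwrap();
    assert_eq!(b"hello world!", buf.as_slice());
}

#[test]
fn no_append() {
    let mut buf = b"hello world".to_vec();
    // Decoding into a plain &mut [u8] overwrites the prefix instead of appending.
    bs58::decode("a").into(buf.as_mut_slice()).unwrap();
    assert_eq!(b"!ello world", buf.as_slice());
}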

@matklad
Author

matklad commented Jul 7, 2021

For Vec<u8>, I'd expect the same behavior as for String -- append to the end.

For &mut [u8], I'd expect the same behavior as char::encode_utf8 -- overwrite the prefix, return the str slice of the data actually written.
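
For reference, a small sketch of the char::encode_utf8 behavior mentioned above. `write_char_prefix` is a hypothetical wrapper added only for illustration; the only real API used is `char::encode_utf8`, which overwrites the front of the buffer and returns a `&mut str` covering just the bytes it wrote.

```rust
/// Hypothetical wrapper: encodes `c` into the front of `buf` and returns
/// how many bytes were written. The rest of `buf` is left as it was.
fn write_char_prefix(c: char, buf: &mut [u8]) -> usize {
    // `encode_utf8` panics if `buf` is too small for `c`; otherwise it
    // overwrites the prefix and returns a `&mut str` over the written bytes.
    let written: &mut str = c.encode_utf8(buf);
    written.len()
}
```

Calling it with '!' on a buffer holding b"hello world" leaves b"!ello world" behind, which is the "overwrite the prefix" behavior proposed for the slice target.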

@Nemo157
Member

Nemo157 commented Jul 7, 2021

Returning an &str would require checking/asserting UTF-8 validity at that point. If you're doing more ASCII-only processing on the buffer (or never actually asserting that it is a string), you might want to delay that.

@matklad
Author

matklad commented Jul 7, 2021

Hm, I think base58 guarantees that the encoded result is UTF-8, so no additional validation is necessary? If this assumption is correct, then returning &mut str allows the calling code to avoid UTF-8 validation and bounds checking. In any case, returning just a usize signifying the number of bytes written would be fine as well. Maybe returning usize is even better: I wager that the main benefit of char's return type is not actually eliding the check, but just basic convenience for cases where you'd want to encode a char to a local [u8; 4], and then do something with the resulting string.
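
A rough sketch of the usize-returning shape, under stated assumptions: `write_ascii_prefix` is a hypothetical stand-in for an encoder (it just copies already-encoded ASCII bytes), and `written_str` shows the caller opting into a checked conversion later via `std::str::from_utf8`, with no unsafe code on either side.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for an encoder: writes `encoded` into the prefix of
/// `out` and returns the number of bytes written, or `None` if `out` is too small.
fn write_ascii_prefix(encoded: &[u8], out: &mut [u8]) -> Option<usize> {
    let dst = out.get_mut(..encoded.len())?;
    dst.copy_from_slice(encoded);
    Some(encoded.len())
}

/// Caller-side conversion: validate only the written prefix, and only when a
/// `&str` is actually needed. Panics if `n` exceeds `out.len()`.
fn written_str(out: &[u8], n: usize) -> Result<&str, Utf8Error> {
    std::str::from_utf8(&out[..n])
}
```

The trade-off is that the caller pays for one validation pass if and when it wants a string, while code that keeps working on raw ASCII bytes never does.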

@Nemo157
Member

Nemo157 commented Jul 7, 2021

Yeah, it wouldn't need validating since the API guarantees it's ASCII, but I want to minimize the unsafe code here (currently the only unsafe code used is the bare minimum necessary to actually work with &mut str).
