Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::io::Write adapter for Hashers #309

Closed
SUPERCILEX opened this issue Dec 4, 2023 · 7 comments
Closed

Add std::io::Write adapter for Hashers #309

SUPERCILEX opened this issue Dec 4, 2023 · 7 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@SUPERCILEX
Copy link

Proposal

Problem statement

It would be nice to be able to use functions like io::copy to write into a Hasher.

Motivating examples or use cases

io::copy(
    &mut File::open(entry.path()).unwrap(),
    &mut HashWriteAdapter::new(&mut hasher),
)
.unwrap();

Solution sketch

Rough sketch similar to rust-lang/rust#104389:

struct HashWriteAdapter<'a, H> {
    hasher: &'a mut H,
}

impl<'a, H> HashWriteAdapter<'a, H> {
    pub fn new(hasher: &'a mut H) -> Self {
        Self { hasher }
    }
}

impl<'a, H: Hasher> io::Write for HashWriteAdapter<'a, H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.hasher.write(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

Alternatives

Everybody writes their own adapters.

Links and related work

Similar to rust-lang/rust#104389.

This is clearly a common need as someone ran into the same issue 5 years ago: https://stackoverflow.com/questions/48533445/proper-way-to-hash-a-reader-in-rust

@SUPERCILEX SUPERCILEX added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Dec 4, 2023
@the8472
Copy link
Member

the8472 commented Dec 6, 2023

If you want to hash files wouldn't Digest be better than Hash/Hasher? The latter is designed for hashmaps, not for arbitrary cryptographic operations.

@SUPERCILEX
Copy link
Author

No? I don't want to pull in a third party lib and I'm also using the same hasher to hash structs. Hasher is the easiest way to go since a u64 output is good enough for what I need.

@the8472
Copy link
Member

the8472 commented Dec 6, 2023

easieast to use does not mean it is intended for that use

@SUPERCILEX
Copy link
Author

Sure, but this is fixating on the example. Hasher has a method for writing a byte array and the io APIs offer a nice way to compose loading and writing byte arrays which can come from anywhere including memory.

seahash includes an adapter: https://gitlab.redox-os.org/redox-os/seahash/-/blob/master/src/impl_std.rs?ref_type=heads#L5-13
xxhash includes file hashing too: https://github.com/shepmaster/twox-hash/blob/main/src/bin/hash_file.rs


All I'm hearing is "your use case is invalid, go away" which is quite frustrating. I'm clearly not the only one who wants this and I don't see how it hurts. We'd need to figure out where to put things, but at the end of the day Hasher has a method to write bytes and so does the io Write trait, so being able to compose those two traits is a pretty obvious improvement. Since those two traits can't change, I don't see how we'd regret being able to compose them as long as the adapter is in the right place.

@the8472
Copy link
Member

the8472 commented Dec 6, 2023

Hasher is designed to repeatedly hash data structures for indexing. Hashing a Read (via io::copy) consumes it which makes the hash non-reproducible.

One of the requirements of Hash is that it is consistent with Eq: https://doc.rust-lang.org/nightly/std/hash/trait.Hash.html#hash-and-eq

The shape happening to be right doesn't mean it's a good choice for the purpose. E.g. Result looks a lot like Either and yet Either is a separate type in the ecosystem because it's semantically different.

Hash's prefix-collision-avoidance also makes it behaviorally quite different from Digest. I.e. feeding data in via io::copy can have unpredictable outputs depending on how the copying chunks the data.

Sure, but this is fixating on the example.

Any line of code added is a line of code to be maintained. Individually it may be a small cost. But it still should be justified. So the justification should at least be good and consistent with the purpose of the API.

@SUPERCILEX
Copy link
Author

I don't think Result vs Either is a good comparison because that's adding a completely new type whereas this proposal is trying to compose two existing types. I still don't see how enabling the composition is bad, but I do agree that this is abusing the API somewhat.

Ideally what I want is just a simple interface that says here are some bytes, give me a hash. That doesn't make sense to add to the stdlib because it's too vague and implementation detail dependent, hence abusing Hasher.


I wonder if #133 and this proposal should just be put in an io-adapters crate? That feels pretty silly to me, but at the same time it avoids the issue of someone coming along and complaining that the stdlib provided an adapter for Hasher but didn't warn them that there are prefix shenanigans or other gotyas.

@SUPERCILEX
Copy link
Author

I went ahead and made a crate: https://docs.rs/io-adapters/0.1.0/io_adapters/. Still a little rough around the edges, but I'll clean things up and make an announcement on the fmt adapter issue once it's ready. This issue is probably moot, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

2 participants