-
Notifications
You must be signed in to change notification settings - Fork 31
Add HashingSource and HashingSink #756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
runtime/hashing/build.gradle.kts
Outdated
| commonMain { | ||
| dependencies { | ||
| implementation(project(":runtime:utils")) | ||
| implementation(project(":runtime:io")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: This adds a dependency on I/O, which seems like a potentially unnecessary coupling -- would this HashingSink and HashingSource be better located in the io module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe they'd be better off in io.
| * @param hash The [HashFunction] to hash data with | ||
| * @param sink the [SdkSink] to hash | ||
| */ | ||
| public class HashingSink(public val hash: HashFunction, sink: SdkSink) : HashFunction by hash, SdkSinkObserver(sink) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correctness: Why implement HashFunction? Is there a use case for calling update/reset on a sink? Will we ever pass a HashingSink as an argument to ByteArray.hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no particular reason, and you're right, there are no strong use cases for directly calling update / reset on this sink, so I'll remove this implementation
runtime/hashing/build.gradle.kts
Outdated
| commonMain { | ||
| dependencies { | ||
| implementation(project(":runtime:utils")) | ||
| implementation(project(":runtime:io")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe they'd be better off in io.
| public class HashingSink(public val hash: HashFunction, sink: SdkSink) : HashFunction by hash, SdkSinkObserver(sink) { | ||
| override fun observe(data: ByteArray, offset: Int, length: Int) { | ||
| update(data, offset, length) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correctness: This largely duplicates the private class in Canonicalizer.kt. We should remove that and update the surrounding code to use this.
| @Test | ||
| fun testHashingSinkDigest() = run { | ||
| val byteArray = ByteArray(1024) { 0xf } | ||
| val buffer = SdkBuffer() | ||
| buffer.write(byteArray) | ||
|
|
||
| val hashingSink = HashingSink(Crc32c(), SdkSink.blackhole()) | ||
|
|
||
| val expectedHash = Crc32c() | ||
|
|
||
| assertEquals(expectedHash.digest().decodeToString(), hashingSink.digest().decodeToString()) | ||
| hashingSink.write(buffer, buffer.size) | ||
| expectedHash.update(byteArray) | ||
| assertEquals(expectedHash.digest().decodeToString(), hashingSink.digest().decodeToString()) | ||
| } | ||
|
|
||
| @Test | ||
| fun testCrcSinkDigest() = run { | ||
| val byteArray = ByteArray(1024) { 0xf } | ||
| val buffer = SdkBuffer() | ||
| buffer.write(byteArray) | ||
|
|
||
| val hashingSink = HashingSink(Crc32(), SdkSink.blackhole()) | ||
|
|
||
| val expectedHash = Crc32() | ||
|
|
||
| assertEquals(expectedHash.digest().decodeToString(), hashingSink.digest().decodeToString()) | ||
| hashingSink.write(buffer, buffer.size) | ||
| expectedHash.update(byteArray) | ||
| assertEquals(expectedHash.digest().decodeToString(), hashingSink.digest().decodeToString()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Are these two tests identical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They weren't initially, but I refactored and they ended up being basically the same, just using a different hash function. I will remove the duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my eyes missed the Crc32 vs Crc32c. That's great to test both, maybe just refactor the test itself into a helper method that accepts a hash implementation.
| * @param hash The [HashFunction] to hash data with | ||
| * @param sink the [SdkSink] to hash | ||
| */ | ||
| public class HashingSink(public val hash: HashFunction, sink: SdkSink) : HashFunction by hash, SdkSinkObserver(sink) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment: Making the underlying hash publicly visible is inadvisable since that means its state can be mutated by object holders. We should probably make it private instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use a HashSupplier instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I made this public was for the digestValue() use case for CRC hash functions. I'm not sure how to nicely provide this function if the underlying hash is private, but I understand the problem with mutation by object holders.
From what I see, HashSupplier wouldn't be great here because it returns a new HashFunction each invocation, while we want to use the same HashFunction object to get the hash value of the entire underlying Source/Sink.
For digestValue(), I could do this:
public class HashingSource(private val hash: HashFunction, source: SdkSource) : SdkSourceObserver(source) {
override fun observe(data: ByteArray, offset: Int, length: Int) {
hash.update(data, offset, length)
}
public fun digestValue() : UInt? = (hash as Crc32?)?.digestValue()
}But it seems too special-cased to CRC32 to be included in a generic HashingSource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a single unsigned-int digest value is very specific to CRC32. We shouldn't include it in anything that utilizes generic hash functionality. How will this be used eventually outside of things like tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's being used for event steams today. I'm adding digestValue() only to support event streams and change the usage of the original CrcSource implementation to this new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the event streams code need to treat the CRC as a UInt? Could it just as easily use byte arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible. Event streams sends the expected checksum value as UInt, so we need to either
- compare using digestValue()'s
UInt(how it's done today) - convert the given expected checksum from
UInttoByteArray, and compare usingdigest()instead
I didn't want to make too many changes in event streams besides replacing the CrcSource/Sink that was used there, which is why I went with option 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, we decided to use just digest()/ByteArray everywhere for hash functions including CRC32. We'll remove specialized digestValue() members for the source/sink.
| @Test | ||
| fun testHashingSourceDigest() = run { | ||
| val byteArray = ByteArray(1024) { 0xf } | ||
| val source = byteArray.source() | ||
| val hashingSource = HashingSource(Crc32c(), source) | ||
|
|
||
| val sink = SdkBuffer() | ||
|
|
||
| val expectedHash = Crc32c() | ||
| assertEquals(expectedHash.digest().decodeToString(), hashingSource.digest().decodeToString()) | ||
|
|
||
| hashingSource.read(sink, 1024) | ||
| expectedHash.update(byteArray) | ||
|
|
||
| assertEquals(expectedHash.digest().decodeToString(), hashingSource.digest().decodeToString()) | ||
| } | ||
|
|
||
| @Test | ||
| fun testCrcSourceDigest() = run { | ||
| val byteArray = ByteArray(1024) { 0xf } | ||
| val source = byteArray.source() | ||
| val hashingSource = HashingSource(Crc32(), source) | ||
|
|
||
| val sink = SdkBuffer() | ||
|
|
||
| val expectedHash = Crc32() | ||
| assertEquals(expectedHash.digest().decodeToString(), hashingSource.digest().decodeToString()) | ||
|
|
||
| hashingSource.read(sink, 1024) | ||
| expectedHash.update(byteArray) | ||
|
|
||
| assertEquals(expectedHash.digest().decodeToString(), hashingSource.digest().decodeToString()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Identical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix and ship
| * @param hash The [HashFunction] to hash data with | ||
| * @param sink the [SdkSink] to hash | ||
| */ | ||
| public class HashingSink(public val hash: HashFunction, sink: SdkSink) : HashFunction by hash, SdkSinkObserver(sink) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably make these InternalApi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything I would create a CrcSource/Sink instead of trying to force them through HashingSource/Sink. It's going to look nearly identical but it won't be awkwardly special casing the generic hashing source/sink
| * @param hash The [HashFunction] to hash data with | ||
| * @param sink the [SdkSink] to hash | ||
| */ | ||
| public class HashingSink(private val hash: HashFunction, sink: SdkSink) : SdkSinkObserver(sink) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@InternalApi
| * Provides the digest value as an unsigned integer for the CRC32 function family. | ||
| * @return unsigned integer representing the value of the digest, if the [HashFunction] is [Crc32] or [Crc32c], and null otherwise. | ||
| */ | ||
| public fun digestValue() : UInt? = (hash as Crc32?)?.digestValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either special case OR move this to an internal package as an extension function (would require making hash internal rather than private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix and ship
| @ParameterizedTest | ||
| @ValueSource(strings = ["crc32", "crc32c", "md5", "sha1", "sha256"]) | ||
| fun testHashingSinkDigest(hashFunctionName: String) = run { | ||
| val byteArray = ByteArray(1024) { 0xf } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: let's use something that forces okio to cross segment boundaries, something odd like 19k would be 2 segments + a partial. Same for all source/sink tests
|
Kudos, SonarCloud Quality Gate passed!
|








Adds HashingSource and HashingSink, which hash their data as they are being read and written, respectively.
Issue #
Description of changes
This change is required to generalize the CrcSource to support any hashing function. It is planned to be used in the flexible checksums implementation.
Note on digestValue()
Crc32 and Crc32C have an additional function called
digestValue()which returns an unsigned int. To keep this function but also provide generic support for any hashing function, the HashingSource and HashingSink provide access to the underlying hash function as a member variable calledhash.With this, the user can cast the hash function to Crc32 and call the
digestValue()function.Example
An alternative design would be to create a class like
CrcHashingSourcewhich implementsHashingSourceand also providesdigestValue(), but I thought the first option was better.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.