Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getUInt returns invalid values #3

Closed
mattmook opened this issue Jan 22, 2022 · 1 comment
Closed

getUInt returns invalid values #3

mattmook opened this issue Jan 22, 2022 · 1 comment

Comments

@mattmook
Copy link

mattmook commented Jan 22, 2022

Hi @goncalossilva,

There's currently a bug in getUInt when reading values that are negative bytes.

For example, consider the following:

val bytes = byteArrayOf(0x2e, 0xf6.toByte())

println(bytes.getUInt(0)) // prints 46
println(bytes.getUInt(1)) // prints 4294967286 but should be 246

This looks as simple as just ensuring the value is masked, i.e. private fun ByteArray.getUInt(index: Int) = get(index).toUInt() and 0xffu although I've not throughly tested this. I assume the same needs to be applied to getULong too.

The test suite is probably not broad enough in terms of the byte values under test; which is why it doesn't pick up this issue. I'm currently in the process of adding an incremental version of murmurhash into my own Multiplatform library, cryptohash and while looking into that came across this test suite from Apache Commons Codec which helped highlight this issue.

From a testing point of view, with a fix in place, hash32x86 with a zero seed would be expected to return 0x368f9df1 for the 2-byte array above.

@goncalossilva
Copy link
Owner

goncalossilva commented Jan 22, 2022

Hey @mattmook, good to hear from you!

Good catch, thank you for taking the time. I think an idiomatic way of addressing it is to convert:

private fun ByteArray.getUInt(index: Int) = get(index).toUInt()
private fun ByteArray.getULong(index: Int) = get(index).toULong()

into:

private fun ByteArray.getUInt(index: Int) = get(index).toUByte().toUInt()

private fun ByteArray.getULong(index: Int) = get(index).toUByte().toULong()

Which behind the scenes does the masking you suggest, but keeps it “Kotlin-esque” by leveraging the expressive conversion methods from/to signed/unsigned that the platform provides.

You're right that the test suite should have caught these. The wordlist is extensive, but it's most;y "a" to "z", i.e., "\u0041" through "\u005a", which conveniently misses on all of these edge cases. I didn't consider this before, but your Apache Commons Codec share looks like a great resource (good license, too). I'll likely borrow bits of it.

Fix coming later today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants