Getting hashes without converting to unsigned numbers #44

Closed
wants to merge 1 commit into from


@devel59 devel59 commented Mar 25, 2023

Some DBMSs, such as PostgreSQL and SQLite, don't have unsigned integer types. I suggest getting rid of the double conversion for such cases.

const hasher = await xxhash();
const signedHash = hasher.signed.h32("abcd");

@devel59
Author

devel59 commented Mar 25, 2023

if (seed === undefined) {
   seed = defaultSeed;
}

This is to reduce the size of the build

@jungomi
Owner

jungomi commented Mar 27, 2023

Okay. Originally I wasn't sure there was really a need for this, and I worried it could cause correctness issues, since it means misinterpreting the result. After thinking about it a little, I can now see when it would be helpful. It remains complicated, though, because the interpretation of the bytes depends on the context they're used in: JavaScript just doesn't have unsigned integers, exactly as you said about some of the DBMSs, which are presumably based on JS.

I'm not familiar with how the DBMSs store these values. Can you give an example of how you would use them? I'm asking because I don't know how the values translate to the underlying DB. If you need to read them as bytes, there is not really any difference between the two; you just have to use a different method of setting them, which may or may not be possible with the specific libraries.

To give an example of what I mean: setting all 32 bits to 1 gives -1 when interpreted as a signed integer (i32) and 2 ** 32 - 1 when interpreted as an unsigned integer (u32):

// An array with two 32-bit words
const arr = new Uint32Array(2);
const view = new DataView(arr.buffer);

// Values for all bits to 1, interpreted as either an i32 or u32.
const i32 = -1;
const u32 = 2 ** 32 - 1;

// Set the signed integer to the first element (32-bit word).
view.setInt32(0, i32);
// Set the unsigned integer to the second element (32-bit word), i.e. offset of 4 bytes.
view.setUint32(4, u32);

// Both values are equal, just set differently
console.log(arr[0] == arr[1]); // => true

It's really tricky, because the limitations of JavaScript leave no clear way to handle it. The main reason I'm a bit hesitant is that interpreting the hash as a signed integer is in fact a misinterpretation, and in JS I would never expect anyone to look purely at the byte layout, which isn't even defined, as every Number (not including BigInt) is internally a float.

Note: It would be nice to have benchmarks for the specific use case, to see whether there is even a meaningful difference.
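
For reference, the conversion under discussion can also be done in user code without a DataView. The following is only a minimal sketch: the helper names (`toSigned32`, `toUnsigned32`, `toSigned64`, `toUnsigned64`) are hypothetical and not part of the library, and the 64-bit variants assume the hash is available as a BigInt.

```javascript
// Hypothetical helpers for illustration only; none of these are library APIs.

// Reinterpret an unsigned 32-bit Number as a signed 32-bit integer.
// The bitwise OR applies the ToInt32 conversion to its operand.
function toSigned32(u32) {
  return u32 | 0;
}

// Reinterpret a signed 32-bit Number as an unsigned 32-bit integer.
// The unsigned right shift applies the ToUint32 conversion to its operand.
function toUnsigned32(i32) {
  return i32 >>> 0;
}

// The same idea for 64-bit values, assuming they are represented as BigInt.
function toSigned64(u64) {
  return BigInt.asIntN(64, u64);
}

function toUnsigned64(i64) {
  return BigInt.asUintN(64, i64);
}

console.log(toSigned32(2 ** 32 - 1)); // => -1
console.log(toUnsigned32(-1)); // => 4294967295
console.log(toSigned64(2n ** 64n - 1n)); // => -1n
console.log(toUnsigned64(-1n)); // => 18446744073709551615n
```

Values below 2 ** 31 (or 2n ** 63n for the BigInt case) come out unchanged, so only hashes with the top bit set differ between the two interpretations.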

@jungomi
Owner

jungomi commented Mar 27, 2023

if (seed === undefined) {
   seed = defaultSeed;
}

This is to reduce the size of the build

Mhh, I wonder why that gets transpiled, since default parameters are supported by all targets. I certainly don't like that the default parameters get transpiled to code using arguments (which is also slow):

arguments.length > 0 && void 0 !== arguments[0] ? arguments[0] : 0

Instead of using the if-version, I'd much rather find out why it is transpiled and make it use the actual feature, rather than a workaround from long before the feature existed.
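
To make the comparison concrete, here is a small sketch of the two source-level forms. The function names and the string-returning body are hypothetical stand-ins for a real hash function, and `defaultSeed = 0` is an assumption based on the `0` in the transpiled snippet above.

```javascript
// Assumed default, inferred from the transpiled output; not confirmed by the source.
const defaultSeed = 0;

// Native default parameter: the default applies when the argument is
// omitted or explicitly undefined.
function withDefaultParameter(input, seed = defaultSeed) {
  return `${input}:${seed}`;
}

// The if-version from the pull request, which behaves the same for
// omitted and undefined seeds.
function withIfCheck(input, seed) {
  if (seed === undefined) {
    seed = defaultSeed;
  }
  return `${input}:${seed}`;
}

console.log(withDefaultParameter("abcd")); // => abcd:0
console.log(withIfCheck("abcd")); // => abcd:0
console.log(withDefaultParameter("abcd", 7)); // => abcd:7
console.log(withIfCheck("abcd", undefined)); // => abcd:0
```

Both forms give the same results for these calls, so the choice only matters for what the build tooling emits.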

@devel59 devel59 closed this May 23, 2024
@devel59 devel59 deleted the signed branch May 23, 2024 15:55