Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xxHash instead of FNV?! #246

Open
sjakobi opened this issue Apr 3, 2022 · 9 comments
Open

xxHash instead of FNV?! #246

sjakobi opened this issue Apr 3, 2022 · 9 comments

Comments

@sjakobi
Copy link
Member

sjakobi commented Apr 3, 2022

@Mistuke mentioned this algorithm in #118 (comment).

According to https://github.com/rurban/smhasher and https://github.com/Cyan4973/xxHash it compares nicely to FNV.

AFAIK GHC already uses this algorithm in its linker.

Has there already been any research towards using it in hashable?

@sjakobi
Copy link
Member Author

sjakobi commented Apr 3, 2022

One particular requirement of hashable is that lazy ByteString values and lazy Text values are hashed to the same output regardless of their internal chunking:

instance Hashable BL.ByteString where
hashWithSalt salt = finalise . BL.foldlChunks step (SP salt 0)
where
finalise (SP s l) = hashWithSalt s l
step (SP s l) bs = unsafeDupablePerformIO $
B.unsafeUseAsCStringLen bs $ \(p, len) -> do
s' <- hashPtrWithSalt p (fromIntegral len) s
return (SP s' (l + len))

instance Hashable TL.Text where
hashWithSalt salt = finalise . TL.foldlChunks step (SP salt 0)
where
finalise (SP s l) = hashWithSalt s l
step (SP s l) (T.Text arr off len) = SP
(hashByteArrayWithSalt (TA.aBA arr) (off `shiftL` 1) (len `shiftL` 1) s)
(l + len)

I wonder whether this property can be upheld when xxHash is used.

@Bodigrim
Copy link
Contributor

I wonder whether this property can be upheld when xxHash is used.

This is not a problem, xxHash supports stream hashing.

I did a quick experiment using https://hackage.haskell.org/package/xxhash-ffi and numbers are almost too good to be true:

All
  hash
    ByteString
      strict
        5:        OK
          9.91 ns ± 930 ps,       same as baseline
        8:        OK
          9.64 ns ± 834 ps,       same as baseline
        11:       OK
          11.0 ns ± 806 ps,       same as baseline
        40:       OK
          13.7 ns ± 826 ps, 54% less than baseline
        128:      OK
          17.0 ns ± 1.6 ns, 84% less than baseline
        512:      OK
          39.4 ns ± 1.8 ns, 92% less than baseline
        2^20:     OK
          63.0 μs ± 3.6 μs, 94% less than baseline

@phadej would you be open to using xxHash instead of FNV for instance Hashable ByteString? Or is depending on xxhash.c a no-go for hashable?

@phadej
Copy link
Contributor

phadej commented Mar 25, 2024

@Bodigrim it's an option to use xxhash for just hash of ByteString (and Text and ByteArray), but this issue is also talking about replacing the general hashing (i.e. general hashWithSalt)

The former is easily doable; the latter isn't without breaking stuff.

@phadej
Copy link
Contributor

phadej commented Mar 25, 2024

Or is depending on xxhash.c a no-go for hashable?

Probably GHC WASM/JS people will complain, I don't really care about these backends (my POV is that they don't exist yet), so i won't.

EDIT: they don't complain about current cbits, adding more (or actually, replacing) shouldn't be a huge issue.

@Bodigrim
Copy link
Contributor

I'm looking just for faster hashing of ByteArray and friends, I'm not interested in changes to hashInt / defaultHashWithSalt.

WASM should be fine, it can cope well with bundled C libraries.

@phadej would you like me to take a stab at this?

@phadej
Copy link
Contributor

phadej commented Mar 26, 2024

would you like me to take a stab at this?

No need. I'd end up redoing it anyway.

@Bodigrim
Copy link
Contributor

I'd end up redoing it anyway.

That's exactly why I asked ;)

FWIW I gained maintainership for xxhash-ffi and plan to update it in upcoming weeks.

@Bodigrim
Copy link
Contributor

With regards to WASM, I got it working fine at https://github.com/haskell-haskey/xxhash-ffi, so this should not be an issue.

Here are benchmarks after switching to XXH3:

All
  10B
    Data.Digest.XXHash (c, XXH3): OK
      10.1 ns ± 838 ps
    Data.Hashable (c, FNV?):      OK
      12.1 ns ± 862 ps
  1kB
    Data.Digest.XXHash (c, XXH3): OK
      71.8 ns ± 6.5 ns
    Data.Hashable (c, FNV?):      OK
      1.14 μs ± 104 ns
  4kB
    Data.Digest.XXHash (c, XXH3): OK
      290  ns ±  26 ns
    Data.Hashable (c, FNV?):      OK
      4.68 μs ± 412 ns
  1MB
    Data.Digest.XXHash (c, XXH3): OK
      175  μs ±  14 μs
    Data.Hashable (c, FNV?):      OK
      2.87 ms ± 214 μs

@Bodigrim
Copy link
Contributor

I've released xxhash-ffi-0.3 as a proof of concept. Potentially raw bindings in Data.Digest.XXHash.FFI.C could be separated into their own library, which hashable could depend upon, if it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants