Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up division by bucket's size #28

Merged
merged 1 commit into from Mar 22, 2024

Conversation

Bodigrim
Copy link
Contributor

By choosing primes which allow for faster division, we can shave off some time.

The exact performance depends on the architecture. The worst possible case is aarch64, where
(due to GHC shortcomings) we cannot benefit from faster division algorithms at all.
Nevertheless, there are some modest gains, mostly because the divisor is now passed unboxed
and remainders forced, so they are unboxed Int# as well. Here are numbers on macOS M2:

Comparison
  1000000
    insert
      vector-hashtables boxed:        OK
        35.6 ms ± 590 μs,  3% less than baseline
      vector-hashtables unboxed keys: OK
        25.2 ms ± 486 μs,       same as baseline
      vector-hashtables:              OK
        6.59 ms ± 115 μs,       same as baseline
    insert (resize)
      vector-hashtables boxed:        OK
        47.1 ms ± 480 μs, 10% less than baseline
      vector-hashtables unboxed keys: OK
        32.9 ms ± 498 μs, 10% less than baseline
      vector-hashtables:              OK
        9.79 ms ± 110 μs, 11% less than baseline
    insert, delete
      vector-hashtables:              OK
        10.4 ms ±  55 μs,  1% less than baseline
    find
      vector-hashtables:              OK
        4.28 ms ±  53 μs,  2% less than baseline
      vector-hashtables (frozen):     OK
        1.28 ms ±  15 μs, 14% less than baseline
    lookupIndex
      vector-hashtables:              OK
        2.40 ms ±  31 μs, 11% less than baseline
    fromList
      vector-hashtables: OK
        12.6 ms ±  75 μs,  7% less than baseline
    toList
      vector-hashtables:              OK
        56.7 ms ± 1.4 ms,       same as baseline

Running the same benchmark on x86_64 demonstrates much more pronounced benefits, up to 3x faster:

Comparison
  1000000
    insert
      vector-hashtables boxed:        OK
        44.8 ms ± 370 μs,  9% less than baseline
      vector-hashtables unboxed keys: OK
        32.2 ms ± 800 μs,  9% less than baseline
      vector-hashtables:              OK
        10.5 ms ±  55 μs, 26% less than baseline
    insert (resize)
      vector-hashtables boxed:        OK
        57.5 ms ± 223 μs, 18% less than baseline
      vector-hashtables unboxed keys: OK
        40.3 ms ± 295 μs, 24% less than baseline
      vector-hashtables:              OK
        16.2 ms ±  22 μs, 41% less than baseline
    insert, delete
      vector-hashtables:              OK
        16.2 ms ±  86 μs, 36% less than baseline
    find
      vector-hashtables:              OK
        3.91 ms ±  18 μs, 62% less than baseline
      vector-hashtables (frozen):     OK
        2.44 ms ±  14 μs, 68% less than baseline
    lookupIndex
      vector-hashtables:              OK
        3.78 ms ±  26 μs, 57% less than baseline
    fromList
      vector-hashtables: OK
        26.9 ms ± 157 μs, 14% less than baseline
    toList
      vector-hashtables:              OK
        61.7 ms ± 157 μs,  1% less than baseline

@klapaucius
Copy link
Owner

    find
      vector-hashtables:              OK
        3.91 ms ±  18 μs, 62% less than baseline
      vector-hashtables (frozen):     OK
        2.44 ms ±  14 μs, 68% less than baseline

That's a really impressive result.

    insert
      vector-hashtables:              OK
        6.59 ms ± 115 μs,       same as baseline
    insert (resize)
      vector-hashtables:              OK
        9.79 ms ± 110 μs, 11% less than baseline

That's probably because the new sequence of sizes grows a bit faster, so fewer resizes happened. Which isn't a bad thing.

@Bodigrim
Copy link
Contributor Author

@klapaucius @swamp-agr @ulysses4ever anything else you'd like me to do here? Or is it acceptable?

@swamp-agr swamp-agr merged commit 8c49926 into klapaucius:master Mar 22, 2024
8 checks passed
@swamp-agr
Copy link
Collaborator

Impressive. Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants