Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: some possible SwissTable map benchmark improvements #70700

Open
thepudds opened this issue Dec 5, 2024 · 5 comments
Open

runtime: some possible SwissTable map benchmark improvements #70700

thepudds opened this issue Dec 5, 2024 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@thepudds
Copy link
Contributor

thepudds commented Dec 5, 2024

The current runtime map benchmarks (imported from the CockroachDB SwissTable repo and used for recent runtime map work) have clearly been useful, especially for comparing the optimizations that have happened so far, but it might be worth tweaking those benchmarks.

For example, for the current benchmarks:

  1. Most might be overly friendly to the branch predictor with repetitive key patterns, especially at lower elem counts.
  2. Most of the elem sizes are power-of-two, which translates to a relatively low load factor and might not make the tables "sweat" as much as they would in practice.
  3. Most of the benchmarks use a mod operation in the benchmark code, which might be around 1/3 of the CPU time for the benchmarks that fit in cache.
    • This is fine if someone has this in mind when comparing results, but might be cleaner without.
  4. Could probably benefit from some cold map cases (e.g., so many small or medium maps that they become cold) to better assess cache miss impact.
    • The original SwissTable folks and Folly F14 folks had collaborated on C++ benchmarks to help them compare approaches, with a sample of one of their cold map C++ benchmarks here. (A couple of years ago, I had a WIP set of cold map Go benchmarks here, partially based on that).

Part of my immediate interest is the current benchmarks might not capture some of the benefits / costs of some adjustments to the current implementation, like:

  • Trying 16-element group sizes (instead of the current 8), or
  • Alternative memory layouts (like putting all the control bytes for a single table together, rather than current interspersing of control bytes with the slots)

In particular, I have a draft change to the runtime SwissTable for 16-element group sizes (currently SWAR-only, not SIMD), but I suspect some of the benefit would be masked by the current benchmarks (maybe -- I don't know without tweaking the benchmarks 😅).

Separately, the benchmarks as they are might not be showing the advantages of the new implementation as clearly as they could compared to the old runtime map (including because "easier" benchmarks are probably making the prior map implementation look artificially better, vs. "easy" benchmarks not helping the SwissTable implementation as much).


All that said, maybe some of those concerns don't matter in practice or maybe I'm mistaken.

Filing this issue to help with any discussion (including perhaps any feedback of "not worth it").

I plan on sending some benchmark CLs to attempt to improve or mitigate above issues.

CC @prattmic

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 5, 2024
@gabyhelp
Copy link

gabyhelp commented Dec 5, 2024

Related Issues

Related Code Changes

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 6, 2024
@mknyszek mknyszek added this to the Backlog milestone Dec 6, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/634396 mentions this issue: internal/runtime/maps: speed up small map lookups ~1.7x for unpredictable keys

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/634698 mentions this issue: runtime: remove somewhat expensive mod from map benchmarks

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/634700 mentions this issue: runtime: add hot key lookup benchmark for maps

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/634699 mentions this issue: runtime: shuffle keys to reduce impact of branch predictor in map benchmarks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

4 participants