Skip to content

runtime: some possible SwissTable map benchmark improvements #70700

@thepudds

Description

@thepudds

The current runtime map benchmarks (imported from the CockroachDB SwissTable repo and used for recent runtime map work) have clearly been useful, especially for comparing the optimizations that have happened so far, but it might be worth tweaking those benchmarks.

For example, for the current benchmarks:

  1. Most might be overly friendly to the branch predictor with repetitive key patterns, especially at lower elem counts.
  2. Most of the elem sizes are power-of-two, which translates to a relatively low load factor and might not make the tables "sweat" as much as they would in practice.
  3. Most of the benchmarks use a mod operation in the benchmark code, which might be around 1/3 of the CPU time for the benchmarks that fit in cache.
    • This is fine if someone has this in mind when comparing results, but might be cleaner without.
  4. Could probably benefit from some cold map cases (e.g., so many small or medium maps that they become cold) to better assess cache miss impact.
    • The original SwissTable folks and Folly F14 folks had collaborated on C++ benchmarks to help them compare approaches, with a sample of one of their cold map C++ benchmarks here. (A couple of years ago, I had a WIP set of cold map Go benchmarks here, partially based on that).

Part of my immediate interest is the current benchmarks might not capture some of the benefits / costs of some adjustments to the current implementation, like:

  • Trying 16-element group sizes (instead of the current 8), or
  • Alternative memory layouts (like putting all the control bytes for a single table together, rather than current interspersing of control bytes with the slots)

In particular, I have a draft change to the runtime SwissTable for 16-element group sizes (currently SWAR-only, not SIMD), but I suspect some of the benefit would be masked by the current benchmarks (maybe -- I don't know without tweaking the benchmarks 😅).

Separately, the benchmarks as they are might not be showing the advantages of the new implementation as clearly as they could compared to the old runtime map (including because "easier" benchmarks are probably making the prior map implementation look artificially better, vs. "easy" benchmarks not helping the SwissTable implementation as much).


All that said, maybe some of those concerns don't matter in practice or maybe I'm mistaken.

Filing this issue to help with any discussion (including perhaps any feedback of "not worth it").

I plan on sending some benchmark CLs to attempt to improve or mitigate above issues.

CC @prattmic

Metadata

Metadata

Assignees

Labels

NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.compiler/runtimeIssues related to the Go compiler and/or runtime.

Type

No type

Projects

Status

Todo

Relationships

None yet

Development

No branches or pull requests

Issue actions