Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: replace Map implementation with HashTrieMap #70683

Closed
mknyszek opened this issue Dec 4, 2024 · 4 comments
Closed

sync: replace Map implementation with HashTrieMap #70683

mknyszek opened this issue Dec 4, 2024 · 4 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Dec 4, 2024

HashTrieMap was added for the unique package, but it turns out that it's faster than Map in many cases, including its own microbenchmarks.

This work was already done during the Go 1.24 cycle, culminating in https://go.dev/cl/608335. This issue just captures the work that was done so that it can be referenced.

Microbenchmarks

/gomaxprocs:
                                │    before    │                after                 │
                                │    sec/op    │    sec/op      vs base               │
MapLoadMostlyHits                  31.31n ± 1%    33.48n ±  3%   +6.93% (p=0.002 n=6)
MapLoadMostlyMisses                28.66n ± 3%    20.81n ±  6%  -27.36% (p=0.002 n=6)
MapLoadOrStoreBalanced             265.8n ± 7%    233.5n ±  2%  -12.14% (p=0.002 n=6)
MapLoadOrStoreUnique               510.9n ± 7%    441.6n ±  3%  -13.56% (p=0.002 n=6)
MapLoadOrStoreCollision            20.11n ± 0%    15.63n ±  1%  -22.30% (p=0.002 n=6)
MapLoadAndDeleteBalanced           24.61n ± 2%    20.85n ±  2%  -15.26% (p=0.002 n=6)
MapLoadAndDeleteUnique             23.63n ± 4%    25.19n ±  4%   +6.56% (p=0.009 n=6)
MapLoadAndDeleteCollision          386.5n ± 6%    120.4n ±  6%  -68.84% (p=0.002 n=6)
MapRange                          12.050µ ± 1%    7.672µ ±  2%  -36.33% (p=0.002 n=6)
MapAdversarialAlloc               161.05n ± 5%    53.16n ±  6%  -66.99% (p=0.002 n=6)
MapAdversarialDelete              123.75n ± 8%    27.69n ±  1%  -77.62% (p=0.002 n=6)
MapDeleteCollision                 12.92n ± 2%    12.69n ±  3%        ~ (p=0.065 n=6)
MapSwapCollision                   70.88n ± 3%    44.49n ±  6%  -37.22% (p=0.002 n=6)
MapSwapMostlyHits                  78.25n ± 5%    81.82n ±  6%   +4.56% (p=0.004 n=6)
MapSwapMostlyMisses                531.2n ± 6%    154.8n ±  6%  -70.85% (p=0.002 n=6)
MapCompareAndSwapCollision         107.6n ± 2%    129.9n ±  5%  +20.72% (p=0.002 n=6)
MapCompareAndSwapNoExistingKey     23.44n ± 6%    24.57n ±  3%        ~ (p=0.180 n=6)
MapCompareAndSwapValueNotEqual     24.40n ± 1%    22.31n ±  3%   -8.56% (p=0.002 n=6)
MapCompareAndSwapMostlyHits        82.29n ± 6%    90.62n ±  8%  +10.13% (p=0.002 n=6)
MapCompareAndSwapMostlyMisses      53.72n ± 5%    49.79n ±  6%   -7.33% (p=0.002 n=6)
MapCompareAndDeleteCollision       80.29n ± 4%   105.40n ±  7%  +31.27% (p=0.002 n=6)
MapCompareAndDeleteMostlyHits      118.6n ± 6%    182.3n ± 12%  +53.82% (p=0.002 n=6)
MapCompareAndDeleteMostlyMisses    43.07n ± 7%    38.99n ±  6%   -9.48% (p=0.002 n=6)
MapClear                           193.7n ± 2%    134.4n ±  1%  -30.60% (p=0.002 n=6)
geomean                            91.17n         69.91n        -23.32%

/gomaxprocs: 2
                                │     before     │                 after                 │
                                │     sec/op     │    sec/op      vs base                │
MapLoadMostlyHits                  16.05n ±   1%    17.10n ±  1%    +6.48% (p=0.002 n=6)
MapLoadMostlyMisses                14.54n ±   2%    10.53n ±  2%   -27.55% (p=0.002 n=6)
MapLoadOrStoreBalanced             354.9n ±  10%    128.9n ±  7%   -63.67% (p=0.002 n=6)
MapLoadOrStoreUnique               634.0n ±  13%    247.1n ±  6%   -61.04% (p=0.002 n=6)
MapLoadOrStoreCollision           10.125n ±  85%    7.887n ±  1%   -22.10% (p=0.002 n=6)
MapLoadAndDeleteBalanced           25.70n ±  51%    10.64n ±  5%   -58.58% (p=0.002 n=6)
MapLoadAndDeleteUnique             12.28n ±  78%    12.84n ±  4%         ~ (p=0.065 n=6)
MapLoadAndDeleteCollision          337.6n ±   8%    131.8n ± 47%   -60.94% (p=0.002 n=6)
MapRange                           6.100µ ±   2%    3.827µ ±  3%   -37.26% (p=0.002 n=6)
MapAdversarialAlloc               170.00n ±   2%    27.58n ±  4%   -83.78% (p=0.002 n=6)
MapAdversarialDelete              133.20n ±   4%    14.38n ± 11%   -89.20% (p=0.002 n=6)
MapDeleteCollision                 6.449n ± 119%    6.348n ±  1%    -1.57% (p=0.002 n=6)
MapSwapCollision                  100.80n ±  11%    81.76n ± 19%   -18.89% (p=0.015 n=6)
MapSwapMostlyHits                  56.65n ±  12%    52.62n ± 20%         ~ (p=0.310 n=6)
MapSwapMostlyMisses                624.8n ±   4%    235.8n ± 11%   -62.25% (p=0.002 n=6)
MapCompareAndSwapCollision         105.1n ±  24%    100.5n ± 35%         ~ (p=1.000 n=6)
MapCompareAndSwapNoExistingKey     12.61n ±  93%    12.28n ±  7%         ~ (p=0.180 n=6)
MapCompareAndSwapValueNotEqual     12.25n ±   1%    11.21n ±  8%    -8.53% (p=0.002 n=6)
MapCompareAndSwapMostlyHits        66.34n ±  21%    64.94n ± 19%         ~ (p=0.937 n=6)
MapCompareAndSwapMostlyMisses      47.60n ±  13%    26.30n ±  8%   -44.74% (p=0.002 n=6)
MapCompareAndDeleteCollision       46.10n ±  87%   132.50n ±  9%  +187.39% (p=0.002 n=6)
MapCompareAndDeleteMostlyHits      105.2n ±  12%    186.6n ± 20%   +77.41% (p=0.002 n=6)
MapCompareAndDeleteMostlyMisses    40.77n ±  10%    21.54n ±  8%   -47.17% (p=0.002 n=6)
MapClear                           252.8n ±  11%    128.7n ± 22%   -49.11% (p=0.002 n=6)
geomean                            72.61n           46.91n         -35.39%

/gomaxprocs: 4
                                │     before      │                 after                 │
                                │     sec/op      │    sec/op      vs base                │
MapLoadMostlyHits                   7.870n ±   1%    8.415n ±  3%    +6.93% (p=0.002 n=6)
MapLoadMostlyMisses                 7.210n ±   1%    5.314n ±  2%   -26.28% (p=0.002 n=6)
MapLoadOrStoreBalanced             360.10n ±  18%    71.78n ±  2%   -80.07% (p=0.002 n=6)
MapLoadOrStoreUnique                707.2n ±  18%    135.2n ±  4%   -80.88% (p=0.002 n=6)
MapLoadOrStoreCollision             5.089n ± 201%    3.963n ±  1%   -22.11% (p=0.002 n=6)
MapLoadAndDeleteBalanced           17.045n ±  64%    5.280n ±  1%   -69.02% (p=0.002 n=6)
MapLoadAndDeleteUnique             14.250n ±  57%    6.452n ±  1%         ~ (p=0.368 n=6)
MapLoadAndDeleteCollision           19.34n ±  39%    23.31n ± 27%         ~ (p=0.180 n=6)
MapRange                            3.055µ ±   3%    1.918µ ±  2%   -37.23% (p=0.002 n=6)
MapAdversarialAlloc                245.30n ±   6%    14.90n ± 23%   -93.92% (p=0.002 n=6)
MapAdversarialDelete              143.550n ±   2%    8.184n ±  1%   -94.30% (p=0.002 n=6)
MapDeleteCollision                  9.199n ±  65%    3.165n ±  1%   -65.59% (p=0.002 n=6)
MapSwapCollision                    164.7n ±   7%    108.7n ± 36%   -34.01% (p=0.002 n=6)
MapSwapMostlyHits                   33.12n ±  15%    35.79n ±  9%         ~ (p=0.180 n=6)
MapSwapMostlyMisses                 604.5n ±   5%    280.2n ±  7%   -53.64% (p=0.002 n=6)
MapCompareAndSwapCollision          96.02n ±  40%    69.93n ± 24%   -27.17% (p=0.041 n=6)
MapCompareAndSwapNoExistingKey      6.345n ±   1%    6.202n ±  1%    -2.24% (p=0.002 n=6)
MapCompareAndSwapValueNotEqual      6.121n ±   3%    5.564n ±  4%    -9.09% (p=0.002 n=6)
MapCompareAndSwapMostlyHits         44.21n ±  13%    43.46n ± 11%         ~ (p=0.485 n=6)
MapCompareAndSwapMostlyMisses       33.51n ±   6%    13.51n ±  5%   -59.70% (p=0.002 n=6)
MapCompareAndDeleteCollision        27.85n ± 104%    31.02n ± 26%         ~ (p=0.180 n=6)
MapCompareAndDeleteMostlyHits       50.43n ±  33%   109.45n ±  8%  +117.03% (p=0.002 n=6)
MapCompareAndDeleteMostlyMisses     27.17n ±   7%    11.37n ±  3%   -58.14% (p=0.002 n=6)
MapClear                            300.2n ±   5%    124.2n ±  8%   -58.64% (p=0.002 n=6)
geomean                             50.38n           25.79n         -48.81%

Higher GOMAXPROCS values are omitted because they're not very useful. It's very unlikely that the data structure will be heavily contended on all cores in the same way in real programs; even 4 is pushing it.

Analysis

The load-hit case is slightly slower due to Swiss Tables improving the performance of the old sync.Map (the new implementation was slightly faster before those changes, when I initially benchmarked). Some benchmarks show a seemingly large slowdown, but that's mainly due to the fact that the new implementation shrinks promptly, whereas the old one shrinks in generations (the dirty map needs to be promoted).

The new implementation also avoids issues the old one had, like the ramp-up time of promoting a key to read-only (which isn't captured by the benchmarks) and likely lower contention when mutating disjoint sets of keys.

Feedback

If any of the slowdowns affect you, you can set GOEXPERIMENT=nosynchashtriemap at build time to revert to the old implementation, and please file a bug. We'd like the implementations to not cause slowdowns in real-world programs.

(We can possibly avoid some of these additional allocations by adding a sync.Pool for entry nodes to the map's structure, though this would mainly only help heavy uses of sync.Map and may not actually matter in practice.)

@mknyszek mknyszek self-assigned this Dec 4, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 4, 2024
@mknyszek mknyszek added this to the Go1.24 milestone Dec 4, 2024
@mknyszek mknyszek added NeedsFix The path to resolution is known, but the work has not been done. and removed compiler/runtime Issues related to the Go compiler and/or runtime. labels Dec 4, 2024
@mknyszek mknyszek closed this as completed Dec 4, 2024
@mknyszek
Copy link
Contributor Author

mknyszek commented Jan 14, 2025

I was thinking about this analysis again and I was wrong to say that the old map did not shrink. I intended to say that in the benchmark it wouldn't shrink. The old map implementation indeed does shrink, it just takes much longer. Deleted entries from the read-only map are replaced with a tombstone ("expunged") and only fully cleared when a new dirty map is promoted to a read-only map. This requires enough misses to happen in the read-only map.

The new map doesn't have such a generational approach, and disjoint sets of keys can be added and removed at will, with the memory for those being promptly released. The downside is that programs (more likely microbenchmarks) that very frequently add and remove the same key are going to suffer. Again, I'm not terribly worried about this case.

I've updated the comment above.

@PeterBocan
Copy link

hi @mknyszek out of curiosity, can you explain what's the core of the "HashTrieMap" and how it differs from the original implementation?

@thepudds
Copy link
Contributor

thepudds commented Feb 5, 2025

Hi @PeterBocan , this new implementation was originally part of CL 573956 as an internal HashTrieMap to support the new unique package. You can look at that CL, including the code comments, and the associated #62483 issue.

For replacing the public facing sync.Map implementation, you can look at this stack of CLs, including some performance comments in CL 608335, and of course the performance numbers at the start of this issue here are useful as well.

There is also some related discussion comparing the old and new sync.Map implementations (starting in ~2 months ago in December) in #21035 and #21032.

The old implementation targeted a narrower set of use cases (e.g., see CL 33912 and #18177), especially some of the "read mostly" caches that were used in the stdlib.

After you look at some of the details, you could open a thread on the golang-nuts mailing list either with more specific follow-up questions, or even just post there with a summary of what you learned. (There are very knowledgeable people on golang-nuts, and it's a topic people tend to find interesting, so I think you'd get useful replies there).

@josharian
Copy link
Contributor

To add to @thepudds' history (nice), the very first experimental implementation in CL 33852 had an even more restricted API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants