runtime: use zero byte as empty control word in maps (potential performance improvement) #70966

Open
colega opened this issue Dec 23, 2024 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

@colega
Contributor

colega commented Dec 23, 2024

Proposal Details

Context

The implementation of Swiss maps introduced in Go 1.24 is more complex than previously seen implementations, but it retains one of the original details that everyone seems to implement in the same way: the control word used to denote an empty slot is 0b10000000 (and the one for deleted items is 0b11111110).

Problem

Every time a new table is allocated or it grows, it has to be filled with this specific "empty control word" pattern. For large maps, this is quite a lot of work.
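
To make the cost concrete, here is a minimal sketch of the two allocation paths. It is not the runtime's code: the names `newCtrlsCurrent` and `newCtrlsZero` are hypothetical, and the control bytes are modeled as a plain `[]uint8` rather than the runtime's group layout.

```go
package main

import "fmt"

// ctrlEmpty is the current encoding of an empty slot (0b10000000).
const ctrlEmpty uint8 = 0x80

// newCtrlsCurrent models today's scheme (hypothetical helper): memory comes
// back zeroed from the allocator and must then be overwritten with the
// empty pattern before the table can be used.
func newCtrlsCurrent(slots int) []uint8 {
	ctrls := make([]uint8, slots) // already zeroed
	for i := range ctrls {
		ctrls[i] = ctrlEmpty // extra pass over every control byte
	}
	return ctrls
}

// newCtrlsZero models the proposed scheme (hypothetical helper): a zero
// byte already means "empty", so no second pass is needed.
func newCtrlsZero(slots int) []uint8 {
	return make([]uint8, slots)
}

func main() {
	fmt.Println(newCtrlsCurrent(8))
	fmt.Println(newCtrlsZero(8))
}
```

The second function does strictly less work per allocation; whether that shows up in real benchmarks is exactly what this issue asks to investigate.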

Proposal

I propose to:

  • Use zero-byte 0b00000000 to denote an empty slot.
  • Use byte 1 (0b00000001) for deleted slots.
  • Add 2 to h2 to ensure that it always has some non-zero bit set in bits 1-7.
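
The three points above can be sketched as follows. This is only an illustration of the proposed encoding, assuming h2 is a 7-bit value (0 to 127); `encodeH2` and `isFull` are hypothetical helper names, not functions from the runtime.

```go
package main

import "fmt"

const (
	ctrlEmpty   uint8 = 0b00000000 // proposed: zero byte means empty
	ctrlDeleted uint8 = 0b00000001 // proposed: byte 1 means deleted
)

// encodeH2 maps a 7-bit h2 (0..127) to a control byte in the range 2..129.
// The result still fits in one byte and can never equal ctrlEmpty or
// ctrlDeleted.
func encodeH2(h2 uint8) uint8 {
	return (h2 & 0x7f) + 2
}

// isFull reports whether a control byte marks an occupied slot, i.e.
// whether any of bits 1-7 is set.
func isFull(c uint8) bool {
	return c&0xfe != 0
}

func main() {
	for _, h2 := range []uint8{0, 1, 126, 127} {
		c := encodeH2(h2)
		fmt.Printf("h2=%3d -> ctrl=%08b full=%v\n", h2, c, isFull(c))
	}
	fmt.Println(isFull(ctrlEmpty), isFull(ctrlDeleted))
}
```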

While writing this issue I saw a similar proposal in a TODO comment in the implementation.

// TODO(prattmic): Consider inverting the top bit so that the zero value is empty.
type ctrl uint8

What does this mean:

  • The most important thing: a zeroed group is filled with empty control words out of the box, so we don't have to do that manually on each allocation/growth of the table.
  • There's one extra operation when splitting a key into the h1/h2 pair (adding two). My guess is that it has no noticeable impact on any modern CPU, but it's worth noting.
  • Depending on how the SIMD code is written, this may result in fewer operations being needed.
    • When I proposed this to github.com/dolthub/swiss, I was able to use two fewer operations. Their version appears to be longer than the one implemented here in the stdlib, but it's also designed for 16-element groups, which Go doesn't do yet.
    • I checked the code in the stdlib and I don't yet understand why it works.[1]
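
For the non-SIMD path, matching empty slots under the proposed encoding amounts to finding zero bytes in a word. The sketch below assumes an 8-slot group packed into a `uint64` with slot i in byte i; `matchEmptyZero`, `firstEmpty`, and `pack` are hypothetical names. It uses the exact zero-byte test rather than the shorter `(v - lsb) &^ v & msb` form, because the shorter form can report a false positive for a 0x01 byte (the proposed deleted marker) that follows a zero byte.

```go
package main

import (
	"fmt"
	"math/bits"
)

const lo7 uint64 = 0x7f7f7f7f7f7f7f7f

// matchEmptyZero returns a word with bit 7 set in every byte of g that is
// exactly zero, and all other bits clear. No carries cross byte boundaries,
// so there are no false positives.
func matchEmptyZero(g uint64) uint64 {
	return ^(((g & lo7) + lo7) | g | lo7)
}

// firstEmpty returns the index of the first empty slot, or -1 if none.
func firstEmpty(g uint64) int {
	m := matchEmptyZero(g)
	if m == 0 {
		return -1
	}
	return bits.TrailingZeros64(m) / 8
}

// pack places slot i's control byte in byte i of the word.
func pack(ctrls [8]uint8) uint64 {
	var g uint64
	for i, c := range ctrls {
		g |= uint64(c) << (8 * i)
	}
	return g
}

func main() {
	var fresh uint64 // a zeroed group: every slot is already empty
	fmt.Println(firstEmpty(fresh))

	g := pack([8]uint8{2, 3, 1, 0, 5, 0, 1, 9})
	fmt.Println(firstEmpty(g))
}
```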

I previously sent a PR to dolthub's swiss map implementation implementing this change, and the benchmarks were quite promising (although I didn't test the SIMD path, since I was benchmarking on arm64).

Given that Go's version is considerably more complex than that one, I decided to open an issue before attempting to hack together a proof of concept.

cc @prattmic

Footnotes

  1. The comment says "Empty slots are negated, becoming 1000 0000 (unchanged!)", but negating something should change it, right? I didn't find any docs regarding this behaviour; only ChatGPT explained to me that, since it's already the most negative value, it remains unchanged.

@gopherbot gopherbot added this to the Proposal milestone Dec 23, 2024
@gabyhelp

Related Issues

Related Code Changes


@thepudds thepudds changed the title from "proposal: runtime: use zero byte as empty control word (potential performance improvement)" to "runtime: use zero byte as empty control word (potential performance improvement)" Dec 23, 2024
@thepudds
Contributor

This would be an internal implementation detail, and as such would not need to go through the proposal process.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 23, 2024
@thepudds thepudds removed Proposal compiler/runtime Issues related to the Go compiler and/or runtime. labels Dec 23, 2024
@thepudds thepudds removed this from the Proposal milestone Dec 23, 2024
@thepudds thepudds added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 23, 2024
@cherrymui cherrymui added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Dec 23, 2024
@cherrymui cherrymui added this to the Backlog milestone Dec 23, 2024
@prattmic
Member

The original Abseil swisstable design uses 0b10000000 for empty specifically to enable the use of PSIGNB, which you also linked to above.

I think the comment there does a pretty good job of explaining how it works, but to your question:

> The comment says "Empty slots are negated, becoming 1000 0000 (unchanged!)", but negating something should change it, right?

In 2's complement, negation is defined as NEG a = (NOT a) + 1. So:

  • a = 1000 0000
  • NOT a = 0111 1111
  • (NOT a) + 1 = 1000 0000
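
The same arithmetic can be checked in Go, where signed integer arithmetic on variables wraps around. This is just a small demonstration of the identity above; `negate8` is a hypothetical helper name.

```go
package main

import "fmt"

// negate8 computes the two's complement negation of an 8-bit pattern
// as (NOT b) + 1, wrapping around on overflow.
func negate8(b uint8) uint8 {
	return ^b + 1
}

func main() {
	a := int8(-128) // bit pattern 1000 0000, the most negative int8
	notA := ^a      // 0111 1111
	neg := notA + 1 // wraps around to 1000 0000
	fmt.Printf("%08b %08b %08b\n", uint8(a), uint8(notA), uint8(neg))
	// prints "10000000 01111111 10000000"

	fmt.Println(-a == a) // prints "true"

	// The deleted control word 1111 1110 negates to 0000 0010.
	fmt.Printf("%08b\n", negate8(0b11111110)) // prints "00000010"
}
```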

Big kudos to the Abseil folks for coming up with this optimization. I never would have thought of something so subtle using a seemingly unrelated instruction.

@prattmic
Member

Whether we need to keep this optimization is the question.

There are several things at play here:

Using PSIGNB takes fewer instructions than PCMPEQB in general, because the latter needs to load ctrlEmpty into a register prior to comparison. But in the register ABI, X15 is defined as a zero register, so with a zero-valued empty control word we wouldn't actually need to load anything for PCMPEQB.
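
As an illustration only, the two approaches can be modeled per byte in scalar Go. This is a sketch of the instruction semantics, not the compiler's actual lowering: `psignb`, `movmskb`, `matchEmptyCurrent`, and `matchEmptyProposed` are hypothetical names, and an 8-slot group is assumed. `psignb` follows the documented behavior of the instruction: negate the destination byte where the source byte is negative, zero it where the source is zero, and leave it unchanged otherwise.

```go
package main

import "fmt"

// psignb models PSIGNB dst, src on an 8-byte group.
func psignb(dst, src [8]int8) [8]int8 {
	var out [8]int8
	for i := range dst {
		switch {
		case src[i] < 0:
			out[i] = -dst[i] // wraps: -(-128) stays -128
		case src[i] == 0:
			out[i] = 0
		default:
			out[i] = dst[i]
		}
	}
	return out
}

// movmskb models PMOVMSKB: collect the sign bit of each byte into a mask.
func movmskb(v [8]int8) uint8 {
	var m uint8
	for i, b := range v {
		if b < 0 {
			m |= 1 << i
		}
	}
	return m
}

// matchEmptyCurrent: with empty = 1000 0000 and deleted = 1111 1110,
// PSIGNB x, x leaves the sign bit set only on empty slots.
func matchEmptyCurrent(g [8]int8) uint8 {
	return movmskb(psignb(g, g))
}

// matchEmptyProposed: with empty = 0000 0000, compare each byte with
// zero (the PCMPEQB-against-a-zero-register approach).
func matchEmptyProposed(g [8]int8) uint8 {
	var m uint8
	for i, b := range g {
		if b == 0 {
			m |= 1 << i
		}
	}
	return m
}

func main() {
	cur := [8]int8{-128, -2, 0x23, -128, 0x7f, 0, -2, -128}
	prop := [8]int8{0, 1, 0x25, 0, 0x7f, 2, 1, 0}
	fmt.Printf("%08b %08b\n", matchEmptyCurrent(cur), matchEmptyProposed(prop))
}
```

Both groups have empty slots at indices 0, 3, and 7, so the two functions are expected to produce the same mask for them.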

On the other hand, matchH2 benefits from having bit 7 clear. If we do the obvious change of using bit 7 to indicate in use, then matchH2 needs to set bit 7 of h2 prior to comparison. That should be cheap, but it is more work.

I haven't fully considered the implications of your alternative proposal of adding 2 to h2, but that would be a similar amount of work.

@prattmic prattmic changed the title from "runtime: use zero byte as empty control word (potential performance improvement)" to "runtime: use zero byte as empty control word in maps(potential performance improvement)" Jan 7, 2025
@prattmic prattmic changed the title from "runtime: use zero byte as empty control word in maps(potential performance improvement)" to "runtime: use zero byte as empty control word in maps (potential performance improvement)" Jan 7, 2025
6 participants