Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils: compact-radix-tree: fix accidental cache line bouncing #9246

Closed
wants to merge 1 commit into from

Conversation

michoecho
Copy link
Contributor

@michoecho michoecho commented Aug 26, 2021

Whenever a node_head_ptr is assigned to nil_root, the _backref inside it is
overwritten. But since nil_root is shared between shards, this causes severe
cache line bouncing. (It was observed to reduce the total write throughput
of Scylla by 90% on a large NUMA machine).

This backreference is never read anyway, so fix this bug by not writing it.

Fixes #9252

@michoecho
Copy link
Contributor Author

@avikivity That's it. That's the entire NUMA slowness bug. Sharing .data (.bss in this case, actually) does not seem to cause problems in general, because it's cached most of the time, as expected. Performance problems only happened here because a cache line is actively bouncing.

By the way, I drilled down to this by adding a REST API for calling mbind() and bisecting the relevant ELF segment with it. (I stashed an array somewhere into Scylla and recompiled it with various sizes to bisect the segment with a page boundary, and send the right half to the other NUMA node with mbind()). Sampling with perf never showed anything suspicious in this piece of code.

@michoecho
Copy link
Contributor Author

michoecho commented Aug 26, 2021

Also, I had to guess the relevant piece of memory by reading the assembly. Many symbols in the Scylla binary come out obfuscated for some reason (they look like this: $x.124). Why?
Edit: nevermind, in DWARF the symbols are fine. I could have just used info symbol in gdb.

@avikivity
Copy link
Member

Ah, so I guess the machine has directory-based coherence. If the cache line isn't local to the core, it asks the directory on the node that owns the memory that the line caches. So even though we cache line was always on node 0, we had to ask node 1 where it was.

@avikivity
Copy link
Member

It also explains why we saw the radix tree in the profiles!

@avikivity
Copy link
Member

Well done. The patch is fine, but I'll let @xemul admire it first.

@avikivity
Copy link
Member

It also explains why we saw the radix tree in the profiles!

Maybe I'm mixing this with a report from @pveentjer

if (_v != nullptr) {
// nil_root is shared between shards, so this nil_root check is necessary
// to avoid cache line bouncing caused by writing to _backref.
if (_v != nullptr && _v != &nil_root) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkinf for node being nil_root is done in many places to decide whether the node in question is "empty" and should be, e.g., populated, not visited, skipped on erase, etc. If nil_root assignment is skipped then all the above logic breaks. And actually if debug tests continue to pass after this that's bad sign as well.

The motivation behind having the nil_root is to avoid extra node == nullptr checks on lookup. It only need to exist and contain no slots inside. What if you make the nil_root thread-local, does it help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the assignment is not skipped, is it? It's just the backref on nil_root that's not assigned and @michoecho wrote that backref on nil_root is never checked/read, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check it one more time, @haaawk ... yes, missed that. Then the patch is correct :)

@xemul xemul self-requested a review August 26, 2021 07:42
@haaawk
Copy link
Contributor

haaawk commented Aug 26, 2021

I'm highly impressed by your detective work @michoecho . If I could, I would give you all my Monster coins this quarter. Unfortunately I can't grant my coins to my own team so it has to be enough for you to know how happy I am to have such a talented engineer on my team.

@avikivity
Copy link
Member

@michoecho please file an issue and add a Fixes #, to satisfy the backport bureaucracy.

@michoecho
Copy link
Contributor Author

@michoecho please file an issue and add a Fixes #, to satisfy the backport bureaucracy.

Done. I also modified the code comment slightly.

Whenever a node_head_ptr is assigned to nil_root, the _backref inside it is
overwritten. But since nil_root is shared between shards, this causes severe
cache line bouncing. (It was observed to reduce the total write throughput
of Scylla by 90% on a large NUMA machine).

This backreference is never read anyway, so fix this bug by not writing it.

Fixes scylladb#9252
avikivity pushed a commit that referenced this pull request Aug 29, 2021
Whenever a node_head_ptr is assigned to nil_root, the _backref inside it is
overwritten. But since nil_root is shared between shards, this causes severe
cache line bouncing. (It was observed to reduce the total write throughput
of Scylla by 90% on a large NUMA machine).

This backreference is never read anyway, so fix this bug by not writing it.

Fixes #9252

Closes #9246

(cherry picked from commit 126baa7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accidental cache line invalidation in compact-radix-tree kills performance on large machines
4 participants