Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't replicate token metadata when tokens don't change #2869

Closed
tgrabiec opened this issue Oct 5, 2017 · 13 comments
Closed

Don't replicate token metadata when tokens don't change #2869

tgrabiec opened this issue Oct 5, 2017 · 13 comments
Assignees
Labels
Milestone

Comments

@tgrabiec
Copy link
Contributor

tgrabiec commented Oct 5, 2017

When applying endpoint state, significant amount of CPU is spent in copying of token_metadata:

screenshot from 2017-10-04 19-15-16

We do it inefficiently, for each application state change, but we probably only have to do it when tokens change.

Refs #2855

@tgrabiec tgrabiec added the symptom/performance Issues causing performance problems label Oct 5, 2017
@tgrabiec tgrabiec changed the title Don't replicate token metadata when tokens don Don't replicate token metadata when tokens don't change Oct 5, 2017
@tgrabiec
Copy link
Contributor Author

tgrabiec commented Oct 5, 2017

Saw apply_state_locally() wall time cut in half when applying this.

@tgrabiec tgrabiec self-assigned this Oct 6, 2017
@tgrabiec tgrabiec added this to the 2.0 milestone Oct 6, 2017
@tgrabiec
Copy link
Contributor Author

This could cause latency regression in large clusters in 2.0 due to now frequent updates of CACHE_HITRATES. Changes in heartbeat don't trigger this.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Oct 10, 2017 via email

@avikivity
Copy link
Member

So, it was a mistake to add CACHE_HITRATES to gossip. It was based on my wrong assumption that gossip will move node A's hitrates to node B via intermediary node C, without A talking to C directly.

Perhaps we should just drop it, and rely on piggy-backs and resets on node restarts.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Oct 10, 2017 via email

@tgrabiec
Copy link
Contributor Author

tgrabiec commented Oct 10, 2017 via email

@avikivity
Copy link
Member

@tgrabiec I asked and was told it was point-to-point, but maybe there was a miscommunication.

@avikivity
Copy link
Member

It will cause initial spikes after restart because rebooted node will
think that all other nodes have zero cache hit rate and will send all
its traffic to itself before it learns otherwise.

@gleb-cloudius perhaps when we first learn about a node, we can ask it about hitrates, before we declare ourselves ready.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Oct 10, 2017 via email

@avikivity
Copy link
Member

Agree that gossip fixes are better.

btw, will gossip hitrate override piggyback hitrate? Because gossip hitrate is likely to be much older.

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Oct 10, 2017 via email

@tgrabiec
Copy link
Contributor Author

tgrabiec commented Oct 10, 2017 via email

@gleb-cloudius
Copy link
Contributor

gleb-cloudius commented Oct 11, 2017 via email

@tzach tzach added the high label Oct 21, 2017
@slivne slivne assigned elcallio and unassigned tgrabiec Oct 29, 2017
avikivity pushed a commit that referenced this issue Nov 7, 2017
Fixes #2869

Message-Id: <20171101105629.22104-1-calle@scylladb.com>
(cherry picked from commit 8c257c4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants