Don't replicate token metadata when tokens don't change #2869
Comments
Saw |
This could cause latency regression in large clusters in 2.0 due to now frequent updates of CACHE_HITRATES. Changes in heartbeat don't trigger this. |
On Tue, Oct 10, 2017 at 05:52:49AM -0700, Tomasz Grabiec wrote:
This could cause latency regression in large clusters in 2.0 due to now frequent updates of CACHE_HITRATES. Changes in heartbeat don't trigger this.
I think this is the case. 50 node cluster test started to fail and the
failure was bisected to the patch that adds CACHE_HITRATES to gossiper
state.
…--
Gleb.
|
So, it was a mistake to add CACHE_HITRATES to gossip. It was based on my wrong assumption that gossip will move node A's hitrates to node B via intermediary node C, without A talking to C directly. Perhaps we should just drop it, and rely on piggy-backs and resets on node restarts. |
On Tue, Oct 10, 2017 at 02:36:34PM +0000, Avi Kivity wrote:
So, it was a mistake to add CACHE_HITRATES to gossip. It was based on my wrong assumption that gossip will move node A's hitrates to node B via intermediary node C, without A talking to C directly.
Perhaps we should just drop it, and rely on piggy-backs and resets on node restarts.
It will cause initial spikes after restart because rebooted node will
think that all other nodes have zero cache hit rate and will send all
its traffic to itself before it learns otherwise.
…--
Gleb.
|
10.10.2017 4:36 PM "Avi Kivity" <notifications@github.com> wrote:
So, it was a mistake to add CACHE_HITRATES to gossip. It was based on my
wrong assumption that gossip will move node A's hitrates to node B via
intermediary node C, without A talking to C directly.
It could do that, why do you think it's not the case? Nodes exchange
information about all other nodes in each exchange.
Perhaps we should just drop it, and rely on piggy-backs and resets on node
restarts.
|
@tgrabiec I asked and was told it was point-to-point, but maybe there was a miscommunication. |
@gleb-cloudius perhaps when we first learn about a node, we can ask it about hitrates, before we declare ourselves ready. |
On Tue, Oct 10, 2017 at 02:59:20PM +0000, Avi Kivity wrote:
> It will cause initial spikes after restart because rebooted node will
think that all other nodes have zero cache hit rate and will send all
its traffic to itself before it learns otherwise.
@gleb-cloudius perhaps when we first learn about a node, we can ask it about hitrates, before we declare ourselves ready.
Isn't this just a round of gossip of a sort? But less efficient, since
the gossiper is not really point-to-point after all. Why not just fix the
bug here instead, and not replicate token metadata when it did not
change?
…--
Gleb.
|
Agree that gossip fixes are better. btw, will gossip hitrate override piggyback hitrate? Because gossip hitrate is likely to be much older. |
On Tue, Oct 10, 2017 at 03:26:23PM +0000, Avi Kivity wrote:
Agree that gossip fixes are better.
btw, will gossip hitrate override piggyback hitrate? Because gossip hitrate is likely to be much older.
No, it will not. It is used only once when there is no info about
hitrate from other sources.
…--
Gleb.
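
For illustration only, the precedence described in this reply (the gossiped hit rate is used once, when nothing is known from other sources, and never overrides a piggybacked value) could be sketched as below. `hit_rate_table_sketch`, `on_gossip`, and `on_piggyback` are hypothetical names invented for this sketch, not the project's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table of per-node cache hit rates (not the real implementation).
struct hit_rate_table_sketch {
    std::map<std::string, float> rates;

    // A hit rate piggybacked on a response is fresh, so it always overwrites.
    void on_piggyback(const std::string& node, float rate) {
        rates[node] = rate;
    }

    // A gossiped hit rate may be stale, so it only seeds a missing entry.
    // std::map::emplace leaves an existing entry untouched.
    void on_gossip(const std::string& node, float rate) {
        rates.emplace(node, rate);
    }
};
```

With this shape, a restarted node gets an initial estimate from gossip, and later piggybacked values replace it without ever being clobbered by older gossip.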
|
2017-10-10 14:57 GMT+02:00 Gleb Natapov <notifications@github.com>:
On Tue, Oct 10, 2017 at 05:52:49AM -0700, Tomasz Grabiec wrote:
> This could cause latency regression in large clusters in 2.0 due to now
frequent updates of CACHE_HITRATES. Changes in heartbeat don't trigger this.
>
I think this is the case. 50 node cluster test started to fail and the
failure was bisected to the patch that adds CACHE_HITRATES to gossiper
state.
What kind of failure are you seeing?
|
On Tue, Oct 10, 2017 at 09:05:25AM -0700, Tomasz Grabiec wrote:
2017-10-10 14:57 GMT+02:00 Gleb Natapov ***@***.***>:
> On Tue, Oct 10, 2017 at 05:52:49AM -0700, Tomasz Grabiec wrote:
> > This could cause latency regression in large clusters in 2.0 due to now
> frequent updates of CACHE_HITRATES. Changes in heartbeat don't trigger this.
> >
> I think this is the case. 50 node cluster test started to fail and the
> failure was bisected to the patch that adds CACHE_HITRATES to gossiper
state.
>
What kind of failure are you seeing?
Not me, Shlomi. The kind you are debugging. Missing rows after adding a
node because streaming was not done to a node.
…--
Gleb.
|
When applying endpoint state, a significant amount of CPU is spent copying token_metadata:
We do it inefficiently, for each application state change, but we probably only have to do it when tokens change.
Refs #2855
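
A minimal sketch of the idea, under stated assumptions: compare the tokens an endpoint advertises with the ones already recorded, and only pay for the copy when they differ. The types and names here (`token_metadata_sketch`, `state_applier_sketch`, `on_state_change`, `replicate`) are hypothetical stand-ins for illustration, not the actual token_metadata or gossiper API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration; not the project's real types.
using token = long;
using endpoint = std::string;

struct token_metadata_sketch {
    std::map<endpoint, std::set<token>> tokens_by_endpoint;
};

struct state_applier_sketch {
    token_metadata_sketch metadata;
    int replications = 0;  // counts how often the expensive copy ran

    // Stand-in for the costly copy of token metadata.
    void replicate() { ++replications; }

    // Called on every application state change for `ep`, with the tokens
    // that endpoint currently advertises.
    void on_state_change(const endpoint& ep, const std::set<token>& new_tokens) {
        auto it = metadata.tokens_by_endpoint.find(ep);
        if (it != metadata.tokens_by_endpoint.end() && it->second == new_tokens) {
            return;  // tokens unchanged: skip the copy entirely
        }
        metadata.tokens_by_endpoint[ep] = new_tokens;
        replicate();  // tokens are new or changed: copy once
    }
};
```

In this sketch, a state change that leaves the tokens untouched (such as a CACHE_HITRATES update) returns before the copy, so the cost is paid only when tokens are first seen or actually change.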