Reduce unnecessary CACHE_HITRATES updates in gossip #5971

asias · 2020-03-06T09:23:44Z

The size of a CACHE_HITRATES message is O(n), where n is the number of tables, so the message can be very big.

We update CACHE_HITRATES unconditionally and periodically, even if the values have not changed. I am wondering if we could avoid sending CACHE_HITRATES when it has not changed since the last update.
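The simplest form of this idea can be sketched as follows. This is a minimal illustration in Python, not Scylla's C++ code: `ChangeOnlyPublisher` and the `send` callback are hypothetical names standing in for whatever actually pushes the application state into gossip.

```python
from typing import Callable, Optional


class ChangeOnlyPublisher:
    """Forward a serialized state string only when it differs from the last one sent."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # hypothetical stand-in for the gossip update call
        self._last_sent: Optional[str] = None

    def maybe_publish(self, value: str) -> bool:
        """Return True if the value was sent, False if it was skipped."""
        if value == self._last_sent:
            return False  # unchanged since the last update: nothing to gossip
        self._send(value)
        self._last_sent = value
        return True
```

Comparing the serialized string is cheap but strict: any change in any table's value triggers a send, so a real implementation would probably want a tolerance as well.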

INFO  2020-03-06 17:13:50,054 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=CACHE_HITRATES, versioned_value=Value(system_distributed.view_build_status:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_topology_description:0.153518;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions:0.000000;myks2.standard1:0.000000;system_distributed.cdc_description:0.000000;system_traces.node_slow_log:0.000000;system_traces.events:0.000000;,732)

INFO  2020-03-06 17:13:52,055 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=CACHE_HITRATES, versioned_value=Value(system_distributed.view_build_status:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_topology_description:0.153518;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions:0.000000;myks2.standard1:0.000000;system_distributed.cdc_description:0.000000;system_traces.node_slow_log:0.000000;system_traces.events:0.000000;,740)

INFO  2020-03-06 17:13:54,062 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=CACHE_HITRATES, versioned_value=Value(system_distributed.view_build_status:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_topology_description:0.500000;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions:0.000000;myks2.standard1:0.000000;system_distributed.cdc_description:0.500000;system_traces.node_slow_log:0.000000;system_traces.events:0.000000;,772)

The downsides of such updates:

- Introduces more gossip exchange traffic

- Updates system.peers all the time
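To check whether two consecutive updates like the ones logged above actually differ, the payload can be parsed and compared. The sketch below assumes the payload format visible in the logs (`keyspace.table:rate;` repeated, with a trailing `;`); the function names are hypothetical and this is not how Scylla itself parses the value. In the logs, the two updates from 127.0.0.3 carry the same rates and differ only in the version number (732 vs. 740).

```python
from typing import Dict, List


def parse_hitrates(payload: str) -> Dict[str, float]:
    """Parse 'ks.table:rate;ks.table:rate;' into a {table: rate} mapping."""
    rates: Dict[str, float] = {}
    for entry in payload.split(";"):
        if not entry:  # the payload ends with ';', which leaves an empty tail
            continue
        name, _, rate = entry.rpartition(":")
        rates[name] = float(rate)
    return rates


def changed_tables(old: str, new: str) -> List[str]:
    """Return the sorted names of tables whose rate differs or that appear in only one payload."""
    before, after = parse_hitrates(old), parse_hitrates(new)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

An empty result from `changed_tables` means the update carried no new information for the receiver.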

asias · 2020-03-06T09:24:24Z

@gleb-cloudius What do you think?

slivne · 2020-03-17T21:14:58Z

We had #5200, and AFAIK @haaawk even coded a suggestion related to this that was not merged.

haaawk · 2020-03-17T21:17:48Z

POC is here https://github.com/haaawk/scylla/commits/5200-v1

asias · 2020-09-25T06:49:00Z

> POC is here https://github.com/haaawk/scylla/commits/5200-v1

The link is dead. I want to revive this work. The extra unnecessary traffic is fine for a cluster in good shape, but when some of the nodes or shards are loaded, such messages and their handling can make the system even busier.

asias · 2020-10-13T00:31:43Z

Ping. It is not urgent.

OK, I saw you assigned someone.

bhalevy · 2022-07-05T06:38:19Z

@asias can you pick this up?

denesb · 2022-07-05T12:16:17Z

Can we use table ids in the state string instead of table names? Table names can be arbitrarily long, whereas ids are UUIDs with a fixed size.
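A rough size comparison may help frame this question. The sketch below only does the arithmetic. It assumes the id would be written either in canonical text form or as raw bytes, and the thread does not say how Scylla would actually encode it. `key_sizes` is a hypothetical helper.

```python
import uuid
from typing import Dict

# Sizes of a UUID in its two common encodings.
UUID_TEXT_LEN = len(str(uuid.UUID(int=0)))      # canonical hex-and-dashes form
UUID_BINARY_LEN = len(uuid.UUID(int=0).bytes)   # raw bytes


def key_sizes(qualified_name: str) -> Dict[str, int]:
    """Compare the size of a 'keyspace.table' key with a UUID key, in bytes."""
    return {
        "name": len(qualified_name.encode("utf-8")),
        "uuid_text": UUID_TEXT_LEN,
        "uuid_binary": UUID_BINARY_LEN,
    }
```

A fixed-size id bounds the per-table cost, but a UUID written as text is longer than short names such as `system_auth.roles`, so the saving depends on the encoding and on how long the table names are in practice.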

This patch avoids unnecessary CACHE_HITRATES updates through gossip.

After this patch, CACHE_HITRATES is published when:

- We haven't published it at all
- The diff is big enough and we haven't published in the last 5 seconds

Note: A peer node can learn the cache hit rate through the read_data, read_mutation_data and read_digest RPC verbs, which carry cache_temperature in the response. So there is no need to update CACHE_HITRATES through gossip at high frequency.

We do the recalculation faster if the diff is bigger than 0.01. It is useful to do the calculation even if we do not publish CACHE_HITRATES through gossip, since the recalculation calls table->set_global_cache_hit_rate to set the hit rate.

Fixes scylladb#5971

asias · 2022-07-20T02:47:33Z

I've sent a PR here: #11079

This patch avoids unnecessary CACHE_HITRATES updates through gossip.

After this patch, CACHE_HITRATES is published when:

- We haven't published it at all
- The diff is bigger than 1% and we haven't published in the last 5 seconds
- The diff is really big (10%)

Note: A peer node can learn the cache hit rate through the read_data, read_mutation_data and read_digest RPC verbs, which carry cache_temperature in the response. So there is no need to update CACHE_HITRATES through gossip at high frequency.

We do the recalculation faster if the diff is bigger than 0.01. It is useful to do the calculation even if we do not publish CACHE_HITRATES through gossip, since the recalculation calls table->set_global_cache_hit_rate to set the hit rate.

Fixes scylladb#5971
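The publishing rule described in this commit message can be sketched as a small predicate. This is an illustration in Python under stated assumptions, not the code from the PR: the constant and function names are hypothetical, "diff" is assumed to be a single number (for example the largest absolute change across tables), and both thresholds are treated as strict comparisons.

```python
from typing import Optional

SMALL_DIFF = 0.01      # "bigger than 1%"
BIG_DIFF = 0.10        # "really big", 10%
MIN_INTERVAL_S = 5.0   # minimum gap between publishes for small diffs


def should_publish(diff: float, now: float, last_published: Optional[float]) -> bool:
    """Decide whether to gossip CACHE_HITRATES, given the change since the last publish.

    `now` and `last_published` are timestamps in seconds; `last_published`
    is None if nothing has been published yet.
    """
    if last_published is None:
        return True   # never published: always send the first value
    if diff > BIG_DIFF:
        return True   # large change: send regardless of the interval
    # moderate change: send only if enough time has passed
    return diff > SMALL_DIFF and (now - last_published) >= MIN_INTERVAL_S
```

For example, a diff of 0.02 three seconds after the last publish is held back, while the same diff six seconds later, or a diff of 0.2 at any time, goes out.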

slivne added feature/enhancement symptom/performance Issues causing performance problems area/stability area/gossip labels Mar 17, 2020

slivne added this to the 3.x milestone Mar 17, 2020

haaawk assigned haaawk and StarostaGit and unassigned haaawk Oct 11, 2020

haaawk removed their assignment Feb 3, 2021

haaawk assigned haaawk and unassigned StarostaGit Mar 21, 2021

bhalevy unassigned haaawk Jul 5, 2022

bhalevy mentioned this issue Jul 19, 2022

RFC: Put gossiper messages on a diet #11054

Closed

asias mentioned this issue Jul 20, 2022

misc_services: Fix cache hitrate update #11079

Closed

scylladb-promoter closed this as completed in 39db15d Jul 21, 2022

DoronArazii modified the milestones: 5.x, 5.2 Nov 8, 2022
