Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(advanced RPC compression): enable dictionary training and ZSTD compression #7401

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions sdcm/provision/scylla_yaml/scylla_yaml.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,11 @@ def set_endpoint_snitch(cls, endpoint_snitch: str):
internode_send_buff_size_in_bytes: int = None # 0
internode_recv_buff_size_in_bytes: int = None # 0
internode_compression: Literal['none', 'all', 'dc'] = None # "none"
rpc_dict_training_when: Literal['always', 'never', 'when_leader'] = 'when_leader'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend putting the actual defaults here (i.e. None)

and create an SCT configuration for specific test that would enabled it, for the sake of testing.
enabling it for all by default, should preferably done by scylla-core itself if decided as such.

see the configurations folder, and as example configurations/tablets-initial-32.yaml how a specific test can set it's own scylla.yaml parameters

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend putting the actual defaults here (i.e. None)

and create an SCT configuration for specific test that would enabled it, for the sake of testing. enabling it for all by default, should preferably done by scylla-core itself if decided as such.

Why? The good thing about enabling it by default is that it gives more coverage with little effort. What's the drawback?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the drawback is that we won't be testing the defaults from that point onwards.

if it's zero risk, change the default in scylla, if not the are reasons not to do so, they apply here in SCT as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend putting the actual defaults here (i.e. None)
and create an SCT configuration for specific test that would enabled it, for the sake of testing.

@fruch

In that case, would it be acceptable to enable dict training and internode compression for all longevity tests?

We can't enable this feature by default in Scylla because it has a performance cost. For some users, RPC compression is useless (e.g. because they have an on-premise deployment and they don't pay for the amount of data sent over network), and we don't want to regress them.

But we do want it to be stable and relatively cheap "by default". We might want to enable it by default in the cloud, because every cluster pays the network costs there.

the drawback is that we won't be testing the defaults from that point onwards.

Internode compression only adds code paths, it doesn't remove them. So when you enable it, you are testing strictly more code. And I think that's a good thing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fruch ^Ping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you are saying it's not gonna be enabled by default for core, cause of fear of regression.
so it might cause regression on some of the tests

and for the "cheap" argument, every time we said something like that, we regretted it (see the enabling audit by default, we after a while got reverted)

I would still argue you do want specific case that can show different compression option vs. other options

and in this PR, you are not enabling the internode_compression at all, so it would only affect some cases using it.
which are not that many cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyhow, we should be trying it on a few cases before merging it

  • one short longevity
  • one case of rolling upgrades

as followup work after merge:

  • one performance test - during operation(nemesis) one

rpc_dict_training_min_time_seconds: int = 900
rpc_dict_update_period_seconds: int = 150
rpc_dict_training_min_bytes: int = 100_000_000
internode_compression_zstd_max_cpu_fraction: float = 0.05
inter_dc_tcp_nodelay: bool = None # False
streaming_socket_timeout_in_ms: int = None # 0
start_native_transport: bool = None # True
Expand Down
Loading