feature(advanced RPC compression): enable dictionary training and ZSTD compression #7401
base: master
…D compression

This patch:
- enables dictionary training (by changing `rpc_dict_training_when` from the default value of 'never' to 'when_leader'),
- increases (by 4x) the frequency of dictionary training and updates, to stress the feature more,
- enables ZSTD internode compression (provided that `internode_compression` is also enabled) by giving it a nonzero CPU usage limit.

However, this patch doesn't enable `internode_compression` itself, because it's a performance-affecting option. To actually put the feature to use, a test must enable `internode_compression` on its own.
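A minimal sketch of the interaction described above, assuming ZSTD is used only when both conditions hold: `internode_compression` is not 'none' and the ZSTD CPU usage limit is nonzero. The function name `zstd_in_effect` and the parameter `zstd_cpu_limit` are hypothetical and exist only for illustration; they are not Scylla or SCT identifiers.

```python
def zstd_in_effect(internode_compression: str, zstd_cpu_limit: float) -> bool:
    """Return True when ZSTD internode compression would actually be exercised.

    Assumption: both knobs must be on. `zstd_cpu_limit` is a hypothetical
    stand-in for the CPU usage limit mentioned in the PR description.
    """
    compression_on = internode_compression in ("all", "dc")
    return compression_on and zstd_cpu_limit > 0


# With this patch alone: the CPU limit is nonzero, but compression stays off.
patch_only = zstd_in_effect("none", 0.2)

# A test that also enables internode_compression exercises the feature.
patch_plus_test_config = zstd_in_effect("all", 0.2)
```

Under these assumptions `patch_only` is `False` and `patch_plus_test_config` is `True`, which is why a test has to opt in to `internode_compression` before the new code paths run.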
Marking as draft, because I haven't tested it. This PR enables (I hope) dictionary training by default. It should be non-invasive, so there should be no reason to enable it only for a subset of tests. The next step would be to also enable |
I have no idea how to test this. |
I've enabled the labels, so we can have a basic sanity for it (even that it default to running 5.2, we'll need to run it again with but you should define a specific cases you want to try, i.e. a longevity that you can have a reference of the run to compare with you can start with a short variations, of some of those, like |
@@ -147,6 +147,11 @@ def set_endpoint_snitch(cls, endpoint_snitch: str):
    internode_send_buff_size_in_bytes: int = None  # 0
    internode_recv_buff_size_in_bytes: int = None  # 0
    internode_compression: Literal['none', 'all', 'dc'] = None  # "none"
    rpc_dict_training_when: Literal['always', 'never', 'when_leader'] = 'when_leader'
I would recommend putting the actual defaults here (i.e. None)
and creating an SCT configuration for a specific test that would enable it, for the sake of testing.
Enabling it for all tests by default should preferably be done by scylla-core itself, if decided as such.
See the configurations
folder, and as an example configurations/tablets-initial-32.yaml,
for how a specific test can set its own scylla.yaml parameters.
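The suggestion above can be sketched as follows: fields default to `None` (meaning "leave Scylla's own default alone"), and a single test supplies its own overrides. This is an illustration under stated assumptions, not SCT's actual implementation. `ScyllaYamlSketch` and `render` are hypothetical names, and only the two option names and their allowed values come from the diff.

```python
from dataclasses import dataclass, asdict
from typing import Literal, Optional


@dataclass
class ScyllaYamlSketch:
    # None means "do not emit the key", so Scylla's built-in default applies.
    internode_compression: Optional[Literal["none", "all", "dc"]] = None
    rpc_dict_training_when: Optional[Literal["always", "never", "when_leader"]] = None


def render(defaults: ScyllaYamlSketch, overrides: dict) -> dict:
    """Merge per-test overrides over the defaults and drop unset (None) keys."""
    known = asdict(defaults)
    unknown = set(overrides) - set(known)
    if unknown:
        raise ValueError(f"unknown scylla.yaml options: {sorted(unknown)}")
    merged = {**known, **overrides}
    return {key: value for key, value in merged.items() if value is not None}


# Every other test keeps exercising Scylla's real defaults.
baseline = render(ScyllaYamlSketch(), {})

# One dedicated test opts in to the feature (values chosen for illustration).
compression_test = render(
    ScyllaYamlSketch(),
    {"internode_compression": "all", "rpc_dict_training_when": "when_leader"},
)
```

With this layout the baseline renders no keys at all, so the stock defaults stay under test, while the dedicated configuration carries both options explicitly.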
> I would recommend putting the actual defaults here (i.e. None)
> and creating an SCT configuration for a specific test that would enable it, for the sake of testing. Enabling it for all tests by default should preferably be done by scylla-core itself, if decided as such.
Why? The good thing about enabling it by default is that it gives more coverage with little effort. What's the drawback?
The drawback is that we won't be testing the defaults from that point onwards.
If it's zero risk, change the default in Scylla; if there are reasons not to do so, they apply here in SCT as well.
> I would recommend putting the actual defaults here (i.e. None)
> and creating an SCT configuration for a specific test that would enable it, for the sake of testing.
In that case, would it be acceptable to enable dict training and internode compression for all longevity tests?
We can't enable this feature by default in Scylla because it has a performance cost. For some users, RPC compression is useless (e.g. because they have an on-premise deployment and they don't pay for the amount of data sent over network), and we don't want to regress them.
But we do want it to be stable and relatively cheap "by default". We might want to enable it by default in the cloud, because every cluster pays the network costs there.
> The drawback is that we won't be testing the defaults from that point onwards.
Internode compression only adds code paths, it doesn't remove them. So when you enable it, you are testing strictly more code. And I think that's a good thing?
@fruch ^Ping.
So you are saying it's not going to be enabled by default in core, for fear of regressions.
So it might cause regressions in some of the tests.
As for the "cheap" argument, every time we said something like that, we regretted it (see enabling audit by default, which got reverted after a while).
I would still argue that you want a specific case that can compare the different compression options against each other.
And in this PR you are not enabling internode_compression at all, so it would only affect the cases that already use it,
which are not that many.
Anyhow, we should try it on a few cases before merging:
- one short longevity
- one rolling upgrade case
As follow-up work after the merge:
- one performance test, a during-operation (nemesis) one
@fruch Do you mean that I should manually clone the relevant Jenkins job and run it, or is there some better way?
No, there's no better way. You can try out the new Argus clone command:
In upgrade tests we start from 2024.1 or 5.4. How does this change affect these versions? Is it ignored, or does it fail?
@#7401 (comment) I don't understand the question. None of these options are present in 2024.1/5.4, so it shouldn't affect their behaviour at all.
I think @soyacz meant that we enable internode compression in rolling upgrades from 5.4/2024.1.
Refs #7364