Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: unexpected and silent intervention of scylla systemd scripts in the manual configuration of perftune.yaml #11385

Closed
vladzcloudius opened this issue Aug 25, 2022 · 13 comments
Milestone

Comments

@vladzcloudius
Copy link
Contributor

Installation details
HEAD: a03a33d
Regression introduced by 3a51e78

Description
The reason why cpuset.conf and perftune.yaml exist are in particular to allow specifying a compute and IRQ handling CPU sets.

If user wants to follow a default choice Scylla thinks is going to be best for the current setup then auto-generated values are going to be fine.

However, sometimes the user wants to define his/her own configuration, for example run multiple Scylla instances on the same big machine with multiple NICs separating them to different NUMA nodes.

In such a case a special value of cpu_mask needs to be provided in perftune.yaml and auto-generation of perftune.yaml by scylla_prepare can no longer be used because it assumes that the whole machine is available.

Before 3a51e78 the above could have been easily achieved because scylla_prepare would not touch perftune.yaml if it exists (as was intended!).

However after the patch above if a user would change something in /etc/defaults/scylla-server or in cpuset.conf after he/she wrote the content he/she wanted in /etc/scylla.d/perftune.yaml our scriptology is going to just go a overwrite it with an unwanted content on the next restart without a warning!

This behavior is less than expected and should be at best optional.

@vladzcloudius
Copy link
Contributor Author

@syuu1228 @avikivity @tarzanek FYI

@vladzcloudius
Copy link
Contributor Author

vladzcloudius commented Aug 25, 2022

3a51e78 is a very dangerous change

What was the motivation for it, @tarzanek?

The content of perftune.yaml is just as important as cpuset.conf (because it defines the total CPU set via cpu_mask). So, overwriting it should be done with the same care as overwriting cpuset.conf today. Which is: automation should not touch it if it exists and only the operator can explicitly modify it (a person or an Ansible Role is an example of such an "operator").

The patch above would only be safe if cpu_mask is defined somewhere, e.g. in scylla-server config file, and then used for generation of perftune.yaml.

@vladzcloudius
Copy link
Contributor Author

What was the motivation for it, @tarzanek?

I think I found it: #10121

Unfortunately the patch above doesn't represent a solution because perftune.yaml can now be deleted because of some irrelevant change in scylla-server configuration file.

The whole approach of deleting the perftune.yaml based on timestamps of other files is bogus.

perftune.yaml should be regenerated only when the corresponding content (e.g. CPUSET or NIC name) has changed - not just ANY change.

@vladzcloudius
Copy link
Contributor Author

I believe that the motivation for this change is not strong enough to keep this patch and a revert should be considered.

@avikivity @denesb FYI.

@avikivity
Copy link
Member

@tarzanek please comment

@vladzcloudius
Copy link
Contributor Author

I see that it's already in 5.1.
I'm sending a fix for this.

@tarzanek
Copy link
Contributor

this broke a contract in scylla_prepare where if perftune.yaml exists, it shouldn't be touched (I wasn't aware of this, I thought perftune.yaml is volatile and shouldn't have manual modification, hence the slip in the review)
so all logic needs to be moved to scylla_cpuset_setup only which explicitely removes perftune.yaml to force its refresh by scylla_prepare
so scylla_prepare needs a revert

@tarzanek
Copy link
Contributor

tarzanek commented Aug 25, 2022

the question is if scylla_server change impacts content of perftune.yaml or not
we have there whether to tune nic and disks IIRC
so does it influence whether perftune.py will write to tune section both

  • nic
  • disk
    ?

The answer is it only changes NIC name, disks are being tuned by default command from scylla_util

So changing default scylla-server IFNAME param will impact perftune.yaml too, but since it's direct edit and not by scylla_setup, user is responsible of fixing it in perftune.yaml, too

@vladzcloudius
Copy link
Contributor Author

The idea is as follows:

  • perftune.yaml is just as "untouchable" as cpuset.conf if it exists.
  • Just like cpuset.conf it can either be modified by a user (and then it's his/her responsibility to keep both files in sync) or it can be modified as a result of an explicit call to scylla_cpuset_setup.

In #10121 Lubos asked for scylla_cpuset_conf idempotency, which means that: we don't want cpuset.conf to be changed if there no reason for it to be changed.

However this wasn't supposed to change the original logic that said that IF cpuset.conf was changed as a result of a call of scylla_cpuconf_setup then perftune.yaml needs to be regenerated.

In simple, all the corresponding patch was supposed to have done was:

  1. Check if parameters of scylla_cpuconf_setup require a change in cpuset.conf.
  2. If "yes", apply the change and delete perftune.yaml (which will trigger its regeneration on the next scylla-server service restart.

This means - no change in scylla_prepare was required.

syuu1228 pushed a commit to syuu1228/scylla that referenced this issue Oct 8, 2022
…tent without introducing regressions

This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before tha call.

And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vise versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.

Fixes scylladb#11385
Fixes scylladb#10121
@DoronArazii DoronArazii added this to the 5.2 milestone Nov 16, 2022
@mykaul
Copy link
Contributor

mykaul commented May 30, 2023

@DoronArazii - any idea why there's no 'Backport candidate' label?

vladzcloudius added a commit to vladzcloudius/scylla that referenced this issue May 30, 2023
…tent without introducing regressions

This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before tha call.

And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vise versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.

Fixes scylladb#11385
Fixes scylladb#10121

(cherry picked from commit c538cc2)
@DoronArazii
Copy link

@DoronArazii - any idea why there's no 'Backport candidate' label?

I have no idea. @benipeled ?

@benipeled
Copy link
Contributor

The pattern looks fine but it's an 8-month-old commit - we don't have build history to look for

avikivity pushed a commit that referenced this issue Jun 4, 2023
…tent without introducing regressions

This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before tha call.

And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vise versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.

\Fixes #11385
\Fixes #10121

(cherry picked from commit c538cc2)
@denesb
Copy link
Contributor

denesb commented Dec 14, 2023

Already backported, removing label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants