Do not use KillMode=process for salt-master #33792
We actually want salt-master processes in the same control group to be terminated. KillMode=process was only introduced to fix the 'minion upgrades itself' problem, which applies to salt-minion but not to salt-master. The change was introduced in d288539 to fix #29295, which discusses the need to include 'KillMode=process' for the salt-minion on Debian systems.
Ping @thatch45, @terminalmage.
@dmurphy18 does this sound familiar? The logic presented by @BABILEN sounds valid: the master does not daemonize the processes it spins up the way the minion does, so the control group should not be compromised.
@terminalmage, I would like to get your input on the above.
Yes, as I recall there was a really good reason here. I think we need to document why the systemd unit files are the way they are and what would make them better, since getting to the point where we are more closely integrated with systemd would be better.
See #33665 (comment) for a bug that is caused by this option and a way to reproduce it.
@BABILEN The comment in #33665 appears to fix an issue with Vagrant. KillMode=process has been present in the code since approximately 2015.8.3 and has not caused issues. I think whether to keep or remove the line is better handled when Salt integrates more closely with systemd and its notifications, which #33803 covers.
Yes, we have already been on this ride for a while. The main thing to do is get systemd notifications working correctly for Salt so that every process notifies systemd.
Yes, sure. Integration with systemd is a longer process, but why was Yes, I have finally been able to provide one way to reliably reproduce the issue during the salt-master bootstrap, and I don't see why that is specific to Vagrant. And even if it were specific to Vagrant, it should be solved, shouldn't it? Why do you want to keep It looks like a race condition, and presumably people are just not running into it because their nameservers are fast enough. @thatch45 Why are you confident that it'll break other systems? KillMode=process for the master was introduced in a PR that discusses the minion, nobody questioned it back then, and it is not apparent to me why it would be needed. If it is necessary, it should, naturally, be kept.
#27243 is another report that might be related.
@cachedout Do you know why
@BABILEN I'm afraid I don't. I traced the commit history back to here [https://github.com//pull/32857], and then the trail grows a little cold.
I remember @terminalmage going off on why it was needed; he likely knows.
@gtmanfred is the one that discovered the need for this, I believe. Specifically, it was done to address the fact that when a package is updated, systemd's default way of bringing down the service is to send an immediate SIGTERM to all procs in the cgroup (see here). This has unfortunate side effects when upgrading the salt-minion package, as the salt-minion service (and thus the package manager command it was running) is killed in the middle of the operation, resulting in a corrupted package database in most cases. I'm not sure it is necessary to do this for the salt-master service as well; it doesn't seem necessary to me. I'm not sure why this change was made to anything but the salt-minion service unit.
Should only have been needed on the minion to make sure that the original I never investigated the implications on the master.
@terminalmage, @gtmanfred: great, thank you for confirming. It should be mentioned that it is also set for salt-api, but I haven't investigated the process management in that case, so I can't say whether it should or should not have
It looks like we are pretty much in agreement, then, that the
So this is good to merge then?
Not quite, @BABILEN should
Additionally, we need to make sure that @dmurphy18 propagates these changes as necessary to salt-pack for subsequent package builds.
@terminalmage Most of those files are simply symlinks to
I will make a different PR for salt-api, as I'd prefer to keep this one specific to the salt-master.
OK, I hadn't looked; I just did a recursive grep. Good to know. I'll merge, then.
@terminalmage @BABILEN I shall propagate the changes in salt-pack. I am about to package 2016.3.1 and will get the changes into that point release.
@dmurphy18 remember to get KillMode removed from salt-api.service as well.
@terminalmage will do
What does this PR do?
It removes KillMode=process from the salt-master systemd unit file, which thereby falls back to the systemd default of KillMode=control-group.
The systemd.kill documentation (https://www.freedesktop.org/software/systemd/man/systemd.kill.html) describes these options as follows:
What issues does this PR fix or reference?
KillMode=process for salt-master was introduced in d288539 to fix #29295. That issue discusses the need to include 'KillMode=process' for the salt-minion on Debian systems.
The reason this was included for salt-minion in the first place was to ensure that it can upgrade itself: the approach taken was to not terminate salt-minion child processes when the service is restarted. That way the salt-minion process that actually performs the upgrade can keep working while the parent process is "upgraded". This relies on the fact that salt-minion child processes are rather short-lived.
The salt-master will never upgrade itself, and you would typically want to terminate its child processes too when you stop the service. I am not sure why it was introduced to begin with, as the referenced bug clearly discusses salt-minion.
It would be good to hear some opinions concerning salt-api, where KillMode=process seems to have been adopted without discussion too. Unfortunately I am not yet familiar enough with the salt-api codebase to decide which behaviour would be most appropriate.
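To check which kill mode a unit file such as salt-master.service or salt-api.service ends up with, a small helper can be enough. This is a minimal Python sketch under stated assumptions: effective_kill_mode is a hypothetical name, it reads the text of a single unit file only (no drop-in overrides, no backslash line continuations), and it treats a missing KillMode= in the [Service] section as the systemd default, control-group.

```python
import configparser


def effective_kill_mode(unit_text: str) -> str:
    """Return the KillMode= a single unit file ends up with.

    Hypothetical helper: it reads only the given text (no drop-in overrides,
    no backslash line continuations) and treats a missing KillMode= in the
    [Service] section as the systemd default, control-group.
    """
    parser = configparser.ConfigParser(
        strict=False,        # allow repeated keys; the last assignment wins
        interpolation=None,  # leave %-specifiers such as %n untouched
        delimiters=("=",),   # unit files use key=value only
    )
    parser.optionxform = str  # keep key case (configparser lowercases by default)
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback="control-group").strip()


if __name__ == "__main__":
    # Illustrative unit text, not the shipped salt-master.service.
    before = (
        "[Unit]\n"
        "Description=example\n"
        "\n"
        "[Service]\n"
        "ExecStart=/usr/bin/salt-master\n"
        "KillMode=process\n"
    )
    after = before.replace("KillMode=process\n", "")
    print(effective_kill_mode(before))  # process
    print(effective_kill_mode(after))   # control-group
```

On a running system, `systemctl show -p KillMode salt-master.service` reports the effective value including drop-ins, so it is the more reliable check; the sketch only mirrors the fallback behaviour this PR relies on.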