Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use KillMode=process for salt-master #33792

Merged
merged 1 commit into from
Jun 9, 2016
Merged

Do not use KillMode=process for salt-master #33792

merged 1 commit into from
Jun 9, 2016

Conversation

wwentland
Copy link
Contributor

@wwentland wwentland commented Jun 6, 2016

What does this PR do?

It removes KillMode=process from the salt-master systemd unit file and will thereby fallback to the systemd default of KillMode=control-group.

https://www.freedesktop.org/software/systemd/man/systemd.kill.html discusses these options as:

KillMode=
Specifies how processes of this unit shall be killed. One of control-group, 
process, mixed, none.

If set to control-group, all remaining processes in the control group of 
this unit will bekilled on unit stop (for services: after the stop command
is executed, as configured with ExecStop=). If set to process, only the 
main process itself is killed. If set to mixed, the SIGTERM signal (see 
below) is sent to the main process while the subsequent SIGKILL signal
(see below) is sent to all remaining processes of the unit's control group.
If set to none, no process is killed. In this case, only the stop command
will be executed on unit stop, but no process be killed otherwise. 
Processes remaining alive after stop are left in their control group and the
control group continues to exist after stop unless it is empty.

Processes will first be terminated via SIGTERM (unless the signal 
to send is changed via KillSignal=). Optionally, this is immediately
followed by a SIGHUP (if enabled with SendSIGHUP=). Ifthen, after
a delay (configured via the TimeoutStopSec= option), processes still 
remain, thetermination request is repeated with the SIGKILL signal 
(unless this is disabled via the SendSIGKILL= option).

See kill(2) for more information.

Defaults to control-group.

What issues does this PR fix or reference?

KillMode=process for salt-master was introduced in d288539 to fix #29295. That issue discusses the need to include 'KillMode=process' for the salt-minion on Debian systems.

The reason this was included for salt-minion in the first place was to ensure that it can upgrade itself and the approach taken was to not terminate salt-minion child processes when the service is restarted. That way the salt-minion process that actually performs the upgrade can continue working, while the parent process has been "upgraded". This relies on the fact that salt-minion child processes are rather short lived.

The salt-master will never upgrade itself and you would typically want to terminate its child processes also when you stop the service. I am not sure why it was introduced to begin with as the referenced bug clearly discusses salt-minion.

It would be good to hear some opinions concerning salt-api where KillMode=process seems to have been adopted without discussion too. Unfortunately I am not yet too familiar with the salt-api codebase, so I can't decide which behaviour would be most appropriate.

We actually want salt-master processes in the same control group to be
terminated and this was only introduced to fix the 'minion upgrades
itself' problem, which applies to salt-minion, but not to salt-master.

This change was introduced in d288539 to fix #29295, which discusses the
need to include 'KillMode=process' for the salt-minion on Debian
systems.
@jfindlay
Copy link
Contributor

jfindlay commented Jun 6, 2016

Ping @thatch45, @terminalmage.

@thatch45
Copy link
Contributor

thatch45 commented Jun 6, 2016

@dmurphy18 does this sound familiar? the logic presented by @BABILEN sounds valid, the master does not daemonize and processes it spins up like the minion does, so the control group should not be compromised.

@dmurphy18
Copy link
Contributor

@terminalmage would like to get your input on the above.

@thatch45
Copy link
Contributor

thatch45 commented Jun 6, 2016

Yes, as I recall there was a really good reason here. I think that we need to document why the systemd unit files are the way they are and what the right thing is to make them better, since I think that getting to the point where we are more closely integrated with systemd would be better

@wwentland
Copy link
Contributor Author

See #33665 (comment) for a bug that is caused by this option and a way to reproduce it.

@dmurphy18
Copy link
Contributor

@BABILEN The comment in #33665 appears to fix an issue with vagrant. The KillMode=process has been present in the code since approx. 2015.8.3 and not caused issues. I think whether to have the line or remove it might be better handled when Salt better integrates with systemd and notifications, which task #33803 covers.

@thatch45
Copy link
Contributor

thatch45 commented Jun 7, 2016

Yes, we have already been on this ride for a while, the main thing to do it get systemd notifications working correctly for salt so that every process notifies systemd,
basically, we have a high degree of confidence that this change will break other systems...

@wwentland
Copy link
Contributor Author

wwentland commented Jun 7, 2016

Yes, sure. Integration with systemd is a longer process, but why was KillMode=process set in the first place? To me it looks as if it was introduced to solve a very specific problem that does not apply to the salt master but the minion (namely minion upgrading itself) and, somehow, found its way into all unit files in a "KillMode=process all the things" process.

Yes, I have finally been able to provide one way to reliably reproduce the issue during the salt-master bootstrap and I don't see why that is specific to vagrant. And, even if it would be specific to vagrant it should be solved, shouldn't it? Why do you want to keep KillMode=process for the salt master? What purpose does it serve?

It looks like a race condition and presumably people are just not running into it because their nameservers are fast enough.

@thatch45 Why are you confident that it'll break other systems?

KillMode=process for the master was introduced in a PR that discusses the minion and nobody questioned it back then and it is not apparent to me why it would be needed. If it is necessary it should, naturally, be kept.

@wwentland
Copy link
Contributor Author

#27243 is another report that might be related.

@wwentland
Copy link
Contributor Author

@cachedout Do you know why KillMode=process was introduced for the salt-master or why it is needed (or not) ?

@cachedout
Copy link
Contributor

cachedout commented Jun 8, 2016

@BABILEN I'm afraid I don't. I traced the commit history back to here [https://github.com//pull/32857] and then the trail grows a little cold.

@thatch45
Copy link
Contributor

thatch45 commented Jun 8, 2016

I remember @terminalmage going off on why it was needed, he likely knows

@terminalmage
Copy link
Contributor

terminalmage commented Jun 8, 2016

@gtmanfred is the one that discovered the need for this, I believe. Specifically it was done to address the fact that when a package is updated, systemd's default way of bringing down the service is to send an immediate SIGTERM to all procs in the cgroup (see here).

This has unfortunate side effects when upgrading the salt-minion package, as the salt-minion service (and thus the package manager command it was running) is killed in the middle of operation, resulting in a corrupted package database in most cases.

I'm not sure if it is necessary to do this for the salt-master service as well, it doesn't seem necessary to me. I'm not sure why this change was made to anything but the salt-minion service unit.

@gtmanfred
Copy link
Contributor

Should only have been needed on the minion to make sure that the original
process lives and returns.

I never investigated the implications on the master.
On Wed, Jun 8, 2016 at 5:39 PM Erik Johnson notifications@github.com
wrote:

@gtmanfred https://github.com/gtmanfred is the one that discovered the
need for this, I believe. Specifically it was done to address the fact that
when a package is updated, systemd's default way of bringing down the
service is to send a sigterm to all procs (see here
https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=
).

This has unfortunate side effects when upgrading the salt-minion package,
as the salt-minion service (and thus the package manager command it was
running) is killed in the middle of operation, resulting in a corrupted
package database in most cases.

I'm not sure if it is necessary to do this for the salt-master service as
well, it doesn't seem necessary to me. I'm not sure why this change was
made to anything but the salt-minion service unit.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#33792 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAssocK5t38Fwf83MjWNwTkT78WU95ZNks5qJ0SzgaJpZM4Iu75X
.

@wwentland
Copy link
Contributor Author

wwentland commented Jun 9, 2016

@terminalmage, @gtmanfred, Great, thank you for confirming.

It should be mentioned that it is also set for salt-api, but I haven't investigated the process management in that case, so can't say if it should or should not have KillMode=process.

@terminalmage
Copy link
Contributor

It looks like we are pretty much in agreement then that the KillMode change should never have been made anywhere but the minion.

@thatch45
Copy link
Contributor

thatch45 commented Jun 9, 2016

So this is good to merge then?

@terminalmage
Copy link
Contributor

terminalmage commented Jun 9, 2016

Not quite, @BABILEN should also make this change in the following files:

pkg/arch/salt-master.service
pkg/deb/salt-api.service
pkg/deb/salt-master.service
pkg/rpm/salt-master.service
pkg/salt-master.service

Additionally, we need to make sure that @dmurphy18 propagates these changes as necessary to salt-pack for subsequent package builds.

@wwentland
Copy link
Contributor Author

wwentland commented Jun 9, 2016

@terminalmage Most of those files are simply symlinks to ./pkg/salt-master.service which is being changed alongside with pkg/deb/salt-master.service in this PR.

 ~/src/github.com/saltstack/salt $ find . -name "*salt-master.service*" -exec ls -l {} \;
lrwxrwxrwx 1 babilen babilen 22 May 29 16:03 ./pkg/arch/salt-master.service -> ../salt-master.service
-rw-r--r-- 1 babilen babilen 248 Jun  9 20:09 ./pkg/deb/salt-master.service
-rw-r--r-- 1 babilen babilen 190 Jun  9 20:09 ./pkg/salt-master.service
lrwxrwxrwx 1 babilen babilen 22 May 29 16:03 ./pkg/rpm/salt-master.service -> ../salt-master.service

I will make a different PR for salt-api as I'd prefer to keep this one specific to the salt-master.

@terminalmage
Copy link
Contributor

OK, I hadn't looked, I just did a recursive grep. Good to know. I'll merge, then.

@terminalmage terminalmage merged commit 62edce6 into saltstack:develop Jun 9, 2016
@dmurphy18
Copy link
Contributor

@terminalmage @BABILEN I shall propagate the changes in salt-pack, about to package 2016.3.1 and I'll get the changes into that point release.

@terminalmage
Copy link
Contributor

@dmurphy18 remember to get KillMode removed from salt-api.service as well.

@dmurphy18
Copy link
Contributor

@terminalmage will do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

systemd's service file should use the 'process' KillMode option on Debian also
7 participants