Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use KillMode=process for salt-minion.service #44427

Merged
merged 1 commit into from
Jan 3, 2018
Merged

use KillMode=process for salt-minion.service #44427

merged 1 commit into from
Jan 3, 2018

Conversation

samodid
Copy link
Contributor

@samodid samodid commented Nov 7, 2017

What does this PR do?

Set KillMode=process for salt-minion systemd service.

What issues does this PR fix or reference?

With KillMode=process systemd will not kill all process in same group. (default is KillMode=control-group, man 5 systemd.kill )
That allow shutdown(upgrade) salt-minion without side effects.

Previous Behavior

Systemd kill all processes in same group.

New Behavior

only the main process itself is killed.

Tests written?

No

Commits signed with GPG?

No

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@vsamodidvs
Copy link

I think I need to add more description to attract attention :)

As you know systemd lock service processes in cgroup, and all child processes. And default behavior to during service stop/restart - kill all processes in that group( sometimes really annoying )

You can easily run in situation then systemd kills process that you do not expect.

# salt lxc-dev-co7-smaster-01 cmd.run 'dhclient -nw eth0' --show-jid -l quiet
jid: 20171109162243596210
lxc-dev-co7-smaster-01:


-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------

check for dhclient process

# pgrep dhclient
15290
# grep 15290 /sys/fs/cgroup/systemd/system.slice/salt-minion.service/cgroup.procs 
15290

Stop salt-minion

# systemctl stop salt-minion
# grep 15290 /sys/fs/cgroup/systemd/system.slice/salt-minion.service/cgroup.procs 
grep: /sys/fs/cgroup/systemd/system.slice/salt-minion.service/cgroup.procs: No such file or directory
# pgrep dhclient
# 

it's not comfortably to use systemd-run for such processes, especially if we have that option for suse packages

https://github.com/saltstack/salt/blob/develop/pkg/suse/salt-minion.service#L10

@cachedout could you take a look ?

@cachedout cachedout requested review from gtmanfred and removed request for terminalmage November 10, 2017 16:26
@cachedout
Copy link
Contributor

@gtmanfred Can you please review this?

Copy link
Contributor

@gtmanfred gtmanfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with doing this. But the last time we did this, I was told we ran into problems so we removed it, which is why @terminalmage removed it when he wrote all the systemd-run stuff to reparent the dpkg/apt and rpm/yum commands into a seperate scope so that it would not get killed in the middle of the run.

I would still say that erik needs to review it, then we need to make sure that @dmurphy18 knows about it to include it in the new packages in salt-pack.

@dmurphy18
Copy link
Contributor

dmurphy18 commented Nov 13, 2017

I want Erik's review given he has better knowledge on the issue.
Testing upgrade path appeared to work fine.

Copy link
Contributor

@dmurphy18 dmurphy18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want Erik's review given he has better knowledge on the issue. Testing upgrade path appeared to work fine.
I have no issues with the change

@samodid
Copy link
Contributor Author

samodid commented Nov 22, 2017

Hi guys.
Just want to update topic and try to move it. We use workaround - we have states which extend salt-minion.service

# cat /etc/systemd/system/salt-minion.service.d/salt-minion.conf
[Service]
KillMode=process

With that we achieve same behavior. It works for us and we do not have issues.

It will be good to have that option from packages.

@dmurphy18
Copy link
Contributor

@terminalmage do you have any opinions on this PR

@terminalmage
Copy link
Contributor

I honestly don't know what the best way to go is on this. I believe that we removed it because having it in place meant that Salt relied on our own signal handling to bring down the processes after the main PID exited, and there were certain circumstances in which they didn't.

I wouldn't recommend adding it back until it is thoroughly tested and we can confirm that we aren't leaving extra processes running.

@samodid
Copy link
Contributor Author

samodid commented Nov 28, 2017

@terminalmage
Thanks for reply.

I do not know all history to advocate, but if it could be an issue why do you enable that option for SUSE?have just checker rpm in salt repo
http://repo.saltstack.com/opensuse/SLE_12_SP3/x86_64/salt-minion-2016.11.4-13.5.x86_64.rpm

part of salt-minion.service from rpm

[Service]
Type=notify
NotifyAccess=all
LimitNOFILE=8192
ExecStart=/usr/bin/salt-minion
KillMode=process
Restart=on-failure
RestartSec=15

What test do you need to approve that change ?
Are any way how I can help to test ?

@gtmanfred
Copy link
Contributor

We do not manage the suse packages. SUSE packages those, and then we just pull them down to our repo page.

Daniel

@samodid
Copy link
Contributor Author

samodid commented Nov 28, 2017

ok, that adds more input for me

@terminalmage terminalmage removed their request for review December 5, 2017 14:28
@rallytime
Copy link
Contributor

I agree with @terminalmage on this one. Especially in regards to:

I wouldn't recommend adding it back until it is thoroughly tested and we can confirm that we aren't leaving extra processes running.

Since we have gone back and forth on this line somewhat recently, I'd really like to make sure this is thoroughly tested before making any changes so as not to cause regressions. I tracked down the PR that originally explains the issue that removing this line was attempting to solve. It's in #36806.

@rallytime
Copy link
Contributor

I followed up on this with @terminalmage today. @frogunder has tested this and given it the green light. Let's get this in. This also fixes #43340.

@dmurphy18 - FYI. This is going in here to develop, but I am also going to back-port it for 2017.7 as it fixes a blocking bug there.

Thanks @samodid :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ZZZ[Done]-back-ported-bf RETIRED The pull request has been back-ported to an older branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants