salt-minion restart causes all spawned daemons to die on centos7 (systemd) #22993

Closed
jetpak opened this Issue Apr 23, 2015 · 11 comments

jetpak commented Apr 23, 2015

Symptom: restarting (or stopping) salt-minion on my systemd-based CentOS 7 box also killed all the daemon processes that the salt-minion had started.

Cause: this looks related to how systemd kills all processes in the control group of a service unit when that unit is stopped, in this case the salt-minion service.

Suggested fix: in /usr/lib/systemd/system/salt-minion.service, add this to the [Service] section:
KillMode=process

which, I believe, means that when salt-minion is shut down, the kill signal is sent only to the salt-minion process itself and other spawned daemon processes are left alone.

This fixed the issue for us.
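
For anyone who would rather not edit the packaged unit file under /usr/lib, the same setting can probably be applied through a systemd drop-in instead. This is only a sketch: the drop-in directory follows the usual systemd convention, and the file name `killmode.conf` is an arbitrary choice made for this example.

```ini
# /etc/systemd/system/salt-minion.service.d/killmode.conf
# (file name is arbitrary; any *.conf file in this directory is read)
[Service]
# Only signal the main salt-minion process on stop/restart,
# not every process in the unit's control group.
KillMode=process
```

After creating the file, `systemctl daemon-reload` followed by a minion restart should pick it up, and `systemctl show salt-minion -p KillMode` can be used to check which value is in effect.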

jfindlay added this to the Blocked milestone Apr 23, 2015

Contributor

jfindlay commented Apr 23, 2015

@jetpak, what is the output of salt --versions-report? Thanks.

jfindlay self-assigned this Apr 23, 2015

jetpak commented Apr 23, 2015

the salt-master is at 2015.2.0rc2 and the salt-minions are at 2014.7.1-1.el7

here is the version report:

    salt --versions-report

              Salt: 2015.2.0rc2
            Python: 2.7.5 (default, Jun 17 2014, 18:11:42)
            Jinja2: 2.7.2
          M2Crypto: 0.21.1
    msgpack-python: 0.4.4
      msgpack-pure: Not Installed
          pycrypto: 2.6.1
           libnacl: 1.4.2
            PyYAML: 3.10
             ioflo: Not Installed
             PyZMQ: 14.3.1
              RAET: Not Installed
               ZMQ: 3.2.5
              Mako: Not Installed


jfindlay modified the milestones: Lithium, Blocked Apr 23, 2015

jfindlay removed their assignment Apr 23, 2015

jfindlay modified the milestones: Approved, Lithium Apr 23, 2015

jfindlay self-assigned this Apr 23, 2015

Contributor

jfindlay commented Apr 23, 2015

@jetpak, thanks for the report.

Contributor

jfindlay commented Apr 23, 2015

@jetpak, will you tell me which daemon processes you're creating with the minion, and how you start them? Thanks.

jetpak commented Apr 23, 2015

We use a wide variety of servers and server clusters: solr, hadoop, mongo, etc. Not all of them have systemd startup scripts, so we start them as server groups using salt.

Some are started and daemonize themselves; some are invoked with nohup + &.

In all cases, these daemons have a ppid of 1.

I have tried (unsuccessfully) starting the daemons with the idiom recommended for restarting salt-minions:
http://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt

systemd is truly a weird beast...


jfindlay added a commit to jfindlay/salt that referenced this issue Apr 23, 2015

set systemd service KillMode to process for minion
Fixes #22993.

The change is only made for the minion process because, theoretically,
only the minion could create the problem described.  salt-master and
salt-syndic do not theoretically spawn non-salt processes during the
lifetime of their processes, whereas salt-minion does this by design.

The default behavior for systemd KillMode seems to be control-group,
which means all processes that share the same control group as the
minion process will also be killed by systemd when the minion service is
stopped (killed).

It is reasonable to expect that activity done on a system by a salt
minion should persist beyond the lifetime of the minion process, so
let's not kill procs that the minion starts even when the minion exits.

Contributor

jfindlay commented Apr 23, 2015

@jetpak, I can agree with you there. :-)

Member

eliasp commented Apr 25, 2015

> we use a real variety of servers and server clusters: solr, hadoop, mongo,
> etc. - not all have systemd startup scripts, so we start them as server
> groups using salt
> some are started and become daemons themselves, some are invoked with
> nohup+&

@jetpak: Do I understand this correctly: you're using a cmd.run state to start those services?

jetpak commented Apr 25, 2015

correct, cmd.run combined with nohup + &


Member

eliasp commented Apr 25, 2015

@jetpak OK, please don't do this. That's not how services are supposed to be spawned and controlled.
They'll inherit salt-minion's process hierarchy and environment, while a properly running system service should start from an untainted environment.
Besides that, running a service under the supervision of systemd will give you things like:

  • a deterministic/clean/untainted service/process environment
  • proper service control (also outside of SaltStack) for stopping/restarting/inspecting the service
  • automatic crash recovery
  • a proper status test of the service
  • logging to journald
  • inter-service dependencies
  • proper resource control through systemd's per-service cgroup encapsulation
  • security features like network isolation, private /tmp dirs, capability restrictions, …

Just do this:
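
As a rough illustration of the points in the list above (this is not eliasp's original snippet), a unit for one of these daemons might look something like the sketch below. The unit name `mydaemon`, the user, the binary path, and the `--foreground` flag are all hypothetical placeholders; the directives themselves (`Restart=`, `PrivateTmp=`, `User=`) are standard systemd options.

```ini
# /etc/systemd/system/mydaemon.service  (hypothetical name and paths)
[Unit]
Description=Example daemon supervised by systemd (illustrative sketch)
After=network.target

[Service]
# Assumption: the daemon can run in the foreground with this command.
ExecStart=/opt/mydaemon/bin/mydaemon --foreground
# Hypothetical unprivileged account for the service.
User=mydaemon
# Automatic crash recovery.
Restart=on-failure
# Give the service its own private /tmp.
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

With a unit like this in place, Salt could manage the daemon through its service state (for example `service.running`) rather than `cmd.run` with nohup, and the daemon would live in its own control group instead of the minion's.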

stduolc commented Oct 25, 2017

I changed the service file to the following, and it fixed the issue right away.

```
[Unit]
Description=The Salt Minion
Documentation=man:salt-minion(1) file:///usr/share/doc/salt/html/contents.html https://docs.saltstack.com/en/latest/contents.html
After=network.target salt-master.service

[Service]
Type=notify
PIDFile=/var/run/salt-minion.pid
KillMode=process
NotifyAccess=all
LimitNOFILE=8192
ExecStart=/usr/bin/salt-minion

[Install]
WantedBy=multi-user.target
```

Contributor

austinpapp commented Oct 25, 2017

@stduolc, I would recommend removing KillMode=process. Without significant testing, you may run into a regression here.

I would first look into running any long-running scripts/processes correctly under systemd. Any transient service can be executed via systemd-run. In fact, we've found that systemd versions >229 allow for some flexibility in starting such a transient service easily.
