salt-minion restart causes all spawned daemons to die on centos7 (systemd) #22993

Closed
jetpak opened this issue Apr 23, 2015 · 11 comments
Labels: Bug (broken, incorrect, or confusing behavior), fixed-pls-verify (fix is linked, bug author to confirm fix), P1 (Priority 1), Platform (relates to OS, containers, platform-based utilities like FS, system based apps), severity-high (2nd top severity, seen by most users, causes major problems)

jetpak commented Apr 23, 2015

Symptom: restarting (or stopping) salt-minion on my systemd-based CentOS 7 box also killed all the daemon processes that had been started by the salt-minion

Cause: this looks related to how systemd kills all processes in the control group (cgroup) of a service unit when the unit is stopped - in this case the salt-minion service

Suggested fix: add the following to the [Service] section of /usr/lib/systemd/system/salt-minion.service:
KillMode=process

which, i believe, means that when salt-minion is shut down, the kill signal is sent only to the main salt-minion process itself, leaving other spawned daemon processes alone

this fixed the issue for us
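
A file under /usr/lib/systemd/system belongs to the package and can be replaced on update, so the same setting may be easier to keep in a drop-in override. The sketch below assumes the conventional drop-in directory /etc/systemd/system/salt-minion.service.d; the function name and the file name killmode.conf are illustrative and not part of the report.

```shell
#!/bin/sh
# Sketch: write a drop-in that overrides only KillMode for salt-minion.
# write_killmode_dropin and killmode.conf are illustrative names.
write_killmode_dropin() {
    # Default to the conventional drop-in directory; another directory
    # can be passed as the first argument for a dry run.
    dropin_dir="${1:-/etc/systemd/system/salt-minion.service.d}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nKillMode=process\n' > "$dropin_dir/killmode.conf"
}

# Typical use, as root:
#   write_killmode_dropin
#   systemctl daemon-reload
#   systemctl show -p KillMode salt-minion
```

After the reload, the last command should report the effective KillMode for the unit, which is a quick way to confirm the override was picked up.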

jfindlay added the info-needed (waiting for more info) label Apr 23, 2015
jfindlay added this to the Blocked milestone Apr 23, 2015

@jfindlay (Contributor)

@jetpak, what is the output of salt --versions-report? Thanks.

jfindlay self-assigned this Apr 23, 2015

jetpak (Author) commented Apr 23, 2015

the salt-master is at 2015.2.0rc2 and the salt-minions are at 2014.7.1-1.el7

here is the version report:

salt --versions-report

              Salt: 2015.2.0rc2
            Python: 2.7.5 (default, Jun 17 2014, 18:11:42)
            Jinja2: 2.7.2
          M2Crypto: 0.21.1
    msgpack-python: 0.4.4
      msgpack-pure: Not Installed
          pycrypto: 2.6.1
           libnacl: 1.4.2
            PyYAML: 3.10
             ioflo: Not Installed
             PyZMQ: 14.3.1
              RAET: Not Installed
               ZMQ: 3.2.5
              Mako: Not Installed


jfindlay added the Platform, Bug, severity-high, and P1 labels and removed the info-needed label Apr 23, 2015
jfindlay modified the milestones: Lithium, Blocked Apr 23, 2015
jfindlay removed their assignment Apr 23, 2015
jfindlay modified the milestones: Approved, Lithium Apr 23, 2015
jfindlay self-assigned this Apr 23, 2015

@jfindlay (Contributor)

@jetpak, thanks for the report.

@jfindlay (Contributor)

@jetpak, will you tell me how and what daemon processes you're creating with the minion? Thanks.

jetpak (Author) commented Apr 23, 2015

we use a real variety of servers and server clusters: solr, hadoop, mongo,
etc. - not all have systemd startup scripts, so we start them as server
groups using salt

some are started and become daemons themselves, some are invoked with
nohup+&

in all cases, these daemons have a ppid of 1

i have tried (unsuccessfully) starting the daemons with the same idiom the FAQ
suggests for restarting salt-minions:
http://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt

systemd - is truly a weird beast...
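
A parent PID of 1 does not take a process out of the service's control group: systemd tracks membership by cgroup rather than by parent, so a daemonized or nohup'd child started through the minion stays in the salt-minion.service cgroup. A small sketch for checking this, assuming the cgroup v1 layout used on CentOS 7, where /proc/<pid>/cgroup contains a line of the form `N:name=systemd:/path`. The function name is illustrative.

```shell
# Sketch: print the systemd cgroup path from the contents of a
# /proc/<pid>/cgroup file supplied on stdin (cgroup v1 layout assumed).
systemd_cgroup_from() {
    awk -F: '$2 == "name=systemd" { print $3 }'
}

# Typical use, with the PID of a daemon that was started through the minion:
#   systemd_cgroup_from < /proc/<pid>/cgroup
# A path ending in salt-minion.service means the process still belongs to
# the minion's unit, whatever its parent PID is.
```

`systemctl status salt-minion` shows the same information from the other direction, listing every process currently in the unit's cgroup.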


@jfindlay (Contributor)

@jetpak, I can agree with you there. :-)

jfindlay added the fixed-pls-verify (fix is linked, bug author to confirm fix) label Apr 23, 2015

eliasp (Contributor) commented Apr 25, 2015

> we use a real variety of servers and server clusters: solr, hadoop, mongo,
> etc. - not all have systemd startup scripts, so we start them as server
> groups using salt
>
> some are started and become daemons themselves, some are invoked with
> nohup+&

@jetpak: Do I understand this correctly: you're using a cmd.run state to start those services?

jetpak (Author) commented Apr 25, 2015

correct, cmd.run combined with nohup + &


eliasp (Contributor) commented Apr 25, 2015

@jetpak Ok, please don't do this - that's not how services are supposed to be spawned and controlled.
They'll inherit salt-minion's process hierarchy and environment, while a properly running system service should start from an untainted environment.
Besides that, running a service under the supervision of systemd will give you things like:

  • a deterministic/clean/untainted service/process environment
  • proper service control (also outside of SaltStack) for stopping/restarting/inspecting the service
  • automatic crash recovery
  • a proper status test of the service
  • logging to journald
  • inter-service dependencies
  • proper resource control through systemd's per-service cgroup encapsulation
  • security features like network isolation, private /tmp dirs, capability restrictions, …

Just do this:
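
The snippet that followed here is not preserved. As a rough sketch of the general approach, and not the original commenter's text, a daemon can be given its own unit file and then be managed with systemctl or Salt's `service.running` state. Everything named below (`write_unit`, the `mydaemon` unit, its command path and flag) is hypothetical.

```shell
# Sketch: write a minimal unit file for a daemon that runs in the foreground.
# write_unit, the unit name and the command are hypothetical placeholders.
write_unit() {
    name="$1"
    exec_start="$2"
    unit_dir="${3:-/etc/systemd/system}"
    cat > "$unit_dir/$name.service" <<EOF
[Unit]
Description=$name
After=network.target

[Service]
# Default Type=simple: the command must stay in the foreground.
ExecStart=$exec_start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Typical use, as root:
#   write_unit mydaemon "/opt/mydaemon/bin/mydaemon --foreground"
#   systemctl daemon-reload
#   systemctl enable mydaemon
#   systemctl start mydaemon
```

A daemon started this way runs in the cgroup of its own unit, so stopping or restarting salt-minion does not touch it.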

@houming818

I changed the service file to the following, and that fixed it right away.

```
[Unit]
Description=The Salt Minion
Documentation=man:salt-minion(1) file:///usr/share/doc/salt/html/contents.html https://docs.saltstack.com/en/latest/contents.html
After=network.target salt-master.service

[Service]
Type=notify
PIDFile=/var/run/salt-minion.pid
KillMode=process
NotifyAccess=all
LimitNOFILE=8192
ExecStart=/usr/bin/salt-minion

[Install]
WantedBy=multi-user.target
```

@austinpapp (Contributor)

@stduolc i would recommend you remove KillMode=process. Without significant testing, you may run into a regression here.

i would first look into running any long-running scripts/processes correctly under systemd. Any transient service can be started via systemd-run. in fact, we've found that systemd versions newer than 229 allow some flexibility in starting such a transient service easily.
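
For a daemon without a unit file, `systemd-run` can start a command as a transient service in a unit of its own, outside the minion's cgroup. A minimal sketch, assuming `systemd-run` is available on the minion; the function name, the `SYSTEMD_RUN` override (a dry-run hook) and the unit and command in the usage line are hypothetical.

```shell
# Sketch: start a command as a transient systemd service.
# run_transient and SYSTEMD_RUN are illustrative; SYSTEMD_RUN exists only
# so the resulting command line can be inspected with SYSTEMD_RUN=echo.
run_transient() {
    unit="$1"
    shift
    "${SYSTEMD_RUN:-systemd-run}" --unit="$unit" "$@"
}

# Typical use, as root (hypothetical unit name and command):
#   run_transient mydaemon /opt/mydaemon/bin/mydaemon --foreground
```

In a `cmd.run` state, a call along these lines could take the place of the nohup + & idiom, since the transient unit is tracked by systemd independently of salt-minion.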
