zero mq is spinning - salt-master cpu 100% - not responding #41612

Closed
amitsehgal opened this issue Jun 6, 2017 · 4 comments

amitsehgal commented Jun 6, 2017

Description of Issue/Question

salt-master has stopped responding in my environment. I have ~220 minions on a single salt-master with plenty of resources. I don't see file descriptors being an issue: only ~99 fds are used by the master process, and ~9k are in use system-wide.
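For reference, one way to double-check the per-process fd count is a small script along these lines (a rough sketch, assuming the third-party psutil package is installed; it is not part of the original report):

```python
# Rough sketch (illustrative, not from the original report): count open file
# descriptors for each salt-master process, assuming psutil is installed.
import psutil

for proc in psutil.process_iter():
    try:
        if 'salt-master' in ' '.join(proc.cmdline()):
            # num_fds() is Unix-only and reports open fds for that process
            print('pid=%d fds=%d' % (proc.pid, proc.num_fds()))
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
```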

Setup

We are not using any custom configs or SLS files. We did have one disk-full (no space left on device) event. After that we removed files; disk usage is now around 80%, so ~20% (~5 GB) is still available. We also restarted the salt-master afterwards. However, it is hanging again (stack trace below). A sketch for checking free space follows.
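Free space on the partition holding the master's data can be confirmed with something like this (a rough sketch; the /var/cache/salt path is an assumption about where the cache lives):

```python
# Rough sketch: report free space on the partition holding the master's data.
# The /var/cache/salt path is an assumption, not taken from the issue.
import os

st = os.statvfs('/var/cache/salt')
free_gb = st.f_bavail * st.f_frsize / float(1024 ** 3)
total_gb = st.f_blocks * st.f_frsize / float(1024 ** 3)
print('free: %.1f GB of %.1f GB (~%.0f%% used)' % (
    free_gb, total_gb, 100.0 * (1.0 - free_gb / total_gb)))
```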

Steps to Reproduce Issue

Here's the Salt debug stack trace that was generated:
```
======== Salt Debug Stack Trace =========
  File "/usr/bin/salt-master", line 22, in <module>
    salt_master()
  File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 47, in salt_master
    master.start()
  File "/usr/lib/python2.7/site-packages/salt/cli/daemons.py", line 207, in start
    self.master.start()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 589, in start
    self.process_manager.add_process(self.run_reqserver, kwargs=kwargs, name='ReqServer')
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 319, in add_process
    process.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 502, in run_reqserver
    reqserv.run()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 616, in _run
    return self._original_run()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 723, in run
    self.__bind()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 715, in __bind
    name=name
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 317, in add_process
    process.start()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 662, in start
    super(SignalHandlingMultiprocessingProcess, self).start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 616, in _run
    return self._original_run()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 886, in run
    self.__bind()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 816, in __bind
    self.io_loop.start()
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "/usr/lib64/python2.7/site-packages/tornado/ioloop.py", line 840, in start
    event_pairs = self._impl.poll(poll_timeout)
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 122, in poll
    z_events = self._poller.poll(1000*timeout)
  File "/usr/lib64/python2.7/site-packages/zmq/sugar/poll.py", line 99, in poll
    return zmq_poll(self.sockets, timeout=timeout)
=========================================
```
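The trace shows the ReqServer process sitting inside the ZeroMQ/Tornado IO loop's poll() call. For context, a stack dump like this can be produced from a running Python process with a signal handler along these lines (a minimal standalone sketch, not Salt's actual implementation):

```python
# Minimal sketch (not Salt's code): dump the Python stacks of a running process
# when it receives SIGUSR1, producing output similar to the trace above.
import signal
import sys
import traceback


def dump_stacks(signum, frame):
    for thread_id, stack in sys._current_frames().items():
        print('Thread %s:' % thread_id)
        traceback.print_stack(stack)


signal.signal(signal.SIGUSR1, dump_stacks)
```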

Versions Report

```
salt --versions-report
Salt Version:
Salt: 2016.3.3

Dependency Versions:
cffi: Not Installed
cherrypy: 3.2.2
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Nov 6 2016, 00:28:07)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.3.1611 Core
machine: x86_64
release: 3.10.0-514.16.1.el7.x86_64
system: Linux
version: CentOS Linux 7.3.1611 Core
```

garethgreenaway added the info-needed (waiting for more info) label on Jun 6, 2017
garethgreenaway (Contributor) commented:

@amitsehgal Sounds like maybe the salt master didn't restart correctly. If you stop the process using the correct methods, do you see any rogue salt-master processes remaining?
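For example, something along these lines can list lingering salt-master processes and their parents (a rough sketch, assuming psutil is available; it is only illustrative):

```python
# Rough sketch (illustrative only): list salt-master processes with their parent
# pids, to spot stragglers that are not children of the main master.
import psutil

for proc in psutil.process_iter():
    try:
        if 'salt-master' in ' '.join(proc.cmdline()):
            print('pid=%d ppid=%d' % (proc.pid, proc.ppid()))
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
```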

amitsehgal commented Jun 7, 2017

@garethgreenaway Thanks for the prompt response. The master was responding fine until the disk filled up. I did restart salt-master and verified via systemctl status salt-master that there is one parent and ~10 child processes. I also increased the disk space on my box yesterday, and salt-master now looks happy. I would like to observe it for a few hours before I close the issue.

gtmanfred added this to the Blocked milestone on Jun 7, 2017
amitsehgal commented Jul 12, 2017

@garethgreenaway The salt master is keeping up and I'm no longer seeing this issue. Thanks for the prompt response and help.

stale bot commented Nov 14, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

The stale bot added the stale label on Nov 14, 2018
The stale bot closed this as completed on Nov 22, 2018