Description of Issue/Question
salt-master has stopped responding in my environment. I have ~220 minions on a single salt-master with plenty of resources. I don't think file descriptors are the issue: only ~99 fds are used by the master process, and about 9k are in use system-wide.
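For anyone wanting to repeat the file descriptor check, here is a minimal sketch that reads the Linux `/proc` filesystem. The helper names `count_open_fds` and `parse_file_nr` are made up for illustration and are not part of Salt; the PID used in the demo is the script's own and would need to be replaced with the salt-master PID.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries under <proc_root>/<pid>/fd (one entry per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


if __name__ == "__main__":
    # Assumption: replace os.getpid() with the salt-master PID and run with
    # enough privileges to read that process's fd directory.
    pid = os.getpid()
    try:
        print("fds open by pid %d: %d" % (pid, count_open_fds(pid)))
        with open("/proc/sys/fs/file-nr") as handle:
            print("allocated, unused, max: %s" % (parse_file_nr(handle.read()),))
    except OSError as exc:
        print("could not read /proc: %s" % exc)
```

Comparing the per-process count against the process's open-files limit, and the first `file-nr` field against the third, shows how close either is to exhaustion.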
Setup
We are not using any custom configs or SLS files. We did have one disk-full ("no space left") event, after which we removed files; usage is now around 80%, so 20% (~5G) is still available. We restarted salt-master afterwards, but it is hanging again (logs below).
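Since the hang followed a disk-full event, a quick free-space check on the filesystems the master writes to may be worth running. This is only a sketch: the paths listed are assumed locations for the master's cache and logs and may differ on a given install, and the 10% threshold and both helper names are arbitrary choices for illustration.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / float(usage.total)


def is_low_on_space(fraction_free, threshold=0.10):
    """True when less than `threshold` of the filesystem is free."""
    return fraction_free < threshold


if __name__ == "__main__":
    # Assumed paths; adjust to wherever the master keeps its cache and logs.
    for path in ("/var/cache/salt/master", "/var/log/salt", "/"):
        if not os.path.exists(path):
            continue
        fraction = free_fraction(path)
        flag = "LOW" if is_low_on_space(fraction) else "ok"
        print("%-24s %5.1f%% free  %s" % (path, fraction * 100, flag))
```

With 20% free, as reported above, this check would not flag the volume, which suggests that any lingering problem is more likely damage left over from the earlier disk-full event than a current shortage.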
Steps to Reproduce Issue
Here's the salt stack trace generated.
======== Salt Debug Stack Trace =========
  File "/usr/bin/salt-master", line 22, in <module>
    salt_master()
  File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 47, in salt_master
    master.start()
  File "/usr/lib/python2.7/site-packages/salt/cli/daemons.py", line 207, in start
    self.master.start()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 589, in start
    self.process_manager.add_process(self.run_reqserver, kwargs=kwargs, name='ReqServer')
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 319, in add_process
    process.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 502, in run_reqserver
    reqserv.run()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 616, in _run
    return self._original_run()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 723, in run
    self.__bind()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 715, in __bind
    name=name
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 317, in add_process
    process.start()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 662, in start
    super(SignalHandlingMultiprocessingProcess, self).start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 616, in _run
    return self._original_run()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 886, in run
    self.__bind()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 816, in __bind
    self.io_loop.start()
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "/usr/lib64/python2.7/site-packages/tornado/ioloop.py", line 840, in start
    event_pairs = self._impl.poll(poll_timeout)
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 122, in poll
    z_events = self._poller.poll(1000*timeout)
  File "/usr/lib64/python2.7/site-packages/zmq/sugar/poll.py", line 99, in poll
    return zmq_poll(self.sockets, timeout=timeout)
=========================================
Versions Report
`salt --versions-report
Salt Version:
Salt: 2016.3.3
Dependency Versions:
cffi: Not Installed
cherrypy: 3.2.2
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Nov 6 2016, 00:28:07)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: centos 7.3.1611 Core
machine: x86_64
release: 3.10.0-514.16.1.el7.x86_64
system: Linux
version: CentOS Linux 7.3.1611 Core`
@amitsehgal Sounds like maybe the salt master didn't restart correctly. If you stop the process using the correct methods do you see rogue salt-master processes remaining?
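One way to look for leftover processes after stopping the service is to scan `/proc` for command lines that mention salt-master. The sketch below assumes Linux; `find_processes` is a hypothetical helper written for this illustration, not a Salt utility.

```python
import os


def find_processes(needle, proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs whose command line contains `needle`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                raw = handle.read()
        except (IOError, OSError):
            continue  # the process exited or is not readable by this user
        # Arguments in cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\x00", b" ").decode("utf-8", "replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, cmdline in find_processes("salt-master"):
            print("%d\t%s" % (pid, cmdline))
```

If this prints anything after the service has been stopped, those PIDs are candidates for the rogue processes mentioned above.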
@garethgreenaway Thanks for the prompt response. The master was responding fine until the disk filled up. I did restart salt-master, and systemctl status salt-master confirms one parent and ~10 child processes. I also increased the disk space on my box yesterday, and salt-master now looks happy. I would like to observe it for a few hours before I close the issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.