Live salt-master Profiling with SIGUSR2 fails #24276

Closed
markuskramerIgitt opened this Issue May 31, 2015 · 13 comments

markuskramerIgitt commented May 31, 2015

According to http://docs.saltstack.com/en/latest/topics/troubleshooting/master.html#live-salt-master-profiling
killall -SIGUSR2 salt-master
should turn on profiling.

Instead, the command turns salt-master into a zombie process.


salt --versions-report
                  Salt: 2014.7.4
                Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
                Jinja2: 2.6
              M2Crypto: 0.21.1
        msgpack-python: 0.1.10
          msgpack-pure: Not Installed
              pycrypto: 2.6
               libnacl: Not Installed
                PyYAML: 3.10
                 ioflo: Not Installed
                 PyZMQ: 13.1.0
                  RAET: Not Installed
                   ZMQ: 3.2.3
                  Mako: 0.7.0
 Debian source package: 2014.7.4+ds-1~bpo70+1

jacksontj commented Jun 2, 2015

If you look in the master logs, were there any exceptions? I've used this feature a few times and haven't had issues with it.

markuskramerIgitt commented Jun 3, 2015

/var/log/salt/master was empty

I started the salt-master in the foreground with
sudo service salt-master stop && sudo salt-master -l info
and got on-screen logging, part of which also goes to /var/log/salt/master.

I issued
killall -SIGUSR2 salt-master

On the screen I instantly got

Process Reactor-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 687, in run
    for data in self.event.iter_events(full=True):
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 374, in iter_events
    data = self.get_event(tag=tag, full=full)
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 345, in get_event
    ret = self._get_event(wait, tag, pending_tags)
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 288, in _get_event
    socks = dict(self.poller.poll(wait * 1000))
  File "/usr/lib/python2.7/dist-packages/zmq/sugar/poll.py", line 97, in poll
    return zmq_poll(list(self.sockets.items()), timeout=timeout)
  File "_poll.pyx", line 116, in zmq.core._poll.zmq_poll (zmq/core/_poll.c:1598)
  File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/_poll.c:1965)
ZMQError: Interrupted system call

/var/log/salt/master does not contain the above traceback, only a KeyboardInterrupt exception.

One salt-master process turned into a zombie.

I stopped the salt-master service
sudo service salt-master stop

And started the salt-master service
sudo service salt-master start

jacksontj added a commit to jacksontj/salt that referenced this issue Jun 4, 2015

Fix for saltstack#24276
poller.poll ends up calling the poll system call under the hood. If the process receives a signal while it is inside poll(), the call raises `ZMQError: Interrupted system call`, which we should catch so that the loop can continue.
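
Below is a minimal sketch of that catch-and-continue pattern. It is not the code from #24405 itself. `poll_ignoring_eintr` is a hypothetical helper, and the poll function is passed in, so the sketch runs without pyzmq installed. It assumes the raised exception exposes an `errno` attribute, as pyzmq's `ZMQError` and Python 3's `OSError` do.

```python
import errno


def poll_ignoring_eintr(poll_func, timeout_ms):
    """Call poll_func(timeout_ms), treating an interrupted system call as "no events".

    poll_func stands in for a bound method such as zmq.Poller().poll, which
    returns a list of (socket, event_mask) pairs.
    """
    try:
        return poll_func(timeout_ms)
    except Exception as exc:
        # pyzmq's ZMQError and Python 3's OSError both carry an .errno attribute.
        if getattr(exc, "errno", None) == errno.EINTR:
            # A signal (for example SIGUSR2) interrupted poll(); report no
            # events so the caller's loop simply polls again.
            return []
        raise
```

In a loop like the one in the traceback, it would be used as `socks = dict(poll_ignoring_eintr(self.poller.poll, wait * 1000))`. An empty dict then just means "nothing arrived this round", while any other error still propagates.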

jacksontj commented Jun 4, 2015

Relatively easy fix (#24405). Basically, the process that threw the backtrace was the reactor. That process watches the event bus and reacts to those events. The problem was that the call to poll() wasn't inside the try/except block. So if the process received a signal while inside poll(), it would throw this exception, and a relatively quiet master spends almost all of its time in that call.

Generally speaking, I'd suggest being more specific with SIGUSR2 (signalling only the process you intend to profile), unless the goal is to get all of the profiling data, which this will do. In case you didn't know: if you install python-setproctitle, each salt-master process will show a name (in ps) describing what it actually does.
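
Following the suggestion to signal only the process you want, here is a minimal Linux-only sketch. It reads /proc to find salt-master processes by the role name setproctitle gives them, such as `MWorker` or `EventPublisher`, and sends SIGUSR2 to a chosen PID. The helper names are hypothetical. The matching assumes the retitled command line contains a word ending in `salt-master` plus the role as a separate word, as in `/usr/bin/python /usr/bin/salt-master MWorker`.

```python
import os
import signal


def find_salt_master_pids(role, proc_root="/proc"):
    """Return sorted PIDs of salt-master processes whose command line names `role`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except (IOError, OSError):
            continue  # process exited or is not readable
        # cmdline is normally NUL-separated; setproctitle may leave spaces instead.
        words = raw.replace(b"\0", b" ").decode("utf-8", "replace").split()
        # Exact word match, so "MWorker" does not also match "MWorkerQueue".
        if any(w.endswith("salt-master") for w in words) and role in words:
            pids.append(int(entry))
    return sorted(pids)


def send_sigusr2(pid):
    """Send SIGUSR2 to a single process (needs the same user as the master, usually root)."""
    os.kill(pid, signal.SIGUSR2)
```

For example, `for pid in find_salt_master_pids("EventPublisher"): send_sigusr2(pid)` would signal only the event publisher rather than every salt-master process.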

thatch45 added a commit that referenced this issue Jun 4, 2015

cachedout commented Jun 5, 2015

Nice catch, @jacksontj !

jfindlay commented Jun 5, 2015

@markuskramerIgitt, close this issue if you think @jacksontj's fix solved your problem, thanks.

markuskramerIgitt commented Jun 6, 2015

Hello @jacksontj,
thank you very much for the fix!
I do indeed use the reactor and orchestration.

Thank you also for your suggestion to send SIGUSR2 to only one process.
I can easily identify the one process that causes the high load: it has 17-19 threads and consumes about three times as much CPU as the other salt-master processes, which have only 7-9 threads each.
(After about 24 hours we generally need to restart the salt-master, because by then it stays at 100% CPU, whereas right after a restart it needs little CPU.) I hope profiling will bring more clarity. What can I do to investigate in the meantime?
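
The thread count used above to spot the busy process can also be read programmatically. Below is a minimal Linux-only sketch that reads the `Threads:` line of /proc/<pid>/status. It assumes the PIDs were already collected, for example with `pgrep -f salt-master`. The function names are hypothetical, and the sketch ranks by thread count only, not by CPU usage.

```python
import os


def thread_counts(pids, proc_root="/proc"):
    """Map each PID to its thread count from /proc/<pid>/status; skip vanished PIDs."""
    counts = {}
    for pid in pids:
        path = os.path.join(proc_root, str(pid), "status")
        try:
            with open(path) as fh:
                for line in fh:
                    if line.startswith("Threads:"):
                        counts[pid] = int(line.split()[1])
                        break
        except (IOError, OSError):
            continue  # process exited between listing and reading
    return counts


def busiest_by_threads(pids, proc_root="/proc"):
    """Return the PID with the most threads, or None if none could be read."""
    counts = thread_counts(pids, proc_root)
    return max(counts, key=counts.get) if counts else None
```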

Thank you a third time for pointing me to setproctitle, which I had never heard of.
I look forward to using it, but I have to install it first.

Are these the correct steps for installation on Debian? Do I have to configure something?

sudo apt-get install python-setuptools
sudo easy_install setproctitle

markuskramerIgitt commented Jun 6, 2015

Hello @jfindlay
Which Salt version will contain the fix from @jacksontj?

jfindlay commented Jun 6, 2015

@markuskramerIgitt, the next release will be 2015.5.3, which will come out in about a month.

rallytime commented Sep 9, 2015

@markuskramerIgitt Do you consider this issue resolved, now that 2015.5.3 (and 2015.5.5) have been released? Or does more work need to be done here?

markuskramerIgitt commented Sep 13, 2015

Hi @rallytime, thank you for the reminder.
With 2015.5.3 (this is the highest packaged version on Debian) I see an improvement, but the issue is not resolved.
The salt-master no longer turns into a zombie, but I cannot find the yappi results.

I reread https://docs.saltstack.com/en/latest/topics/troubleshooting/master.html#live-salt-master-profiling

Running a background salt-master
Issuing sudo killall -SIGUSR2 salt-master
The salt-master continues to work.
Issuing some salt commands
Issuing a second sudo killall -SIGUSR2 salt-master
There is no profiling output file in /tmp.
/var/log/salt/master is empty

Running a foreground salt-master by issuing
sudo service salt-master stop && sudo salt-master -l info
Issuing sudo killall -SIGUSR2 salt-master
The foreground salt-master does not "report filename for the results"
The salt-master continues to work.
Issuing some salt commands
Issuing a second sudo killall -SIGUSR2 salt-master
Terminating the foreground salt-master with Ctrl-C
There is no profiling output file in /tmp.
/var/log/salt/master does not mention yappi.

sudo salt --versions-report
                  Salt: 2015.5.3
                Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
                Jinja2: 2.6
              M2Crypto: 0.21.1
        msgpack-python: 0.1.10
          msgpack-pure: Not Installed
              pycrypto: 2.6
               libnacl: Not Installed
                PyYAML: 3.10
                 ioflo: Not Installed
                 PyZMQ: 13.1.0
                  RAET: Not Installed
                   ZMQ: 3.2.3
                  Mako: 0.7.0
               Tornado: Not Installed
 Debian source package: 2015.5.3+ds-1~bpo70+2
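
The thread refers to yappi output, so one thing worth ruling out is whether yappi is importable by the interpreter the salt-master runs under. This is an assumption, not something confirmed here: if the handler cannot import the profiler, there may be nothing to write. Below is a minimal sketch that checks this, along with setproctitle. `module_available` and `report` are hypothetical helpers, and the script must be run with the same interpreter as salt-master (for example `sudo /usr/bin/python check_modules.py`) for the answer to be meaningful.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def report(modules=("yappi", "setproctitle")):
    """Map each module name to whether it is importable."""
    return dict((name, module_available(name)) for name in modules)


if __name__ == "__main__":
    # Run this with the interpreter salt-master uses, e.g. /usr/bin/python.
    print(sys.executable)
    for name, ok in sorted(report().items()):
        print("{0}: {1}".format(name, "available" if ok else "missing"))
```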

rallytime commented Sep 14, 2015

Thanks for the update @markuskramerIgitt. I'll remove the Fixed Pending Verification label.

@jacksontj or @cachedout: just an FYI ping on this one.

markuskramerIgitt commented Sep 16, 2015

Hello @rallytime, I would gladly verify the fix. I just can't find the yappi output, and I don't know where to look for it.
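
Since the output filename is not known here, a minimal sketch that lists files recently written to the temporary directory (`tempfile.gettempdir()`, normally /tmp) may help locate whatever the master wrote after the second SIGUSR2. `recent_files` is a hypothetical helper. It does not know Salt's naming scheme; it simply returns every regular file modified within the window, newest first.

```python
import os
import stat
import tempfile
import time


def recent_files(directory=None, since_seconds=3600):
    """Return (mtime, path) pairs for regular files modified within the window, newest first."""
    directory = directory or tempfile.gettempdir()
    cutoff = time.time() - since_seconds
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            st = os.stat(path)
        except OSError:
            continue  # removed between listdir() and stat()
        if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
            found.append((st.st_mtime, path))
    found.sort(reverse=True)
    return found


if __name__ == "__main__":
    for mtime, path in recent_files():
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)
```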

Btw, I installed setproctitle on Debian with:

apt-get -V install python-pip
apt-get -V install python-dev
pip install setproctitle
service salt-master stop
service salt-master start

resulting in this htop output:

/sbin/init
├─ /usr/bin/python /usr/bin/salt-master ProcessManager
│  ├─ /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorkerQueue
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  └─ /usr/bin/python /usr/bin/salt-master MWorker
│  ├─ /usr/bin/python /usr/bin/salt-master EventPublisher
│  ├─ /usr/bin/python /usr/bin/salt-master Publisher
│  └─ /usr/bin/python /usr/bin/salt-master Maintenance

stale bot commented Jan 3, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Jan 3, 2018

stale bot closed this Jan 10, 2018
