RFC: Add additional ZMQ tuning parameters necessary for 1k+ minions per master [WIP] #27606
Changed the title on Oct 1, 2015 from "RFC: Add additional ZMQ tuning parameters necessary for 1,000+ minions per server" to "RFC: Add additional ZMQ tuning parameters necessary for 1k+ minions per master".
These specific tuning parameters were used with the following configuration:
It seems that the most significant factor in servicing large numbers of minions is CPU: core count and clock speed (GHz). If the ZMQ queues back up too far into memory, the machine gets swamped and falls behind, and the OOM killer starts whacking processes. A slow job cache can also undermine all of this.
In my experience, the bottleneck in the vast majority of cases turns out to be that REQ socket. A publish initially goes through the REQ socket (locally). In addition, if many minions return data at close to the same time, all of that data goes through the REQ socket.
To avoid confusion: I don't mean the zmq.REQ socket on the minion, but rather the ROUTER/DEALER/REQ sockets on the master.
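To illustrate why queues backing up into memory are dangerous, here is a toy model in pure Python (it is not pyzmq and does not open real sockets). It assumes a master-side queue that receives a burst of job returns before anything is drained, and compares an unbounded queue with one capped by a high-water mark. The names `HwmQueue` and `burst_of_returns`, the 2000-minion burst, and the 4 KiB payload are hypothetical. Real ZMQ sockets configured with `SNDHWM`/`RCVHWM` either drop or block once the mark is reached, depending on socket type; this model only shows the drop behaviour.

```python
from collections import deque


class HwmQueue:
    """Toy model of a ZMQ socket queue bounded by a high-water mark (HWM).

    Hypothetical helper for illustration only: it mimics drop-on-full
    behaviour (as PUB sockets do), not the blocking behaviour of DEALER/REQ.
    """

    def __init__(self, hwm=None):
        # hwm=None models an unbounded queue: nothing caps memory growth.
        self.hwm = hwm
        self.messages = deque()
        self.dropped = 0

    def send(self, msg):
        if self.hwm is not None and len(self.messages) >= self.hwm:
            self.dropped += 1  # queue is full: discard instead of growing
            return False
        self.messages.append(msg)
        return True

    def recv(self):
        return self.messages.popleft()

    def queued_bytes(self):
        # Approximate memory held by queued payloads.
        return sum(len(m) for m in self.messages)


def burst_of_returns(queue, minions, payload_bytes):
    """Simulate every minion returning a result before anything is drained."""
    for _ in range(minions):
        queue.send(b"x" * payload_bytes)
    return queue


if __name__ == "__main__":
    unbounded = burst_of_returns(HwmQueue(), minions=2000, payload_bytes=4096)
    bounded = burst_of_returns(HwmQueue(hwm=500), minions=2000, payload_bytes=4096)
    print(unbounded.queued_bytes(), unbounded.dropped)  # 8192000 0
    print(bounded.queued_bytes(), bounded.dropped)      # 2048000 1500
```

The trade-off the model exposes is memory (unbounded queue) versus dropped or delayed messages (bounded queue), which is one reason HWM values are worth exposing as tunables rather than hard-coding.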
Added a commit to this pull request on Oct 16, 2015.
2 of 5 checks passed
    20:47:15 Process EventPublisher-18:
    20:47:15 Traceback (most recent call last):
    20:47:15   File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    20:47:15     self.run()
    20:47:15   File "/testing/salt/utils/event.py", line 882, in run
    20:47:15     self.epub_sock.setsockopt(zmq.SNDHWM, self.opts.get('event_publisher_pub_hwm'))
    20:47:15   File "zmq/backend/cython/socket.pyx", line 390, in zmq.backend.cython.socket.Socket.set (zmq/backend/cython/socket.c:4138)
    20:47:15 TypeError: expected int, got: None
@jtand, your stack trace is caused by some objects not passing opts through correctly. I thought all of those locations had been fixed; I'll look into it.
In the past, because opts wasn't correctly passed in during object initialization, default option values were added both to config.py and to the location where the option was used. The object initialization should be fixed instead of scattering default values across multiple files.
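A minimal sketch of the approach argued for above, assuming a single defaults dict in the style of config.py: defaults live in one place, they are merged into opts once at load time, and objects receive the full opts instead of supplying their own fallbacks. `DEFAULT_MASTER_OPTS`, `load_master_config`, and `EventPublisher` here are simplified stand-ins, not Salt's actual code, and the default value of 1000 is an assumed placeholder.

```python
import copy

# Single source of truth for defaults (loosely modelled on config.py;
# the value 1000 is an assumed placeholder, not Salt's real default).
DEFAULT_MASTER_OPTS = {
    "event_publisher_pub_hwm": 1000,
}


def load_master_config(overrides=None):
    """Merge user settings over the defaults once, at load time."""
    opts = copy.deepcopy(DEFAULT_MASTER_OPTS)  # never mutate the defaults
    opts.update(overrides or {})
    return opts


class EventPublisher:
    """Simplified stand-in for an object that configures a PUB socket."""

    def __init__(self, opts):
        # Index opts directly: a missing key raises KeyError naming the
        # option, instead of opts.get() silently returning None and the
        # failure surfacing later inside setsockopt() as
        # "TypeError: expected int, got: None".
        self.pub_hwm = int(opts["event_publisher_pub_hwm"])


if __name__ == "__main__":
    opts = load_master_config({"event_publisher_pub_hwm": 5000})
    print(EventPublisher(opts).pub_hwm)                  # 5000
    print(EventPublisher(load_master_config()).pub_hwm)  # 1000
    try:
        EventPublisher({})  # opts not passed through correctly
    except KeyError as exc:
        print("missing option:", exc)
```

Failing with a KeyError at construction time points directly at the object that was handed incomplete opts, which is easier to trace than a TypeError raised later from inside the ZMQ bindings.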
I already fixed this issue earlier today.
On Tue, Oct 20, 2015 at 4:02 PM, plastikos email@example.com wrote: