nsqd: high CPU when idle w/ lots (thousands) of topics/consumers #236

Merged
merged 1 commit into from Sep 3, 2013

Projects

None yet

4 participants

@chetan
chetan commented Jul 29, 2013

Hi,

I ran a simple test to create thousands of topics with a single consumer each which led to nsqd bombing with the following error:

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7fff8cd2f212

and then a stack trace for, I guess, every running goroutine.

goroutine 1 [chan receive]:
main.main()
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/nsqd/main.go:145 +0xde6

goroutine 2 [syscall]:

goroutine 4 [syscall]:
os/signal.loop()
    /usr/local/Cellar/go/1.1.1/src/pkg/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
    /usr/local/Cellar/go/1.1.1/src/pkg/os/signal/signal_unix.go:27 +0x2f

goroutine 6 [chan receive]:
main.func·009()
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/nsqd/main.go:102 +0x39
created by main.main
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/nsqd/main.go:104 +0x697

goroutine 7 [select]:
main.(*NSQd).idPump(0xc20013a000)
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/nsqd/nsqd.go:353 +0x2bd
main.func·010()
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/nsqd/nsqd.go:92 +0x27
github.com/bitly/nsq/util.func·001()
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/util/wait_group_wrapper.go:14 +0x2c
created by github.com/bitly/nsq/util.(*WaitGroupWrapper).Wrap
    /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/nsqgopath.TQxJw0U2/src/github.com/bitly/nsq/util/wait_group_wrapper.go:16 +0xa6

[ ... etc ... ]

Here's the ruby program that triggered it:

require 'nsq'
reader = NSQ::Reader.new(:nsqd_tcp_addresses => '127.0.0.1:4150')
5000.times do |i|
  reader.subscribe("foo-#{i)}", "bar") do
  end
end
reader.run

This may not be a valid use case for nsq, but perhaps some throttling of topics may be necessary.

@michaelhood
Contributor

pthread_create failed: Resource temporarily unavailable

that failure is environment specific.. you've exhausted RLIMIT_NPROC (or equivalent) so pthread_create is returning EAGAIN.

it looks like you're on OS X, so check out man 2 setrlimit and ulimit -u.

@mreiferson
Member

managing limits on OSX is a complete shit show so I did some testing in a linux VM and was able to scale up to 4k connections each w/ a topic/channel at ~860mb resident and 40% CPU

so I think @michaelhood is right on the money, the crash you're seeing is just limits in OSX...

@chetan
chetan commented Jul 30, 2013

Ok, that makes sense. Maybe some warnings could be printed if ulimit -n/u is too low? I believe nginx does this (among others).

@mreiferson
Member

I did some profiling of nsqd while running with that many topics/channels/clients. It seems the bulk of the CPU time is spent in the runtime managing the heap of timers that are used in various places.

There is probably room for improvement with these code paths.

@chetan I'm not terribly keen on writing cross platform resource limit checks, but if you want to take a pass I will promise to review it :)

I'm going to close this specific issue since theres nothing to actually "fix". I will open up follow up issues when I better understand where nsqd is specifically spending its time at this scale.

@mreiferson mreiferson closed this Jul 30, 2013
@mreiferson
Member

screen shot 2013-07-30 at 9 42 47 am

@chetan
chetan commented Jul 30, 2013

Fair enough. A documentation note should suffice, perhaps mentioning that the normal use case is to have < 100 topics/channels.

@mreiferson
Member

Yea, that's a good suggestion. That wasn't the intention, though, so it looks like we have some work to do to be able to support those specific types of use cases.

@mreiferson
Member

actually, gonna re-open (updated the title)

@mreiferson mreiferson reopened this Jul 30, 2013
@mreiferson
Member

it's possible that http://golang.org/pkg/sync/#Cond might be useful in re-architecting the proliferation of timers

@mreiferson
Member

Please see the book I wrote in the commit message for the attached commit.

cc @jehiah, @michaelhood, and @jayridge would love to hear your thoughts...

@jehiah jehiah and 1 other commented on an outdated diff Sep 3, 2013
nsqd/client_v2.go
@@ -281,7 +281,7 @@ func (c *ClientV2) SetOutputBufferTimeout(desiredTimeout int) error {
timeout = -1
case desiredTimeout == 0:
// do nothing (use default)
- case desiredTimeout >= 5 &&
+ case desiredTimeout >= 250 &&
@jehiah
jehiah Sep 3, 2013 Member

I don't think we need to change this here. It should still be possible for a client to request a 5ms buffer timeout.

This change here limits the min to 250ms. We only want to change the default from 5ms to 250ms.

(good commit comments btw)

@mreiferson
mreiferson Sep 3, 2013 Member

Yea, I was thinking about that. I'm not convinced that a 5ms lower bound makes sense... to your point, maybe it shouldn't be 250ms either though.

There are three states:

  1. max_in_flight = 1 behaves essentially unbuffered regardless of volume.
  2. max_in_flight > 1 and low-volume topic. If you want to prioritize delivery latency, the appropriate configuration is to disable output buffering (we should document this).
  3. max_in_flight > 1 and higher-volume topic. You can either prefer throughput (at the potential cost of a maximum delivery latency that can be set as low as this value) or disable output buffering to prioritize delivery latency.

I guess my argument is that you can't really have everything (which is what I feel like this was trying to do before). I like how it's still possible to prioritize latency with this value lower bounded at 250ms, it just means consciously choosing that tradeoff.

Ultimately, it's not the end of the world to allow it to be configured that low, it just needs to come with some disclaimers. But is there a better lower bound?

@jehiah
jehiah Sep 3, 2013 Member

You have a fair point about arbitrarily picking a lower bound. We should allow >=1ms and just document the tradeoff between max latency and overhead. That should probably go in the docs about IDENTIFY options http://bitly.github.io/nsq/clients/tcp_protocol_spec.html but we should also have a paragraph in http://bitly.github.io/nsq/clients/building_client_libraries.html about the actual IDENTIFY options a client should support.

@mreiferson mreiferson nsqd: increase default output buffer timeout:
This change significantly impacts CPU usage on an `nsqd`
with many clients by greatly reducing the amount of work
the Go runtime has to do to maintain its internal heap
of timers.

Previously, the default was `5ms` which meant that 200 times
per second a timer event would fire (and consequently sift
the heap).

If you multiply that by a large # of clients, say 500, you
can quickly see why this would be significant.

    200 x 500 = 100,000 heap sifts per second

Obviously, it's not just the *number* of timers but their
*resolution* that's important.

The other interesting point is that despite the `V2` protocol
`messagePump` *ignoring* the timer when it looped after a previous
flush, the Go runtime still needs to maintain the timer and attempt
to write to the timer's output channel... costing precious cycles
and context switches.

By increasing the default resolution of the timer,
we significantly reduce the work the Go runtime needs to do.

In testing this reduced CPU usage on an *idle* nsqd from 33% to 3%
with 250 clients.

What does this impact?

The client output buffer timeout actually does two things...

 1. It determines the amount of time a client will accumulate
    data in its output buffer before flushing and writing to
    the socket.

 2. It bounds the maximum amount of time a client with
    `max_in_flight > 1` will wait for additional messages
    before flushing.

The second case is the interesting one.  To elaborate, the
pathological case is that for a client subscribed to a
*low-volume* topic, with a `max_in_flight > 1`, messages
can be delivered with up to `output_buffer_timeout` latency.

The `max_in_flight > 1` is important, because `messagePump`
automatically flushes when a client is in a state where it
is not ready to receive messages (i.e. `RDY = 0`).

This means that for clients that absolutely *need* to have
`max_in_flight > 1` on *low-volume* topics *and* prioritize
latency, they should disable output buffering, which we already
support via `IDENTIFY`.
043b79a
@mreiferson
Member

amended with updated lower bound, documented in nsqio/nsq@c0cac78

@jehiah jehiah merged commit 4c9328d into nsqio:master Sep 3, 2013

1 check passed

default The Travis CI build passed
Details
@mreiferson mreiferson referenced this pull request Feb 10, 2014
Closed

topic capacity #307

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment