Leak in select for a UNIXSocket (at least) #3588

headius · 2016-01-08T23:39:22Z

The following code leaks like a sieve...up to 1.5GB of total memory in under a minute for me:

s = UNIXServer.new("foofoo")
Thread.new {
  loop {
    c1 = s.accept
    IO.select([c1])
    c1.read(5)
    c1.write("bubye")
    c1.close
  }
}

loop {
  c = UNIXSocket.new("foofoo")
  c.write("hello")
  IO.select([c])
  c.read(5)
  c.close
  print "."
}

It did not leak without the IO.select calls, so I suspect that something in jnr-enxio's implementation of selection there's native memory leaking.

headius · 2016-01-08T23:41:39Z

This does not appear to affect 1.7, so it may actually be something we're doing wrong in JRuby.

headius · 2016-01-20T08:17:16Z

This appears to be in jnr-enxio, as it affects selects against stdio too. Note I have only tested on OS X, which uses kqueue; Linux uses epoll and has a separate backend in jnr-enxio.

headius · 2016-01-20T08:39:31Z

Ok, there's a few problems here, but I'm not sure which is the trigger.

ENXIO selectors are not pooled, mostly because we can't mix native selectors and JDK selectors when selecting for multiple heterogeneous channels. As a result, SelectorExector has complicated logic to open new, attach via pipe, and eventually shut down ENXIO selectors on the way into and out of IO.select.
The selector itself (KQueueSelector) allocates buffers for the kqueue call, but does not directly deallocate those same buffers. So at the very least it is relying on finalization to clear away dead memory pointers, which is unreliable. At worst, it is walking away from allocated memory.

The first point could be remedied by improving SelectorExecutor to just use the ENXIO selector when all the channels are ENXIO channels, rather than always using this complicated pipe-based connection between ENXIO and a native selector. In that case we could also pool the ENXIO selector. It may be possible to pool the ENXIO selector even with the pipe connection, but we'd still need to clean up that pipe.

The second point I believe I can fix by making ENXIO actively clean up those pointers when you close the selector. This seems to be the lowest impact right now, so I'll look into that direction.

headius · 2016-01-20T09:18:32Z

Huzzah! I found a simple fix.

The way we're using SelectorPool is wrong. When requesting a Selector from SelectorPool.get, it assumes you're going to be pooling that selector with a later call to SelectorPool.put. Basically, it takes ownership of the selector's lifecycle. And to do that, it keeps a reference to the selector as an "open" or checked-out selector that will need to be cleaned up at some point.

Unfortunately in the ENXIO case, we only use SelectorPool.get as a shortcut for channel.provider().openSelector(), and never put it back. This means that SelectorPool holds a hard reference to the selector in its openSelectors list, and as a result, the selector never finalizes. Because it never finalizes, it never deallocates the memory associated with the aforementioned buffers.

Modifying our SelectorExecutor logic to directly open ENXIO selectors rather than going through SelectorPool appears to fix the issue. On a fast select loop, we're still relying on finalization to clean up buffers, but when GC/finalization eventually runs we do go back down to a base memory size. A simple select(timeout = 0.001) loop that previously grew without bounds now hovers between 190 and 300MB, depending on the timing of GC.

Fix coming. We still need to rework SelectExecutor (again) and accommodate the simple homogeneous cases, so we can pool ENXIO selectors. We use ENXIO a lot more in 9k.

headius · 2016-01-20T09:22:36Z

Note that 1.7 is not affected because its IO.select implementation does not attempt to pool any selectors.

headius modified the milestones: JRuby 1.7.24, JRuby 9.0.5.0 Jan 8, 2016

headius added core ffi JRuby 1.7.x JRuby 9000 labels Jan 8, 2016

headius removed the JRuby 1.7.x label Jan 8, 2016

headius closed this as completed in a9a1184 Jan 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Leak in select for a UNIXSocket (at least) #3588

Leak in select for a UNIXSocket (at least) #3588

headius commented Jan 8, 2016

headius commented Jan 8, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016

Leak in select for a UNIXSocket (at least) #3588

Leak in select for a UNIXSocket (at least) #3588

Comments

headius commented Jan 8, 2016

headius commented Jan 8, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016

headius commented Jan 20, 2016