New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File descriptors from stdlib jars are leaked when ScriptingContainers are terminated #3928

Closed
cprice404 opened this Issue May 26, 2016 · 4 comments

Comments

Projects
None yet
3 participants
@cprice404
Contributor

cprice404 commented May 26, 2016

Environment

JRuby 9.1.1.0, run from a Java program via ScriptingContainer, with setCompileMode set to OFF.

Expected Behavior

ScriptingContainer instances should be able to be created and terminated without leaking memory or other resources, such as file descriptors.

Actual Behavior

When a ScriptingContainer is constructed and executes any Ruby code that results in certain jar-based libraries being loaded from jruby-stdlib, there are file descriptors opened pointing to the relevant library jars. When a ScriptingContainer is terminated, these descriptors (which seem to be referenced in a global cache) are not cleaned up.

The cache appears to use weak references, so presumably at some point the memory used by these objects could be reclaimed, but because there is also an open file descriptor associated with each cache entry, it is likely that a program that is recycling ScriptingContainer instances will exceed the open file descriptor limit and crash before there is enough memory pressure to cause these objects to be GC'd.

Here is a reproducer that illustrates the issue:

https://github.com/cprice404/jruby9k-benchmarks/tree/651c7ee063d0bf652293fe5c48b5ced3b80e6ade/jar-file-desc-leak

In the reproducer, we just do requires on a few libraries like openssl, psych, json/ext. Here is some lsof output that shows file descriptors for jars that definitely seem related to those libraries (bouncycastle, etc.):

java    16802 cprice  145r   REG                8,1   673715  1848479 /tmp/jruby-16802/jruby4231261797459246050bcpkix-jdk15on-1.54.jar
java    16802 cprice  146r   REG                8,1  3277268  1848480 /tmp/jruby-16802/jruby2526682390458341590bcprov-jdk15on-1.54.jar
java    16802 cprice  147r   REG                8,1  1029206  1848481 /tmp/jruby-16802/jruby8134376450898266715jopenssl.jar
java    16802 cprice  148r   REG                8,1    29184  1848482 /tmp/jruby-16802/jruby2150106223798361691parser.jar
java    16802 cprice  149r   REG                8,1    36598  1848483 /tmp/jruby-16802/jruby8382658838024689723generator.jar
java    16802 cprice  150r   REG                8,1   673715  1848486 /tmp/jruby-16802/jruby791041166383033485bcpkix-jdk15on-1.54.jar
java    16802 cprice  151r   REG                8,1  3277268  1848487 /tmp/jruby-16802/jruby8183181496442107777bcprov-jdk15on-1.54.jar
java    16802 cprice  152r   REG                8,1  1029206  1848488 /tmp/jruby-16802/jruby2262201270422384640jopenssl.jar
java    16802 cprice  153r   REG                8,1    29184  1848489 /tmp/jruby-16802/jruby7381052665382428377parser.jar
java    16802 cprice  154r   REG                8,1    36598  1848490 /tmp/jruby-16802/jruby8249386243245178423generator.jar
java    16802 cprice  155r   REG                8,1   673715  1848493 /tmp/jruby-16802/jruby2517646537378734847bcpkix-jdk15on-1.54.jar
java    16802 cprice  156r   REG                8,1  3277268  1848494 /tmp/jruby-16802/jruby6109779259907062980bcprov-jdk15on-1.54.jar
java    16802 cprice  157r   REG                8,1  1029206  1848495 /tmp/jruby-16802/jruby4734983808548108497jopenssl.jar
java    16802 cprice  158r   REG                8,1    29184  1848496 /tmp/jruby-16802/jruby1658078054671173220parser.jar
java    16802 cprice  159r   REG                8,1    36598  1848497 /tmp/jruby-16802/jruby3451778285607680674generator.jar
@camlow325

This comment has been minimized.

Show comment
Hide comment
@camlow325

camlow325 Jun 1, 2016

Contributor

We've experimented with a hack to workaround this problem for now. The hack involves calling .release() on each of the org.jruby.util.JarCache$JarIndex objects and .remove(jarIndexObj) on the JarCache.indexCache object prior to the ScriptingContainer being terminated. With this change in place, the output of the reproducer no longer includes any of the extra "/tmp/jruby-...jar" file descriptors and the file descriptor count holds constant across iterations in the reproducer.

Here's a commit which includes this hask:

camlow325/jruby9k-benchmarks@e1a05cc

This obviously isn't an ideal fix for a number of reasons, including the use of reflection to access various private / package-private classes and fields and lack of regard for thread-safety around the use of the jar cache. Also, I'm not sure if the JRubyClassLoader associated with a ScriptingContainer might hold references to jars which are shared with other containers - and therefore should not be closed when just one of the referencing containers is terminated. That doesn't appear to be the case for this simple example but that may be because the sample uses a LocalContextScope.SINGLETHREAD container and other LocalContextScope values might exhibit different behavior. Also, it seems that these jar files are extracted to a temp directory with a somewhat randomized name per ScriptingContainer - like /tmp/jruby-16802/jruby2150106223798361691parser.jar. After each jar file is closed, it might also be better to remove the jar file from disk as well, assuming nothing else could be using it.

We're hoping that there would be a more straightforward way that this issue could be addressed within the JRuby library itself.

Contributor

camlow325 commented Jun 1, 2016

We've experimented with a hack to workaround this problem for now. The hack involves calling .release() on each of the org.jruby.util.JarCache$JarIndex objects and .remove(jarIndexObj) on the JarCache.indexCache object prior to the ScriptingContainer being terminated. With this change in place, the output of the reproducer no longer includes any of the extra "/tmp/jruby-...jar" file descriptors and the file descriptor count holds constant across iterations in the reproducer.

Here's a commit which includes this hask:

camlow325/jruby9k-benchmarks@e1a05cc

This obviously isn't an ideal fix for a number of reasons, including the use of reflection to access various private / package-private classes and fields and lack of regard for thread-safety around the use of the jar cache. Also, I'm not sure if the JRubyClassLoader associated with a ScriptingContainer might hold references to jars which are shared with other containers - and therefore should not be closed when just one of the referencing containers is terminated. That doesn't appear to be the case for this simple example but that may be because the sample uses a LocalContextScope.SINGLETHREAD container and other LocalContextScope values might exhibit different behavior. Also, it seems that these jar files are extracted to a temp directory with a somewhat randomized name per ScriptingContainer - like /tmp/jruby-16802/jruby2150106223798361691parser.jar. After each jar file is closed, it might also be better to remove the jar file from disk as well, assuming nothing else could be using it.

We're hoping that there would be a more straightforward way that this issue could be addressed within the JRuby library itself.

@camlow325

This comment has been minimized.

Show comment
Hide comment
@camlow325

camlow325 Jul 8, 2016

Contributor

FWIW, I have noticed that when running some duration tests with my attached hack / workaround for this problem that we occasionally see error messages from Ruby code being run in a separate ScriptingContainer around the time that the jar files associated with the just terminated ScriptingContainer have been closed. The error messages have "zip file closed" in the description. I haven't spent further time instrumenting this to see if it is directly related to the nature of the change but it seems likely.

This may suggest that these jar file handles may be utilized by more than a single ScriptingContainer at a time - not sure. It's also possible, though, that the work done to access these static objects with no locking mechanism would make errors like these inevitable. Ultimately, it would seem that a better integrated method in JRuby proper would be able to handle the appropriate cleanup in a more robust way.

Contributor

camlow325 commented Jul 8, 2016

FWIW, I have noticed that when running some duration tests with my attached hack / workaround for this problem that we occasionally see error messages from Ruby code being run in a separate ScriptingContainer around the time that the jar files associated with the just terminated ScriptingContainer have been closed. The error messages have "zip file closed" in the description. I haven't spent further time instrumenting this to see if it is directly related to the nature of the change but it seems likely.

This may suggest that these jar file handles may be utilized by more than a single ScriptingContainer at a time - not sure. It's also possible, though, that the work done to access these static objects with no locking mechanism would make errors like these inevitable. Ultimately, it would seem that a better integrated method in JRuby proper would be able to handle the appropriate cleanup in a more robust way.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Sep 26, 2016

Member

Link with #4112.

Member

headius commented Sep 26, 2016

Link with #4112.

camlow325 added a commit to camlow325/jruby-utils that referenced this issue Apr 12, 2017

(SERVER-1630) Add logic to cleanup jar files on container termination
This commit adds some hacky logic to the InternalScriptingContainer
class to cleanup jar files and associated file descriptors when the
container is terminated.  This code is compensating for work that will
hopefully be done in JRuby itself at some point, as part of a resolution
for jruby/jruby#3928.

camlow325 added a commit to camlow325/jruby-utils that referenced this issue Apr 20, 2017

(SERVER-1630) Add logic to cleanup jar files on container termination
This commit adds some hacky logic to the InternalScriptingContainer
class to cleanup jar files and associated file descriptors when the
container is terminated.  This code is compensating for work that will
hopefully be done in JRuby itself at some point, as part of a resolution
for jruby/jruby#3928.

camlow325 added a commit to camlow325/jruby-utils that referenced this issue Apr 20, 2017

(SERVER-1630) Add logic to cleanup jar files on container termination
This commit adds some hacky logic to the InternalScriptingContainer
class to cleanup jar files and associated file descriptors when the
container is terminated.  This code is compensating for work that will
hopefully be done in JRuby itself at some point, as part of a resolution
for jruby/jruby#3928.

camlow325 added a commit to camlow325/jruby-utils that referenced this issue Apr 21, 2017

(SERVER-1630) Add logic to cleanup jar files on container termination
This commit adds some hacky logic to the InternalScriptingContainer
class to cleanup jar files and associated file descriptors when the
container is terminated.  This code is compensating for work that will
hopefully be done in JRuby itself at some point, as part of a resolution
for jruby/jruby#3928.

@headius headius added this to the JRuby 9.1.14.0 milestone Oct 31, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 31, 2017

Member

Fixed by #4747.

Member

headius commented Oct 31, 2017

Fixed by #4747.

@headius headius closed this Oct 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment