-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChannelDescriptor leak after close of ScriptingContainer #2333
Comments
This was code I added years ago, and it continues to bite me in the ass. Statics are bad. There is a workaround in environments where you control the whole classloader into which JRuby was loaded: clear the list yourself using reflection to access the list in ChannelDescriptor. I know that's not ideal, but a few other folks doing heavy embedding have done this. The fix is to make this list per-runtime, as in JRuby 9k (which has largely reimplemented all of IO). I would have attempted this a few months ago...but I hoped we'd have 9k done sooner :-( I'll see what I can do. |
Thanks for the quick response, @headius. Yeah, I have confirmed that if we get the private Unfortunately, I'm not sure if that will be a workable solution for Puppet Server. A Puppet Server may have many It's good to know that JRuby 9k will have a good solution for this. We're definitely excited about JRuby 9k. Thanks for anything you can think of to address this in the 1.7.X timeframe! |
Generally the fileno map is only used for IO.from_fd/new forms that take a fileno and expect an IO back. If you are able, it would be worth testing the clearing logic to see if it actually breaks anything. I'm working on a regression right now but will put this issue next on my list. |
I have a fix but I request review from relevant folks (@enebo, mostly). diff --git a/core/src/main/java/org/jruby/Ruby.java b/core/src/main/java/org/jruby/Ruby.java
index e3ca02d..3d7b5d3 100644
--- a/core/src/main/java/org/jruby/Ruby.java
+++ b/core/src/main/java/org/jruby/Ruby.java
@@ -3223,6 +3223,11 @@ public final class Ruby {
getJRubyClassLoader().tearDown(isDebug());
}
+ // Unregister stdio ChannelDescriptors so they don't leak
+ ChannelDescriptor.unregisterDescriptor(getFilenoExtMap(0));
+ ChannelDescriptor.unregisterDescriptor(getFilenoExtMap(1));
+ ChannelDescriptor.unregisterDescriptor(getFilenoExtMap(2));
+
if (config.isProfilingEntireRun()) {
// not using logging because it's formatted
ProfileCollection profileCollection = threadService.getMainThread().getContext().getProfileCollection();
diff --git a/core/src/main/java/org/jruby/util/io/ChannelDescriptor.java b/core/src/main/java/org/jruby/util/io/ChannelDescriptor.java
index 201665e..9b53220 100644
--- a/core/src/main/java/org/jruby/util/io/ChannelDescriptor.java
+++ b/core/src/main/java/org/jruby/util/io/ChannelDescriptor.java
@@ -863,7 +863,7 @@ public class ChannelDescriptor {
filenoDescriptorMap.put(descriptor.getFileno(), descriptor);
}
- private static void unregisterDescriptor(int aFileno) {
+ public static void unregisterDescriptor(int aFileno) {
filenoDescriptorMap.remove(aFileno);
}
This makes my trivial test case and your repository behave as expected:
Concerns I have:
|
This is the commit where I fixed handling teardown in SINGLETON containers: ef58d57 |
I think that's a reasonable tradeoff. From our perspective at least, this shouldn't pose a problem since we do not attempt to use a
That sounds reasonable and is consistent with what we've seen as well. |
Talked with @headius about this on channel. This fix also needs a second fix @headius applied to master involving terminate with SINGLETON containers. Since we want to release in the next day or so we wil defer this for 1.7.19 and you will get a patch/branch with this fix on it now to give us a gold stamp of approval. |
I have pushed fix + test to test-fix-2333 branch. As @enebo mentioned, it's too late/risky to put in 1.7.18, but it's probably best that you verify it before we put it into a release. |
@headius I tested with your commit locally merged into a clone of JRuby 1.7.17. Change looks good - leak is gone. |
Merged to 1.7. Thanks for confirming! |
In doing some memory leak analysis, I found that some ChannelDescriptors allocated for a ScriptingContainer instance are not cleaned up when
terminate
is called. The following snippet of code was sufficient for me to reproduce the problem:ChannelDescriptor has a static field called filenoDescriptorMap which is used to hold onto the various ChannelDescriptors that have been registered. The map has three extra descriptors registered in it after each iteration of a loop containing the lines of code above. I found that the descriptors were allocated from RubyGlobal.createGlobals (https://github.com/jruby/jruby/blob/1.7.17/core/src/main/java/org/jruby/RubyGlobal.java#L211-L213):
As a hack, I had tried adding the following three lines to the tearDown method in org.jruby.Ruby:
With these changes in place, the ChannelDescriptor objects no longer were leaked, but the underlying STDIN, STDOUT, and STDERR streams were all closed. This doesn't, therefore, seem to be a workable solution to the problem. Seems like there would need to be a better way exposed to unregister those descriptors from the map without having the underlying streams be closed.
I have the complete code for this reproduction case at https://github.com/camlow325/ScriptingContainer_LeakRepro. I've been able to reproduce the issue with JRuby 1.7.15 and 1.7.17.
For reference, the Puppet Server project from Puppet Labs creates SINGLETHREAD ScriptingContainers. See https://github.com/puppetlabs/puppet-server/blob/master/src/clj/puppetlabs/services/jruby/jruby_puppet_core.clj#L148. The Puppet Server project recently added a feature which entails terminating and starting replacement ScriptingContainers periodically while the service is still running - to allow users to periodically free stale Ruby state for those containers based on changes to underlying Ruby code deployed on the server. Doing memory leak analysis around that feature allowed us to find this issue.
The text was updated successfully, but these errors were encountered: