Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChannelDescriptor is not removed from the map after ScriptContainer is terminated #4446

Closed
seanguo opened this issue Jan 19, 2017 · 16 comments
Closed
Labels
Milestone

Comments

@seanguo
Copy link

@seanguo seanguo commented Jan 19, 2017

Environment

JRuby 1.7.21
Linux
Using as embedded JSR223 engine

I found a similar issue #2333 but that is already fixed in 1.7.21. And the code path is not the same. The ChannelDescriptor is not removed in our case.

Sample Code Snippet

ScriptContext context = new SimpleScriptContext(); ScriptEngine engine = ScriptEngineManager.getEngineByName(); context.setAttribute("org.jruby.embed.termination", true, ScriptContext.ENGINE_SCOPE); engine.eval(scriptFile, context);
We have a pool of these engines and will try to reuse them for each script run.

Unexpected Behavior

After the execution I logged the ChannelDescriptor.filenoDescriptorMap, it still has the following values inside:
{16=org.jruby.util.io.ChannelDescriptor@7771425e, 15=org.jruby.util.io.ChannelDescriptor@34022cb2}

I added some stacktrace to track where these two ChannelDescriptor are added, got the following:
15 is added java.lang.Exception at org.jruby.util.io.ChannelDescriptor.registerDescriptor(ChannelDescriptor.java:865) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:180) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:197) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:294) at org.jruby.RubyIO.<init>(RubyIO.java:138) at org.jruby.embed.jsr223.Utils.getRubyIO(Utils.java:179) at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:150) at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113) at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88) ... at java.lang.Thread.run(Thread.java:745) 16 is added java.lang.Exception at org.jruby.util.io.ChannelDescriptor.registerDescriptor(ChannelDescriptor.java:865) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:180) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:197) at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:294) at org.jruby.RubyIO.<init>(RubyIO.java:138) at org.jruby.embed.jsr223.Utils.getRubyIO(Utils.java:179) at org.jruby.embed.jsr223.Utils.setErrorWriter(Utils.java:171) at org.jruby.embed.jsr223.Utils.preEval(Utils.java:114) at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)

Please suggest.
Thanks.

@seanguo seanguo changed the title ChannelDescriptor is not removed from the map after ScriptContainer is removed ChannelDescriptor is not removed from the map after ScriptContainer is terminated Jan 19, 2017
@headius
Copy link
Member

@headius headius commented Jan 24, 2017

Are you able to test this on a more recent JRuby 1.7? I don't know that it was fixed, but we do a lot of work on every point release.

On 9k this should be a non-issue since ChannelDescriptor and its registry are no longer used.

@headius headius added this to the JRuby 1.7.27 milestone Jan 24, 2017
@seanguo
Copy link
Author

@seanguo seanguo commented Feb 8, 2017

Thanks for your reply. I tried the latest which had the same behavior.

@headius
Copy link
Member

@headius headius commented Feb 9, 2017

@seanguo Ok, here's a couple suggestions.

  1. You can clear the descriptors yourself by accessing those collections from either Ruby or Java. That can be a temporary workaround for you. You may or may not want to try to close the descriptors as you remove them.
  2. You may be able to find why the descriptors are not getting removed by adding some logging or printStackTrace to the various paths that unregister them. I believe we unregister all of them on teardown, but teardown leaves a runtime in an unusable state.
  3. Try forcing GC and confirming that the descriptors remain.
  4. Try pulling a heap dump after you've confirmed the descriptors are leaking and inspect all references to ChannelDescriptor instances. This will show if we're keeping them alive somewhere.

If you can come up with a simple way for us to reproduce this, we'll be able to help much more easily.

@headius headius added the core label Feb 9, 2017
@seanguo
Copy link
Author

@seanguo seanguo commented Feb 10, 2017

@headius

  1. It's hard for me find the right time to clear that map since it's global and shared across all containers. We are running different container concurrently. I think we need to fix this in the JRuby container level.
  2. As you can see from my previous comment. I've added stacktrace at the registerDescriptor method and added map dump at the unregisterDescriptor. I confirmed that the ChannelDescriptor are still there after the evaluation even after full GC. Let me know if this is not the correct way to debug.

We found this issue by finding that one of our production environment is busy with full GC activity. And after a heap dump analysis we found the heap is full of the ChannelDescriptor in the filenoDescriptorMap.

I'll try to give you a sample code to reproduce later.

Thanks.

@seanguo
Copy link
Author

@seanguo seanguo commented Feb 10, 2017

You can use the following code to reproduce:

package com.test.testMaven;

import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

import org.jruby.util.io.ChannelDescriptor;

public class App {
  public static void main(String[] args) throws ScriptException, InterruptedException {
    ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
    ScriptEngine engine = scriptEngineManager.getEngineByName("jruby");
    ScriptContext scriptContext = new SimpleScriptContext();
    scriptContext.setAttribute("org.jruby.embed.termination", true, ScriptContext.ENGINE_SCOPE);

    engine.eval("print 'test'", scriptContext);

    System.gc();
    Thread.sleep(5000);
    
    System.out.println(ChannelDescriptor.getFilenoDescriptorMapReadOnly());
  }
}

Thanks.

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

I meant to add Thread.dumpStack or catch + printStackTrace to the paths that call unregisterDescriptor, to see if they are quietly failing to remove the descriptor.

In any case, your example looks simple enough. We will try to look into it. Any additional investigation you can do would be helpful, since we are trying to end-of-life JRuby 1.7 at this point.

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

Ugh...I think I've found it.

It appears that engine.eval registers new ChannelDescriptors every time. I modified your example to call eval ten times, and there's ten times as many descriptors left in the map.

So this may be a simple fix. Either the scripting container needs to unregister those descriptors every time, or it needs to register them once when initialized. Thoughts @enebo?

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

Here's a stack trace from one of the offending adds:

Registering descriptor 24
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1333)
	at org.jruby.util.io.ChannelDescriptor.registerDescriptor(ChannelDescriptor.java:876)
	at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:182)
	at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:199)
	at org.jruby.util.io.ChannelDescriptor.<init>(ChannelDescriptor.java:296)
	at org.jruby.RubyIO.<init>(RubyIO.java:139)
	at org.jruby.embed.jsr223.Utils.getRubyIO(Utils.java:179)
	at org.jruby.embed.jsr223.Utils.setWriter(Utils.java:150)
	at org.jruby.embed.jsr223.Utils.preEval(Utils.java:113)
	at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:88)
	at Blah.main(Blah.java:25)
@enebo
Copy link
Member

@enebo enebo commented Feb 13, 2017

@headius strikes me in this case we invisibly ask for an IO but then do not record it and then close/cleanup. It sticking around seems like it would only make sense if this io persisted between evals. Is there a scenario where we would expect this to escape and be a valid channeldescriptor?

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

@enebo It looks like this may be a specific peculiarity of JSR-223. The code inside our implementation needs to grab readers/writers out of the passed in ScriptContext each time eval is called. In order for those to be accessible to Ruby, they get registered. And then they never get unregistered because eval isn't really meaningful as a lifecycle.

So like you say one possibility would be not registering them. I'm not sure we have a way to do that in the API but it could be added. All you'd lose is the ability to go from the (fake) fileno to the IO, as in IO.from_fd. That seems acceptable.

The other option is just unregistering them on the way out. It would add no more overhead and filenos could still be used to look up IO, but any code running asynchronously between calls might behave oddly. This also would not fix the issue that repeat calls with the SAME readers/writers would produce DIFFERENT filenos, which seems almost as bad as not registering filenos at all.

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

Ok, as far as I can see, 9k does not have this issue. We do have a mapping of internal pseudo-filenos but at least for this case the number of registered descriptors seems to remain constant.

It's possible this is just digging out the "real" fileno, which would not get registered. This would be the case if the default ScriptingContext's streams are just stdio, which I believe they are. Given completely artificial streams, it may still leak.

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

It looks like unregistering these may be a bit problematic. We also need to take a long hard look at the lifecycle for JSR-223.

The additional problem I ran into is that while it's setting up those streams on a per-eval basis, it also sets global variables like $stdout. Obviously this isn't really threadsafe, since multiple concurrent evals might want different streams. We need a better mapping from the context streams to the IO we use internally, and ideally we would keep them the same across calls.

But the problem with unregistering is that we'd need to maintain a mapping of context+streams to the IO objects we want to unregister. And that means more state in the containers and potential issues if two eval calls want to use the same streams. The second thread might unregister them while the first is still using them.

I think for now the best option would be to modify the code in the JSR-223 code to not register fileno for incoming streams, since they're not really "real" streams anyway and there's all sorts of problems trying to tidy them up.

@headius
Copy link
Member

@headius headius commented Feb 13, 2017

I just pushed #4491 that avoids registering these two fileno for each eval. Actually, it just immediately unregisters them, since we do not have a way to initialize a RubyIO without registering its fileno. This seems like the least invasive fix.

@seanguo Can you test out the headius:fix-fileno-leak branch with your app and see how it looks?

@seanguo
Copy link
Author

@seanguo seanguo commented Feb 14, 2017

@headius Thank you for the quick fix. I've tested your branch build in the test app and it works. We will do load testing to further validation. By the way do we have a estimated date to release the 1.7.27 which includes the fix?
Thanks.

@headius
Copy link
Member

@headius headius commented Feb 14, 2017

At the moment, no. We had hoped for 1.7.26 to be the last release in the 1.7 line, but we have not addressed all migration issues nor released the "1.7 killer" 9.2 yet. I'll confer with @enebo about this today.

@headius
Copy link
Member

@headius headius commented Mar 15, 2017

We are looking at doing a 1.7.27 within the next week. I'll talk to @enebo about this fix.

headius added a commit that referenced this issue Apr 19, 2017
Immediately unregister out and err for JSR223 evals. Fixes #4446.
@headius headius closed this Apr 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.