Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circular exception handling can cause infinite loop since 9.3.4.0 #7267

Closed
rsim opened this issue Jul 7, 2022 · 9 comments
Closed

Circular exception handling can cause infinite loop since 9.3.4.0 #7267

rsim opened this issue Jul 7, 2022 · 9 comments
Milestone

Comments

@rsim
Copy link

rsim commented Jul 7, 2022

Environment Information

  • jruby 9.3.6.0 (2.6.8) 2022-06-27 7a2cbcd376 OpenJDK 64-Bit Server VM 11.0.12+7 on 11.0.12+7 +jit [x86_64-linux]
  • Linux production-app1 4.4.0-210-generic #242-Ubuntu SMP Fri Apr 16 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Expected Behavior

As a result of bd2595c which solved #7035 the checkCircularCause method was introduced. It's purpose is to detect circular loops of exception causes.

Actual Behavior

After upgrading our Rails app from JRuby 9.3.3.0 to 9.3.6.0 we discovered an infinite loop in the Java Finalizer thread. In each thread dump we see that it is stuck in the checkCircularCause method:

"Finalizer" - Thread t@3
   java.lang.Thread.State: RUNNABLE
        at org.jruby.RubyException.checkCircularCause(RubyException.java:399)
        at org.jruby.RubyException.setCause(RubyException.java:370)
        at org.jruby.Ruby.newRaiseException(Ruby.java:4299)
        at org.jruby.Ruby.newArgumentError(Ruby.java:3631)
        at org.jruby.Ruby.newArgumentError(Ruby.java:3626)
        at org.jruby.runtime.Arity.checkArity(Arity.java:167)
        at org.jruby.runtime.Arity.checkArity(Arity.java:161)
        at org.jruby.ext.ffi.jffi.NativeInvoker.call(NativeInvoker.java:70)
        at org.jruby.ext.ffi.jffi.DefaultMethod.call(DefaultMethod.java:106)
        at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:222)
        at org.jruby.RubyMethod.call(RubyMethod.java:119)
        at org.jruby.RubyMethod$INVOKER$i$call.call(RubyMethod$INVOKER$i$call.gen)
        at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrOneOrNBlock.call(JavaMethod.java:354)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:173)
        at home.rails.eazybi.production.shared.bundle.jruby.$2_dot_6_dot_0.gems.ffi_minus_1_dot_15_dot_5_minus_java.lib.ffi.autopointer.invokeOther1:call(/home/rails/eazybi/production/shared/bundle/jruby/2.6.0/gems/ffi-1.15.5-java/lib/ffi/autopointer.rb:175)
        at home.rails.eazybi.production.shared.bundle.jruby.$2_dot_6_dot_0.gems.ffi_minus_1_dot_15_dot_5_minus_java.lib.ffi.autopointer.RUBY$method$release$0(/home/rails/eazybi/production/shared/bundle/jruby/2.6.0/gems/ffi-1.15.5-java/lib/ffi/autopointer.rb:175)
        at java.lang.invoke.LambdaForm$DMH/0x0000000801af7840.invokeStatic(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0x00000008006f2040.invokeExact_MT(LambdaForm$MH)
        at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:165)
        at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:185)
        at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:218)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:173)
        at home.rails.eazybi.production.shared.bundle.jruby.$2_dot_6_dot_0.gems.ffi_minus_1_dot_15_dot_5_minus_java.lib.ffi.autopointer.invokeOther3:release(/home/rails/eazybi/production/shared/bundle/jruby/2.6.0/gems/ffi-1.15.5-java/lib/ffi/autopointer.rb:150)
        at home.rails.eazybi.production.shared.bundle.jruby.$2_dot_6_dot_0.gems.ffi_minus_1_dot_15_dot_5_minus_java.lib.ffi.autopointer.RUBY$method$call$0(/home/rails/eazybi/production/shared/bundle/jruby/2.6.0/gems/ffi-1.15.5-java/lib/ffi/autopointer.rb:150)
        at java.lang.invoke.LambdaForm$DMH/0x0000000801af7440.invokeStatic(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0x00000008006f2040.invokeExact_MT(LambdaForm$MH)
        at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:139)
        at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:162)
        at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:185)
        at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:218)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:173)
        at org.jruby.RubyBasicObject$Finalizer.callFinalizer(RubyBasicObject.java:1971)
        at org.jruby.RubyBasicObject$Finalizer.finalize(RubyBasicObject.java:1960)
        at java.lang.System$2.invokeFinalize(System.java:2125)
        at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:87)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:171)

It seems that it is caused by an invalid FFI::AutoPointer definition in the appsignal gem (I reported the issue to them appsignal/appsignal-ruby#854) which causes org.jruby.runtime.Arity.checkArity to fail and which raises an ArgumentError which then causes an infinite loop in checkCircularCause.

I was not able to reproduce this infinite loop with a simple script. But as I was looking at the source of this method

private void checkCircularCause(IRubyObject cause) {
    IRubyObject currentCause = cause;
    while (currentCause instanceof RubyException) {
        if (currentCause == this) {
            RaiseException runtimeError = getRuntime().newRuntimeError("circular causes");
            runtimeError.getException().setCause(cause);
            throw runtimeError;
        }
        currentCause = ((RubyException) currentCause).cause;
    }
}

I suspect that in this case there is a circular loop of causes but which does not include the current exception (this). I think that it would be better to add prevention of potential infinite loop by

  • adding a loop counter and exiting the loop if the counter is too large
  • or using a Set where all causes are added and raising an exception if any duplicate cause is found in this loop.
@headius
Copy link
Member

headius commented Aug 1, 2022

I think the latter would be the best fix, perhaps using existing logic for recursion detection. I will look into this.

@headius
Copy link
Member

headius commented Aug 1, 2022

I have a patch to use a Set, but no reproduction. Can you provide a reproduction or test this patch out?

@headius
Copy link
Member

headius commented Aug 1, 2022

diff --git a/core/src/main/java/org/jruby/RubyException.java b/core/src/main/java/org/jruby/RubyException.java
index 0c48097acd..8148adeff2 100644
--- a/core/src/main/java/org/jruby/RubyException.java
+++ b/core/src/main/java/org/jruby/RubyException.java
@@ -55,7 +55,9 @@ import org.jruby.runtime.marshal.UnmarshalStream;
 
 import java.io.IOException;
 import java.io.PrintStream;
+import java.util.HashSet;
 import java.util.List;
+import java.util.Set;
 
 import static org.jruby.runtime.Visibility.PRIVATE;
 import static org.jruby.util.RubyStringBuilder.str;
@@ -390,12 +392,15 @@ public class RubyException extends RubyObject {
 
     private void checkCircularCause(IRubyObject cause) {
         IRubyObject currentCause = cause;
+        Set causeSet = new HashSet();
+        causeSet.add(this);
         while (currentCause instanceof RubyException) {
-            if (currentCause == this) {
+            if (causeSet.contains(currentCause)) {
                 RaiseException runtimeError = getRuntime().newRuntimeError("circular causes");
                 runtimeError.getException().setCause(cause);
                 throw runtimeError;
             }
+            causeSet.add(currentCause);
             currentCause = ((RubyException) currentCause).cause;
         }
     }

@headius headius added this to the JRuby 9.3.7.0 milestone Aug 1, 2022
@rsim
Copy link
Author

rsim commented Aug 2, 2022

@headius I tried to reproduce these circular cause references but couldn't.

But then I started to investigate the original stack trace and noticed that checkCircularCause appeared only in the Finalizer thread stack trace. And then I started to think that probably there is not an endless loop, but just the stack of linked causes gets deeper all the time.

Here is a test script for this hypothesis:

obj = nil
10.times do |i|
  obj = Object.new
  ObjectSpace.define_finalizer(obj, Proc.new do
    STDOUT.puts ">>> finalizing #{i}"
    e = $!
    while e
      STDOUT.puts e.inspect
      e = e.cause
    end
    STDOUT.puts "<<< finalized #{i}"
    raise "error #{i}"
  end)
  obj = nil
  sleep 0.1
  java.lang.System.gc
  sleep 0.1
end

which produced the output

>>> finalizing 0
<<< finalized 0
>>> finalizing 1
#<RuntimeError: error 0>
<<< finalized 1
>>> finalizing 2
#<RuntimeError: error 1>
#<RuntimeError: error 0>
<<< finalized 2
...
>>> finalizing 9
#<RuntimeError: error 8>
#<RuntimeError: error 7>
#<RuntimeError: error 6>
#<RuntimeError: error 5>
#<RuntimeError: error 4>
#<RuntimeError: error 3>
#<RuntimeError: error 2>
#<RuntimeError: error 1>
#<RuntimeError: error 0>
<<< finalized 9

As you see, if the object finalizer raises an error then the next finalizer in the same Finalizer thread inherits the current error $! (RubyThread.errorInfo) in this thread and sets it as a cause for the new error.

So if there are badly defined object finalizers (as was the case in appsignal/appsignal-ruby#854 which is now fixed) and if they are called a lot of times, then the Finalizer thread will spend more and more time in the checkCircularCause method and after some while it will use 100% of one CPU core.

It would be better to clear RubyThread.errorInfo if the object finalizer raises an exception to prevent leaking of it to the next object finalizer.

I tested this on MRI 2.6 and 3.1 and checked that $! from the previous finalizer is not exposed to the next finalizer.

As it seems that there was no infinite loop in checkCircularCause then probably this change to use Set is not needed. But if it is not adding significant overhead, then probably it could be still used to prevent an infinite loop if RubyThread.errorInfo is modified in some other unpredicted way.

@headius
Copy link
Member

headius commented Aug 4, 2022

Great discovery about the finalizer! It makes sense that the finalizer thread would keep accumulating causes, since nothing occurs to clear them. I will investigate CRuby a bit to see how they handle exceptions raised from a finalizer; I suspect they avoid setting $! at all, or else clear it on the way out of the finalizer call.

@headius
Copy link
Member

headius commented Aug 4, 2022

Yup, good sleuthing... CRuby captures $! before invoking the finalizer and restores it afterwards.

CRuby's finalizers have traditionally run on the same thread the GC runs on, which is often the thread that happens to be running at the time (so they want to restore any in-flight error before returning control to user code. With JDK's separate finalizer, it may make more sense for us to just clear it after each invocation, since capturing and restoring it would have the same effect.

@headius
Copy link
Member

headius commented Aug 4, 2022

headius added a commit to headius/jruby that referenced this issue Aug 4, 2022
In jruby#7267 we had a report of endless exception cause processing
that turned out to be triggered by a bad finalizer (that allowed
an exception to bubble out) stacking up causes from previous calls
of that finalizer.

The fix here mimics what CRuby does: where they reset the errinfo
to what it was prior to the finalizer running (because CRuby's GC
often/usually runs on the current user thread), we simply clear it
after each finalizer has run (because the JDK runs finalizers on
a separate thread, as will our future non-JVM-finalizer version of
this logic).

No spec is provided yet due to the difficulty of testing
GC-triggered events across VMs. See ruby/spec#935
for more details.

Fixes jruby#7267
headius added a commit to headius/jruby that referenced this issue Aug 9, 2022
In jruby#7267 we had a report of endless exception cause processing
that turned out to be triggered by a bad finalizer (that allowed
an exception to bubble out) stacking up causes from previous calls
of that finalizer.

The fix here mimics what CRuby does: where they reset the errinfo
to what it was prior to the finalizer running (because CRuby's GC
often/usually runs on the current user thread), we simply clear it
after each finalizer has run (because the JDK runs finalizers on
a separate thread, as will our future non-JVM-finalizer version of
this logic).

No spec is provided yet due to the difficulty of testing
GC-triggered events across VMs. See ruby/spec#935
for more details.

Fixes jruby#7267
@headius headius closed this as completed Aug 9, 2022
@evaniainbrooks
Copy link
Contributor

Related to #7316 ?
I took a crack at it here #7321

Relevant CRuby code: ruby/ruby@1849288

@headius
Copy link
Member

headius commented Aug 22, 2022

@evaniainbrooks I don't think this is related to #7316, at least not directly, but I will look at your proposed fix for that today!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants