New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"org.jruby.runtime.scopes.DynamicScope1 only supports scopes with 1 variables" with Deeply Nested Blocks #4739

Closed
original-brownbear opened this Issue Aug 16, 2017 · 14 comments

Comments

Projects
None yet
3 participants
@original-brownbear
Contributor

original-brownbear commented Aug 16, 2017

EDIT:

I was able to create a non-Logstash reproducer by creating a somewhat arbitrary set of classes with a bunch of blocks and multi-threading.

See reproduced error here:
https://github.com/original-brownbear/jruby-jit-issue-reproducer#jit-dynamicscope-size-error (code is in that repo, link points at the error reproduced (top of the readme contains a second error that I reproduced using this code, it seems we're randomly going into an endless loop here, but I haven't 100% verified that it's not just stupidity on my end introducing recursion of some kind) the the command used to trigger it )

In JRuby 9.1.12.0 we can still reproduce #4516 with the latest Logstash versions in a case of many nested blocks and a number of threads likely trying to JIT them at the same time.

The exception I see is:

{ :exception => "org.jruby.runtime.scopes.DynamicScope1 only supports scopes with 1 variables",
  :backtrace => [
    "org.jruby.runtime.scopes.DynamicScope1.sizeError(Unknown Source)",
    "org.jruby.runtime.scopes.DynamicScope1.getValueDepthZero(Unknown Source)",
    "org.jruby.runtime.DynamicScope.getValueDepthZeroOrNil(DynamicScope.java:257)",
    "org.jruby.runtime.DynamicScope.getValueOrNil(DynamicScope.java:249)",
    "org.jruby.runtime.DynamicScope.getValueOrNil(DynamicScope.java:247)",
    "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$block$match_against_groks$1(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:348)",
    "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)", 
    "org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:80)", 
    "org.jruby.runtime.Block.yieldSpecific(Block.java:134)", 
    "org.jruby.ir.runtime.IRRuntimeHelpers.yieldSpecific(IRRuntimeHelpers.java:419)", 
    "org.jruby.ir.targets.YieldSite.yieldSpecific(YieldSite.java:114)",
    "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.timeout_enforcer.RUBY$method$grok_till_timeout$0(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok/timeout_enforcer.rb:21)", "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$block$match_against_groks$1(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:348)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)", "org.jruby.runtime.Block.yield(Block.java:165)", "org.jruby.RubyArray.each(RubyArray.java:1734)", "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$method$match_against_groks$0(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:345)", "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$method$match$0(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:334)", "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$block$filter$1(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:300)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)", "org.jruby.runtime.Block.yield(Block.java:165)", "org.jruby.RubyHash$12.visit(RubyHash.java:1362)", "org.jruby.RubyHash$12.visit(RubyHash.java:1359)", "org.jruby.RubyHash.visitLimited(RubyHash.java:662)", "org.jruby.RubyHash.visitAll(RubyHash.java:647)", "org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1319)", "org.jruby.RubyHash.each_pairCommon(RubyHash.java:1354)", "org.jruby.RubyHash.each(RubyHash.java:1343)", "Users.brownbear.src.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_grok_minus_3_dot_4_dot_2.lib.logstash.filters.grok.RUBY$method$filter$0(/Users/brownbear/src/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-3.4.2/lib/logstash/filters/grok.rb:299)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.filters.base.RUBY$method$do_filter$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/filters/base.rb:145)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.filters.base.RUBY$block$multi_filter$1(/Users/brownbear/src/logstash/logstash-core/lib/logstash/filters/base.rb:164)",
    "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)",
    "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)",
    "org.jruby.runtime.Block.yield(Block.java:165)",
    "org.jruby.RubyArray.each(RubyArray.java:1734)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.filters.base.RUBY$method$multi_filter$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/filters/base.rb:161)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.filter_delegator.RUBY$method$multi_filter$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/filter_delegator.rb:48)",
    "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)",
    "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:163)",
    "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)",
    "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)",
    "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)",
    "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)",
    "org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)",
    "org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)",
    "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:73)",
    "org.jruby.runtime.Block.call(Block.java:124)",
    "org.jruby.RubyProc.call(RubyProc.java:289)",
    "org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:63)",
    "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:204)",
    "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$block$filter_batch$1(/Users/brownbear/src/logstash/logstash-core/lib/logstash/pipeline.rb:499)",
    "org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)",
    "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)", 
    "org.jruby.runtime.Block.call(Block.java:124)",
    "org.jruby.RubyProc.call(RubyProc.java:289)",
    "org.jruby.RubyProc.call19(RubyProc.java:273)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.util.wrapped_synchronous_queue.RUBY$block$each$1(/Users/brownbear/src/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:227)",
    "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)",
    "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)",
    "org.jruby.runtime.Block.yield(Block.java:165)",
    "org.jruby.javasupport.ext.JavaLang$Iterable.each(JavaLang.java:93)",
    "org.jruby.javasupport.ext.JavaUtil$Collection.each(JavaUtil.java:133)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.util.wrapped_synchronous_queue.RUBY$method$each$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:226)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$method$filter_batch$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/pipeline.rb:498)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$method$worker_loop$0(/Users/brownbear/src/logstash/logstash-core/lib/logstash/pipeline.rb:477)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$method$worker_loop$0$__VARARGS__(/Users/brownbear/src/logstash/logstash-core/lib/logstash/pipeline.rb)",
    "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)",
    "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)",
    "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)",
    "Users.brownbear.src.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$block$start_workers$2(/Users/brownbear/src/logstash/logstash-core/lib/logstash/pipeline.rb:437)",
    "org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)",
    "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)",
    "org.jruby.runtime.Block.call(Block.java:124)",
    "org.jruby.RubyProc.call(RubyProc.java:289)",
    "org.jruby.RubyProc.call(RubyProc.java:246)",
    "org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104)",
    "java.lang.Thread.run(Thread.java:748)"
  ], :class => "Java::JavaLang::RuntimeException"
}

the last few steps here are:

    def match_against_groks(groks, field, input, event)
      input = input.to_s
      matched = false
      groks.each do |grok|
        # Convert anything else to string (number, hash, etc)

        matched = @timeout_enforcer.grok_till_timeout(event, grok, field, input) { grok.execute(input) }
        if matched
          grok.capture(matched) {|field, value| handle(field, value, event)}
          break if @break_on_match
        end
      end
      
      matched
    end

using:

  def start_thread_groking(thread)
    # Clear any interrupts from any previous invocations that were not caught by Joni
    thread.interrupted
    synchronize do
      @threads_to_start_time[thread] = java.lang.System.nanoTime()
    end
  end
  def synchronize
    # The JRuby Mutex uses lockInterruptibly which is what we DO NOT want
    @state_lock.lock
    yield
  ensure
    @state_lock.unlock
  end
  def grok_till_timeout(event, grok, field, value)
    begin
      thread = java.lang.Thread.currentThread()
      start_thread_groking(thread)
      yield
    rescue InterruptedRegexpError => e
      raise ::LogStash::Filters::Grok::TimeoutException.new(grok, field, value)
    ensure
      stop_thread_groking(thread)
      # Clear any interrupts from any previous invocations that were not caught by Joni
      # It may appear that this should go in #stop_thread_groking but that would actually
      # break functionality! If this were moved there we would clear the interrupt
      # immediately after setting it in #cancel_timed_out, hence this MUST be here
      thread.interrupted
    end
  end

Sorry for not having a shorter reproducer, but as far as I understand the JRuby codebase and the explainations in #4516 and related issue this probably very hard to reproduce in isolation.

  • I'm pretty certain this is a concurrency issue since we can tune the number of threads concurrently going through this code, and the higher that number goes the more likely this exception is to show up.
  • Enabling AOT (compile mode FORCE) compilation makes the issue go away (or at least so unlikely that I couldn't reproduce it so far)
  • We haven't see this in 1.7.x ever
  • I tried running this code in isolation across multiple threads but couldn't reproduce the problem, so I guess it matters how much unrelated load is on the JIT(/Ruby Runtime) in other threads concurrently?
  • I can reproduce this with a JRuby I build from today's master 158574d as well

I'll try to get you more details to work with.

@original-brownbear

This comment has been minimized.

Show comment
Hide comment
@original-brownbear

original-brownbear Aug 17, 2017

Contributor

Update:

Outright forcing global sync on org.jruby.runtime.MixedModeIRBlockBody#commonYieldPath (see original-brownbear@6388200#diff-a17497dede7a4079b652e5637aa340d4R126) I can confirm fixes the issue, so it's some concurrency problem with the jitting happening downstream from here I guess.
I'll try to find a proper fix for that if no one finds time for it, though any help would be very much appreciated :)

Contributor

original-brownbear commented Aug 17, 2017

Update:

Outright forcing global sync on org.jruby.runtime.MixedModeIRBlockBody#commonYieldPath (see original-brownbear@6388200#diff-a17497dede7a4079b652e5637aa340d4R126) I can confirm fixes the issue, so it's some concurrency problem with the jitting happening downstream from here I guess.
I'll try to find a proper fix for that if no one finds time for it, though any help would be very much appreciated :)

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2017

Member

Looking into this.

Member

headius commented Aug 17, 2017

Looking into this.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2017

Member

I was unable to reproduce using JRuby master and your scripts. Switching to 9.1.13 branch now.

Member

headius commented Aug 17, 2017

I was unable to reproduce using JRuby master and your scripts. Switching to 9.1.13 branch now.

@original-brownbear

This comment has been minimized.

Show comment
Hide comment
@original-brownbear

original-brownbear Aug 17, 2017

Contributor

@headius I ran

jruby -J-Djruby.jit.logging=true -J-Djruby.jit.threshold=0 -J-Djruby.jit.logging.verbose=true -Xcompile.invokedynamic=true reproducer.rb > /dev/null

and I can still reproduce it with current master.

Contributor

original-brownbear commented Aug 17, 2017

@headius I ran

jruby -J-Djruby.jit.logging=true -J-Djruby.jit.threshold=0 -J-Djruby.jit.logging.verbose=true -Xcompile.invokedynamic=true reproducer.rb > /dev/null

and I can still reproduce it with current master.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2017

Member

I was unable to get it to fail with OracleJDK 8 but when I switched to a Zulu build of Java 7 I got the DynamicScope error right away. I'm guessing this may be a problem with concurrent classloading in Java 7 that was fixed later on. Working on a patch.

Member

headius commented Aug 17, 2017

I was unable to get it to fail with OracleJDK 8 but when I switched to a Zulu build of Java 7 I got the DynamicScope error right away. I'm guessing this may be a problem with concurrent classloading in Java 7 that was fixed later on. Working on a patch.

@original-brownbear

This comment has been minimized.

Show comment
Hide comment
@original-brownbear

original-brownbear Aug 17, 2017

Contributor

@headius not so sure about that, for me it fails reliably on

➜  $ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Contributor

original-brownbear commented Aug 17, 2017

@headius not so sure about that, for me it fails reliably on

➜  $ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2017

Member

It's very intermittent but I've managed to get both failures to appear.

The DynamicScope error I suspect is a concurrent classloading problem.

The stack overflow is more likely a race condition in IR. It appears to be getting mixed up which block it should be passing to a call and ending up looping back on itself.

Member

headius commented Aug 17, 2017

It's very intermittent but I've managed to get both failures to appear.

The DynamicScope error I suspect is a concurrent classloading problem.

The stack overflow is more likely a race condition in IR. It appears to be getting mixed up which block it should be passing to a call and ending up looping back on itself.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 18, 2017

Member

This is a bugger to hunt down but I'm proceeding from your workaround of synchronizing all of block JIT.

Member

headius commented Aug 18, 2017

This is a bugger to hunt down but I'm proceeding from your workaround of synchronizing all of block JIT.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 18, 2017

Member

A few other findings:

  • indy and threshold=0 are not prerequisites for any of the errors I have seen. They will happen, more or less, with other options.
  • It's definitely relating to JIT, but I think we've established that.
  • If it's going to fail it fails pretty early in this example, but with normal JIT thresholds it could definitely be more random.
Member

headius commented Aug 18, 2017

A few other findings:

  • indy and threshold=0 are not prerequisites for any of the errors I have seen. They will happen, more or less, with other options.
  • It's definitely relating to JIT, but I think we've established that.
  • If it's going to fail it fails pretty early in this example, but with normal JIT thresholds it could definitely be more random.

headius added a commit that referenced this issue Aug 18, 2017

Synchronize bytecode JIT to avoid concurrency issues.
This effectively means the background JIT will usually be limited
to one thread operating at a time. How much this will slow down
or bottleneck the JIT is not clear but this fix will let us know
if we need a deeper look at synchronization of the JIT and IR
structures modified at JIT time.

Part of work for #4739.

headius added a commit that referenced this issue Aug 18, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 18, 2017

Member

I've pushed a few commits that seem to have helped things. They're on the jit-conc-fixes branch for you to test out.

  • Synchronized JIT on a per-runtime basis. This should probably be temporary, but if it works at least we know we're closer to synchronizing what's important. I was unable to get a stack overflow failure after this change.
  • Synchronize the constructor logic in StaticScope. I suspect two threads may have gotten different constructors and tripped each other up mid-flight with different resulting scope classes.
  • Test under sync whether block has jitted already.

Give that branch a try and let us know how it looks.

Member

headius commented Aug 18, 2017

I've pushed a few commits that seem to have helped things. They're on the jit-conc-fixes branch for you to test out.

  • Synchronized JIT on a per-runtime basis. This should probably be temporary, but if it works at least we know we're closer to synchronizing what's important. I was unable to get a stack overflow failure after this change.
  • Synchronize the constructor logic in StaticScope. I suspect two threads may have gotten different constructors and tripped each other up mid-flight with different resulting scope classes.
  • Test under sync whether block has jitted already.

Give that branch a try and let us know how it looks.

@original-brownbear

This comment has been minimized.

Show comment
Hide comment
@original-brownbear

original-brownbear Aug 22, 2017

Contributor

@headius as I said in the linked LS issue, this fix looks very promising, so far I couldn't reproduce the reported problems with it :)
Happy to see this merged! :)

Contributor

original-brownbear commented Aug 22, 2017

@headius as I said in the linked LS issue, this fix looks very promising, so far I couldn't reproduce the reported problems with it :)
Happy to see this merged! :)

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 25, 2017

Member

Fixes merged to master and 9.1.13.0.

Member

headius commented Aug 25, 2017

Fixes merged to master and 9.1.13.0.

@headius headius closed this Aug 25, 2017

@original-brownbear

This comment has been minimized.

Show comment
Hide comment
@original-brownbear

original-brownbear Aug 25, 2017

Contributor

@headius thanks a lot! Any ETA on the 9.1.13.0 release?

Contributor

original-brownbear commented Aug 25, 2017

@headius thanks a lot! Any ETA on the 9.1.13.0 release?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Sep 5, 2017

Member

We should release 9.1.13 this week or next.

Member

headius commented Sep 5, 2017

We should release 9.1.13 this week or next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment