Error when matching regex in multiple threads #3670

Closed
marshalium opened this Issue Feb 15, 2016 · 17 comments

Comments

Projects
None yet
10 participants
@marshalium

1.7.24 and 9.0.5.0 both have a pretty serious regression when matching against utf8 strings in multiple threads at the same time. This bug does not appear to be present in 1.7.23 or 9.0.4.0.

If multiple threads are running code like this, with completely unshared string and regex objects:

str = "foobar"
str.force_encoding("UTF-8")
str.gsub(/foo/i, '')

eventually one of the threads will throw an error like this:

Exception in thread "Ruby-0-Thread-6: ./recreate_utf8_bug.rb:3" java.lang.ArrayIndexOutOfBoundsException: 6
  at org.jcodings.specific.BaseUTF8Encoding.mbcCaseFold(BaseUTF8Encoding.java:167)
  at org.jcodings.specific.UTF8Encoding.mbcCaseFold(UTF8Encoding.java:24)
  at org.joni.SearchAlgorithm$SLOW_IC.lowerCaseMatch(SearchAlgorithm.java:238)
  at org.joni.SearchAlgorithm$SLOW_IC.search(SearchAlgorithm.java:206)
  at org.joni.Matcher.forwardSearchRange(Matcher.java:140)
  at org.joni.Matcher.searchInterruptible(Matcher.java:451)
  at org.jruby.RubyRegexp$SearchMatchTask.run(RubyRegexp.java:273)
  at org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1066)
  at org.jruby.RubyRegexp.matcherSearch(RubyRegexp.java:235)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3123)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3106)
  at org.jruby.RubyString.gsub19(RubyString.java:3101)
  at org.jruby.RubyString.gsub19(RubyString.java:3069)
  at org.jruby.RubyString$INVOKER$i$gsub19.call(RubyString$INVOKER$i$gsub19.gen)
  at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrTwoOrNBlock.call(JavaMethod.java:367)
  at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
  at $_dot_.recreate_utf8_bug.block_2$RUBY$__file__(./recreate_utf8_bug.rb:13)
  at $_dot_$recreate_utf8_bug$block_2$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_2$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
  at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
  at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
  at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
  at org.jruby.RubyFixnum$INVOKER$i$0$0$times.call(RubyFixnum$INVOKER$i$0$0$times.gen)
  at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
  at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
  at $_dot_.recreate_utf8_bug.chained_0_rescue_1$RUBY$SYNTHETIC__file__(./recreate_utf8_bug.rb:10)
  at $_dot_.recreate_utf8_bug.block_1$RUBY$__file__(./recreate_utf8_bug.rb:9)
  at $_dot_$recreate_utf8_bug$block_1$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_1$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
  at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
  at org.jruby.runtime.Block.call(Block.java:101)
  at org.jruby.RubyProc.call(RubyProc.java:300)
  at org.jruby.RubyProc.call(RubyProc.java:230)
  at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
  at java.lang.Thread.run(Thread.java:745)

The error doesn't occur if I remove the call to #force_encoding or if I remove the "i" flag from the regex.

See this gist for a full reproducible test case and examples of running it on different JRuby versions:
https://gist.github.com/marshalium/3e62c2affbd2ce95757f

@enebo enebo added this to the JRuby 1.7.25 milestone Feb 15, 2016

@enebo enebo added the core label Feb 15, 2016

@kares kares added the regression label Feb 16, 2016

@rsim

This comment has been minimized.

Show comment
Hide comment
@rsim

rsim Feb 22, 2016

We are experiencing this issue as well and it resulted in a large number of corrupted data in YAML serialized ActiveRecord columns.

rsim commented Feb 22, 2016

We are experiencing this issue as well and it resulted in a large number of corrupted data in YAML serialized ActiveRecord columns.

@rsim

This comment has been minimized.

Show comment
Hide comment
@rsim

rsim Feb 22, 2016

Based on the original example, I created a simple test case which always fails with ArrayIndexOutOfBoundsException is some threads:

10.times.map{Thread.new{100000.times{"foobar".force_encoding("UTF-8") =~ /foo/i}}}.map(&:value)

I would recommend that http://jruby.org/2016/01/20/jruby-1-7-24 is updated with this critical bug information and with recommendation not to use this version in multi-threaded production applications. This bug most probably will not be caught by application test suites and might cause critical problems in production.

rsim commented Feb 22, 2016

Based on the original example, I created a simple test case which always fails with ArrayIndexOutOfBoundsException is some threads:

10.times.map{Thread.new{100000.times{"foobar".force_encoding("UTF-8") =~ /foo/i}}}.map(&:value)

I would recommend that http://jruby.org/2016/01/20/jruby-1-7-24 is updated with this critical bug information and with recommendation not to use this version in multi-threaded production applications. This bug most probably will not be caught by application test suites and might cause critical problems in production.

lopex added a commit to jruby/joni that referenced this issue Feb 22, 2016

@lopex

This comment has been minimized.

Show comment
Hide comment
@lopex

lopex Feb 22, 2016

Member

Temporary fix is in jruby/joni@f23bffa by making SLOW_IC stateless and moving the cached buffer into matcher instance. In the future we need to get rid of those stateful caches and IntHolder by making them local.
@headius @enebo No idea why it might behave as a regression, this was always broken.

Member

lopex commented Feb 22, 2016

Temporary fix is in jruby/joni@f23bffa by making SLOW_IC stateless and moving the cached buffer into matcher instance. In the future we need to get rid of those stateful caches and IntHolder by making them local.
@headius @enebo No idea why it might behave as a regression, this was always broken.

@lopex

This comment has been minimized.

Show comment
Hide comment
@lopex

lopex Feb 23, 2016

Member

Commit that revealed that bug in EXACT_IC: jruby/joni@eda1681

Member

lopex commented Feb 23, 2016

Commit that revealed that bug in EXACT_IC: jruby/joni@eda1681

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 23, 2016

Member

@lopex Perhaps the caching of Regex was not as good and something got fixed around these releases? That would trigger more sharing of this field, wouldn't it?

Member

headius commented Feb 23, 2016

@lopex Perhaps the caching of Regex was not as good and something got fixed around these releases? That would trigger more sharing of this field, wouldn't it?

@lopex

This comment has been minimized.

Show comment
Hide comment
@lopex

lopex Feb 23, 2016

Member

@headius jruby/joni@eda1681 changed the way heuristics applied fast skipping / fast failing by favouring EXACT_IC for this very case. EXACT_IC had a buffer as an instance field and since it itself is scoped by Regexp instance, the buffer essentially turned out to be Regexp global (in case of a literal regexp). The fix just moved that buffer into Matcher instance for potential reuse when jumping in/out of bytecode interpreter. So caching of regexps never had any effect on this.

Member

lopex commented Feb 23, 2016

@headius jruby/joni@eda1681 changed the way heuristics applied fast skipping / fast failing by favouring EXACT_IC for this very case. EXACT_IC had a buffer as an instance field and since it itself is scoped by Regexp instance, the buffer essentially turned out to be Regexp global (in case of a literal regexp). The fix just moved that buffer into Matcher instance for potential reuse when jumping in/out of bytecode interpreter. So caching of regexps never had any effect on this.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 24, 2016

Member

I updated ruby-2.3 to use new joni, but we'll need a release for 9.1.

Member

headius commented Feb 24, 2016

I updated ruby-2.3 to use new joni, but we'll need a release for 9.1.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 24, 2016

Member

Oh, and we'll need to apply it to 1.7 of course.

Member

headius commented Feb 24, 2016

Oh, and we'll need to apply it to 1.7 of course.

@ncuesta

This comment has been minimized.

Show comment
Hide comment
@ncuesta

ncuesta Mar 2, 2016

Hi all, we are experiencing this same issue. Which is the recommended workaround until new versions including a fix for this are released? Should we downgrade to 9.0.4.0 (or 1.7.23 for the 1.7 branch)?

Thanks

ncuesta commented Mar 2, 2016

Hi all, we are experiencing this same issue. Which is the recommended workaround until new versions including a fix for this are released? Should we downgrade to 9.0.4.0 (or 1.7.23 for the 1.7 branch)?

Thanks

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Mar 2, 2016

Member

@ncuesta downgrading I think is the safest bet until we put out new releases. We will hopefully have a new release out in the next week to 2 weeks at most. At this point we are tidying up after the last dev cycle...

Member

enebo commented Mar 2, 2016

@ncuesta downgrading I think is the safest bet until we put out new releases. We will hopefully have a new release out in the next week to 2 weeks at most. At this point we are tidying up after the last dev cycle...

@ncuesta

This comment has been minimized.

Show comment
Hide comment

ncuesta commented Mar 2, 2016

Thanks @enebo!

@colinsurprenant

This comment has been minimized.

Show comment
Hide comment
@colinsurprenant

colinsurprenant Apr 1, 2016

we also started to notice this in the new logstash 2.3 release which includes 1.7.24, in useragent filter logstash-plugins/logstash-filter-useragent#25 and we are tracking this regex issue in elastic/logstash#4977

we also started to notice this in the new logstash 2.3 release which includes 1.7.24, in useragent filter logstash-plugins/logstash-filter-useragent#25 and we are tracking this regex issue in elastic/logstash#4977

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Apr 11, 2016

Member

joni has been updated and released for a few weeks with this addressed. Closing.

Member

enebo commented Apr 11, 2016

joni has been updated and released for a few weeks with this addressed. Closing.

@enebo enebo closed this Apr 11, 2016

@MartinKoerner

This comment has been minimized.

Show comment
Hide comment
@MartinKoerner

MartinKoerner Apr 18, 2016

Will there be a new 9.0.x release for this fix or do we have to wait until 9.1?

The JDBC MSSQL adapter code is full of //i Regex expressions, which produce 'nice' exceptions like 'no columns for table: ' or 'undefined method `[]' for nil:NilClass'

Will there be a new 9.0.x release for this fix or do we have to wait until 9.1?

The JDBC MSSQL adapter code is full of //i Regex expressions, which produce 'nice' exceptions like 'no columns for table: ' or 'undefined method `[]' for nil:NilClass'

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Apr 18, 2016

Member

@MartinKoerner There will only be a 9.1 release which should be this week...

Member

enebo commented Apr 18, 2016

@MartinKoerner There will only be a 9.1 release which should be this week...

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo May 11, 2016

Member

@DzmitryNikitsin I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com

Member

enebo commented May 11, 2016

@DzmitryNikitsin I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com

@DzmitryNikitsin

This comment has been minimized.

Show comment
Hide comment
@DzmitryNikitsin

DzmitryNikitsin May 11, 2016

Yes, issue was fixed on 9.1.0.0.

Thanks,
Dzmitry

From: Thomas E Enebo notifications@github.com
Reply-To: jruby/jruby reply@reply.github.com
Date: Wednesday, May 11, 2016 at 2:30 PM
To: jruby/jruby jruby@noreply.github.com
Cc: Dzmitry Nikitsin dzmitry.nikitsin@gmail.com, Comment comment@noreply.github.com
Subject: Re: [jruby/jruby] Error when matching regex in multiple threads (#3670)

I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com


You are receiving this because you commented.
Reply to this email directly or view it on GitHub

Yes, issue was fixed on 9.1.0.0.

Thanks,
Dzmitry

From: Thomas E Enebo notifications@github.com
Reply-To: jruby/jruby reply@reply.github.com
Date: Wednesday, May 11, 2016 at 2:30 PM
To: jruby/jruby jruby@noreply.github.com
Cc: Dzmitry Nikitsin dzmitry.nikitsin@gmail.com, Comment comment@noreply.github.com
Subject: Re: [jruby/jruby] Error when matching regex in multiple threads (#3670)

I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com


You are receiving this because you commented.
Reply to this email directly or view it on GitHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment