
FFI memory leak regression #4625

Closed
marshalium opened this Issue May 25, 2017 · 7 comments

@marshalium

Environment

$ jruby -v
jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
$ uname -a
Darwin marshall-work-lappy.local 15.6.0 Darwin Kernel Version 15.6.0: Tue Apr 11 16:00:51 PDT 2017; root:xnu-3248.60.11.5.3~1/RELEASE_X86_64 x86_64

No additional gems installed or loaded.

I discovered this bug in a production service that uses the posix_spawn() version of https://github.com/enkessler/childprocess to execute subprocesses.

Expected Behavior

I expect JRuby to not run out of memory when repeatedly allocating and freeing a constant number of FFI::MemoryPointer objects.

In the test script below, a single thread allocates and then immediately frees what it allocated. I've run this for several minutes successfully on JRuby 9.1.7.0 and MRI 2.4.1, but on JRuby 9.1.8.0 and 9.1.9.0 it runs out of memory within a few seconds.

require 'ffi'

puts RUBY_DESCRIPTION

i = 0
loop do
  ptr = FFI::MemoryPointer.from_string('some random test string')
  ptr.free
  i += 1
  puts "#{Time.now}: created #{i} ptrs" if i % 1_000_000 == 0
end

Actual Behavior

The JVM runs out of memory after a few seconds. It doesn't look like the native memory allocations are ever freed.

The result of running the above script on JRuby 9.1.9.0:

$ jruby -w ffi_mem_leak.rb 
jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
2017-05-24 16:28:50 -0700: created 1000000 ptrs
2017-05-24 16:28:50 -0700: created 2000000 ptrs
2017-05-24 16:28:52 -0700: created 3000000 ptrs
2017-05-24 16:28:52 -0700: created 4000000 ptrs
2017-05-24 16:28:52 -0700: created 5000000 ptrs
2017-05-24 16:28:53 -0700: created 6000000 ptrs
2017-05-24 16:28:53 -0700: created 7000000 ptrs
2017-05-24 16:28:54 -0700: created 8000000 ptrs
2017-05-24 16:28:56 -0700: created 9000000 ptrs
2017-05-24 16:28:57 -0700: created 10000000 ptrs
2017-05-24 16:28:58 -0700: created 11000000 ptrs
2017-05-24 16:29:04 -0700: created 12000000 ptrs
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####M to increase it (#### = cap size in MB).
java.lang.OutOfMemoryError: GC overhead limit exceeded
  at org.jruby.RubyFixnum.newFixnum(RubyFixnum.java:206)
  at org.jruby.Ruby.newFixnum(Ruby.java:3390)
  at org.jruby.RubyFixnum.moduloFixnum(RubyFixnum.java:667)
  at org.jruby.RubyFixnum.moduloFixnum(RubyFixnum.java:653)
  at org.jruby.RubyFixnum.op_mod(RubyFixnum.java:643)
  at org.jruby.RubyFixnum$INVOKER$i$1$0$op_mod.call(RubyFixnum$INVOKER$i$1$0$op_mod.gen)
  at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)
  at ffi_mem_leak.invokeOther5:%(ffi_mem_leak.rb:10)
  at ffi_mem_leak.RUBY$block$\=ffi_mem_leak\,rb$0(ffi_mem_leak.rb:10)
  at java.lang.invoke.LambdaForm$DMH/949057310.invokeStatic_L8_L(LambdaForm$DMH)
  at java.lang.invoke.LambdaForm$MH/1188753216.invokeExact_MT(LambdaForm$MH)
  at org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)
  at org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:80)
  at org.jruby.runtime.Block.yieldSpecific(Block.java:134)
  at org.jruby.RubyKernel.loop(RubyKernel.java:1298)
  at org.jruby.RubyKernel$INVOKER$s$0$0$loop.call(RubyKernel$INVOKER$s$0$0$loop.gen)
  at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:328)
  at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:141)
  at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:145)
  at ffi_mem_leak.invokeOther17:loop(ffi_mem_leak.rb:6)
  at ffi_mem_leak.RUBY$script(ffi_mem_leak.rb:6)
  at java.lang.invoke.LambdaForm$DMH/1670782018.invokeStatic_L7_L(LambdaForm$DMH)
  at java.lang.invoke.LambdaForm$BMH/842326585.reinvoke(LambdaForm$BMH)
  at java.lang.invoke.LambdaForm$MH/1297149880.invoker(LambdaForm$MH)
  at java.lang.invoke.LambdaForm$MH/1876631416.invokeExact_MT(LambdaForm$MH)
  at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
  at org.jruby.ir.Compiler$1.load(Compiler.java:95)
  at org.jruby.Ruby.runScript(Ruby.java:827)
  at org.jruby.Ruby.runNormally(Ruby.java:746)
  at org.jruby.Ruby.runNormally(Ruby.java:764)
  at org.jruby.Ruby.runFromMain(Ruby.java:577)
  at org.jruby.Main.doRunFromMain(Main.java:417)

Just before it crashed the heap histogram looked like this:

$ jmap -histo 82144 | head -n 10

 num     #instances         #bytes  class name
----------------------------------------------
   1:      10903078      348898496  org.jruby.ext.ffi.jffi.CachingNativeMemoryAllocator$MemoryAllocation
   2:         21342       44049888  [Lorg.jruby.ext.ffi.jffi.CachingNativeMemoryAllocator$MemoryAllocation;
   3:         26188        1453600  [C
   4:         37948        1214336  java.util.concurrent.ConcurrentHashMap$Node
   5:          9247         813736  java.lang.reflect.Method
   6:          7358         808264  java.lang.Class
   7:         21342         682944  org.jruby.ext.ffi.jffi.CachingNativeMemoryAllocator$AllocationGroup
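The histogram points at `CachingNativeMemoryAllocator$MemoryAllocation` records piling up. As a plain-Ruby illustration of that failure mode (hypothetical names and structure, not JRuby's actual implementation), this is what it looks like when an allocator's cache keeps strong references to allocation records even after they are freed:

```ruby
# Hypothetical sketch of the leak pattern: a caching allocator whose
# cache retains allocation records even after free is called on them.
class MemoryAllocation
  attr_reader :freed

  def initialize(size)
    @size = size
    @freed = false
  end

  def free
    @freed = true # marks the record freed, but nothing evicts it from the cache
  end
end

class CachingAllocator
  def initialize
    @cache = [] # strong references: freed allocations are never released
  end

  def allocate(size)
    alloc = MemoryAllocation.new(size)
    @cache << alloc
    alloc
  end

  def live_records
    @cache.size
  end
end

allocator = CachingAllocator.new
1_000.times do
  ptr = allocator.allocate(32)
  ptr.free # freed by the caller, yet the cache still holds a reference
end
puts allocator.live_records # => 1000, growing without bound like the histogram
```

In the real allocator the cache exists to amortize native allocation cost; the leak only requires that the eviction path stop firing, which matches the symptom of `MemoryAllocation` counts tracking the total number of pointers ever created rather than the live ones.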
@mrmargolis

mrmargolis commented May 26, 2017

Seeing this on a large-ish app after ~18 hours of runtime.

jruby 9.1.9.0 (2.3.3) 2017-05-15
 num     #instances         #bytes  class name
----------------------------------------------
   1:        173179     1067074992  [Lorg.jruby.ext.ffi.jffi.CachingNativeMemoryAllocator$MemoryAllocation;
   2:       8929365      357174600  org.jruby.RubyHash$RubyHashEntry
   3:       4445921      269973688  [B
   4:       4535390      181415600  org.jruby.RubyString
   5:       4260862      170434480  org.jruby.util.ByteList
   6:       2039577      130638416  [Lorg.jruby.RubyHash$RubyHashEntry;
   7:       2037585      114104760  org.jruby.RubyHash
   8:        391182       84617600  [I
   9:       2079424       82845488  [Lorg.jruby.runtime.builtin.IRubyObject;
  10:        746880       75174440  [Ljava.lang.Object;
  11:        859867       69482504  [C
  12:       2067700       66166400  org.jruby.ext.ffi.jffi.CachingNativeMemoryAllocator$MemoryAllocation
@camlow325

camlow325 (Contributor) commented May 26, 2017

I'm not sure if this is related, but we've also been seeing an ffi-related memory leak in JRuby 9.1.9.0. The later jnr-ffi versions, starting with 2.1.3, seem to have caused the problem. I filed an issue, jnr/jnr-ffi#117, and a possible PR fix, jnr/jnr-ffi#116, for this today.

@headius

headius (Member) commented May 30, 2017

Ok, I'm trying to come up with a reasonable test for this, but it's kinda nasty. Unfortunately a simple struct like the one created for stat does not appear to leak the same way.
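One way such a regression test can be sketched in plain Ruby (a hypothetical test, not the one committed to JRuby; it assumes MRI's `GC.stat` keys, so a JRuby version would need a JVM-side heap check instead) is to run the allocate/free loop and assert that the live heap does not grow without bound:

```ruby
# Hypothetical bounded-memory regression test: churn through short-lived
# allocations, then verify live object counts stayed roughly flat.
# `heap_live_slots` is an MRI GC.stat key; `slack` absorbs GC noise.
def assert_bounded_allocation(iterations: 100_000, slack: 200_000)
  GC.start
  before = GC.stat(:heap_live_slots)

  iterations.times do
    buf = 'some random test string'.dup # stands in for MemoryPointer.from_string
    buf.clear                           # stands in for ptr.free
  end

  GC.start
  after = GC.stat(:heap_live_slots)
  raise "possible leak: #{before} -> #{after} live slots" unless after - before < slack
  true
end

puts assert_bounded_allocation
```

The slack threshold makes the check tolerant of allocator and GC jitter while still catching a leak that retains one record per iteration, which would overshoot it by orders of magnitude.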

@camlow325

camlow325 (Contributor) commented May 30, 2017

@headius Did you see the more direct jnr-ffi test case that I included in jnr/jnr-ffi#117? I suppose that one's probably a little lower level than would be useful for testing from JRuby, but I wanted to point it out in case you hadn't seen it.

@headius

headius (Member) commented May 30, 2017

@camlow325 I had not seen it until now. I just committed a test to JRuby in 6b6f27b, but perhaps your case from jnr/jnr-ffi#117 could be turned into a proper test suite test case?

@marshalium

marshalium commented May 30, 2017

Awesome. Thanks for fixing this so fast @camlow325 and @headius!

@headius

headius (Member) commented May 31, 2017

FYI, we probably won't be able to spin 9.1.11.0 until mid-June due to folks being out on vacation. But we'll do it as soon as possible! Please test out master and verify that everything looks good for those of you that have had issues.
