
Exception NegativeArraySizeException during JSON.dump of large Hash #6265

Open
jbaiza opened this issue Jun 4, 2020 · 6 comments
jbaiza commented Jun 4, 2020

Environment Information

Provide at least:

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc)
    Originally detected on 9.2.9.0:
    jruby 9.2.9.0 (2.5.7) 2019-10-30 458ad3e OpenJDK 64-Bit Server VM 11.0.5+10 on 11.0.5+10 [darwin-x86_64]
    but can be reproduced also on the latest 9.2.11.1:
    jruby 9.2.11.1 (2.5.7) 2020-03-25 b1f55b1 OpenJDK 64-Bit Server VM 11.0.5+10 on 11.0.5+10 [darwin-x86_64]
  • Operating system and platform (e.g. uname -a)
    Linux staging-app1 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    also on Dev box:
    Darwin JBA-MacBook-Pro.local 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64

Other relevant info you may wish to add:

  • Installed or activated gems - N/A
  • Application/framework version (e.g. Rails, Sinatra) - N/A
  • Environment variables - increase Java memory with JAVA_OPTS=-Xmx4g

Expected Behavior

  • Describe your expectation of how JRuby should behave, perhaps by showing how CRuby/MRI behaves.
    JSON.dump completes without an error for a large hash. On MRI 2.5.1 the provided sample code completes without an error.
  • Provide an executable Ruby script or a link to an example repository.
require 'json'
arr = [{"0" => "0", "1" => "1", "2" => "2", "3" => "3", "4" => "4", "5" => "5"}]
begin
  25.times do
    arr.concat arr
    puts "ARR size: #{arr.size}"
  end
  puts "JSON.dump size: #{JSON.dump(arr).size}"
  arr.concat arr
  puts "ARR size: #{arr.size}"
  puts "JSON.dump size: #{JSON.dump(arr).size}"
end

Actual Behavior

An error is thrown:

Traceback (most recent call last):
       16: from json.ext.Generator$Handler.generateNew(Generator.java:194)
       15: from json.ext.Generator$4.generate(Generator.java:272)
       14: from json.ext.Generator$4.generate(Generator.java:315)
       13: from json.ext.Generator$5.generate(Generator.java:329)
       12: from json.ext.Generator$5.generate(Generator.java:356)
       11: from org.jruby.dist/org.jruby.RubyHash.visitAll(RubyHash.java:2746)
       10: from org.jruby.dist/org.jruby.RubyHash.visitLimited(RubyHash.java:699)
        9: from org.jruby.dist/org.jruby.RubyHash$Visitor.visit(RubyHash.java:677)
        8: from json.ext.Generator$5$1.visit(Generator.java:377)
        7: from json.ext.Generator$6.generate(Generator.java:391)
        6: from json.ext.Generator$6.generate(Generator.java:412)
        5: from json.ext.StringEncoder.encode(StringEncoder.java:51)
        4: from json.ext.ByteListTranscoder.quoteStop(ByteListTranscoder.java:147)
        3: from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:530)
        2: from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:546)
        1: from org.jruby.dist/org.jruby.util.ByteList.grow(ByteList.java:1107)
Java::JavaLang::NegativeArraySizeException (-1746927586)

It seems that 2 GB is the size at which the error starts to occur.
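A quick size estimate is consistent with that limit (the per-element figure below is computed from the sample hash above, not taken from the report):

```ruby
require 'json'

# One element of the sample array from the script above.
elem = { "0" => "0", "1" => "1", "2" => "2", "3" => "3", "4" => "4", "5" => "5" }

# Serialized size of one element, plus one byte for the separating comma.
per_element = JSON.generate(elem).bytesize + 1   # 50 bytes

java_max_array = 2**31 - 1                       # max byte[] length on the JVM

# After 25 doublings the array has 2**25 elements; after 26, 2**26.
puts 2**25 * per_element                         # ~1.7 GB -- still fits in a byte[]
puts 2**26 * per_element                         # ~3.4 GB -- exceeds the JVM limit
puts 2**26 * per_element > java_max_array        # => true
```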

headius commented Nov 20, 2020

Not too surprising... this is dumping the JSON output into one of our ByteList instances, which are backed by a Java byte array, and on the JVM the limit for such arrays is 2GB.

The only workaround I can think of at the moment would be to dump to an IO stream, rather than dumping to a >2GB in-memory buffer.

Unfortunately fixing this is a much larger challenge, since the result of JSON.dump is a Ruby String, and String in JRuby is backed by a single ByteList. There have been many other such bug reports that we have had to close as "won't fix" mostly because this is a JVM limitation.
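A hand-rolled version of the IO workaround mentioned above could look like this; `dump_array_to_io` is a hypothetical helper, not part of the json API. Each element is serialized separately, so no single Ruby String (and hence no single byte[] on JRuby) ever approaches the 2GB limit:

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: stream a large Array to an IO one element at a
# time. Each JSON.generate call only builds a small per-element String,
# so the full document never exists in memory at once.
def dump_array_to_io(arr, io)
  io.write("[")
  arr.each_with_index do |elem, i|
    io.write(",") unless i.zero?
    io.write(JSON.generate(elem))
  end
  io.write("]")
  io
end

io = dump_array_to_io([{ "a" => 1 }, { "b" => 2 }], StringIO.new)
puts io.string  # => [{"a":1},{"b":2}]
```

In practice the sink would be a File opened for writing rather than a StringIO, so the output lands on disk instead of in memory.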

headius commented Nov 20, 2020

@enebo @lopex I don't know that we are any closer to a solution on this now than we were ten years ago. One option would be modifying ByteList to use either multiple arrays or long-based addressing, but clearly that would impact a huge amount of code that expects to be able to get a byte[] out. On the other hand, the vast majority of cases will still be under 2GB, so perhaps we could incrementally add support for ranges outside int32 and raise an error for cases that expect a real byte[].

jbaiza commented Mar 31, 2023

Hello,
We have upgraded our JRuby version to 9.3.7.0, and the same sample code now produces an OutOfMemoryError:

Traceback (most recent call last):
       16: from json.ext.Generator$4.generate(Generator.java:253)
       15: from json.ext.Generator$4.generate(Generator.java:296)
       14: from json.ext.Generator$5.generate(Generator.java:310)
       13: from json.ext.Generator$5.generate(Generator.java:339)
       12: from org.jruby.dist/org.jruby.RubyHash.visitAll(RubyHash.java:2882)
       11: from org.jruby.dist/org.jruby.RubyHash.visitLimited(RubyHash.java:727)
       10: from org.jruby.dist/org.jruby.RubyHash$Visitor.visit(RubyHash.java:705)
        9: from json.ext.Generator$5$1.visit(Generator.java:358)
        8: from json.ext.Generator$6.generate(Generator.java:372)
        7: from json.ext.Generator$6.generate(Generator.java:393)
        6: from json.ext.StringEncoder.encode(StringEncoder.java:52)
        5: from json.ext.ByteListTranscoder.quoteStop(ByteListTranscoder.java:147)
        4: from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:547)
        3: from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:563)
        2: from org.jruby.dist/org.jruby.util.ByteList.grow(ByteList.java:1125)
        1: from org.jruby.dist/org.jruby.runtime.Helpers.calculateBufferLength(Helpers.java:493)
Java::JavaLang::OutOfMemoryError (Requested array size exceeds VM limit)

and on the latest version 9.4.2.0 the same error with a bit different stack trace formatting:

org.jruby.dist/org.jruby.runtime.Helpers.calculateBufferLength(Helpers.java:492): Requested array size exceeds VM limit (Java::JavaLang::OutOfMemoryError)
	from org.jruby.dist/org.jruby.util.ByteList.grow(ByteList.java:1125)
	from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:563)
	from org.jruby.dist/org.jruby.util.ByteList.append(ByteList.java:547)
	from json.ext.ByteListTranscoder.quoteStop(ByteListTranscoder.java:147)
	from json.ext.StringEncoder.encode(StringEncoder.java:52)
	from json.ext.Generator$6.generate(Generator.java:393)
	from json.ext.Generator$6.generate(Generator.java:372)
	from json.ext.Generator$5$1.visit(Generator.java:358)
	from org.jruby.dist/org.jruby.RubyHash$Visitor.visit(RubyHash.java:715)
	from org.jruby.dist/org.jruby.RubyHash.visitLimited(RubyHash.java:759)
	from org.jruby.dist/org.jruby.RubyHash.visitAll(RubyHash.java:2982)
	from json.ext.Generator$5.generate(Generator.java:339)
	from json.ext.Generator$5.generate(Generator.java:310)
	from json.ext.Generator$4.generate(Generator.java:296)
	from json.ext.Generator$4.generate(Generator.java:253)
	from json.ext.Generator$Handler.generateNew(Generator.java:175)
	... 180 levels...

The JVM was given more than 2 GB of memory: export JAVA_OPTS="-Xms256m -Xmx10048m"

headius commented Mar 31, 2023

This remains a Java limitation. The buffer into which your large hash is being dumped eventually grows to be larger than 2GB, which is the limit of a Java array. Since we only have one implementation of Ruby's String, and that implementation uses a Java byte[], we cannot grow a string any larger than the 2GB limit.

The solutions to this in the Java world are to use multiple arrays or to use a native block of memory via a native ByteBuffer. I can see two paths forward for fixing this with ByteBuffer:

  • We can implement either a special RubyString or ByteList that uses a ByteBuffer rather than a byte[] as its storage. We have discussed this possibility in the past as a way to improve I/O performance and memory usage since an entire file could be memory-mapped rather than copied into the JVM heap. But implementing this is no small task.
  • We may be able to coax the json library to dump to a custom data type written in Ruby that maintains a native ByteBuffer and implements key methods of String. If so, we would have an option for cases that require more than 2GB of String data, and a possible path forward to enhancing or replacing our existing byte[] String with a more flexible implementation.

I'm going to poke around the json library and see if the latter option might work in the short term.
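A minimal sketch of the shape of that second option, with an array of small Ruby String chunks standing in for the native ByteBuffer (`ChunkedBuffer` and its limit are illustrative names, not anything in JRuby or the json gem):

```ruby
# Illustrative only: a String-like sink backed by many small chunks
# rather than one contiguous buffer, so no single allocation has to
# hold the whole result. A real implementation would use native
# memory; plain Ruby Strings stand in for the buffers here.
class ChunkedBuffer
  CHUNK_LIMIT = 64 * 1024  # deliberately small for the sketch

  def initialize
    @chunks = [String.new]
  end

  # The key String method a generator needs: append.
  def <<(str)
    @chunks << String.new if @chunks.last.bytesize >= CHUNK_LIMIT
    @chunks.last << str
    self
  end

  # Total size can exceed any single chunk's limit.
  def bytesize
    @chunks.sum(&:bytesize)
  end

  # Stream the content out without ever joining it into one String.
  def each_chunk(&blk)
    @chunks.each(&blk)
  end
end

buf = ChunkedBuffer.new
buf << '{"key":' << '"value"}'
puts buf.bytesize  # => 15
```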

headius commented Mar 31, 2023

Unfortunately the json library is not compatible with the approach I outlined. In all three implementations of the generator (the pure-Ruby version, the C version, and the Java version), all JSON is written first to a String, and then that String is returned or written to an IO. There's no streaming of JSON data into an abstract "write" or "append" interface, so there's no way to trick it into using a different type of buffer.

I was also mistaken when I said ByteBuffer could be used to work around this. ByteBuffers can only be constructed with a size specified in a Java int, effectively limiting them to 2GB. So that leaves us to use a native buffer in some other way, such as through Ruby FFI, Java FFI libraries like jnr-ffi, or the new OpenJDK project Panama's support for native memory buffers.

So the project is pretty big but also could be pretty valuable.

For IO-heavy json use cases, writing to a String buffer is obviously not going to be the most efficient option; there would be value in enhancing json to write not to a String but to any String-like or IO-like object provided by the caller. That would in turn allow us to pass it a native memory wrapper and avoid the 2GB byte[] limit.

The wrapper itself could be implemented today with FFI, but for better efficiency the JIT enhancements that come with project Panama make it a more attractive target. Panama is only available in preview form as of JDK 19, having incubated in JDK 17 and 18. I will be presenting on this topic (in part) next week, so I'm looking into the possibilities right now.
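The "String-like or IO-like object" interface described above could be sketched as follows; `generate_to` is a hypothetical function, not part of the json gem, and scalar escaping is delegated to JSON.generate:

```ruby
require 'json'

# Hypothetical: a generator that appends to any object responding to
# `<<` -- a String, an IO, or a native-memory wrapper -- instead of
# accumulating the whole document in one String.
def generate_to(obj, sink)
  case obj
  when Hash
    sink << "{"
    obj.each_with_index do |(k, v), i|
      sink << "," unless i.zero?
      generate_to(k.to_s, sink)
      sink << ":"
      generate_to(v, sink)
    end
    sink << "}"
  when Array
    sink << "["
    obj.each_with_index do |v, i|
      sink << "," unless i.zero?
      generate_to(v, sink)
    end
    sink << "]"
  else
    sink << JSON.generate(obj)  # scalars: reuse the gem's escaping
  end
  sink
end

out = generate_to({ "a" => [1, 2], "b" => "x" }, +"")
puts out  # => {"a":[1,2],"b":"x"}
```

Because the sink only needs `<<`, the same code can write to a File, a StringIO, or a native-memory wrapper without changes.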

headius added a commit to headius/json that referenced this issue Aug 15, 2023
JSON.dump allows you to pass an IO to which the dump output will
be sent, but it still buffers the entire output in memory before
sending it to the given IO. This leads to issues on JRuby like
jruby/jruby#6265 when it tries to create a byte[] that exceeds the
maximum size of a signed int (JVM's array size limit).

This commit plumbs the IO all the way through the generation logic
so that it can be written to directly without filling a temporary
memory buffer first. This allows JRuby to dump object graphs that
would normally produce more content than the JVM can hold in a
single array, providing a workaround for jruby/jruby#6265.

It is unfortunately a bit slow to dump directly to IO due to the
many small writes that all acquire locks and participate in the
IO encoding subsystem. A more direct path that can skip some of
these pieces could be more competitive with the in-memory version,
but functionally it expands the size of graphs that can be dumped
when using JRuby.

See flori#54
headius added a commit to headius/json that referenced this issue Aug 15, 2023

See flori#524
headius commented Aug 15, 2023

See flori/json#524 for a proof-of-concept streaming dump implementation. This is likely the closest we can get in the near term to defeating the JVM array-size limit, but I could use some help cleaning it up and getting it shipped.
