Streaming enhancements for dumping #524

@headius

While investigating workarounds for jruby/jruby#6265, I realized that all dumping (for example via to_json) first goes to an in-memory buffer (always a Ruby String), even when an IO object is given as the destination for the JSON. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.

This could obviously be more efficient if each append of JSON output were written directly to the given IO, or if it were possible to provide a String-like object that receives the appends. A rework of the generator subsystem would be necessary to pass any provided IO or String-like object through the various dump methods.

This would have several benefits:

  • No intermediate String to hold the entirety of the dumped json.
  • No intermediate Strings for components of a dumped collection; Array and Hash currently dump each element or pair to a separate String and then append that String to the result buffer.
  • Reduced allocation, copying, and GC overhead when dumping directly to IO.
  • Potential to provide IO-like or String-like receivers of the dumped json, allowing for a workaround to the Java 2GB array limitation (see jruby/jruby#6265, "Exception NegativeArraySizeException during JSON.dump of large Hash").
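The idea behind these benefits can be sketched in plain Ruby. The following is only an illustration under stated assumptions, not the json gem's generator: `StreamingDump`, `write_string`, and `ByteCounter` are hypothetical names, only a few types are handled, and the sink is assumed to respond to `<<` (as IO, StringIO, and String all do). Each collection writes its delimiters and elements straight to the sink instead of building a String per element, although small temporary Strings still appear for numbers and escaped text.

```ruby
require "stringio"

# Hypothetical sketch of a streaming dumper; not the json gem's API.
module StreamingDump
  # Short escapes for characters that must not appear raw in a JSON string.
  ESCAPES = {
    '"'  => '\"',
    "\\" => "\\\\",
    "\n" => "\\n",
    "\r" => "\\r",
    "\t" => "\\t"
  }.freeze

  module_function

  # Appends the JSON encoding of obj to sink using <<, recursing into
  # Hash and Array so no per-element result String is built.
  def dump(obj, sink)
    case obj
    when Hash
      sink << "{"
      first = true
      obj.each do |key, value|
        sink << "," unless first
        first = false
        write_string(key.to_s, sink)
        sink << ":"
        dump(value, sink)
      end
      sink << "}"
    when Array
      sink << "["
      obj.each_with_index do |element, index|
        sink << "," if index > 0
        dump(element, sink)
      end
      sink << "]"
    when String
      write_string(obj, sink)
    when Integer, true, false
      sink << obj.to_s
    when nil
      sink << "null"
    else
      raise ArgumentError, "unsupported type: #{obj.class}"
    end
    sink
  end

  # Writes str as a quoted JSON string, escaping quotes, backslashes,
  # and control characters.
  def write_string(str, sink)
    sink << '"'
    sink << str.gsub(/["\\\x00-\x1f]/) { |ch| ESCAPES[ch] || format('\u%04x', ch.ord) }
    sink << '"'
  end
end

# Hypothetical String-like receiver that never holds the full document:
# it only counts the bytes appended to it.
class ByteCounter
  attr_reader :bytes

  def initialize
    @bytes = 0
  end

  def <<(chunk)
    @bytes += chunk.bytesize
    self
  end
end

io = StringIO.new
StreamingDump.dump({ "name" => "json", "tags" => ["a", "b"], "ok" => true, "none" => nil }, io)
puts io.string # prints {"name":"json","tags":["a","b"],"ok":true,"none":null}

counter = ByteCounter.new
StreamingDump.dump([1, 2, 3], counter)
puts counter.bytes # prints 7
```

Because the dumper depends only on `<<`, the same code can target a File, a socket, or a custom receiver such as the byte counter, which is the kind of receiver that could sidestep a single oversized in-memory buffer.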

I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.
