New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO.copy_stream doesn't seem to limit the chunk size #4701

Closed
janko-m opened this Issue Jul 2, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@janko-m

janko-m commented Jul 2, 2017

Environment

$ jruby -v
jruby 9.1.12.0 (2.3.3) 2017-06-15 33c6439 Java HotSpot(TM) 64-Bit Server VM 25.40-b25 on 1.8.0_40-b27 +jit [darwin-x86_64]
$ uname -a
Darwin Jankos-MacBook-Pro-2.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Expected Behavior

When I use IO.copy_stream on MRI, it limits the chunk size to 16KB.

require "stringio"

class FakeIO
  def write(data)
    p(data.bytesize)
  end
end

io = StringIO.new(100*1024)

IO.copy_stream(io, FakeIO.new)
16384
16384
16384
16384
16384
16384
4096

Actual Behavior

On JRuby IO.copy_stream appears not to limit the chunk size, it just always reads the entire content at once, because the output of the above script is

102400

I tried with a StringIO up to 10MB of size, and the behaviour is still the same, all 10MB of content is read at once. The behaviour is the same if I change the source IO into a File object.

janko-m added a commit to janko-m/http that referenced this issue Jul 2, 2017

Fix tests on JRuby
IO.copy_stream on JRuby doesn't seem to limit the chunk size, it always
reads the entire content at once.

jruby/jruby#4701

Since we don't care about what is the size of yielded chunks, we just
test whether the yielded chunks sum up to the entire content.

While here we also bring back HTTP::Request::Body#each returning an
Enumerator, so that tests can be simpler.

ixti added a commit to httprb/http that referenced this issue Aug 18, 2017

Fix tests on JRuby
IO.copy_stream on JRuby doesn't seem to limit the chunk size, it always
reads the entire content at once.

jruby/jruby#4701

Since we don't care about what is the size of yielded chunks, we just
test whether the yielded chunks sum up to the entire content.

While here we also bring back HTTP::Request::Body#each returning an
Enumerator, so that tests can be simpler.
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Sep 7, 2017

Member

If you're going from a file to a file we use the builtin Java implementation of FileChannel.transfer, which will do whatever's efficient on that platform. For non-file copying, we do it manually via a loop, and this is likely where we're not choosing a reasonable block size. I'll have a look.

Member

headius commented Sep 7, 2017

If you're going from a file to a file we use the builtin Java implementation of FileChannel.transfer, which will do whatever's efficient on that platform. For non-file copying, we do it manually via a loop, and this is likely where we're not choosing a reasonable block size. I'll have a look.

@janko-m

This comment has been minimized.

Show comment
Hide comment
@janko-m

janko-m Nov 13, 2017

This snippet shows that IO.copy_stream in latest JRuby (9.1.14.0) passes nil as the length argument when calling #read on the source IO object, which is the same as not passing the length argument at all and will read the entire content of the IO object into a string.

source      = StringIO.new("foo")
destination = File::NULL

def source.read(*args)
  p args
  super
end

IO.copy_stream(source, destination)
# output on JRuby
[nil]

I think this is problematic because the reason people use IO.copy_stream is because it will copy the contents from source to destination in a memory-efficient way. If JRuby's implementation effectively does destination.write source.read, then for larger files the server could easily run out of memory.

MRI's IO.copy_stream passes both a length argument (of 16KB) and a buffer string to #read, so MRI the output of the previous script is the following:

# output on MRI
[16384, ""]
[16384, "foo"]

To provide background, I'm a maintainer of two libraries which handle file uploads and use IO.copy_stream with non-file IO objects (so JRuby cannot switch to the built-in Java implementation). The second library implements the tus resumable upload protocol, and that one is expected to handle very large file uploads (multiple GB), so I rely on IO.copy_stream for memory-efficient streaming.

janko-m commented Nov 13, 2017

This snippet shows that IO.copy_stream in latest JRuby (9.1.14.0) passes nil as the length argument when calling #read on the source IO object, which is the same as not passing the length argument at all and will read the entire content of the IO object into a string.

source      = StringIO.new("foo")
destination = File::NULL

def source.read(*args)
  p args
  super
end

IO.copy_stream(source, destination)
# output on JRuby
[nil]

I think this is problematic because the reason people use IO.copy_stream is because it will copy the contents from source to destination in a memory-efficient way. If JRuby's implementation effectively does destination.write source.read, then for larger files the server could easily run out of memory.

MRI's IO.copy_stream passes both a length argument (of 16KB) and a buffer string to #read, so MRI the output of the previous script is the following:

# output on MRI
[16384, ""]
[16384, "foo"]

To provide background, I'm a maintainer of two libraries which handle file uploads and use IO.copy_stream with non-file IO objects (so JRuby cannot switch to the built-in Java implementation). The second library implements the tus resumable upload protocol, and that one is expected to handle very large file uploads (multiple GB), so I rely on IO.copy_stream for memory-efficient streaming.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 14, 2017

Member

Fixed along with #4842.

Member

headius commented Nov 14, 2017

Fixed along with #4842.

@headius headius closed this Nov 14, 2017

@headius headius added this to the JRuby 9.1.15.0 milestone Nov 14, 2017

@janko-m

This comment has been minimized.

Show comment
Hide comment
@janko-m

janko-m Nov 14, 2017

Awesome, thank you!

janko-m commented Nov 14, 2017

Awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment