New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy_stream to SSL seems to read the file into memory #4842

Closed
cfitz opened this Issue Nov 9, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@cfitz

cfitz commented Nov 9, 2017

JRuby 9.1.8, Java 1.8.0, Fedora 25

jruby -v                                                                        
jruby 9.1.8.0 (2.3.1) 2017-03-06 90fc7ab OpenJDK 64-Bit Server VM 25.144-b01 on 1.8.0_144-b01 +jit [linux-x86_64]

uname -a                                                                      
Linux chr.local 4.13.5-200.fc26.x86_64 #1 SMP Thu Oct 5 16:53:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I'm running a script ( included below ) that is streaming a big JSON file ( ~300mb ) to a Solr server that's using SSL authentication. Running on JRuby 9.1.8 ( all default flags ) , it pretty much immediately runs out of heap. If I bump the heap up to 2g, it will run a bit then run out of heap. A dump from VisualVM indicates ~98% of the memory is bytes[].
So, I think the IO is being put into memory when being copied to a SSL socket, I think..

Running on MRI and JRuby 1.7.9 do not have these issues, as it looks like the IO is being streamed out as expected.
Also, running the script on JRuby 9.1 withouth SSL runs without any memory issues.

Here's the script:

require 'openssl'
require 'net/http'

PEM_FILE = ENV["CLIENT_CERT"]
SOLR_HOST = ENV["SOLR_HOST"]

class SSLSolr

  DEFAULT_OPTIONS = {
    use_ssl: true,
    verify_mode: OpenSSL::SSL::VERIFY_NONE,
    keep_alive_timeout: 30,
    cert: OpenSSL::X509::Certificate.new(IO.read(PEM_FILE)),
    key:  OpenSSL::PKey::RSA.new(IO.read(PEM_FILE)),
  }


  def initialize(http = nil)
    if http
      @http = http
    else
      @http = Net::HTTP.start(SOLR_HOST, 443, DEFAULT_OPTIONS)
    end
  end



  def update()
    bytes = File.open('index_batch_2.json', 'rb').bytes.count.to_s
    stream = File.open('index_batch_2.json', 'rb')
    puts "starting request..." 
    request = Net::HTTP::Post.new "/solr/archivesspace/update" 
    request['Content-Type'] = 'application/json'
    request['Content-Length'] = bytes
    request.body_stream = stream
    response = @http.request request
    puts response.body
  end

end

SSLSolr.new.update

And here's the dump I get when I use jruby -w


starting request...
Error: Your application used more memory than the safety cap of 1024M.
Specify -J-Xmx####M to increase it (#### = cap size in MB).
java.lang.OutOfMemoryError: Java heap space
	at org.jruby.util.ByteList.ensure(ByteList.java:341)
	at org.jruby.RubyString.modify(RubyString.java:950)
	at org.jruby.RubyString.modifyExpand(RubyString.java:960)
	at org.jruby.util.io.EncodingUtils.setStrBuf(EncodingUtils.java:1116)
	at org.jruby.util.io.OpenFile.fread(OpenFile.java:1755)
	at org.jruby.util.io.OpenFile.readAll(OpenFile.java:1669)
	at org.jruby.RubyIO.read(RubyIO.java:2992)
	at org.jruby.RubyIO.read(RubyIO.java:2976)
	at org.jruby.RubyIO.copy_stream(RubyIO.java:4085)
	at org.jruby.RubyIO$INVOKER$s$0$2$copy_stream.call(RubyIO$INVOKER$s$0$2$copy_stream.gen)
	at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:724)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:208)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:358)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:195)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:323)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:109)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:95)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:298)
	at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:79)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:83)
	at org.jruby.ir.instructions.CallBase.interpret(CallBase.java:428)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:355)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:109)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:95)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:298)
	at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:79)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:83)
	at org.jruby.ir.instructions.CallBase.interpret(CallBase.java:428)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:355)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)

@cfitz cfitz changed the title from copy_stream to SSL seems to read the file to copy_stream to SSL seems to read the file into memory Nov 9, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 9, 2017

Member

Ok, this must be one of the paths that doesn't chunk the transfer properly. It looks like it's falling into logic used for "fake" IO objects that only have a "read" method.

We do have chunking logic elsewhere that should also be used for this.

Member

headius commented Nov 9, 2017

Ok, this must be one of the paths that doesn't chunk the transfer properly. It looks like it's falling into logic used for "fake" IO objects that only have a "read" method.

We do have chunking logic elsewhere that should also be used for this.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 9, 2017

Member

Oh, a workaround would be for copy_stream to receive a third length argument and chunk it manually, but obviously we don't want folks to have to do that.

Member

headius commented Nov 9, 2017

Oh, a workaround would be for copy_stream to receive a third length argument and chunk it manually, but obviously we don't want folks to have to do that.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 9, 2017

Member

Reproduction below. This will OOM or NegativeArraySizeException depending on memory settings:

fake = Class.new do
  def initialize
    @io = File.open("/dev/zero")
  end
  def read(length = -1)
    if length == -1
      @io.read
    else
      @io.read(length)
    end
  end
end.new

IO.copy_stream(fake, "/dev/null")
Member

headius commented Nov 9, 2017

Reproduction below. This will OOM or NegativeArraySizeException depending on memory settings:

fake = Class.new do
  def initialize
    @io = File.open("/dev/zero")
  end
  def read(length = -1)
    if length == -1
      @io.read
    else
      @io.read(length)
    end
  end
end.new

IO.copy_stream(fake, "/dev/null")

headius added a commit that referenced this issue Nov 9, 2017

Additional tweaks to copy_stream for #4842.
* Use arg2 for write argument in duck-typed logic.
* Don't use a native ByteBuffer for streams we can't tell are
  native.
* Guard ByteBuffer unwrapping in IOChannel
* Update IO excludes.

@headius headius closed this in cd40c31 Nov 9, 2017

@headius headius added this to the JRuby 9.1.15.0 milestone Nov 9, 2017

@cfitz

This comment has been minimized.

Show comment
Hide comment
@cfitz

cfitz Nov 9, 2017

Thanks, much appriciated.
If anyone else runs into this, I'm patching http as a temporary fix:

class Net::HTTPGenericRequest
                                                                                                                                                                                             
  def send_request_with_body_stream(sock, ver, path, f)                                                                                                                                      
    unless content_length() or chunked?                                                                                                                                                      
      raise ArgumentError,                                                                                                                                                                   
          "Content-Length not given and Transfer-Encoding is not `chunked'"                                                                                                                  
    end                                                                                                                                                                                      
    supply_default_content_type                                                                                                                                                              
    write_header sock, ver, path                                                                                                                                                             
    wait_for_continue sock, ver if sock.continue_timeout                                                                                                                                     
    if chunked?                                                                                                                                                                              
      chunker = Chunker.new(sock)                                                                                                                                                            
      IO.copy_stream(f, chunker)                                                                                                                                                             
      chunker.finish                                                                                                                                                                         
    else                                                                                                                                                                                     
      # copy_stream can sendfile() to sock.io unless we use SSL.                                                                                                                             
      # If sock.io is an SSLSocket, copy_stream will hit SSL_write()                                                                                                                         
      if  sock.io.is_a? OpenSSL::SSL::SSLSocket                                                                                                                                              
        IO.copy_stream(f, sock.io, 16 * 1024 * 1024) until f.eof?                                                                                                                            
      else                                                                                                                                                                                   
        IO.copy_stream(f, sock.io)                                                                                                                                                           
      end                                                                                                                                                                                    
    end                                                                                                                                                                                      
  end                                                                                                                                                                                        
end 

cfitz commented Nov 9, 2017

Thanks, much appriciated.
If anyone else runs into this, I'm patching http as a temporary fix:

class Net::HTTPGenericRequest
                                                                                                                                                                                             
  def send_request_with_body_stream(sock, ver, path, f)                                                                                                                                      
    unless content_length() or chunked?                                                                                                                                                      
      raise ArgumentError,                                                                                                                                                                   
          "Content-Length not given and Transfer-Encoding is not `chunked'"                                                                                                                  
    end                                                                                                                                                                                      
    supply_default_content_type                                                                                                                                                              
    write_header sock, ver, path                                                                                                                                                             
    wait_for_continue sock, ver if sock.continue_timeout                                                                                                                                     
    if chunked?                                                                                                                                                                              
      chunker = Chunker.new(sock)                                                                                                                                                            
      IO.copy_stream(f, chunker)                                                                                                                                                             
      chunker.finish                                                                                                                                                                         
    else                                                                                                                                                                                     
      # copy_stream can sendfile() to sock.io unless we use SSL.                                                                                                                             
      # If sock.io is an SSLSocket, copy_stream will hit SSL_write()                                                                                                                         
      if  sock.io.is_a? OpenSSL::SSL::SSLSocket                                                                                                                                              
        IO.copy_stream(f, sock.io, 16 * 1024 * 1024) until f.eof?                                                                                                                            
      else                                                                                                                                                                                   
        IO.copy_stream(f, sock.io)                                                                                                                                                           
      end                                                                                                                                                                                    
    end                                                                                                                                                                                      
  end                                                                                                                                                                                        
end 

cfitz added a commit to cfitz/archivesspace that referenced this issue Nov 9, 2017

Fixes JRuby issue with files being loaded into memory on #copy_stream
There's a JRuby issue ( jruby/jruby#4842 )
that causes IOs to be read into memory is using SSL. This causes big
problems if using Solr or the backend with an SSL socket.
This adds a patch to chunk the IO manually, rather then just reading the
whole thing in.

cfitz added a commit to cfitz/archivesspace that referenced this issue Nov 9, 2017

Fixes JRuby issue with files being loaded into memory on #copy_stream
There's a JRuby issue ( jruby/jruby#4842 )
that causes IOs to be read into memory is using SSL. This causes big
problems if using Solr or the backend with an SSL socket.
This adds a patch to chunk the IO manually, rather then just reading the
whole thing in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment