streaming GET performance issue #129

Closed
johnae opened this Issue Jun 29, 2012 · 10 comments

johnae commented Jun 29, 2012

I think I've just come across a performance issue with excon when streaming large responses. Excon also seems to be quite severely affected by running within a Rails app while doing a streaming GET.

These numbers are a bit unscientific, but here's a comparison (using excon 0.14.2):

Outside a Rails app, streaming a 132 MB file:

excon took: 14.394922
curb took: 8.138243
net/http took: 9.096502

It looks like in the above case excon streams at around 9-10 MB/s, while curb streams at around 16 MB/s and net/http at around 14.5 MB/s. What's even more interesting is what happens if I add require 'active_support/all':

excon took: 16.452423
curb took: 8.182455
net/http took: 9.156512

With ActiveSupport required, excon performs even worse, while the others stay at basically the same speed. Trying this out inside a Rails app, excon's transfer speed drops even further, to around 2.4 MB/s, while the others stay fast.

Here's how I did the testing above:

require 'active_support/all' ## this affects excon quite a bit but even without it excon is a lot slower than the others
require 'excon'
require 'net/http'
require 'fileutils'
require 'curb'
out = '/tmp/testdl'
url = 'http://some.download.url/file'

FileUtils.rm_f out

def time(msg)
  ts = Time.now
  yield
  puts "#{msg} took: #{Time.now-ts}"
end

time "excon" do
  f = File.open(out, 'wb')
  streamer = lambda do |chunk, remaining_bytes, total_bytes|
    f.write chunk
  end
  response = Excon.get(url, response_block: streamer)
  f.close
end

FileUtils.rm_f out

time "curb" do
  easy = Curl::Easy.new
  easy.url = url
  File.open(out, 'wb') do |f|
    easy.on_body {|data| f << data; data.size }
    easy.perform
  end
end

FileUtils.rm_f out

time "net/http" do
  Net::HTTP.start("some.download.url") do |http|
    begin
      file = open(out, 'wb')
      http.request_get('/file') do |response|
        response.read_body do |segment|
          file.write(segment)
        end
      end
    ensure
      file.close
    end
  end
end

Obviously this needs to be tested over a fast network.

johnae commented Jun 30, 2012

Interestingly, trying the same thing in JRuby 1.6.7 yields different results (I couldn't get curb working there):

excon took: 7.496
net/http took: 7.56

JRuby seems to be a lot faster regardless of which library is used.

By the way, the first tests were done using 1.9.3-p125; trying the same on 1.9.3-p0 yields:

excon took: 11.89969
curb took: 8.172077
net/http took: 8.792199

So excon on MRI seems to be a lot slower than other HTTP libraries when doing a streaming GET. On JRuby, on the other hand, excon and net/http yield about the same results, and both are much faster overall.

The tests were done on a local network (gigabit Ethernet). The server is Apache2, with ceph/radosgw serving the files via FastCGI. I initially came across this issue in Fog. Ceph/RadosGW is an S3-compatible storage solution - http://ceph.com. Otherwise RadosGW works great together with Fog.

johnae commented Jun 30, 2012

Also, here's JRuby with 'active_support/all' required:

excon took: 7.529
net/http took: 7.665

i.e. not really affected.

Here's 1.9.3-p0 with ActiveSupport required:

excon took: 14.852342
curb took: 8.249867
net/http took: 9.112159

So excon is even worse with ActiveSupport in the mix on 1.9.3-p0 (as on 1.9.3-p125). As you can see, it now takes close to 15 seconds to complete, compared to almost 12 seconds before (see the previous comment).

johnae commented Jul 2, 2012

I just tested 0.6.6, which doesn't exhibit this performance issue. I suspect it has to do with the switch from socket.read to socket.read_nonblock. Is there any way to actually fix this?
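
For reference, the difference between the two read styles looks roughly like this (a simplified sketch, not excon's actual code):

require 'socket'

# Blocking: one call per chunk; the thread simply waits until data arrives.
def blocking_read(socket, chunk_size)
  socket.readpartial(chunk_size)
end

# Nonblocking: read_nonblock raises when no data is ready, so the caller has
# to rescue and wait via IO.select before retrying - extra exceptions,
# syscalls and Ruby-level looping for every chunk.
def nonblocking_read(socket, chunk_size)
  socket.read_nonblock(chunk_size)
rescue IO::WaitReadable, Errno::EAGAIN
  IO.select([socket])
  retry
end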

Contributor

geemus commented Jul 6, 2012

@johnae - that does seem likely to be involved. I believe the JRuby version uses blocking writes as well. A lot of it may have to do with just how much blocking ends up occurring, since increasing the number of times through that loop would slow performance. You might try upping the chunk size by setting Excon::CHUNK_SIZE to something larger than 1048576 (1 MB), though I'm not sure offhand how much that will help. I'm also not really sure why ActiveSupport would have an impact. I'm not sure how much we can do about this...
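
For example, something along these lines should let you experiment with it (just a sketch; Ruby will warn about reassigning a constant unless you remove it first):

require 'excon'

# 4 MB is only an example value; remove the constant first to avoid an
# "already initialized constant" warning when redefining it.
Excon.send(:remove_const, :CHUNK_SIZE)
Excon.const_set(:CHUNK_SIZE, 4 * 1048576)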

johnae commented Jul 6, 2012

Well, the strangest thing is that decreasing Excon::CHUNK_SIZE seems to be the way to go, and by decreasing I mean by a lot - something like 1.megabyte/256 (i.e. 4 KB). That gets the speed up to around where the other libs are. Unfortunately this only holds "standalone", i.e. outside of a Rails app. It does increase the speed inside a Rails app from 2 MB/s to maybe 6-7 MB/s, so it is an improvement, but compared to curb it's still very slow (curb is at 15-16 MB/s in the Rails app, as is net/http)... I'd really like a fix for it, but I'm not sure how to proceed either.

What's the reason for using read/write_nonblock instead of just read/write? I know excon used to use only read/write up until 0.6.x, I think.

Contributor

geemus commented Jul 9, 2012

I was happy enough with the blocking version; it was changed to better support usage in multi-threaded environments (so that other threads could run while excon waited, instead of that thread holding execution until it finished). I hadn't really done extensive benchmarking around it; perhaps streaming requests should use a smaller CHUNK_SIZE, or perhaps excon should use a smaller chunk size in general. I believe I chose the current chunk size prior to switching to nonblock, so I may not have re-benchmarked things afterward. I'll try to find some time in the near future to dig a bit deeper.

johnae commented Jul 11, 2012

Well, I couldn't stand this slowness, so I monkey-patched Excon - I basically reused the code from Excon::SSLSocket but applied it to Excon::Socket. Now I can turn non-blocking IO on and off - perhaps it's nice enough to include in Excon? This is the patch:

module Excon

  @@disabled_non_blocking_io = false

  class << self
    # Reopen Excon::Socket and replace the nonblocking connect/read/write
    # with plain blocking delegation to the underlying TCPSocket (the same
    # approach reused from Excon::SSLSocket). The guard makes repeated calls
    # a no-op.
    def disable_non_blocking_io
      return if @@disabled_non_blocking_io
      @@disabled_non_blocking_io = true
      Socket.class_eval {

        undef_method :connect
        def connect
          @socket = TCPSocket.new(@params[:host], @params[:port])
        end

        undef_method :read
        def_delegators(:@socket, :read, :read)

        undef_method :write
        def_delegators(:@socket, :write, :write)

      }
    end
  end

end

Without the patch, downloading a 328 MB file with excon takes 50 seconds (gigabit Ethernet). With the patch, the same file takes 20 seconds to download, so it's quite an improvement in throughput.

In MB/s: without the patch, excon averages around 6.5 MB/s.
With the patch, excon gives you around 16.5 MB/s.

This is on my network. On an even faster/better network you'll probably see even more gains.
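
Using it is just a matter of loading the patch and calling it once before making any requests, e.g. (assuming the patch above is saved as excon_blocking_io_patch.rb - the file name is just an example):

require 'excon'
require_relative 'excon_blocking_io_patch' # the patch above

Excon.disable_non_blocking_io

File.open('/tmp/testdl', 'wb') do |f|
  streamer = lambda { |chunk, remaining_bytes, total_bytes| f.write(chunk) }
  Excon.get('http://some.download.url/file', response_block: streamer)
end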

Contributor

geemus commented Jul 11, 2012

@johnae - thanks for the detailed info and your fix. It makes me wonder whether blocking should be the default (since most people don't care and it's usually faster) and nonblock could just be something you opt in to. I'll try to dig in and integrate some kind of solution in the not-too-distant future.

johnae commented Jul 11, 2012

@geemus - thank you. I'll be using my patch for a while and keeping an eye on Fog/Excon.

geemus closed this in 920dd0d Aug 14, 2012

Contributor

geemus commented Aug 14, 2012

That allows setting :nonblock => false on a connection or request to use blocking IO (which is faster). Nonblock is still the default, though, at least for now.
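
For example, something like this (reusing the streaming download from earlier in the thread):

require 'excon'

File.open('/tmp/testdl', 'wb') do |f|
  streamer = lambda { |chunk, remaining_bytes, total_bytes| f.write(chunk) }

  # per request:
  Excon.get('http://some.download.url/file',
            :response_block => streamer,
            :nonblock       => false)
end

# or per connection:
connection = Excon.new('http://some.download.url', :nonblock => false)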
