Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient::Session#read_body_chunked timeout logic unoptimal? #355

Open
kikonen opened this issue Jan 21, 2017 · 3 comments
Open

HttpClient::Session#read_body_chunked timeout logic unoptimal? #355

kikonen opened this issue Jan 21, 2017 · 3 comments
Labels

Comments

@kikonen
Copy link

kikonen commented Jan 21, 2017

I've been trying to understand why there is 80...100+ "Module::Timeout>#timeout" threads appearing while there is only some 7 actual REST API requests using HttpClient. After some digging in, I've started to ponder if read_body_chunked logic triggers those threads. For me "::Timeout#timeout" logic looks a bit inefficient, since it seems to trigger new thread (and destroy it in the end) on every invocation.

    RS = "\r\n"
    def read_body_chunked(&block)
      buf = empty_bin_str
      while true
       ::Timeout.timeout(@receive_timeout, ReceiveTimeoutError) do
          len = @socket.gets(RS)
          if len.nil? # EOF
            close
            return
          end
          @chunk_length = len.hex
          if @chunk_length == 0
            @content_length = 0
            @socket.gets(RS)
            return
          end
          @socket.read(@chunk_length, buf)
          @socket.read(2)
        end
        unless buf.empty?
          yield buf
        end
      end
    end

According to ruby-prof

1.76% (79.97%) HTTPClient#request [1 calls, 7 total]
1.74% (98.89%) HTTPClient#do_request [1 calls, 7 total]
1.73% (99.60%) HTTPClient#protect_keep_alive_disconnected [1 calls, 7 total]
1.73% (99.92%) HTTPClient#do_get_block [1 calls, 7 total]
1.64% (94.51%) HTTPClient::Session#get_body [1 calls, 7 total]
1.63% (99.95%) HTTPClient::Session#read_body_chunked [1 calls, 2 total]
1.60% (97.74%) <Module::Timeout>#timeout [59 calls, 85 total]
1.40% (87.74%) <Class::Thread>#start [59 calls, 85 total]

There is around 8 timeout threads started per rest api request.

@kikonen
Copy link
Author

kikonen commented Jan 21, 2017

Tried out experimental patch, which in my use case works fine (since "total time of whole request including transfer time" is good enough factor for in my use specific case; in general processing time dominates in this case, transfer time over network is not relevant)

class HTTPClient
  class Session
    def read_body_chunked(&block)
      ::Timeout.timeout(@receive_timeout, ReceiveTimeoutError) do
        buf = empty_bin_str
        while true
          len = @socket.gets(RS)
          if len.nil? # EOF
            close
            break
          end
          @chunk_length = len.hex
          if @chunk_length == 0
            @content_length = 0
            @socket.gets(RS)
            break
          end
          @socket.read(@chunk_length, buf)
          @socket.read(2)

          unless buf.empty?
            yield buf
          end
        end
      end
    end
  end
end

After this httpclient logic is not consuming enough time to appear in ruby-prof profiling results, and amount of Timeout#timeout threads is decreased significantly.

@nahi
Copy link
Owner

nahi commented Apr 16, 2017

@kikonen Thanks for the report. It's interesting. Surely Timeout::timeout creates Thread for each timeout call in Ruby, and that's the reason why httpclient had custom Timeout: https://github.com/nahi/httpclient/blob/master/lib/httpclient/timeout.rb
But after 1.9 the performance of Thread creation became faster so I stopped using it: https://github.com/nahi/httpclient/blob/master/lib/httpclient/timeout.rb#L26

What version of Ruby are you using? Possibly Timeout.timeout became slower these days.

Re: your proposed patch, it should improve performance by removing Thread.timeout calls. But it also changes the semantics of receive_timeout because current receive_timeout is for time of each read operation, not the accumulated time of read operations. If accumulated time works for your case, you can do the same thing manually by the following code.

client.receive_timeout = nil # 60 by default, nil means no timeout == no Timeout thread
Timeout.timeout(60) do
  client.get(...)
end

@nahi nahi added the Feedback label Apr 16, 2017
@kikonen
Copy link
Author

kikonen commented Apr 18, 2017

In my case access is into internal private REST API, and if there's access problem then it most certainly occurs in initial connection. Also logic (in my case) works so that there's anyway separate timeout, which goes over whole operation (i.e. proxy servers between will timeout XHR call done from web browser side anyway after certain timeout period).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants