connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

Closed
kenn opened this Issue Mar 8, 2013 · 11 comments

Projects

None yet

3 participants

@kenn
kenn commented Mar 8, 2013

Here's the source to reproduce the problem:

# gem 'eventmachine', '1.0.0'
require 'em-http-request'

puts EM::VERSION

EM.run do
  start_at = Time.now
  url = 'http://bot.lingr.com/delay/6'
  http = EventMachine::HttpRequest.new(url, :connect_timeout => 2, :inactivity_timeout => 2).head
  http.callback do
    puts "success! elapsed #{Time.now - start_at}"
    EM.stop
  end
  http.errback do
    puts "error! elapsed #{Time.now - start_at}"
    EM.stop
  end
end

The server returns a response in 6 seconds, while the timeout values are set an earlier point.

When I run this, I get

1.0.1
success! elapsed 6.280314

As you can see, it's not timing out and wait (indefinitely) until we get the response.

However, when I activate the gem pinning to 1.0.0, then run

1.0.0
error! elapsed 4.003471

So it times out after 4 seconds as expected.

Probably it's a regression in EventMachine 1.0.1 but I wasn't able to find out what change is causing this. Any advice on fix / workaround?

@igrigorik
Owner

Nope, needs to fixed in EM. /cc @tmm1 ... Short of adding extra timers to handle this yourself, this is not something em-http should address.

@tmm1
Contributor
tmm1 commented Mar 8, 2013

The only related commit in 1.0.1 is eventmachine/eventmachine@3c39365 (eventmachine/eventmachine#374). Can you try with and without that commit?

@kenn
kenn commented Mar 8, 2013

ok, I tried and confirmed that reverting eventmachine/eventmachine@3c39365 fixed the problem. So, it seems like a regression indeed.

@tmm1
Contributor
tmm1 commented Mar 8, 2013

Interesting. If you use :connect_timeout => 2, :inactivity_timeout => 1, does it time out in 3 seconds?

@kenn
kenn commented Mar 8, 2013

Here's the result matrix - I find it's quite interesting indeed.

Note that I built "fixed" EM as 1.0.1.1, and used http://bot.lingr.com/delay/8 as the endpoint so that the server returns a response in 8 seconds.

connect_timeout inactivity_timeout    1.0.0     1.0.1    1.0.1.1
              1                  1    2.019553  success  2.040749
              1                  2    3.018771  success  3.003296
              1                  3    4.002432  success  4.003432
              1                  4    5.019328  success  5.017295
              2                  1    2.021929  success  2.003186
              2                  2    4.020711  success  4.003911
              2                  3    5.023549  success  5.020934
              2                  4    6.018697  success  6.271681
              3                  1    2.003149  success  2.022989
              3                  2    4.020217  success  4.003389
              3                  3    6.003739  success  6.019589
              3                  4    7.002843  success  7.003071
@tmm1
Contributor
tmm1 commented Mar 8, 2013

Ok, this is clearly broken in 1.0.1. There's even a failing test on master. I'll try to come up with a solution that fixes the test and the original issue in eventmachine/eventmachine#374

See also #149

@tmm1
Contributor
tmm1 commented Mar 8, 2013

I believe this patch will fix all the timeout weirdness. Can you try your combination test again?

diff --git a/ext/ed.cpp b/ext/ed.cpp
index 9ebd4b1..48f3c5c 100644
--- a/ext/ed.cpp
+++ b/ext/ed.cpp
@@ -460,8 +460,7 @@ ConnectionDescriptor::SetConnectPending
 void ConnectionDescriptor::SetConnectPending(bool f)
 {
        bConnectPending = f;
-       if (f == false && NextHeartbeat)
-               MyEventMachine->ClearHeartbeat(NextHeartbeat, this);
+       MyEventMachine->QueueHeartbeat(this);
        _UpdateEvents();
 }
@tmm1 tmm1 added a commit to eventmachine/eventmachine that referenced this issue Mar 8, 2013
@tmm1 tmm1 fix long-standing pending_connect/inactivity timeout issue
this reverts a change made for #374, which inadvertently disabled inactivity
timeouts altogether.

instead we always QueueHeartbeat when bConnectPending changes, so
GetNextHeartbeat can calculate a timeout based on the new connection
state. i confirmed this fixes the broken test on master, and the original
"20s delay" bug (using the repro in #393).

this should also address the timeout stacking issue reported in
igrigorik/em-http-request#149, as well as the original report in
igrigorik/em-http-request#222
27fdd5b
@kenn
kenn commented Mar 8, 2013

@tmm1 Good news - it worked! Also, the most accurate timeout ever - results are almost exactly the same as inactivity_timeout. 👍

connect_timeout inactivity_timeout    1.0.0     1.0.1    1.0.1.1   1.0.1.2
              1                  1    2.019553  success  2.040749  1.134821
              1                  2    3.018771  success  3.003296  2.100689
              1                  3    4.002432  success  4.003432  3.098528
              1                  4    5.019328  success  5.017295  4.095405
              2                  1    2.021929  success  2.003186  1.106963
              2                  2    4.020711  success  4.003911  2.130382
              2                  3    5.023549  success  5.020934  3.09778
              2                  4    6.018697  success  6.271681  4.099069
              3                  1    2.003149  success  2.022989  1.122953
              3                  2    4.020217  success  4.003389  2.097198
              3                  3    6.003739  success  6.019589  3.096631
              3                  4    7.002843  success  7.003071  4.098878
@tmm1
Contributor
tmm1 commented Mar 8, 2013

Excellent. I've released EM v1.0.2.

@igrigorik
Owner

Awesome - great job guys! I take it, we can close this bug then? :)

@kenn
kenn commented Mar 8, 2013

@tmm1 @igrigorik I've just deployed EM v1.0.3 to production and confirmed the problems is fixed for real. Thank you!

@kenn kenn closed this Mar 8, 2013
@jperkin jperkin pushed a commit to joyent/pkgsrc that referenced this issue Dec 9, 2013
taca Update ruby-eventmachine to 1.0.3.
# Changelog

## 1.0.x
* EM.system was broken in 1.0.2 release [#413]

## 1.0.2 (March 8, 2013)
* binary win32 gems now include fastfilereader shim [#222]
* fix long-standing connection timeout issues [27fdd5b,
  igrigorik/em-http-request#222]
* http and line protocol cleanups [#193, #151]
* reactor return value cleanup [#225]
* fix double require from gemspec [#284]
* fix smtp server reset behavior [#351]
* fix EM.system argument handling [#322]
* ruby 1.9 compat in smtp server and stomp protocols [#349, #315]
* fix pause from post_init [#380]

## 1.0.1 (February 27, 2013)
* use rb_wait_for_single_fd() on ruby 2.0 to fix rb_thread_select() deprecation
* fix epoll/kqueue mode in ruby 2.0 by removing calls to rb_enable_interrupt()
  [#248, #389]
* fix memory leak when verifying ssl cerificates [#403]
* fix initial connection delay [#393, #374]
4fdca09
@jperkin jperkin pushed a commit to joyent/pkgsrc that referenced this issue Mar 14, 2014
taca Update ruby-eventmachine to 1.0.3.
# Changelog

## 1.0.x
* EM.system was broken in 1.0.2 release [#413]

## 1.0.2 (March 8, 2013)
* binary win32 gems now include fastfilereader shim [#222]
* fix long-standing connection timeout issues [27fdd5b,
  igrigorik/em-http-request#222]
* http and line protocol cleanups [#193, #151]
* reactor return value cleanup [#225]
* fix double require from gemspec [#284]
* fix smtp server reset behavior [#351]
* fix EM.system argument handling [#322]
* ruby 1.9 compat in smtp server and stomp protocols [#349, #315]
* fix pause from post_init [#380]

## 1.0.1 (February 27, 2013)
* use rb_wait_for_single_fd() on ruby 2.0 to fix rb_thread_select() deprecation
* fix epoll/kqueue mode in ruby 2.0 by removing calls to rb_enable_interrupt()
  [#248, #389]
* fix memory leak when verifying ssl cerificates [#403]
* fix initial connection delay [#393, #374]
f62d784
@jsonn jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Oct 11, 2014
taca Update ruby-eventmachine to 1.0.3.
# Changelog

## 1.0.x
* EM.system was broken in 1.0.2 release [#413]

## 1.0.2 (March 8, 2013)
* binary win32 gems now include fastfilereader shim [#222]
* fix long-standing connection timeout issues [27fdd5b,
  igrigorik/em-http-request#222]
* http and line protocol cleanups [#193, #151]
* reactor return value cleanup [#225]
* fix double require from gemspec [#284]
* fix smtp server reset behavior [#351]
* fix EM.system argument handling [#322]
* ruby 1.9 compat in smtp server and stomp protocols [#349, #315]
* fix pause from post_init [#380]

## 1.0.1 (February 27, 2013)
* use rb_wait_for_single_fd() on ruby 2.0 to fix rb_thread_select() deprecation
* fix epoll/kqueue mode in ruby 2.0 by removing calls to rb_enable_interrupt()
  [#248, #389]
* fix memory leak when verifying ssl cerificates [#403]
* fix initial connection delay [#393, #374]
2bf572c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment