Permalink
Browse files

Currently the reconnect logic will only retry once if EM.reconnect th…

…rows an exception. EM.reconnect can throw an exception in the case of a failed DNS lookup for example.

It looks as though in eventmachine if an EM.add_periodic_timer throws an exception no further calls to that block of code will be made. Presumably it reschedules itself after executing the callback which doesn't get called when an exception goes off. Arguably thats a bug with eventmachine? maybe?

Here is a simple program to demonstrate that (You'll only get one exception, not one every 5 seconds):

require 'eventmachine'

def throw_exception
    raise "Issues and problems"
end

def puts_running
  puts "Running...."
end

EM.run {
  EM.add_periodic_timer(1) { puts_running }
  EM.add_periodic_timer(5) { throw_exception }

  EM.error_handler do |e|
    puts "Eventmachine problem, #{e}"
    puts ("#{e.backtrace.join("\n")}")
  end
}

------------

It looks as though EM.connect has two ways of handling errors. The first being an exception (e.g. bad dns), the second is via callbacks.

This change simply handles any exception created by EM.reconnect so further retries are attempted.

Also adding a test case for this issue which will simulate a DNS failure to trigger an exception.
  • Loading branch information...
1 parent dae3c1f commit a5eb74548cb4e70e49acc9ad148e494066c6c4e8 @corlettb corlettb committed Jan 7, 2014
Showing with 27 additions and 1 deletion.
  1. +4 −1 lib/nats/client.rb
  2. +23 −0 spec/reconnect_spec.rb
View
@@ -606,7 +606,10 @@ def process_disconnect #:nodoc:
def attempt_reconnect #:nodoc:
process_disconnect and return if (@reconnect_attempts += 1) > @options[:max_reconnect_attempts]
- EM.reconnect(@uri.host, @uri.port, self)
+ begin
+ EM.reconnect(@uri.host, @uri.port, self)
+ rescue
+ end
@reconnect_cb.call unless @reconnect_cb.nil?
end
View
@@ -7,15 +7,21 @@
R_PASS = 'mypassword'
R_TEST_AUTH_SERVER = "nats://#{R_USER}:#{R_PASS}@localhost:9333"
R_TEST_SERVER_PID = '/tmp/nats_reconnect_authorization.pid'
+ E_TEST_SERVER = "nats://localhost:9666"
+ E_TEST_SERVER_PID = '/tmp/nats_reconnect_exception_test.pid'
+
@as = NatsServerControl.new(R_TEST_AUTH_SERVER, R_TEST_SERVER_PID)
@as.start_server
@s = NatsServerControl.new
@s.start_server
+ @es = NatsServerControl.new(E_TEST_SERVER, E_TEST_SERVER_PID)
+ @es.start_server
end
after(:all) do
@s.kill_server
@as.kill_server
+ @es.kill_server
end
it 'should properly report connected after connect callback' do
@@ -26,6 +32,23 @@
end
end
+ it 'should properly handle exceptions thrown by eventmachine during reconnects' do
+ reconnect_cb = false
+ NATS.start(:uri => E_TEST_SERVER, :reconnect_time_wait => 0.25) do |c|
+ # Change the uri to simulate a DNS failure which will make EM.reconnect throw an exception
+ c.instance_eval('@uri = URI.parse("nats://does.not.exist:4222/")')
+ timer=timeout_nats_on_failure(1)
+ c.on_reconnect do
+ reconnect_cb = true
+ NATS.connected?.should be_false
+ NATS.reconnecting?.should be_true
+ NATS.stop
+ end
+ @es.kill_server
+ end
+ reconnect_cb.should be_true
+ end
+
it 'should report a reconnecting event when trying to reconnect' do
reconnect_cb = false
NATS.start(:reconnect_time_wait => 0.25) do |c|

0 comments on commit a5eb745

Please sign in to comment.