Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed issue with reconnect mechanism #251

Merged
merged 2 commits into from Sep 30, 2014

Conversation

@foeken
Copy link
Contributor

commented Sep 30, 2014

When an unexpected error happens when reading frames during the open_connection
method the mechanism failed to kick in and the connection would never be re-established.

When an unexpected error happens when reading frames during the open_connection
method the mechanism failed to kick in and the connection would never be re-established.
@@ -961,13 +961,14 @@ def open_connection
fr
# frame timeout means the broker has closed the TCP connection, which it
# does per 0.9.1 spec.
rescue Errno::ECONNRESET, ClientTimeout, AMQ::Protocol::EmptyResponseError, EOFError, IOError => e

This comment has been minimized.

Copy link
@foeken

foeken Sep 30, 2014

Author Contributor

I think it's ok to be careful here and catch everything. We could add some logging though.

raise TCPConnectionFailed.new('An empty frame while opening the connection')
# @state = :closed
# @logger.error "RabbitMQ closed TCP connection before AMQP 0.9.1 connection was finalized. Most likely this means authentication failure."
# raise Bunny::PossibleAuthenticationFailureError.new(self.user, self.vhost, self.password.size)

This comment has been minimized.

Copy link
@foeken

foeken Sep 30, 2014

Author Contributor

Can this really be removed in all cases?

This comment has been minimized.

Copy link
@michaelklishin

michaelklishin Sep 30, 2014

Member

RabbitMQ versions before 3.3.0 indicate authentication failures by closing TCP connection (per AMQP 0-9-1 spec). So no, we can't just comment these out.

This comment has been minimized.

Copy link
@foeken

foeken Sep 30, 2014

Author Contributor

Mm I was worried something like that would be the case ;) Unfortunately there is no context available for the code to tell them apart.

Worst case scenario in the older RabbitMQ version an auth exception would result in hammering for Bunny 1.5+.

This comment has been minimized.

Copy link
@foeken

foeken Sep 30, 2014

Author Contributor

I will try to install the older RabbitMQ and see if there is some way for me to tell them apart from the closing of the socket, but I expect no good news ;)

@@ -1011,13 +1012,14 @@ def open_connection
fr
# frame timeout means the broker has closed the TCP connection, which it
# does per 0.9.1 spec.
rescue Errno::ECONNRESET, ClientTimeout, AMQ::Protocol::EmptyResponseError, EOFError => e
rescue

This comment has been minimized.

Copy link
@michaelklishin

michaelklishin Sep 30, 2014

Member

What's the intent here? Rescue everything? Then it should be rescue Exception => e. Bare rescue is the same as rescue StandardError => e.

This comment has been minimized.

Copy link
@foeken

foeken Sep 30, 2014

Author Contributor

Would StandardError not suffice here? Is there any reasons we should catch the base Exception class in this case?

This comment has been minimized.

Copy link
@michaelklishin

michaelklishin Sep 30, 2014

Member

Yes, looks like it's fine.

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

Have you tried this with actual authentication failures, say, with RabbitMQ 3.1.x?

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

Actually, authentication failure notifications were introduced in 3.2.0. Worst case scenario we can make Bunny 1.5 require RabbitMQ 3.2.0 or later. 3.4.0 ships fairly soon.

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

Confirmed: auth works with 3.2.x

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

Yes, it will work with 3.2.x because of the protocol extension. However, I'm not sure the client won't endlessly reconnect with 3.1.x or older.

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

I am almost sure it will :) But I wanted to check both versions :D

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

We are sort of in luck. The issue will only occur if the auth changes while a connection has been made. In the case the auth is refused while calling start by the user the reconnect mechanism won't kick in, the exception is now ambiguous but we can solve that in the wording (i.e. An empty frame was received while opening the connection. In RabbitMQ <= 3.1 this could mean an authentication issue.).

Now if the authentication changes while the connection is established, RabbitMQ 3.1.x does nothing to the connection. The established connections can still publish and consume.

Interestingly in my test when the connection drops and the reconnect kicks in it reconnects without any auth issue :/ Is the auth cached in some way?

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

I was only talking about the initial connection case. Credentials changing after connection is established is a pretty crazy edge case. In environments where that is possible, open connections from the user
are typically force-closed via HTTP API, too.

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

I like the "An empty frame was received while opening the connection. In RabbitMQ <= 3.1 this could mean an authentication issue.". Please update the error message and I think we can merge this.

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

Then we should be safe. The User will always call session.start manually somewhere in their code. And that will yield the exception on their end. If we change the wording it should be fine.

This reflects the case where older RabbitMQ servers would simply
close the connection when authentication failed.
@foeken foeken changed the title [WIP] Fixed issue with reconnect mechanism Fixed issue with reconnect mechanism Sep 30, 2014
@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

Okay the PR is ready for merge then. If you have any more additions feel free to tell me :) When you merge it please release a pre2 so we can start testing the fix with some load :)

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

I have yet to run the specs btw. I'm reinstalling my newer RabbitMQ :)

michaelklishin added a commit that referenced this pull request Sep 30, 2014
Fixed issue with reconnect mechanism
@michaelklishin michaelklishin merged commit 529c301 into ruby-amqp:master Sep 30, 2014
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

This has a failing spec I am still solving :(

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

The PossibleAuth behaviour was explicitly specced, so no real damage probably just red specs

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2014

In 1.5.0.pre2. I think we'll require RabbitMQ 3.2.0 nonetheless.

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

In that case perhaps we can remove the possible auth mentions? To fix the specs I have to pass them though a lot of exceptions :)

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2014

Sorry, it seems i was still unwittingly running the older RabbitMQ against the specs and thats why they failed :/

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Oct 2, 2014

@foeken let us know how this change works for you in production ;)

@foeken

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2014

So far so good ;) but might give it a few days

Sent from my iPhone

On 02 Oct 2014, at 15:32, Michael Klishin notifications@github.com wrote:

@foeken let us know how this change works for you in production ;)


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.