Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send_frameset can now raise ConnectionClosedErrors #562

Merged
merged 2 commits into from
Aug 15, 2018

Conversation

sbonebrake
Copy link

Fixes #561

@sbonebrake
Copy link
Author

@michaelklishin Is this the approach you had in mind? If so I'll add some test cases.

@michaelklishin
Copy link
Member

Thank you. That's one half of what I had in mind. We also need to make missed heartbeats and I/O errors to first mark connection as closed (or maybe introduce a new state, recovering, that would still be different from open?).

Test cases would be great, we can adopt Toxiproxy for these new tests.

@sbonebrake
Copy link
Author

I think it a new state for recovering is a good idea because right now @recovering_from_network_failure isn't protected by a mutex.

    def handle_network_failure(exception)
      raise NetworkErrorWrapper.new(exception) unless @threaded

      @status_mutex.synchronize { @status = :disconnected }

      if !recovering_from_network_failure?
        begin
          @recovering_from_network_failure = true

Is it intended to have any thread be able to start the recovery? It would likely be the the reader_loop thread, but it seems like it could be heartbeat or the session thread. This is potentially bad if it happens when I make any call to rabbitmq. Eg session.create_channel() could hit a write_timeout, but then end up in the recover_from_network_failure method and sleep up to @network_recovery_interval * @max_recovery_attempts But maybe that's okay because it was initialized with automatically_recover?

@michaelklishin
Copy link
Member

Recovery happens in the I/O loop thread and there's only one per connection.

@michaelklishin
Copy link
Member

Extra synchronisation can be added but in 6 or so years of the current architecture of Bunny concurrent recovery attempts hasn't been an issue.

@sbonebrake
Copy link
Author

Couldn't this be a possible call stack?

Session#create_channel
Channel#open
Session#open_channel
Transport#send_frame
Transport#write    #Hits Timeout::Error
Session#handle_network_failure
Session#recover_from_network_failure

Maybe I'm missing something. I can try to get a repro of that with toxiproxy another time.
As for reconnecting state, same PR or different one to keep them small?

@michaelklishin
Copy link
Member

Connections are generally not shared in Ruby apps (that hardly have any concurrency that developers intentionally introduce). I don't mind related changes in a single PR.

@sbonebrake sbonebrake force-pushed the send_frameset_exception branch 2 times, most recently from 6821517 to d2691eb Compare August 14, 2018 20:00
it "raises a ConnectionClosedError" do
ch = @connection.create_channel
# Assume the IO thread closed the connection due to exception
@connection.instance_variable_set("@status", :closed)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the test case fine like this? or would you prefer a higher level toxiproxy test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply close the connection to the same effect. A Toxiproxy test would be more useful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you close the connection with session#close you would get the other exception Bunny::ChannelAlreadyClosed: cannot use a closed channel! Channel id: 1
because channel knows it's closed. I'll look into a toxiproxy test case. Remove this one?

@sbonebrake
Copy link
Author

sbonebrake commented Aug 14, 2018

You mentioned "We also need to make missed heartbeats and I/O errors to first mark connection as closed". Both heartbeat thread and reader_loop will call handle_network_failure from the Transport class. With automatic_recovery the transport will be marked as closed and the session status will be :disconnected. Is that what you mean? Or do you want :disconnected changed to :closed? Without automatic_recovery the transport will be closed and the session thread will get an exception raised on it, but the status will still be :connected.

@michaelklishin
Copy link
Member

@sbonebrake your understanding of the issue is mostly correct. I do not want to focus on the status of the transport since the transport is both an implementation detail and a disposable (as in, you have to instantiate a new one) dependency. Bunny::Session state is what I had in mind.

There's a difference between :disconnected and :closed that's also present in other clients (but not as a state). The former means "this connection is no longer open" while the latter tries to indicate "this connection was explicitly closed by the user". In Java client there is a "shutdown reason" to help distinguish between the two (in fact, three in that client) scenarios.

As far as Bunny::Session#open? goes they both should be identical. :disconnected therefore makes more sense for missed heartbeats.

Of course you can argue a case for a different set of states or just :closed. As long as we are making progress towards the idea in #560 which

@michaelklishin
Copy link
Member

michaelklishin commented Aug 14, 2018

So the behavior we are looking for (which partially might already be the case):

  • When a connection is explicitly closed with Bunny::Session#close, publish attempts on it throw a meaningful exception
  • When a connection is closed remotely but cleanly (e.g. via the HTTP API, see existing connection recovery tests), publish attempts on it throw a meaningful exception
  • When a connection is closed due to missed heartbeats, publish attempts… well, you get the idea ;)

@michaelklishin
Copy link
Member

I found something in commit logs (0e77347) that makes me think that Bunny has been pretty consistent about :disconnected vs :closed. When a connection is closed, it will never be recovered. When it is disconnected, that condition is temporary. 0e77347 transitions connections to :closed when they run out of recovery attempts.

@sbonebrake
Copy link
Author

Good to know, makes sense to me. Is :disconnected any different from :not_connected? Session#close is interesting because it first sets status to :not_connected via close_connection then sets it to :closed

@michaelklishin
Copy link
Member

@sbonebrake no, just an unintentional inconsistency. Connection closure is a request-response kind of operation. Other clients use :closing or similar. It makes a bit more sense to me than :disconnected in this particular case.

@sbonebrake sbonebrake force-pushed the send_frameset_exception branch 2 times, most recently from 757492b to 5df9d43 Compare August 14, 2018 23:59
@sbonebrake
Copy link
Author

Ok, you mentioned three scenarios here:

When a connection is explicitly closed with Bunny::Session#close, publish attempts on it throw a meaningful exception

This is covered by channel_close_spec.rb

When a connection is closed remotely but cleanly (e.g. via the HTTP API, see existing connection recovery tests), publish attempts on it throw a meaningful exception

Covered by connection recovery tests

. 3. When a connection is closed due to missed heartbeats, publish attempts…
Covered by new basic_publish_toxi_spec.rb

Is there anything else I should cover in this PR? Happy to fix any style/naming preferences you have as well.

@michaelklishin
Copy link
Member

@sbonebrake please move the Toxiproxy tests under spec/higher_level/integration. The helper file can be folded into spec_helper.rb in a module.

I'd like to have a way to have them skipped when Toxiproxy isn't available (e.g. when you are not running the dependencies in Docker and don't have it running locally). We also make some tests that require setup steps conditional (e.g. TLS). Toxiproxy is a good candidate for that. As far as I can tell in the Ruby client Toxiproxy.running? is the predicate we want.

@michaelklishin
Copy link
Member

I tried running Toxiproxy locally via Homebrew with all defaults and it logs the following:

INFO[0000] API HTTP server starting                      host=localhost port=8474 version=2.1.2
INFO[0009] Started proxy                                 name=rabbitmq proxy=[::]:11111 upstream=rabbitmq:5672
INFO[0009] Accepted client                               client=[::1]:62399 name=rabbitmq proxy=[::]:11111 upstream=rabbitmq:5672
ERRO[0009] Unable to open connection to upstream         client=[::1]:62399 name=rabbitmq proxy=[::]:11111 upstream=rabbitmq:5672

even though a RabbitMQ node is running doesn't log anything. This likely means that RabbitMQ closes TCP connection that sends no data after a handshake period but also doesn't log it since, well, it sent no data (this is to avoid log spam from TCP load balancer health checks). I haven't done a traffic capture to further confirm this. Are there any manual setup steps that I'm missing?

@michaelklishin
Copy link
Member

@sbonebrake never mind, I'm happy to make those largely cosmetic changes. All tests but TLS pass with Docker Compose but that is due to either stale certificates or recent tls-gen changes that break on MacOS with LibreSSL.

@michaelklishin michaelklishin merged commit 492c335 into ruby-amqp:master Aug 15, 2018
@michaelklishin
Copy link
Member

@sbonebrake thank you! Let me know how soon you need a preview release up on rubygems.org.

@sbonebrake
Copy link
Author

Thanks @michaelklishin Let me if you're still having trouble with toxiproxy. I didn't really try to run it without docker-compose.
A preview release soon would be nice, but for now I can always pin my Gemfile to a git sha in the interim.

@sbonebrake sbonebrake deleted the send_frameset_exception branch August 15, 2018 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants