This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

Bumping vhost start timeout #591

Merged

Conversation

kitsirota
Contributor

@kitsirota kitsirota commented Jul 27, 2018

This issue is new since we upgraded from 3.6.x. We run 5-node clusters on RabbitMQ 3.7.7 / Erlang 20.3.8.1. Once we reach about 15 vhosts, new vhosts can take longer than 15s to create, which typically leaves unhealthy vhosts with one or more "stopped" nodes.

Proposed Changes

We would like to bump the limit to 45 seconds so that we no longer have to detect failed vhosts with an external monitoring solution and restart them via the /api/vhosts/name/start/node endpoint.
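The external workaround described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual tooling: the base URL, vhost, node name, and credentials are placeholders, while the endpoint path follows the RabbitMQ management HTTP API (`POST /api/vhosts/{name}/start/{node}`).

```python
import urllib.parse
import urllib.request


def start_vhost_url(base_url: str, vhost: str, node: str) -> str:
    """Build the management API URL that restarts a vhost on one node."""
    # vhost and node names must be percent-encoded (e.g. "@" -> "%40")
    return "{}/api/vhosts/{}/start/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(node, safe=""),
    )


def start_vhost(base_url: str, vhost: str, node: str, auth_header: str) -> None:
    """POST to the start endpoint; requires management-plugin credentials."""
    req = urllib.request.Request(
        start_vhost_url(base_url, vhost, node),
        method="POST",
        headers={"Authorization": auth_header},
    )
    urllib.request.urlopen(req)  # raises on a non-2xx response


# Example with placeholder host and node names:
# start_vhost("http://rmq.example.com:15672", "tenant-7",
#             "rabbit@node1", "Basic Z3Vlc3Q6Z3Vlc3Q=")
```

A monitoring job would call this for each vhost that the `/api/vhosts` listing reports as stopped on some node; bumping the timeout avoids needing that loop in the common case.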


Checklist

Put an x in the boxes that apply. You can also fill these out after
creating the PR. If you're unsure about any of them, don't hesitate to
ask on the mailing list. We're here to help! This is simply a reminder
of what we are going to look for before merging your code.

  • [x] I have read the CONTRIBUTING.md document
  • [x] I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • [ ] All tests pass locally with my changes
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added necessary documentation (if appropriate)
  • [ ] Any dependent changes have been merged and published in related repositories

@michaelklishin
Member

This is merely a workaround for a problem somewhere else. 15 seconds to create a virtual host is pretty extreme.

@michaelklishin
Member

I am not against merging this, but please post some details to the mailing list on how this can be reproduced (and give 3.7.8-rc.1 a shot; it has non-trivial optimizations in virtual host recovery).

@kitsirota
Contributor Author

kitsirota commented Jul 27, 2018

@michaelklishin I absolutely agree; this is purely a band-aid for a condition that should probably be handled asynchronously. We're having trouble identifying the root cause of the variable request times.

In our use case, we're deploying containers in a Pivotal Cloud Foundry deployment. We seem to reach the point where vhosts take longer than 15 seconds to create once we're up to about 300-400 containers spread across roughly 15 vhosts. The underlying nodes don't seem to have any operational bottlenecks (no load/memory/IO issues when this starts happening).

I'll also try 3.7.8-rc1 and report back if that helps.

Thanks!

@michaelklishin
Member

@kitsirota so, 300-400 application instances? How many connections do they open on average? (A ballpark estimate would do.)

@michaelklishin michaelklishin merged commit ae223d9 into rabbitmq:master Jul 29, 2018
@michaelklishin michaelklishin added this to the 3.7.8 milestone Jul 29, 2018
@michaelklishin
Member

So apparently the management part of #575 was not cherry-picked to v3.7.x 🤦‍♂️, so the bump per se may or may not be necessary but 45s is not an unreasonable value.
