This repository has been archived by the owner. It is now read-only.

Make statistics collection and aggregation distributed across all cluster nodes #236

Closed
michaelklishin opened this issue Jun 24, 2016 · 13 comments

Comments

@michaelklishin michaelklishin commented Jun 24, 2016

Follow-up to #41. The problem definition is the same, and the solution there doesn't work for some workloads because a single node collecting and aggregating all stats can only go so far.

So this issue is about making the collector distributed (stats are stored on every cluster node) for 3.6.x.
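The idea can be sketched as follows: each node stores only the stats it produced locally, and the management API aggregates across nodes at query time instead of funnelling every event to a single collector. This is a purely illustrative Python sketch, not RabbitMQ's actual Erlang implementation; every name here is hypothetical.

```python
# Hypothetical illustration of per-node stats storage with query-time
# aggregation; none of these names come from the RabbitMQ codebase.
from collections import Counter

class NodeStats:
    """Each cluster node stores only the stats emitted on that node."""
    def __init__(self, name):
        self.name = name
        self.counters = Counter()

    def record(self, metric, value=1):
        self.counters[metric] += value

def aggregate(nodes):
    """A management API query fans out to all nodes and merges the
    per-node counters, so no single node holds the full data set."""
    total = Counter()
    for node in nodes:
        total.update(node.counters)
    return dict(total)

a, b = NodeStats("rabbit@a"), NodeStats("rabbit@b")
a.record("publish", 10)
b.record("publish", 5)
b.record("deliver", 7)
print(aggregate([a, b]))  # {'publish': 15, 'deliver': 7}
```

The trade-off is that reads fan out to every node, but writes stay local, which is what removes the single-collector bottleneck described above.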

michaelklishin added this to the 3.7.0 milestone Jun 24, 2016
michaelklishin changed the title from "Make statistics collection and aggregation distributed" to "Make statistics collection and aggregation distributed across all cluster nodes" Jun 24, 2016

@sega-yarkin sega-yarkin commented Jun 24, 2016

Does it make sense to add some parameters to choose which statistics to collect? For example, if I need statistics for exchanges and queues but not for channels and connections, that would save some system resources.


@michaelklishin michaelklishin commented Jun 24, 2016

Some have asked for this, and this may be a good chance to make it possible. Currently exchange and queue stats are emitted by channels, so you cannot disable one without the others.


@michaelklishin michaelklishin commented Jun 24, 2016

@sega-yarkin let's not turn this issue into a support case. Please take this to the mailing list. Thanks.


@michaelklishin michaelklishin commented Jun 30, 2016

We are considering whether we should try to target 3.6.x with this. This is easily over half of our support load right now.

Introducing another breaking management plugin version in 3.6.x isn't cool at all, however.


@noahhaon noahhaon commented Jun 30, 2016

@michaelklishin What would the breaking changes be?


@michaelklishin michaelklishin commented Jun 30, 2016

@noahhaon mixed clusters will have to all run the new plugin, same as with 3.6.2. No breaking HTTP API changes planned.


@noahhaon noahhaon commented Jun 30, 2016

Do users often upgrade rabbitmq-server without upgrading the plugins? I assume they are all packaged together now? I suppose the issue would be for rolling upgrades, but as long as those failures are handled gracefully ...

As you mentioned, this is a real pain point for large RMQ clusters and appears to be causing support issues for Pivotal. We would certainly love to see this feature, and it seems like it would be worth getting into 3.6.x, despite some pain during a rolling upgrade.


@michaelklishin michaelklishin commented Jun 30, 2016

It's not about plugins being out of sync with the server but rather mixed patch version clusters. But I definitely see your point.



@noahhaon noahhaon commented Jun 30, 2016

Gotcha. Well, maybe an out-of-cycle release of the plugin which would run on 3.6.x clusters? Then at least you're not violating least-surprise and breaking versioning semantics by bundling it with a 3.6.x release.

Not sure which is worse from a maintenance perspective, but I'd imagine many users with large clusters (including us) would be quite happy to install the plugin separately if it included this feature.


@michaelklishin michaelklishin commented Jul 5, 2016

@noahhaon we are leaning towards shipping it in 3.6.5 or so. Most users would rather upgrade all nodes to 3.6.5 than continue fighting the issues with the existing collector.


@michaelklishin michaelklishin commented Sep 1, 2016

A couple of updates:

  • So far we intend to ship this in a 3.6.x release
  • We will switch to Cowboy at the same time (3.7.0 already uses Cowboy) to reduce the delta between branches. The only user-facing change is the HTTP API response code changing from 201 to 204 in some cases — virtually no client libraries or users should be affected.
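Given that status-code change, HTTP API clients are safest checking for any 2xx response rather than pinning one specific code. A tiny illustrative helper (not part of any RabbitMQ client library) makes the point:

```python
# Illustrative only: treat the whole 2xx range as success so that the
# 201 -> 204 change in some responses does not break client code.
def is_success(status_code: int) -> bool:
    return 200 <= status_code < 300

# Both the old and the new status codes count as success:
print(is_success(201), is_success(204))  # True True
```

Clients that already branch on `response.ok` or an equivalent range check need no changes at all.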
michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016
michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016
This reconfigures mgmt plugin to work better as of rabbitmq/rabbitmq-management#236.
We do the same in other HTTP API clients.
michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016
michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016
Preparing for rabbitmq/rabbitmq-management#236 to land.
acogoluegnes added a commit to rabbitmq/hop that referenced this issue Nov 25, 2016
Adapt to async event processing in rabbitmq/rabbitmq-management#236

@michaelklishin michaelklishin commented Dec 1, 2016

It has been merged and will be dogfooded in stable before cutting a milestone release for the community.

michaelklishin added a commit to ruby-amqp/bunny that referenced this issue Dec 6, 2016
8 participants
@michaelklishin @essen @noahhaon @dcorbacho @kjnilsson @sega-yarkin and others