Make statistics collection and aggregation distributed across all cluster nodes #236

Closed
michaelklishin opened this Issue Jun 24, 2016 · 13 comments

Comments

@michaelklishin
Member

michaelklishin commented Jun 24, 2016

Follow-up to #41. The problem definition is the same, and the solution there doesn't work for some workloads because a single node collecting and aggregating all stats can only go so far.

So this issue is about making the collector distributed (stats are stored on every cluster node) for 3.6.x.

@michaelklishin michaelklishin added this to the 3.7.0 milestone Jun 24, 2016

@michaelklishin michaelklishin changed the title from Make statistics collection and aggregation distributed to Make statistics collection and aggregation distributed across all cluster nodes Jun 24, 2016

@sega-yarkin

sega-yarkin commented Jun 24, 2016

Does it make sense to add some parameters to choose which statistics to collect? For example, if I need statistics for exchanges and queues but not for channels and connections, skipping the latter would save some system resources.

@michaelklishin
Member

michaelklishin commented Jun 24, 2016

Some have asked for this. This may be a good chance to make that possible. Currently exchange and queue stats are emitted by channels, so you cannot disable one without the others.
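As an illustration of the per-category opt-in being discussed, the idea could be modelled as a filter applied to stats events before they reach the collector. This sketch is purely hypothetical: the category names, event shape, and function are invented for the example and are not RabbitMQ's actual API.

```python
# Hypothetical sketch: drop stats events whose category the operator has
# not opted into, so e.g. channel/connection stats can be disabled while
# queue and exchange stats are still collected.

ENABLED_CATEGORIES = {"queue_stats", "exchange_stats"}

def filter_stats_events(events, enabled=ENABLED_CATEGORIES):
    """Keep only events whose category is in the enabled set."""
    return [e for e in events if e["category"] in enabled]

events = [
    {"category": "queue_stats", "name": "q1", "messages": 10},
    {"category": "channel_stats", "name": "ch1", "msgs_unacked": 2},
    {"category": "exchange_stats", "name": "x1", "publish_in": 5},
]
kept = filter_stats_events(events)  # channel_stats event is dropped
```

The catch noted above is that such a filter cannot work at the channel level today, since channels emit exchange and queue stats together.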

@michaelklishin
Member

michaelklishin commented Jun 24, 2016

@sega-yarkin let's not turn this issue into a support case. Please take this to the mailing list. Thanks.

@michaelklishin
Member

michaelklishin commented Jun 30, 2016

We are considering whether we should try to target 3.6.x with this. This is easily over half of our support load right now.

Introducing another breaking management plugin version in 3.6.x isn't cool at all, however.

@noahhaon

noahhaon commented Jun 30, 2016

@michaelklishin What would the breaking changes be?

@michaelklishin
Member

michaelklishin commented Jun 30, 2016

@noahhaon mixed clusters will have to all run the new plugin, same as with 3.6.2. No breaking HTTP API changes planned.

@noahhaon

noahhaon commented Jun 30, 2016

Do users often upgrade rabbitmq-server without upgrading the plugins? I assume they are all packaged together now? I suppose the issue would be for rolling upgrades, but as long as those failures are handled gracefully ...

As you mentioned, this is a real pain point for large RMQ clusters and appears to be causing support issues for Pivotal. We would certainly love to see this feature, and it seems like it would be worth getting into 3.6.x, despite some pain during a rolling upgrade.

@michaelklishin
Member

michaelklishin commented Jun 30, 2016

It's not about plugins being out of sync with the server but rather mixed patch version clusters. But I definitely see your point.
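The mixed-cluster concern can be sketched as a simple pre-flight check before a rolling upgrade: verify that all nodes report the same version before relying on the new collector. This is illustrative only; the node names and version map are invented, and this is not a real RabbitMQ API.

```python
# Illustrative pre-flight check for a rolling upgrade: collect the version
# each cluster node reports and flag the cluster as "mixed" if more than
# one distinct version is present. All data below is made up.

def distinct_versions(node_versions):
    """Return the set of distinct versions reported by cluster nodes."""
    return set(node_versions.values())

nodes = {"rabbit@a": "3.6.5", "rabbit@b": "3.6.5", "rabbit@c": "3.6.4"}
versions = distinct_versions(nodes)
is_mixed = len(versions) > 1  # True here: rabbit@c lags behind
```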

@noahhaon

noahhaon commented Jun 30, 2016

Gotcha - well, maybe an out-of-cycle release of the plugin which would run on 3.6.x clusters? Then at least you're not violating least-surprise and breaking versioning semantics by bundling it with a 3.6.x release.

Not sure which is worse from a maintenance perspective, but I'd imagine many users with large clusters (including us) would be quite happy to install the plugin separately if it included this feature.

@michaelklishin
Member

michaelklishin commented Jul 5, 2016

@noahhaon we are leaning towards shipping it in 3.6.5 or so. Most users would rather upgrade all nodes to 3.6.5 than continue fighting the issues with the existing collector.

@michaelklishin
Member

michaelklishin commented Sep 1, 2016

A couple of updates:

  • So far we intend to ship this in a 3.6.x release
  • We will switch to Cowboy at the same time (3.7.0 already uses Cowboy) to reduce the delta between branches. The only user-facing change is the HTTP API response code changing from 201 to 204 in some cases — virtually no client libraries or users should be affected.
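A client library can stay compatible across that change by treating any 2xx write response as success rather than matching 201 exactly. A minimal sketch, not taken from any particular RabbitMQ client:

```python
# Minimal sketch: accept both 201 Created (the response code before the
# Cowboy switch) and 204 No Content (after it) as a successful write,
# instead of comparing against a single expected status code.

def write_succeeded(status_code):
    """True for any 2xx success status."""
    return 200 <= status_code < 300

old_style = write_succeeded(201)  # pre-switch response: True
new_style = write_succeeded(204)  # post-switch response: True
```

Clients that hard-code `status == 201` are the ones that would notice the change.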

@dcorbacho dcorbacho referenced this issue in michaelklishin/rabbit-hole Sep 1, 2016

Merged

Wait for stats to be published in new management plugin #84

@essen essen referenced this issue in rabbitmq/rabbitmq-web-dispatch Sep 6, 2016

Merged

Backport switch to Cowboy from master #18

@essen
Contributor

essen commented Sep 13, 2016

The development branches now use Cowboy.

@michaelklishin michaelklishin modified the milestones: 3.6.7, 3.6.x Nov 17, 2016

michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016

Add a before_build script
This reconfigures mgmt plugin to work better as of rabbitmq/rabbitmq-management#236.
We do the same in other HTTP API clients.

michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016

acogoluegnes added a commit to rabbitmq/hop that referenced this issue Nov 25, 2016

@dcorbacho dcorbacho referenced this issue in michaelklishin/rabbit-hole Dec 1, 2016

Merged

Use agent supervisor to restart DB #86

@michaelklishin
Member

michaelklishin commented Dec 1, 2016

It has been merged and will be dogfooded in stable before cutting a milestone release for the community.

michaelklishin added a commit to ruby-amqp/bunny that referenced this issue Dec 6, 2016

@michaelklishin michaelklishin referenced this issue in michaelklishin/rabbit-hole Dec 21, 2016

Closed

[rfc] support message_stats -> * -> samples #88

@michaelklishin michaelklishin referenced this issue in rabbitmq/rabbitmq-auth-backend-http Jan 13, 2017

Closed

Remove dependency on mochiweb in master #19

@michaelklishin michaelklishin referenced this issue in ruby-amqp/rabbitmq_http_api_client Feb 1, 2017

Closed

Convenience method for fetching statistics DB node #2

@mattbennett mattbennett referenced this issue in nameko/nameko Sep 7, 2017

Merged

Fix flakey tests #468
