Visibly deprecate 'rabbitmq-diagnostics node_health_check' #426

michaelklishin · 2020-06-04T21:07:43Z

rabbitmq-diagnostics node_health_check (better known as rabbitmqctl node_health_check)
is an opinionated, intrusive, aspirational attempt at producing One True Health Check™ for RabbitMQ that dates back to 2016 or so.

It has proven too prone to false positives, consumes resources unnecessarily, and is too opinionated for many teams.

A much more modular, pick-and-choose approach has been adopted since, but this command has never been deprecated and continues to pollute search results. It's time to deprecate it, and perhaps remove it in 3.9.
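To illustrate what "pick-and-choose" could look like in practice, here is a minimal Python sketch that runs a few individual `rabbitmq-diagnostics` checks in order, cheapest first, and stops at the first failure. The stage names, the grouping, and the helper names (`CHECKS`, `commands_up_to`, `run_checks`) are assumptions made for this example, not something the CLI defines; which checks a team picks is up to that team.

```python
import subprocess

# Stage -> rabbitmq-diagnostics subcommands, cheapest first.
# The stage names and the grouping are illustrative assumptions.
CHECKS = {
    "basic": [["ping"]],
    "running": [["check_running"], ["check_local_alarms"]],
    "listeners": [["check_port_connectivity"]],
}
STAGE_ORDER = ["basic", "running", "listeners"]


def commands_up_to(stage):
    """Return full command lines for every stage up to and including `stage`."""
    if stage not in STAGE_ORDER:
        raise ValueError(f"unknown stage: {stage}")
    selected = STAGE_ORDER[: STAGE_ORDER.index(stage) + 1]
    return [
        ["rabbitmq-diagnostics", "-q", *args]
        for name in selected
        for args in CHECKS[name]
    ]


def run_checks(stage, runner=subprocess.run):
    """Run the selected checks in order and stop at the first failure.

    Returns (True, None) if every command exits with 0,
    otherwise (False, <the command that failed>).
    """
    for cmd in commands_up_to(stage):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False, cmd
    return True, None
```

Calling `run_checks("basic")` only pings the node, while `run_checks("listeners")` also verifies the node is running, has no local alarms, and accepts connections on its listener ports. The `runner` parameter exists only so the logic can be exercised without a RabbitMQ node.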

Equally importantly, this command requires a fully booted node. That does not work well with what nodes do today on restart, or with Kubernetes environments where one-at-a-time pod deployment is our recommended deployment option. Used as a readiness probe for a pod in a stateful set, this command can create a deployment deadlock: Kubernetes does not proceed to deploy the next node, but the current one is not fully booted until another node comes up (and with a one-at-a-time policy it never will).
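The deadlock can be reproduced in a deliberately simplified toy model. The Python sketch below is not Kubernetes or RabbitMQ code; the function names, the probe labels, and the rule "a node finishes booting only once a peer is up" are assumptions that mirror the scenario described above.

```python
def fully_booted(index, started):
    """Toy rule: a restarting node finishes booting only once a peer is up."""
    return any(up for j, up in enumerate(started) if j != index)


def probe_ready(kind, index, started):
    """Evaluate a hypothetical readiness probe for pod `index`."""
    if kind == "requires_full_boot":
        # Behaves like a check that needs a fully booted node.
        return started[index] and fully_booted(index, started)
    if kind == "runtime_reachable":
        # Behaves like a basic check that only needs the runtime to respond.
        return started[index]
    raise ValueError(f"unknown probe kind: {kind}")


def rollout(n_pods, kind):
    """Simulate one-at-a-time deployment and return how many pods started."""
    started = [False] * n_pods
    for i in range(n_pods):
        started[i] = True
        if not probe_ready(kind, i, started):
            # Nothing else will change while we wait, so retrying cannot help:
            # the next pod is never started and the rollout is stuck.
            break
    return sum(started)


stuck = rollout(3, "requires_full_boot")
complete = rollout(3, "runtime_reachable")
```

In this model `stuck` is 1 (the first pod never becomes ready, so the second is never deployed) and `complete` is 3, which is why a readiness probe should not depend on the node being fully booted.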

* It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
* It can produce false positives
* It is too intrusive and CPU-intensive to use at scale
* Most operators do not understand what it really does and when they learn about it, consider it to be too opinionated and intrusive

Time for the One True Health Check™ to retire from duty.

Part of #426

michaelklishin self-assigned this Jun 4, 2020

michaelklishin added bug usability labels Jun 4, 2020

michaelklishin added this to the 3.8.5 milestone Jun 4, 2020

This was referenced Jun 4, 2020

Deprecate 'ctl node_health_check' #427

Merged

Deprecate 'ctl node_health_check' rabbitmq/rabbitmq-server#2366

Merged

michaelklishin closed this as completed Jun 4, 2020

pabigot mentioned this issue Jul 11, 2020

rabbitmq plugin: make deprecated heathcheck optional influxdata/telegraf#7823

Closed
