This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Visibly deprecate 'rabbitmq-diagnostics node_health_check' #426

Closed
michaelklishin opened this issue Jun 4, 2020 · 0 comments

@michaelklishin
Member

rabbitmq-diagnostics node_health_check (better known as rabbitmqctl node_health_check)
is an opinionated, intrusive, aspirational attempt at producing the One True Health Check™ for RabbitMQ. It dates back to 2016 or so.

It has proven to be too prone to false positives, to consume resources unnecessarily, and to be too opinionated for many teams.

A much more modular, pick-and-choose approach has been adopted since, but this command has never been deprecated and continues to pollute search results. It's time to deprecate it, and perhaps remove it in 3.9.
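
As a rough illustration of the pick-and-choose approach, the sketch below runs only the checks a team selects and reports which ones failed. It assumes `rabbitmq-diagnostics` is on the `PATH` and that the `ping`, `check_running` and `check_local_alarms` subcommands are available (they ship with the 3.8 CLI tools). The helper names `run_cli_check`, `run_checks` and `failed_checks` are hypothetical and not part of any RabbitMQ tooling.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Assumption: these subcommands exist in the installed CLI tools.
# Pick only the ones that match how strict your team wants the check to be.
BASIC_CHECKS = ["ping"]
STRICTER_CHECKS = ["ping", "check_running", "check_local_alarms"]


def run_cli_check(name: str) -> bool:
    """Run one rabbitmq-diagnostics check; True means it exited with code 0."""
    try:
        result = subprocess.run(
            ["rabbitmq-diagnostics", "-q", name],
            capture_output=True,
        )
    except FileNotFoundError:
        # CLI tools are not installed or not on PATH: treat as a failed check.
        return False
    return result.returncode == 0


def run_checks(
    names: Iterable[str],
    runner: Callable[[str], bool] = run_cli_check,
) -> List[Tuple[str, bool]]:
    """Run each selected check independently and collect (name, passed) pairs."""
    return [(name, runner(name)) for name in names]


def failed_checks(results: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return the names of the checks that did not pass."""
    return [name for name, passed in results if not passed]
```

A caller could evaluate `failed_checks(run_checks(BASIC_CHECKS))` for a cheap liveness-style check and reserve `STRICTER_CHECKS` for less frequent monitoring. The `runner` parameter exists only so the selection logic can be exercised without a running node.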

Equally important, this command requires a fully booted node. That does not work well with what nodes do today on restart, nor with Kubernetes environments, where one-at-a-time pod deployment is our recommended deployment option. Used as a readiness probe for a pod in a stateful set, this command can create a deployment deadlock: Kubernetes does not proceed to deploy the next node, but the current one is not fully booted until another node comes up (and with a one-at-a-time policy it never will).
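
The deadlock can be seen in a toy model. This is only a sketch under simplifying assumptions: a node counts as "fully booted" once at least one peer has been started, and the orchestrator starts the next pod only after the current one reports ready. The function names (`rollout`, `fully_booted`, `responds_to_ping`) are hypothetical and do not correspond to any Kubernetes or RabbitMQ API.

```python
from typing import Callable, List

ReadinessProbe = Callable[[int, List[bool]], bool]


def rollout(num_nodes: int, is_ready: ReadinessProbe) -> int:
    """Simulate a one-at-a-time rollout; return how many pods got started."""
    started: List[bool] = []  # started[i] is True once pod i has been started
    for index in range(num_nodes):
        started.append(True)
        if not is_ready(index, started):
            # The orchestrator keeps waiting on this pod and never starts
            # the next one, so the state below can never change.
            break
    return len(started)


def fully_booted(index: int, started: List[bool]) -> bool:
    """Strict probe: ready only once at least one peer has been started."""
    return sum(started) > 1


def responds_to_ping(index: int, started: List[bool]) -> bool:
    """Lenient probe: ready as soon as the node's own process is running."""
    return started[index]


stalled = rollout(3, fully_booted)        # stuck after the first pod
completed = rollout(3, responds_to_ping)  # all three pods are started
```

With the strict probe the rollout stops at one pod, because the first node waits for a peer that is never deployed. With the lenient probe all three pods start, and the nodes can then finish booting once their peers are up.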

@michaelklishin michaelklishin self-assigned this Jun 4, 2020
@michaelklishin michaelklishin added this to the 3.8.5 milestone Jun 4, 2020
michaelklishin added a commit that referenced this issue Jun 4, 2020
 * It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
 * It can produce false positives
 * It is too intrusive and CPU-intensive to use at scale
 * Most operators do not understand what it really does and when they learn about it,
   consider it to be too opinionated and intrusive

Time for the One True Health Check™ to retire from duty.

Part of #426
michaelklishin added a commit to rabbitmq/rabbitmq-server that referenced this issue Jun 4, 2020
 * It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
 * It can produce false positives
 * It is too intrusive and CPU-intensive to use at scale
 * Most operators do not understand what it really does and when they learn about it,
   consider it to be too opinionated and intrusive

Time for the One True Health Check™ to retire from duty.

Part of rabbitmq/rabbitmq-cli#426
michaelklishin added a commit to rabbitmq/rabbitmq-server that referenced this issue Jun 4, 2020
 * It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
 * It can produce false positives
 * It is too intrusive and CPU-intensive to use at scale
 * Most operators do not understand what it really does and when they learn about it,
   consider it to be too opinionated and intrusive

Time for the One True Health Check™ to retire from duty.

Part of rabbitmq/rabbitmq-cli#426
michaelklishin added a commit that referenced this issue Jun 4, 2020
 * It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
 * It can produce false positives
 * It is too intrusive and CPU-intensive to use at scale
 * Most operators do not understand what it really does and when they learn about it,
   consider it to be too opinionated and intrusive

Time for the One True Health Check™ to retire from duty.

Part of #426