Make node_health_check completely node-local #883

Merged (3 commits, Jul 14, 2016)

Conversation

@binarin (Contributor) commented Jul 14, 2016

Changes to rabbitmq-common - rabbitmq/rabbitmq-common#118

Fixes #818

Also:

  • added a test suite
  • fixed queue health-checking - rabbit_amqqueue:info_all expects a
    vhost argument but was passed [pid] instead, so no health check was
    actually performed for queues. The proper way is to iterate over
    all vhosts.
  • delegated all timeout/error handling to rabbit_cli, so there are no
    more magic exit codes 70 and 68 and no more case-ing on badrpc,
    which is already handled in other places.
  • made rabbitmqctl timeout argument handling more consistent - the
    global timeout ?RPC_TIMEOUT was never used, because the default
    value of "infinity" was always introduced via ?TIMEOUT_DEF. Now
    infinity is used for commands without timeout support, and
    ?RPC_TIMEOUT otherwise.
  • ?COMMANDS_WITH_TIMEOUT can now contain per-command default timeout
    values, using the tuple {Command, DefaultTimeoutInMilliSeconds}
    instead of just Command.
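
The per-command timeout lookup described in the last two bullets can be sketched as follows (an illustrative Python rendering of the pattern only; the real code is Erlang in rabbit_cli, and the names and values below are made up):

```python
# Illustrative sketch of the ?COMMANDS_WITH_TIMEOUT lookup described above.
# An entry may be either a bare command name or a
# (command, default_timeout_ms) tuple; names and values are hypothetical.
RPC_TIMEOUT_MS = 60_000  # stand-in for the global ?RPC_TIMEOUT

COMMANDS_WITH_TIMEOUT = [
    "list_queues",                   # falls back to the global default
    ("node_health_check", 70_000),   # carries a per-command default
]

def default_timeout(command):
    """Return the default timeout (ms) for a command, or None (infinity)
    when the command does not support timeouts at all."""
    for entry in COMMANDS_WITH_TIMEOUT:
        if entry == command:
            return RPC_TIMEOUT_MS
        if isinstance(entry, tuple) and entry[0] == command:
            return entry[1]
    return None  # commands without timeout support run with infinity
```

With this scheme a bare entry falls back to the global RPC timeout, a tuple carries its own default, and unlisted commands run with infinity.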

@michaelklishin (Member)

Corrected end_per_testsuite a bit and merged. Thank you.

@michaelklishin (Member)

@binarin this fails to merge into master cleanly. Would you have a chance to resolve conflicts?

@binarin (Contributor, Author) commented Jul 14, 2016

Should I just merge current stable into master? It also pushes some commits that are not authored by me.

@binarin (Contributor, Author) commented Jul 14, 2016

Here it is - #884

@michaelklishin (Member)

@binarin yup, simply merge stable into master and resolve conflicts. Thank you.

@michaelklishin (Member)

@binarin just in case: for both server and common :)

@binarin (Contributor, Author) commented Jul 14, 2016

Done.

binarin added a commit to binarin/rabbitmq-server that referenced this pull request Aug 10, 2016
To stop wasting network bandwidth during health checks (e.g. list_queues
in a 3-node cluster with 10k queues costs on average 12 megabytes of
traffic and 27k TCP packets).

Features are disabled by default to preserve compatibility, but they
SHOULD be enabled when the following patches are present in the
RabbitMQ version in use:
- rabbitmq#915
- rabbitmq#911
- rabbitmq#883
binarin added a commit to binarin/rabbitmq-server that referenced this pull request Aug 17, 2016
This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels across the
cluster in this environment results in 27k TCP packets and around 12
megabytes of network traffic. Given that these calls happen ~10 times a
minute with 3 controllers, this adds up to significant overhead.

To enable those features you should have a RabbitMQ build containing
the following patches:
- rabbitmq#883
- rabbitmq#911
- rabbitmq#915
binarin added a commit to binarin/rabbitmq-server that referenced this pull request Aug 23, 2016
This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels across the
cluster in this environment results in 27k TCP packets and around 12
megabytes of network traffic. Given that these calls happen ~10 times a
minute with 3 controllers, this adds up to significant overhead.

To enable those features you should have a RabbitMQ build containing
the following patches:
- rabbitmq#883
- rabbitmq#911
- rabbitmq#915
openstack-gerrit pushed a commit to openstack-archive/fuel-library that referenced this pull request Aug 24, 2016
This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels across the
cluster in this environment results in 27k TCP packets and around 12
megabytes of network traffic. Given that these calls happen ~10 times a
minute with 3 controllers, this adds up to significant overhead.

Upstream change:
- rabbitmq/rabbitmq-server#916

To enable those features you should have a RabbitMQ build containing the following patches:
- rabbitmq/rabbitmq-server#883
- rabbitmq/rabbitmq-server#911
- rabbitmq/rabbitmq-server#915

Change-Id: Icfde3360b42a841ad3a219b94f65a69b2a18cea7
Closes-Bug: 1614071
bogdando pushed a commit to bogdando/resource-agents that referenced this pull request Oct 1, 2021
This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels across the
cluster in this environment results in 27k TCP packets and around 12
megabytes of network traffic. Given that these calls happen ~10 times a
minute with 3 controllers, this adds up to significant overhead.

To enable those features you should have a RabbitMQ build containing
the following patches:
- rabbitmq/rabbitmq-server#883
- rabbitmq/rabbitmq-server#911
- rabbitmq/rabbitmq-server#915
oalbrigt pushed a commit to ClusterLabs/resource-agents that referenced this pull request Nov 4, 2021
* Backward-compatible commit for packaging of fuel-library

based on Change-Id: Ie759857fb94db9aa94aaeaeda2c6ab5bb159cc9e
All the work done for fuel-library packaging

Should be overridden by the change above after we switch
CI to package-based

implements blueprint: package-fuel-components

Change-Id: I48ed37a009b42f0a9a21cc869a869edb505b39c3

* All the work done for fuel-library packaging

1) Package fuel library into three different
packages:
RPM: fuel-library6.1
ALL: fuel-ha-utils, fuel-misc

2) Install packages onto slave nodes
implements blueprint: package-fuel-components

Change-Id: Ie759857fb94db9aa94aaeaeda2c6ab5bb159cc9e

* Check hostlist against starting and active resources

This commit makes the post-start notify action check that
the hostlist of nodes that should be joined to the cluster
contains not only the nodes that will be started but
also the ones that are already started. This fixes
the case where Pacemaker sends notifies only for
the latest event, so a node which is not
included in the start list would not join the
cluster. It also checks whether the node is
already clustered and skips the join if it
is not needed.

Change-Id: Ibe8ecdcfe42c14228350b1eb3c9d08b1a64e117d
Closes-bug: #1455761

* Check whether beam is started before running start_app

There is a mistake in the OCF logic which tries
to start the rabbitmq app without a running beam
after a Mnesia reset, getting into a loop
which constantly fails until it times out.

Change-Id: Id096961e206a083b51978fc5034f99d04715d7ea
Related-bug: #1436812

* Sync rabbit OCF code diverge to packages

W/o this patch, the code in OCF script from
deployment/ dir will never get to the fuel-library
packages, which are building from files/ and /debian
dirs only.

The solution is:
1) sync the code diverged to the files/ and debian/
2) either to remove the source OCF file or to
update the way files being linked.

This patch fixes only the step 1 as there is not yet
decided how to deal with the step 2.

Related-bug: #1457441
Related-bug: #184966

Change-Id: Ied86640e8e853de99bcd26f1ae726fc8272b6db7
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix rabbit OCF reset_mnesia

W/o this fix, when the rabbit app cannot start due to
corrupted mnesia state, the mnesia would not be cleaned
completely. This may prevent the rabbit app from
starting and take the node out of the cluster permanently.

The solution is to remove all rabbit node related mnesia
files.

Closes-bug: #1457766

Change-Id: I680efbf573c22aa9a13d8429d985b5a57235b2bf
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix rabbit OCF demote/stop/promote actions

* When the rabbit node went down, its status remains 'running'
  in the mnesia db for a while, so a few retries (50 sec in total) are
  required in order to kick and forget this node from the cluster.
  This also requires +50 sec for the stop & demote action timeouts.
* The rabbit master score in the CIB is retained after the current
  master moved manually. This is wrong and the score must be reset
  ASAP for post-demote and post-stop as well.
* The demoted node must be kicked from cluster by other nodes
  on post-demote processing.
* Post-demote should stop the rabbit app at the node being demoted as
  this node should be kicked from the cluster by other nodes.
  Instead, it stops the app at the *other* nodes and brings full
  cluster downtime.
* The check to join should be only done at the post-start and not at
  the post-promote, otherwise the node being promoted may think it
  is clustered with some node while the join check reports it as
  already clustered with another one.
  (the regression was caused by https://review.openstack.org/184671)
* Change `hostname` call to `crm_node -n` via $THIS_PCMK_NODE
  everywhere to ensure we are using correct pacemaker node name
* Handle empty values for OCF_RESKEY_CRM_meta_notify_* by reporting
  the resource as not running. This will rerun resource and restore
  its state, eventually.

Closes-bug: #1436812
Closes-bug: #1455761

Change-Id: Ib01c1731b4f06e6b643a4bca845828f7db507ad3
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Add rabbit OCF functions to get pacemaker node names

W/o this fix, the failover time was longer than expected,
as rabbit nodes were able to query corosync nodes that had left the
cluster and also tried to join them into the rabbit cluster, ending
up being reset and rejoining alive nodes later.
1) Add functions:
  a) to get all alive nodes in the partition
  b) to get all nodes
This fixes get_monitor behaviour so that it ignores
attributes for dead nodes as crm_node behaviour
changed with upgrade of pacemaker. So rabbit nodes will
never try to join the dead ones.

2) Fix bash scopes for local variables
A minor change removing unexpected behavior when a local variable
impacts the global scope.

Related-bug: #1436812

Change-Id: I89b716b4cd007572bb6832365d4424669921f057
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Check if the rabbitmqctl command is responding

W/o this fix, rabbitmqctl may sometimes hang, failing
many commands. This is a problem as it brings the rabbit node
to an unresponsive and broken state. This may also affect
entire cluster operations, for example, when the failed command is
forget_cluster_node.

The solution is to check for the cases when the rabbitmqctl
list_channels command timed out and was killed or terminated with exit
code 137 or 124, and to return a generic error.
A related confusing error message, "get_status() returns generic
error", which may be logged when the rabbit node is running out of the
cluster, is fixed as well.

Closes-bug: #1459173

Change-Id: Ia52fc5f2ab7adb36252a7194f9209ab87ce487de
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
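
The exit-code handling described above can be sketched in shell (illustrative only; `sleep` stands in for a hanging rabbitmqctl invocation, GNU `timeout` produces the 124 exit code, and 137 corresponds to a SIGKILL, 128 + 9):

```shell
# GNU timeout returns 124 when the wrapped command exceeds its budget;
# 137 means the process was killed with SIGKILL (128 + 9).
timeout 0.1 sleep 2   # stand-in for a hanging "rabbitmqctl list_channels"
rc=$?
if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
    echo "generic error: command timed out or was killed (rc=$rc)"
fi
```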

* Add second monitor operation to check RabbitMQ

This commit checks whether there is a running
rabbitmq cluster and whether the rabbitmq app is running
on the node, and exits with a non-zero code if the
current node is not running rabbitmq but should
do so.

Change-Id: I2098405b39ade7325b94781aeb997de0937bdf4c
Closes-bug: #1458828

* Erase mnesia if a rabbit node cannot join the cluster

W/o this fix, a situation is possible when a
rabbit node would get stuck in a start/stop loop, failing
to join the cluster with an error:
"no_running_cluster_nodes, You cannot leave a cluster
if no online nodes are present."

This is an issue because the rabbit node should always
be able to join the cluster, if it was ordered to start
by pacemaker RA.

The solution is to force the mnesia reset, if the
rabbit node cannot join the cluster on post-start
notify. Note, that for the master starting, the node
wouldn't be reset. So, the mnesia will be kept intact
at least on the resource master.

Partial-bug: #1461509

Change-Id: I69bc13266a1dc784681b2677ae5616bfc28cf54f
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Restart rabbit if can't list queues or found memory alert

W/o this fix a dead-end situation is possible
when the rabbit node has no free memory resources left
and the cluster blocks all publishing, by design.
But the app thinks "let's wait for the publish block to be
lifted" and cannot recover.

The workaround is to monitor results
of crucial rabbitmqctl commands and restart the rabbit node,
if queues/channels/alarms cannot be listed or if there are
memory alarms found.
This is similar logic to what we have for the cases when
rabbitmqctl list_channels hangs. But the channels check is also
fixed to verify whether the exit code > 0 when the rabbit app is
running.

Additional checks added to the monitor also require extending
the timeout window for the monitor action from 60 to 180 seconds.

Besides that, this patch makes the monitor action to gather the
rabbit status and runtime stats, like consumed memory by all
queues of total Mem+Swap, total messages in all queues and
average queue consumer utilization. This info should help to
troubleshoot failures better.

DocImpact: ops guide. If any rabbitmq node exceeds its memory
threshold, publishing becomes blocked cluster-wide, by design.
For such cases, the rabbit node is recovered from the raised
memory alert and immediately stopped, to be restarted later by
pacemaker. Otherwise, this blocked publishing state might never
be lifted if the pressure from the OpenStack apps side persists.

Closes-bug: #1463433

Change-Id: I91dec2d30d77b166ff9fe88109f3acdd19ce9ff9
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix chowning for rabbit OCF

W/o this fix, the list of file names not
accessible by the rabbitmq user is treated
as multiple arguments to the if command, causing
it to throw a "too many arguments" error and
the chown command to be skipped.

This is a problem as it might prevent the rabbitmq
server from starting because of bad file ownership.

The solution is to pass the list of files as a single
argument "${foo}".

Closes-bug: #1472175

Change-Id: I1d00ec3f31cd0f023bd58a4e11e5b31659977229
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
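
The quoting issue can be reproduced with a small shell sketch (the file names are hypothetical):

```shell
# Hypothetical list of files that need their ownership fixed.
files="rabbit@node1.log rabbit@node1-sasl.log .erlang.cookie"

# Unquoted, $files undergoes word splitting and `[` receives several
# arguments, failing with "too many arguments". Quoted, the whole list
# is passed as a single argument and the test behaves as intended:
if [ -n "${files}" ]; then
    echo "would run: chown rabbitmq:rabbitmq ${files}"
fi
```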

* Fix error return codes for rabbit OCF

W/o this fix a situation is possible when
the rabbit OCF returns OCF_NOT_RUNNING in the hope of
a future restart of the resource by pacemaker.

But in fact, pacemaker will not trigger a restart action
if monitor returns "not running". This is an issue
as we want the resource restarted.

The solution is to return OCF_ERR_GENERIC instead of
OCF_NOT_RUNNING when we expect the resource to be restarted
(which is action stop plus action start).

Closes-bug: #1472230

Change-Id: I10c6e43d92cb23596636d86932674b36864d1595
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Implement the dumping of rabbitMQ definitions

This change leverages the rabbitmq management plugin to dump
exchanges, queues, bindings, users, virtual hosts, permissions and
parameters from the running system. Specifically this change adds the
following:

* The dumping of rabbitMQ definitions (users/vhosts/exchanges/etc) at
  the end of the deployment
* The possibility to restore definitions via the rabbitmq-server ocf
  script during rabbitMQ startup.
* Enabled the rabbitmq admin plugin, but restricted it to localhost traffic.
  This reverts Ic01c26200f6019a8112b1c5fb04a282e64b3b3e6 but adds
  firewall rules to mitigate the issue.

DocImpact: The dump_rabbit_definitions task can be used to backup the
rabbitmq definitions and if custom definitions (users/vhosts/etc) are
created it must be run or the changes may be lost during the rabbitmq
failover via pacemaker.

Change-Id: I715f7c2ae527f7e105b9f6b7d82c443e8accf178
Closes-bug: #1383258
Related-bug: #1450443
Co-Authored-By: Alex Schultz <aschultz@mirantis.com>

* Fix rabbitmq data restore for large datasets

Previously we were sending the json backup data on the command line
which fails when the dataset is large. This change updates the command
line options for curl to pass the filename directly and let it handle
the reading of the data.

Change-Id: I37f298279beca06df41fb08e1745602976c6a776
Closes-Bug: 1383258

* Add more logs to rabbitmq get_status function

It's really hard to debug when get_status() returns only
$OCF_NOT_RUNNING and loses the exit code and error output.

Added more logs to avoid this situation.

Related-Bug: #1488999

Change-Id: Id0999235d7be688f55799e2952fe22e97b678ce7

* Detect a last man standing for rabbit OCF agent

W/o this patch, a race condition is possible
when there are no running rabbit nodes/resource
master. The rabbit nodes will start/stop in an
endless loop, as a result introducing full downtime
for the AMQP cluster and cloud control plane.

The solution is:
* On post-start/post-promote notify, do nothing if
  either of the following is true:
  - there are no rabbit resources running, or no master
  - the list of rabbit resources being started/promoted
    is reported empty
* For such cases, do not report resource failure and delegate
  recovery, if needed, to the "running out of the cluster"
  monitor logic.
* Additionally, report a last man standing when
  there are no running rabbit resources around.

Closes-bug: #1491306

Change-Id: If1c62fac26b63410636413c49fce55c35e53dc5f
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Make RabbitMQ OCF script tolerate rabbitmqctl timeouts

The change makes the OCF script ignore a small number of timeouts of
rabbitmqctl for 'heavy' operations: list_channels, get_alarms and
list_queues. The number of tolerated timeouts in a row is configured
through a new variable 'max_rabbitmqctl_timeouts'. By default it is set
to 1, i.e. rabbitmqctl timeouts are not tolerated at all.

Bug #1487517 is fixed by extracting declaration of local variables
'rc_alarms' and 'rc_queues' from assignment operations.


Text for Operations Guide:

If, on a node where RabbitMQ is deployed,
other processes consume a significant part of the CPU, RabbitMQ starts
responding slowly to queries from the 'rabbitmqctl' utility. The
utility is used by RabbitMQ's OCF script to monitor the state of
RabbitMQ. When the utility fails to return within a pre-defined
timeout, the OCF script considers RabbitMQ to be down and restarts it,
which might lead to a limited (several minutes) OpenStack downtime.
Such restarts are undesirable as they cause downtime without benefit.
To mitigate the issue, the OCF script can be told to tolerate a
certain number of rabbitmqctl timeouts in a row using the following
command:
  crm_resource --resource p_rabbitmq-server --set-parameter \
      max_rabbitmqctl_timeouts --parameter-value N

Here N should be replaced with the number of timeouts. For instance,
if it is set to 3, the OCF script will tolerate two rabbitmqctl
timeouts in a row, but fail if the third one occurs.

By default the parameter is set to 1, i.e. a rabbitmqctl timeout is not
tolerated at all. The downside of increasing the parameter is that
if a real issue occurs which causes rabbitmqctl timeouts, the OCF
script will detect it only after N monitor runs, so the restart, which
might fix the issue, will be delayed.
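
The tolerance logic described above can be sketched as a small predicate (illustrative Python; the actual check_timeouts is part of the shell OCF script, and the signature here is hypothetical):

```python
def check_timeouts(consecutive_timeouts, max_rabbitmqctl_timeouts=1):
    """Treat the monitor result as a failure once the number of
    consecutive rabbitmqctl timeouts reaches the configured maximum;
    tolerate anything below it."""
    return consecutive_timeouts >= max_rabbitmqctl_timeouts

# With max_rabbitmqctl_timeouts=3, two timeouts in a row are tolerated
# and only the third one fails the monitor:
results = [check_timeouts(n, 3) for n in (1, 2, 3)]
```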

To understand whether RabbitMQ's restart was caused by a rabbitmqctl
timeout, examine lrmd.log of the corresponding controller on the Fuel
master node in the /var/log/docker-logs/remote/ directory. There, lines like
"the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels ..."

indicate a rabbitmqctl timeout. The next line explains whether it
caused a restart or not. For example:
"rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."

DocImpact: user-guide, operations-guide

Closes-Bug: #1479815
Closes-Bug: #1487517
Change-Id: I9dec06fc08dbeefbc67249b9e9633c8aab5e09ca

* Return NOT_RUNNING when beam is not RUNNING

Change get_status to return NOT_RUNNING when
beam is not_running. Otherwise, pacemaker
will get stuck during rabbitmq failover and
will not attempt to restart the failed resource

Change-Id: I926a3eafa9968abdf07baa5f2d5c22480300fb30
Closes-bug: #1484280

* Start RabbitMQ app on notify

On notify, if we detect that we are a part of a cluster we still
need to start the RabbitMQ application, because it is always
down after action_start finishes.

Closes-Bug: #1496386
Change-Id: I307452b687a6100cc4489c8decebbc3dccdbc432

* Avoid division operation in shell

When the data returned from 'rabbitmqctl list_queues' grows a lot
and awk sums up all the rows, especially for the memory calculation,
it returns the sum in scientific notation (the example from the bug
was .15997e+09); later, when we want to calculate the memory in
MB instead of bytes, the bash division does not like this string.

We can avoid the situation by doing the division into MB
in awk itself, since we don't need the memory in bytes anyway.

Closes-Bug: #1503331
Change-Id: I38d25406b84d0f70ed62101d5fb5ba108bcab8bd
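
The fix can be illustrated with a minimal pipeline (the queue names and byte counts below are made up):

```shell
# Summing large byte counts in awk and printing the raw sum can yield
# scientific notation (e.g. 2.15997e+09), which bash arithmetic rejects.
# Dividing into MB inside awk itself avoids the problem entirely:
printf '%s\n' 'q1 215997000' 'q2 120000000' \
  | awk '{sum += $2} END {printf "%d\n", sum / 1048576}'
```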

* Wait for rabbitmq sync before stop/demote actions

Added a new OCF key stop_time (corresponding to start_time).
Added a wait_sync function which tries until start_time/2
for queues on the stopped/demoted node to reach a synced state.

Added an optional [-t timeout] to the su_rabbit_cmd function to
provide an arbitrary timeout.

Change-Id: Iae2211b3d477a9603a58d5eacb12e0fba924861a
Closes-Bug: #1464637

* Sync rabbitmq OCF from upstream

Sync upstream changes back to Fuel downstream
Source https://github.com/rabbitmq/rabbitmq-server
version stable/fedfefebaa39a0aeb41cf9328ba44c3a458e4614

Related blueprint upstream-rabbit-ocf
Closes-bug: #1473015

Change-Id: Ie19c2f071c53b873a359c6c5134e9498c6391e66
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Packages are now "self-hosted": no need for the packaging dir

... in the source distribution anymore

* Fix the timeout arg for the su_rabbit_cmd

And fix local bashisms as a little bonus
Upstream patch https://github.com/rabbitmq/rabbitmq-server/pull/374

Related-bug: #1464637

Change-Id: I13189de9f8abce23673c031d11132e495e1972e3
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix piped exit codes expectations and count processing

* Fix return code of the get_all_pacemaker_nodes() and
  get_alive_pacemaker_nodes_but() to be
  not provided as ignored anyway.
* Fix return code expectation of the fetched count attribute
  in the check_timeouts().
Upstream patch https://github.com/rabbitmq/rabbitmq-server/pull/374

Closes-bug: #1506440

Change-Id: I44a6cff2ccba1ba53a18da90c9d74cbb6084ca0c
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Merge branch 'master' into erlang.mk

* Don't update .erlang.cookie on every run

Update happens even during no-op commands like 'meta-data' or 'usage'.
During this update there is a short window for a race condition: a shell
redirection truncates the cookie file, and echo writes data there only
after a brief period of time. So erlang may read data from this empty
file and die with the error "Too short cookie string".

Change-Id: I4c3201617669f3872145048b77337632cb93558c
Closes-Bug: #1512754

* Fix metadata in OCF HA script

Looks like copy-paste has gone wrong.

* Don't update cookie on every run of HA OCF script

Update happens even during no-op commands like 'meta-data' or 'usage'.
During this update there is a short window for a race condition: a shell
redirection truncates the cookie file, and echo writes data there only
after a brief period of time. So erlang may read data from this empty
file and die with the error "Too short cookie string".
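
A conventional way to close such a truncate-then-write window is to write to a temporary file and rename it into place, sketched below (the path and cookie value are illustrative, not the agent's actual code):

```shell
cookie_file=$(mktemp)            # stand-in for ~rabbitmq/.erlang.cookie
# Write the new cookie to a temporary file first, then rename it into
# place: rename(2) is atomic, so a concurrently starting erlang node
# never observes a truncated or empty cookie file.
printf '%s' 'SECRETCOOKIE' > "${cookie_file}.tmp"
mv "${cookie_file}.tmp" "${cookie_file}"
cat "${cookie_file}"
```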

* Bind rabbitmq, epmd, and management plugin to internal IP

RabbitMQ itself was already listening on the correct IP
for controllers, but epmd and management plugin listened
everywhere (although management was covered by firewall
rules).

This covers all RabbitMQ server connection binding so that
all connections are done on the same IP address (with the
unfortunate side effect of blocking localhost connections).

Removed unused parameter rabbitmq_host from
nailgun::rabbitmq.

Change-Id: I9bfb8bc85fcd6d4711c4ca9d79745ad2ce7e673a
Closes-Bug: #1501731

* Add host_ip field

Working with RMQ definitions via management plugin
requires knowing the IP address where it listens.

host_ip parameter will default to 127.0.0.1, but is
configurable.

* Merge branch 'stable'

* Add ability to disable HA for RabbitMQ queues

Add two flags:
 * enable_rpc_ha which enables queue mirroring for RPC queues
 * enable_notifications_ha which enables queue mirroring for
   Ceilometer queues

Since the feature is experimental, both flags are set to true by
default to preserve current behaviour.

The change is implemented in several steps:
 * the upstream script is changed so that it allows extending the
   list of parameters and uses a policy file to define RabbitMQ
   policies.
 * we add our own version of the OCF script which wraps around the
   upstream one. It defines the new enable_rpc_ha and
   enable_notifications_ha parameters and passes their values to the
   upstream script.
 * we add our policy file, where we use the introduced parameters
   to decide which policies we should set.

So we will have two OCF scripts for RabbitMQ in our deployment:
 * rabbitmq-server-upstream - the upstream version
 * rabbitmq-server - our extension, which will be used in the
   environment

The upstream version of the script is pushed upstream along
with an empty policy file, so that other users can define their
own policies or extend the script if needed. Here are the
corresponding pull requests:
  https://github.com/rabbitmq/rabbitmq-server/pull/480
  https://github.com/rabbitmq/rabbitmq-server/pull/482
(both are already merged)

Text for Operations Guide

It is possible to significantly reduce load which OpenStack puts on
RabbitMQ by disabling queue mirroring. This could be done separately
for RPC queues and Ceilometer ones. To disable mirroring for RPC
queues, execute the following command on one of the controllers:

    crm_resource --resource p_rabbitmq-server --set-parameter \
        enable_rpc_ha --parameter-value false

To disable mirroring for Ceilometer queues, execute the following
command on one of the controllers:

    crm_resource --resource p_rabbitmq-server --set-parameter \
        enable_notifications_ha --parameter-value false

In order for any of the changes to take effect, RabbitMQ service
should be restarted. To do that, first execute

    pcs resource disable master_p_rabbitmq-server

Then monitor RabbitMQ state using command

    pcs resource

until it shows that all RabbitMQ nodes are stopped. Once they are,
execute the following command to start RabbitMQ:

    pcs resource enable master_p_rabbitmq-server

Beware: during restart all messages accumulated in RabbitMQ will be
lost. Also, OpenStack will stop functioning until RabbitMQ is up
again, so plan accordingly.

Note that it is not yet well tested how this configuration affects
failover when some cluster nodes go down. Hence it is experimental,
use at your own risk!

DocImpact:  ops-guide

Implements: blueprint rabbitmq-disable-mirroring-for-rpc
Change-Id: I80ae231ca64e2a903b0968d36ba0e85ca9cc9891

* Merge branch 'stable'

* Fix default value for 'use_fqdn' in meta_data

This change fixes the copy-paste gone wrong and pulls in the rabbitmq
upstream commit of c85fdd0f5c54f312fc2147dad2b956961aae3f12.

Closes-Bug: #1526062
Change-Id: I49e45cd893af8c65ed5ddd3efb834e38737a69a2

* Fix stop conditions for the rabbit OCF resource

* Fix get_status() unexpectedly reporting a generic error
  instead of "not running"
* Add proc_stop and proc_kill functions
  (TODO: these shall eventually go into external common ocf helpers)
* Rework stop_server_process()
  - make it return SUCCESS/ERROR as expected
  - grant "rabbitmqctl stop" a graceful termination window and only
    then ensure beam process termination and pidfile removal as well
  - return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
  - use proc_stop to try to kill by pgrp with -TERM, then -KILL, or
    by the beam process name match if there is no PID
  - make it return SUCCESS/ERROR
* Fix action_stop()
  - fail early on the stop_server_process() results, without additional
    rabbitmqctl invocations in the get_status() call
  - rework the hard-coded sleep 10 to use the graceful stop window in
    stop_server_process() instead
  - ensure the rabbit-start-time removal from the CIB before trying to
    stop the server process
  - issue the "stop: action end" log record before the actual end
* Add comments and make logs more informative

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1529897

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Alex Schultz <aschultz@mirantis.com>

* Ensure rabbit node uptime is reset in the CIB for OCF resource

* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like
  start/stop/wait to keep the CIB consistent even if the ops'
  timeouts are exceeded
* Delete the master and start time attributes from the CIB on
  action_start to ensure correct rabbit node uptime evaluation for
  new master elections for the corresponding pacemaker resources
* For the post-demote notify and action_demote(), delete the master
  attribute from the CIB as well.
* For the post-start notify, update the start time in the CIB even
  when the node is already clustered. Otherwise it would remain
  running in the cluster w/o the start time registered, which badly
  affects new master elections.
* Fix a wrong log message when joining by a node

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1530150
https://bugs.launchpad.net/fuel/+bug/1530296

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix rabbit OCF log message when joining by a node

Closes-bug: #1530296

Change-Id: Id2258da4f272dc8eca92130d45ecb69a16ed7c35
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Remove unneeded sleep for a graceful stop by PID

The sleep is not needed according to the
https://www.rabbitmq.com/man/rabbitmqctl.1.man.html
"If a pid_file is specified, also waits for the process
specified there to terminate."

Related Fuel bug https://launchpad.net/bugs/1529897
Related PR
https://github.com/rabbitmq/rabbitmq-server/pull/523

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Syntax and local vars usage fixes to OCF HA

Related Fuel bug:
https://launchpad.net/bugs/1529897

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix proc_kill when there is no pid found

W/o this fix, the rabbit OCF cannot make
proc_stop try to kill a pid-less beam process
by its name match, because the proc_kill()'s
1st parameter cannot be passed empty.

The fix is to use the "none" value when the pid-less
process must be matched by the service_name instead.

Also, fix proc_kill to deal with multi-process
pid files as well (many pids, space separated).

Related Fuel bugs:
https://launchpad.net/bugs/1529897
https://launchpad.net/bugs/1532723

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix get_status, action_stop, proc_stop when beam is unresponsive

* Fix get_status() to catch beam state and output errors
* Fix action_stop() to force name-based matching when there is no
pidfile and beam is unresponsive
* Fix proc_stop to use name-based matching if no pidfile is
found
* Fix proc_stop to retry sending the signal when using the name-based
match as well

W/o this patch, the following situations are possible:
- beam is running and cannot process signals, but is reported "not running"
by get_status(), while in fact it should be reported as a generic error
- which_applications() returned an error, while its output is still
being parsed for the "what" match, while it should not be
- action_stop and proc_stop give up when there is no pidfile and beam is
running unresponsive.

The solution is to make get_status return a generic error and action
stop to use the rabbit process name match for killing it.

Related Fuel bug:
https://bugs.launchpad.net/fuel/+bug/1529897

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix monitor/stop operations for the rabbit OCF resource

W/o this fix, the following situations are possible:
- beam is running and cannot process signals, but is reported "not running"
by get_status(), while in fact it should be reported as a generic error
- which_applications() returned an error, but its output is still
being parsed for the "what" match, which it should not be
- action_stop and proc_stop give up when there is no pidfile and beam is
running but unresponsive

The solution is to make get_status return a generic error and action
stop use the rabbit process name matching for killing it. These and
other related fixes are listed below (tl;dr)

* Fix get_status, action_stop, proc_stop when beam is unresponsive
  (i.e. fails to process signals or does it very slowly)
  - Fix get_status() to catch beam state and output errors
  - Fix action_stop() to force name-based matching when there is no
    pidfile and beam is unresponsive
  - Fix proc_stop to use name-based matching if no pidfile is
    found
  - Fix proc_stop to retry sending the signal when using the
    name-based match as well
* Fix get_status() unexpectedly reporting a generic error
  instead of "not running"
* Add reworked proc_stop and proc_kill functions from the
  ocf-fuel-funcs
* Rework stop_server_process()
  - make it return SUCCESS/ERROR as expected
  - grant "rabbitmqctl stop" a graceful termination window and only
    then ensure the beam process termination and pidfile removal as well
  - return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
  - use proc_stop to try to kill by pgrp with -TERM, then -KILL, or
    by the beam process name match, if there is no PID
  - make it return SUCCESS/ERROR
* Fix action_stop()
  - fail early on the stop_server_process() result without additional
    rabbitmqctl invocations in the get_status() call
  - rework the hard-coded sleep 10 to use the graceful stop window in
    stop_server_process() instead
  - ensure the rabbit-start-time removal from CIB before trying to stop
    the server process
  - issue the "stop: action end" log record before the actual end
* Add comments, adjust log levels and make them more informative

Upstream PRs
https://github.com/rabbitmq/rabbitmq-server/pull/523
https://github.com/rabbitmq/rabbitmq-server/pull/532
https://github.com/rabbitmq/rabbitmq-server/pull/538
https://github.com/rabbitmq/rabbitmq-server/pull/540

Closes-bug: #1529897

Change-Id: I1c382e3cf004630847b6626fabaecaa0094ee271
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Ensure rabbit node uptime is reset in the CIB for OCF resource

* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like
  start/stop/wait to ensure CIB consistent even if the timeouts
  exceeded for the ops
* Delete master and start time attributes from CIB on action_start
  to ensure the correct rabbit nodes uptime evaluation for new
  master elections for corresponding pacemaker resources
* For post-demote notify and action_demote() delete the master
  attribute from CIB as well.
* For post-start notify, update the start time in the CIB even when
  the node is already clustered. Otherwise it would remain running
  in cluster w/o the start time registered, which affects the new
  master elections badly.

Upstream RR https://github.com/rabbitmq/rabbitmq-server/pull/524
Closes-bug: #1530150

Change-Id: I9db3c819031cef620377b4fee08ea92e90b11c70
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix rabbitMQ OCF monitor detection of running master

When the monitor detects the node as OCF_RUNNING_MASTER, this status may be
lost while the monitor checks are in progress.
* Replace prev_rc with rc_check to fix this.
* Also add an info log if detected as running master.
* Break the monitor check loop early if it is going to exit to be
  restarted by pacemaker.
* Do not recheck the master status and do not update the master score,
  if the node was already detected by monitor as OCF_RUNNING_MASTER.
  By that point, the running and healthy master shall not be checked
  against other nodes uptime as it is pointless and only takes more
  time and resources for the action monitor to finish.
* Fail early, if monitor detected the node as OCF_RUNNING_MASTER, but
  the rabbit beam process is not running
* For OCF_CHECK_LEVEL>20, exclude the current node from the check
  loop as we already checked it before

Closes-bug: #1531838

Change-Id: I319db307c73ef24d829be44eeb63d1f52f4180fa
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix rabbitMQ OCF monitor detection of running master

When the monitor detects the node as OCF_RUNNING_MASTER, this status may be
lost while the monitor checks are in progress.
* Replace prev_rc with rc_check to fix this.
* Also add an info log if detected as running master.
* Break the monitor check loop early if it is going to exit to be
  restarted by pacemaker.
* Do not recheck the master status and do not update the master score,
  if the node was already detected by monitor as OCF_RUNNING_MASTER.
  By that point, the running and healthy master shall not be checked
  against other nodes uptime as it is pointless and only takes more
  time and resources for the action monitor to finish.
* Fail early, if monitor detected the node as OCF_RUNNING_MASTER, but
  the rabbit beam process is not running
* For OCF_CHECK_LEVEL>20, exclude the current node from the check
  loop as we already checked it before

Related Fuel bug:
https://launchpad.net/bugs/1531838

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Introduce node name prefix for mgmt/messaging IPs

RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP

Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698
Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
Closes-bug: #1528707

* Add optional prefix for RabbitMQ node FQDNs

This allows instantiating multiple rabbit clusters constructed
from prefix-based instances of rabbit nodes.

* Reset master score if we decide to restart RabbitMQ on timeout

Doing otherwise might not trigger the restart while it is clearly
needed.

* Reset master score if we decide to restart RabbitMQ on timeout

Doing otherwise might not trigger the restart while it is clearly
needed.

Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/560

Change-Id: I480ebaddc98fa0784098efbf0c5ab8c512c8661d
Closes-Bug: #1513421

* Improve rabbitmq OCF script diagnostics

Currently a time-out when running 'rabbitmqctl list_channels' is treated
as a sign that the current node is unhealthy. But that may not be the
case, as the hanging channel could actually be on some other
node. Given that we currently have more than one bug related to
'list_channels', it makes sense to improve diagnostics here.

This patch doesn't change any behaviour, only improves logging after
time-out happens. If time-outs continue to occur (even with latest
rabbitmq versions or with backported fixes), we could switch to this
improved list_channels and kill rabbitmq only if stuck channels are
located on the current node. But I hope that all related rabbitmq bugs
were already closed.

* Improve 'list_channels' diagnostics in OCF

The timeout(1) manpage mentions 124 as another valid return code, in addition to 128 + signal number.
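The improved return-code handling can be sketched like this (function names are illustrative, not the script's actual helpers; assumes GNU coreutils timeout(1)):

```shell
# Interpret timeout(1) return codes: 124 means the wrapped command timed out,
# 128+N means it was killed by signal N (e.g. 137 = 128 + SIGKILL).
timed_out() {
    rc="$1"
    [ "$rc" -eq 124 ] || [ "$rc" -ge 128 ]
}

run_limited() {
    timeout 1 "$@"
    rc=$?
    if timed_out "$rc"; then
        echo "command timed out or was killed (rc=$rc)" >&2
        return 1
    fi
    return "$rc"
}
```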

* Fix usage of uninitialized variable in OCF script

* Fix uninitialized variable in rabbitmq script

Upstream: https://github.com/rabbitmq/rabbitmq-server/pull/571

Shell was sometimes complaining at line 1447 due to empty `rc_check`

Change-Id: I9411fbc41f8ebf6ac41504ff7456ee7952485564
Partial-Bug: #1531838

* Improve OCF script diagnostics for timed-out 'list_channels'

Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/563

Currently a time-out when running 'rabbitmqctl list_channels' is treated
as a sign that the current node is unhealthy. But that may not be the
case, as the hanging channel could actually be on some other
node. Given that we have currently seen more than one bug related to
'list_channels', it makes sense to improve diagnostics here.

This patch doesn't change any behaviour, only improves logging after
time-out happens. If time-outs continue to occur (even with latest
rabbitmq versions or with backported fixes), we could switch to this
improved list_channels and kill rabbitmq only if stuck channels are
located on the current node. But I hope that all related rabbitmq bugs
were already closed.

Change-Id: I4746d3a4e85dc2a51af581034ae09a1cf0eefce2
Partial-Bug: #1515223
Partial-Bug: #1513511

* Suppress curl progress indicator in rabbit OCF

curl is used by the OCF script for fetching definitions (queues etc.), but
the output of that invocation shows up as garbage in pacemaker logs - a
progress indicator doesn't make any sense in logs.

According to the curl manpage, the combination of options
"--silent --show-error" should be used - this suppresses only the progress
indicator; errors will still be shown.

Also other short curl options are replaced with their long counterparts
- for improved readability.
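Under that change a definitions fetch looks roughly like this (a sketch; the wrapper name is illustrative and the real script passes credentials and the management API URL):

```shell
fetch_definitions() {
    # --silent hides the progress meter, --show-error keeps real errors
    # visible on stderr, --fail turns HTTP errors into a non-zero exit code
    curl --silent --show-error --fail "$1"
}
```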

* Fix uninitialized status_master

Fix an issue where multiple nodes may be reported in logs as the running master

Related Fuel bug https://bugs.launchpad.net/bugs/1540936

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix cluster membership check for running master

The running master is always inside of its own cluster.
Fix the cluster membership check when a node is the master.

* Fix uninitialized status_master

Fix an issue where multiple nodes may be reported in logs as the running master

Closes-bug: #1540936

Change-Id: Ic2dfe7b2ba657b9bf06d97f49ddb4b69f2f4e063
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Streamline checking for cluster partitioning

Move the check of whether we are the current cluster master to an earlier
place in the code. That way we avoid unnecessary operations in the master case.

* Fix action_stop for the rabbit OCF

The action_stop may sometimes stop the rabbitmq-server gracefully
by the PID, but leave unresponsive beam.smp processes running and
spoiling rabbits. Those should be stopped as well. The solution is:
- make proc_stop() accept a pid=none to use name matching instead
- make kill_rmq_and_remove_pid() stop by beam process name matching as well
- fix stop_server_process() to ensure there is no beam process left running

Closes-bug: #1541029

Change-Id: Ib9669d15bb714be8a88fd65d7f1815173da788d3
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix action_stop for the rabbit OCF

The action_stop may sometimes stop the rabbitmq-server gracefully
by the PID, but leave unresponsive beam.smp processes running and
spoiling rabbits. Those should be stopped as well. The solution is:
- make proc_stop() accept a pid=none to use name matching instead
- make kill_rmq_and_remove_pid() stop by beam process name matching as well
- fix stop_server_process() to ensure there is no beam process left running

Related Fuel bug: https://launchpad.net/bugs/1541029

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Do not check cluster health if master is not elected

Doing otherwise causes the node to restart when get_monitor is called
within action_promote - it does not find a master and assumes that
it is running outside the cluster.

Also, the code is refactored a little bit - a new function returning the
current master is created and used in the changed code.

Closes-Bug: #1543154
Change-Id: If14fcfc915d76c9580be0a097b250d79cf953b9e

* Exit waiting loop once node has unjoined

Without the break we always wait for 50 seconds, even if we don't need
to wait at all.

Change-Id: Ib361fbac714d61056f4b9d71f23bb74af33abf77

* On neighbor promotion do nothing if we are already clustered

 + extracted a function checking if we are in the same cluster with
   a given node

 + made post-promote ignore promotion of self. Previously it was
   done inside jjj_join, but now we need to do that before the
   new check.

 + now we write "post-promote end" log entry at the very
   end of post-promote, not somewhere in the middle.

Closes-Bug: #1544036
Change-Id: Id28d6c94abe5d96452f7ecba2b3fe022f40afa0d

* Exit waiting loop once node has unjoined

Without the break we always wait for 50 seconds, even if we don't need
to wait at all.

* Private attributes usage in rabbitmq script

There are three types of rabbitmq attributes for pacemaker nodes:
	-'rabbit-master'
	-'rabbit-start-time'
	- timeouts:
		-'rabbit_list_channels_timeouts'
		-'rabbit_get_alarms_timeouts'
		-'rabbit_list_queues_timeouts'

Attributes with names 'rabbit-master' and 'rabbit-start-time' should be
public because we monitor these attributes in a loop for all nodes in our
script.

All timeout attributes were changed to private to avoid unnecessary
transitions.

Also, the --lifetime and --node options were removed for attrd_updater, as
'lifetime' for this command is always 'reboot' and the 'node' default value
is the local one.
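The public/private split can be sketched as follows (an ops fragment, not runnable outside a live Pacemaker cluster; attribute names match the script, and the private flag assumes Pacemaker >= 1.1.13):

```shell
# Public attribute: stored in the CIB, visible cluster-wide,
# and updating it may trigger a transition.
crm_attribute --node "$(crm_node -n)" --lifetime reboot \
              --name 'rabbit-start-time' --update "$(date +%s)"

# Private attribute: kept out of the CIB, so updating it cannot trigger
# a transition. No --lifetime/--node needed: attrd_updater lifetime is
# always 'reboot' and it defaults to the local node.
attrd_updater --private --name 'rabbit_list_channels_timeouts' --update 3
```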

* Private attributes usage in rabbitmq script

There are three types of rabbitmq attributes for pacemaker nodes:
	-'rabbit-master'
	-'rabbit-start-time'
	- timeouts:
		-'rabbit_list_channels_timeouts'
		-'rabbit_get_alarms_timeouts'
		-'rabbit_list_queues_timeouts'

Attributes with names 'rabbit-master' and 'rabbit-start-time' should be
public because we monitor these attributes in a loop for all nodes in our
script.

All timeout attributes were changed to private to avoid unnecessary
transitions.

Also, the --lifetime and --node options were removed for attrd_updater, as
'lifetime' for this command is always 'reboot' and the 'node' default value
is the local one.

Closes-bug: #1524672
Change-Id: Ie45ae3a82b8daa35dbdd977dc894877160af457b

* [OCF HA] Increase tolerable number of rabbitmqctl timeouts

We still see that rabbitmqctl list_channels times out from time
to time, though the RabbitMQ cluster is absolutely healthy in any
other aspect.

Setting max_rabbitmqctl_timeouts to 3 seems to be a sane default
to help avoid unnecessary restarts.

* [OCF HA] Log process id in RabbitMQ OCF script

Several OCF calls might run simultaneously. For example, it often
happens that two monitor calls intersect. Logging current process id
for each line helps distinguish logs of different calls.

Also aligned get_status() logging with format used in all other
parts of the script.

* [OCF HA] Do not check cluster health if master is not elected

Doing otherwise causes the node to restart when get_monitor is called
within action_promote - it does not find a master and assumes that
it is running outside the cluster.

Also, the code is refactored a little bit - a new function returning the
current master is created and used in the changed code.

* Increase tolerable number of rabbitmqctl timeouts

We still see that rabbitmqctl list_channels times out from time
to time, though the RabbitMQ cluster is absolutely healthy in any
other aspect.

Setting max_rabbitmqctl_timeouts to 3 seems to be a sane default
to help avoid unnecessary restarts.

Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/650

Closes-Bug: #1550293
Change-Id: I6b0686ef66ba3966e03c8706594f473e9ab01145

* [OCF HA] On neighbor promotion do nothing if we are already clustered

 + extracted a function checking if we are in the same cluster with
   a given node

 + made post-promote ignore promotion of self. Previously it was
   done inside jjj_join, but now we need to do that before the
   new check.

 + now we write "post-promote end" log entry at the very
   end of post-promote, not somewhere in the middle.

* Suppress curl progress indicator in rabbit OCF

Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/597

curl is used by the OCF script for fetching definitions (queues etc.), but
the output of that invocation shows up as garbage in pacemaker logs - a
progress indicator doesn't make any sense in logs.

According to the curl manpage, the combination of options
"--silent --show-error" should be used - this suppresses only the progress
indicator; errors will still be shown.

Also other short curl options are replaced with their long counterparts
- for improved readability.

Change-Id: I5ae35b3f76dc33be68c79f5dc983f0c779529fb9
Closes-Bug: #1540831

* Log process id in RabbitMQ OCF script

Several OCF calls might run simultaneously. For example, it often
happens that two monitor calls intersect. Logging current process id
for each line helps distinguish logs of different calls.

Also aligned get_status() logging with format used in all other
parts of the script.

Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/653

Closes-Bug: 1553089
Change-Id: Icbaeb560021f70ef13e062cb79fe2cba84e33dce

* Revert "Merge "Private attributes usage in rabbitmq script""

This reverts commit 686bed1b4f090d7f6fd368b94a5ced12c8e28744, reversing
changes made to d42a753d75dc419c123de257a974ca9c175789f7.

Change-Id: I56ce3671558cf12ab7ce7d616e14cf27f3adb5f1
Closes-bug: #1556123

* Revert "Private attributes usage in rabbitmq script"

This reverts commit 4aeaa79bc566c81bc7f5c20d7afbe39c32771aba.

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1556123

* Revert "Private attributes usage in rabbitmq script"

This reverts commit 4aeaa79bc566c81bc7f5c20d7afbe39c32771aba.

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1556123

* Put the RabbitMQ OCF RA policy to /usr/sbin

* Fix failing pcs resource list command
  and move the policy file from the ocf to policy dir
* Configure the custom policy file to be picked up
  in /usr/sbin/set_rabbitmq_policy as the
  fuel-libraryX package installs it.
* As the upstream rabbitmq-server package does not
  install one, use the default policy OCF path param
  as the /usr/local/sbin/...
* Add the policy_file param and unit tests to the
  cluster::rabbitmq_ocf

Closes-bug: #1558627

Change-Id: I4937bde611b06c3e39385a322053610c98584d79
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Put the RabbitMQ OCF RA policy to /usr/sbin

* Fix failing pcs resource list command
* Move policy file to examples in docs dirs

Related Fuel bug: https://bugs.launchpad.net/fuel/+bug/1558627

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Fix half-hearted attempt to erase mnesia in OCF RA

ocf_run does `"$@"`, so "${MNESIA_FILES}/*" wasn't expanded and mnesia
directory wasn't actually cleaned up

Fuel bug: https://bugs.launchpad.net/fuel/+bug/1565868

* Fix half-hearted attempt to erase mnesia in OCF RA

ocf_run does $("$@"), so "${MNESIA_FILES}/*" wasn't expanded and mnesia
directory wasn't actually cleaned up

It's safe to remove that directory completely - it will be re-created
automatically by mnesia.
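The quoting pitfall generalizes to any shell code (a self-contained demonstration, not the RA's code):

```shell
# A glob inside double quotes is passed literally and never expanded,
# so rm receives the filename '<dir>/*', which does not exist.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b"

rm -rf "$demo_dir/*"        # buggy: removes nothing
ls "$demo_dir"              # still lists: a b

rm -rf "$demo_dir"          # fix: remove the directory itself;
                            # mnesia re-creates it on next start
```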

Upstream https://github.com/rabbitmq/rabbitmq-server/pull/724

Change-Id: I0aa47f61e03c99ee6ebb56b833463cdf4ccd243e
Closes-Bug: 1565868

* Stop a rabbitmq pacemaker resource when monitor fails

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1567355

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Stop a rabbitmq pacemaker resource when monitor fails

Upstream PR https://github.com/rabbitmq/rabbitmq-server/pull/731
Closes-bug: #1567355

Change-Id: I83415e0e2a40f0e99e7baa26e35b6f7463c52928
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* Stop process when rabbit is running but is not connected to master.

It should go down to avoid split brain.

Change-Id: I4c51f8608702f2284d835ba9c3c9070b2c329ed8
Closes-Bug: #1541471
Upstream PR: https://github.com/rabbitmq/rabbitmq-server/pull/758

* Stop process when rabbit is running but is not connected to master.

It should go down to avoid split brain.

Related Fuel bug: https://bugs.launchpad.net/fuel/+bug/1541471

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Maciej Relewicz <mrelewicz@mirantis.com>

* Private attributes usage in rabbitmq script

There are three types of rabbitmq attributes for pacemaker nodes:
	-'rabbit-master'
	-'rabbit-start-time'
	- timeouts:
		-'rabbit_list_channels_timeouts'
		-'rabbit_get_alarms_timeouts'
		-'rabbit_list_queues_timeouts'

Attributes with names 'rabbit-master' and 'rabbit-start-time' should be
public because we monitor these attributes in a loop for all nodes in our
script.

All timeout attributes were changed to private to avoid unnecessary
transitions.

Also, the --lifetime and --node options were removed for attrd_updater, as
'lifetime' for this command is always 'reboot' and the 'node' default value
is the local one.

This reverts commit b2b191d2e28b96c9f9a6ea440a383cf4f691d8ad.
(As the pacemaker version was updated).

Closes-bug: #1524672

Change-Id: I6f0d4a99641b847321754d75605a78fbbc96ddad

* Private attributes usage in rabbitmq script

Requires Pacemaker >= 1.1.13.
(The 'attrd_updater' command has the '-p' option only since that version.)

There are three types of rabbitmq attributes for pacemaker nodes:
	-'rabbit-master'
	-'rabbit-start-time'
	- timeouts:
		-'rabbit_list_channels_timeouts'
		-'rabbit_get_alarms_timeouts'
		-'rabbit_list_queues_timeouts'

Attributes with names 'rabbit-master' and 'rabbit-start-time' should be
public because we monitor these attributes in a loop for all nodes in our
script. All timeout attributes were changed to private to avoid
unnecessary transitions.

Also, the --lifetime and --node options were removed for attrd_updater, as
'lifetime' for this command is always 'reboot' and the 'node' default value
is the local one.

* Check cluster_status liveness during OCF checks

We've observed an `autoheal` bug that made `cluster_status` become
stuck forever.

* Fix bashisms in OCF HA script

`-` is not allowed in function names by POSIX, and some
shells (e.g. `dash`) will treat this as a syntax error.

* Update iptables calls with --wait

If iptables is currently being called outside of the ocf script, the
iptables call will fail because it cannot get a lock. This change
updates the iptables call to include the -w flag which will wait until
the lock can be established and not just exit with an error.

* Update iptables calls with --wait

If iptables is currently being called outside of the ocf script, the
iptables call will fail because it cannot get a lock. This change
updates the iptables call to include the -w flag which will wait until
the lock can be established and not just exit with an error.

* Fix bashisms in rabbitmq OCF RA

Change "printf %b" to be passing the checkbashisms.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* [OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script

The function is extracted from check_timeouts to be re-used later
in other parts of the script. Also, switch check_timeouts to use the
existing ocf_update_private_attr function.

* [OCF HA] Rank master score based on start time

Right now we assign 1000 to the oldest nodes and 1 to others. That
creates a problem when the master restarts and no node is promoted until
that node comes back. In that case the returned node will have a score
of 1, like all other slaves, and Pacemaker may select it for promotion
again. The node is clean and empty, and afterwards the other slaves join
it, wiping their data as well. As a result, we lose all the messages.

The new algorithm actually ranks nodes, not just selects the oldest
one. It also maintains the invariant that if node A started later
than node B, then node A score must be smaller than that of
node B. As a result, freshly started node has no chance of being
selected in preference to older node. If several nodes start
simultaneously, among them an older node might temporarily receive
lower score than a younger one, but that is negligible.
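The ranking idea can be illustrated with a small sketch (the real script reads start times from CIB attributes; the function name and score base are illustrative):

```shell
# Rank nodes by start time: the oldest node gets the highest master score,
# and a node started later can never outrank one started earlier.
rank_master_scores() {
    # stdin:  lines of "<node> <start-time-in-seconds>"
    # stdout: lines of "<node> <score>", oldest node first
    sort -k2,2n | awk '{ print $1, 1000 - NR }'
}
```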

Also remove any action on demote or demote notification - all of
these duplicate actions done in stop or stop notification. With these
removed, changing master on a running cluster does not affect RabbitMQ
cluster in any way - we just declare another node master and that is
it. It is important for the current change because master score might
change after initial cluster start up causing master migration from
one node to another.

This fix is a prerequisite for the fix to Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894

* [OCF HA] Enhance split-brain detection logic

Previous split brain logic worked as follows: each slave checked
that it is connected to the master. If the check fails, the slave
restarts. The ultimate flaw in that logic is that there is little
guarantee that the master is alive at the moment. Moreover, if the master
dies, it is very probable that during the next monitor check the slaves
will detect its death and restart, causing complete RabbitMQ cluster
downtime.

With the new approach the master node checks that slaves are connected to
it and orders them to restart if they are not. The check is performed
after the master node health check, meaning that at least that node
survives. Also, orders expire in one minute, and a freshly started node
ignores orders to restart for three minutes to give the cluster time to
stabilize.

Also corrected the problem when a node starts and is already clustered.
In that case the OCF script forgot to start the RabbitMQ app, causing a
subsequent restart. Now we ensure that the RabbitMQ app is running.

The two introduced attributes rabbit-start-phase-1-time and
rabbit-ordered-to-restart are made private. In order to allow master
to set node's order to restart, both ocf_update_private_attr and
ocf_get_private_attr signatures are expanded to allow passing
node name.

Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute,
attrd_updater returns an empty string instead of "(null)" when an
attribute is not defined on the needed node but is defined on some other
node. Correspondingly, the code was changed to expect an empty string,
not "(null)".

This fix is a fix for Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894

* Monitor rabbitmq from OCF with less overhead

This will stop wasting network bandwidth for monitoring.

E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels in a cluster in this
environment results in 27k TCP packets and around 12 megabytes of
network traffic. Given that these calls happen ~10 times a minute with 3
controllers, it results in pretty significant overhead.

To enable those features you should have a rabbitmq containing the following
patches:
- https://github.com/rabbitmq/rabbitmq-server/pull/883
- https://github.com/rabbitmq/rabbitmq-server/pull/911
- https://github.com/rabbitmq/rabbitmq-server/pull/915

* Perform partition checks from OCF HA script

Partitioned nodes are ordered to restart by the master. It may sound like
`autoheal`, but the problem is that the OCF script and `autoheal` are not
compatible, because the concepts of master in pacemaker and winner in
autoheal are completely unrelated.

* Allow node_health_check retries in OCF HA script

* [OCF HA] Do not suggest to run the second monitor action

Right now we suggest that users run a second monitor for slaves
with depth=30. It made sense previously, when there was an additional
check at that depth. Right now we don't have any depth-specific
checks, and hence it does not make sense to run the second monitor.
Moreover, removing the second monitor fixes an issue with Pacemaker
not reacting to a failing monitor if it takes more than a minute. For
details see Fuel bug https://launchpad.net/bugs/1618843

* [OCF HA] Delete Mnesia schema on mnesia reset

Not doing so leads to the RabbitMQ node being half-stuck in the cluster.
As a result, it can't cleanly join back and constantly fails. Details can
be found in the following Fuel bug:
https://bugs.launchpad.net/fuel/+bug/1620649

* Fix stdout/stderr redirects

Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1506423

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

* scripts: Take package-specific files from rabbitmq-server

[#130659985]

* Move all release handling bits to rabbitmq-release

[#130659985]

* OCF RA: Check partitions on non-master nodes

Partitions reported by `rabbit_node_monitor:partitions/0` are not
commutative (i.e. node1 can report itself as partitioned with node2, but
not vice versa).

Given that we now have a strong notion of master in the OCF script, we can
check for those fishy situations during the master health check, and order
damaged nodes to restart.

Fuel bug: https://bugs.launchpad.net/fuel/+bug/1628487

* Correctly return exit code from stop

Panicking and returning non-success on stop often leads to the resource
becoming unmanaged on that node.

Before, we called get_status to verify that RabbitMQ is dead. But
sometimes it returns an error even though RabbitMQ is not running. There
is no reason to call it - we will just verify that there is no beam
process running.

Related fuel bug - https://bugs.launchpad.net/fuel/+bug/1626933

* OCF RA: Don't hardcode primitive name in rabbitmq-server-ha.ocf

We can compute the name of the primitive automatically from environment
variables, instead of hard-coding p_rabbitmq-server; this makes the
resource agent more flexible.

Closes https://github.com/rabbitmq/rabbitmq-server-release/issues/23

* OCF RA: Don't hardcode primitive name in rabbitmq-server-ha.ocf

We can compute the name of the primitive automatically from environment
variables, instead of hard-coding p_rabbitmq-server; this makes the
resource agent more flexible.

Closes https://github.com/rabbitmq/rabbitmq-server-release/issues/23
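The derivation can be sketched like this (a sketch assuming the standard OCF environment: Pacemaker exports OCF_RESOURCE_INSTANCE to resource agents, and for clone/master instances it carries a ":<clone-index>" suffix):

```shell
# Derive the primitive name instead of hard-coding p_rabbitmq-server.
# For clones/masters OCF_RESOURCE_INSTANCE is "<primitive>:<clone-index>",
# so strip everything from the first colon onward.
primitive_name() {
    echo "${OCF_RESOURCE_INSTANCE%%:*}"
}
```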

* OCF RA: Add default_vhost parameter to rabbitmq-server-ha.ocf

This enables the cluster to focus on a vhost that is not /, in case the
most important vhost is something else.

For reference, other vhosts may exist in the cluster, but these are not
guaranteed to be free from data loss. This patch doesn't address
that issue.

Closes https://github.com/rabbitmq/rabbitmq-server-release/issues/22

* OCF RA: Add new limit_nofile parameter to rabbitmq-server-ha OCF RA

This makes it possible to change the open-files limit, as the default on
distributions is usually too low for rabbitmq. The default is 65535.

* OCF RA: Only set limit for open files when higher than current value

This allows setting the limit some other way.
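A sketch of that guard (the OCF_RESKEY_limit_nofile name follows the usual OCF parameter convention for a `limit_nofile` parameter; the function name is illustrative):

```shell
# Raise the soft open-files limit only when the requested value is higher
# than the current one; a higher limit set elsewhere is left alone.
maybe_raise_nofile() {
    wanted="${OCF_RESKEY_limit_nofile:-65535}"
    current="$(ulimit -n)"
    if [ "$current" != "unlimited" ] && [ "$wanted" -gt "$current" ]; then
        ulimit -n "$wanted"
    fi
}
```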

* Manually backport #20, #21, #24, #25 by @vuntz and @aplanas to stable

* Manually backport #20, #21, #24, #25 by @vuntz and @aplanas to stable

* Fix HA OCF script

Some parts of #21 have not been added to the stable branch. This change
fixes the issue by adding missing changes to rabbitmq-server-ha.ocf and
also fixing rabbitmq-server.ocf

* OCF RA: Avoid promoting nodes with same start time as master

It may happen that two nodes have the same start time, and one of these
is the master. When this happens, the node actually gets the same score
as the master and can get promoted. There's no reason to avoid being
stable here, so let's keep the same master in that scenario.

* OCF RA: Fix test for no node in start notification handler

If there's nothing starting and nothing active, then we do a -z " ",
which doesn't have the same result as -z "". Instead, just test for
emptiness for each set of nodes.
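The pitfall can be demonstrated directly (variable names are illustrative):

```shell
starting_nodes=""
active_nodes=""

# Buggy: the concatenation is " " (a single space), which is NOT empty,
# so this branch is skipped even though both sets are empty:
if [ -z "$starting_nodes $active_nodes" ]; then
    echo "buggy: both empty"
fi

# Fixed: test each set of nodes for emptiness separately:
if [ -z "$starting_nodes" ] && [ -z "$active_nodes" ]; then
    echo "fixed: both empty"
fi
```

Running this prints only the "fixed" line, showing why the combined test never fires.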

* OCF RA: Do not start rabbitmq if notification of start is not about us

Right now, every time we get a start notification, all nodes will ensure
the rabbitmq app is started. This makes little sense, as nodes that are
already active don't need to do that.

On top of that, this had the side effect of updating the start time for
each of these nodes, which could result in the master moving to another
node.
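A minimal sketch of such a guard (the helper and variable names are illustrative, not the RA's actual code):

```shell
#!/bin/sh
# Illustrative guard: in the start notification handler, only proceed
# when our own node is among the nodes being started.
node_in_list() {
    needle="$1"; shift
    for n in "$@"; do
        [ "$n" = "$needle" ] && return 0
    done
    return 1
}

# Usage sketch inside the handler (my_node / start_nodes are assumed names):
#   node_in_list "$my_node" $start_nodes || return $OCF_SUCCESS
```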

* OCF RA: Fix logging in start notification handler

The "post-start end" log message was written too early (some things were
still done afterwards), and not in all cases (it was inside an if
statement).

* OCF RA: Avoid promoting nodes with same start time as master

It may happen that two nodes have the same start time, and one of these
is the master. When this happens, the node actually gets the same score
as the master and can get promoted. There's no reason not to prefer
stability here, so let's keep the same master in that scenario.

(cherry picked from commit 62a4f7561171328cd1d62cab394d0bba269ea7ad)
(cherry picked from commit 861f2a57f916a9829e9a11092ada2bb52bdaf028)

* OCF RA: Fix syntax error

(cherry picked from commit a9b4a4ff97a96e798de51933fc44f61aa6bc88a3)

* OCF RA: Do not consider local failures as remote node problems

In is_clustered_with(), the commands we run to check whether the remote
node is clustered or partitioned with us may fail. When they do, the
failure actually tells us nothing about the remote node.

Until now, we were considering such failures as hints that the remote
node is not in a sane state with us. But doing so has pretty negative
impact, as it can cause rabbitmq to get restarted on the remote node,
causing quite some disruption.

So instead of doing this, ignore the error (it's still logged).

There was a comment in the code wondering what the best behavior is;
based on experience, I think preferring stability is the lesser of the
two evils.
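The decision change can be captured in a tiny pure function (the name and the state strings are illustrative; the real code inspects the output of rabbitmqctl invocations):

```shell
#!/bin/sh
# Illustrative sketch: classify a remote-node check. A non-zero exit
# status of the *local* query yields "unknown" rather than "unhealthy",
# so the peer is not restarted because of our own failures
# (the failure is still logged by the caller).
classify_peer() {
    # $1 = exit status of the query, $2 = query result
    if [ "$1" -ne 0 ]; then
        echo "unknown"     # local failure: we learned nothing about the peer
    elif [ "$2" = "clustered" ]; then
        echo "healthy"
    else
        echo "unhealthy"
    fi
}
```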

* Use ocf_attribute_target instead of crm_node

Instead of calling crm_node directly it is preferable to use the
ocf_attribute_target function. This function will return crm_node -n
as usual, except when run inside a bundle (aka container in pcmk
language). Inside a bundle it will return the bundle name or, if the
meta attribute meta_container_attribute_target is set to 'host', it
will return the physical node name where the bundle is running.

Typically when running a rabbitmq cluster inside containers it is
desired to set 'meta_container_attribute_target=host' on the rabbit
cluster resource so that the RA is aware on which host it is running.
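The substitution can be sketched as a small wrapper that degrades to plain crm_node -n when ocf_attribute_target is unavailable (e.g. an older resource-agents package); this shim is an illustration, not the RA's actual code:

```shell
#!/bin/sh
# Illustrative shim: prefer ocf_attribute_target (bundle-aware) and fall
# back to crm_node -n when it is not provided by the shell library.
my_node_name() {
    if command -v ocf_attribute_target >/dev/null 2>&1; then
        ocf_attribute_target
    else
        crm_node -n
    fi
}
```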

Tested both on baremetal (without containers):
 Master/Slave Set: rabbitmq-master [rabbitmq]
     Masters: [ controller-0 controller-1 controller-2 ]

And with bundles as well.

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>

* URL Cleanup

This commit updates URLs to prefer the https protocol. Redirects are not followed to avoid accidentally expanding intentionally shortened URLs (i.e. if using a URL shortener).

# Fixed URLs

## Fixed Success
These URLs were switched to an https URL with a 2xx status. While the status was successful, your review is still recommended.

* [ ] http://www.apache.org/licenses/LICENSE-2.0 with 1 occurrences migrated to:
  https://www.apache.org/licenses/LICENSE-2.0 ([https](https://www.apache.org/licenses/LICENSE-2.0) result 200).

* Allow operator to disable iptables client blocking

Currently the resource agent hard-codes iptables calls to block off
client access before the resource becomes master. This was done
historically because many libraries were fairly buggy detecting a
not-yet functional rabbitmq, so they were being helped by getting
a tcp RST packet and they would go on trying their next configured
server.

It makes sense to be able to disable this behaviour because
most libraries by now have gotten better at detecting timeouts when
talking to rabbit and because when you run rabbitmq inside a bundle
(pacemaker term for a container with an OCF resource inside) you
normally do not have access to iptables.
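The new switch can be sketched like this (the parameter name avoid_using_iptables is from the commit; the function name, port, and rule shape are illustrative):

```shell
#!/bin/sh
# Illustrative sketch: skip the client-blocking iptables rule when the
# operator disabled it (e.g. when running inside a bundle/container).
OCF_RESKEY_avoid_using_iptables="${OCF_RESKEY_avoid_using_iptables:-false}"

block_client_access() {
    if [ "$OCF_RESKEY_avoid_using_iptables" = "true" ]; then
        return 0   # no iptables inside bundles; rely on client timeouts
    fi
    # Port and rule are illustrative, not the RA's exact rule.
    iptables -I INPUT -p tcp --dport 5672 -j REJECT --reject-with tcp-reset
}
```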

Tested by creating a three-node bundle cluster inside a container:
 Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]
   Replica[0]
      rabbitmq-bundle-podman-0  (ocf::heartbeat:podman):        Started controller-0
      rabbitmq-bundle-0 (ocf::pacemaker:remote):        Started controller-0
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-0
   Replica[1]
      rabbitmq-bundle-podman-1  (ocf::heartbeat:podman):        Started controller-1
      rabbitmq-bundle-1 (ocf::pacemaker:remote):        Started controller-1
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-1
   Replica[2]
      rabbitmq-bundle-podman-2  (ocf::heartbeat:podman):        Started controller-2
      rabbitmq-bundle-2 (ocf::pacemaker:remote):        Started controller-2
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-2

The ocf resource was created inside a bundle with:
pcs resource create rabbitmq ocf:rabbitmq:rabbitmq-server-ha avoid_using_iptables="true" \
  meta notify=true container-attribute-target=host master-max=3 ordered=true \
  op start timeout=200s stop timeout=200s promote timeout=60s bundle rabbitmq-bundle

Signed-off-by: Michele Baldessari <michele@acksyn.org>

* Allow rabbitmq to run in a larger cluster composed of also non-rabbitmq nodes

We introduce the OCF_RESKEY_allowed_cluster_node parameter which can be used to specify
which nodes of the cluster rabbitmq is expected to run on. When this variable is not
set the resource agent assumes that all nodes of the cluster (output of crm_node -l)
are eligible to run rabbitmq. The use case here is clusters that have a large
number of nodes, where only a specific subset is used for r…