@mergify mergify bot commented Jun 3, 2025

Why

The retry logic I added in 4621fe7 was completely wrong. If Khepri reached its own timeout of 30 seconds (as of this writing), the mirrored supervisor would retry 50 times because it did not check the time already spent. In the worst case, this means it would retry for 25 minutes (50 × 30 seconds). Nice.

That retry would be terminated forcefully by the parent supervisor after 5 minutes if it was part of a shutdown.
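
The arithmetic behind those numbers can be sketched as follows. This is only an illustration in Python, not the Erlang code from the patch; the constant names are hypothetical, and it assumes every attempt runs into the full Khepri timeout with no extra delay between attempts.

```python
# Values taken from the description above; names are hypothetical.
KHEPRI_TIMEOUT_S = 30      # Khepri's own timeout (as of this writing)
MAX_RETRIES = 50           # number of retries in the old logic
SHUTDOWN_LIMIT_S = 5 * 60  # parent supervisor's forced termination delay


def worst_case_seconds(retries: int, per_attempt_timeout_s: int) -> int:
    """Total time spent if every attempt hits its timeout."""
    return retries * per_attempt_timeout_s


total = worst_case_seconds(MAX_RETRIES, KHEPRI_TIMEOUT_S)
print(total / 60)  # 25.0 minutes

# Attempts that fit in the window before the parent supervisor steps in.
attempts_before_kill = SHUTDOWN_LIMIT_S // KHEPRI_TIMEOUT_S
print(attempts_before_kill)  # 10
```

Under these assumptions, a shutdown never gets past the tenth timed-out attempt before the parent supervisor kills the process, which matches the 5-minute node termination seen in practice.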

How

This time, the code simply passes the error (timeout or something else) down to the following `case`, which shuts the mirrored supervisor down.
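
The difference between the two shapes can be sketched like this. It is a language-neutral illustration in Python under stated assumptions, not the actual Erlang change: `update_with_blind_retry`, `update_without_retry`, and `handle` are hypothetical names, and results are modeled as `("ok", value)` or `("error", reason)` tuples.

```python
from typing import Callable, Tuple

Result = Tuple[str, str]  # ("ok", value) or ("error", reason)


def update_with_blind_retry(op: Callable[[], Result], retries: int) -> Result:
    """Old shape: retry on timeout without tracking elapsed time."""
    result = op()
    while result == ("error", "timeout") and retries > 0:
        retries -= 1
        result = op()
    return result


def update_without_retry(op: Callable[[], Result]) -> Result:
    """New shape: return whatever the store reported, errors included."""
    return op()


def handle(result: Result) -> str:
    """Stand-in for the following `case`: any error stops the supervisor."""
    if result[0] == "ok":
        return "continue"
    return "shutdown"
```

With an operation that always times out, the first function calls it once per retry on top of the initial attempt, while the second calls it once and lets the caller shut down immediately.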

This fixes very long RabbitMQ node termination (at least 5 minutes, sometimes more) in test suites. An example to reproduce:

    gmake -C deps/rabbitmq_mqtt \
      RABBITMQ_METADATA_STORE=khepri \
      ct-v5 t=cluster_size_3:session_takeover_v3_v5

In this test case, the third node of the cluster takes 5+ minutes to stop without the fix.


This is an automatic backport of pull request #14018 done by Mergify.

(cherry picked from commit 376dd2c)
@michaelklishin michaelklishin added this to the 4.1.1 milestone Jun 3, 2025
@michaelklishin michaelklishin merged commit 280ce65 into v4.1.x Jun 3, 2025
539 of 540 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.1.x/pr-14018 branch June 3, 2025 13:12