Improve stream reader metrics cleanup #3340

kjnilsson · 2021-08-26T11:00:12Z

When stream readers crash, we want to clean up their metrics. Otherwise, the stream consumer and publisher metrics diverge from reality, and nodes report more consumers and publishers than actually exist.

We also cleaned up some logs along the way, and made the replica_recovery test in rabbit_stream_queue_SUITE more reliable.

Rather than sleeping for 6 seconds, we now check repeatedly whether the replica has recovered: the test succeeds as soon as it has, or fails if it has not recovered within 30 seconds, the default await_condition timeout.
Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
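
The commit above replaces a fixed sleep with repeated checks under a deadline. As a minimal sketch of that polling pattern, here is a Python analogue; the name `await_condition`, its `timeout` and `interval` parameters, and the fake `replica_recovered` check are assumptions for illustration, not the Erlang helper used in rabbit_stream_queue_SUITE.

```python
import time
from typing import Callable


def await_condition(condition: Callable[[], bool],
                    timeout: float = 30.0,
                    interval: float = 0.05) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical Python analogue of a test helper; not RabbitMQ's API.
    Raises TimeoutError if the condition never holds before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return  # succeed as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)  # wait briefly before checking again


# Usage: a fake "replica" that reports recovered on the third check.
checks = {"n": 0}


def replica_recovered() -> bool:
    checks["n"] += 1
    return checks["n"] >= 3


await_condition(replica_recovered, timeout=5.0)
print(checks["n"])  # 3
```

Unlike a fixed 6-second sleep, this returns as soon as the condition holds, and it fails with an explicit error once the deadline passes instead of asserting on whatever state exists after the sleep.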

michaelklishin · 2021-09-01T02:08:37Z

@Mergifyio backport v3.9.x

mergify · 2021-09-01T02:09:31Z

Command backport v3.9.x: success

Backports have been created

#3354 Improve stream reader metrics cleanup (backport #3340) has been created for branch v3.9.x

Improve stream reader metrics cleanup (backport #3340)

I missed adding release notes to #3340. I am going to add a new item to the PULL_REQUEST_TEMPLATE.md checklist so that we get reminded next time.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

I missed adding release notes to #3340. I am going to add a new item to the PULL_REQUEST_TEMPLATE.md checklist so that we get reminded next time.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit f6c8356)

gerhard force-pushed the improve-stream-reader-metrics-cleanup branch from b288ffc to 97704ec Compare August 27, 2021 10:55

gerhard and others added 7 commits August 31, 2021 15:29

Perform stream reader cleanup in terminate

dad0025

Otherwise metrics will not get cleaned up correctly when processes crash. It's also tidier to do this in a single place, in terminate/3.
Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
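
The stream reader is an Erlang process, and this commit moves its metrics cleanup into terminate/3. The Python sketch below only illustrates the general idea under stated assumptions: a hypothetical `StreamReader` registers publishers and consumers in a shared `metrics` counter and undoes all of it in one `terminate` method that runs whether the handler returns normally or raises. None of these names come from RabbitMQ.

```python
from collections import Counter

# Hypothetical node-wide metrics, standing in for the real metrics tables.
metrics: Counter = Counter()


class StreamReader:
    """Hypothetical connection process that owns publishers and consumers."""

    def __init__(self) -> None:
        self.publishers = 0
        self.consumers = 0

    def declare_publisher(self) -> None:
        self.publishers += 1
        metrics["publishers"] += 1

    def subscribe(self) -> None:
        self.consumers += 1
        metrics["consumers"] += 1

    def terminate(self) -> None:
        # Single cleanup point: remove everything this reader registered.
        metrics["publishers"] -= self.publishers
        metrics["consumers"] -= self.consumers
        self.publishers = self.consumers = 0

    def run(self, handler) -> None:
        try:
            handler(self)
        finally:
            # Runs on a normal return and also when handler raises (a "crash").
            self.terminate()


def crashing_handler(reader: StreamReader) -> None:
    reader.declare_publisher()
    reader.subscribe()
    raise RuntimeError("simulated crash")


reader = StreamReader()
try:
    reader.run(crashing_handler)
except RuntimeError:
    pass
print(metrics["publishers"], metrics["consumers"])  # 0 0
```

Because every exit path funnels through `terminate`, the counters return to zero after the simulated crash, which is the property the follow-up commits check by asserting that publisher and consumer counts start from 0.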

Test stream publisher & consumer counters

0ecf3d4

Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

Test that we start from 0 publishers & consumers

6c0ba03

Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

Stream coordinator: only return tail info if osiris app is started

9092de1

This ensures that only nodes that are ready to host stream members are included in the election. This avoids continuous restart attempts when the rabbit application is stopped.
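
A minimal sketch of the election filter this commit describes, written in Python with hypothetical names: each candidate node reports which applications it has started, and only nodes where osiris is running are considered. In the stream coordinator this is Erlang code that decides whether to return tail info; the data shape used here is an assumption.

```python
from typing import Dict, List, Set


def election_candidates(started_apps: Dict[str, Set[str]]) -> List[str]:
    """Return nodes ready to host stream members (hypothetical helper).

    `started_apps` maps a node name to the set of applications it reports
    as started. Nodes without "osiris" are excluded from the election,
    which per the commit message avoids continuous restart attempts on
    nodes where the rabbit application is stopped.
    """
    return sorted(node for node, apps in started_apps.items()
                  if "osiris" in apps)


nodes = {
    "rabbit@a": {"rabbit", "osiris"},
    "rabbit@b": {"kernel", "stdlib"},  # rabbit stopped, osiris not started
    "rabbit@c": {"rabbit", "osiris"},
}
print(election_candidates(nodes))  # ['rabbit@a', 'rabbit@c']
```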

Tidy up some stream coordinator warning logs

b59d87d

Some logs used ~p to format a full stack trace. Because these warnings are emitted on every nodedown, this needlessly polluted the logs. They now use ~W, which limits how deeply the term is printed.
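
In Erlang's io:format, ~W writes data like ~w but takes an extra argument, the maximum depth to which terms are printed; anything deeper is replaced with `...`. Python's standard `reprlib` module offers a comparable depth limit, so the sketch below uses it purely as an analogy for why bounded formatting keeps a stack-trace-like term short. The `stacktrace` value is made up, and the depth and length limits are arbitrary choices.

```python
import reprlib

# Made-up nested term standing in for a stack trace.
stacktrace = [("mod_a", "fun_a", 2, [("file", "a.erl"), ("line", 10)]),
              ("mod_b", "fun_b", 1, [("file", "b.erl"), ("line", 42)])] * 20

full = repr(stacktrace)  # unbounded, roughly what ~p would emit

limited = reprlib.Repr()
limited.maxlevel = 2     # nesting depth before containers become "[...]"
limited.maxlist = 3      # list elements shown before eliding the rest
trimmed = limited.repr(stacktrace)

print(len(full) > len(trimmed))  # True
print(trimmed)
```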

stream coordinator: further logging improvements

c016567

Also increased the tick timeout so that the coordinator checks less often for new rabbit nodes to auto-add, and increased the sleep times after nodedowns so that retries happen less often.

Fix function_clause error in stream reader

c240ec2

This occurred when the server initiated the connection close.

gerhard force-pushed the improve-stream-reader-metrics-cleanup branch from 97704ec to c240ec2 Compare August 31, 2021 14:29

gerhard added the backport v3.9.x label Aug 31, 2021

gerhard marked this pull request as ready for review August 31, 2021 17:36

gerhard merged commit 2a35b1c into master Aug 31, 2021

gerhard deleted the improve-stream-reader-metrics-cleanup branch August 31, 2021 17:36

gerhard added this to the 3.9.5 milestone Aug 31, 2021

michaelklishin modified the milestones: 3.9.5, 3.9.6 Sep 1, 2021

mergify bot mentioned this pull request Sep 1, 2021

Improve stream reader metrics cleanup (backport #3340) #3354

Merged

michaelklishin added a commit that referenced this pull request Sep 1, 2021

Merge pull request #3354 from rabbitmq/mergify/bp/v3.9.x/pr-3340

9863226

Improve stream reader metrics cleanup (backport #3340)
