rptest: fix test_exceed_broker_limit flake #17932

travisdowns · 2024-04-17T21:13:14Z

ConnectionLimitsTest.test_exceed_broker_limit had a spurious failure in CI. This test starts 2 consumers which (should) consume all 6 available connections, then checks that a producer started after that fails to produce (due to connection limit being hit).

However the consumer & producer starts are all async, so the producer can race ahead of one of the consumers and grab the connections for itself, failing the test.

Change the test to wait for the consumers to connect by waiting until the connection metric hits 6, and only then start the producer.
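As a rough illustration of the ordering described above, here is a minimal standalone sketch. It does not use the rptest API: `FakeBroker`, `start_consumer_async`, and this `wait_until` are hypothetical stand-ins, and the connection counter is simulated rather than read from a metric.

```python
import threading
import time


def wait_until(condition, timeout_sec, backoff_sec):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError("condition not met within timeout")


class FakeBroker:
    """Hypothetical stand-in for a broker that enforces a connection limit."""

    def __init__(self, limit):
        self.limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_connect(self):
        # Accept the connection only if the limit has not been reached.
        with self._lock:
            if self._count >= self.limit:
                return False
            self._count += 1
            return True

    def connection_count(self):
        with self._lock:
            return self._count


def start_consumer_async(broker, n_connections, delay_sec):
    """Start a simulated consumer that connects after a startup delay."""

    def run():
        time.sleep(delay_sec)
        for _ in range(n_connections):
            broker.try_connect()

    thread = threading.Thread(target=run)
    thread.start()
    return thread


broker = FakeBroker(limit=6)

# Two consumers, three connections each, starting at different times.
threads = [start_consumer_async(broker, 3, d) for d in (0.01, 0.05)]

# Without this wait, the producer could connect before the slower consumer.
wait_until(lambda: broker.connection_count() == 6, timeout_sec=5, backoff_sec=0.01)

# The limit is now exhausted, so the producer's connection is rejected.
producer_connected = broker.try_connect()

for t in threads:
    t.join()
```

Removing the `wait_until` call in this sketch reintroduces the race: the producer may take a slot before the second consumer has finished connecting.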

Fixes #17897.

Backports Required

Release Notes

none

oleiman

makes sense. lgtm

dotnwat · 2024-04-18T01:37:26Z

tests/rptest/tests/connection_limits_test.py

+        # producer, since otherwise the consumers and the producer race and the producer
+        # may win in which case it would be one of the consumers that fail to connect
+        self.redpanda.wait_until(
+            lambda: connection_count() == 6, 60, 1,


any chance this becomes flaky? for example a once-in-a-while connection gets dropped and re-opened by franz-go/rpk but the metric doesn't update so quickly?

travisdowns

I did consider the possible flakiness here.

For one thing, this just wouldn't work on a system where there are any unaccounted-for connections, e.g., in a cloud test with other stuff connected (say, kminion) this condition might simply fail (connections may never hit 6, they may be above even from the start, or jump from 1 to 4 to 7 or something like that). However, this test is already written in a way that expects a clean system since it counts connections "exactly".

The metric itself updates instantly, but it's based on RP's view of the connections, so if a connection was silently dropped then it could continue to reflect a non-existent connection for a while as you suggest. This doesn't seem that likely as we are making fresh connections and waiting for the number to hit the expected count which generally happens almost instantly.

However, I think it would reduce the chance of future flakiness if I made this >= 6, rather than == 6, at the cost of not being informed about unexpected changes in the connection behavior. I think that's probably closer to the original intent of this test. I'll make that change unless anyone disagrees.
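To make the trade-off concrete, here is a small sketch using the hypothetical "1 to 4 to 7" sequence of observed counts from above. `first_satisfied` is an illustrative helper, not part of the test.

```python
def first_satisfied(samples, predicate):
    """Return the index of the first sample satisfying `predicate`, or None."""
    for i, value in enumerate(samples):
        if predicate(value):
            return i
    return None


# Connection counts as seen on successive polls (hypothetical values).
samples = [1, 4, 7]

# An exact match never fires because no poll observes exactly 6.
exact = first_satisfied(samples, lambda n: n == 6)

# A lower bound fires on the third poll.
at_least = first_satisfied(samples, lambda n: n >= 6)
```

With `== 6` the wait would time out on this sequence, while `>= 6` proceeds, at the cost of not flagging the unexpected seventh connection.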

Changed to >= 6 in cb22f0a


dotnwat · 2024-04-19T03:04:01Z

gtest_raft_rpunit failed. this pr touches no .cc file or files associated with unit testing.

vbotbuildovich · 2024-04-19T03:04:32Z

/backport v23.3.x

oleiman previously approved these changes Apr 17, 2024

dotnwat reviewed Apr 18, 2024

travisdowns dismissed oleiman’s stale review via cb22f0a April 18, 2024 17:39

travisdowns force-pushed the td-17872-connect-limit-test-flake branch from 384fc37 to cb22f0a Compare April 18, 2024 17:39

dotnwat approved these changes Apr 19, 2024

dotnwat merged commit ce44d56 into redpanda-data:dev Apr 19, 2024
13 of 16 checks passed

This was referenced Apr 19, 2024

[v23.3.x] [v23.3.x] CI Failure (key symptom) in WriteCachingFailureInjectionTest.test_unavoidable_data_loss #17959

Closed

[v23.3.x] rptest: fix test_exceed_broker_limit flake #17960

Merged

travisdowns mentioned this pull request May 9, 2024

CI Failure (Producer should have failed) in ConnectionLimitsTest.test_exceed_broker_limit #17872

Closed
