abhijat changed the title from "CI Failure (assertion error) ManyPartitionsTest.test_omb" to "CI Failure (assertion error for metric aggregatedEndToEndLatencyAvg) ManyPartitionsTest.test_omb" on Sep 8, 2022
This test mainly functions as a quick check that if someone runs OMB against a system with a high partition count, it doesn't fall over -- the actual latency pass/fail threshold is inherited from the pre-existing UNIT_TEST_LATENCY_VALIDATOR, which is meant to be quite liberal (although apparently not liberal enough).
The actual performance is expected to be somewhat below-par on a system with maximum density of partitions/core, although it is still kind of interesting that it's this inconsistent.
The latency is fine initially but goes bad ~100s into the test, around the same time that a bunch of RPC request timeouts occur.
These tests only run with INFO-level logging, so that's the extent of how much information is available. We can't see if e.g. there was leadership instability.
An immediate, but temporary, solution is to increase the upper limit for average latency. In the longer term, we are seeing similar latency spikes in our benchmarking efforts with OMB, so my plan is to increase the latency limit for now. We will then be able to look at this issue more closely in our benchmarking runs and hopefully figure out an actionable reason for these latency spikes. Once that is done, we should be able to return the latency limit to what it was.
To that end I'll be opening a PR later today with the latency limit for this test increased.
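For readers unfamiliar with how a threshold check like this behaves, here is a minimal sketch of a metric validator that produces a message in the same shape as the failure reported in this issue. It is not the actual OMB validator code from the test suite: `LATENCY_VALIDATOR`, `check_metrics`, and the relaxed limit of 200 are hypothetical names and values chosen for illustration. Only the metric name, the observed value, and the limit of 50 come from the failure message.

```python
import operator

# Hypothetical validator: metric name -> list of (comparison, limit) rules.
# The limit of 50 matches the failure message in this issue; the structure is an assumption.
LATENCY_VALIDATOR = {
    "aggregatedEndToEndLatencyAvg": [("<=", 50)],
}

# Supported comparison operators for the rules above.
_OPS = {"<=": operator.le, ">=": operator.ge}


def check_metrics(metrics, validator):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    for name, rules in validator.items():
        value = metrics[name]
        for op, limit in rules:
            if not _OPS[op](value, limit):
                failures.append(
                    f"Metric {name}, value {value}, "
                    f"Expected to be {op} {limit}, check failed."
                )
    return failures


# Observed value taken from the CI failure in this issue.
observed = {"aggregatedEndToEndLatencyAvg": 137.5648158112895}

# With the limit of 50, the observed average fails the check.
print(check_metrics(observed, LATENCY_VALIDATOR))

# Temporarily raising the limit (200 is an arbitrary placeholder) lets the same run pass.
relaxed = {"aggregatedEndToEndLatencyAvg": [("<=", 200)]}
print(check_metrics(observed, relaxed))
```

Keeping the limit in a single data structure, as assumed here, would make the temporary increase a one-line change that is easy to revert once the latency spikes are understood.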
FAIL test: ManyPartitionsTest.test_omb (2/2 runs)
failure at 2022-09-08T07:19:27.327Z: AssertionError("['Metric aggregatedEndToEndLatencyAvg, value 137.5648158112895, Expected to be <= 50, check failed.']")
in job https://buildkite.com/redpanda/vtools/builds/3487#01831b0e-22b1-4569-998d-ba2249309e87