CORE-1743 rptest: fix flaky metric check by disabling leader balancer #17833

Merged
nvartolomei merged 1 commit into redpanda-data:dev from nv/issue-16342 on Apr 15, 2024


nvartolomei (Contributor)

Fixes #16342

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

vbotbuildovich (Collaborator) commented Apr 12, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/47735#018ed344-70ed-493a-8653-dec48f325166:

"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_cancelling_partition_move.replication_factor=3.unclean_abort=True.recovery=restart_recovery.compacted=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/47735#018ed34b-3e86-4bdd-8ae4-ddf947ee733f:

"rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage_with_partition_moves.cleanup_policy=delete"


Comment on lines +188 to +190
cloud_storage_cache_chunk_size=self.default_chunk_size,
# Disable leader balancer to have stable node to fetch metrics from.
enable_leader_balancer=False)
Member:

Would making the metric check more flexible be an alternative?

Contributor Author:

There will always be a chance of a race condition between querying who the leader is and reading the metric value.

Do you have ideas? For this particular test, it would probably be easier to just start one replica and not worry about leadership at all.

Member:

I approved the PR; I'm just asking whether it's possible/reasonable.

> Do you have ideas?

If there is a metrics query, possibly aggregating across endpoints, that still correctly asserts the property being tested even when leadership transfers, then that would be the idea. Not that it's easy or even possible. I'm just wondering, because the less we assume about the system (e.g., whether the leader balancer is enabled or disabled), the more robust the tests are.
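A rough illustration of the "aggregation across endpoints" idea: instead of asking which node is leader and reading its metric, sum a per-node counter over every node and compare totals before and after the workload. This is a minimal, hypothetical Python sketch, not rptest code: snapshots are plain dicts mapping a node name to its scraped counter values, and `cluster_total`, `cluster_delta`, and the metric name `chunk_reads_total` are illustrative assumptions. It assumes the metric is a monotonic per-node counter and that no node restarts between the two snapshots.

```python
from typing import Mapping

# Hypothetical snapshot shape: node name -> {metric name -> counter value}.
Snapshot = Mapping[str, Mapping[str, float]]


def cluster_total(snapshot: Snapshot, metric: str) -> float:
    """Sum a per-node counter over every node in the snapshot.

    Nodes that never reported the metric contribute 0, so the total does not
    depend on which node happened to be leader when the work was done.
    """
    return sum(node_metrics.get(metric, 0.0) for node_metrics in snapshot.values())


def cluster_delta(before: Snapshot, after: Snapshot, metric: str) -> float:
    """Cluster-wide increase of a counter between two snapshots.

    Assumes counters are monotonic and no node restarted in between; a restart
    resets that node's counters and would make the delta too small.
    """
    return cluster_total(after, metric) - cluster_total(before, metric)


if __name__ == "__main__":
    metric = "chunk_reads_total"  # hypothetical metric name
    before = {"node-1": {metric: 10.0}, "node-2": {metric: 4.0}, "node-3": {}}
    # Leadership moved from node-1 to node-2 mid-test: node-1 served 3 reads,
    # then node-2 served the remaining 2.
    after = {"node-1": {metric: 13.0}, "node-2": {metric: 6.0}, "node-3": {}}

    # Checking only the current leader misses the reads served before the move.
    leader_only = after["node-2"][metric] - before["node-2"][metric]
    print(leader_only)                           # 2.0
    print(cluster_delta(before, after, metric))  # 5.0
```

Because the delta is summed over all nodes, there is no leader lookup left to race with. The trade-off is that such a check can only assert cluster-wide totals; it cannot confirm that a particular node did the work, which may or may not match the property a given test is meant to verify.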

nvartolomei (Contributor Author):

/dt

nvartolomei merged commit 38775dd into redpanda-data:dev Apr 15, 2024
17 checks passed
nvartolomei deleted the nv/issue-16342 branch April 15, 2024 15:18
dotnwat changed the title rptest: fix flaky metric check by disabling leader balancer CORE-1743 rptest: fix flaky metric check by disabling leader balancer Apr 19, 2024
Development: successfully merging this pull request may close these issues:

CI Failure (MetricCheckFailed) in CloudStorageChunkReadTest.test_read_chunks
4 participants