CI Failure (p99 spikes over threshold) in OMBValidationTest.test_max_partitions #18481

Closed
vbotbuildovich opened this issue May 14, 2024 · 14 comments
Labels
auto-triaged (used to know which issues have been opened from a CI job), ci-failure, performance, sev/low (bugs which are non-functional paper cuts, e.g. typos, issues in log messages)

Comments

vbotbuildovich commented May 14, 2024

https://buildkite.com/redpanda/vtools/builds/13711

Module: rptest.redpanda_cloud_tests.omb_validation_test
Class: OMBValidationTest
Method: test_max_partitions
test_id:    OMBValidationTest.test_max_partitions
status:     FAIL
run time:   787.923 seconds

AssertionError("['Metric aggregatedEndToEndLatency99pct, value 131.356, Expected to be <= 120.0, check failed.', 'Metric aggregatedEndToEndLatency999pct, value 310.797, Expected to be <= 200.0, check failed.']")
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/omb_validation_test.py", line 458, in test_max_partitions
    benchmark.check_succeed()
  File "/home/ubuntu/redpanda/tests/rptest/services/openmessaging_benchmark.py", line 384, in check_succeed
    OMBSampleConfigurations.validate_metrics(self._metrics,
  File "/home/ubuntu/redpanda/tests/rptest/services/openmessaging_benchmark_configs.py", line 123, in validate_metrics
    assert is_valid, str(results)
AssertionError: ['Metric aggregatedEndToEndLatency99pct, value 131.356, Expected to be <= 120.0, check failed.', 'Metric aggregatedEndToEndLatency999pct, value 310.797, Expected to be <= 200.0, check failed.']
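
The assertion message above has the shape of a per-metric upper-bound check. A minimal sketch of such a check follows, assuming a plain mapping of metric names to values and limits; `check_upper_bounds` is a hypothetical helper written for illustration and is not the actual `OMBSampleConfigurations.validate_metrics` code, which is not shown in this issue.

```python
from typing import Dict, List, Tuple


def check_upper_bounds(metrics: Dict[str, float],
                       limits: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Hypothetical helper: compare each metric against its upper limit.

    Returns (is_valid, failures), where failures mimics the message
    format seen in the assertion above.
    """
    failures = []
    for name, limit in limits.items():
        value = metrics[name]
        if value > limit:
            failures.append(
                f"Metric {name}, value {value}, "
                f"Expected to be <= {limit}, check failed.")
    return (not failures, failures)


# Values and limits copied from the failure report in this issue.
metrics = {
    "aggregatedEndToEndLatency99pct": 131.356,
    "aggregatedEndToEndLatency999pct": 310.797,
}
limits = {
    "aggregatedEndToEndLatency99pct": 120.0,
    "aggregatedEndToEndLatency999pct": 200.0,
}

is_valid, results = check_upper_bounds(metrics, limits)
print(is_valid)
for line in results:
    print(line)
```

Both aggregated metrics exceed their limits here, so the sketch reports two failures, matching the two entries in the assertion.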

JIRA Link: CORE-2959

vbotbuildovich added the auto-triaged and ci-failure labels May 14, 2024

rpdevmp commented May 15, 2024

There are 2 readings in a row above the allowed threshold:

170.727, 370.477,

"endToEndLatency99pct" : [ 43.147, 51.325, 47.856, 46.15, 42.678, 44.578, 45.264, 43.281, 46.71, 45.613, 44.348, 45.829, 44.663, 44.317, 48.659, 45.526, 170.727, 370.477, 43.558, 49.542, 49.821, 47.037, 51.814, 46.226, 44.755, 47.846, 44.819, 44.503, 42.481, 48.834 ],

209.434, 456.771

"endToEndLatency999pct" : [ 54.919, 157.728, 61.438, 59.856, 56.24, 55.506, 58.216, 57.755, 67.128, 132.129, 56.255, 58.183, 57.662, 53.804, 59.724, 56.227, 209.434, 456.771, 54.495, 64.736, 64.122, 122.681, 77.082, 60.931, 55.851, 60.86, 66.105, 60.409, 52.206, 62.648 ],
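
The consecutive over-limit readings can be located mechanically. The sketch below is only an illustration: `runs_above` is a hypothetical helper, and the limits 120.0 and 200.0 are borrowed from the aggregated checks in the assertion, which is an assumption since the per-sample limits are not stated in this issue.

```python
from typing import List, Tuple


def runs_above(samples: List[float],
               threshold: float) -> List[Tuple[int, List[float]]]:
    """Hypothetical helper: return (start_index, values) for each
    maximal run of consecutive samples strictly above threshold."""
    runs = []
    current: List[float] = []
    start = 0
    for i, value in enumerate(samples):
        if value > threshold:
            if not current:
                start = i  # a new run begins here
            current.append(value)
        elif current:
            runs.append((start, current))
            current = []
    if current:  # the series ended while still inside a run
        runs.append((start, current))
    return runs


# Series copied from the readings above.
p99 = [43.147, 51.325, 47.856, 46.15, 42.678, 44.578, 45.264, 43.281,
       46.71, 45.613, 44.348, 45.829, 44.663, 44.317, 48.659, 45.526,
       170.727, 370.477, 43.558, 49.542, 49.821, 47.037, 51.814, 46.226,
       44.755, 47.846, 44.819, 44.503, 42.481, 48.834]
p999 = [54.919, 157.728, 61.438, 59.856, 56.24, 55.506, 58.216, 57.755,
        67.128, 132.129, 56.255, 58.183, 57.662, 53.804, 59.724, 56.227,
        209.434, 456.771, 54.495, 64.736, 64.122, 122.681, 77.082, 60.931,
        55.851, 60.86, 66.105, 60.409, 52.206, 62.648]

# Assumed limits: taken from the aggregated p99 / p999 checks.
p99_runs = runs_above(p99, 120.0)
p999_runs = runs_above(p999, 200.0)
print(p99_runs)
print(p999_runs)
```

Under these assumed limits, each series contains a single run of two readings, starting at the same sample index (16, counting from 0).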

travisdowns commented

@rpdevmp - but are they disk spikes? If they fit the pattern of temporary blips in disk performance, then we should still treat them as a spike, e.g., maybe by relaxing the rule that allows only 1 in a row.
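
One way to picture the relaxation suggested here is a tolerance on the length of a spike run. This is a sketch under assumptions, not the rule rptest actually applies: `longest_run_above`, `spikes_tolerated`, and the `max_consecutive` parameter are hypothetical names, and the 120.0 limit is borrowed from the aggregated p99 check.

```python
from itertools import groupby
from typing import List


def longest_run_above(samples: List[float], threshold: float) -> int:
    """Hypothetical helper: length of the longest run of consecutive
    samples strictly above threshold (0 if there is none)."""
    return max(
        (sum(1 for _ in group)
         for above, group in groupby(samples, key=lambda v: v > threshold)
         if above),
        default=0,
    )


def spikes_tolerated(samples: List[float], threshold: float,
                     max_consecutive: int) -> bool:
    """Accept the series if no spike run exceeds max_consecutive."""
    return longest_run_above(samples, threshold) <= max_consecutive


# Excerpt of the endToEndLatency99pct readings around the spike.
excerpt = [48.659, 45.526, 170.727, 370.477, 43.558, 49.542]

longest = longest_run_above(excerpt, 120.0)
strict = spikes_tolerated(excerpt, 120.0, max_consecutive=1)
relaxed = spikes_tolerated(excerpt, 120.0, max_consecutive=2)
print(longest, strict, relaxed)
```

With a tolerance of one reading the excerpt is rejected; raising the tolerance to two accepts it. Whether that is appropriate depends on confirming that the spikes really are temporary disk blips.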

travisdowns added the sev/low label Jun 23, 2024
travisdowns changed the title from "CI Failure (key symptom) in OMBValidationTest.test_max_partitions" to "CI Failure (p99 spikes over threshold) in OMBValidationTest.test_max_partitions" Jun 23, 2024

piyushredpanda commented

Closing older bot-filed CI issues as we transition to a more reliable system.

4 participants