Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage/ducktape: allow bad logs in scope #14143

Conversation

abhijat
Copy link
Contributor

@abhijat abhijat commented Oct 13, 2023

A utility is added which allows logs to be ignored in a specific time frame.

The allow list is added to the cluster decorator:

@cluster(log_allow_list=[..., TimedLogAllowList(r'message123')]

Then within that test:

# message123 errors not ignored here

with bad_logs_allowed('message123'):
    # message123 errors ignored here

# message123 errors not ignored here

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@abhijat
Copy link
Contributor Author

abhijat commented Oct 13, 2023

/ci-repeat 10
skip-units
dt-repeat=100
tests/rptest/tests/cloud_storage_scrubber_test.py::CloudStorageScrubberTest.test_scrubber

@abhijat abhijat force-pushed the issue/14132/allow-bad-logs-when-write-disabled branch from f325a59 to d21bc72 Compare October 13, 2023 09:45
@abhijat
Copy link
Contributor Author

abhijat commented Oct 13, 2023

/ci-repeat 10
skip-units
dt-repeat=100
tests/rptest/tests/cloud_storage_scrubber_test.py::CloudStorageScrubberTest.test_scrubber

@abhijat abhijat force-pushed the issue/14132/allow-bad-logs-when-write-disabled branch from d21bc72 to 226650f Compare October 13, 2023 14:50
@abhijat abhijat force-pushed the issue/14132/allow-bad-logs-when-write-disabled branch 2 times, most recently from f026336 to a648865 Compare October 13, 2023 15:12
@abhijat abhijat marked this pull request as ready for review October 13, 2023 15:12
@abhijat abhijat force-pushed the issue/14132/allow-bad-logs-when-write-disabled branch from a648865 to a8ea3ed Compare October 13, 2023 15:17
@abhijat
Copy link
Contributor Author

abhijat commented Oct 13, 2023

leaving this PR up as the tooling could be useful, but removed the changes for #14132 because it might need change in the C++ code.

Usually a test will ignore bad log strings throughout the test run. However
there are instances where we know exactly when the bad log lines are
acceptable but not elsewhere.

This change adds a bad log line matcher which is initialized with the
expression to match but no timestamps. Later the test can use the new
decorator added here to mark the time ranges where the expression is
acceptable.

When searching for bad log lines to ignore, both the expression and the
datetime range must match the log entry for it to be allowed.
@abhijat abhijat force-pushed the issue/14132/allow-bad-logs-when-write-disabled branch from a8ea3ed to 109a1de Compare October 13, 2023 15:20
@abhijat abhijat changed the title cloud_storage/ducktape: allow bad logs when write disabled cloud_storage/ducktape: allow bad logs in scope Oct 13, 2023
Copy link
Contributor

@nvartolomei nvartolomei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed feelings. Tests that make assumptions about time/clocks are generally flaky and should be avoided. This one makes even more assumptions. I.e. that clocks are synchronised to millisecond precision across machines (in CDT ducktape runs on distinct machine from Redpanda). Truncating time and adding tolerances I think will make it just worse. I'd prefer the following alternatives over this one:

  1. Ignore specific error loglines for the entire test
  2. Change log level to WARN, quoting our internal logging guide (Logging in Redpanda Core on the wiki): "Issues with external systems (S3, Kafka Clients) are not considered errors: they are warnings."

@abhijat
Copy link
Contributor Author

abhijat commented Oct 30, 2023

Mixed feelings. Tests that make assumptions about time/clocks are generally flaky and should be avoided. This one makes even more assumptions. I.e. that clocks are synchronised to millisecond precision across machines (in CDT ducktape runs on distinct machine from Redpanda). Truncating time and adding tolerances I think will make it just worse. I'd prefer the following alternatives over this one:

  1. Ignore specific error loglines for the entire test
  2. Change log level to WARN, quoting our internal logging guide (Logging in Redpanda Core on the wiki): "Issues with external systems (S3, Kafka Clients) are not considered errors: they are warnings."

Sorry, I missed the review. The idea here was to add some sort of tolerance to a specific error in a scope, which is unfortunately tied to time. This is useful in cases where for example we toggle a feature on and off which causes an error.

So this is better than the alternative, instead of saying the error is acceptable for the entire run of the test, we are saying it is only acceptable for a shorter time span. In worst case scenario we may get false negatives due to the time differences.

However I do not see the utility of this any more as the original issue will not be served by this change, so I will close the PR.

@abhijat abhijat closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants