TLS Metrics: Ensure that recorded expiries advance on credentials reload #17233
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46547#018e6001-2262-4a58-8cd3-e7355cfd9f13
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46547#018e6001-2259-437f-8b2c-206434377909
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46547#018e6013-392c-4da7-a59e-c7f747695acc
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46589#018e6345-e9c0-4d0a-905d-8789e588d50d
src/v/net/probes.cc
_cert_expiry_time = clock_type::time_point::max();
_ca_expiry_time = clock_type::time_point::max();
Should reset() really be setting these to max when !certs_info.has_value() || !ts_info.has_value()?
Should that check also ensure they are not empty?
Presumably they should be cleared down to something nominally invalid in either case, right? Since we're monitoring a file, I assumed whatever state we previously had in the probe should be 100% junk, irrespective of whether the new creds are valid, empty, etc.
That said, it's quite possible/likely that I misunderstand the common case for administering TLS certificates.
I would think that min would be a safer number than max; I'd probably have a check that the cert doesn't expire in the next week or so, and max would satisfy that. I'm assuming here that when the callback is called we are expecting valid certificates. But maybe it means that TLS certificates have been turned off?
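To make the concern about max concrete, here is a minimal sketch of the kind of "expires within a week" check described above. The function name `expires_within` and its parameters are hypothetical and are not part of the Redpanda code; the point is only how each sentinel behaves under such a check.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::system_clock;

// Hypothetical alert predicate: true when `expiry` falls at or before
// `now + window`. Written as a comparison rather than `expiry - now` so that
// sentinel values such as time_point::min() do not overflow a subtraction.
inline bool expires_within(
  clock_type::time_point expiry,
  clock_type::time_point now,
  clock_type::duration window) {
    return expiry <= now + window;
}
```

With this predicate, a probe left at time_point::max() never trips the alert and so looks healthy, whereas time_point::min() always trips it, which is why min is the safer "invalid" value for monitoring.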
> have a check that the cert doesn't expire in the next week or so, and max would satisfy that
Not sure I follow. I was thinking about something like
- always reset probe states to time_point::min (effectively "invalid")
- use a couple of temporary optionals to track the min expiry in the creds
- set probe state to the new value at the very end
Does that cover our bases? Is there an edge case I'm missing?
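The three steps proposed above could be sketched roughly as follows. This is an illustration under assumptions, not the code in src/v/net/probes.cc: `cert_info`, `expiry_probe`, and `on_reload` are hypothetical stand-ins, and real credentials carry far more than an expiry.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <vector>

using clock_type = std::chrono::system_clock;

// Hypothetical stand-in for a parsed certificate; only the expiry matters here.
struct cert_info {
    clock_type::time_point expiry;
};

using cert_list = std::optional<std::vector<cert_info>>;

// Hypothetical probe holding the two recorded expiries.
struct expiry_probe {
    clock_type::time_point cert_expiry = clock_type::time_point::min();
    clock_type::time_point ca_expiry = clock_type::time_point::min();

    // Returns the soonest expiry in `certs`, or nullopt if absent or empty.
    static std::optional<clock_type::time_point>
    soonest(const cert_list& certs) {
        std::optional<clock_type::time_point> out;
        if (!certs) {
            return out;
        }
        for (const auto& c : *certs) {
            if (!out || c.expiry < *out) {
                out = c.expiry;
            }
        }
        return out;
    }

    void on_reload(const cert_list& certs, const cert_list& cas) {
        // Step 1: always reset to min (effectively "invalid").
        cert_expiry = clock_type::time_point::min();
        ca_expiry = clock_type::time_point::min();
        // Step 2: track the soonest expiry in temporary optionals.
        auto cert_min = soonest(certs);
        auto ca_min = soonest(cas);
        // Step 3: publish the new values only at the very end.
        if (cert_min) {
            cert_expiry = *cert_min;
        }
        if (ca_min) {
            ca_expiry = *ca_min;
        }
    }
};
```

Because the probe never consults its previous values, a reload with later-expiring certs moves the recorded expiry forward, and a reload with missing or empty credentials leaves it at min.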
Took sort of a hard left turn - trying to make it harder to represent invalid states. Very much open to suggestions here.
Previously, on certificate load, expiry times would be overwritten iff the expiry of the new cert was sooner than the old stored expiry. This is a bug. Each time credentials are reloaded, the recorded expiry should always reflect the soonest expiring cert in the _current_ credential set, irrespective of whatever the probe stored previously.

To that end, this commit refactors tls_certificate_probe such that:
- The default state is "no cert"/"no CA"
- The default expiry time for a cert/CA is time_point::min
- Certificate state is cleared immediately and unconditionally at load time.

With these changes, probe state when `!_cert_loaded` can be considered "junk" but should produce correct semantics for all metrics.
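The difference between the old and new behaviour can be shown with a small sketch. Both functions below are hypothetical simplifications written for this illustration (the real tls_certificate_probe is not reproduced here); they model only how the recorded expiry is chosen on reload.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

using clock_type = std::chrono::system_clock;
using time_point = clock_type::time_point;

// Old behaviour (simplified): the stored expiry is only overwritten when a
// newly loaded cert expires sooner, so a stale value can survive a renewal.
inline time_point
record_keep_soonest(time_point stored, const std::vector<time_point>& loaded) {
    for (auto e : loaded) {
        stored = std::min(stored, e);
    }
    return stored;
}

// New behaviour (simplified): the previous value is ignored; the result is
// the soonest expiry in the current set, or min when nothing was loaded.
inline time_point
record_from_current(const std::vector<time_point>& loaded) {
    if (loaded.empty()) {
        return time_point::min();
    }
    return *std::min_element(loaded.begin(), loaded.end());
}
```

If a cert that was about to expire is replaced by one valid for much longer, `record_keep_soonest` still reports the old expiry, while `record_from_current` reports the renewed one.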
Force-pushed 175da6c to c71c579: heavier-handed refactor in response to CR comments.
New failures in https://buildkite.com/redpanda/redpanda/builds/46589#018e6333-1d8e-48f3-a40e-d30571821ee7
lgtm
/backport v23.3.x
What it says on the box.
Fixes #16840
Backports Required
Release Notes
Bug Fixes