[Enhancement] Provide metrics to monitor certificates expiration #3761
Comments
Triaged on 12.4.2022: This makes sense for the CAs:
User certificates are not that easy because the User Operator doesn't know whether the certificate is actually used by the client or not. This would be better solved at the client side. |
Thanks for the update. |
I think this is harder than #5413. There are two parts to this:
Of course if you wanna look into it, we will try our best to help you. |
@scholzj @maciej-tatarski and I would like to work on this. We have a solution in our company that does not use Strimzi directly but is built around it, and we have some suggestions for dashboards as well. |
One suggestion would be to expose the actual epoch of the expiration date instead of |
Can you elaborate a bit more on it?
|
@maciej-tatarski will elaborate on the epoch :) @scholzj Regarding our existing solution, we don't plan to use it as part of this implementation, but rather to decommission it once this is done. What we did was implement dashboards and alerts on certs, based on a K8s CronJob that monitors our external secret store and emits metrics, because this was missing. What we want to do in this solution is emit the metric when a cluster is created or a secret is updated, and remove it when a cluster is deleted, i.e. in the |
I think epoch is better because in Grafana you can easily visualize it as a date or as days to expiry, since it fits Grafana's default time format. Additionally, it gives you more precise data, as it is in seconds. |
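To make the epoch suggestion concrete, here is a minimal Java sketch of the idea: expose the expiration as Unix epoch seconds and let the dashboard derive days to expiry from it. The class and method names are hypothetical and not taken from Strimzi. For a real certificate, the `Instant` would presumably come from `X509Certificate.getNotAfter().toInstant()`.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of Strimzi: turns a certificate's notAfter time into metric values. */
final class CertExpiryEpoch {

    /** Value a gauge could expose: the expiration time as Unix epoch seconds. */
    static long expiryEpochSeconds(Instant notAfter) {
        return notAfter.getEpochSecond();
    }

    /** What a dashboard can derive from the epoch value: whole days left, truncated toward zero. */
    static long daysUntilExpiry(long expiryEpochSeconds, Instant now) {
        return Duration.between(now, Instant.ofEpochSecond(expiryEpochSeconds)).toDays();
    }

    public static void main(String[] args) {
        Instant notAfter = Instant.parse("2023-04-12T00:00:00Z"); // made-up expiry date for illustration
        long epoch = expiryEpochSeconds(notAfter);
        System.out.println("expiry epoch seconds: " + epoch);
        System.out.println("days left: " + daysUntilExpiry(epoch, Instant.now()));
    }
}
```

Because the raw value is a timestamp, the same metric can back both a "date" panel and a "days remaining" alert without the operator having to recompute anything on a schedule.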
Ok, I guess that makes sense. We would still need to figure out what would be the best way to expose these metrics. One of the main issues is how to cleanly remove them when the cluster is deleted. |
I see, there are no callbacks or anything on deletion in the |
To be honest, I do not remember the details exactly. But in general, the deletion is done by Kubernetes and its garbage collection. It is not always simple to remove the metrics for the deleted resources. But if you don't do it, they usually stay set until the operator restarts. |
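The cleanup problem described here can be sketched in plain Java. The sketch assumes the operator keeps one expiry value per cluster and drops it explicitly when a deletion is observed. Everything in it (`CertExpiryRegistry`, the `namespace/cluster` key format, the method names) is hypothetical and only illustrates the bookkeeping, not Strimzi's actual metrics code or any particular metrics library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory holder for per-cluster expiry values; not Strimzi's actual metrics code. */
final class CertExpiryRegistry {
    private final Map<String, Long> expiryByCluster = new ConcurrentHashMap<>();

    // Assumed key format: "<namespace>/<cluster>"
    private static String key(String namespace, String cluster) {
        return namespace + "/" + cluster;
    }

    /** Would be called when a cluster is created or its certificate secret changes. */
    void update(String namespace, String cluster, long expiryEpochSeconds) {
        expiryByCluster.put(key(namespace, cluster), expiryEpochSeconds);
    }

    /** Would be called when a deletion is observed; returns true if a value was actually dropped. */
    boolean remove(String namespace, String cluster) {
        return expiryByCluster.remove(key(namespace, cluster)) != null;
    }

    /** Immutable snapshot of what a scrape would currently report. */
    Map<String, Long> snapshot() {
        return Map.copyOf(expiryByCluster);
    }
}
```

Without the explicit `remove` call, the entry for a deleted cluster would stay in the map until the process restarts, which matches the stale-metric behaviour described above.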
Makes sense, @maciej-tatarski and I would gladly give it a go and see if we can come up with anything meaningful :) |
Ok, great. That sounds like a plan then. |
Signed-off-by: Steffen Karlsson <steffen.karlsson@maersk.com> Signed-off-by: Steffen Wirenfeldt Karlsson <steffen.karlsson@maersk.com> Signed-off-by: maciej-tatarski <maciej.tatarski@maersk.com> Co-authored-by: Jakub Scholz <www@scholzj.com> Co-authored-by: maciej-tatarski <maciej.tatarski@maersk.com>
Is your feature request related to a problem? Please describe.
Lack of visibility regarding the validity period of certificates created by the cluster & user operators.
Describe the solution you'd like
Expose metrics to monitor the expiration of the certificates through visualisations and alerts.