Add checks to provide feedback during cert rotation #3781
Comments
👍 from me! I love seeing the success and failure messages. Mind doing ones for the others as well?
That is correct.
@grampelberg @zaharidichev Some additional error messages, per our standup conversation:
@zaharidichev LMK how difficult it is to split the checks into control plane and data plane categories. #3696 currently groups all the checks under the data plane category. Also, for data plane check warnings and errors, I listed the affected namespace to help users locate the source of the errors. I am not sure whether that is sufficient or whether we want to list all the pods (which can look cluttered when there are many pods). LMK if this is doable without a massive refactoring.
@zaharidichev Depending on the amount of effort required, we can make the event publication stuff optional. At this point, completing #3677 and #3696 is more important. LMK if I can help you out in any way. Thanks.
I agree that listing namespaces is better than dumping a potentially huge list of pods.
The purpose of this issue is to introduce additional checks that give users helpful feedback during the trust root rotation process. (See linkerd/website#595.) This will help users find out the latest state of the control plane and data plane without performing low-level inquiries with `kubectl`.
@grampelberg @zaharidichev LMKWYT.
This workflow adds the following items to #3696:
`linkerd check [--proxy]`. Currently, in #3696 (Add checks for issuer certificate validation), `linkerd check [--proxy]` returns a 503 error (tested with an expired trust root and/or an expired issuer cert).
Let the users determine the expiry dates of the trust root and issuer certs:
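As a rough illustration of how an expiry date can be read from a certificate, here is a minimal Go sketch. It is not the implementation in #3696: the names `certExpiry`, `timeUntilExpiry`, and `demoCertPEM` are hypothetical, and the certificate is a throwaway self-signed one generated in-process so the example runs without cluster access.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// certExpiry (hypothetical helper) parses the first PEM block in pemBytes
// as an X.509 certificate and returns its NotAfter timestamp.
func certExpiry(pemBytes []byte) (time.Time, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return time.Time{}, errors.New("no PEM certificate block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return time.Time{}, err
	}
	return cert.NotAfter, nil
}

// timeUntilExpiry (hypothetical helper) reports how long the certificate
// remains valid; the result is negative once the certificate has expired.
func timeUntilExpiry(pemBytes []byte) (time.Duration, error) {
	notAfter, err := certExpiry(pemBytes)
	if err != nil {
		return 0, err
	}
	return time.Until(notAfter), nil
}

// demoCertPEM creates a throwaway self-signed certificate valid for
// validHours. It exists only to give the example something to inspect.
func demoCertPEM(commonName string, validHours int) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(time.Duration(validHours) * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	pemBytes, err := demoCertPEM("demo-trust-root", 48)
	if err != nil {
		panic(err)
	}
	notAfter, err := certExpiry(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Printf("certificate is valid until: %s\n", notAfter.Format(time.RFC3339))
}
```

In a real check the PEM bytes would come from wherever the trust root and issuer certificate are stored rather than from a generated certificate.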
As the expiry date draws closer, we should publish warning events to the k8s event bus (with `identity` being the event owner), so that services like Dive can pick them up (in the future). `check` should also provide a link to the relevant documentation:
Once the trust root and issuer certificate are rotated, `linkerd check` will show the new expiry dates with √. If the data plane trust root hasn't been rotated, `linkerd check --proxy` will issue a warning:
Upon restarting the data plane, all checks will show up as √.
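The pass, warning, and error states described in this workflow could be derived from the remaining validity of a certificate. The Go sketch below is only an assumption about how that might look: `classify`, `render`, and the 60-day `warnWindow` are hypothetical (the issue does not name a threshold), and the `‼` and `×` markers are placeholders next to the `√` used above. It does not publish Kubernetes events.

```go
package main

import (
	"fmt"
	"time"
)

type checkStatus int

const (
	statusOK checkStatus = iota
	statusWarning
	statusError
)

// warnWindow is a hypothetical threshold; the issue does not specify one.
const warnWindow = 60 * 24 * time.Hour

// classify maps the remaining validity of a certificate to a status:
// expired certificates are errors, certificates inside the warning
// window are warnings, and everything else passes.
func classify(remaining, window time.Duration) checkStatus {
	switch {
	case remaining <= 0:
		return statusError
	case remaining <= window:
		return statusWarning
	default:
		return statusOK
	}
}

// render formats one result line. The warning and error markers are
// assumptions made for this sketch.
func render(description string, s checkStatus) string {
	switch s {
	case statusOK:
		return "√ " + description
	case statusWarning:
		return "‼ " + description + " (expiring soon)"
	default:
		return "× " + description + " (expired)"
	}
}

func main() {
	samples := []time.Duration{90 * 24 * time.Hour, 7 * 24 * time.Hour, -time.Hour}
	for _, remaining := range samples {
		fmt.Println(render("issuer certificate is valid", classify(remaining, warnWindow)))
	}
}
```

Keeping the classification separate from the rendering would let the same status drive both the `check` output and any warning event published as expiry approaches.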
When the trust root and/or issuer certificate has expired, the `linkerd check` and `linkerd check --proxy` commands should report the errors:
Currently, `linkerd check [--proxy]` returns a 503 error in #3696 when the cert(s) have expired, which doesn't tell the users what has gone wrong:
I think the certificate checks might need to happen before the `linkerd-api` check.
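To illustrate why ordering matters, here is a minimal Go sketch of an ordered check runner in which a fatal failure stops later checks. The `check` type, `runChecks`, and `demoChecks` are hypothetical stand-ins, not the health checker from #3696. With the certificate check first, the user sees the expiry error instead of the opaque 503 from the API check.

```go
package main

import (
	"errors"
	"fmt"
)

// check is a hypothetical, simplified stand-in for a health check entry.
type check struct {
	description string
	fatal       bool // when true, a failure stops all later checks
	run         func() error
}

// runChecks executes checks in order and returns one result line per
// check that was run. It stops after the first fatal failure.
func runChecks(checks []check) []string {
	var lines []string
	for _, c := range checks {
		if err := c.run(); err != nil {
			lines = append(lines, fmt.Sprintf("× %s: %v", c.description, err))
			if c.fatal {
				break
			}
			continue
		}
		lines = append(lines, "√ "+c.description)
	}
	return lines
}

// demoChecks simulates a cluster where an expired certificate also makes
// the API check fail with a 503. The certificate check is listed first.
func demoChecks(certExpired bool) []check {
	return []check{
		{
			description: "issuer certificate is within its validity period",
			fatal:       true,
			run: func() error {
				if certExpired {
					return errors.New("certificate has expired")
				}
				return nil
			},
		},
		{
			description: "can query the control plane API",
			fatal:       true,
			run: func() error {
				if certExpired {
					return errors.New("503 Service Unavailable")
				}
				return nil
			},
		},
	}
}

func main() {
	for _, line := range runChecks(demoChecks(true)) {
		fmt.Println(line)
	}
}
```

If the two entries in `demoChecks` were swapped, the first line reported would be the 503, which is the behaviour described above.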