Random errors: x509: certificate signed by unknown authority #3497
Comments
@KIVagant IIRC, you installed Linkerd using the Helm templates, right? Did you override the Tap TLS cert and key in your I also wonder if this verification error is caused by clock skew on your servers. Can you confirm? Do you have |
There are still the same certs that were generated in #3414 (comment) taking into account this issue linkerd/website#516 When the problem with extra newlines was solved, L5d works well, but randomly we start getting the
That's a nice point. I will try to check this tomorrow (UTC+3). |
This is referring to an APIService which uses a certificate that is part of the resource configuration ( The fact that restarting the pod fixes it leads me to believe that |
No. I created the cert only once, added it to a secret storage, and that's it. I periodically update the Helm chart from upstream, so this may cause the secrets to be regenerated, but the content of the secret stays the same.
I will try to detect if there's a clock skew when the problem appears, as @ihcsim suggested. |
@KIVagant the certificate in question isn't part of the trust chain at all. |
@KIVagant any new details? |
@grampelberg, sorry, not yet. I am busy with other tickets, but I still see the error (upgraded L5d to 2.6.0 stable). I will come back when I find more. Please don't close this, if that is okay with you.
My findings:
I cannot confirm that this is correct. From what I see (if I understand it right), the cert was created a long time ago.
After
So, this is the difference:
And the new date is equal to the last chart update (
At this moment I can't find any recently changed secrets:
So I see the correlation between the last deploy and the certificate issue date ( I guess this can be fixed if Helm always restarts |
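The correlation described above can be phrased as a simple check: if a certificate was issued after the pod serving it started, the pod may still be holding the previous certificate. This is only a minimal Python sketch, assuming the pod reads its certificate once at startup; the function name and both timestamps are hypothetical and do not come from this thread.

```python
from datetime import datetime, timezone

def pod_may_hold_stale_cert(pod_started_at: datetime, cert_issued_at: datetime) -> bool:
    """Return True when the cert was (re)issued after the pod started.

    Assumption: the pod loads its certificate only at startup, so a cert
    issued later is not the one the running pod is serving.
    """
    return cert_issued_at > pod_started_at

# Hypothetical values: the pod started before the last chart update,
# and the chart update regenerated the certificate.
pod_started_at = datetime(2019, 10, 1, 8, 0, tzinfo=timezone.utc)
cert_issued_at = datetime(2019, 10, 3, 12, 0, tzinfo=timezone.utc)

stale = pod_may_hold_stale_cert(pod_started_at, cert_issued_at)
fresh = pod_may_hold_stale_cert(cert_issued_at, pod_started_at)
print(stale, fresh)
```

A restart moves the pod start time past the issue date, which would be consistent with restarts clearing the error.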
Ooooh, you're totally right! That's it. This'd be a really simple PR using helm's shasum. |
Should happen for at least |
@grampelberg I took a look at that but have a few questions. To me it looks like we need to hash (in the case of linkerd-tap) against the certs defined in If the former proves to be cumbersome, can't we use
@zaharidichev , |
@zaharidichev I'm reasonably sure this'll do it:
Now, while that is the "correct" way, it might be easier to just do something like:
As we're creating new certs every time, that'll just roll it every time. |
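To illustrate the trade-off between the two approaches (a content checksum versus a value that changes on every render), here is a minimal Python sketch. It is not the Helm template itself; the function names, the sample manifests, and the dates are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def checksum_annotation(rendered_manifest: str) -> str:
    """Content-based value: changes only when the rendered manifest changes."""
    return hashlib.sha256(rendered_manifest.encode("utf-8")).hexdigest()

def timestamp_annotation(render_time: datetime) -> str:
    """Time-based value: differs for every render time, regardless of content."""
    return render_time.astimezone(timezone.utc).isoformat()

def triggers_rollout(old_value: str, new_value: str) -> bool:
    """A changed pod-template annotation changes the pod template, so pods roll."""
    return old_value != new_value

# Hypothetical rendered secrets from two consecutive upgrades.
first = "tls.crt: AAAA"
second = "tls.crt: BBBB"

same_content_rolls = triggers_rollout(checksum_annotation(first), checksum_annotation(first))
new_content_rolls = triggers_rollout(checksum_annotation(first), checksum_annotation(second))
timestamp_rolls = triggers_rollout(
    timestamp_annotation(datetime(2019, 1, 1, tzinfo=timezone.utc)),
    timestamp_annotation(datetime(2019, 1, 2, tzinfo=timezone.utc)),
)
print(same_content_rolls, new_content_rolls, timestamp_rolls)
```

The checksum only rolls the pods when the content actually differs, while the timestamp rolls them on every render. If the certs are regenerated on every render anyway, both end up rolling every time.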
@grampelberg Yes, I tried that exact thing, and it seems that we always render tap-rbac.yaml as an empty string here. So if you add this annotation to the spec and run the install twice, you get the same hash every time. It does not seem to me that this is what we want:
➜ linkerd2 git:(master) ✗ bin/linkerd install --ignore-cluster | grep checksum
checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
➜ linkerd2 git:(master) ✗ bin/linkerd install --ignore-cluster | grep checksum
checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Am I missing something here? Alternatively, yes, we can simply do a timestamp
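The checksum printed by both runs is the SHA-256 digest of empty input, which is consistent with the template rendering as an empty string. A quick way to confirm this, sketched in Python rather than in the Helm template language:

```python
import hashlib

# Checksum printed by both `linkerd install` runs in the comment above.
observed = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# SHA-256 of zero bytes, i.e. the digest of an empty rendered template.
empty_digest = hashlib.sha256(b"").hexdigest()

matches_empty = empty_digest == observed
print(matches_empty)
```

Because the hashed input is empty on every render, the annotation value never changes and would never trigger a rollout.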
Hmmm, that's not how I would have expected it to work. Let's just use the timestamp. |
Yes, will give it a shot, although I have the uneasy feeling that it will bring a different set of problems with respect to testing and
So it turns out that we need to make sure Also, we need to make sure the annotation is added to the pod template, not the deployment. This diff works for me:
diff --git a/charts/linkerd2/templates/tap.yaml b/charts/linkerd2/templates/tap.yaml
index 42d6cd71..d6ed4256 100644
--- a/charts/linkerd2/templates/tap.yaml
+++ b/charts/linkerd2/templates/tap.yaml
@@ -49,6 +49,7 @@ spec:
   template:
     metadata:
       annotations:
+        linkerd.io/config-checksum: {{ include (print $.Template.BasePath "/tap-rbac.yaml") $ | sha256sum }}
         {{.CreatedByAnnotation}}: {{default (printf "linkerd/helm %s" .LinkerdVersion) .CliVersion}}
         {{- include "partials.proxy.annotations" .Proxy| nindent 8}}
       labels:
To reproduce this problem, run:
With the new annotation, the |
Bug Report
What is the issue?
I don't understand all the details, but periodically I see the error in different places, although Linkerd works in general. The error appears randomly. Restarting the pods fixes it, but I don't think that's a good workaround.
How can it be reproduced?
Logs, error output, etc
I didn't find any other errors in other L5d pods.
linkerd check output
Environment
Possible solution
Additional context