1.6 istio-proxy SDS not updating certificates SSLV3_ALERT_CERTIFICATE_EXPIRED #28050
Comments
cc @howardjohn @mandarjog 🤷
@Stono 1.6 will reach end of life (EOL) when 1.8 is released in mid-November. I am wondering whether you have considered moving directly to 1.7 instead, to stay within the support window?
Definitely not! If we can't do sequential version updates (1.5 -> 1.6 -> 1.7), which is the approved upgrade route that maintains backwards compatibility, I have no confidence that an untrodden and untested 1.5 -> 1.7 path will work. We decided to stay at least one minor version behind because of the painful track record of issues and breaking changes with new releases. After the nightmare experience of getting onto 1.5 (telemetry v2 etc.), we decided to give 1.6 a little longer to stabilise. I really believe Istio are going to alienate any serious users with this overly rapid release process and T-1 support policy. There needs to be a focus on stability where you are right now, and on ensuring people are actually able to upgrade, rather than on releasing more enhancements. Twelve patch versions into 1.6, I wouldn't describe it as anything other than broken and, for us, actually unusable. After two weeks of testing it, we've not been able to get it past our most basic qualifying environments:
I've always been a massive advocate of Istio, but I'm finding it harder and harder to continue recommending it. 1.5 -> 1.6 has easily been the worst experience yet.
I am not super familiar with this option without looking into it, but from the logs it looks like we are refreshing the cert from the file rather than from istiod. The doc suggests this option is for VMs, where the user provides the (long-expiry) cert, which would explain this behavior. I am not sure it's expected, though, since I think this is how we recommended Prometheus get its certs.
Looks like sds-agent uses existence of Could you mount
You probably want to try proxyMetadata in proxyConfig.
I think I recall now: the /etc/certs mounting is controlled by
FYI we have
And then, as extra belt and braces, not allowing folks to use
@bianpengyuan re
I've updated our mount as you suggested and will update here on whether it worked.
I can confirm that changing the directory means we're now getting certificate rotation. However, that seems to have highlighted a CLI bug: #28099. Do you want me to keep this issue open to track UX improvements? I think this path needs to be documented clearly, as others are going to need to do it. I've documented what I did in https://karlstoney.com/2020/10/15/istio-upgrades-prometheus-sds/index.html
@Stono you can do it with:
I think we need to update some of the mesh config docs for this annotation, though; it was new.
Basically, anything under meshConfig.defaultConfig can be overridden per pod.
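As a rough illustration of that per-pod override, here is a minimal sketch that builds pod-template annotations asking the sidecar to write its SDS certificates to a directory other than /etc/certs. It assumes the proxy.istio.io/config annotation (the per-pod ProxyConfig override discussed here) and sidecar.istio.io/userVolumeMount. Istio's later Prometheus integration docs use both. Whether every field is honoured by 1.6.12 is an assumption.

The /etc/istio-output-certs path and the istio-certs volume name are hypothetical choices. The pod must also define a volume with that name, for example an emptyDir. The guard rejecting /etc/certs reflects this thread's finding that a mount there made the proxy read certs from file instead of rotating them.

```python
import json
import posixpath

# Hypothetical default; the thread only says the mount was moved away from /etc/certs.
DEFAULT_OUTPUT_DIR = "/etc/istio-output-certs"

# Per this thread, mounting at /etc/certs caused file-based (non-rotating) certs.
LEGACY_CERT_DIR = "/etc/certs"


def sds_output_annotations(output_dir: str = DEFAULT_OUTPUT_DIR,
                           volume_name: str = "istio-certs") -> dict:
    """Build annotations asking the sidecar to write its certs to output_dir.

    Assumes the proxy.istio.io/config and sidecar.istio.io/userVolumeMount
    annotations; volume_name (hypothetical) must match a volume on the pod.
    """
    normalized = posixpath.normpath(output_dir)
    if normalized == LEGACY_CERT_DIR or normalized.startswith(LEGACY_CERT_DIR + "/"):
        raise ValueError(f"{output_dir!r} is under {LEGACY_CERT_DIR}; pick another directory")

    # proxy.istio.io/config carries a ProxyConfig fragment as a YAML string.
    # JSON is (for this simple case) valid YAML, so json.dumps avoids a YAML dependency.
    proxy_config = json.dumps({"proxyMetadata": {"OUTPUT_CERTS": normalized}})
    # userVolumeMount is a JSON list of extra volume mounts for the sidecar container.
    user_volume_mount = json.dumps([{"name": volume_name, "mountPath": normalized}])
    return {
        "proxy.istio.io/config": proxy_config,
        "sidecar.istio.io/userVolumeMount": user_volume_mount,
    }


if __name__ == "__main__":
    # Print the annotations so they can be pasted into spec.template.metadata.annotations.
    print(json.dumps(sds_output_annotations(), indent=2))
```

The application container would then mount the same volume (read-only is enough) to consume the written certificates. If your Istio version rejects the JSON-formatted value, writing the proxy.istio.io/config value as a YAML block is the form the Istio docs show.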
@Stono Yes, please keep this issue open. I am going to add the document and will borrow some content from your blog.
I will read through the logic and see how to make it not depend on the presence of the cert dir.
John beat me to it. The option is still hidden from the docs, and we should update that.
I will update the doc for mesh config.
Can confirm this combo of annotations does the job:
Closing this issue, as a PR documenting it has been merged.
Bug description
In our 1.6 qualifying environment we have an application that is rejecting requests with "upstream connect error or disconnect/reset before headers. reset reason: connection failure".
We have three (almost) identical test apps:
We continually send load that goes 1 -> 2 -> 3.
We do a rolling restart of istio-test-app-1 every 10 minutes to create config churn.
The pod is 25 hours old; however, as you can see here, something happened at 14:35 which broke istio-test-app-3 (1 and 2 remain fine).
The istio-proxy logs contain lots of SDS noise:
The only difference between istio-test-app-3 and apps 1 and 2 is that it uses:
As per this documentation.
To write certs to this volumeMount:
In order to write the certs to disk (as I was experimenting with for Prometheus).
Looking at the logs I see:
Which makes me check:
Which shows the certificates are indeed out of date - and expired exactly when we started seeing issues.
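The commands used for this check were not captured above. One way to do it is to run openssl x509 -noout -enddate -in <output dir>/cert-chain.pem wherever both openssl and the written certificate file are available. The file name cert-chain.pem is an assumption about what the sidecar writes.

This minimal sketch parses the resulting notAfter= line with the standard library's ssl.cert_time_to_seconds and reports whether the certificate has expired. X.509 validity includes the notAfter instant itself, so a certificate only counts as expired once that moment has passed.

```python
import ssl
import time
from typing import Optional


def not_after_epoch(openssl_enddate: str) -> int:
    """Convert 'notAfter=Jan  5 09:34:43 2018 GMT' (openssl x509 -enddate output) to epoch seconds."""
    line = openssl_enddate.strip()
    prefix = "notAfter="
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    # cert_time_to_seconds expects '%b %d %H:%M:%S %Y GMT' and interprets it as UTC.
    return ssl.cert_time_to_seconds(line[len(prefix):])


def seconds_until_expiry(openssl_enddate: str, now: Optional[float] = None) -> float:
    """Seconds remaining before notAfter (negative once the cert has expired)."""
    if now is None:
        now = time.time()
    return not_after_epoch(openssl_enddate) - now


def is_expired(openssl_enddate: str, now: Optional[float] = None) -> bool:
    # Expired only strictly after notAfter, matching X.509 validity semantics.
    return seconds_until_expiry(openssl_enddate, now) < 0
```

For example, capture the openssl output into a file and pass its contents to is_expired(). Alerting when seconds_until_expiry() drops below a threshold of your choosing would catch a cert that has stopped rotating before requests start failing.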
This made me check another app where we mount certificates in this way (Prometheus), and sure enough its certs are out of date too:
Restarting the pod renewed the certificates:
[ ] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[x] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure
Expected behavior
SDS certs to be updated correctly when using OUTPUT_CERTS.
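A complementary way to check for this expected behavior is to watch whether the certificate file keeps being rewritten at all. In this report the written certs stopped changing until the pod was restarted. This minimal sketch flags a file as stale when its modification time is older than a threshold.

It assumes the sidecar rewrites the file in place on each rotation, so the mtime advances. The path, for example <output dir>/cert-chain.pem, and the threshold are assumptions. Pick a threshold shorter than your workload certificate lifetime.

```python
import os
import time
from typing import Optional


def is_stale_mtime(mtime: float, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """True if a file last modified at `mtime` has not been rewritten within max_age_seconds."""
    if now is None:
        now = time.time()
    return now - mtime > max_age_seconds


def is_cert_file_stale(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Check the on-disk cert file; os.stat raises FileNotFoundError if it was never written."""
    return is_stale_mtime(os.stat(path).st_mtime, max_age_seconds, now)
```

For example, is_cert_file_stale("/etc/istio-output-certs/cert-chain.pem", 12 * 3600) uses a hypothetical path and a 12-hour threshold. It could run as a periodic probe alongside the consuming container.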
Steps to reproduce the bug
No idea
Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)
1.6.12
How was Istio installed?
Helm
Environment where bug was observed (cloud vendor, OS, etc)
GKE