Nondeterministic behaviour of istio-ingressgateway with mTLS #29214

Closed
Demonsthere opened this issue Nov 26, 2020 · 6 comments
Labels
area/perf and scalability, area/security, area/user experience, lifecycle/automatically-closed, lifecycle/stale

Comments

@Demonsthere
Contributor

Demonsthere commented Nov 26, 2020

Bug description
In our system we are using istio-ingressgateway to expose a certain amount of services using a simple TLS gateway (tls-gateway) and a single mTLS service using a dedicated mTLS gateway (mtls-gateway).
Both gateways use the same TLS certificate, with the same credentialName: tls-gateway-certs field in their configuration. The secret is managed by cert-manager, so it is created with the ca.crt, tls.crt, and tls.key fields.
Additionally, we have a custom secret tls-gateway-certs-cacert, which holds a custom cacert, which we want to use for our mtls-gateway.
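
For readers following along, the setup described above might look roughly like the sketch below. The actual manifests were only linked from the issue, so this is a reconstruction under assumptions: the hosts, namespace, selector, and port names are hypothetical, and only the gateway names, TLS modes, and credentialName come from the description. As far as the Istio Gateway reference describes it, a MUTUAL server can take its CA either from the credentialName secret itself or from a separate secret named `<credentialName>-cacert`, which is why both tls-gateway-certs (with its ca.crt) and tls-gateway-certs-cacert are candidates here.

```yaml
# Sketch only. Hosts, namespace, selector and port names are assumptions;
# gateway names, modes and credentialName follow the bug description.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: tls-gateway
  namespace: istio-system        # assumption
spec:
  selector:
    istio: ingressgateway        # assumption: default ingress gateway label
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: tls-gateway-certs   # cert-manager secret: ca.crt, tls.crt, tls.key
    hosts:
    - "*.example.com"            # hypothetical
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: mtls-gateway
  namespace: istio-system        # assumption
spec:
  selector:
    istio: ingressgateway        # assumption
  servers:
  - port:
      number: 443
      name: https-mtls
      protocol: HTTPS
    tls:
      mode: MUTUAL
      # Same secret as above. The client CA could come from ca.crt in this
      # secret or from the separate tls-gateway-certs-cacert secret.
      credentialName: tls-gateway-certs
    hosts:
    - "mtls.example.com"         # hypothetical
```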

After upgrading istio to 1.7.4 we noticed a nondeterministic behavior related to scaling the istio-ingressgateway. Calling our mtls application would sometimes fail with an error:1401E418:SSL routines:CONNECT_CR_FINISHED:tlsv1 alert unknown ca error.

After some debugging we found out that some pods/instances of istio-ingressgateway were serving the CA from the tls-gateway-certs secret (which was wrong) and some from tls-gateway-certs-cacert (which was correct).

This behavior was not present in earlier versions of Istio (1.5 and earlier).

[ ] Docs
[ ] Installation
[ ] Networking
[ X ] Performance and Scalability
[ ] Extensions and Telemetry
[ X ] Security
[ ] Test and Release
[ X ] User Experience
[ ] Developer Infrastructure
[ ] Upgrade

Expected behavior
Istio-ingressgateway uses a single secret in a deterministic manner or uses the dedicated secret with higher priority than the combined one.

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)
1.7.4

How was Istio installed?
istioctl install with custom profile

Environment where bug was observed (cloud vendor, OS, etc)
Cluster created with Gardener on GCE

Additionally, please consider running istioctl bug-report and attach the generated cluster-state tarball to this issue.
Refer to the cluster state archive for more details.

@howardjohn
Member

Can you post the Gateway configs?

@Demonsthere
Contributor Author

Demonsthere commented Nov 27, 2020

@howardjohn Of course, adding them to the first post. I am linking to the configuration before a workaround (separate secret + job copying the tls cert and key). Those are the gateways we were using until istio 1.7. As you can see, they are both configured to use the same secret, one is tls and the other mtls.
The gateways work as expected for self-managed certs, but once we put cert-manager into the picture, it goes haywire. I suspect it is related to changes/moving the SDS from ingress-gateway to istiod
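
The workaround mentioned above (a separate secret that a job fills with a copy of the TLS cert and key) could be laid out roughly as follows. This is a sketch under assumptions, not the configuration from the issue: the secret name mtls-gateway-certs is hypothetical, and storing the custom CA in that same secret is an assumption, since the thread does not spell out those details.

```yaml
# Sketch only. The secret name and the placement of the CA are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: mtls-gateway-certs       # hypothetical name
  namespace: istio-system        # assumption
type: Opaque
data:
  tls.crt: <base64 copy of tls.crt from tls-gateway-certs>
  tls.key: <base64 copy of tls.key from tls-gateway-certs>
  ca.crt: <base64 of the custom client CA>
---
# The mTLS gateway's server would then reference the dedicated secret:
#   tls:
#     mode: MUTUAL
#     credentialName: mtls-gateway-certs
```

With this layout the two gateways no longer share a secret, so there is only one possible CA source for the mTLS server.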

@howardjohn
Member

I think this is the same as #13589. Can you check whether that is consistent with this issue? Does it work with curl and fail only with browsers?

@howardjohn
Member

> I suspect it is related to changes/moving the SDS from ingress-gateway to istiod

That change happened in 1.8. You are seeing this after upgrading from 1.6 to 1.7, right?

@Demonsthere
Contributor Author

Demonsthere commented Dec 1, 2020

@howardjohn I will check #13589, thanks for the tip.
No, we observed the same behavior with both curl and browser tests.
We had to upgrade from 1.5 to 1.7, and observed this after running 1.7.4

Edit:
I have looked into #13589, and it does not seem to be related, as in our case a rollout restart of ingressgateway causes all new instances to serve the single TLS cert from tls-gateway-certs.
IMHO it is more related to how Istio loads gateway secrets into memory, and to the fact that gateways have only one configuration option for secrets. Adding a new option to the gateway (credentialNameCA) that would take priority over credentialName would give us the desired behavior (cert-manager provided TLS secret + custom mTLS CA secret).
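
To make the proposal concrete, a server block using the suggested option might read as below. Note that credentialNameCA is the hypothetical field proposed in this comment; it does not exist in the Istio Gateway API, so this fragment only illustrates the idea.

```yaml
# Illustration of the proposal only. credentialNameCA is NOT an existing
# Istio Gateway field; it is the hypothetical option suggested above.
tls:
  mode: MUTUAL
  credentialName: tls-gateway-certs            # cert-manager secret: tls.crt, tls.key
  credentialNameCA: tls-gateway-certs-cacert   # proposed: CA used to verify client certs
```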

@istio-policy-bot added the lifecycle/stale label Mar 1, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-11-30. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/automatically-closed label Mar 16, 2021