v2 <-> v3 TlsContext causes temporary downtime #13864
Comments
See istio#28120 and envoyproxy/envoy#13864. This resolves a downtime event on in-place upgrade from 1.6 to 1.7 (a couple of seconds of 503s). This is intentionally sent only to 1.7, as it is only relevant for this branch. Please note this feature flag is shipped on by default. We have two choices:

* Off by default: anyone upgrading from 1.6 to 1.7 will continue to get downtime unless they read the release notes and add the flag.
* On by default: anyone with 1.7 already deployed, but who still has 1.6 proxies, will incur downtime unless they read the release notes and remove the flag.

I have chosen on by default, as the set of people with 1.6 proxies and a 1.7.x Istiod upgrading to 1.7.5 seems far smaller than the set impacted by "off by default", and the mitigation is the same. Additionally, for those that are impacted, the impact will be exclusively the proxies on 1.6, which is presumably not 100% of proxies, whereas in the other case ALL proxies are on 1.6 and thus impacted.
* Do not switch TLS version on 1.6 -> 1.7 upgrade. See istio#28120, envoyproxy/envoy#13864.
* fix nil
* Fix initial fetch
@howardjohn I could only reproduce this behavior (with a mock config) on the 1.14.x branch, i.e. istio proxy 1.6.x, but not with 1.15.x, 1.16.x, or latest master. Can you confirm that's the case? If so, the upgrade path will be to update to the 1.7.x proxy first, then migrate to the v3 transport socket. I'll try to figure out which commit exactly fixes this and report back.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Title: v2 <-> v3 TlsContext causes temporary downtime
Description:
Switching the TLS context version on a cluster causes downtime.

Apply this change (full cluster below) to a v2 cluster via xDS:
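A minimal sketch of such a change, assuming an Istio-style SDS setup (all names here are illustrative, not from the original report): the typed_config of the TLS transport socket flips from the v2 to the v3 type URL.

```yaml
# Before: v2 UpstreamTlsContext (Istio 1.6 era); secret and cluster names
# are illustrative assumptions.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: default
        sds_config:
          api_config_source:
            api_type: GRPC
            grpc_services:
            - envoy_grpc:
                cluster_name: sds-grpc

# After: the same secret referenced through the v3 type URL. Per the report,
# Envoy treats the secret as not yet supplied during the swap, failing
# requests with "Secret is not supplied by SDS" until SDS pushes it again.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: default
        sds_config:
          api_config_source:
            api_type: GRPC
            transport_api_version: V3
            grpc_services:
            - envoy_grpc:
                cluster_name: sds-grpc
```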
Fails a few requests with:

client disconnected, failure reason: TLS error: Secret is not supplied by SDS

Repro steps:
I can easily reproduce this with the Istio control plane by just swapping out the version. I assume it could be reproduced with file-based xDS, but I haven't produced a minimal reproducer; a sketch of one possible file-based setup follows.
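As a possible starting point for that minimal reproducer (untested; the paths, node identifiers, and the separate SDS server are all assumptions), file-based CDS can stand in for the control plane:

```yaml
# bootstrap.yaml -- untested sketch. Envoy watches the CDS file and applies
# cluster updates whenever the file changes.
node:
  id: repro
  cluster: repro
dynamic_resources:
  cds_config:
    path: /etc/envoy/cds.yaml

# Repro idea:
# 1. Write /etc/envoy/cds.yaml containing the cluster with the v2
#    UpstreamTlsContext type URL; start Envoy and an SDS server that
#    serves the referenced secret.
# 2. Drive steady traffic through the cluster.
# 3. Overwrite /etc/envoy/cds.yaml with the identical cluster using the v3
#    type URL and watch for a brief burst of
#    "Secret is not supplied by SDS" failures.
```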
Full cluster:
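The original config is not reproduced here; a representative sketch of an Istio 1.6-style cluster with a v2 SDS TLS context (service name, SAN, and SDS cluster are illustrative assumptions):

```yaml
name: outbound|8080||httpbin.default.svc.cluster.local  # illustrative
type: EDS
connect_timeout: 10s
eds_cluster_config:
  eds_config:
    ads: {}
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: default
        sds_config:
          api_config_source:
            api_type: GRPC
            grpc_services:
            - envoy_grpc:
                cluster_name: sds-grpc
      combined_validation_context:
        default_validation_context:
          match_subject_alt_names:
          - exact: spiffe://cluster.local/ns/default/sa/httpbin
        validation_context_sds_secret_config:
          name: ROOTCA
          sds_config:
            api_config_source:
              api_type: GRPC
              grpc_services:
              - envoy_grpc:
                  cluster_name: sds-grpc
```

Swapping the "@type" above to type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext is the change that triggers the failures.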