Behaviour change in .Values.global.trustDomain between 1.5 and 1.6 breaks mTLS during upgrade #27828

Closed
Stono opened this issue Oct 8, 2020 · 10 comments
Labels
area/environments area/test and release area/user experience kind/docs lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while

Comments

@Stono
Contributor

Stono commented Oct 8, 2020

Bug description
All MTLS connectivity between 1.5 proxies and 1.6 proxies following an upgrade is broken.
Connectivity between 1.6 and 1.6 proxies is fine.

Please note we're also going from microservice to monolith.

All info below is from a testing cluster so don't worry about the certs being shared here.

Certificates from /etc/certs on 1.5 proxy:

root-cert.pem:
-----BEGIN CERTIFICATE-----
MIIC3TCCAcWgAwIBAgIQefH2ay+CuXk3L0Vsh/MSNjANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTE5MTEwNzIyMTgxNloXDTI5MTEw
NDIyMTgxNlowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAKur36RpZ8OPMpo/ij7PqkPYPjxtbUtTS6H94etO
Fvx237wfmILSjhdUeCY6encM3KoOrL0tncSuJTWh/HtynI0Z75ak8SCYoHABWatf
Eo+p3jw04yJnFcQn1UhxTzAMMB5YQ/hGuBkUEfwq7fgSFs9iqxxfk+Mf41GOJpSR
/fXBEfSVl2/x7ItP11Tm+xrKyjwWmcnn2UcseAt7YquzYOjyC57O1Plar/shI51i
iSSUNKB3FMT5CrPc7Y/tqbbPvnBla2dt45cu3PtWvdZoEGBwJltdt05MB7HwWRiF
PAVFwoNRXvhlsYKCFgbyYGkp5CJKV+1yEQemMDhUNsjkjZkCAwEAAaMjMCEwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEB
AFjhfhDNZY/EjPQnibSDw55PRNCXmHJsuXL3dCSfvJiCbEO4SxDvrub+0Z2z1HKJ
FnYUGlXAVFASVcPUR/dWqUyUYuOicFdXInBkEfg6I6vc1SyKDjC/z+aMpPP0fdtC
wtwWc0jd0JOhMQfoNFXI0R5khYHYHW6sUF6xJ4gtf1JpjmcT312rnfcB/V+3Lu9W
BUcRKkcHoP7vtNgZdQ+t9/z7wNEB54k/Lr0dIAbhdDxFeq1759ipxg3+cd+tucwJ
JipJEMYS7LT976uQNjqd+DqkvNsdy+czJDALSS98vpncZWhRA3yBv90/TjxZeMjR
WnHWEh/L1s7oKCFGbiYmBYc=
-----END CERTIFICATE-----

key.pem:
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA58IE61eTT38P4qnu3/RllbOmWvYU9JryGfULDB1ACsTMTryZ
evEjJyEuRtcWqa99FAc1ReSAnk7VmbTLpUCBpvapoMADD7PKK939i7iCUwg9L31E
GU/2Ac2ePs6AFVUXdnU8qRrdmrbkXVFVby3chgelZCAgE7bhASTk4cjALLteW/69
UIo1eu/qhjG0uL5jEyZVhBFtrZLPqF+bu3iVR+lYtUUorXJet0MwUyN5rK4+qmwn
ZfC3IgTZAZ8oZWlHXvNa+FtiY8bdyVlx+gmkJCHRtES7ae+fiVLtRIhAa/GhQlp/
7JhWQBN3Q9JQ07bgHYFWlPmhpoYc8fp7xbTO+wIDAQABAoIBAEACqYMq1AgP6x5K
myyF/wzC6r6S2yTYKugacyusZITU6C0TED0RnwjutC5it+K0EpLWjtM4EoIm+f4T
HnANCgJIfH9mqTHMEZneHWpa0rwGOYgFTCrFmAfVd9CXDm9V6j4QkWmPfwd9XkOb
9EHZ49+s+vRVnyZyy2CahRELdnCj5RJvbaOd0pIg6DN2736JJf6NRl6gnVGCDf/X
BSGWq/V1E10NHbqqFljJly+W7T/hLSmqd5nYErxZ5R6jlvqvxK/OGGqYvAcZ2s8w
58dA2KnFN9OPVnTO+/rhqCkcegYfz2TrKfPOTuXXHDcSqXkL0AvrUHxlQ5b58h2b
mJZEJ/kCgYEA8EEhO6XU00UB/YE/MBATEPeRXhHsH3p03hwrWxmP6OnWVCW4bRhx
QOHu8JnMQQovTgsw7lNYCwODaq0B2XikTGVNPE25OcYlpPM5Glqe3IHHRZKyAVS/
xfPehk+R3kKhdmk4w2kziOAfAAqHfCUDwoZ+M7wUEQX6TP9zE2t+3k8CgYEA9vJW
wJLbbOAE1SexxdKQwMBpxn+x30aEBQJ+qJLrlDlJyBHl0l6K8+bIQ2JqxNmP6Sgj
trak4jIKx6KaM8fjcFa2SC4htE6hH9EqjcmjjOse0+jJh78JV4clslA3ov93fcko
Fp4tOZgLoYXDt15OvgtjhzWJOfTNQAHy4GKEJZUCgYEA1F4dzCXXrDBRhA3dFtT+
pX7QZkCdYW3TJAnuYQaFaLJaG+OD1BtI8LtFhDPOkqc4DVpjFCdjqcifP4pSGjND
t6vLy8RAOEtoNxgvn5X+2pd015DF+9s38PiR+FMZc0eehZaY8FJrlU/W1yh3fksR
ub88iupzKmEIUUt0IvRgHo0CgYEA8YB2TsZQSN2cXEkBlhEi9x41U6a18UEpAy05
aOql8MNF4J+APoevJG/iEeRBvll2X/KdWqasAXonCK2AWHt3dfmXMmfLFmZ/NHp+
P0Oe6sVV1K+nx1WQcUT+HLBOeN2VojIDPntahySm20PR75YPM9Q3dZdpqIA76gj9
2Wr/CTECgYBpHr8FofeO4Pim+ZI/Gif6i9k1tYqUaU7SoBFKpFORcN3hnm6Hoe5B
9YqJm6mGOwrsciKHBxGinnyy5maPKqfw0+teZ+POuunvB1kyeV+esOOXhxXV15LN
F2Ap4WV23ewqnQCJMcombnXakdDIOhMxKUAqNKgCg3wVC+wXutvMew==
-----END RSA PRIVATE KEY-----

cert-chain.pem:
-----BEGIN CERTIFICATE-----
MIIDOzCCAiOgAwIBAgIRAI5g544E3bXh6oJZAEdr63MwDQYJKoZIhvcNAQELBQAw
GDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDAeFw0yMDA5MTcyMjIzMzhaFw0yMDEy
MTYyMjIzMzhaMAAwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDnwgTr
V5NPfw/iqe7f9GWVs6Za9hT0mvIZ9QsMHUAKxMxOvJl68SMnIS5G1xapr30UBzVF
5ICeTtWZtMulQIGm9qmgwAMPs8or3f2LuIJTCD0vfUQZT/YBzZ4+zoAVVRd2dTyp
Gt2atuRdUVVvLdyGB6VkICATtuEBJOThyMAsu15b/r1QijV67+qGMbS4vmMTJlWE
EW2tks+oX5u7eJVH6Vi1RSitcl63QzBTI3msrj6qbCdl8LciBNkBnyhlaUde81r4
W2Jjxt3JWXH6CaQkIdG0RLtp75+JUu1EiEBr8aFCWn/smFZAE3dD0lDTtuAdgVaU
+aGmhhzx+nvFtM77AgMBAAGjgZcwgZQwDgYDVR0PAQH/BAQDAgWgMB0GA1UdJQQW
MBQGCCsGAQUFBwMBBggrBgEFBQcDAjAMBgNVHRMBAf8EAjAAMFUGA1UdEQEB/wRL
MEmGR3NwaWZmZTovL2NsdXN0ZXIubG9jYWwvbnMvaW5ncmVzcy1uZ2lueC9zYS9u
Z2lueC1pbmdyZXNzLXNlcnZpY2VhY2NvdW50MA0GCSqGSIb3DQEBCwUAA4IBAQB+
olkDXfE81ilbJKNQWoTlTfz2TXFTkODpQQd0rcTTAQQeIOQ8+wX04L7y4nHTLiSc
+iTxls+oWdNyw36QFefFLcQNbx2gLVl0ROT9hpCMvUGz7Lbx7GpTyHp/QPj0qVpj
DjlrVJoZinnnpIyZnXVRbkLD6pi2qQ+bTpPyi+dY9V2uDifkRieaCOPJ6AnPeulv
vB9RQCaIaWXravXVYtnRkn2ipV3ELbvGH/zTS72TIhTvW3mhzqxKCWxY14QT0YcN
wiNgYchBqKWWaOCBSS3bFvcsWvtGLLdKYSOQRyBWISwSU8OCfkmn0qifuD9UVTWn
/IAi+IksYpCtPlWllwqM
-----END CERTIFICATE-----

Debug request from 1.5 (outbound) proxy:

[Envoy (Epoch 1)] [2020-10-08 09:48:53.624][41][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:95] creating a new connection
[Envoy (Epoch 1)] [2020-10-08 09:48:53.624][41][debug][client] [external/envoy/source/common/http/codec_client.cc:34] [C56416] connecting
[Envoy (Epoch 1)] [2020-10-08 09:48:53.624][41][debug][connection] [external/envoy/source/common/network/connection_impl.cc:698] [C56416] connecting to 10.206.4.214:8080
[Envoy (Epoch 1)] [2020-10-08 09:48:53.624][41][debug][connection] [external/envoy/source/common/network/connection_impl.cc:707] [C56416] connection in progress
[Envoy (Epoch 1)] [2020-10-08 09:48:53.624][41][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:55] queueing request due to no available connections
[Envoy (Epoch 1)] [2020-10-08 09:48:53.625][41][debug][connection] [external/envoy/source/common/network/connection_impl.cc:570] [C56416] connected
[Envoy (Epoch 1)] [2020-10-08 09:48:53.625][41][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C56416] handshake expecting read
[Envoy (Epoch 1)] [2020-10-08 09:48:53.628][41][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C56416] handshake expecting read
[Envoy (Epoch 1)] [2020-10-08 09:48:53.628][41][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C56416] handshake expecting read
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:198] [C56416] handshake error: 1
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C56416] TLS error: 268436502:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_UNKNOWN
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][connection] [external/envoy/source/common/network/connection_impl.cc:192] [C56416] closing socket: 0
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][client] [external/envoy/source/common/http/codec_client.cc:91] [C56416] disconnect. resetting 0 pending requests
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:136] [C56416] client disconnected, failure reason: TLS error: 268436502:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_UNKNOWN
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:166] [C56416] purge pending, failure reason: TLS error: 268436502:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_UNKNOWN
[Envoy (Epoch 1)] [2020-10-08 09:48:53.629][41][debug][router] [external/envoy/source/common/router/router.cc:990] [C56401][S6529854247311198098] upstream reset: reset reason connection failure

Boot logs from 1.6 proxy (showing CA):

2020-10-08T09:44:28.580940Z	info	Proxy role: &model.Proxy{ClusterID:"", Type:"sidecar", IPAddresses:[]string{"10.206.4.214"}, ID:"istio-test-app-1-7cc6d46746-2dv94.istio-test-app-1", Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:"istio-test-app-1.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), Active:map[string]*model.WatchedResource(nil)}
2020-10-08T09:44:28.580946Z	info	JWT policy is third-party-jwt
2020-10-08T09:44:28.580993Z	info	PilotSAN []string{"istiod.istio-system.svc"}
2020-10-08T09:44:28.581002Z	info	MixerSAN []string{"spiffe://cluster.local/ns/istio-system/sa/istio-mixer-service-account"}
2020-10-08T09:44:28.627273Z	info	serverOptions.CAEndpoint == istiod.istio-system.svc:15012
2020-10-08T09:44:28.627315Z	info	Using user-configured CA istiod.istio-system.svc:15012
2020-10-08T09:44:28.627318Z	info	istiod uses self-issued certificate
2020-10-08T09:44:28.627367Z	info	the CA cert of istiod is: -----BEGIN CERTIFICATE-----
MIIC3TCCAcWgAwIBAgIQefH2ay+CuXk3L0Vsh/MSNjANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTE5MTEwNzIyMTgxNloXDTI5MTEw
NDIyMTgxNlowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAKur36RpZ8OPMpo/ij7PqkPYPjxtbUtTS6H94etO
Fvx237wfmILSjhdUeCY6encM3KoOrL0tncSuJTWh/HtynI0Z75ak8SCYoHABWatf
Eo+p3jw04yJnFcQn1UhxTzAMMB5YQ/hGuBkUEfwq7fgSFs9iqxxfk+Mf41GOJpSR
/fXBEfSVl2/x7ItP11Tm+xrKyjwWmcnn2UcseAt7YquzYOjyC57O1Plar/shI51i
iSSUNKB3FMT5CrPc7Y/tqbbPvnBla2dt45cu3PtWvdZoEGBwJltdt05MB7HwWRiF
PAVFwoNRXvhlsYKCFgbyYGkp5CJKV+1yEQemMDhUNsjkjZkCAwEAAaMjMCEwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEB
AFjhfhDNZY/EjPQnibSDw55PRNCXmHJsuXL3dCSfvJiCbEO4SxDvrub+0Z2z1HKJ
FnYUGlXAVFASVcPUR/dWqUyUYuOicFdXInBkEfg6I6vc1SyKDjC/z+aMpPP0fdtC
wtwWc0jd0JOhMQfoNFXI0R5khYHYHW6sUF6xJ4gtf1JpjmcT312rnfcB/V+3Lu9W
BUcRKkcHoP7vtNgZdQ+t9/z7wNEB54k/Lr0dIAbhdDxFeq1759ipxg3+cd+tucwJ
JipJEMYS7LT976uQNjqd+DqkvNsdy+czJDALSS98vpncZWhRA3yBv90/TjxZeMjR
WnHWEh/L1s7oKCFGbiYmBYc=
-----END CERTIFICATE-----

2020-10-08T09:44:28.627513Z	info	parsed scheme: ""
2020-10-08T09:44:28.627539Z	info	scheme "" not registered, fallback to default scheme
2020-10-08T09:44:28.627574Z	info	ccResolverWrapper: sending update to cc: {[{istiod.istio-system.svc:15012  <nil> 0 <nil>}] <nil> <nil>}
2020-10-08T09:44:28.627627Z	info	ClientConn switching balancer to "pick_first"
2020-10-08T09:44:28.627634Z	info	Channel switches to new LB policy "pick_first"
2020-10-08T09:44:28.627675Z	info	Subchannel Connectivity change to CONNECTING
2020-10-08T09:44:28.627825Z	info	Subchannel picks a new address "istiod.istio-system.svc:15012" to connect
2020-10-08T09:44:28.627957Z	info	sds	SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"

2020-10-08T09:44:28.627896Z	info	pickfirstBalancer: HandleSubConnStateChange: 0xc000d767a0, {CONNECTING <nil>}
2020-10-08T09:44:28.628243Z	info	Starting proxy agent
2020-10-08T09:44:28.628278Z	info	Channel Connectivity change to CONNECTING
2020-10-08T09:44:28.628461Z	info	Opening status port 15020

2020-10-08T09:44:28.628535Z	info	Received new config, creating new Envoy epoch 0
2020-10-08T09:44:28.628116Z	info	sds	Start SDS grpc server
2020-10-08T09:44:28.630303Z	info	Epoch 0 starting
2020-10-08T09:44:28.643648Z	info	Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-test-app-1 --service-node sidecar~10.206.4.214~istio-test-app-1-7cc6d46746-2dv94.istio-test-app-1~istio-test-app-1.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format %Y-%m-%dT%T.%fZ	%l	envoy %n	%v -l error --concurrency 2]
2020-10-08T09:44:28.661541Z	info	Subchannel Connectivity change to READY
2020-10-08T09:44:28.661588Z	info	pickfirstBalancer: HandleSubConnStateChange: 0xc000d767a0, {READY <nil>}
2020-10-08T09:44:28.661596Z	info	Channel Connectivity change to READY
2020-10-08T09:44:28.755324Z	info	sds	resource:default new connection
2020-10-08T09:44:28.755466Z	info	sds	Skipping waiting for ingress gateway secret
2020-10-08T09:44:28.938157Z	info	cache	Root cert has changed, start rotating root cert for SDS clients
2020-10-08T09:44:28.938250Z	info	cache	GenerateSecret default
2020-10-08T09:44:28.938481Z	info	sds	resource:default pushed key/cert pair to proxy
2020-10-08T09:44:29.032280Z	info	sds	resource:default new connection
2020-10-08T09:44:29.032310Z	info	sds	resource:ROOTCA new connection
2020-10-08T09:44:29.032423Z	info	sds	Skipping waiting for ingress gateway secret
2020-10-08T09:44:29.032428Z	info	sds	Skipping waiting for ingress gateway secret
2020-10-08T09:44:29.032453Z	info	cache	Loaded root cert from certificate ROOTCA
2020-10-08T09:44:29.032556Z	info	sds	resource:ROOTCA pushed root cert to proxy
2020-10-08T09:44:29.176473Z	info	cache	GenerateSecret default
2020-10-08T09:44:29.176589Z	info	sds	resource:default pushed key/cert pair to proxy
2020-10-08T09:44:29.212745Z	warn	Envoy proxy is NOT ready: failed to get readiness stats: listener_manager.workers_started is not yet updated: server.state: 0

2020-10-08T09:44:30.223439Z	info	Envoy proxy is ready

Debug request logs from 1.6 (inbound) proxy:

2020-10-08T09:59:45.263555Z     debug   envoy conn_handler      [external/envoy/source/server/connection_handler_impl.cc:411] [C511] new connection
2020-10-08T09:59:45.264321Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C511] handshake expecting read
2020-10-08T09:59:45.264338Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C511] handshake expecting read
2020-10-08T09:59:45.265708Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:198] [C511] handshake error: 1
2020-10-08T09:59:45.265733Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C511] TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
2020-10-08T09:59:45.265738Z     debug   envoy connection        [external/envoy/source/common/network/connection_impl.cc:200] [C511] closing socket: 0
2020-10-08T09:59:45.265782Z     debug   envoy conn_handler      [external/envoy/source/server/connection_handler_impl.cc:111] [C511] adding to cleanup list
2020-10-08T09:59:45.303076Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
2020-10-08T09:59:45.303117Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
2020-10-08T09:59:45.303152Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/tls_inspector/tls_inspector.cc:148] tls:onServerName(), requestedServerName: outbound_.80_._.app.istio-test-app-1.svc.cluster.local

Making a request directly using curl and the certs from istio-proxy:

istio-proxy@ingress-nginx-internal-controller-6cbf5b6995-2d6tk:/etc/certs$ curl https://app.istio-test-app-1:80 -v --key /etc/certs/key.pem --cert /etc/certs/cert-chain.pem --cacert /etc/certs/root-cert.pem -k
* Rebuilt URL to: https://app.istio-test-app-1:80/
*   Trying 10.192.96.150...
* TCP_NODELAY set
* Connected to app.istio-test-app-1 (10.192.96.150) port 80 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/certs/root-cert.pem
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: [NONE]
*  start date: Oct  8 09:44:29 2020 GMT
*  expire date: Oct  9 09:44:29 2020 GMT
*  issuer: O=cluster.local
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x55de5a1b2580)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET / HTTP/2
> Host: app.istio-test-app-1:80
> User-Agent: curl/7.58.0
> Accept: */*
>
* TLSv1.3 (IN), TLS Unknown, Unknown (21):
* TLSv1.3 (IN), TLS alert, Server hello (2):
* OpenSSL SSL_read: error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown, errno 0
* Failed receiving HTTP2 data
* Connection #0 to host app.istio-test-app-1 left intact
curl: (56) OpenSSL SSL_read: error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown, errno 0

[x] Docs
[x] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[x] Developer Infrastructure

Expected behavior
mTLS between 1.5 and 1.6 pods should work.

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)
1.5 + 1.6

How was Istio installed?
Helm

Environment where bug was observed (cloud vendor, OS, etc)

@howardjohn
Member

How did you upgrade? I got a similar issue when I tried, but it was because the 1.5 pods were pointing to the old 1.5 pilot, which I had removed in the 1.6 install, so they didn't get reconfigured for the new services. However, my curl from istio-proxy worked whereas yours did not, so it's probably a different issue than what you see here.

@Stono
Contributor Author

Stono commented Oct 8, 2020

We have a bespoke deployment of istio deployed via helm.

In order to facilitate a microservice to monolith deployment we have an istio-pilot service pointing to istiod so that the 1.5 proxies maintain connectivity. The 1.5 proxy was not reporting any errors.

Perhaps it would be easier to screen share and you can interrogate a cluster in a broken state?

@Stono
Contributor Author

Stono commented Oct 8, 2020

OK, so a bit more info: it seems mTLS for any 1.5 pod is broken, and that includes 1.5 -> 1.5.

I've attached config dumps for:

  • config_dump/1.5-control-plane-destination.json
  • config_dump/1.5-control-plane-source.json

I then upgraded the control plane and have provided:

  • config_dump/1.6-control-plane-destination.json
  • config_dump/1.6-control-plane-source.json

config_dump.tar.gz

In all of these dumps, the proxy is v1.5, the source is istio-test-app-1 and the destination is istio-test-app-2

Some extra information, both services have DestinationRules that have:

  spec:
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL

On 1.5 we have meshpolicies.authentication.istio.io with:

spec:
  peers:
  - mtls: {}

And on 1.6 we have peerauthentications.security.istio.io with:

spec:
  mtls:
    mode: STRICT

@Stono changed the title from "mTLS broken between 1.5 and 1.6 pods following upgrade" to "mTLS broken on 1.5 pods, following upgrade of control plane to 1.6" on Oct 8, 2020
@howardjohn
Member

          "match_subject_alt_names": [
           {
            "exact": "spiffe:///ns/istio-test-app-2/sa/default"
           }

vs

          "verify_subject_alt_name": [
           "spiffe://cluster.local/ns/istio-test-app-2/sa/default"
          ]

is one difference. should be the same behavior though
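
The two snippets use different Envoy fields: `verify_subject_alt_name` is a plain list of strings, while `match_subject_alt_names` is a list of string matchers. One way to compare them mechanically is to flatten both into sets of accepted SANs. The following is a minimal Python sketch under stated assumptions: the helper name `expected_sans` is hypothetical, only the `exact` matcher form shown above is handled, and the two dictionaries are hand-copied from the snippets rather than read from the attached dumps.

```python
def expected_sans(validation_context):
    """Collect the SANs a validation context accepts, from either field."""
    sans = set()
    # String form: a plain list of accepted SAN values.
    sans.update(validation_context.get("verify_subject_alt_name", []))
    # Matcher form: a list of matcher objects; this sketch only
    # understands {"exact": ...} and ignores any other matcher type.
    for matcher in validation_context.get("match_subject_alt_names", []):
        if "exact" in matcher:
            sans.add(matcher["exact"])
    return sans


# Values copied by hand from the two snippets above.
matcher_form = {
    "match_subject_alt_names": [
        {"exact": "spiffe:///ns/istio-test-app-2/sa/default"}
    ]
}
string_form = {
    "verify_subject_alt_name": [
        "spiffe://cluster.local/ns/istio-test-app-2/sa/default"
    ]
}

# Print every SAN that appears in one form but not the other.
for san in sorted(expected_sans(matcher_form) ^ expected_sans(string_form)):
    print(san)
```

Comparing the flattened values rather than the raw JSON keeps the field rename out of the diff, so only differences in the accepted SANs remain.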

@howardjohn
Member

$ ik pc l --file 1.5-control-plane-destination.json | rg 15006
0.0.0.0       15006 App: TCP TLS; Addr: 0.0.0.0/0 Non-HTTP/Non-TCP
0.0.0.0       15006 Addr: 0.0.0.0/0               Non-HTTP/Non-TCP
0.0.0.0       15006 Addr: 10.206.4.62/32:8080     Non-HTTP/Non-TCP

$ ik pc l --file 1.6-control-plane-destination.json | rg 15006
0.0.0.0       15006 Trans: tls; Addr: 0.0.0.0/0 Non-HTTP/Non-TCP
0.0.0.0       15006 Addr: 10.206.4.62/32:8080   Non-HTTP/Non-TCP

this seems suspicious

@Stono
Contributor Author

Stono commented Oct 8, 2020

For what it's worth, if I create a PeerAuthentication resource in the namespace of the app:

spec:
  mtls:
    mode: DISABLE

and set the DestinationRule to:

  spec:
    trafficPolicy:
      tls:
        mode: DISABLE

Thereby effectively disabling mtls for the service, it works.

@howardjohn
Member

I posted the exact snippet that is wrong and didn't notice.... "spiffe:///ns/istio-test-app-2/sa/default" is completely wrong. The trustDomain in the istio configmap got set to "" somehow. I set it to cluster.local and it works again
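
For readers following along, the SANs in this thread have the shape `spiffe://<trust domain>/ns/<namespace>/sa/<service account>`. The Python sketch below shows how an empty trust domain collapses the authority part of the URI and breaks an exact match. The helpers `spiffe_id` and `has_trust_domain` are hypothetical and are not Istio code. It also assumes the destination's certificate carries the `cluster.local` SAN from the working configuration.

```python
from urllib.parse import urlsplit


def spiffe_id(trust_domain, namespace, service_account):
    """Build a workload identity URI in the shape seen in this thread."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def has_trust_domain(uri):
    """Return True if the URI has a non-empty authority (trust domain)."""
    return urlsplit(uri).netloc != ""


# SAN assumed to be in the workload certificate (trust domain cluster.local).
cert_san = spiffe_id("cluster.local", "istio-test-app-2", "default")
# SAN the proxy is told to expect when the trust domain is "".
expected = spiffe_id("", "istio-test-app-2", "default")

print(expected)                    # spiffe:///ns/istio-test-app-2/sa/default
print(expected == cert_san)        # False: an exact match cannot succeed
print(has_trust_domain(expected))  # False: the authority part is empty
```

A check like `has_trust_domain` could be run over the SAN strings in a config dump to spot this class of misconfiguration quickly.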

@Stono
Contributor Author

Stono commented Oct 9, 2020

@howardjohn you're a star. There are certainly UX things to think about here.

As you know we've been using istio for ages so have had the same values.yaml file forever.

In that file for <= 1.5 we had .Values.global.trustDomain: '', which I can only presume was historically the default value. It's not something we've ever changed, and it works fine on 1.5 (our 1.5 istio configmap shows '' and everything is rosy).

We then used istioctl manifest migrate to move to the 1.6 compatible values which correctly copied trustDomain: ''.

So it feels to me like the default behaviour when that key is an empty string has somehow changed between the two releases: in 1.5 it defaulted to cluster.local, and in 1.6 it's taken literally as nothing.

@howardjohn
Member

Yeah seems like somewhere down the line something changed from "" meaning "use the default" to "override the default to empty". I'll see how that happened
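
The two interpretations can be contrasted in a short Python sketch. This is not Istio's actual code: the function names and the `DEFAULT_TRUST_DOMAIN` constant are hypothetical, and the assumption that a missing key (`None`) still falls back to the default in the second variant is mine, not something stated in this thread.

```python
DEFAULT_TRUST_DOMAIN = "cluster.local"


def empty_means_default(value):
    """'' (or a missing value) means "use the default", as described for 1.5."""
    return value or DEFAULT_TRUST_DOMAIN


def empty_is_literal(value):
    """'' overrides the default with an empty string, as described for 1.6.

    Assumption: only a missing value (None) falls back to the default.
    """
    return DEFAULT_TRUST_DOMAIN if value is None else value


for configured in (None, "", "example.org"):
    print(repr(configured),
          repr(empty_means_default(configured)),
          repr(empty_is_literal(configured)))
```

The two functions agree for a missing value and for an explicit non-empty value; they differ only for `''`, which is the value a long-lived values.yaml carries through the migration.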

@Stono changed the title from "mTLS broken on 1.5 pods, following upgrade of control plane to 1.6" to "Behaviour change in .Values.global.trustDomain between 1.5 and 1.6 breaks mTLS during upgrade" on Oct 20, 2020
@istio-policy-bot added the lifecycle/stale label on Jan 8, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-10-09. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/automatically-closed label on Jan 23, 2021

3 participants