Any HTTP service will block all HTTPS traffic on the same port #16458

Open
howardjohn opened this issue Aug 21, 2019 · 28 comments

Comments

Member

@howardjohn howardjohn commented Aug 21, 2019

(Assumes ALLOW_ANY mode)

Normally, traffic to external HTTPS services works due to the ALLOW_ANY mode changes we added. However, this breaks when an HTTP service is added on port 443 (or any port, but generally this happens on 443). This is because we add a new 0.0.0.0_443 listener, which matches everything and directs it to the HTTP routes. TLS traffic cannot be handled there, so the connection fails.

To reproduce:

curl https://www.google.com # works

then

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: https-breaker
spec:
  hosts:
  - wikipedia.org
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: http
    protocol: HTTP
  resolution: DNS
EOF

now

$ curl https://www.google.com
curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol.

One possible idea is to do something similar to protocol sniffing: detect TLS connections and direct them to the passthrough cluster. This may be possible with the TLS inspector.
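To make the idea concrete, here is a minimal sketch of how TLS could be told apart from plaintext HTTP by looking at the first bytes of a connection. It is an illustration only, not Envoy's tls_inspector implementation; the function names and the "http_routes" label are made up for this example, and only the record layout (handshake content type 0x16, major version 3, ClientHello type 0x01) comes from the TLS wire format.

```python
# Simplified TLS detection on the first bytes of a connection.
# Illustrative only: this is NOT Envoy's tls_inspector code.

TLS_HANDSHAKE_RECORD = 0x16  # TLS record content type "handshake"
TLS_CLIENT_HELLO = 0x01      # handshake message type "ClientHello"


def looks_like_tls(first_bytes: bytes) -> bool:
    """Return True if the bytes start like a TLS ClientHello record."""
    if len(first_bytes) < 6:
        return False
    content_type = first_bytes[0]
    version_major = first_bytes[1]
    handshake_type = first_bytes[5]  # first byte after the 5-byte record header
    return (
        content_type == TLS_HANDSHAKE_RECORD
        and version_major == 0x03
        and handshake_type == TLS_CLIENT_HELLO
    )


def choose_destination(first_bytes: bytes) -> str:
    """Hypothetical decision for a catch-all 0.0.0.0_443 listener."""
    if looks_like_tls(first_bytes):
        return "PassthroughCluster"  # forward the raw TCP stream untouched
    return "http_routes"             # hypothetical label for the HTTP path
```

With something like this in front of the HTTP routes, a `curl https://www.google.com` would take the passthrough branch instead of being parsed as HTTP.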

Related issues:

Member Author

@howardjohn howardjohn commented Aug 21, 2019

@yxue think this is something that could get tied in with your protocol sniffing work?

Member

@yxue yxue commented Aug 21, 2019

Technically, protocol sniffing can resolve this issue, but it requires a lot of effort.

To resolve this problem, we need to sniff the traffic using tls_inspector. The tough part is that we need to generate multiple clusters and multiple routes. Currently, protocol sniffing only generates multiple filter chains without touching clusters and routes. For this issue, we need to generate two upstream clusters, one with tls_context and one without. We also need to generate two routes, one for HTTPS and one for HTTP.

Protocol sniffing for h2 is similar. That's why protocol sniffing for h2 is not supported yet.

Member Author

@howardjohn howardjohn commented Aug 22, 2019

@yxue for these we just need to send to passthrough cluster though, since this is just for external ALLOW_ANY traffic. If a user has an explicitly defined https service it should still work because it will have a filter chain match on the SNI name, right?
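The selection order described here can be sketched as follows. This is a hypothetical model, not generated Envoy config: `select_cluster`, its parameters, and the "http_routes" label are assumptions for illustration. The cluster name follows the `outbound|443||<host>` pattern seen in access logs.

```python
from typing import Dict, Optional


def select_cluster(is_tls: bool, sni: Optional[str], sni_clusters: Dict[str, str]) -> str:
    """Pick an upstream for a connection on a shared port (hypothetical model)."""
    if is_tls:
        # An explicitly defined HTTPS service wins through its SNI match.
        if sni is not None and sni in sni_clusters:
            return sni_clusters[sni]
        # Unknown TLS traffic is forwarded untouched (ALLOW_ANY behaviour).
        return "PassthroughCluster"
    # Plaintext traffic keeps going to the HTTP routes as before.
    return "http_routes"


# Example: one explicitly defined HTTPS service on port 443.
defined = {"api.example.com": "outbound|443||api.example.com"}
print(select_cluster(True, "api.example.com", defined))
print(select_cluster(True, "www.google.com", defined))
```

The point of the ordering is that an explicit HTTPS service never falls through to the passthrough branch, so only undeclared external TLS traffic changes behaviour.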

Contributor

@vadimeisenbergibm vadimeisenbergibm commented Aug 22, 2019

@howardjohn First, please note that defining HTTP for port 443 is usually a mistake; the protocol for port 443 in your wikipedia.org example should be HTTPS. Second, I think the same issue can happen inside the mesh if HTTP and non-HTTP protocols are used on the same port.

Member Author

@howardjohn howardjohn commented Aug 22, 2019

@vadimeisenbergibm my understanding is that if you try to talk to an HTTPS service in-mesh it should work, because we will generate a listener with an SNI match that matches before the HTTP routes (which is where the issue arises). Untested though.

It's fine to say that you can't have HTTP on port 443, but if that is the case we should explicitly block it. Otherwise one service in some random namespace can destroy a whole cluster.

Contributor

@vadimeisenbergibm vadimeisenbergibm commented Aug 22, 2019

@howardjohn Agreed.


@Dev25 Dev25 commented Aug 28, 2019

@howardjohn @vadimeisenbergibm

I also ran into this last week when trying to do egress TLS origination + ALLOW_ANY so that we can slowly integrate external services with Istio capabilities. FYI, regarding HTTP on port 443: the current docs explicitly say to do that as part of TLS origination.

I've not yet tested what happens when the SE's port 443 is set to HTTPS while doing TLS origination.

https://istio.io/docs/tasks/traffic-management/egress/egress-tls-origination/#tls-origination-for-egress-traffic

As you can see, the VirtualService redirects HTTP requests on port 80 to port 443 where the corresponding DestinationRule then performs the TLS origination. Notice that unlike the ServiceEntry in the previous section, this time the protocol on port 443 is HTTP, instead of HTTPS. This is because clients will only send HTTP requests and Istio will upgrade the connection to HTTPS.


@Dev25 Dev25 commented Aug 28, 2019

Update: Setting the ServiceEntry port to HTTPS works fine with TLS Origination, so this is also a documentation issue.

Contributor

@vadimeisenbergibm vadimeisenbergibm commented Aug 28, 2019

@Dev25 You mean setting the 443 port to be HTTPS works fine with TLS origination? So the documentation must be fixed.


@Dev25 Dev25 commented Aug 28, 2019

@vadimeisenbergibm Yep, setting ServiceEntry port 443 to HTTPS works just fine with TLS origination.

That being said, I'm not sure if this is intended behaviour, but trying to connect to port 443 using HTTPS directly (when configured for TLS origination) results in connection failures.

Python/Requests:
Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:852)'),)

wget:
ssl_client: <domain>: handshake failed: error:1408F10B:SSL routines:ssl3_get_record:wrong version number
wget: error getting response: Connection reset by peer

Access logs do get generated by istio-proxy for these failed attempts

{
  "authority": "-",
  "bytes_received": "439",
  "bytes_sent": "890",
  "downstream_local_address": "<redacted ip>:443",
  "downstream_remote_address": "10.240.2.70:34604",
  "duration": "20",
  "istio_policy_status": "-",
  "method": "-",
  "path": "-",
  "protocol": "-",
  "request_id": "-",
  "requested_server_name": "api.mydomain.com",
  "response_code": "0",
  "response_flags": "-",
  "start_time": "2019-08-28T13:53:44.234Z",
  "upstream_cluster": "outbound|443||api.mydomain.com",
  "upstream_host": "<redacted ip>:443",
  "upstream_local_address": "10.240.2.70:34610",
  "upstream_service_time": "-",
  "upstream_transport_failure_reason": "-",
  "user_agent": "-",
  "x_forwarded_for": "-"
}
Contributor

@vadimeisenbergibm vadimeisenbergibm commented Aug 28, 2019

@Dev25

trying to connect to port 443 using HTTPS directly (when configured for tls origination) will result in connection failures.

This is by design: the gateway expects HTTP traffic on port 443, so sending it HTTPS traffic will not work.
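A rough sketch of why the client reports "wrong version number" in this situation, assuming the proxy answers the TLS ClientHello with a plaintext HTTP error. The exact reply bytes below are an assumption, not taken from a capture. The TLS client reads the first five bytes of whatever comes back as a record header, and the letters "HTTP/" do not form a valid one.

```python
import struct
from typing import Tuple


def parse_tls_record_header(data: bytes) -> Tuple[int, Tuple[int, int], int]:
    """Interpret the first five bytes as a TLS record header."""
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return content_type, (major, minor), length


def is_valid_tls_version(version: Tuple[int, int]) -> bool:
    """SSL 3.0 and TLS 1.0 to 1.3 all use major version 3 on the wire."""
    return version[0] == 3


# Assumed plaintext reply from the proxy; the exact status line is a guess.
plaintext_reply = b"HTTP/1.1 400 Bad Request\r\n"
content_type, version, length = parse_tls_record_header(plaintext_reply)
print(hex(content_type), version, is_valid_tls_version(version))
```

Here the bytes "H", "T", "T" land in the content type and version fields, so the version check fails before any handshake can happen.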

Member Author

@howardjohn howardjohn commented Aug 29, 2019

One alternative solution: reject HTTP Services on port 443. Not sure if that is a worse evil than the bug though...

Or treat HTTP on 443 as HTTPS or TCP (either one works). Both of these are pretty hacky.

Member Author

@howardjohn howardjohn commented Aug 30, 2019

Spoke with @costinm and @andraxylia; it seems rejecting HTTP Services on port 443 is likely the best move here. Specifically, this would involve rejecting ServiceEntries with this pattern in Galley validation, then rejecting Services (or ServiceEntries that bypass validation, I suppose) internally in Pilot (and definitely incrementing pilot_total_rejected_configs or some other metric).
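As a minimal sketch of what such a check could look like (not the actual Galley or Pilot code): the `Port` type, `validate_ports`, and the `metrics` dict are hypothetical, and treating HTTP2 and GRPC the same way as HTTP is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Protocols assumed to be handled as HTTP by the proxy (assumption for this sketch).
HTTP_LIKE = {"HTTP", "HTTP2", "GRPC"}

# Stand-in for a metric such as pilot_total_rejected_configs.
metrics: Dict[str, int] = {"rejected_configs": 0}


@dataclass
class Port:
    number: int
    name: str
    protocol: str


def validate_ports(ports: List[Port]) -> List[str]:
    """Return one error per HTTP-like port declared on 443; count rejected configs."""
    errors = [
        f"port {p.name!r}: protocol {p.protocol} is not allowed on port 443"
        for p in ports
        if p.number == 443 and p.protocol.upper() in HTTP_LIKE
    ]
    if errors:
        metrics["rejected_configs"] += 1
    return errors


# The ServiceEntry from the reproduction would be rejected:
print(validate_ports([Port(443, "http", "HTTP")]))
```

Running the same check both at admission time and again inside the control plane would cover configs that bypass validation, which is the two-step approach described above.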

Member

@rshriram rshriram commented Sep 16, 2019

We have fixed these conflicts with the HTTP inspector, right @yxue?

@rshriram rshriram closed this Sep 16, 2019
@howardjohn howardjohn reopened this Sep 16, 2019
Member Author

@howardjohn howardjohn commented Sep 16, 2019

This is not fixed

Member Author

@howardjohn howardjohn commented Sep 16, 2019

We will create a listener with only the HTTP connection manager (HCM). We need to detect HTTPS traffic and not send it through the HCM.


@viktorpeacock viktorpeacock commented Sep 20, 2019

Hello.

This affects us, as we use TLS origination as described in the documentation, i.e. having HTTP on port 443 and then using a DestinationRule to originate TLS.

This stopped working recently, and I can only assume this was caused by the Istio upgrade done by the infrastructure team.

Dev25 suggested that changing the port to HTTPS works, but it didn't for us. Can you please clarify whether this will change or how it can be fixed? We cannot talk to the external service while using virtual services for the retries.

If it helps, this is the issue that we are now seeing when accessing an external party on HTTP via TLS origination.

[2019-09-20T15:47:25.483Z] "GET /ommited HTTP/1.1" 503 UF,URX "-" "TLS error: 268436496:SSL routines:OPENSSL_internal:SSLV3_ALERT_HANDSHAKE_FAILURE 268435610:SSL routines:OPENSSL_internal:HANDSHAKE_FAILURE_ON_CLIENT_HELLO" 0 91 473 - "172.26.24.17" "curl/7.52.1" "f63b5390-2ca9-4e15-a22f-aa62b80ca101" "ommited.ommited..ie" "99.86.163.107:443" outbound|443||omitted.ommited.ie - 143.204.15.112:80 172.26.24.17:54090 -

Thank you.

Contributor

@vadimeisenbergibm vadimeisenbergibm commented Sep 23, 2019

@viktorpeacock let me check this. So the problem started to appear after upgrade to Istio 1.3.0?

Contributor

@vadimeisenbergibm vadimeisenbergibm commented Sep 23, 2019

@viktorpeacock For me specifying the port as HTTPS or TLS works in Istio 1.3.0. I will update the configuration.

howardjohn added a commit to howardjohn/istio that referenced this issue Sep 30, 2019
This is not a complete fix for
istio#16458 but does help resolve some
of the common error cases.
istio-testing added a commit that referenced this issue Oct 1, 2019
* Reject HTTP listeners on port 443

This is not a complete fix for
#16458 but does help resolve some
of the common error cases.

* Fix lint

@y0zg y0zg commented Oct 9, 2019

@viktorpeacock For me specifying the port as HTTPS or TLS works in Istio 1.3.0. I will update the configuration.

@vadimeisenbergibm could it help if HTTP/HTTPS are on the same port?


@goruha goruha commented Oct 17, 2019

We had a problem with HTTPS calls to external services on a default installation of Istio with Helm chart version 1.3.3.
The cause was enableProtocolSniffingForOutbound, which is enabled by default:
https://github.com/istio/istio/blob/master/install/kubernetes/helm/istio/charts/pilot/values.yaml#L16

I believe that for ALLOW_ANY mode enableProtocolSniffingForOutbound should be false, like enableProtocolSniffingForInbound is.

Member Author

@howardjohn howardjohn commented Oct 17, 2019

@goruha I don't think the two features are related at all, why do you think it broke? Can you provide more details about your setup and the config dump? Probably in another issue since it doesn't seem related


@goruha goruha commented Oct 17, 2019

@howardjohn
Platform: AWS

The setup looks like this:
ELB (https termination) -> Service -> Istio Gateway -> VirtualService -> Pod

Ingress Gateway service

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: '*.example.com'
    external-dns.alpha.kubernetes.io/ttl: "300"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:eu-west-2:XXXXXX:certificate/XXXXX
    service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: ELBSecurityPolicy-TLS-1-2-2017-01
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
  creationTimestamp: "2019-09-20T10:19:40Z"
  labels:
    app: istio-ingressgateway
    chart: gateways
    heritage: Tiller
    istio: ingressgateway
    release: istio
  name: istio-ingressgateway
  namespace: istio-system
  resourceVersion: "51793146"
  selfLink: /api/v1/namespaces/istio-system/services/istio-ingressgateway
  uid: 280f42d0-db90-11e9-be4f-06c7a02953b4
spec:
  clusterIP: 100.68.205.205
  externalTrafficPolicy: Cluster
  ports:
  - name: status-port
    nodePort: 32570
    port: 15020
    protocol: TCP
    targetPort: 15020
  - name: http2
    nodePort: 31380
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    nodePort: 31390
    port: 443
    protocol: TCP
    targetPort: 443
  - name: tcp
    nodePort: 31400
    port: 31400
    protocol: TCP
    targetPort: 31400
  - name: https-kiali
    nodePort: 30073
    port: 15029
    protocol: TCP
    targetPort: 15029
  - name: https-prometheus
    nodePort: 30667
    port: 15030
    protocol: TCP
    targetPort: 15030
  - name: https-grafana
    nodePort: 31968
    port: 15031
    protocol: TCP
    targetPort: 15031
  - name: https-tracing
    nodePort: 31662
    port: 15032
    protocol: TCP
    targetPort: 15032
  - name: tls
    nodePort: 30441
    port: 15443
    protocol: TCP
    targetPort: 15443
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
    release: istio
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: XXXXXX.eu-west-2.elb.amazonaws.com

Ingress Gateway

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  creationTimestamp: "2019-10-17T17:37:47Z"
  generation: 3
  labels:
    app: raw
    chart: raw-0.1.0
    heritage: Tiller
    release: istio-additional
  name: istio-ingressgateway
  namespace: istio-system
  resourceVersion: "51792792"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/istio-system/gateways/istio-ingressgateway
  uid: d566be97-f104-11e9-adbc-0a8a771e0a30
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTP

VirtualService

kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: web-api-gateway-blue-default
  namespace: prod
  resourceVersion: '51792119'
  generation: 1
  creationTimestamp: '2019-10-17T17:53:40Z'
  labels:
    app: web-api-gateway
    chart: monochart-0.18.2
    heritage: Tiller
    release: prod-web-api-gateway
spec:
  hosts:
    - web-api-gateway.example.com
  gateways:
    - istio-system/istio-ingressgateway
  http:
    - match:
        - uri:
            prefix: /
      name: default
      route:
        - destination:
            host: web-api-gateway-blue
            port:
              number: 80
          weight: 100
  tcp: ~
  tls: ~
  exportTo: ~

Everything else is default chart values you can find here

When I have enableProtocolSniffingForOutbound: false, everything works.
If it is true, we get:

bash-4.4# curl https://google.com
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number

That is probably a different issue, but related.

Why should enableProtocolSniffingForOutbound be true by default?


@goruha goruha commented Oct 17, 2019

@howardjohn another workaround is to create a ServiceEntry for google.com, but with ALLOW_ANY such an entry should not be required for correct behaviour.

Member Author

@howardjohn howardjohn commented Oct 17, 2019

@goruha can you capture a config dump and create another issue and assign/mention @yxue ? instructions https://github.com/istio/istio/wiki/Troubleshooting-Istio#collecting-information-2

howardjohn added a commit to howardjohn/istio that referenced this issue Oct 31, 2019
* Reject HTTP listeners on port 443

This is not a complete fix for
istio#16458 but does help resolve some
of the common error cases.

* Fix lint

(cherry picked from commit e50280e)
istio-testing added a commit that referenced this issue Nov 6, 2019
* Reject HTTP listeners on port 443

This is not a complete fix for
#16458 but does help resolve some
of the common error cases.

* Fix lint

(cherry picked from commit e50280e)

@jonmoter jonmoter commented Nov 8, 2019

We are seeing similar behavior. We have a Kubernetes Service deployed in our cluster that has a port listening on port 443, and the port is named grpc. When that Service is present, any pod that has the istio sidecar injected is unable to make outbound requests.

$ kubectl -n mesh-enabled exec -it ${pod_name} -- curl -I https://www.google.com
curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong version number
command terminated with exit code 35

But if I rename the Service's port from grpc to https-grpc, then it works, and the same curl command above succeeds.

So it's not just naming the port http that triggers this issue.
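The difference between the two names is consistent with Istio's port naming convention, `<protocol>[-<suffix>]`, where the prefix of the port name selects the protocol. A small sketch of that lookup, with the caveats that the protocol list below is an approximation, the helper names are made up, and which protocols end up in the HTTP path is an assumption based on gRPC running over HTTP/2:

```python
from typing import Optional

# Approximate list of protocol prefixes recognised in port names.
KNOWN_PROTOCOLS = [
    "grpc-web", "grpc", "http2", "https", "http",
    "mongo", "mysql", "redis", "tcp", "tls", "udp",
]

# Protocols assumed to be parsed as HTTP by the sidecar (assumption).
HTTP_PARSED = {"http", "http2", "grpc", "grpc-web"}


def protocol_from_port_name(name: str) -> Optional[str]:
    """Infer the protocol from a port name of the form <protocol>[-<suffix>]."""
    lowered = name.lower()
    # Try longer prefixes first so "grpc-web" is not mistaken for "grpc".
    for proto in sorted(KNOWN_PROTOCOLS, key=len, reverse=True):
        if lowered == proto or lowered.startswith(proto + "-"):
            return proto
    return None  # no recognised prefix


def is_parsed_as_http(port_name: str) -> bool:
    """True if a port with this name would be treated as HTTP traffic."""
    return protocol_from_port_name(port_name) in HTTP_PARSED


for port_name in ("grpc", "https-grpc"):
    print(port_name, protocol_from_port_name(port_name), is_parsed_as_http(port_name))
```

Under these assumptions, `grpc` on port 443 lands in the HTTP path just like `http` does, while `https-grpc` is treated as HTTPS and leaves other TLS traffic on that port alone.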

Member Author

@howardjohn howardjohn commented Nov 11, 2019

Quick update: the short-term solution of rejecting HTTP ports on port 443 has shipped in 1.4 (on by default) and in 1.3.5 (off by default). The long-term fix is still pending design/implementation.

10 participants