Exposing Jaeger via ingress gateway gives strange errors with http2 - Envoy breaks http1.1 <-> http2 ALPN #19040

Open
mooperd opened this issue Nov 18, 2019 · 18 comments

@mooperd mooperd commented Nov 18, 2019

Bug description

I want to expose the standard Jaeger service deployed with Istio via the ingress gateway. When hitting the URL with an HTTP/2 request I get a 503: upstream connect error or disconnect/reset before headers. reset reason: connection termination

# curl --http1.1  -I https://jaeger.foo.com/
HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
date: Sat, 16 Nov 2019 17:10:56 GMT
x-envoy-upstream-service-time: 0
server: istio-envoy
transfer-encoding: chunked

# curl --http2  -I https://jaeger.foo.com/
HTTP/2 503 
content-length: 95
content-type: text/plain
date: Sat, 16 Nov 2019 17:11:01 GMT
server: istio-envoy

Not sure how to answer this question - seems to be an envoy bug.

Expected behavior

200s are returned with http2

Steps to reproduce the bug

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: jaeger-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "jaeger.foo.com"
    tls:
      httpsRedirect: true # sends 301 redirect for http requests
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "jaeger.foo.com"
    tls:
      mode: SIMPLE
      # these keys have to exist in a secret called 'istio-ingressgateway-certs' in the istio-system namespace.
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: jaeger-query
spec:
  hosts:
  - "jaeger.foo.com"
  gateways:
  - jaeger-gateway
  http:
  - match:
    - uri:
        prefix: /
    - uri:
        prefix: /ping
    route:
    - destination:
        port:
          number: 16686
        host: jaeger-query-ext


apiVersion: v1
kind: Service
metadata:
  name: jaeger-query-ext
  namespace: istio-system
  annotations:
  labels:
    app: jaeger
    jaeger-infra: jaeger-service
    chart: tracing
    heritage: Tiller
    release: istio
spec:
  ports:
    - name: query-http
      port: 16686
      protocol: TCP
      targetPort: 16686
  selector:
    app: jaeger

Confirm that it's working properly over HTTP/1.1:

curl --http1.1  -I https://jaeger.foo.com/

Then hit it with an HTTP/2 call:

curl --http2  -I https://jaeger.foo.com/

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

$ istioctl version --remote
client version: 1.4.0-beta.5
control plane version: 1.4.0
data plane version: 1.4.0-beta.5 (2 proxies), 1.3.5 (1 proxies), 1.4.0 (1 proxies)
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:26:40Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?
helm template install/kubernetes/helm/istio --name istio --namespace istio-system --set grafana.enabled=false --set kiali.enabled=true --set prometheus.enabled=false --set tracing.enabled=true --set tracing.ingress.enabled=true --set pilot.traceSampling=100

Environment where bug was observed (cloud vendor, OS, etc)

GKE

Contributor

@objectiser objectiser commented Nov 19, 2019

With the default tracing gateway deployed using these instructions, and port forwarding set up (because I am using minikube), I am able to run:

$ curl --http1.1  -I http://localhost:15032/jaeger
HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
date: Tue, 19 Nov 2019 14:04:35 GMT
x-envoy-upstream-service-time: 0
server: istio-envoy
transfer-encoding: chunked

$ curl --http2  -I http://localhost:15032/jaeger
HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
date: Tue, 19 Nov 2019 14:04:40 GMT
x-envoy-upstream-service-time: 0
server: istio-envoy
transfer-encoding: chunked

So Jaeger can be accessed with either http1.1 or http2 - therefore the problem appears to be elsewhere.

@objectiser objectiser removed their assignment Nov 19, 2019
Contributor

@objectiser objectiser commented Nov 19, 2019

Noticed that the response was still showing HTTP/1.1 so exposed port 16686 on the jaeger all-in-one pod to show the verbose response:

$ curl --http2  -v -I http://localhost:16686/jaeger
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 16686 (#0)
> HEAD /jaeger HTTP/1.1
> Host: localhost:16686
> User-Agent: curl/7.64.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAARAAAAAAAIAAAAA
> 
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Date: Tue, 19 Nov 2019 14:24:02 GMT
Date: Tue, 19 Nov 2019 14:24:02 GMT

< 
* Connection #0 to host localhost left intact

So curl is trying to upgrade the connection to HTTP/2 (Upgrade: h2c), but the server answers with HTTP/1.1.
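For context, the HTTP2-Settings header in the request above is the base64url-encoded payload of an HTTP/2 SETTINGS frame (RFC 7540, section 3.2.1). The following is a minimal Python sketch, not part of the original report, that decodes the value curl sent. The function name decode_http2_settings is made up for illustration.

```python
import base64
import struct
from typing import List, Tuple


def decode_http2_settings(header_value: str) -> List[Tuple[int, int]]:
    """Decode an HTTP2-Settings header into (identifier, value) pairs."""
    # The header is base64url without padding, so restore the padding first.
    padded = header_value + "=" * (-len(header_value) % 4)
    payload = base64.urlsafe_b64decode(padded)
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    return [
        struct.unpack(">HI", payload[i:i + 6])
        for i in range(0, len(payload), 6)
    ]


print(decode_http2_settings("AAMAAABkAARAAAAAAAIAAAAA"))
# [(3, 100), (4, 1073741824), (2, 0)]
```

Identifiers 3, 4 and 2 are MAX_CONCURRENT_STREAMS, INITIAL_WINDOW_SIZE and ENABLE_PUSH in RFC 7540, so the request carries a complete h2c upgrade offer.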

Author

@mooperd mooperd commented Nov 19, 2019

It seems there is something weird happening on the wire between Envoy and Jaeger. In the 4th frame we can see that the ingress gateway (10.32.1.3) is firing some data at Jaeger (10.32.4.5), which I assume is HTTP/2, although I haven't been able to decode it or look at the contents.

In frame 11 Jaeger returns an HTTP/1.1 301 redirect to /*, which doesn't make much sense to me. I guess it's interpreting something.

I found this ticket which seems to be related: envoyproxy/envoy#7925

~# tshark -T text -i eth0 'tcp port 16686'

    1 0.000000000    10.32.1.3 → 10.32.4.5    TCP 74 53154 → 16686 [SYN] Seq=0 Win=28400 Len=0 MSS=1420 SACK_PERM=1 TSval=4018161834 TSecr=0 WS=128
    2 0.001309586    10.32.4.5 → 10.32.1.3    TCP 74 16686 → 53154 [SYN, ACK] Seq=0 Ack=1 Win=28160 Len=0 MSS=1420 SACK_PERM=1 TSval=474254736 TSecr=4018161834 WS=128
    3 0.001328469    10.32.1.3 → 10.32.4.5    TCP 66 53154 → 16686 [ACK] Seq=1 Ack=1 Win=28416 Len=0 TSval=4018161836 TSecr=474254736
    4 0.001441996    10.32.1.3 → 10.32.4.5    TCP 800 53154 → 16686 [PSH, ACK] Seq=1 Ack=1 Win=28416 Len=734 TSval=4018161836 TSecr=474254736
    5 0.001561246    10.32.4.5 → 10.32.1.3    TCP 66 16686 → 53154 [ACK] Seq=1 Ack=735 Win=29696 Len=0 TSval=474254737 TSecr=4018161836
    6 0.001738366    10.32.4.5 → 10.32.1.3    TCP 75 16686 → 53154 [PSH, ACK] Seq=1 Ack=735 Win=29696 Len=9 TSval=474254737 TSecr=4018161836
    7 0.001745476    10.32.1.3 → 10.32.4.5    TCP 66 53154 → 16686 [ACK] Seq=735 Ack=10 Win=28416 Len=0 TSval=4018161836 TSecr=474254737
    8 0.001790715    10.32.1.3 → 10.32.4.5    TCP 75 53154 → 16686 [PSH, ACK] Seq=735 Ack=10 Win=28416 Len=9 TSval=4018161836 TSecr=474254737
    9 0.001925185    10.32.4.5 → 10.32.1.3    TCP 75 16686 → 53154 [PSH, ACK] Seq=10 Ack=744 Win=29696 Len=9 TSval=474254737 TSecr=4018161836
   10 0.001953463    10.32.1.3 → 10.32.4.5    TCP 75 53154 → 16686 [PSH, ACK] Seq=744 Ack=19 Win=28416 Len=9 TSval=4018161836 TSecr=474254737
   11 0.002137800    10.32.4.5 → 10.32.1.3    HTTP 191 HTTP/1.1 301 Moved Permanently
   12 0.002141757    10.32.4.5 → 10.32.1.3    TCP 66 16686 → 53154 [RST, ACK] Seq=144 Ack=753 Win=29696 Len=0 TSval=474254737 TSecr=4018161836

~# tcpdump -i eth0 'tcp port 16686'

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:31:35.333984 IP istio-ingressgateway-6b789644bf-n6qpq.53558 > 10.32.4.5.16686: Flags [S], seq 1917073276, win 28400, options [mss 1420,sackOK,TS val 4018367662 ecr 0,nop,wscale 7], length 0
15:31:35.335453 IP 10.32.4.5.16686 > istio-ingressgateway-6b789644bf-n6qpq.53558: Flags [S.], seq 3934162185, ack 1917073277, win 28160, options [mss 1420,sackOK,TS val 474460546 ecr 4018367662,nop,wscale 7], length 0
15:31:35.335468 IP istio-ingressgateway-6b789644bf-n6qpq.53558 > 10.32.4.5.16686: Flags [.], ack 1, win 222, options [nop,nop,TS val 4018367663 ecr 474460546], length 0
15:31:35.335574 IP istio-ingressgateway-6b789644bf-n6qpq.53558 > 10.32.4.5.16686: Flags [P.], seq 1:734, ack 1, win 222, options [nop,nop,TS val 4018367663 ecr 474460546], length 733
15:31:35.335697 IP 10.32.4.5.16686 > istio-ingressgateway-6b789644bf-n6qpq.53558: Flags [.], ack 734, win 232, options [nop,nop,TS val 474460547 ecr 4018367663], length 0
15:31:35.335973 IP 10.32.4.5.16686 > istio-ingressgateway-6b789644bf-n6qpq.53558: Flags [P.], seq 1:10, ack 734, win 232, options [nop,nop,TS val 474460547 ecr 4018367663], length 9
15:31:35.335978 IP istio-ingressgateway-6b789644bf-n6qpq.53558 > 10.32.4.5.16686: Flags [.], ack 10, win 222, options [nop,nop,TS val 4018367664 ecr 474460547], length 0

Author

@mooperd mooperd commented Nov 19, 2019

Jaeger doesn't support http2.

# curl --http2-prior-knowledge http://localhost:16686/ -Iv
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 16686 (#0)
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55d9f016b580)
> HEAD / HTTP/2
> Host: localhost:16686
> User-Agent: curl/7.58.0
> Accept: */*
> 
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame.  Perhaps, peer does not support HTTP/2 properly.
* Connection #0 to host localhost left intact
curl: (16) Error in the HTTP2 framing layer
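
The curl error above refers to the server connection preface: per RFC 7540, section 3.5, the first thing an HTTP/2 server sends must be a SETTINGS frame. A rough Python sketch of that check follows, assuming only the 9-byte frame header layout from the RFC. The helper name is hypothetical, and this is not how curl itself is implemented.

```python
import struct

SETTINGS_FRAME_TYPE = 0x4  # frame type for SETTINGS in RFC 7540, section 6.5
ACK_FLAG = 0x1             # set on SETTINGS acknowledgements only


def starts_with_server_preface(data: bytes) -> bool:
    """True if `data` begins with a plausible initial SETTINGS frame header."""
    if len(data) < 9:
        return False
    # 9-byte header: 24-bit length, 8-bit type, 8-bit flags,
    # then 1 reserved bit plus a 31-bit stream identifier.
    length_hi, length_lo, frame_type, flags, stream = struct.unpack(
        ">BHBBI", data[:9]
    )
    length = (length_hi << 16) | length_lo
    stream_id = stream & 0x7FFFFFFF
    return (
        frame_type == SETTINGS_FRAME_TYPE
        and not flags & ACK_FLAG   # the preface is not an acknowledgement
        and stream_id == 0         # SETTINGS always applies to stream 0
        and length % 6 == 0        # payload is a list of 6-byte settings
    )


# An empty SETTINGS frame is a valid preface; an HTTP/1.1 status line is not.
print(starts_with_server_preface(b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"))
print(starts_with_server_preface(b"HTTP/1.1 301 Moved Permanently\r\n"))
```

A backend that only speaks HTTP/1.1 replies with plain text, which fails this check in the same way the curl run above did.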

Contributor

@objectiser objectiser commented Nov 19, 2019

@mooperd Could you create a feature request in the Jaeger repo?

Author

@mooperd mooperd commented Nov 19, 2019

@objectiser - I can do - but does that mean that the solution to this problem is 'upgrade your http1.1 services to http2'? I have a feeling that proxies should terminate and re-establish a connection, or allow clients to negotiate their connection with the backend service. Honestly I'm out of my depth, but I'm pretty surprised this doesn't work.

Contributor

@objectiser objectiser commented Nov 19, 2019

@mooperd Not my area of expertise but agree with you that Istio/Envoy should be able to cope. So @douglas-reid is there someone from the networking area that could comment on whether Istio/Envoy could handle this?

However adding support for http2 in Jaeger may be worthwhile anyway.

Contributor

@douglas-reid douglas-reid commented Nov 19, 2019

@rshriram can we borrow your expertise here?

Author

@mooperd mooperd commented Nov 19, 2019

Out of curiosity I had a look at what nginx does in this situation - it seems to be able to negotiate the protocol properly.

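server {
    location / {
        # An empty value means the Accept-Encoding header is not passed upstream.
        proxy_set_header Accept-Encoding "";
        # Forward every request to the Jaeger query port.
        proxy_pass http://localhost:16686;
    }
}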
root@vagrant:~# curl --http2 http://localhost:80/ -Iv
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> HEAD / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAARAAAAAAAIAAAAA
> 
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Tue, 19 Nov 2019 19:44:31 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 1179
< Connection: keep-alive
< 
* Connection #0 to host localhost left intact

--http2-prior-knowledge forces http2.

root@vagrant:~# curl --http2-prior-knowledge http://localhost:80/ -Iv
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561f24505580)
> HEAD / HTTP/2
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
> 
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame.  Perhaps, peer does not support HTTP/2 properly.
* Connection #0 to host localhost left intact
curl: (16) Error in the HTTP2 framing layer

Author

@mooperd mooperd commented Nov 19, 2019

@mooperd mooperd changed the title Exposing jager via ingress gateway gives strange errors with http2 Exposing jager via ingress gateway gives strange errors with http2 - Envoy doesn't support proper http1.1 -> http2 ALPN Nov 20, 2019
@mooperd mooperd changed the title Exposing jager via ingress gateway gives strange errors with http2 - Envoy doesn't support proper http1.1 -> http2 ALPN Exposing jager via ingress gateway gives strange errors with http2 - Envoy breaks http1.1 <-> http2 ALPN Nov 20, 2019
Author

@mooperd mooperd commented Nov 20, 2019

It’s a bug in Envoy: it doesn’t support (or screws up) Application-Layer Protocol Negotiation (ALPN). If you have an HTTP/1.1 service in the backend and a client sending Upgrade: h2c, Envoy will upgrade the connection to HTTP/2 and then force the backend to use HTTP/2 even if it’s not supported.

I have been investigating what is "normal" and have found that nginx is able to accept an incoming HTTP/2 connection and then connect to HTTP/1.1 services in the backend transparently.

Contributor

@objectiser objectiser commented Nov 21, 2019

@mooperd Can this issue be closed then, as it is an Envoy bug and there is an nginx workaround?

Author

@mooperd mooperd commented Nov 21, 2019

@objectiser I think this should stay open. The Envoy peeps don't actually seem very keen on fixing it, and I don't think that using nginx is really a valid workaround. Could we see if it's getting any +1s in the next month and then let the stale bot kill it?

@jonbcampos-alto jonbcampos-alto commented Nov 26, 2019

Just upgraded to Istio 1.4 and having the same issue. Exactly the same.

@jonbcampos-alto jonbcampos-alto commented Nov 26, 2019

FYI: adding http- in front of the service port name fixed the issue, if you weren't doing that already. We found the issue through this thread and confirmed it with 2 methods:

  1. curl with and without HTTP/2:
    curl --http1.1 -I https://api...com/your-path/
    and
    curl --http2 -I https://api...com/your-path/
    If the HTTP/1.1 request works and the HTTP/2 request doesn't, you are hitting this issue. But run the other test too.

  2. Chrome with HTTP/2 turned off.
    Our sites came back on and worked.

These 2 tests clearly identified the problem, and then all we did was update the service to get things working again.
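
The reason the rename helps is presumably Istio's port naming convention, which reads the protocol from a <protocol>[-<suffix>] prefix of the Service port name. Below is a small Python sketch of that rule, written from the documented convention rather than from Istio's source. The function name and the exact protocol list are assumptions.

```python
from typing import Optional

# Protocol prefixes as recalled from Istio's port-naming documentation
# (an assumption; check the docs for your Istio version).
KNOWN_PROTOCOLS = {
    "grpc", "grpc-web", "http", "http2", "https",
    "mongo", "mysql", "redis", "tcp", "tls", "udp",
}


def protocol_from_port_name(name: str) -> Optional[str]:
    """Return the protocol implied by a Service port name, or None.

    Illustrative re-implementation of the `<protocol>[-<suffix>]` rule,
    not Istio's actual code.
    """
    lowered = name.lower()
    # Try longer prefixes first so "grpc-web" wins over "grpc".
    for proto in sorted(KNOWN_PROTOCOLS, key=len, reverse=True):
        if lowered == proto or lowered.startswith(proto + "-"):
            return proto
    return None


print(protocol_from_port_name("query-http"))  # None
print(protocol_from_port_name("http-query"))  # http
```

Under this reading, the query-http port from the original report is not recognized as HTTP, while a name such as http-query (a hypothetical example) is. What Istio does with an unrecognized name depends on the version and is outside this sketch.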

@yurishkuro yurishkuro commented Nov 27, 2019

Don't know if this could be related, but Jaeger serves both HTTP and gRPC traffic on the same port using the cmux library.

@PsychoSid PsychoSid commented Jan 9, 2020

Hi - I got this when loading up 1.4.3 today and using Spinnaker, which doesn't expose gRPC, so that combination doesn't seem to be the issue here.

@cdyue cdyue commented Jan 13, 2020

same issue on 1.3.5, 1.3.6 and 1.4.3.
