Unable to connect to a http/2 gRPC application through envoy/contour #152

Closed
glerchundi opened this issue Jan 15, 2018 · 12 comments

Labels: help wanted · kind/feature · priority/important-longterm

@glerchundi
Contributor

First of all, thanks for this project. Looks very promising.

TL;DR: I'm receiving "upstream connect error or disconnect/reset before headers", which suggests I misconfigured something, but I'm not able to figure out what!

Let's get our hands dirty. I deployed everything in GKE with Kubernetes 1.8.6, in two namespaces: edge-system and space-system. edge-system contains contour + envoy; space-system contains my test gRPC service (take a look at the proto below). The application service itself works as expected: I verified it with an in-cluster test by running a one-shot client container and checking that everything behaves as I would expect.

It even worked when running the client inside the envoy container. Concretely, I ran two tests: one hitting the endpoint directly, and the other going through the service:

$> kubectl get services/test -n space-system
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
test      ClusterIP   10.39.243.31   <none>        8080/TCP   19h
$> kubectl get endpoints/test -n space-system
NAME      ENDPOINTS         AGE
test      10.36.12.5:8080   19h
$> kubectl exec -ti contour-84fc9b758d-4gnx4 -c envoy -n edge-system sh
/ # ./test-client -addr 10.39.243.31:8080
2018/01/15 09:36:06 GetUser(_) = id:"ec5ae4c1-35ee-42d1-95cc-1ce3bc7bb551" display_name:"Gorka" 
2018/01/15 09:36:07 id:"b1d577a0-f9e9-454b-af8a-024e4c52c20c" content:"Door left opened" 
2018/01/15 09:36:08 id:"d263e680-d1a9-41b6-93a1-e61eabd6de57" content:"Intrusion detected" 
^C
/ # ./test-client -addr 10.36.12.5:8080
2018/01/15 09:36:06 GetUser(_) = id:"ec5ae4c1-35ee-42d1-95cc-1ce3bc7bb551" display_name:"Gorka" 
2018/01/15 09:36:19 id:"b1d577a0-f9e9-454b-af8a-024e4c52c20c" content:"Door left opened" 
2018/01/15 09:36:20 id:"d263e680-d1a9-41b6-93a1-e61eabd6de57" content:"Intrusion detected" 

So the conclusion is that something is going wrong between envoy and the upstream. Now let's dig into the ingresses and check whether contour configures envoy as expected. I created just one ingress, for test.mydomain.com:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  namespace: space-system
  annotations:
    kubernetes.io/ingress.class: "contour"
spec:
  rules:
  - host: test.mydomain.com
    http:
      paths:
      - backend:
          serviceName: test
          servicePort: 8080
        path: /

And it seems like contour caught it:

$> kubectl exec -ti contour-84fc9b758d-4gnx4 -c envoy -n edge-system sh
/ # wget -O - http://127.0.0.1:9001/routes | cat
Connecting to 127.0.0.1:9001 (127.0.0.1:9001)
-                   [
{
"version_info": "40",
"route_config_name": "ingress_http",
"cluster_name": "xds_cluster",
"route_table_dump": {"name":"ingress_http","virtual_hosts":[{"name":"test.mydomain.com","domains":["test.mydomain.com"],"routes":[{"match":{"prefix":"/"},"route":{"cluster":"space-system/test/8080"}}]}]}
}
]
/ # wget -O - http://127.0.0.1:9001/listeners | cat
Connecting to 127.0.0.1:9001 (127.0.0.1:9001)
-                   ["0.0.0.0:8080"]
/ # wget -O - http://127.0.0.1:9001/clusters | grep space-system
Connecting to 127.0.0.1:9001 (127.0.0.1:9001)
-                   space-system/test/http::default_priority::max_connections::1024
space-system/test/http::default_priority::max_pending_requests::1024
space-system/test/http::default_priority::max_requests::1024
space-system/test/http::default_priority::max_retries::3
space-system/test/http::high_priority::max_connections::1024
space-system/test/http::high_priority::max_pending_requests::1024
space-system/test/http::high_priority::max_requests::1024
space-system/test/http::high_priority::max_retries::3
space-system/test/http::added_via_api::true
space-system/test/http::10.36.12.5:8080::cx_active::0
space-system/test/http::10.36.12.5:8080::cx_connect_fail::0
space-system/test/http::10.36.12.5:8080::cx_total::0
space-system/test/http::10.36.12.5:8080::rq_active::0
space-system/test/http::10.36.12.5:8080::rq_error::0
space-system/test/http::10.36.12.5:8080::rq_success::0
space-system/test/http::10.36.12.5:8080::rq_timeout::0
space-system/test/http::10.36.12.5:8080::rq_total::0
space-system/test/http::10.36.12.5:8080::health_flags::healthy
space-system/test/http::10.36.12.5:8080::weight::1
space-system/test/http::10.36.12.5:8080::region::ap-southeast-2
space-system/test/http::10.36.12.5:8080::zone::2b
space-system/test/http::10.36.12.5:8080::sub_zone::banana
space-system/test/http::10.36.12.5:8080::canary::false
space-system/test/http::10.36.12.5:8080::success_rate::-1
space-system/test/8080::default_priority::max_connections::1024
space-system/test/8080::default_priority::max_pending_requests::1024
space-system/test/8080::default_priority::max_requests::1024
space-system/test/8080::default_priority::max_retries::3
space-system/test/8080::high_priority::max_connections::1024
space-system/test/8080::high_priority::max_pending_requests::1024
space-system/test/8080::high_priority::max_requests::1024
space-system/test/8080::high_priority::max_retries::3
space-system/test/8080::added_via_api::true
space-system/test/8080::10.36.12.5:8080::cx_active::0
space-system/test/8080::10.36.12.5:8080::cx_connect_fail::0
space-system/test/8080::10.36.12.5:8080::cx_total::13
space-system/test/8080::10.36.12.5:8080::rq_active::0
space-system/test/8080::10.36.12.5:8080::rq_error::13
space-system/test/8080::10.36.12.5:8080::rq_success::0
space-system/test/8080::10.36.12.5:8080::rq_timeout::0
space-system/test/8080::10.36.12.5:8080::rq_total::13
space-system/test/8080::10.36.12.5:8080::health_flags::healthy
space-system/test/8080::10.36.12.5:8080::weight::1
space-system/test/8080::10.36.12.5:8080::region::ap-southeast-2
space-system/test/8080::10.36.12.5:8080::zone::2b
space-system/test/8080::10.36.12.5:8080::sub_zone::banana
space-system/test/8080::10.36.12.5:8080::canary::false
space-system/test/8080::10.36.12.5:8080::success_rate::-1

It seems like something is not working properly in envoy (see rq_error) but I cannot figure out what. Checking envoy's logs didn't help at all (at least for me) because access and error logs are interleaved and I didn't find an easy way to debug application errors...

Sorry for the long post. Can anyone help me shed some light on what could be causing this?

Thanks!

test.proto:

syntax = "proto3";

package test;

service TestService {
  // Client sends a single request and gets back a single response.
  rpc GetUser(GetUserRequest) returns (GetUserResponse);

  // Streams from the server.
  rpc GetEvents(GetEventsRequest) returns (stream Event);
}

message GetUserRequest {
  string id = 1;
}

message GetUserResponse {
  string id = 1;
  string display_name = 2;
}

message GetEventsRequest {
}

message Event {
  string id = 1;
  string content = 2;
}
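
For reference, here is a minimal sketch of what the test-client used above could look like, assuming the standard grpc-go stubs generated from test.proto (the generated import path, the -addr flag handling and the request IDs are placeholders, not the real binary):

package main

import (
	"context"
	"flag"
	"io"
	"log"

	"google.golang.org/grpc"

	pb "example.com/test/testpb" // import path for the stubs generated from test.proto (assumed)
)

func main() {
	addr := flag.String("addr", "127.0.0.1:8080", "host:port of the TestService server")
	flag.Parse()

	// gRPC always runs over HTTP/2; WithInsecure gives plaintext h2c.
	conn, err := grpc.Dial(*addr, grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial %s: %v", *addr, err)
	}
	defer conn.Close()
	client := pb.NewTestServiceClient(conn)

	// Unary call.
	user, err := client.GetUser(context.Background(), &pb.GetUserRequest{Id: "some-id"})
	if err != nil {
		log.Fatalf("GetUser(_) = _, %v", err)
	}
	log.Printf("GetUser(_) = %v", user)

	// Server-streaming call.
	stream, err := client.GetEvents(context.Background(), &pb.GetEventsRequest{})
	if err != nil {
		log.Fatalf("GetEvents(_) = _, %v", err)
	}
	for {
		ev, err := stream.Recv()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatalf("stream.Recv() = _, %v", err)
		}
		log.Printf("%v", ev)
	}
}
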
@glerchundi
Contributor Author

glerchundi commented Jan 15, 2018

I found part of the problem by putting envoy in debug mode (and, by the way, understanding more or less how everything works!) and realising that I wasn't setting the :authority header correctly.

But... now it seems that envoy is trying to connect to the upstream with http/1 instead of http/2 (gRPC). How does envoy/contour decide which one it should use? I expected it to pick http/2 whenever it sees an application/grpc content type.

This is the error: [2018-01-15 15:07:15.523][6][info][client] source/common/http/codec_client.cc:104] [C17] protocol error: http/1.1 protocol error: HPE_INVALID_CONSTANT

The error seems reasonable because my service is sending responses in h2.

Full request logs:

[2018-01-15 15:07:15.520][6][debug][main] source/server/connection_handler_impl.cc:129] [C16] new connection
[2018-01-15 15:07:15.520][6][debug][http2] source/common/http/http2/codec_impl.cc:625] [C16] setting stream-level initial window size to 268435456
[2018-01-15 15:07:15.520][6][debug][http2] source/common/http/http2/codec_impl.cc:649] [C16] updating connection-level initial window size to 268435456
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:181] [C16] new stream
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:439] [C16][S17638583713240259270] request headers complete (end_stream=false):
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   ':method':'POST'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   ':scheme':'http'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   ':path':'/test.TestService/GetUser'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   ':authority':'test.mydomain.com'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   'content-type':'application/grpc'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   'user-agent':'grpc-go/1.9.1'
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:444] [C16][S17638583713240259270]   'te':'trailers'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:239] [C16][S17638583713240259270] cluster 'space-system/test/8080' match for URL '/test.TestService/GetUser'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   ':method':'POST'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   ':scheme':'http'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   ':path':'/test.TestService/GetUser'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   ':authority':'test.mydomain.com'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'content-type':'application/grpc'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'user-agent':'grpc-go/1.9.1'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'te':'trailers'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'x-forwarded-for':'10.36.10.1'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'x-forwarded-proto':'http'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'x-envoy-internal':'true'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'x-request-id':'075f7021-9768-49aa-b55a-dde63098bd97'
[2018-01-15 15:07:15.521][6][debug][router] source/common/router/router.cc:284] [C16][S17638583713240259270]   'x-envoy-expected-rq-timeout-ms':'15000'
[2018-01-15 15:07:15.521][6][debug][pool] source/common/http/http1/conn_pool.cc:73] creating a new connection
[2018-01-15 15:07:15.521][6][debug][client] source/common/http/codec_client.cc:23] [C17] connecting
[2018-01-15 15:07:15.521][6][debug][connection] source/common/network/connection_impl.cc:489] [C17] connecting to 10.36.8.7:8080
[2018-01-15 15:07:15.521][6][debug][connection] source/common/network/connection_impl.cc:498] [C17] connection in progress
[2018-01-15 15:07:15.521][6][debug][pool] source/common/http/http1/conn_pool.cc:99] queueing request due to no available connections
[2018-01-15 15:07:15.521][6][debug][http] source/common/http/conn_manager_impl.cc:653] [C16][S17638583713240259270] request end stream
[2018-01-15 15:07:15.522][6][debug][connection] source/common/network/connection_impl.cc:449] [C17] connected
[2018-01-15 15:07:15.522][6][debug][client] source/common/http/codec_client.cc:52] [C17] connected
[2018-01-15 15:07:15.522][6][debug][pool] source/common/http/http1/conn_pool.cc:225] [C17] attaching to next request
[2018-01-15 15:07:15.522][6][debug][router] source/common/router/router.cc:914] [C16][S17638583713240259270] pool ready
[2018-01-15 15:07:15.523][6][info][client] source/common/http/codec_client.cc:104] [C17] protocol error: http/1.1 protocol error: HPE_INVALID_CONSTANT
[2018-01-15 15:07:15.523][6][debug][connection] source/common/network/connection_impl.cc:139] [C17] closing data_to_write=0 type=1
[2018-01-15 15:07:15.523][6][debug][connection] source/common/network/connection_impl.cc:173] [C17] closing socket: 1
[2018-01-15 15:07:15.523][6][debug][client] source/common/http/codec_client.cc:70] [C17] disconnect. resetting 1 pending requests
[2018-01-15 15:07:15.523][6][debug][client] source/common/http/codec_client.cc:91] [C17] request reset
[2018-01-15 15:07:15.523][6][debug][router] source/common/router/router.cc:454] [C16][S17638583713240259270] upstream reset
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:860] [C16][S17638583713240259270] encoding headers via codec (end_stream=false):
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:865] [C16][S17638583713240259270]   ':status':'503'
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:865] [C16][S17638583713240259270]   'content-length':'57'
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:865] [C16][S17638583713240259270]   'content-type':'text/plain'
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:865] [C16][S17638583713240259270]   'date':'Mon, 15 Jan 2018 15:07:15 GMT'
[2018-01-15 15:07:15.523][6][debug][http] source/common/http/conn_manager_impl.cc:865] [C16][S17638583713240259270]   'server':'envoy'
[2018-01-15 15:07:15.523][6][debug][http2] source/common/http/http2/codec_impl.cc:518] [C16] stream closed: 0
[2018-01-15 15:07:15.523][6][debug][pool] source/common/http/http1/conn_pool.cc:115] [C17] client disconnected
[2018-01-15 15:07:15.654][6][debug][connection] source/common/network/connection_impl.cc:433] [C16] remote close
[2018-01-15 15:07:15.654][6][debug][connection] source/common/network/connection_impl.cc:173] [C16] closing socket: 0
[2018-01-15 15:07:15.654][6][debug][main] source/server/connection_handler_impl.cc:59] [C16] adding to cleanup list
[2018-01-15 15:07:19.629][1][debug][main] source/server/server.cc:137] flushing stats
[2018-01-15T15:07:15.521Z] "POST /test.TestService/GetAccessPoint HTTP/2" 503 UC 43 57 2 - "10.36.10.1" "grpc-go/1.9.1" "075f7021-9768-49aa-b55a-dde63098bd97" "test.mydomain.com" "10.36.8.7:8080"

@davecheney
Contributor

Thanks for that debugging; this looks like it's related to #15. The TL;DR is that contour defaults all backends (what envoy calls clusters) to http/1, non-TLS.

@davecheney davecheney changed the title Unable to connect to a gRPC application through envoy/contour Unable to connect to a http/2 gRPC application through envoy/contour Jan 15, 2018
@davecheney davecheney added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 15, 2018
@davecheney davecheney added this to the 0.4.0 milestone Jan 15, 2018
@davecheney
Contributor

Also issue #14

@glerchundi
Contributor Author

Thanks @davecheney.

Thanks for that debugging; this looks like it's related to #15. The TL;DR is that contour defaults all backends (what envoy calls clusters) to http/1, non-TLS.

It's related to both cases, with and without TLS. Basically it's the same as the first bullet point in #14:

  • defining a protocol to identify the protocol a port speaks. The naive implementation would be the port's name, but this seems shortsighted.

What I don't know, and don't have time to read the docs and investigate further, is how envoy knows which protocol to use for the upstream. It seems like we can hint envoy by telling it which features the upstream supports, but I don't really know :\

I would have expected it to work automagically, at least for gRPC, based on the Content-Type it receives on the frontend.

@sercand

sercand commented Feb 15, 2018

@glerchundi adding gRPC support is easy: just add Http2ProtocolOptions to the cluster config. I have implemented it to test whether it is possible.
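
For illustration only, a rough sketch of what that could look like with the go-control-plane v2 API (the helper name is invented here; the real change would live in contour's cluster translation):

package contour

import (
	v2 "github.com/envoyproxy/go-control-plane/envoy/api/v2"
	"github.com/envoyproxy/go-control-plane/envoy/api/v2/core"
)

// withH2Upstream marks a cluster as speaking HTTP/2 to its upstream. An empty
// Http2ProtocolOptions message is enough to switch envoy's upstream codec from
// HTTP/1.1 to prior-knowledge HTTP/2 (h2c). Name, type, hosts and timeouts are
// elided here.
func withH2Upstream(c *v2.Cluster) *v2.Cluster {
	c.Http2ProtocolOptions = &core.Http2ProtocolOptions{}
	return c
}

The same knob exists in envoy's own cluster configuration as the http2_protocol_options field.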

@glerchundi
Contributor Author

@sercand cool! Do you mind sending a PR so we can discuss the final solution there? We can put grpc-web support on the table as well.

@davecheney
Contributor

I'd like to see this annotation defined on the ingress object, not the service. That means it's blocked behind #211.

@davecheney davecheney modified the milestones: 0.4.0, 0.5.0 Mar 12, 2018
@davecheney davecheney self-assigned this Mar 12, 2018
@davecheney davecheney added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 12, 2018
@davecheney
Contributor

Hmm. On reflection I think the Service is fine, but the annotation should probably be something like

contour.heptio.com/upstream-protocol.h2="port,port,port"

That is, you call out which ports (by name or number) are h2. A similar pattern could apply to backend TLS options.

@davecheney davecheney added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 12, 2018
@davecheney
Contributor

@glerchundi do you feel like submitting a PR? We're closing in on 0.4 this Thursday, but maybe you want to try for this release?

@glerchundi
Contributor Author

glerchundi commented Mar 12, 2018

OK, so to confirm that we're talking about the same approach, let me give some examples:

kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    contour.heptio.com/upstream-protocol.h2: "80,https"
spec:
  selector:
    app: MyApp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 9376
  - name: https
    protocol: TCP
    port: 443
    targetPort: 9377

Or these, following the example above:

    contour.heptio.com/upstream-protocol.h2: "80,443"
    contour.heptio.com/upstream-protocol.h2: "http,443"
    contour.heptio.com/upstream-protocol.h2: "http,https"

Or even this would be possible:

    contour.heptio.com/upstream-protocol.h2: "80,http,443,https"

Am I right?

@davecheney
Contributor

Yup, all of those (although the last one is sheer masochism :( )

The way we should process these is to put them into a map; then, for each port entry in the Service document, add http2 options if the port's name or number is in the map.
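
To make that concrete, a small sketch of the lookup under the annotation format proposed above (the helper names are invented for illustration; the real implementation is in the PR mentioned below):

package contour

import (
	"strconv"
	"strings"

	v1 "k8s.io/api/core/v1"
)

// upstreamH2Ports parses the comma separated annotation value, e.g. "80,https",
// into a set of port names and numbers that should speak h2 upstream.
func upstreamH2Ports(svc *v1.Service) map[string]bool {
	ports := make(map[string]bool)
	for _, p := range strings.Split(svc.Annotations["contour.heptio.com/upstream-protocol.h2"], ",") {
		if p = strings.TrimSpace(p); p != "" {
			ports[p] = true
		}
	}
	return ports
}

// wantsH2 reports whether a service port was called out in the annotation,
// either by name or by number.
func wantsH2(h2 map[string]bool, port v1.ServicePort) bool {
	return h2[port.Name] || h2[strconv.Itoa(int(port.Port))]
}

When translating each service port into an envoy cluster, contour would then add Http2ProtocolOptions only when wantsH2 returns true.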

@glerchundi
Contributor Author

glerchundi commented Mar 12, 2018

Yep that is what I did in this PR: #272.

Lookyan pushed a commit to Lookyan/contour that referenced this issue Mar 30, 2018
Fixes projectcontour#152

Signed-off-by: Gorka Lerchundi Osa <glertxundi@gmail.com>
sunjayBhatia pushed a commit that referenced this issue Jan 30, 2023
Signed-off-by: Daneyon Hansen <daneyonhansen@gmail.com>