gRPC 1.20.1 upgrade causes gRPC connections to fail #6897
@lizan any thoughts?
@lizan @mattklein123 I am experiencing a similar issue. I can provide a reproducible example if needed.
@gyang274 a reproducible example will be very helpful!
@dio I actually fixed the problem. Btw, this is on Ubuntu 16.04 with grpc 1.20.3 and google-protobuf 3.8.0-rc1.
@gyang274 The non-deprecated way of specifying hosts is shown in the following example: https://github.com/envoyproxy/envoy/blob/master/examples/grpc-bridge/config/s2s-grpc-envoy.yaml#L35-L43

load_assignment:
  cluster_name: local_service_grpc
  endpoints:
    - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: host.docker.internal
                port_value: 50051
My issue is definitely not the same one as @gyang274's. As stated in my original report, the only change is the update of that service to the new gRPC version (quite reproducibly: a rollback fixed it, then another update broke things again). To rule out basic issues, I also verified that this error looks different from what you get when connecting to a host that does not exist, or to an http/2 host that does not speak gRPC. We may be able to do some more experimentation soon (I'd personally like to see if this is a TLS issue), but it would be great to get some pointers on where to start.
@TBoshoven gRPC 1.21.0 has just been released; it would be interesting to see if you still have the issue with it: https://github.com/grpc/grpc/releases/tag/v1.21.0
I'll try to get it tested within a week.
Unfortunately, after updating the service to gRPC 1.21.1, things still fail in the same way.
I can add some more detail to what @TBoshoven said. I've spent much of the day working with combinations here: with the same server code, if I swap grpc 1.19.0 for 1.20.0, it breaks in the way he describes. I did some packet capturing with Wireshark, and in the 1.20 case the difference shows up at the 19th packet. I should note that this is only a problem with TLS enabled; without it, 1.20 also works just fine.
@alertedsnake @TBoshoven TLS enabled where? Between client and Envoy, between Envoy and the upstream gRPC server, or both?
@lizan Sorry, I should have mentioned that: TLS is enabled between Envoy and the upstream gRPC service. The service uses authentication, so it only listens with TLS. If we have it listen without TLS, everything works fine. I'm using @TBoshoven's Envoy config, and I don't know anything about Envoy at all, so I'm not sure how to provide the logs you're looking for.
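Since the failure only appears with TLS between Envoy and the upstream, one quick diagnostic is to check whether the upstream negotiates h2 via ALPN during the handshake, since gRPC runs over HTTP/2. Below is a minimal stdlib sketch of that check; the hostname and port are placeholders, and this is a diagnostic suggestion rather than part of anyone's setup above:

```python
import socket
import ssl
from typing import Optional


def h2_client_context() -> ssl.SSLContext:
    """Client TLS context that offers h2 via ALPN, as an HTTP/2 (gRPC) client would."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_alpn(host: str, port: int = 443) -> Optional[str]:
    """Handshake with the server and report which ALPN protocol it selected.

    Returns None if the server did not negotiate ALPN at all; an upstream
    that never selects "h2" would explain an HTTP/2 client failing against it.
    """
    ctx = h2_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Running `negotiated_alpn("hostname", 443)` against the upstream and comparing the 1.19 and 1.20 deployments would show whether the ALPN result changed between versions.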
The logs I provided in my initial report are still valid. Here's my cleaned-up config, with the stuff I poked at left in a commented-out block:

admin:
access_log_path: /tmp/admin_access.log
address:
socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 8080 }
filter_chains:
- filters:
- name: envoy.http_connection_manager
config:
codec_type: auto
stat_prefix: ingress_http
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/prefix"
route:
cluster: the_service
max_grpc_timeout: 0s
cors:
allow_origin:
- "http://localhost:8081"
allow_methods: GET, PUT, DELETE, POST, OPTIONS
allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout,authorization
max_age: "1728000"
expose_headers: grpc-status,grpc-message
http_filters:
- name: envoy.grpc_web
- name: envoy.cors
- name: envoy.router
clusters:
# Stuff I have poked at
# - name: the_service
# connect_timeout: 5s
# type: logical_dns
# http2_protocol_options:
# hpack_table_size: 0
# lb_policy: round_robin
# load_assignment:
# cluster_name: local_service_grpc
# endpoints:
# - lb_endpoints:
# - endpoint:
# address:
# socket_address:
# address: hostname
# port_value: 443
# tls_context:
# allow_renegotiation: true
# sni: hostname
# common_tls_context:
# validation_context:
# trusted_ca:
# filename: /etc/ssl/cert.pem
- name: the_service
connect_timeout: 5s
type: logical_dns
http2_protocol_options: {}
lb_policy: round_robin
hosts:
- socket_address:
address: hostname
port_value: 443
      tls_context: {}

The uncommented part of this worked with gRPC 1.19 and not with 1.20.
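For reference, here is a sketch of the same cluster written without the deprecated `hosts` field, combining the commented-out block above with the settings that were in use (untested; the names mirror the config above):

```yaml
- name: the_service
  connect_timeout: 5s
  type: logical_dns
  http2_protocol_options: {}
  lb_policy: round_robin
  load_assignment:
    cluster_name: the_service
    endpoints:
      - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: hostname
                  port_value: 443
  tls_context:
    sni: hostname
```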
@lizan This is still an issue. Is there anything else we can provide to help identify and resolve the problem? EDIT: Just read that you asked for a trace log. Let me see what I can do.
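For anyone else asked for trace logs: in the Python gRPC stack, client-side tracing is controlled by environment variables that must be set before the library initializes. A sketch follows; the tracer names are my best guess at the TLS-relevant ones, so check gRPC's environment-variable documentation for the authoritative list:

```python
import os

# Must be set before `import grpc` so the gRPC core picks them up.
os.environ["GRPC_VERBOSITY"] = "DEBUG"
# Comma-separated tracer names; these aim at the TLS handshake path.
os.environ["GRPC_TRACE"] = "tsi,transport_security,handshaker,secure_endpoint"
```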
This is very useful. I was not aware of the
The line that struck me as odd is
About the cert: curl accepts the certificate just fine.
I ended up switching to grpcwebproxy. It successfully connects to the backend services.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
I get the point of the stale bot, but this is a real issue that has not been solved, so it shouldn't be closed.
I had a similar connect issue and
This fails due to a combination of different unrelated issues with docker and envoy. See: docker/for-mac#2965 envoyproxy/envoy#6897 moby/moby#1143
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
I have a similar issue,
Title: gRPC 1.20.1 upgrade causes gRPC connections to fail
Description:
I use Envoy's gRPC-Web filter to integrate against a few gRPC applications built in Python. These applications use a secured connection.
Since they upgraded to gRPC v1.20.1 (from v1.19.x), Envoy has been unable to get proper responses. The version upgrade is the only change and other applications have no problem communicating with the updated services.
The error I am getting is gRPC status 14 with this message:
I tried running this against the current stable version as well as yesterday's dev build and got the same results.
I have also been unable to reproduce this issue in a minimal setup: I created a small gRPC application using Python and gRPC 1.20.1, and I could communicate with it over gRPC just fine. My little test service was unsecured but otherwise pretty similar, so I suspect a TLS issue.
The only other relevant thing I can think of is that all of these services are behind (TCP) load balancers.
Config:
My config is more or less the same as in the example. I fixed the indentation of the bottom part and (of course) updated addresses to match my services.
Logs:
These are the logs of a failing request (minus the request itself):
I tried a bunch of things on the Envoy config side, but nothing made any difference.
Could anyone point me at what might be causing the issue or what might help me get to the bottom of this?