Stream removed, when server stream takes more then 60 seconds in very specific scenario #31640
Comments
@jtattermusch can you do the initial triage and if it is a Core issue, can you re-assign? |
As far as I can tell, the server (or something between the server and the client) is doing the disconnect (the RST), since it is going in the direction 5000 -> 52687 (and the server is on port 5000). Could you provide a gRPC debug trace on the client side? See https://github.com/grpc/grpc/blob/master/TROUBLESHOOTING.md |
Since the server is likely running behind a proxy (has private IP and there's an "internal network hub"), this looks like a classic case of the proxy terminating the connection (which then shows up as a "reset / stream removed" on the client). It would be good to understand the scenario more:
|
Thanks for your reactions! Sorry for my late reply, I got caught up in other work :). I have changed the test scenario a bit to get some more information. To answer your questions:
Attached is the client-side logging when using Grpc.Core.Channel (ChannelOutput.log) and when using Grpc.Net.Client.GrpcChannel (GrpcChannelOutput.log) to make the same calls. |
@Pascal1986 Thank you for the additional information. A very brief look at the log files does confirm that the connection is being dropped on the non-client side (server or something in between). I will spend some time later seeing whether I can reproduce this, but I expect it is specific to your environment. I see you have provided extracts of the code used above, but are you able to provide complete example code (zip or GitHub repository) for a test client and server so I can make sure I'm testing exactly the same code as you are using? If you have any more information about firewalls or other network configuration, that might be helpful. Have you tried to reproduce this connecting to a server in your local network rather than Azure? |
The source code can be found at: https://github.com/Pascal1986/GrpcDemoRepository I will try to gather more information on the network configuration in the meantime. I also tested this on a server on the local network and everything worked as expected. Can you think of anything that differs between the two types of Channel that explains why everything works as expected for GrpcChannel and not for Channel? |
@Pascal1986 thank you for the code. It is working fine in my environment, but as I don't currently have access to Azure I know I'm not replicating your setup. There are a few things that you can do to try to diagnose the problem and the difference between the gRPC-core and gRPC-dotnet clients.
set GRPC_TRACE=http,http1,http_keepalive,http2_stream_state
set GRPC_VERBOSITY=DEBUG
This will give us the HTTP chatter and the keepalive pings.
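For readers who prefer to enable the same tracing from code rather than from the shell, here is a minimal Python sketch. The tracer names and the two variable names come from the commands above; the helper name `enable_grpc_debug_tracing` is hypothetical, and the sketch assumes the variables are set before the gRPC library is first loaded in the process, since they are read at initialization.

```python
import os

# Tracer names taken from the commands quoted in this thread.
TRACERS = ["http", "http1", "http_keepalive", "http2_stream_state"]


def enable_grpc_debug_tracing(env=os.environ):
    """Set the gRPC tracing variables in the given environment mapping.

    Hypothetical helper: call it before importing/initializing gRPC,
    or pass a plain dict to build the environment for a child process.
    """
    env["GRPC_TRACE"] = ",".join(TRACERS)
    env["GRPC_VERBOSITY"] = "DEBUG"
    return env


if __name__ == "__main__":
    enable_grpc_debug_tracing()
    print(os.environ["GRPC_TRACE"], os.environ["GRPC_VERBOSITY"])
```

Passing a copy of the environment (for example `enable_grpc_debug_tracing(dict(os.environ))`) keeps the change scoped to a subprocess instead of the current process.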
|
And the log file of the client using Channel |
@Pascal1986 thank you for the information. The Wireshark logs do show that the connection is being terminated by something in between the client and the server, since both the client and the server receive a RST from the "remote" port. I'm not sure if there are any known functional differences in the HTTP2 implementations (Grpc.Core and Grpc-dotnet); I'll have to get input from elsewhere. I know they are built on completely different network stacks. I also note that the HTTP2 implementation in the […] In the meantime, if you do find any information about what may be in the middle that might be terminating the connections, that might help. @jtattermusch @JamesNK - any ideas why Grpc.Core and Grpc-dotnet might behave differently? Or if there have been fixes to HTTP2 that need to be merged to the branch? |
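The inference in the comment above (both endpoints receive a RST that appears to come from the other side, so something in between is the likely source) can be sketched in Python. This is an illustration only: the `Packet` type and both function names are hypothetical, the captures are assumed to have been reduced to source port, destination port, and RST flag, and the ports 5000 and 52687 are the ones quoted earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of one captured TCP segment."""
    src_port: int
    dst_port: int
    rst: bool


def received_rst(capture, local_port):
    """True if the capture contains a RST addressed to local_port."""
    return any(p.rst and p.dst_port == local_port for p in capture)


def likely_terminator(client_capture, server_capture, client_port, server_port):
    """Guess who reset the connection from captures taken on both ends."""
    client_got = received_rst(client_capture, client_port)
    server_got = received_rst(server_capture, server_port)
    if client_got and server_got:
        # Each end sees a reset "from" the other: points at a device in between.
        return "middlebox"
    if client_got:
        return "server or middlebox"
    if server_got:
        return "client or middlebox"
    return "no reset seen"


if __name__ == "__main__":
    client_side = [Packet(src_port=5000, dst_port=52687, rst=True)]
    server_side = [Packet(src_port=52687, dst_port=5000, rst=True)]
    print(likely_terminator(client_side, server_side, 52687, 5000))
```

A capture from only one end cannot distinguish the peer from a device in between, which is why the sketch needs both captures before it reports "middlebox".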
We looked at what might cause the termination of the connection in the middle, but unfortunately could not find anything. Is there any more information on the difference between Grpc.Core and Grpc-dotnet? |
We did another session to try to pinpoint the issue somewhere in the connection between the client and the server.
We also played around with the time between the messages being sent over the stream
Also, is it possible for us to try using the Grpc.Core channel with a newer implementation of HTTP2, so that we can maybe find a solution this way? |
@Pascal1986 Without access to your network or the full network traces it is hard to say what is happening. I agree the behaviour doesn't seem to make sense with the scenarios 3 and 4 above. It is unlikely that all the changes to the core code for HTTP/2 will be back ported to the C# 1.46.x branch without a good reason to do so (as in, we can identify the exact issue that needs fixing). Two suggestions I have:
|
I got the ChannelOptions working on the server side by using Grpc.Core.Server. When playing around with some traces and settings, I found that sometimes the stream does continue successfully after 30 seconds. But this only works sometimes, under the following conditions:
I gathered the trace logs for the client and server for when it fails and when it succeeds: Two different errors are logged at the client when it fails; I have included both. Can you maybe find a reason why it sometimes works under these circumstances? |
@Pascal1986 The errors in the logs are consistent with something in the middle terminating the connections. So no wiser I'm afraid. |
Since we already put quite a bit of effort into investigating this and we don't seem to be getting anywhere, I'll close this issue as "not actionable". We can reopen if there are some more findings/inputs that would help us determine if this is an actual grpc bug or if something else in user's environment is at fault. |
What version of gRPC and what language are you using?
gRPC.Core 2.0.0.0 net standard 2.0
What operating system (Linux, Windows,...) and version?
Windows 10
What runtime / compiler are you using (e.g. python version or version of gcc)
Client within local network
Server in Azure with private IP connected to our local network using an internal Network hub.
What did you do?
A stream removed error occurs in the following specific scenario:
What did you expect to see?
A successful response
What did you see instead?
The following exception message:
Status(StatusCode="Unknown", Detail="Stream removed", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1668425723.993000000","description":"Error received from peer ipv4::5000","file":"......\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"Stream removed","grpc_status":2}")
Anything else we should know about your project / environment?
Wireshark capture:
![image](https://user-images.githubusercontent.com/9460303/201672376-46fae2ff-37f7-43cc-b1f3-134ad02db231.png)
Proto file:
Client code:
Server code:
I tried using the following ChannelOptions to prevent this, but without success:
grpc.keepalive_time_ms => 1000
grpc.http2.max_pings_without_data => 0
grpc.keepalive_permit_without_calls => 1
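For illustration, the three channel arguments listed above can be written as a Python options list. This is a sketch, not the code from this issue (which uses the C# `ChannelOption` type): the names `KEEPALIVE_OPTIONS` and `open_channel` are hypothetical, and `open_channel` assumes the `grpcio` package is installed. The key/value pairs are exactly those listed above.

```python
# Channel arguments as (name, value) pairs, copied from the list above.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 1000),            # send a keepalive ping every second
    ("grpc.http2.max_pings_without_data", 0),    # 0 = no limit on pings without data
    ("grpc.keepalive_permit_without_calls", 1),  # allow pings with no active call
]


def open_channel(target, options=KEEPALIVE_OPTIONS):
    """Open an insecure channel with the keepalive options applied.

    Hypothetical helper; grpc is imported lazily so the options list
    can be inspected without grpcio installed.
    """
    import grpc  # requires the grpcio package

    return grpc.insecure_channel(target, options=options)


if __name__ == "__main__":
    for name, value in KEEPALIVE_OPTIONS:
        print(f"{name} => {value}")
```

As reported above, these settings did not prevent the reset in this environment, so the sketch documents what was tried rather than a fix.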