C# crashes with AccessViolation exception if a unary call is started after all the channels have been shut down #19090
Comments
Seeing the native stack would be useful. The native symbols are in the Grpc.Core.NativeDebug packages. |
Also, what kind of error does the Python server return? Do you have a reproduction of the problem? Please see https://github.com/grpc/grpc/blob/master/TROUBLESHOOTING.md and try to include some extra logs. |
Hi,
Thanks for your response.
The crash is not systematic. For now I cannot tell whether other factors contribute to the bug.
I will try to get the native stack on the next occurrence.
Thanks
|
Hi,
Is this error more helpful? Could it be connected to the fact that I may try to cancel the thread?
```
Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
at Grpc.Core.Internal.CompletionQueueSafeHandle.ReleaseHandle()
at System.Runtime.InteropServices.SafeHandle.InternalDispose()
at Grpc.Core.Internal.AsyncCall`2[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].UnaryCall(System.__Canon)
at Grpc.Core.DefaultCallInvoker.BlockingUnaryCall[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Grpc.Core.Method`2<System.__Canon,System.__Canon>, System.String, Grpc.Core.CallOptions, System.__Canon)
at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.__Canon, Grpc.Core.Interceptors.ClientInterceptorContext`2<System.__Canon,System.__Canon>)
at Grpc.Core.ClientBase+ClientBaseConfiguration+ClientBaseConfigurationInterceptor.BlockingUnaryCall[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.__Canon, Grpc.Core.Interceptors.ClientInterceptorContext`2<System.__Canon,System.__Canon>, BlockingUnaryCallContinuation`2<System.__Canon,System.__Canon>)
at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Grpc.Core.Method`2<System.__Canon,System.__Canon>, System.String, Grpc.Core.CallOptions, System.__Canon)
at PredictiveEstimation+PredictiveEstimationClient.GetPredictiveEstimation(EstimatorRequest, Grpc.Core.CallOptions)
at PredictiveEstimation+PredictiveEstimationClient.GetPredictiveEstimation(EstimatorRequest, Grpc.Core.Metadata, System.Nullable`1<System.DateTime>, System.Threading.CancellationToken)
at W6GISEProvider.W6GISEProvider.W6GISEProvider.IW6GISEstimationProviderServices.QueryRoute(W6GISObjectModel.W6Location, W6GISObjectModel.W6Location, W6GISObjectModel.W6RouteParameters, Int32, Int32)
at W6ObjectModel.W6GISERetriever.GetRouteFromProvider()
at System.Threading.Tasks.Task.Execute()
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
```
|
I'm seeing the same exception/stack trace on 2.26.0, and (in my case) it seems to be related to calls being made around the time of a channel being shut down. I have been able to capture the following native stack:
I have also been able to put together a fairly compact repro, though it seems timing-dependent, so may take some time for the access violation to occur. You should expect to see "Caught" being written to the console every few seconds, followed eventually by the exception in question. I have tested x64 only in both Debug and Release modes:
The above code uses the following .proto definitions for the service:
|
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions! |
@cosborne83 on what platform do I have to run to reproduce? |
@olivieruzan what exactly do you mean by "try to Cancel the thread?" |
@cosborne83 it looks like you only reproduced the exception that is thrown when you try to start a call on an already-disposed channel (which is working as expected; you shouldn't try to invoke calls on a disposed channel), but you haven't reproduced the crash that @olivieruzan mentioned. |
For now I'll close the issue as "cannot reproduce", since the instructions to reproduce aren't clear. I can reopen it if more evidence is provided. |
@jtattermusch - I've created https://github.com/cosborne83/GrpcBug which contains a self-contained Visual Studio 2019 solution that reproduces the original AccessViolation issue per the code in my comment above and using the latest gRPC NuGet package (2.28.1). I have confirmed it reproduces the issue on Windows (10, v1909, build 18363.778 in my case, though I don't believe OS is a factor), and in both Debug/Release configuration for both x86 and x64 targets. My expectation is that you would simply see
|
@cosborne83 I can reproduce using the code you provided. I will investigate. |
@cosborne83 some results on analyzing your reproduction:
so basically, your example can be simplified to this:
So basically you're trying to start a new call on a channel after the shutdown of that channel has been requested and has also fully finished (the […]).
That said, while your example doesn't really do anything that a real application should ever do (and is thus mostly theoretical), it does indeed crash, and that's wrong.
Possible workarounds:
|
I think the unary call crashes here:
(when one attempts to start a unary call after all channels have been shut down and thus the internal gRPC environment has been shut down too). |
@jtattermusch - I wouldn't say it's entirely theoretical, the basis of the repro I posted was a slimmed-down version of our production client code. In our case we use short-lived SSL certificates for client authentication, and found that if the certificate was rejected by the server (e.g. due to it having expired), that the We have however managed to work around the issue using the "dummy channel" approach you mention. |
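For readers landing here, the "dummy channel" workaround mentioned above can be sketched roughly as follows. This is only an illustration under assumptions, not code from this thread: the class name `GrpcKeepAlive` and the target `localhost:1` are hypothetical, and the sketch relies on the behavior described earlier, namely that the internal gRPC environment is shut down only once all channels have been shut down. To my understanding a `Channel` does not connect until a call is started or a connection is explicitly requested, so the placeholder target is never contacted.

```csharp
using System.Threading.Tasks;
using Grpc.Core;

// Hypothetical helper (not part of Grpc.Core): holds one idle channel open so
// that shutting down the "real" channels never leaves zero live channels.
public static class GrpcKeepAlive
{
    private static readonly object Gate = new object();
    private static Channel dummyChannel;

    // Call once at application startup, before real channels are created.
    public static void Start()
    {
        lock (Gate)
        {
            if (dummyChannel == null)
            {
                // Placeholder target (assumption): no call is ever started on
                // this channel, so it should never attempt to connect.
                dummyChannel = new Channel("localhost:1", ChannelCredentials.Insecure);
            }
        }
    }

    // Call once at application exit, after all real channels are shut down
    // and no further calls will be started.
    public static Task StopAsync()
    {
        lock (Gate)
        {
            Channel channel = dummyChannel;
            dummyChannel = null;
            return channel == null ? Task.FromResult(0) : channel.ShutdownAsync();
        }
    }
}
```

This only avoids the crash path; starting a call on a channel that has already been shut down is still expected to throw, so callers still need to handle that exception and create a fresh channel.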
@cosborne83 a fix that prevents the crash on the grpc side is here: #23003. What I meant by "theoretical" is that the issue is completely preventable in the user code (maybe my wording wasn't the best but you get the idea). |
Using gRPC 1.19.0 for C# as a NuGet package with Visual Studio 2017 on Windows Server 2012 R2.
The C# program crashes with an AccessViolation error when the remote Python server returns an error: