[BUG] WSAEFAULT #27292
Comments
Hi @Kiddinglife. Following up, did you determine there was no bug here? |
I am now testing it with 1.40.0. I can see there is a bug fix in that version regarding Winsock. |
Not working. |
Do you have the exact line of code from the logs or trace that triggered the WSAEFAULT? |
@nicolasnoble please correct me if I'm wrong -
We are only checking for
I'm not sure about this but it doesn't seem to be used anyway, so it should be safe to use NULL. #27361
That does seem to be the case. @nicolasnoble do we need a lock here?
The
These achieve two separate things. In the first case, we already have a result from the read call and we just want to invoke the
I believe this is a shortcut taken for performance reasons, so as to avoid the asynchronous path. |
(Closed by mistake by clicking on |
What I'm really asking is for a log extract so I can see more specifically what's going on here. I have no idea what line of code in our codebase can emit the "overlapped WSARecv call in gRPC that fails with a WSAEFAULT error" message for instance. So before actually committing to investigate further, I'd like to make sure we have enough context and data, which we don't. |
Hey @nico, I am now also very confused about the error. You get the EFAULT error, and then the stream is removed and disconnected. Nothing special, I would say. As I said, I suspect a buffer slice overwrite/corruption, whether on the heap or the stack, caused by a potential race condition between IOCP and the background threads. I am now running an experiment using only the first |
Hi @Kiddinglife, as @nicolasnoble said, if you can provide logs/traces that show the error, that would be very helpful. |
Hi @yashykt, [2021-9-20 14:4:43.597] GRPCPlugin(Runtime) [3016] INFO: OP[message_decompress:185D1400]: CANCEL:{"created":"@1632164683.597000000","description":"Error in HTTP transport completing operation","file":"E:\GDK-VLT\Runtime\core\GDK\Externals\gRPC_1_40_0\src\core\ext\transport\chttp2\transport\chttp2_transport.cc","file_line":1236,"referenced_errors":[{"created":"@1632164683.597000000","description":"Attempt to send initial metadata after stream was closed","file":"E:\GDK-VLT\Runtime\core\GDK\Externals\gRPC_1_40_0\src\core\ext\transport\chttp2\transport\chttp2_transport.cc","file_line":1472,"referenced_errors":[{"created":"@1632164683.595000000","description":"Endpoint read failed","file":"E:\GDK-VLT\Runtime\core\GDK\Externals\gRPC_1_40_0\src\core\ext\transport\chttp2\transport\chttp2_transport.cc","file_line":2520,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1632164683.594000000","description":"OS Error","file":"E:\GDK-VLT\Runtime\core\GDK\Externals\gRPC_1_40_0\src\core\lib\iomgr\tcp_windows.cc","file_line":307,"os_error":"The system detected an invalid pointer address in attempting to use a pointer argument in a call.\r\n","syscall":"WSARecv","wsa_error":10014}]}]},{"created":"@1632164683.597000000","description":"Attempt to send trailing metadata after stream was closed","file":"E:\GDK-VLT\Runtime\core\GDK\Externals\gRPC_1_40_0\src\core\ext\transport\chttp2\transport\chttp2_transport.cc","file_line":1548}],"target_address":"ipv6:[::1]:50506"} |
#27457 might be related |
@nicolasnoble is driving this |
I am concerned that allocating the WSABUF buffers[MAX_WSABUF_COUNT] on the stack is problematic. Could that explain the error reported by @cfandrich? grpc/src/core/lib/iomgr/tcp_windows.cc Line 248 in ea389c0
I have similar concerns for the WSASend side. |
This issue has become a collection of Windows I/O bugs, some of which have been fixed. I'm going to focus exclusively on the WSAEFAULT and its reproduction cases. For any other problems that are still relevant, please file a separate issue. @cfandrich can you create a minimal, complete example that reproduces this issue? If it is still reproducible on the latest gRPC version, I'll be curious to dig in. If you suspect use-after-free and can't produce an example, then consider using some Windows tools (some built into Visual Studio) that can help catch various memory corruption problems. You could consider running your application with a JIT debugger enabled, or running it in a debugger with various detection tools enabled.
@Schramp I believe we discussed this on another issue, and it is fine per the Windows API spec. |
Yes, we already discussed this (#28432 (comment)), and you rightfully corrected my misunderstanding of the API. So you should ignore my remark on the stack allocation of buffers[MAX_WSABUF_COUNT]. |
More than 30 days have passed since label "disposition/requires reporter action" was added. Closing this issue. Please feel free to re-open/create a new issue if this is still relevant. |
Problem
Briefly, the error is an overlapped WSARecv call in gRPC that fails with a WSAEFAULT error.
What version of gRPC and what language are you using?
1.37.1
What operating system (Linux, Windows,...) and version?
Windows 10
What runtime / compiler are you using (e.g. python version or version of gcc)
Python3.9 and VS2019 compiler
What did you do?
Ran a sample ping-pong program with the server side written in C++ and the client side written in C#, and let it run overnight.
What did you expect to see?
no errors
What did you see instead?
WSAEFAULT
I suspect this line of code causes the bug:
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 290 in ea389c0
According to the MSDN API doc,
is it handled correctly when the error is something other than WSAEWOULDBLOCK? I am wondering if the right check here is if (info->wsa_error == 0).
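The three cases this question distinguishes can be written out in isolation. The following is only a sketch under stated assumptions, not the code in tcp_windows.cc: `ClassifyInlineRead` and `InlineReadOutcome` are hypothetical names, and the Winsock error values are copied in as plain constants so the snippet builds without the Windows headers.

```cpp
// Winsock error codes, copied as plain constants (values from winerror.h).
constexpr int kWsaEFault = 10014;       // WSAEFAULT
constexpr int kWsaEWouldBlock = 10035;  // WSAEWOULDBLOCK

// Hypothetical outcome of the first, non-overlapped WSARecv attempt.
enum class InlineReadOutcome {
  kCompleted,        // data (or EOF) was returned synchronously
  kRetryOverlapped,  // nothing available yet; post an overlapped read
  kFailed            // a real error that should be surfaced to the caller
};

// Hypothetical helper: `status` is the WSARecv return value and `last_error`
// is what WSAGetLastError() would report when `status` is nonzero.
InlineReadOutcome ClassifyInlineRead(int status, int last_error) {
  if (status == 0) return InlineReadOutcome::kCompleted;
  if (last_error == kWsaEWouldBlock) return InlineReadOutcome::kRetryOverlapped;
  return InlineReadOutcome::kFailed;
}
```

With this split, a WSAEFAULT from the first call lands in `kFailed` instead of being treated like a successful synchronous read.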
Also, why not continue reading more data when the number of received bytes equals the recv buffer size, in case there is more data in the socket buffer? There should be a while loop around the non-overlapped WSARecv until it gets a WSAEWOULDBLOCK error.
I also suspect this line of code causes the bug:
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 299 in ea389c0
According to the MSDN API doc,
So I am wondering if &bytes_read should be changed to NULL.
Also, I am wondering how this piece of code works:
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 192 in ea389c0
Why is it checking tcp->shutting_down instead of checking the error code returned from WSARecv to detect the local/remote socket shutdown?
Also, why does it not take the lock when reading the value of tcp->shutting_down? I can see it is protected by the lock when it is set to true.
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 456 in ea389c0
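To make the locking question concrete, here is a minimal sketch of the symmetric pattern being asked about, where the reader takes the same mutex as the writer. `Endpoint` is a hypothetical stand-in and not gRPC's actual structure; whether gRPC needs the lock on the read side is exactly what is being asked.

```cpp
#include <mutex>

// Hypothetical stand-in for an endpoint; not gRPC's grpc_tcp struct.
class Endpoint {
 public:
  // Writer side: set the flag while holding the mutex.
  void Shutdown() {
    std::lock_guard<std::mutex> lock(mu_);
    shutting_down_ = true;
  }

  // Reader side: take the same mutex, so the read cannot race with the write.
  bool IsShuttingDown() const {
    std::lock_guard<std::mutex> lock(mu_);
    return shutting_down_;
  }

 private:
  mutable std::mutex mu_;  // mutable so the const reader can lock it
  bool shutting_down_ = false;
};
```

A `std::atomic<bool>` would be another way to get a race-free read without the mutex, at the cost of no longer being able to update the flag together with other guarded state.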
Also, why not use the error code from the next WSARecv or WSASend call to detect the local/remote socket abort/shutdown? The use of tcp->shutting_down looks a bit dodgy.
Also, there is an inconsistency here:
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 292 in ea389c0
grpc/src/core/lib/iomgr/tcp_windows.cc
Line 311 in ea389c0
Finally, I am wondering if it is correct to mix the overlapped and non-overlapped WSARecv APIs.
When the socket is created, I can see it is an overlapped socket. So why is the non-overlapped WSARecv used here?
grpc/src/core/lib/iomgr/socket_windows.cc
Line 188 in fd3bd70
Proposed Solution:
The way gRPC operates on Winsock looks a little bit dodgy. The Winsock APIs are quite hard to get right.
Please refer to libuv as an example of how to operate Winsock: use either overlapped or non-overlapped WSARecv.
Overlapped WSARecv: https://github.com/libuv/libuv/blob/9918a1743816dc49d6c664e41641d78ffd4a0705/src/win/tcp.c#L513
Non-overlapped WSARecv: https://github.com/libuv/libuv/blob/9918a1743816dc49d6c664e41641d78ffd4a0705/src/win/tcp.c#L1068
I will recompile gRPC with the proposed changes later, and then let us see how it goes.