x/net/http2: It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows #66372
Comments
@golang/windows, @neild can you have a look at this?
This is using grpc-go, which uses the HTTP/2 frame encoder/decoder from The framer is reasonably straightforward. It reads from an I think the first thing I'd try is to dump all the bytes passing through the socket on both sides of the connection, to see if they're making it through as expected, and if not, what the variation is.
I also ran into this issue. After digging into it, I found that the problem is not corruption of the reader's buffer but a mismatch between the number of bytes actually copied into the buffer and the value returned by Read. Read on a Unix domain socket ultimately calls the Read method of FD in go/src/internal/poll/fd_windows.go, which issues the WSARecv system call through execIO, passing a pointer to o.qty as the bytes-transferred parameter.
go/src/internal/poll/fd_windows.go Line 437 in a65a2bb
The completion of WSARecv is handled by netpoll using GetQueuedCompletionStatusEx in go/src/runtime/netpoll_windows.go.
go/src/runtime/netpoll_windows.go Line 120 in a65a2bb
In the handlecompletion function, op.qty is then set to the qty returned by WSAGetOverlappedResult.
func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
The issue is that by the time execIO wakes up from waiting, o.qty has changed to a value different from the one set in handlecompletion. To test this theory, I added a field, pqty, to the operation and net_op structures to preserve the original value of op.qty, and changed handlecompletion as follows.
func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
Then I compared o.pqty and o.qty after execIO returned, as follows.
The result confirmed that the two values differ.
unexpected bytes transferred o.qty: 1082 , o.pqty: 26 , n: 1082
I'm not 100% sure when the overwrite happens, but I suspect that passing a pointer to o.qty as a parameter of WSARecv allows the value set in handlecompletion to be overwritten afterwards.
That said, further investigation is needed to pinpoint the root cause.
cc @golang/runtime @golang/windows
@seankhliao, while running various tests based on @abarelk's comment, I noticed that TCP and Unix domain sockets use different FileCompletionNotificationModes: TCP sets syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, whereas Unix domain sockets do not. After adding 'unix' as follows, it works without the issue described above. In this test, I did not include the change of passing nil instead of &o.qty that @abarelk tested.
Is there any reason why Unix domain sockets do not use syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS?
Go version
go version go1.21.8 windows/amd64
Output of go env in your module/workspace:
What did you do?
I initially observed these errors in a Unix domain socket connection between daprd and its pluggable component. After modifying the RouteChat code in grpc-go/examples/route_guide, I was able to reproduce them.
The issue is reproducible on other machines as well, and it occurs consistently across different versions like go1.19 and go1.22. Additionally, it persists on Windows 11.
What I did
The changes to server.go and client.go can be found at this repository.
Use Unix domain socket
Include a dummy string in the Message for sizing purposes
Continuously send requests without concurrency and log status
Respond to received requests as they are
The line where the error occurred during debugging
D:\github.com\grpc-go\examples\vendor\golang.org\x\net\http2\frame.go
or
D:\github.com\grpc-go\examples\vendor\google.golang.org\grpc\internal\transport\http2_client.go
How to reproduce
Install go and set up environment variables
If 'GODEBUG: http2debug=2' is set, remove it. With this debug logging enabled, no errors were observed.
Clone the repositories and compile for Unix domain socket
Execute 'server.exe' and 'client.exe' in separate command windows.
Be prepared to wait for errors; they may occur after several minutes or even tens of minutes.
What did you see happen?
Try 1
Error occurred within 15 seconds.
server.exe logs
client.exe logs
Try 2
Error occurred within 1 second.
server.exe logs
client.exe logs
Try 3
Error occurred within 1 minute and 16 seconds.
server.exe logs
client.exe logs
Try 4 - seconds later
Error occurred within 2 minutes and 27 seconds.
server.exe logs
client.exe logs
Try 5
Error occurred within 2 minutes and 3 seconds.
server.exe logs
client.exe logs
Try 6
Error occurred within 2 minutes and 47 seconds.
This run was in debugging mode.
server.exe logs
client.exe logs
What did you expect to see?
Continue running as if using a TCP socket connection.
When using TCP connection, there are no code differences besides the address specified in 'server.go' and 'client.go'. They can be found at this repository.