Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows #66372

Open
dizecto opened this issue Mar 18, 2024 · 5 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@dizecto
Copy link

dizecto commented Mar 18, 2024

Go version

go version go1.21.8 windows/amd64

Output of go env in your module/workspace:

set GO111MODULE=
set GOARCH=amd64
set GOBIN=C:\Users\dze\go\bin
set GOCACHE=C:\Users\dze\AppData\Local\go-build
set GOENV=C:\Users\dze\AppData\Roaming\go\env  
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=git.ncsoft.net/*
set GOMODCACHE=C:\Users\dze\go\pkg\mod
set GONOPROXY=git.ncsoft.net/*
set GONOSUMDB=git.ncsoft.net/*
set GOOS=windows
set GOPATH=C:\Users\dze\go
set GOPRIVATE=git.ncsoft.net/*
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.21.8
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=0
set GOMOD=D:\github.com\grpc-go\examples\go.mod
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\dze\AppData\Local\Temp\go-build297662695=/tmp/go-build -gno-record-gcc-switches

What did you do?

I initially observed these errors in Unix domain socket connection between daprd and its pluggable component. After modifying the RouteChat code in grpc-go/examples/route_guide, I was able to reproduce them.
The issue is reproducible on other machines as well, and it occurs consistently across different versions like go1.19 and go1.22. Additionally, it persists on Windows 11.

What I did

The changes to server.go and client.go can be found at this repository.

Use Unix domain socket

    // server.go
    os.Remove("D://temp/test.sock")
    lis, err := net.Listen("unix", "D://temp/test.sock")

    // client.go
    serverAddr = flag.String("addr", "unix:///temp/test.sock", "The server address in the format of host:port")

Include a dummy string in the Message for sizing purposes

    // client.go
    {Location: &pb.Point{Latitude: 0, Longitude: 1}, Message: "First message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"}

Continuously send requests without concurrency and log status

    // client.go
    for {
        for _, note := range notes {
            if err := stream.Send(note); err != nil {
                log.Fatalf("client.RouteChat: stream.Send(%v) failed: %v, outstanding: %d", note, err, outstanding)
            }

            atomic.AddInt64(&sent, 1)
            // current := atomic.AddInt64(&outstanding, 1)
            atomic.AddInt64(&outstanding, 1)
            if true {
                if sent%10000 == 1 {
                    log.Printf("client.RouteChat: a lot of outstandings:  %4d, sent: %d, recv: %d", outstanding, sent, recv)
                }
                // time.Sleep(time.Microsecond)
                if sent%10000 == 1 {
                    log.Printf("client.RouteChat: decreased outstandings: %4d, sent: %d, recv: %d", outstanding, sent, recv)
                }
            }

            //<-recvc
        }
    }

Respond to received requests as they are

// server.go
func (s *routeGuideServer) RouteChat(stream pb.RouteGuide_RouteChatServer) error {
    for {
        in, err := stream.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        if err := stream.Send(in); err != nil {
            return err
        }
    }
}

The line where the error occurred during debugging

D:\github.com\grpc-go\examples\vendor\golang.org\x\net\http2\frame.go

func (fr *Framer) ReadFrame() (Frame, error) {
	fr.errDetail = nil
	if fr.lastFrame != nil {
		fr.lastFrame.invalidate()
	}
	fh, err := readFrameHeader(fr.headerBuf[:], fr.r)
	if err != nil {
		return nil, err
	}
	if fh.Length > fr.maxReadSize {
		return nil, ErrFrameTooLarge    // <-----
	}
    ...

or
D:\github.com\grpc-go\examples\vendor\google.golang.org\grpc\internal\transport\http2_client.go

func (t *http2Client) reader(errCh chan<- error) {
	defer close(t.readerDone)

	if err := t.readServerPreface(); err != nil {
		errCh <- err
		return
	}
	close(errCh)
	if t.keepaliveEnabled {
		atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
	}

	// loop to keep reading incoming messages on this transport.
	for {
		t.controlBuf.throttle()
		frame, err := t.framer.fr.ReadFrame()
		if t.keepaliveEnabled {
			atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
		}
		if err != nil {
			// Abort an active stream if the http2.Framer returns a
			// http2.StreamError. This can happen only if the server's response
			// is malformed http2.
			if se, ok := err.(http2.StreamError); ok {
				t.mu.Lock()
				s := t.activeStreams[se.StreamID]
				t.mu.Unlock()
				if s != nil {
					// use error detail to provide better err message
					code := http2ErrConvTab[se.Code]
					errorDetail := t.framer.fr.ErrorDetail()
					var msg string
					if errorDetail != nil {
						msg = errorDetail.Error()
					} else {
						msg = "received invalid frame"
					}
					t.closeStream(s, status.Error(code, msg), true, http2.ErrCodeProtocol, status.New(code, msg), nil, false)
				}
				continue
			} else {
				// Transport error.
				t.Close(connectionErrorf(true, err, "error reading from server: %v", err))
				return
			}
		}
		switch frame := frame.(type) {
		case *http2.MetaHeadersFrame:
			t.operateHeaders(frame)
		case *http2.DataFrame:
			t.handleData(frame)
		case *http2.RSTStreamFrame:
			t.handleRSTStream(frame)
		case *http2.SettingsFrame:
			t.handleSettings(frame, false)
		case *http2.PingFrame:
			t.handlePing(frame)
		case *http2.GoAwayFrame:
			t.handleGoAway(frame)
		case *http2.WindowUpdateFrame:
			t.handleWindowUpdate(frame)
		default:
			if logger.V(logLevel) {
				logger.Errorf("transport: http2Client.reader got unhandled frame type %v.", frame) // <-----
			}
		}
	}
}

How to reproduce

Install go and set up environment variables

  1. Install go1.21..windows-amd64.msi
  2. Set the following environment variables.
GRPC_GO_LOG_SEVERITY_LEVEL: info
GRPC_GO_LOG_VERBOSITY_LEVEL: 99

And if 'GODEBUG: http2debug=2' is present, delete it. When printing this debug log, no errors were observed to occur.

Clone the repositories and compile for Unix domain socket

> mkdir github.com
> cd github.com
github.com> git clone --depth 1 --branch v1.62.1 https://github.com/grpc/grpc-go.git
github.com> git clone https://github.com/dizecto/grpc-go-uds-test.git
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\client\client.go .\grpc-go\examples\route_guide\client\client.go
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\server\server.go .\grpc-go\examples\route_guide\server\server.go
github.com> cd grpc-go\exmaples
github.com\grpc-go\exmaples> go mod vendor
github.com\grpc-go\exmaples> go build -mod vendor route_guide\server\server.go
github.com\grpc-go\exmaples> go build -mod vendor route_guide\client\client.go

Execute 'server.exe' and 'client.exe' in separate command windows.

  1. Create a "temp" directory in the D drive
github.com\grpc-go\exmaples> mkdir D:\temp
  1. Launch 'server.exe'
github.com\grpc-go\exmaples> server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created

  1. Launch 'client.exe'
github.com\grpc-go\exmaples> client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}

  1. Wait for errors

Be prepared to wait for errors; they may occur after several minutes or even tens of minutes.

What did you see happen?

Try 1

Error occurred within 15 seconds.

server.exe logs

D:\github.com\grpc-go\examples>server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 10:14:28 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] Closing: http2: frame too large
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] loopyWriter exiting with error: transport closed by client

client.exe logs

D:\github.com\grpc-go\examples>client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 10:14:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 10:14:43 INFO: [transport] [client-transport 0xc0000e6000] loopyWriter exiting with error: transport closed by client
2024/03/14 10:14:43 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.

Try 2

Error occurred within 1 seconds.

server.exe logs

D:\github.com\grpc-go\examples>server.exe
2024/03/14 11:18:32 INFO: [core] [Server #1] Server created
2024/03/14 11:18:33 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 11:18:38 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] Closing: connection error: PROTOCOL_ERROR
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: transport closed by client

client.exe logs

D:\github.com\grpc-go\examples>client.exe
2024/03/14 11:18:38 INFO: [core] [Channel #1] Channel created
2024/03/14 11:18:38 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 11:18:38 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 11:18:39 client.RouteChat: decreased outstandings:  115, sent: 430001, recv: 429886
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 11:18:39 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] loopyWriter exiting with error: transport closed by client
2024/03/14 11:18:39 client.RouteChat: stream.Send(location:{longitude:1} message:"Fourth message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 182

Try 3

Error occurred within 1 minute and 16 seconds.

server.exe logs

D:\github.com\grpc-go\examples>server.exe
2024/03/14 16:04:31 INFO: [core] [Server #1] Server created
2024/03/14 16:04:31 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 16:04:34 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] Closing: http2: frame too large
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: connection error: desc = "transport is closing"

client.exe logs

D:\github.com\grpc-go\examples>client.exe
2024/03/14 16:04:34 INFO: [core] [Channel #1] Channel created
2024/03/14 16:04:34 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 16:04:34 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 16:05:50 client.RouteChat: decreased outstandings:  126, sent: 19350001, recv: 19349875
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 16:05:50 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] loopyWriter exiting with error: transport closed by client
2024/03/14 16:05:50 INFO: [core] [pick-first-lb 0xc00019be60] Received SubConn state update: 0xc00019bf80, {ConnectivityState:IDLE ConnectionError:<nil>}
2024/03/14 16:05:50 INFO: [core] [Channel #1] Channel Connectivity change to IDLE
2024/03/14 16:05:50 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.

Try 4 - seconds later

Error occurred within 2 minutes and 27 seconds.

server.exe logs

D:\github.com\grpc-go\examples>server.exe
2024/03/14 16:09:59 INFO: [core] [Server #1] Server created
2024/03/14 16:09:59 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 16:10:02 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 16:12:29 INFO: [transport] [server-transport 0xc000084340] Closing: read unix D://temp/test.sock->@: wsarecv: An existing connection was forcibly closed by the remote host.
2024/03/14 16:12:29 INFO: [transport] [server-transport 0xc000084340] loopyWriter exiting with error: transport closed by client

client.exe logs

D:\github.com\grpc-go\examples>client.exe
2024/03/14 16:10:02 INFO: [core] [Channel #1] Channel created
2024/03/14 16:10:02 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 16:10:02 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 16:12:29 client.RouteChat: decreased outstandings:  136, sent: 37900001, recv: 37899865
2024/03/14 16:12:29 INFO: [transport] [client-transport 0xc0000c1200] Closing: connection error: desc = "error reading from server: http2: frame too large"
2024/03/14 16:12:29 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 16:12:29 INFO: [transport] [client-transport 0xc0000c1200] loopyWriter exiting with error: transport closed by client
2024/03/14 16:12:29 INFO: [core] [pick-first-lb 0xc00021a030] Received SubConn state update: 0xc00021a180, {ConnectivityState:IDLE ConnectionError:<nil>}
2024/03/14 16:12:29 client.RouteChat: stream.Send(location:{longitude:3} message:"Third message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 181

Try 5

Error occurred within 2 minutes and 3 seconds.

server.exe logs

D:\github.com\grpc-go\examples>server.exe
2024/03/14 16:40:26 INFO: [core] [Server #1] Server created
2024/03/14 16:40:26 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 16:40:30 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 16:42:33 INFO: [transport] [server-transport 0xc0001ae000] Closing: http2: frame too large
2024/03/14 16:42:33 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: connection error: desc = "transport is closing"

client.exe logs

D:\github.com\grpc-go\examples>client.exe
2024/03/14 16:40:30 INFO: [core] [Channel #1] Channel created
2024/03/14 16:40:30 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 16:40:30 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 16:42:33 client.RouteChat: decreased outstandings:  137, sent: 31430001, recv: 31429864
2024/03/14 16:42:33 INFO: [transport] [client-transport 0xc000141680] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 16:42:33 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 16:42:33 INFO: [transport] [client-transport 0xc000141680] loopyWriter exiting with error: transport closed by client
2024/03/14 16:42:33 client.RouteChat: stream.Send(location:{longitude:2} message:"Second message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 169

Try 6

Error occurred within 2 minutes and 47 seconds.
This was debugging mode.

server.exe logs

Starting: C:\Users\dze\go\bin\dlv.exe dap --listen=127.0.0.1:63676 from d:\github.com\grpc-go\examples\route_guide\server
DAP server listening at: 127.0.0.1:63676
Type 'dlv help' for list of commands.
2024/03/18 13:47:27 INFO: [core] [Server #1] Server created
2024/03/18 13:47:27 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/18 13:47:46 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/18 13:50:33 INFO: [transport] [server-transport 0xc000057ba0] Closing: read unix D://temp/test.sock->@: wsarecv: An existing connection was forcibly closed by the remote host.
2024/03/18 13:50:33 INFO: [transport] [server-transport 0xc000057ba0] loopyWriter exiting with error: write unix D://temp/test.sock->@: wsasend: An existing connection was forcibly closed by the remote host.

client.exe logs

Starting: C:\Users\dze\go\bin\dlv.exe dap --listen=127.0.0.1:63687 from d:\github.com\grpc-go\examples\route_guide\client
DAP server listening at: 127.0.0.1:63687
Type 'dlv help' for list of commands.
2024/03/18 13:47:46 INFO: [core] [Channel #1] Channel created
2024/03/18 13:47:46 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/18 13:47:46 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2024/03/18 13:47:46 INFO: [core] [Channel #1] Channel authority set to "localhost"
...
2024/03/18 13:50:33 client.RouteChat: decreased outstandings:  168, sent: 16110001, recv: 16109833
2024/03/18 13:50:33 ERROR: [transport] transport: http2Client.reader got unhandled frame type [FrameHeader UNKNOWN_FRAME_TYPE_215 len=2].
2024/03/18 13:50:33 ERROR: [transport] transport: http2Client.reader got unhandled frame type [FrameHeader UNKNOWN_FRAME_TYPE_210 flags=0x2|0x8 stream=34603794 len=2].
2024/03/18 13:50:33 INFO: [transport] [client-transport 0xc0000d6000] Closing: connection error: desc = "error reading from server: http2: frame too large"
2024/03/18 13:50:33 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/18 13:50:33 INFO: [transport] [client-transport 0xc0000d6000] loopyWriter exiting with error: transport closed by client
2024/03/18 13:50:33 client.RouteChat: stream.Send(location:{longitude:3}  message:"Sixth message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 162
Process 63280 has exited with status 1
Detaching

What did you expect to see?

Continue running as if using a TCP socket connection.

When using TCP connection, there are no code differences besides the address specified in 'server.go' and 'client.go'. They can be found at this repository.

@dizecto dizecto changed the title import/path: issue It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows Mar 18, 2024
@dr2chase
Copy link
Contributor

@golang/windows, @neild can you have a look at this?

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 18, 2024
@neild
Copy link
Contributor

neild commented Mar 18, 2024

This is using grpc-go, which uses the HTTP/2 frame encoder/decoder from golang.org/x/net/http2 but no other parts of the HTTP/2 implementation.

The framer is reasonably straightforward. It reads from an io.Reader and writes to an io.Writer, and doesn't have any interactions with the network other than through those interfaces. So I think that it's unlikely that the root cause lies there (although perhaps there's something I'm not thinking of), since I'd expect to see any bugs to show up when the underlying transport is TCP as well.

I think the first thing I'd try is to dump all the bytes passing through the socket on both sides of the connection, to see if they're making it through as expected, and if not, what the variation is.

@seankhliao seankhliao changed the title It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows x/net/https: It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows Mar 22, 2024
@seankhliao seankhliao changed the title x/net/https: It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows x/net/http2: It seems like frames are getting corrupted in HTTP/2 when using a Unix domain socket on Windows Mar 22, 2024
@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 22, 2024
@abarelk
Copy link

abarelk commented Apr 4, 2024

I also experienced this issue. After digging into the problem, I figured out that the problem is not the corruption of the reader buffer but the mismatch between the length of buffer actually copied and the value returned by the Read function.

The Read for the unix domain socket ultimately calls the Read method of FD in go/src/internal/poll/fd_windows.go, which calls system call WSARecv through execIO with o.qty pointer for bytes transfered parameter.

	n, err = execIO(o, func(o *operation) error {
		return syscall.WSARecv(o.fd.Sysfd, &o.buf, 1, &o.qty, &o.flags, &o.o, nil)
	})

return syscall.WSARecv(o.fd.Sysfd, &o.buf, 1, &o.qty, &o.flags, &o.o, nil)

The callback for the WSARecv is handled by netpoll using GetQueuedCompletionStatusEx in go/src/runtime/netpoll_windows.go.

if stdcall6(_GetQueuedCompletionStatusEx, iocphandle, uintptr(unsafe.Pointer(&entries[0])), uintptr(n), uintptr(unsafe.Pointer(&n)), uintptr(wait), 0) == 0 {

And, at the handlecompletion function op.qty is written as the qty returned by WSAGetOverlappedResult.

func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
mode := op.mode
if mode != 'r' && mode != 'w' {
println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
throw("runtime: netpoll failed")
}
op.errno = errno
op.qty = qty // <- here
return netpollready(toRun, op.pd, mode)
}

The issue comes from the fact that when the execIO awakes from the waiting, the o.qty is somewhat changed to the different value from what was set in the handlecompletion.

To prove the theory, I added additional value op.pqty to keep the original value of op.qty in operation and net_op structure, and changed handlecompletion as follows.

func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
mode := op.mode
if mode != 'r' && mode != 'w' {
println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
throw("runtime: netpoll failed")
}
op.errno = errno
op.qty = qty //
op.pqty = op.qty // <- added
return netpollready(toRun, op.pd, mode)
}

And I checked the o.pqty and o.qty after execIO was executed as follows.

	n, err = execIO(o, func(o *operation) error {
		return syscall.WSARecv(o.fd.Sysfd, &o.buf, 1, &o.qty, &o.flags, &o.o, nil)
	})

	// added lines
	if (o.qty != o.pqty) {
		println("unexpected bytes transferred o.qty: ", o.qty, 
			", o.pqty: ", o.pqty, ", n: ", n)
	}
	// added

	if race.Enabled {
		race.Acquire(unsafe.Pointer(&ioSync))
	}

As a result, I got the proof of the difference between two values.

unexpected bytes transferred o.qty: 1082 , o.pqty: 26 , n: 1082

I'm not the 100% sure about when the overwriting happens. But I'm suspicious about the way that passes the o.qty as the parameter of the WSARecv, which may lead to the possible overwritting the value set in the handlecompletion.
MSDN suggests that the bytesTransfered parameter of the WSARecv has not to be used or refered when the overlapped is used. In fact, WSARecv do sometimes set the value in unexpected way.
I believe passing the o.qty as the parameter of WSARecv with no use and using the same variable pointer for the result of overlapped result is not a good idea.
My workaround of passing the nil for the bytesTrasnfered parameter for the WSARecv seems to work.

	n, err = execIO(o, func(o *operation) error {
		return syscall.WSARecv(o.fd.Sysfd, &o.buf, 1, nil, &o.flags, &o.o, nil)
	})

But, in order to pinpoint the root cause of this issue, further investigation might be needed.

@seankhliao
Copy link
Member

cc @golang/runtime @golang/windows

@dizecto
Copy link
Author

dizecto commented Apr 12, 2024

@seankhliao,
You've added WaitingForInfo label. If there's any information I need to provide, please let me know.

And, I noticed a difference in the FileCompletionNotificationModes used between TCP and Unix domain socket while running various tests, referring to @abarelk's comment. TCP utilizes syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, whereas Unix domain socket do not. Therefore, after adding 'unix' as follows, it's functioning without the aforementioned issue. In this test, I did not include passing nil instead of &o.qty as tested by @abarelk.

// C:\Program Files\Go\src\internal\poll\fd_windows.go
	if pollable && useSetFileCompletionNotificationModes {
		// We do not use events, so we can skip them always.
		flags := uint8(syscall.FILE_SKIP_SET_EVENT_ON_HANDLE)
		switch net {
		case "tcp", "tcp4", "tcp6",
			"udp", "udp4", "udp6",
			"unix":
			flags |= syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS
		}
		err := syscall.SetFileCompletionNotificationModes(fd.Sysfd, flags)
		if err == nil && flags&syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS != 0 {
			fd.skipSyncNotif = true
		}
	}

Is there any reason why Unix domain socket do not use syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants