Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CloseNow that doesn't block #384

Closed
marten-seemann opened this issue Mar 17, 2023 · 4 comments
Closed

Add CloseNow that doesn't block #384

marten-seemann opened this issue Mar 17, 2023 · 4 comments
Milestone

Comments

@marten-seemann
Copy link

marten-seemann commented Mar 17, 2023

@nhooyr I managed to condense it down to a minimal test case that reliably reproduces the bug. There's no read and write calls at all, just the server closing the connection right after accepting it.

func TestWebsocketImmediateClose(t *testing.T) {
	closed := make(chan struct{})
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		defer close(closed)
		c, err := ws.Accept(w, r, &ws.AcceptOptions{InsecureSkipVerify: true})
		if err != nil {
			return // The upgrader writes a response for us.
		}

		start := time.Now()
		c.Close(ws.StatusUnsupportedData, "bye")
		if took := time.Since(start); took > 2*time.Second {
			t.Fatalf("closing took too long: %s", took)
		}
	})
	server := &http.Server{Handler: mux, Addr: "localhost:8080"}
	done := make(chan struct{})
	go func() {
		defer close(done)
		server.ListenAndServe()
	}()

	time.Sleep(25 * time.Millisecond) // give the server some time to boot

	if _, _, err := ws.Dial(context.Background(), "ws://localhost:8080", nil); err != nil {
		t.Fatal(err)
	}
	<-closed // wait until the HTTP handler has returned

	// test shutdown
	server.Close()
	<-done
}

I'm not very familiar with the code base (nor with the WebSocket protocol itself), but here's where the Close call gets stuck: https://github.com/nhooyr/websocket/blob/14fb98eba64eeb5e9d06a88b98c47ae924ac82b4/close_notjs.go#L108
Maybe I'm thinking in the wrong direction here, but this is surprising to me. Why do we need to read anything when closing? I would've expected us to just write a Close frame and then return.

Originally posted by @marten-seemann in #355 (comment)

@marten-seemann
Copy link
Author

Can you clarify how this is an attack vector/performance issue for you? It's a 5s wait and obeys the set read limit.

I'd argue that Close shouldn't block for long periods of time. Our previous (Gorilla) WebSocket library didn't, and none of our other transports we use in go-libp2p (WebTransport, WebRTC, TCP, QUIC) does. The maximum amount of work any of these transports do is write some kind of CLOSE frame.

We have a component called the connection manager, which at regular intervals prunes existing connections. It's not uncommon for us to close 100s of connections at a time. If (in the worst case) all of those are WebSocket connections and take 5s to close, that's a big problem, as the connection manager loop now spends 20 minutes on closing those connections.

Of course, I could spawn a new Go routine for each Close call, but that seems like a suboptimal solution that comes with its own performance penalties.

I can't change this behaviour in v1 as it's possible others rely on Close returning successfully if the close handshake was successful.

Is there really no way to just tell this library "send the Close frame and walk away"? It would be ok if that function was not called Close, I'm wrapping the connection struct anyway.

@nhooyr nhooyr added this to the v1.8.8 milestone Mar 17, 2023
@nhooyr
Copy link
Contributor

nhooyr commented Mar 17, 2023

Fair enough. I'll add a CloseNow() function for v1.8.8 but I can't give you any ETA sorry.

@nhooyr nhooyr changed the title Close blocks for 5s Add CloseNow that doesn't block Mar 17, 2023
@nhooyr
Copy link
Contributor

nhooyr commented Mar 18, 2023

I should mention, I've written similar connection managers and the way I've solved this problem is I don't have the connection manager close the connection. It merely cancels a context that triggers the goroutine reading from the WebSocket to exit and thus close it. Then I use a waitgroup to wait until all active websocket goroutines are finished.

I've been meaning to make this code available in the library too #209

@nhooyr
Copy link
Contributor

nhooyr commented Oct 14, 2023

Done in dev.

@nhooyr nhooyr closed this as completed Oct 14, 2023
nhooyr added a commit that referenced this issue Oct 14, 2023
nhooyr added a commit that referenced this issue Oct 14, 2023
nhooyr added a commit that referenced this issue Oct 14, 2023
nhooyr added a commit that referenced this issue Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants