Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlock when closing transport and/or server #4266

Closed
paralin opened this issue Jan 25, 2024 · 4 comments · Fixed by #4332
Closed

deadlock when closing transport and/or server #4266

paralin opened this issue Jan 25, 2024 · 4 comments · Fixed by #4332
Labels
Milestone

Comments

@paralin
Copy link

paralin commented Jan 25, 2024

I'm noticing this semaphore deadlock which occurs occasionally, but not always:

github.com/quic-go/quic-go.(*Transport).closeServer(0x14000398400)
        /quic-go/transport.go:306 +0x8c
github.com/quic-go/quic-go.(*baseServer).close.func1()
        /quic-go/server.go:341 +0x84
sync.(*Once).doSlow(0x1000200000008?, 0x200010000?)
        /go/1.21.6/libexec/src/sync/once.go:74 +0x100
sync.(*Once).Do(...)
        /go/1.21.6/libexec/src/sync/once.go:65
github.com/quic-go/quic-go.(*baseServer).close(0x140002fb188?, {0x101a0e028?, 0x101fbd560?}, 0x70?)
        /quic-go/server.go:335 +0x64
github.com/quic-go/quic-go.(*baseServer).Close(...)
        /quic-go/server.go:330
github.com/quic-go/quic-go.(*Listener).Close(0x0?)
        /quic-go/server.go:138 +0x34

and


github.com/quic-go/quic-go.(*baseServer).close(0x1400038eae0?, {0x101a0e028?, 0x101fbc040?}, 0x0?)
        /quic-go/server.go:335 +0x64
github.com/quic-go/quic-go.(*Transport).close(0x14000398400, {0x101a0e028, 0x101fbc040})
        /quic-go/transport.go:333 +0x10c

I see that there are two orders that mutex.Lock can be called:

  • baseServer.close -> server.mtx.Lock() -> closeServer() -> transport.mtx.Lock()
  • transport.close -> transport.mtx.lock() -> baseServer.close() -> server.mtx.Lock()

When locking mutexes out of order, a deadlock can occur, which explains this deadlock (as far as I can tell):

Mutex lock ordering rule: Given a total ordering of all mutexes, a program is deadlock-free if each thread acquires its mutexes in order and releases them in reverse order.

@marten-seemann
Copy link
Member

Interesting. Want to submit a fix?

@marten-seemann marten-seemann added this to the v0.42 milestone Jan 25, 2024
@paralin
Copy link
Author

paralin commented Jan 25, 2024

@marten-seemann Perhaps this? It's a bit dirty but it works:

From d143f249fb5ecf6e8ef27a812f2a0b33e4203941 Mon Sep 17 00:00:00 2001
From: Christian Stewart <christian@aperture.us>
Date: Wed, 24 Jan 2024 18:04:35 -0800
Subject: [PATCH] fix: avoid mutex deadlock in transport close

---
 transport.go | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/transport.go b/transport.go
index 77f41b3d14..d4c637c6aa 100644
--- a/transport.go
+++ b/transport.go
@@ -346,12 +346,21 @@ func (t *Transport) Close() error {
 }
 
 func (t *Transport) closeServer() {
-	t.mutex.Lock()
-	t.server = nil
-	if t.isSingleUse {
-		t.closed = true
+	closeLocked := func(lock bool) {
+		if lock {
+			t.mutex.Lock()
+		}
+		t.server = nil
+		if t.isSingleUse {
+			t.closed = true
+		}
+		t.mutex.Unlock()
+	}
+	if t.mutex.TryLock() {
+		closeLocked(false)
+	} else {
+		go closeLocked(true)
 	}
-	t.mutex.Unlock()
 	if t.createdConn {
 		t.Conn.Close()
 	}

@marten-seemann
Copy link
Member

This looks a bit complicated. I'd prefer something that doesn't use TryLock and spawns Go routines.

@paralin
Copy link
Author

paralin commented Jan 25, 2024

I agree that's better, breaking the loop will take some more digging to find a suitable fix. Will have a look at this further

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants