
Less write lock contention, better read timeout handling #97

Merged
merged 6 commits on Mar 6, 2023

Conversation

@magik6k (Collaborator) commented Mar 1, 2023

This PR aims to improve/fix filecoin-project/lotus#8362

  • Reduces contention on the write lock - it is now held only while we're actually sending data; response marshaling happens asynchronously
  • Makes sure the read deadline is reset even if a single frame takes a long time to read
  • Makes frame unmarshaling not block data reading unless we're severely backlogged
  • Adds pprof labels to goroutines handling jsonrpc calls, so if we need to debug this further it's possible to tell clients apart in goroutine dumps (a sketch of the labeling pattern follows below).
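
For illustration, a minimal sketch of the goroutine-labeling pattern referred to in the last bullet, using the standard runtime/pprof API; the package name, label keys, and handleClient/serveRequests functions are illustrative, not necessarily what this PR uses:

```go
package rpcdemo // illustrative package name

import (
	"context"
	"runtime/pprof"
)

// handleClient is a hypothetical per-connection entry point. pprof.Do attaches
// the given labels to the current goroutine and to goroutines spawned inside
// the callback, so individual clients can be told apart in goroutine dumps.
func handleClient(ctx context.Context, remoteAddr string) {
	labels := pprof.Labels("jsonrpc-role", "handler", "jsonrpc-remote", remoteAddr)
	pprof.Do(ctx, labels, func(ctx context.Context) {
		serveRequests(ctx) // hypothetical request loop for this client
	})
}

func serveRequests(ctx context.Context) {
	// ... read frames, dispatch calls ...
}
```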

@magik6k force-pushed the fix/less-write-lk-contention branch from 2010bd6 to 07ab412 on March 1, 2023 at 12:48
@magik6k force-pushed the fix/less-write-lk-contention branch from 00105b3 to ae30cc0 on March 1, 2023 at 22:35
@magik6k changed the title from "handler: Less write lock contention" to "Less write lock contention, better read timeout handling" on Mar 2, 2023
rpc_test.go
for n := range ch {
	fmt.Println("received")
	if n != prevN+1 {
		panic("bad order")


Consider using testing.T methods to fail from this goroutine instead of panicking.

@magik6k (Collaborator, Author) replied:

Technically those shouldn't be used in a goroutine that isn't running the test; panic was good enough here.
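
For reference, a common alternative (not what this PR ended up doing) is to hand failures back to the test goroutine over a channel so that only the goroutine running the test calls the testing.T fail methods; a minimal self-contained sketch with illustrative names:

```go
package rpctest // illustrative

import (
	"fmt"
	"testing"
)

func TestOrderedDelivery(t *testing.T) {
	// Buffered so sends below never block even if the checker exits early.
	ch := make(chan int, 3)
	errCh := make(chan error, 1)

	// The checking goroutine never touches testing.T; it only reports over errCh.
	go func() {
		prevN := 0
		for n := range ch {
			if n != prevN+1 {
				errCh <- fmt.Errorf("bad order: got %d after %d", n, prevN)
				return
			}
			prevN = n
		}
		errCh <- nil
	}()

	for i := 1; i <= 3; i++ {
		ch <- i
	}
	close(ch)

	// The test goroutine is the only one calling Fatal.
	if err := <-errCh; err != nil {
		t.Fatal(err)
	}
}
```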

	// json.NewDecoder(r).Decode would read the whole frame as well, so we might as well do it
	// with ReadAll, which should be much faster.
	// Use an autoResetReader in case the read takes a long time.
	buf, err := io.ReadAll(c.autoResetReader(r)) // todo: buffer pool


Ready to merge without the buffer pool?

@magik6k (Collaborator, Author) replied:

Yeah, that's not really the point of the PR, and can be done separately
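
For what it's worth, here is a sketch of the kind of follow-up the `// todo` refers to: reusing frame buffers through a sync.Pool instead of letting io.ReadAll allocate a fresh slice per frame. The package, pool, and function names are illustrative; this is not code from the PR.

```go
package rpcdemo // illustrative

import (
	"bytes"
	"io"
	"sync"
)

// framePool hands out reusable buffers for incoming frames.
var framePool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// readFrame drains r into a pooled buffer. The caller must call putFrame
// once it is done with the returned buffer's bytes.
func readFrame(r io.Reader) (*bytes.Buffer, error) {
	buf := framePool.Get().(*bytes.Buffer)
	buf.Reset()
	if _, err := buf.ReadFrom(r); err != nil {
		framePool.Put(buf)
		return nil, err
	}
	return buf, nil
}

func putFrame(buf *bytes.Buffer) { framePool.Put(buf) }
```

The usual caveat applies: very large frames can pin large buffers in the pool, so a size cap before Put is often added.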

	}

	// got the whole frame, so we can start reading the next one in the background
	go c.nextMessage()
@Kubuxu (Contributor) commented Mar 6, 2023:

The fact that we are starting goroutines only to kill them and start again bothers me a bit, but it shouldn't matter apart from making goroutine numbers annoying to deal with.
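
To make the trade-off concrete, a small sketch of the persistent-reader shape this comment alludes to, where no goroutines are created and torn down per frame; readFrame, handle, and the package name are hypothetical stand-ins, not this repo's API:

```go
package rpcdemo // illustrative

// readLoop is the "one persistent reader goroutine" alternative: a single
// goroutine owns all reads and pushes complete frames onto a channel, instead
// of each handler spawning the reader for the next frame.
func readLoop(readFrame func() ([]byte, error), frames chan<- []byte) error {
	defer close(frames)
	for {
		frame, err := readFrame()
		if err != nil {
			return err // e.g. the peer closed the connection
		}
		frames <- frame
	}
}

// handleLoop is a matching persistent consumer that unmarshals and dispatches
// frames off the read path.
func handleLoop(frames <-chan []byte, handle func([]byte)) {
	for frame := range frames {
		handle(frame)
	}
}
```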

func (r *deadlineResetReader) Read(p []byte) (n int, err error) {
	n, err = r.r.Read(p)
	if time.Since(r.lastReset) > onReadDeadlineResetInterval {
		log.Warnw("slow/large read, resetting deadline while reading the frame", "since", time.Since(r.lastReset), "n", n, "err", err, "p", len(p))

Should this be Warn? It isn't actionable by the user and will happen during any request which requires more than 5s to send.

@magik6k (Collaborator, Author) replied:

Arguably, whenever any RPC takes that long to transfer, something is broken, so the user should either open an issue or investigate their networking.
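
For context, a minimal self-contained sketch of the deadline-reset idea under discussion, assuming the wrapped reader is a net.Conn; the constant name mirrors the excerpt above and the 5s value matches the interval mentioned in the review comment, but the field names and the exact reset call are illustrative rather than the PR's code:

```go
package rpcdemo // illustrative

import (
	"net"
	"time"
)

const onReadDeadlineResetInterval = 5 * time.Second

type deadlineResetReader struct {
	conn      net.Conn      // underlying connection whose read deadline gets extended
	timeout   time.Duration // how far into the future each reset pushes the deadline
	lastReset time.Time
}

func (r *deadlineResetReader) Read(p []byte) (n int, err error) {
	n, err = r.conn.Read(p)
	// If a single frame is taking a long time to arrive, keep pushing the read
	// deadline forward so a slow-but-alive peer isn't cut off mid-frame.
	if time.Since(r.lastReset) > onReadDeadlineResetInterval {
		if derr := r.conn.SetReadDeadline(time.Now().Add(r.timeout)); derr != nil && err == nil {
			err = derr
		}
		r.lastReset = time.Now()
	}
	return n, err
}
```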

@Kubuxu (Contributor) left a comment:

locking looks correct

Successfully merging this pull request may close these issues.

wdPoSt scheduler notifs channel closes when lotus-miner is under heavy load