
Control the broadcast channel lifecycle in a separate goroutine #3422

Merged 2 commits into main from message-handler-is-not-too-slow on Nov 29, 2022

Conversation

@pdyraga (Member) commented Nov 29, 2022

Refs #3419

During the tests of the code orchestrating tECDSA signing in #3404, we realized some buffers get full and the broadcast channel keeps writing to them even though the receiver is no longer alive. This was fixed in #3418 by introducing an additional context in the asynchronous state machine that unregisters the handler if the state machine (receiver) exits before the context is done. That fixed the problem on the receiver side, but we still need to fix it on the producer side, and that is what this PR does.

A separate goroutine controls the lifecycle of the handler: the message handler is removed from the channel once the context is done. This logic lives in a separate goroutine because the call to `handleWithRetransmission` is blocking, and we do not want to delay removing the handler if that call blocks for a long time, especially when the underlying buffered channel is full.
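A minimal, self-contained sketch of this pattern, assuming illustrative names (`broadcastChannel`, `messageHandler`, `removeHandler`, the 512-element buffer) rather than the actual keep-core types:

```go
package broadcast

import (
	"context"
	"sync"
)

// broadcastChannel is a stand-in for the real broadcast channel type.
type broadcastChannel struct {
	mutex    sync.Mutex
	handlers []*messageHandler
}

type messageHandler struct {
	channel chan string // buffered; fills up if the consumer is slow
}

func (bc *broadcastChannel) removeHandler(h *messageHandler) {
	bc.mutex.Lock()
	defer bc.mutex.Unlock()
	for i, handler := range bc.handlers {
		if handler == h {
			bc.handlers = append(bc.handlers[:i], bc.handlers[i+1:]...)
			break
		}
	}
}

// Recv registers a handler and spawns two goroutines: one that delivers
// messages (and may block inside the handle call) and one that only
// watches the context and removes the handler the moment it is done.
func (bc *broadcastChannel) Recv(ctx context.Context, handle func(string)) {
	handler := &messageHandler{channel: make(chan string, 512)}

	bc.mutex.Lock()
	bc.handlers = append(bc.handlers, handler)
	bc.mutex.Unlock()

	// Lifecycle goroutine: unregister the handler without waiting for
	// the delivery loop, which may be blocked for a long time.
	go func() {
		<-ctx.Done()
		bc.removeHandler(handler)
	}()

	// Delivery loop: the handle call is blocking, mirroring
	// handleWithRetransmission in the real code.
	go func() {
		for {
			select {
			case msg := <-handler.channel:
				handle(msg)
			case <-ctx.Done():
				return
			}
		}
	}()
}
```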

@pdyraga (Member, Author) commented Nov 29, 2022

I was testing this change along with the one from #3423 locally, using the same configuration as for the tests of #3404 where the problems were noticed. I did not observe a single "message handler is too slow" warning, and 7x10 signatures were produced on the first attempt.

@pdyraga pdyraga self-assigned this Nov 29, 2022
@pdyraga pdyraga added this to the v2.0.0-m2 milestone Nov 29, 2022
@pdyraga pdyraga marked this pull request as ready for review November 29, 2022 09:57
Left a TODO about a goroutine that may possibly be blocked forever.

This shouldn't be a problem right now with how Recv consumers are
constructed, but it may become a problem in the future.

#3420 captures this work.
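The concern, sketched in hypothetical shape: the lifecycle goroutine parks on the context, so a caller that passes a context that is never cancelled would leak it (this is what #3420 tracks):

```go
// If ctx is never done, this lifecycle goroutine never returns.
go func() {
	<-ctx.Done()
	channel.removeHandler(handler)
}()
```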
@lukasz-zimnoch lukasz-zimnoch merged commit 6df9441 into main Nov 29, 2022
@lukasz-zimnoch lukasz-zimnoch deleted the message-handler-is-not-too-slow branch November 29, 2022 11:50
lukasz-zimnoch added a commit that referenced this pull request Nov 29, 2022
Closes #3419
Depends on #3422

During the tests of the code orchestrating tECDSA signing in #3404, we noticed
the announcer's buffer sometimes gets full and messages are dropped with the
famous "message handler is too slow" warning. It turns out that the problem
lies in the size of the buffer used by the announcer. Before a message the
announcer is not interested in is dropped, it is buffered in this channel.
This also includes retransmissions from previous signing protocols, because
the retransmission cache filter is scoped to the given `Recv` handler:
```go
func WithRetransmissionSupport(delegate func(m net.Message)) func(m net.Message) {
	// The cache of already-seen messages is created per handler, so it
	// does not filter retransmissions across protocol instances.
	mutex := &sync.Mutex{}
	cache := make(map[string]bool)
	// (...)
}
```
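To illustrate the scoping, here is a hypothetical, self-contained variant of that helper; the `fingerprint` field is an assumption standing in for however the real code identifies a retransmitted message:

```go
package broadcast

import "sync"

// message stands in for net.Message; fingerprint is assumed to be an
// identifier that is stable across retransmissions of the same message.
type message struct {
	fingerprint string
	payload     []byte
}

// withRetransmissionSupport delivers each unique message to delegate
// exactly once. The cache lives inside the returned closure, so it is
// scoped to a single Recv handler: a fresh handler starts with an empty
// cache and still has to buffer (and then drop) retransmissions from
// earlier protocol instances.
func withRetransmissionSupport(delegate func(m message)) func(m message) {
	mutex := &sync.Mutex{}
	cache := make(map[string]bool)

	return func(m message) {
		mutex.Lock()
		alreadySeen := cache[m.fingerprint]
		cache[m.fingerprint] = true
		mutex.Unlock()

		if !alreadySeen {
			delegate(m)
		}
	}
}
```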

The announcer's buffer size has been increased to 512 elements, just
like the buffer of the asynchronous state machine, and the problem seems
to be gone based on local machine tests.
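In code terms, the fix is just a wider buffer on the channel the announcer consumes from; reusing the illustrative `message` type from the sketch above, not the actual keep-core identifiers:

```go
// Illustrative only; the real identifiers in keep-core differ.
const announcerBufferSize = 512 // matches the async state machine buffer

var announcements = make(chan message, announcerBufferSize)
```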
@pdyraga pdyraga removed their assignment Jan 25, 2023