
Control the broadcast channel lifecycle in a separate goroutine #3422

Merged 2 commits into main from message-handler-is-not-too-slow on Nov 29, 2022

Conversation

@pdyraga (Member) commented Nov 29, 2022

Refs #3419

During the tests of the code orchestrating tECDSA signing in #3404, we realized some buffers get full and the broadcast channel keeps writing to them even though the receiver is no longer alive. This was fixed in #3418 by introducing an additional context in the asynchronous state machine that unregisters the handler if the state machine (receiver) exits before the context is done. That fixed the problem on the receiver side, but we still need to fix it on the producer side, and that is what this PR does.

A separate goroutine controls the lifecycle of the handler: the message handler is removed from the channel once the context is done. This logic lives in a separate goroutine because the call to `handleWithRetransmission` is blocking, and we do not want to delay removing the handler if that call blocks for a long time, especially when the underlying buffered channel is full.
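A minimal, self-contained sketch of this pattern, assuming illustrative names (`broadcastChannel`, `messageHandler`, `removeHandler`, the 512-element buffer) rather than the actual keep-core types:

```go
package broadcast

import (
	"context"
	"sync"
)

// broadcastChannel is a stand-in for the real broadcast channel type.
type broadcastChannel struct {
	mutex    sync.Mutex
	handlers []*messageHandler
}

type messageHandler struct {
	channel chan string // buffered; fills up if the consumer is slow
}

func (bc *broadcastChannel) removeHandler(h *messageHandler) {
	bc.mutex.Lock()
	defer bc.mutex.Unlock()
	for i, handler := range bc.handlers {
		if handler == h {
			bc.handlers = append(bc.handlers[:i], bc.handlers[i+1:]...)
			break
		}
	}
}

// Recv registers a handler and spawns two goroutines: one that delivers
// messages (and may block inside the handle call) and one that only
// watches the context and removes the handler the moment it is done.
func (bc *broadcastChannel) Recv(ctx context.Context, handle func(string)) {
	handler := &messageHandler{channel: make(chan string, 512)}

	bc.mutex.Lock()
	bc.handlers = append(bc.handlers, handler)
	bc.mutex.Unlock()

	// Lifecycle goroutine: unregister the handler without waiting for
	// the delivery loop, which may be blocked for a long time.
	go func() {
		<-ctx.Done()
		bc.removeHandler(handler)
	}()

	// Delivery loop: the handle call is blocking, mirroring
	// handleWithRetransmission in the real code.
	go func() {
		for {
			select {
			case msg := <-handler.channel:
				handle(msg)
			case <-ctx.Done():
				return
			}
		}
	}()
}
```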

@pdyraga (Member, Author) commented Nov 29, 2022

I was testing this change along with the one from #3423 locally, using the same configuration as for the tests of #3404 where the problems were noticed. I did not observe a single "message handler is too slow" warning, and 7x10 signatures were produced on the first attempt.

@pdyraga pdyraga self-assigned this Nov 29, 2022
@pdyraga pdyraga added this to the v2.0.0-m2 milestone Nov 29, 2022
@pdyraga pdyraga marked this pull request as ready for review November 29, 2022 09:57
Left a TODO about a goroutine that may possibly be blocked forever.

This shouldn't be a problem right now with how Recv consumers are
constructed, but it may become a problem in the future.

#3420 captures this work.
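The concern, sketched in hypothetical shape: the lifecycle goroutine parks on the context, so a caller that passes a context that is never cancelled would leak it (this is what #3420 tracks):

```go
// If ctx is never done, this lifecycle goroutine never returns.
go func() {
	<-ctx.Done()
	channel.removeHandler(handler)
}()
```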
@lukasz-zimnoch lukasz-zimnoch merged commit 6df9441 into main Nov 29, 2022
@lukasz-zimnoch lukasz-zimnoch deleted the message-handler-is-not-too-slow branch November 29, 2022 11:50
lukasz-zimnoch added a commit that referenced this pull request Nov 29, 2022
Closes #3419
Depends on #3422

During the tests of the code orchestrating tECDSA signing in #3404, we noticed
the announcer's buffer sometimes gets full and messages are dropped with the
famous "message handler is too slow" warning. It turns out that the problem
lies in the size of the buffer used by the announcer. Before a message the
announcer is not interested in is dropped, it is buffered in this channel.
This also includes retransmissions from previous signing protocols, because
the retransmission cache filter is scoped to the given `Recv` handler:
```go
func WithRetransmissionSupport(delegate func(m net.Message)) func(m net.Message) {
	// The cache of already-seen messages is created per handler, so it
	// does not filter retransmissions across protocol instances.
	mutex := &sync.Mutex{}
	cache := make(map[string]bool)
	// (...)
}
```
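To illustrate the scoping, here is a hypothetical, self-contained variant of that helper; the `fingerprint` field is an assumption standing in for however the real code identifies a retransmitted message:

```go
package broadcast

import "sync"

// message stands in for net.Message; fingerprint is assumed to be an
// identifier that is stable across retransmissions of the same message.
type message struct {
	fingerprint string
	payload     []byte
}

// withRetransmissionSupport delivers each unique message to delegate
// exactly once. The cache lives inside the returned closure, so it is
// scoped to a single Recv handler: a fresh handler starts with an empty
// cache and still has to buffer (and then drop) retransmissions from
// earlier protocol instances.
func withRetransmissionSupport(delegate func(m message)) func(m message) {
	mutex := &sync.Mutex{}
	cache := make(map[string]bool)

	return func(m message) {
		mutex.Lock()
		alreadySeen := cache[m.fingerprint]
		cache[m.fingerprint] = true
		mutex.Unlock()

		if !alreadySeen {
			delegate(m)
		}
	}
}
```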

The announcer's buffer size has been increased to 512 elements, just
like the buffer of the asynchronous state machine, and the problem seems
to be gone based on local machine tests.
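In code terms, the fix is just a wider buffer on the channel the announcer consumes from; reusing the illustrative `message` type from the sketch above, not the actual keep-core identifiers:

```go
// Illustrative only; the real identifiers in keep-core differ.
const announcerBufferSize = 512 // matches the async state machine buffer

var announcements = make(chan message, announcerBufferSize)
```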
@pdyraga pdyraga removed their assignment Jan 25, 2023