Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweaks for libp2p message delivery rate improvements #1285

Merged
merged 9 commits into from
Jan 21, 2020
Merged

Conversation

pdyraga
Copy link
Member

@pdyraga pdyraga commented Jan 21, 2020

Close: #1286

This PR contains changes allowing to improve message delivery rate for libp2p by tweaking some libp2p buffers and changing how do we read messages from the topic. There were three warnings from libp2p we were observing locally and on testnet. All of them are addressed here.

Can't deliver message to subscription for topic 4574c8c75d6e88acd28f7e467dac97b5c60c3838d9dad993900bdf402152228e; subscriber too slow pubsub.go:608

Previously, in the broadcast channel, we had just one goroutine reading from pubsub subscription and spawning goroutines handling messages. This one goroutine is not enough. We need multiple workers ready to take the message out of the queue immediately. For this reason, we now instantiate 32 workers reading from the pubsub subscription. It is unclear if we need to spawn sepearate goroutines for message handling now but this will be addressed separately.

Dropping message to peer 16Uiu2HAmS29jNiEX2vAtcjZVhPjU5HWKZNjpWj1gp4SKMGwnSfrg: queue full floodsub.go:95

We increased peer outbound queue size from 32 to 128. Each floodsub should publish 64 original messages per state (not counting retransmissions). Increasing the buffer to 128 messages will allow us to handle all original messages plus potential retransmissions if the retransmissions were sent immediately after the original messages.

message validation throttled; dropping message from 16Uiu2HAmS29jNiEX2vAtcjZVhPjU5HWKZNjpWj1gp4SKMGwnSfrg validation.go:150

Addressing this problem required a contribution to go-libp2p-pubsub: libp2p/go-libp2p-pubsub#255. Until it is accepted, we had to use replace directive in go modules to point to our fork allowing to call WithValidateQueueSize and increase the size of validation queue from 32 to 128 elements.

validateWorker() reads from validateQ and invokes validate function that performs validation of the message. Signature validation is performed synchronously. The number of validate workers defaults to the number of CPUs and can be updated with WithValidateWorkers function. With no additional user validators, signature validation is the bottleneck when receiving new messages.

Increasing the number of validating workers does not help given the context switching and bottleneck nature of this spot. As stated in WithValidateWorkers documentation, this function should be used rather to limit the number of workers to devote less CPU time for synchronous validation. On the other hand, with the default size of validateQ, and 64 messages (with no retransmissions) send for each phase, we experience throttled validation. Increasing the buffer to 128 messages will allow us to handle all original messages plus potential retransmissions if the retransmissions were sent immediately after the original messages and don't congest with too many workers wasting time on context switching.

pdyraga and others added 8 commits January 20, 2020 18:19
This version will let us set peer peer outbound queue size to avoid
problems with messages dropped when this queue is full during DKG.

libp2p/go-libp2p-pubsub#230
The default value which is 32 may not be enough to send all messages
from the given state of DKG. Setting higher queue size we expect to
avoid 'dropping message to peer X: queue is full' failures.
We don't want to use more validate workers for synchronous validation
(signature) than the number of CPUs. This can be only slower because of
the context switching. Instead, we'll increase the size of validate
queue (the next commit).

Also, we are increasing outbound queue size from 64 to 128 in case two
retransmission ticks are very close to another and we need to accommodate
2*64 messages in that queue.
In the forked version, we added WithValidationQueueSize function
allowing us to increase the buffer for synchronous validation and avoid
validator throttling.
Signature validation is done synchronously and it is a bottleneck for
received messages. We increase the validate queue buffer from the
default 32 to 128 to be able to accommodate messages from all group
members even if two retransmissions were very close to each other.
We need this change: https://golang.org/doc/go1.13#crypto/ed25519
after a transitive to-openssl update.
@lukasz-zimnoch lukasz-zimnoch merged commit bd24978 into master Jan 21, 2020
@lukasz-zimnoch lukasz-zimnoch deleted the last-blood branch January 21, 2020 15:03
@@ -1,4 +1,4 @@
FROM golang:1.12-alpine3.10 AS runtime
FROM golang:1.13.6-alpine3.10 AS runtime
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it require any tweaks in provisioning? @sthompson22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libp2p buffers adjustments - improve message delivery rate
2 participants