PeerSubscriber.msg_queue can get too long and slow sync to a crawl #1052

Closed
gsalgado opened this issue Jul 20, 2018 · 2 comments

Comments

@gsalgado
Collaborator

What is wrong?

While debugging slow sync issues recently, I've seen the ChainSyncer's msg_queue grow to more than 5k items and stay at roughly that length. I don't know what caused it to get that long (it could be a malicious peer DoSing us, blocking calls in the main thread that prevented it from consuming messages, or just more connected peers than our processing power can handle). Once the queue gets that long, though, the sync will pretty much stall as the main loop will time out waiting for block data (that has already arrived but is at the end of the queue), re-request that data and go back to waiting for it. That means we end up downloading and processing the same data multiple times. Even detecting whether a message is a duplicate takes some processing, so the event loop never manages to catch up and process all pending messages.

How can it be fixed

We probably need a smaller upper limit on the msg_queue (the current one is 10k), and we should drop the peer and/or its messages when we reach the limit (currently we just raise a QueueFull error).
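A minimal sketch of the "drop instead of raise" idea, assuming the queue is a plain `asyncio.Queue`. The names `add_msg` and `MAX_QUEUE_SIZE`, and the value 2000, are made up for illustration and are not taken from the codebase.

```python
import asyncio
from typing import Any

# Hypothetical limit: the issue only says it should be smaller than 10k.
MAX_QUEUE_SIZE = 2000


def add_msg(queue: asyncio.Queue, msg: Any) -> bool:
    """Enqueue msg, dropping it instead of raising when the queue is full.

    Returns True if the message was queued and False if it was dropped,
    so the caller can decide whether to also disconnect the peer.
    """
    try:
        queue.put_nowait(msg)
    except asyncio.QueueFull:
        return False
    return True
```

A caller would create the queue with `asyncio.Queue(maxsize=MAX_QUEUE_SIZE)` and could count the `False` results per peer to decide when to drop the peer itself.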

We may also want to track the average number of messages per second we receive from each peer, possibly disconnecting a peer whose rate is above a certain limit. Or something more elaborate, with the goal of preventing malicious peers from DoSing us.
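The per-peer rate tracking could look roughly like the sliding-window counter below. This is only a sketch: `PeerRateTracker`, its parameters, and the use of string peer ids are assumptions for illustration, and the right limit and window would need measuring against real traffic.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class PeerRateTracker:
    """Tracks messages received per peer over the last `window` seconds."""

    def __init__(self, max_per_second: float, window: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_second = max_per_second
        self.window = window
        self.clock = clock  # injectable so the tracker can be tested
        self._timestamps: Dict[str, Deque[float]] = {}

    def _prune(self, peer_id: str) -> Deque[float]:
        # Discard timestamps that have fallen out of the window.
        stamps = self._timestamps.setdefault(peer_id, deque())
        now = self.clock()
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return stamps

    def record(self, peer_id: str) -> None:
        """Call once for every message received from peer_id."""
        self._prune(peer_id).append(self.clock())

    def rate(self, peer_id: str) -> float:
        """Average messages/second from peer_id over the window."""
        return len(self._prune(peer_id)) / self.window

    def should_disconnect(self, peer_id: str) -> bool:
        return self.rate(peer_id) > self.max_per_second
```

Averaging over a window rather than reacting to single bursts keeps a peer that legitimately sends a large batch of block data from being disconnected right away.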

@carver
Contributor

carver commented Jul 20, 2018

> the sync will pretty much stall as the main loop will time out waiting for block data (that has already arrived but is at the end of the queue), re-request that data and go back to waiting for it.

Another thing we should probably do is pause/reduce our data requests if our msg_queue is too long.
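Something like the following is what I have in mind, as a rough sketch assuming the queue is an `asyncio.Queue`. `HIGH_WATER_MARK`, `should_request_more` and `wait_for_queue_to_drain` are hypothetical names, and the threshold is arbitrary.

```python
import asyncio

# Hypothetical threshold above which we stop asking peers for more data.
HIGH_WATER_MARK = 500


def should_request_more(queue: asyncio.Queue,
                        high_water_mark: int = HIGH_WATER_MARK) -> bool:
    """True if the backlog is short enough to send new data requests."""
    return queue.qsize() < high_water_mark


async def wait_for_queue_to_drain(queue: asyncio.Queue,
                                  high_water_mark: int = HIGH_WATER_MARK,
                                  poll_interval: float = 0.1) -> None:
    """Sleep until the backlog drops below the high-water mark.

    Sleeping yields to the event loop, which gives the consumer a chance
    to work through pending messages before we request anything new.
    """
    while queue.qsize() >= high_water_mark:
        await asyncio.sleep(poll_interval)
```

The sync loop would `await wait_for_queue_to_drain(self.msg_queue)` before each re-request, so a timeout caused by a long backlog does not add even more messages to it.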

@pipermerriam
Member

I think that we can close this. We now handle the case where the queue is full, and #1137 reduces the amount of noisy messages that end up in queues.
