This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

stream: limit the number of peers we sync with on bin 0 #1972

Merged 2 commits on Nov 26, 2019

Conversation


@acud acud commented Nov 22, 2019

This PR aims to limit the number of peers we sync with on bin 0 using a semaphore

fixes #1436

@acud acud requested review from janos and zelig November 22, 2019 13:47
@acud acud self-assigned this Nov 22, 2019
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Nov 22, 2019
@acud acud moved this from Backlog to In review (includes Documentation) in Swarm Core - Sprint planning Nov 22, 2019

@janos janos left a comment


LGTM

@acud acud added this to the 0.5.3 milestone Nov 25, 2019

@holisticode holisticode left a comment


  1. Why limit the number of peers only in bin 0, and not, say, also in bin 1, or maybe in all bins < depth?
  2. The description of the PR says it's about limiting the number of peers in bin 0, but it does not specify how it does that. One has to go through the code.

Maybe it has been discussed in private and is known to the author(s), but as I have been invited to review, I am missing this information, and I had to go through the code.

But what I can see from the code is that actually only concurrent InitPeer calls are limited. I understand InitPeer to run quickly and to only establish subscriptions. So, after 3 InitPeers have run, there is no blocking for other peers, which means they end up in the kademlia in bin 0 anyway. So I don't understand what we are really limiting here - just that there are not too many concurrent InitPeers?


acud commented Nov 26, 2019

  1. Why limit the number of peers only in bin 0, and not, say, also in bin 1, or maybe in all bins < depth?

Bin 0 represents the largest potential syncing overhead of all bins. It incurs the highest traffic, and limiting it is trivial to implement, since its incurred bandwidth and localstore congestion do not depend on depth. Once you put depth into the equation, you end up with a separate component orchestrating connections and making sure that when the depth changes, the appropriate peers are being synced to. I find this complex to implement and test, so I went for the trivial solution in the first iteration, so that people can use Swarm and not incur so much traffic and storage congestion every time they upload something.

  2. The description of the PR says it's about limiting the number of peers in bin 0, but it does not specify how it does that. One has to go through the code.

The PR description says:

This PR aims to limit the number of peers we sync with on bin 0 using a semaphore

And that is exactly what it does: it limits the number of peers we sync with on bin 0 using a semaphore, which is exactly what I added.

But what I can see from the code is that actually only concurrent InitPeer calls are limited. I understand InitPeer to run quickly and to only establish subscriptions. So, after 3 InitPeers have run, there is no blocking for other peers, which means they end up in the kademlia in bin 0 anyway. So I don't understand what we are really limiting here - just that there are not too many concurrent InitPeers?

@holisticode the buffered channel acts as a semaphore. When a peer with po 0 is added and the protocol run function is called, InitPeer is eventually called. Once the if condition that checks the peer's po is hit, we try to put an element into the buffered channel. After 3 peers with po 0 have been added, the buffered channel is full, and any further peer with po 0 blocks on the select because the buffer is full:

	if po == 0 {
		select {
		// acquire a slot; blocks while 3 bin-0 peers already hold one
		case s.binZeroSem <- struct{}{}:
		// stop waiting if the peer or the server shuts down
		case <-p.quit:
			return
		case <-s.quit:
			return
		}
		// release the slot when this peer is done
		defer func() { <-s.binZeroSem }()
	}

Because of the defer statement, when a peer is disconnected at some unspecified time in the future, a slot in the buffer is freed and an arbitrary peer that was blocked on the s.binZeroSem <- struct{}{} line is scheduled, allowing that peer to start syncing data to the current node.
If the po is larger than 0 then we don't block at all and continue execution normally.
This obviously does not cater for the edge case that all peers are in bin 0 or that the node depth is constantly zero but I don't think we should cater for these use cases anyway.
I hope this is now clearer; if there are any other issues with this, please do tell.
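
For illustration, here is a minimal, self-contained sketch of the same buffered-channel-as-semaphore pattern. This is not the actual stream package code: the peer IDs, the shared quit channel and the sleep that stands in for the lifetime of a peer's subscriptions are all made up for the example. With a capacity of 3, at most three po-0 peers hold a slot at any moment; the rest block until a deferred receive releases one.

	package main

	import (
		"fmt"
		"sync"
		"time"
	)

	// binZeroSem has capacity 3, so at most 3 bin-0 peers hold a slot at once.
	var binZeroSem = make(chan struct{}, 3)

	func initPeer(id, po int, quit <-chan struct{}, wg *sync.WaitGroup) {
		defer wg.Done()
		if po == 0 {
			select {
			case binZeroSem <- struct{}{}: // acquire a slot, or block until one frees up
			case <-quit: // stop waiting if the peer disconnects in the meantime
				return
			}
			defer func() { <-binZeroSem }() // release the slot when this peer is done
		}
		fmt.Printf("peer %d syncing\n", id)
		time.Sleep(100 * time.Millisecond) // stand-in for holding the slot while connected
	}

	func main() {
		var wg sync.WaitGroup
		quit := make(chan struct{})
		// six bin-0 peers contend for the three slots
		for i := 0; i < 6; i++ {
			wg.Add(1)
			go initPeer(i, 0, quit, &wg)
		}
		wg.Wait()
	}

Running the sketch shows at most three peers proceeding concurrently, which is the behaviour the PR relies on for bin 0.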

@holisticode

Because of the defer statement, when a peer is disconnected at some unspecified time in the future

What I am saying is that "unspecified time in the future" only applies if the subscription function runs in a loop until the peer disconnects. I thought it was a function that just establishes subscriptions and then returns right away (as the InitPeer name suggests), which would trigger the defer and thus free the channel slot again.

So it really depends on the subscription function I guess.

To be honest, I am not so much of a fan of this type of blocking, because all peers exceeding the max number of peers in bin 0 (3) will block, and thus unnecessarily use up resources until, at some "unspecified time in the future", one peer would disconnect.
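
To make the point of disagreement concrete, here is a hypothetical contrast between the two behaviours being discussed; the peer type and the setupSubscriptions helper are stand-ins for the example, not the actual stream package code.

	package main

	import (
		"fmt"
		"time"
	)

	// Hypothetical stand-ins, only here to make the contrast concrete.
	type peer struct{ quit chan struct{} }

	func setupSubscriptions(p *peer) { fmt.Println("subscriptions established") }

	var binZeroSem = make(chan struct{}, 3)

	// Variant A: the function returns right after establishing subscriptions,
	// so the deferred receive frees the slot immediately and the semaphore
	// only bounds concurrent setups.
	func initPeerReturnsEarly(p *peer) {
		binZeroSem <- struct{}{}
		defer func() { <-binZeroSem }()
		setupSubscriptions(p)
	}

	// Variant B: the function blocks until the peer quits, so the slot is held
	// for the peer's whole lifetime and the semaphore bounds how many bin-0
	// peers sync at the same time.
	func initPeerBlocksUntilQuit(p *peer) {
		binZeroSem <- struct{}{}
		defer func() { <-binZeroSem }()
		setupSubscriptions(p)
		<-p.quit
	}

	func main() {
		p := &peer{quit: make(chan struct{})}
		initPeerReturnsEarly(p)
		go func() { time.Sleep(10 * time.Millisecond); close(p.quit) }()
		initPeerBlocksUntilQuit(p)
	}

Variant B is the reading under which a slot is only freed when the peer disconnects.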


acud commented Nov 26, 2019

To be honest, I am not so much of a fan of this type of blocking, because all peers exceeding the max number of peers in bin 0 (3) will block, and thus unnecessarily use up resources until, at some "unspecified time in the future", one peer would disconnect.

Which resources are we talking about? A Peer struct entry in the registry? I don't think this is a big price to pay.
Besides, the whole point is to block, because you want to be connected to this peer: you want to be able to send it a retrieve request, to potentially have it sync from you, and you might also want it to send retrieve requests to you.

This method of blocking allows you to connect to a peer and have that peer function correctly with all other protocols except syncing (other streams in the stream protocol would potentially work uninterrupted in this manner). So you can maintain connectivity with the peer and decide when to sync with it, independently of the connection itself.

If you have any better solutions, I'm happy to hear them out and gladly implement them.


@zelig zelig left a comment


this is a hack, don't like it
do we really need it?


acud commented Nov 26, 2019

I would not PR it if we did not need it :/

@acud acud merged commit f114078 into master Nov 26, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Nov 26, 2019
@acud acud deleted the bin-zero-sync-sem branch November 26, 2019 15:17
acud added a commit that referenced this pull request Feb 7, 2020
acud added a commit that referenced this pull request Feb 7, 2020

Successfully merging this pull request may close these issues.

investigate kademlia connectivity and suggest peer functionality