This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

stream: limit the number of peers we sync with on bin 0 #1972

Merged 2 commits on Nov 26, 2019

Conversation


@acud acud commented Nov 22, 2019

This PR aims to limit the number of peers we sync with on bin 0 using a semaphore

fixes #1436

@acud acud requested review from janos and zelig November 22, 2019 13:47
@acud acud self-assigned this Nov 22, 2019
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Nov 22, 2019
@acud acud moved this from Backlog to In review (includes Documentation) in Swarm Core - Sprint planning Nov 22, 2019

@janos janos left a comment


LGTM

@acud acud added this to the 0.5.3 milestone Nov 25, 2019

@holisticode holisticode left a comment


  1. Why limit the number of peers only in bin 0, and not, say, also in bin 1, or maybe in all bins < depth?
  2. The description of the PR says it's about limiting the number of peers in bin 0, but it does not specify how it does that. One has to go through the code.

Maybe it has been discussed in private and is known to the author(s), but as I have been invited to review, I am missing this information, and I had to go through the code.

But what I can see from the code is that actually only concurrent InitPeer calls are limited. I understand InitPeer to run quickly and to only establish subscriptions. So, after 3 InitPeers have run, there is no blocking for other peers, which means they end up in the kademlia in bin 0 anyway. So I don't understand what we are really limiting here - just that there are not too many concurrent InitPeers?


acud commented Nov 26, 2019

  1. Why limit the number of peers only in bin 0, and not, say, also in bin 1, or maybe in all bins < depth?

Bin 0 represents the largest potential syncing overhead of all bins. It incurs the highest traffic, and limiting it is trivial to implement, since its incurred bandwidth and localstore congestion do not depend on depth. Once you put depth into the equation, you end up with a separate component orchestrating connections and making sure that when the depth changes, the appropriate peers are being synced to. I find this complex to implement and test, so I went for the trivial solution in the first iteration, so that people can use Swarm and not incur so much traffic and storage congestion every time they upload something.

  2. The description of the PR says it's about limiting the number of peers in bin 0, but it does not specify how it does that. One has to go through the code.

The PR description says:

This PR aims to limit the number of peers we sync with on bin 0 using a semaphore

And that is exactly what it does: it limits the number of peers we sync with on bin 0 using a semaphore, which is exactly what I added.

But what I can see from the code is that actually only concurrent InitPeer calls are limited. I understand InitPeer to run quickly and to only establish subscriptions. So, after 3 InitPeers have run, there is no blocking for other peers, which means they end up in the kademlia in bin 0 anyway. So I don't understand what we are really limiting here - just that there are not too many concurrent InitPeers?

@holisticode the buffered channel acts as a semaphore. When a peer with po 0 is added and the protocol run function is called, InitPeer is eventually called. Once the if condition that checks the peer's po is hit, we try to put an element into the buffered channel. After 3 peers with po 0 have been added, the buffered channel is full, and any further peer with po 0 blocks on the select because the buffer is full:

	if po == 0 {
		select {
		// acquire a slot; blocks while 3 bin-0 peers already hold one
		case s.binZeroSem <- struct{}{}:
		// stop waiting if the peer or the server shuts down
		case <-p.quit:
			return
		case <-s.quit:
			return
		}
		// release the slot when this peer is done
		defer func() { <-s.binZeroSem }()
	}

Because of the defer statement, when a peer is disconnected at some unspecified time in the future, a slot in the buffer is freed and an arbitrary peer that was blocked on the s.binZeroSem <- struct{}{} line is scheduled, allowing that peer to start syncing data to the current node.
If the po is larger than 0 then we don't block at all and continue execution normally.
This obviously does not cater for the edge case that all peers are in bin 0 or that the node depth is constantly zero but I don't think we should cater for these use cases anyway.
I hope this is now clearer; if there are any other issues with this, please do tell.
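
For illustration, here is a minimal, self-contained sketch of the same buffered-channel-as-semaphore pattern. This is not the actual stream package code: the peer IDs, the shared quit channel and the sleep that stands in for the lifetime of a peer's subscriptions are all made up for the example. With a capacity of 3, at most three po-0 peers hold a slot at any moment; the rest block until a deferred receive releases one.

	package main

	import (
		"fmt"
		"sync"
		"time"
	)

	// binZeroSem has capacity 3, so at most 3 bin-0 peers hold a slot at once.
	var binZeroSem = make(chan struct{}, 3)

	func initPeer(id, po int, quit <-chan struct{}, wg *sync.WaitGroup) {
		defer wg.Done()
		if po == 0 {
			select {
			case binZeroSem <- struct{}{}: // acquire a slot, or block until one frees up
			case <-quit: // stop waiting if the peer disconnects in the meantime
				return
			}
			defer func() { <-binZeroSem }() // release the slot when this peer is done
		}
		fmt.Printf("peer %d syncing\n", id)
		time.Sleep(100 * time.Millisecond) // stand-in for holding the slot while connected
	}

	func main() {
		var wg sync.WaitGroup
		quit := make(chan struct{})
		// six bin-0 peers contend for the three slots
		for i := 0; i < 6; i++ {
			wg.Add(1)
			go initPeer(i, 0, quit, &wg)
		}
		wg.Wait()
	}

Running the sketch shows at most three peers proceeding concurrently, which is the behaviour the PR relies on for bin 0.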

@holisticode

Because of the defer statement, when a peer is disconnected at some unspecified time in the future

What I am saying is that "unspecified time in the future" only applies if the subscription function runs in a loop until the peer disconnects. I thought it was a function that just establishes subscriptions and then returns right away (as the InitPeer name suggests), which would trigger the defer and thus free the channel slot again.

So it really depends on the subscription function I guess.

To be honest, I am not so much of a fan of this type of blocking, because all peers exceeding the max number of peers in bin 0 (3) will block, and thus unnecessarily use up resources until, at some "unspecified time in the future", one peer would disconnect.
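
To make the point of disagreement concrete, here is a hypothetical contrast between the two behaviours being discussed; the peer type and the setupSubscriptions helper are stand-ins for the example, not the actual stream package code.

	package main

	import (
		"fmt"
		"time"
	)

	// Hypothetical stand-ins, only here to make the contrast concrete.
	type peer struct{ quit chan struct{} }

	func setupSubscriptions(p *peer) { fmt.Println("subscriptions established") }

	var binZeroSem = make(chan struct{}, 3)

	// Variant A: the function returns right after establishing subscriptions,
	// so the deferred receive frees the slot immediately and the semaphore
	// only bounds concurrent setups.
	func initPeerReturnsEarly(p *peer) {
		binZeroSem <- struct{}{}
		defer func() { <-binZeroSem }()
		setupSubscriptions(p)
	}

	// Variant B: the function blocks until the peer quits, so the slot is held
	// for the peer's whole lifetime and the semaphore bounds how many bin-0
	// peers sync at the same time.
	func initPeerBlocksUntilQuit(p *peer) {
		binZeroSem <- struct{}{}
		defer func() { <-binZeroSem }()
		setupSubscriptions(p)
		<-p.quit
	}

	func main() {
		p := &peer{quit: make(chan struct{})}
		initPeerReturnsEarly(p)
		go func() { time.Sleep(10 * time.Millisecond); close(p.quit) }()
		initPeerBlocksUntilQuit(p)
	}

Variant B is the reading under which a slot is only freed when the peer disconnects.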


acud commented Nov 26, 2019

To be honest, I am not so much of a fan of this type of blocking, because all peers exceeding the max number of peers in bin 0 (3) will block, and thus unnecessarily use up resources until, at some "unspecified time in the future", one peer would disconnect.

Which resources are we talking about? A Peer struct entry in the registry? I don't think this is a big price to pay.
Besides, the whole point is to block, because you want to be connected to this peer: you want to be able to send it a retrieve request, to potentially have it sync from you, and you might also want it to send retrieve requests to you.

This method of blocking allows you to connect to a peer and have that peer function correctly with all other protocols except syncing (other streams in the stream protocol would potentially work uninterrupted in this manner). So you can maintain connectivity with the peer and decide when to sync with it, independently of the connection itself.

If you have any better solutions, I'm happy to hear them out and gladly implement them.


@zelig zelig left a comment


this is a hack, don't like it
do we really need it?


acud commented Nov 26, 2019

I would not PR it if we did not need it :/

@acud acud merged commit f114078 into master Nov 26, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Nov 26, 2019
@acud acud deleted the bin-zero-sync-sem branch November 26, 2019 15:17
acud added a commit that referenced this pull request Feb 7, 2020
acud added a commit that referenced this pull request Feb 7, 2020

Successfully merging this pull request may close these issues.

investigate kademlia connectivity and suggest peer functionality