network: Suggest peer by address space gap #2065

janos · 2020-01-06T13:36:58Z

This PR is based on #1869 which is based on a fork that we do not have access to. Thanks to @kortatu for these changes.

I have merged master and resolved conflicts best to my knowledge.

The original PR description is copied below.

This PR proposes an improved method for suggesting peers to connect to. Whenever a peer needs to be added to a given bin, instead of taking the first callable one, the method suggests the peer that covers the biggest address space. The method is implemented but not yet attached to the main code, so it is only used in unit tests. Attaching it only requires replacing the call to the current function.
(This was built on top of the Kademlia load balancing PR #1757.)

This is extracted from network/README.md:

Address space gaps

In order to optimize Kademlia load balancing, performance and peer suggestion, we define the concept of address space gap or simply gap.
A gap is a portion of the overlay address space in which the current node does not know any peer. It can be represented as a range of addresses, for example 0xxx, meaning 0000-0111.

The proximity order of a gap, or gap po, is the proximity order of that address space with respect to the nearest peer(s) in the kademlia connected table (also taking the current node address into account). For example, if the node address is 0000, the gap of addresses 1xxx has proximity order 0, whereas the gap 01xx has po 1.

The size of a gap is defined as the number of addresses that could fit in it. If the area of the whole address space is 1, the size of a gap could be defined from the gap po as 1 / 2 ^ (po + 1). For example, our previous 1xxx gap has a size of 1 / (2 ^ 1) = 1/2. The size of 01xx is 1 / (2 ^ 2) = 1/4.
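
The two definitions above can be sketched in a few lines of Go. This is only an illustration on 8-bit addresses like the ones used in the examples; `proximityOrder` and `gapSize` are hypothetical helper names, not functions from the Swarm code base.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximityOrder returns the number of leading bits shared by two 8-bit
// addresses (hypothetical helper; real overlay addresses are much longer).
func proximityOrder(a, b uint8) int {
	return bits.LeadingZeros8(a ^ b)
}

// gapSize returns the fraction of the whole address space covered by a gap
// with the given po, following the formula 1 / 2^(po+1).
func gapSize(po int) float64 {
	return 1 / float64(uint64(1)<<uint(po+1))
}

func main() {
	node := uint8(0b00000000)
	fmt.Println(proximityOrder(node, 0b10000000)) // 0
	fmt.Println(proximityOrder(node, 0b01000000)) // 1
	fmt.Println(gapSize(0), gapSize(1))           // 0.5 0.25
}
```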

In order to improve the performance of content retrieval and delivery, the node should minimize the size of its gaps, because this means that it knows peers near almost all addresses. If the lowest gap po in the kademlia table is 4, the next hop of any lookup or forwarding will be at least po 4 close to the target. On the other hand, if the node has a po 0 gap, then for half of the addresses the next hop will still be only po 0 away!

Gaps for peer suggestion

The current kademlia bootstrap algorithm tries to fill in the bins (or po spaces) until some level of saturation is reached.
In the process of doing that, the gaps will diminish, but not in the optimal way.

For example, suppose the node address is 00000000, it is connected to only one peer in bin 0, 10000000, and the known addresses for bin 0 are 10000001 and 11000000. The current algorithm takes the first callable one, so it may suggest 10000001 as the next peer. This is not optimal, as the biggest gap in bin 0 will still be po 1 => 11xxxxxx. If, however, the algorithm is improved to search for a peer which covers a bigger gap, 11000000 would be selected and the biggest gaps would then be po 2 => 111xxxxx and 101xxxxx.

Additionally, even when the node does not know an address inside a particular gap, it can still select the address furthest away from the current peers, so that it covers a bigger gap. In the previous example, with node 00000000 and one peer already connected, 10000000, if the known addresses are 10000001 and 10010000, the best suggestion would be the latter, because it is po 3 from the nearest peer, as opposed to 10000001, which is po 7. The best case covers a gap of po 3 size (1/16 of the area, or 16 addresses) and the other one just po 7 size (1/256 of the area, or 1 address).
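
A minimal sketch of this selection rule, assuming 8-bit addresses as in the examples: for each candidate, take its highest po to any already known address, then pick the candidate for which that value is lowest. `suggestByGap` is a hypothetical name used for illustration and is not the `kademlia.suggestPeerInBinByGap` implementation from this PR.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximityOrder counts the leading bits shared by two 8-bit addresses.
func proximityOrder(a, b uint8) int {
	return bits.LeadingZeros8(a ^ b)
}

// suggestByGap (hypothetical) returns the candidate that is furthest from
// all known addresses, together with its po to the nearest known address.
// ok is false when there are no candidates.
func suggestByGap(known, candidates []uint8) (best uint8, po int, ok bool) {
	bestPo := 9 // higher than any po possible between 8-bit addresses
	for _, c := range candidates {
		nearest := 0
		for _, k := range known {
			if o := proximityOrder(c, k); o > nearest {
				nearest = o
			}
		}
		if nearest < bestPo {
			best, bestPo, ok = c, nearest, true
		}
	}
	if !ok {
		return 0, 0, false
	}
	return best, bestPo, true
}

func main() {
	// The node's own address and its single connected peer.
	known := []uint8{0b00000000, 0b10000000}

	best, po, _ := suggestByGap(known, []uint8{0b10000001, 0b11000000})
	fmt.Printf("%08b po=%d\n", best, po) // 11000000 po=1

	best, po, _ = suggestByGap(known, []uint8{0b10000001, 0b10010000})
	fmt.Printf("%08b po=%d\n", best, po) // 10010000 po=3
}
```

The two calls reproduce the two examples above: the candidate with the lowest po to its nearest known address covers the biggest gap.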

Gaps and load balancing

One additional benefit of considering gaps is load balancing. If the target addresses are distributed randomly (address popularity is another problem that can also be studied from the gap perspective), requests will be load balanced automatically if we try to connect to peers covering the bigger gaps. Continuing with our example, if in bin 0 we have peers 10000000 and 10000001 (Fig. 1), almost all addresses in the space 1xxxxxxx, that is, half of all addresses, will have the same distance to both peers. If we need to send to one of those addresses, we will need to use one of those peers. The choice could be made randomly, by always taking the first one, or with some load balancing accounting that picks the least used one.

Fig. 1 - Close peers need an external load balancing mechanism

This last method will still be useful, but if the gap filling strategy is used, most probably both peers will be separated enough that they never compete for an address and a natural load balancing will be made among them (for example, 10000000 and 11000000 will be used each for half the addresses in bin 0 (Fig. 2)).

Fig. 2 - Peers chosen by address space gap have a natural load balancing
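
The difference between the two configurations can be checked by brute force on 8-bit addresses. The sketch below counts, for every address in 1xxxxxxx, which of two peers is strictly nearer by proximity order and how many addresses tie; `share` is a hypothetical helper written only for this illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximityOrder counts the leading bits shared by two 8-bit addresses.
func proximityOrder(a, b uint8) int {
	return bits.LeadingZeros8(a ^ b)
}

// share (hypothetical) walks all addresses in bin 0 of node 00000000, that
// is 1xxxxxxx, and counts how many are strictly nearer to peer a, strictly
// nearer to peer b, or equally near to both.
func share(a, b uint8) (onlyA, onlyB, tied int) {
	for addr := 0x80; addr <= 0xff; addr++ {
		pa := proximityOrder(uint8(addr), a)
		pb := proximityOrder(uint8(addr), b)
		switch {
		case pa > pb:
			onlyA++
		case pb > pa:
			onlyB++
		default:
			tied++
		}
	}
	return onlyA, onlyB, tied
}

func main() {
	a, b, t := share(0b10000000, 0b10000001)
	fmt.Println(a, b, t) // 1 1 126

	a, b, t = share(0b10000000, 0b11000000)
	fmt.Println(a, b, t) // 64 64 0
}
```

With the two close peers, 126 of the 128 addresses tie and need an external balancing decision; with the peers chosen by gap, each peer is the unique nearest one for exactly half of the bin.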

Implementation

The search for gaps can be done easily using a proximity order tree, or pot. When traversing the bins of a node, a gap is found whenever one of the po's is missing, starting from the furthest (left). At each level the search starts from the parent po rather than from 0, because at the second level, under a node of po=0, the minimum po that can be found is 1.

The function that looks for the biggest gap in a pot is implemented in pot.BiggestAddressGap. It returns the biggest gap in the form of a po and a node under which the gap can be found.
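
To illustrate the idea without a pot, the following sketch searches a flat list of 8-bit addresses for the shortest prefix that contains no known address but whose sibling prefix does. This is a deliberately simplified stand-in and not the pot.BiggestAddressGap implementation; the `gap` type and `biggestGap` function are hypothetical names.

```go
package main

import "fmt"

// gap describes an empty part of the address space (hypothetical type).
type gap struct {
	prefix uint8 // prefix bits of the gap, left-aligned, remaining bits zero
	po     int   // proximity order of the gap: prefix length minus one
}

// biggestGap (hypothetical) returns the gap with the lowest po among the
// known addresses, which must include the node's own address. It tries
// prefix lengths from shortest to longest, so the first hit is the biggest.
func biggestGap(known []uint8) (gap, bool) {
	for l := 1; l <= 8; l++ {
		shift := uint(8 - l)
		for _, k := range known {
			// Same first l-1 bits as k, with bit l flipped.
			sibling := (k >> shift) ^ 1
			empty := true
			for _, other := range known {
				if other>>shift == sibling {
					empty = false
					break
				}
			}
			if empty {
				return gap{prefix: sibling << shift, po: l - 1}, true
			}
		}
	}
	return gap{}, false // every address is known
}

func main() {
	g, _ := biggestGap([]uint8{0b00000000})
	fmt.Printf("%08b po=%d\n", g.prefix, g.po) // 10000000 po=0

	g, _ = biggestGap([]uint8{0b00000000, 0b10000000, 0b10000001})
	fmt.Printf("%08b po=%d\n", g.prefix, g.po) // 01000000 po=1
}
```

The nested loops keep the sketch short at the cost of efficiency; a pot can find the same gap in a single traversal. When several gaps share the same po, as 01xxxxxx and 11xxxxxx do in the second call, this sketch simply returns the first one it meets.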

This function is used in kademlia.suggestPeerInBinByGap, which returns a BzzAddress in a particular bin that fills up the biggest address gap. That function is not yet used in SuggestPeer, but it is enough to replace the call to suggestPeerInBin with the new one.

Further improvements

Instead of the size of a gap, it may be more interesting to look at the ratio between the size and the number of current peers serving that gap. If we have n current peers that are equidistant to a particular gap of size s, the load of each of these peers will be s/n on average.
We can define a gap's temperature as that number, s/n. When looking for new peers to connect to, instead of looking for bigger gaps we could look for hotter gaps.
For example, if in our first example we can't find a peer in 11xxxxxx and instead use the best available peer, we could end up with the configuration in Fig. 3.

Fig. 3 - Comparing gaps temperature

Here 11xxxxxx is still the biggest gap (po=1, size 1/4), the same size as 01xxxxxx. But if we consider temperature, 01xxxxxx is hotter: it is served only by our node 00000000, so its temperature is (1/4) / 1 = 1/4. 11xxxxxx, however, is now served by two peers, so its temperature is (1/4) / 2 = 1/8, which means that we will select 01xxxxxx as the hotter one.
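
The arithmetic of this comparison can be written down directly. The sketch below is only an illustration of the s/n formula; `gapSize` and `temperature` are hypothetical names, and the number of serving peers is passed in rather than derived from a kademlia table.

```go
package main

import "fmt"

// gapSize returns the fraction of the address space covered by a gap with
// the given po, using the formula 1 / 2^(po+1).
func gapSize(po int) float64 {
	return 1 / float64(uint64(1)<<uint(po+1))
}

// temperature (hypothetical) returns s/n for a gap of the given po served
// by n equidistant peers. n is expected to be at least 1, since a gap is
// always adjacent to the node itself or to one of its peers.
func temperature(po, n int) float64 {
	return gapSize(po) / float64(n)
}

func main() {
	fmt.Println(temperature(1, 1)) // 0.25  (01xxxxxx, served by the node only)
	fmt.Println(temperature(1, 2)) // 0.125 (11xxxxxx, served by two peers)
}
```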

Temperature calculation can be implemented so that it costs the same as looking for the biggest gap: the temperature can be calculated on the fly as the gap is found using a pot.

Other metrics could be factored into the temperature, such as the recent number of requests per address space or the performance of the current peers.

zelig

conflict needs resolved, otherwise lgtm

kortatu added 30 commits September 24, 2019 15:41

Proposed solution for peers load balancing in Kademlia

0f31b5a

network: created global capabilityIndex

c1c9ac5

typo

fc65d5a

renamed globalIndex to defaultIndex

7dc9568

Load balancing capability test

ca11c52

Removed color balancing and using a KademliaLoadBalancer

97ab47e

Missing file in commit

f0fc99d

Merge branch 'master' into issue-1757

0941717

Fixed lint and test when mergin master

e9263d7

Subscription to peer changed closed only by writer, not by subscriptors

ed1f9a9

Added an alternative method for initializing a new peer count

904e204

Merge branch 'master' into issue-1757

29303f1

go fmt

e53ae25

extracted pubsub channel to a package

f82db7f

network: fixed pr comments

0e346e6

Merge branch 'master' into issue-1757

5665ea9

Suggest peer by gap or address space

54da612

network: gap documentation

24379ac

Implementation and further improvements

d4b2489

network/kademlia: Pr comments, tests commented and fixed

4c81a24

network/kademlia: better naming for pub/sub channels in kademlialoadb…

5b44ab3

…alancer

Fixed wrong test in kademlia load balancer. Also fixed waiting methods.

52dacb0

Merge branch 'master' into issue-1757

3301920

Debug functions moved out of kademlialoadbalancer.go

49e7d09

More comments, fixed EachConn po for peers in the same bin as the piv…

ad5eac5

…ot address, renamed findPeerPo to peerPo

Minor typo and further imporvements ideas

3de19c1

resourceUseStats moved to a diffrente file. Use() renamed to AddUseCo…

e78e2b3

…unt(). Several minor comments and fixes

added gopubsub unit tests

21a0d2f

Moved resource_use_stats to its own package. Renamed gopubsub to pubs…

c615cdd

…ubchannel. Some PR comments. Fixed TestEachBinBaseUses so it is more stable

Merge branch 'issue-1757' into suggest_address_space

be65dc2

kortatu and others added 5 commits October 29, 2019 14:52

Pubsub now closes all go routines when closing. Removed commented code

340ce15

Merge branch 'master' into issue-1757

96a8245

fix closing channel

697b049

Merge branch 'issue-1757' into suggest_address_space

140cdf8

Merge branch 'master' into suggest_address_space

423f583

janos added the ready for review label Jan 6, 2020

janos requested review from zelig and acud January 6, 2020 13:36

janos changed the title ~~Suggest address space~~ network: Suggest peer by address space gap Jan 6, 2020

janos requested a review from nolash January 6, 2020 13:38

acud approved these changes Jan 7, 2020

zelig approved these changes Jan 11, 2020

acud added 2 commits January 13, 2020 11:28

Merge branch 'master' into suggest_address_space

316654d

network test: add break and move sleep

a541026

acud mentioned this pull request Jan 13, 2020

hive/kademlia: Suggest peer by address space gap #1869

Closed

acud merged commit e7e98cf into master Jan 13, 2020

acud deleted the suggest_address_space branch January 13, 2020 05:25

acud added this to the 0.5.5 milestone Jan 21, 2020
