Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kademlia, hive: rate limiting on gossiping peers (send and receive) #1654

Merged
merged 3 commits into from
May 6, 2021

Conversation

istae
Copy link
Member

@istae istae commented May 4, 2021

#1645


This change is Reviewable


if len(addrs) == 0 {
return nil
}

err := k.discovery.BroadcastPeers(ctx, peer, addrs...)
if err != nil {
fmt.Println("DISCONNECT")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep this line or was it intended to be something temp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh snap...

@istae istae added the ready for review The PR is ready to be reviewed label May 5, 2021
type Service struct {
streamer p2p.Streamer
addressBook addressbook.GetPutter
addPeersHandler func(context.Context, ...swarm.Address) error
networkID uint64
logger logging.Logger
metrics metrics
limiter map[string]*rate.Limiter
limiterLock sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r: this line isn't needed, the zero value of sync.Mutex is already useful

pkg/hive/hive.go Outdated
@@ -169,3 +188,23 @@ func (s *Service) peersHandler(ctx context.Context, peer p2p.Peer, stream p2p.St

return nil
}

func (s *Service) rateLimitPeer(ctx context.Context, peer swarm.Address, count int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctx parameter isn't used in the method, so it should be eighter used or removed.

_ = k.connectedPeers.EachBinRev(func(connectedPeer swarm.Address, _ uint8) (bool, bool, error) {
if connectedPeer.Equal(peer) {
return false, false, nil
//r := rand.New(securerand.NewSource())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope :)

return false, false, nil
//r := rand.New(securerand.NewSource())

for bin := uint8(0); bin < swarm.MaxBins; bin++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r: @acud is there a reason why swarm.MaxBins is a typed constant? If not, I'll remove the type and change the loop: for bin := 0; bin < swarm.MaxBins; bin++ {... (also in the affected code in the pkg/swarm/proximity.go file), more on this topic: https://blog.golang.org/constants

@@ -668,28 +672,31 @@ func (k *Kad) connect(ctx context.Context, peer swarm.Address, ma ma.Multiaddr,
func (k *Kad) Announce(ctx context.Context, peer swarm.Address) error {
addrs := []swarm.Address{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


func randomSubset(addrs []swarm.Address, count int) ([]swarm.Address, error) {

if count >= len(addrs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addresses aren't shuffled when the count count >= len(addrs), is this expected behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -95,6 +95,28 @@ func (s *PSlice) EachBinRev(pf topology.EachPeerFunc) error {
return nil
}

func (s *PSlice) BinPeers(bin uint8) []swarm.Address {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r: if you remove the type from swarm.MaxBin constant as recommended above, then the bin can be simply an integer and the casting on L:102 wouldn't be necessary

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1.
Reviewable status: 7 of 8 files reviewed, 11 unresolved discussions (waiting on @esadakar, @mrekucci, and @zelig)


go.mod, line 69 at r1 (raw file):

	golang.org/x/term v0.0.0-20201210144234-2321bbc49cbf
	golang.org/x/text v0.3.4 // indirect
	golang.org/x/time v0.0.0-20191024005414-555d28b269f0

this change is probably not needed and can be reverted


pkg/hive/hive.go, line 40 at r1 (raw file):

var (
	ErrRateLimitExceed = errors.New("rate limit exceeded")

r: maybe ErrRateLimit or ErrRateLimitExceeded


pkg/hive/hive.go, line 41 at r1 (raw file):

var (
	ErrRateLimitExceed = errors.New("rate limit exceeded")
	limitBurst         = 4 * int(swarm.MaxBins)

this is still quite a bit (128 in a minute, 7700 in an hour, 180K a day), so the address spam vector is still there. that being said, i'm not sure I have a better idea. maybe limiting to 4 peers in a bin in a minute?


pkg/hive/hive.go, line 201 at r1 (raw file):

	limiter, ok := s.limiter[addr]
	if !ok {
		limiter = rate.NewLimiter(limitRate, limitBurst)

you add to the map but never clean up. this is a clear memory leak. you'd need to leverage the DisconnectIn and DisconnectOut protocol helpers to detect that the peer went away (or we decided to disconnect from it) in order to clean up the entry for the peer


pkg/topology/kademlia/kademlia.go, line 677 at r1 (raw file):

Previously, mrekucci (Peter Mrekaj) wrote…

r: @acud is there a reason why swarm.MaxBins is a typed constant? If not, I'll remove the type and change the loop: for bin := 0; bin < swarm.MaxBins; bin++ {... (also in the affected code in the pkg/swarm/proximity.go file), more on this topic: https://blog.golang.org/constants

it is typed since everything which is a PO is intended to be a uint8. Going with the untyped approach might make daily life a tad bit easier but I think that having it typed provides good assurances and a strong consideration for developers to actually keep in mind the upper bound when actually writing code


pkg/topology/kademlia/kademlia_test.go, line 1244 at r1 (raw file):

	t.Helper()

	timeout := time.After(5 * time.Second)

is the change needed?


pkg/topology/pslice/pslice_test.go, line 260 at r1 (raw file):

	} {

		t.Run("", func(t *testing.T) {

test label missing. we usually add a short strong in the actual tc struct that has the string label which is used here


pkg/topology/pslice/pslice_test.go, line 284 at r1 (raw file):

			bins := ps.BinPeers(uint8(len(tc.peersCount)))
			if bins != nil {
				t.Fatal("peers must be nil for out of bound bind")

out of bound bin?

Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, heed @mrekucci 's comments

Copy link
Member Author

@istae istae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @acud from 6 discussions.
Reviewable status: 2 of 8 files reviewed, 6 unresolved discussions (waiting on @acud and @esadakar)


go.mod, line 69 at r1 (raw file):

Previously, acud (acud) wrote…

this change is probably not needed and can be reverted

this gets added after go mod tidy


pkg/hive/hive.go, line 40 at r1 (raw file):

Previously, acud (acud) wrote…

r: maybe ErrRateLimit or ErrRateLimitExceeded

Done.


pkg/hive/hive.go, line 41 at r1 (raw file):

Previously, acud (acud) wrote…

this is still quite a bit (128 in a minute, 7700 in an hour, 180K a day), so the address spam vector is still there. that being said, i'm not sure I have a better idea. maybe limiting to 4 peers in a bin in a minute?

I believe the way this rate limiting works is that we start with a token count B, capped at burst amount, and the rate R replenishes the token count. So at t=0, we have B tokens, so 128 are allowed in a single request. If 128 peers connect, than the token count falls to zero. After that, 1 peer per minute is allowed. If no peers connect after the initial burst, then only after 128 minutes. the token count has replenished to the full burst amount.


pkg/hive/hive.go, line 201 at r1 (raw file):

Previously, acud (acud) wrote…

you add to the map but never clean up. this is a clear memory leak. you'd need to leverage the DisconnectIn and DisconnectOut protocol helpers to detect that the peer went away (or we decided to disconnect from it) in order to clean up the entry for the peer

Makes sense. My question would then be, what happens if a peer consistently reconnects, causing the rate limit counters to be wiped each time?


pkg/topology/kademlia/kademlia_test.go, line 1244 at r1 (raw file):

Previously, acud (acud) wrote…

is the change needed?

No :)


pkg/topology/pslice/pslice_test.go, line 260 at r1 (raw file):

Previously, acud (acud) wrote…

test label missing. we usually add a short strong in the actual tc struct that has the string label which is used here

Done.


pkg/topology/pslice/pslice_test.go, line 284 at r1 (raw file):

Previously, acud (acud) wrote…

out of bound bin?

Done.

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @esadakar)


pkg/hive/hive.go, line 201 at r1 (raw file):

Previously, esadakar (Esad Akar) wrote…

Makes sense. My question would then be, what happens if a peer consistently reconnects, causing the rate limit counters to be wiped each time?

That is indeed possible 👍 nicely spotted. This is possible, but due to other nodes possible to be connecting in and reaching the maximum over saturated peer count in NN, meaning if they constantly do that they might get their slot taken by another node. But yeah this is indeed a thing. We should probably detect this sort of behavior in kademlia

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @esadakar)

@istae istae merged commit ac06c59 into master May 6, 2021
@istae istae deleted the hive-ratelimit branch May 6, 2021 12:05
@ldeffenb
Copy link
Collaborator

ldeffenb commented May 6, 2021


pkg/hive/hive.go, line 41 at r1 (raw file):

Previously, esadakar (Esad Akar) wrote…

I believe the way this rate limiting works is that we start with a token count B, capped at burst amount, and the rate R replenishes the token count. So at t=0, we have B tokens, so 128 are allowed in a single request. If 128 peers connect, than the token count falls to zero. After that, 1 peer per minute is allowed. If no peers connect after the initial burst, then only after 128 minutes. the token count has replenished to the full burst amount.

It's called a Token Bucket: https://en.wikipedia.org/wiki/Token_bucket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull-request ready for review The PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants