-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kademlia, hive: rate limiting on gossiping peers (send and receive) #1654
Conversation
pkg/topology/kademlia/kademlia.go
Outdated
|
||
if len(addrs) == 0 { | ||
return nil | ||
} | ||
|
||
err := k.discovery.BroadcastPeers(ctx, peer, addrs...) | ||
if err != nil { | ||
fmt.Println("DISCONNECT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to keep this line or was it intended to be something temp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh snap...
type Service struct { | ||
streamer p2p.Streamer | ||
addressBook addressbook.GetPutter | ||
addPeersHandler func(context.Context, ...swarm.Address) error | ||
networkID uint64 | ||
logger logging.Logger | ||
metrics metrics | ||
limiter map[string]*rate.Limiter | ||
limiterLock sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r: this line isn't needed, the zero value of sync.Mutex
is already useful
pkg/hive/hive.go
Outdated
@@ -169,3 +188,23 @@ func (s *Service) peersHandler(ctx context.Context, peer p2p.Peer, stream p2p.St | |||
|
|||
return nil | |||
} | |||
|
|||
func (s *Service) rateLimitPeer(ctx context.Context, peer swarm.Address, count int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ctx
parameter isn't used in the method, so it should be eighter used or removed.
pkg/topology/kademlia/kademlia.go
Outdated
_ = k.connectedPeers.EachBinRev(func(connectedPeer swarm.Address, _ uint8) (bool, bool, error) { | ||
if connectedPeer.Equal(peer) { | ||
return false, false, nil | ||
//r := rand.New(securerand.NewSource()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope :)
return false, false, nil | ||
//r := rand.New(securerand.NewSource()) | ||
|
||
for bin := uint8(0); bin < swarm.MaxBins; bin++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r: @acud is there a reason why swarm.MaxBins
is a typed constant? If not, I'll remove the type and change the loop: for bin := 0; bin < swarm.MaxBins; bin++ {...
(also in the affected code in the pkg/swarm/proximity.go
file), more on this topic: https://blog.golang.org/constants
@@ -668,28 +672,31 @@ func (k *Kad) connect(ctx context.Context, peer swarm.Address, ma ma.Multiaddr, | |||
func (k *Kad) Announce(ctx context.Context, peer swarm.Address) error { | |||
addrs := []swarm.Address{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
func randomSubset(addrs []swarm.Address, count int) ([]swarm.Address, error) { | ||
|
||
if count >= len(addrs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addresses aren't shuffled when the count count >= len(addrs)
, is this expected behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -95,6 +95,28 @@ func (s *PSlice) EachBinRev(pf topology.EachPeerFunc) error { | |||
return nil | |||
} | |||
|
|||
func (s *PSlice) BinPeers(bin uint8) []swarm.Address { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r: if you remove the type from swarm.MaxBin
constant as recommended above, then the bin
can be simply an integer and the casting on L:102
wouldn't be necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 8 files at r1.
Reviewable status: 7 of 8 files reviewed, 11 unresolved discussions (waiting on @esadakar, @mrekucci, and @zelig)
go.mod, line 69 at r1 (raw file):
golang.org/x/term v0.0.0-20201210144234-2321bbc49cbf golang.org/x/text v0.3.4 // indirect golang.org/x/time v0.0.0-20191024005414-555d28b269f0
this change is probably not needed and can be reverted
pkg/hive/hive.go, line 40 at r1 (raw file):
var ( ErrRateLimitExceed = errors.New("rate limit exceeded")
r: maybe ErrRateLimit
or ErrRateLimitExceeded
pkg/hive/hive.go, line 41 at r1 (raw file):
var ( ErrRateLimitExceed = errors.New("rate limit exceeded") limitBurst = 4 * int(swarm.MaxBins)
this is still quite a bit (128 in a minute, 7700 in an hour, 180K a day), so the address spam vector is still there. that being said, i'm not sure I have a better idea. maybe limiting to 4 peers in a bin in a minute?
pkg/hive/hive.go, line 201 at r1 (raw file):
limiter, ok := s.limiter[addr] if !ok { limiter = rate.NewLimiter(limitRate, limitBurst)
you add to the map but never clean up. this is a clear memory leak. you'd need to leverage the DisconnectIn
and DisconnectOut
protocol helpers to detect that the peer went away (or we decided to disconnect from it) in order to clean up the entry for the peer
pkg/topology/kademlia/kademlia.go, line 677 at r1 (raw file):
Previously, mrekucci (Peter Mrekaj) wrote…
r: @acud is there a reason why
swarm.MaxBins
is a typed constant? If not, I'll remove the type and change the loop:for bin := 0; bin < swarm.MaxBins; bin++ {...
(also in the affected code in thepkg/swarm/proximity.go
file), more on this topic: https://blog.golang.org/constants
it is typed since everything which is a PO is intended to be a uint8
. Going with the untyped approach might make daily life a tad bit easier but I think that having it typed provides good assurances and a strong consideration for developers to actually keep in mind the upper bound when actually writing code
pkg/topology/kademlia/kademlia_test.go, line 1244 at r1 (raw file):
t.Helper() timeout := time.After(5 * time.Second)
is the change needed?
pkg/topology/pslice/pslice_test.go, line 260 at r1 (raw file):
} { t.Run("", func(t *testing.T) {
test label missing. we usually add a short strong in the actual tc
struct that has the string label which is used here
pkg/topology/pslice/pslice_test.go, line 284 at r1 (raw file):
bins := ps.BinPeers(uint8(len(tc.peersCount))) if bins != nil { t.Fatal("peers must be nil for out of bound bind")
out of bound bin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, heed @mrekucci 's comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @acud from 6 discussions.
Reviewable status: 2 of 8 files reviewed, 6 unresolved discussions (waiting on @acud and @esadakar)
go.mod, line 69 at r1 (raw file):
Previously, acud (acud) wrote…
this change is probably not needed and can be reverted
this gets added after go mod tidy
pkg/hive/hive.go, line 40 at r1 (raw file):
Previously, acud (acud) wrote…
r: maybe
ErrRateLimit
orErrRateLimitExceeded
Done.
pkg/hive/hive.go, line 41 at r1 (raw file):
Previously, acud (acud) wrote…
this is still quite a bit (128 in a minute, 7700 in an hour, 180K a day), so the address spam vector is still there. that being said, i'm not sure I have a better idea. maybe limiting to 4 peers in a bin in a minute?
I believe the way this rate limiting works is that we start with a token count B, capped at burst amount, and the rate R replenishes the token count. So at t=0, we have B tokens, so 128 are allowed in a single request. If 128 peers connect, than the token count falls to zero. After that, 1 peer per minute is allowed. If no peers connect after the initial burst, then only after 128 minutes. the token count has replenished to the full burst amount.
pkg/hive/hive.go, line 201 at r1 (raw file):
Previously, acud (acud) wrote…
you add to the map but never clean up. this is a clear memory leak. you'd need to leverage the
DisconnectIn
andDisconnectOut
protocol helpers to detect that the peer went away (or we decided to disconnect from it) in order to clean up the entry for the peer
Makes sense. My question would then be, what happens if a peer consistently reconnects, causing the rate limit counters to be wiped each time?
pkg/topology/kademlia/kademlia_test.go, line 1244 at r1 (raw file):
Previously, acud (acud) wrote…
is the change needed?
No :)
pkg/topology/pslice/pslice_test.go, line 260 at r1 (raw file):
Previously, acud (acud) wrote…
test label missing. we usually add a short strong in the actual
tc
struct that has the string label which is used here
Done.
pkg/topology/pslice/pslice_test.go, line 284 at r1 (raw file):
Previously, acud (acud) wrote…
out of bound bin?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @esadakar)
pkg/hive/hive.go, line 201 at r1 (raw file):
Previously, esadakar (Esad Akar) wrote…
Makes sense. My question would then be, what happens if a peer consistently reconnects, causing the rate limit counters to be wiped each time?
That is indeed possible 👍 nicely spotted. This is possible, but due to other nodes possible to be connecting in and reaching the maximum over saturated peer count in NN, meaning if they constantly do that they might get their slot taken by another node. But yeah this is indeed a thing. We should probably detect this sort of behavior in kademlia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @esadakar)
pkg/hive/hive.go, line 41 at r1 (raw file): Previously, esadakar (Esad Akar) wrote…
It's called a Token Bucket: https://en.wikipedia.org/wiki/Token_bucket |
#1645
This change is