-
Notifications
You must be signed in to change notification settings - Fork 20k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: meter peer traffic, emit metered peer events #17695
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice code, needs a few tiny fixups.
p2p/metrics.go
Outdated
MetricsOutboundTraffic = "p2p/OutboundTraffic" // Name for the registered outbound traffic meter | ||
|
||
MetricsRegistryIngressPrefix = MetricsInboundTraffic + "/" | ||
MetricsRegistryEgressPrefix = MetricsOutboundTraffic + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need these as public types? We can add the extra /
wherever it's needed. I don't much see the value of having extra public constants that are just an already existing one plus a slash.
p2p/metrics.go
Outdated
MetricsRegistryIngressPrefix = MetricsInboundTraffic + "/" | ||
MetricsRegistryEgressPrefix = MetricsOutboundTraffic + "/" | ||
|
||
MeteredPeerLimit = 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be public? I.e. do we expect to use it anywhere else, or is this just a sanity field to avoid OOM issues? If the latter, please make it private. Also please add a comment to it, it's not obvious what the purpose is (from an outside perspective).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be public, because the dashboard will use it at the cleaning of the history.
p2p/metrics.go
Outdated
PeerIngressRegistry = metrics.NewPrefixedChildRegistry(metrics.DefaultRegistry, MetricsRegistryIngressPrefix) // Registry containing the peer ingress | ||
PeerEgressRegistry = metrics.NewPrefixedChildRegistry(metrics.DefaultRegistry, MetricsRegistryEgressPrefix) // Registry containing the peer egress | ||
|
||
metricsFeed event.Feed // Event feed for peer metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps name this meteredPeerFeed
so it's consistent with meteredPeerCount
?
p2p/metrics.go
Outdated
PeerHandshakeFailed | ||
) | ||
|
||
// MeteredPeerEvent is an event emitted when peers connect or disconnect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a period to the end of the type doc.
type MeteredPeerEvent struct { | ||
Type MeteredPeerEventType // Type of peer event | ||
IP net.IP // IP address of the peer | ||
ID string // NodeID of the peer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjl Is this compatible with the new node rework? Would we need to change anything here?
p2p/metrics.go
Outdated
Egress uint64 // Egress count in the moment of disconnection | ||
} | ||
|
||
// SubscribeMeteredPeerEvent registers a subscription of MeteredPeerEvent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SubscribeMeteredPeerEvent registers a subscription for peer life-cycle events
// if metrics collection is enabled.
p2p/metrics.go
Outdated
if atomic.LoadUint64(&meteredPeerCount) >= MeteredPeerLimit { | ||
log.Warn("Metered peer count reached the limit") | ||
return conn | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the best solution. We would still like to create the metered connection, even if we exceed out limits, just not register a new meter into the metrics subsystem. I.e. we would still collect the global connectivity and traffic metrics, just not the individual ones.
For this perhaps this check should be in handshakeDone
, where if we exceed the limit, we omit to create the internal metrics, but everything else remains.
p2p/metrics.go
Outdated
metricsFeed.Send(MeteredPeerEvent{ | ||
Type: PeerConnected, | ||
IP: c.ip, | ||
ID: id.String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.id
, no need to stringify again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.id
is not safe to use without locking, this is the reason, why I used the id
again. However I can move it to another variable to avoid the double stringifying.
p2p/metrics.go
Outdated
// the peer from the traffic registries and emits close event. | ||
func (c *meteredConn) Close() error { | ||
// Decrement the metered peer count | ||
atomic.AddUint64(&meteredPeerCount, ^uint64(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're working with a tiny counter, might as well be int32
. Then we can have -1
too and not have to do weird bit inversions.
p2p/metrics.go
Outdated
atomic.AddUint64(&meteredPeerCount, ^uint64(0)) | ||
|
||
c.lock.RLock() | ||
// If the peer disconnects before the handshake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this one live below into the if
(// Peer disconnected before the handshake
). Cleaner look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Took me a while to figure out exactly what the c.lock
was intended to protect (reading the c.metered
, and not the c.ingressMeter
), so that could possibly be better documented. Other than that it looks neat
if atomic.LoadInt32(&meteredPeerCount) >= MeteredPeerLimit { | ||
log.Warn("Metered peer count reached the limit") | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that even if we don't do individual traffic metering, the events should still nonetheless fire. We want to omit creating arbitrarily many registered meters, but life-cycle events I would personally keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the code, so that the events fire even when the peers don't have registered meters.
Should we make a difference between the peers that have meters, and the ones that don't?
I.e. should we give them different MeteredPeerEventType
s?
p2p/metrics.go
Outdated
return | ||
} | ||
// Increment the metered peer count | ||
atomic.AddInt32(&meteredPeerCount, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This construct is racey, because you check first and increment afterwards, but it could change in between.
if atomic.LoadInt32(&meteredPeerCount) >= MeteredPeerLimit {
return
}
atomic.AddInt32(&meteredPeerCount, 1)
If you want to ensure that we stay correctly below MeteredPeerLimit
, you atomically add-and-retrieve and then see if you went below the allowance. If yes, you can decrement back. A bit more work, but correct AFAIK:
if atomic.AddInt32(&meteredPeerCount, 1) >= MeteredPeerLimit {
atomic.AddInt32(&meteredPeerCount, -1)
return
}
@fjl Could you please confirm this? ^
5285531
to
8556748
Compare
This PR extends the peer metrics collection: