-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: rate limit for network messages #11646
Conversation
/// Returns `None` if the message is not meant to be rate limited in any scenario. | ||
fn get_key_and_token_cost(message: &PeerMessage) -> Option<(RateLimitedPeerMessageKey, u32)> { | ||
use RateLimitedPeerMessageKey::*; | ||
match message { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More boilerplate than I would like, but it make sure we don't forget to consider rate limits when adding new messages.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11646 +/- ##
==========================================
+ Coverage 71.66% 71.73% +0.07%
==========================================
Files 788 790 +2
Lines 161300 161740 +440
Branches 161300 161740 +440
==========================================
+ Hits 115589 116020 +431
- Misses 40674 40681 +7
- Partials 5037 5039 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Added configuration of rate limits through overrides, in a fashion similar to existing network settings override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrea, this looks great overall. I left one comment about the internals of TokenBucket for your thoughts.
… way the tokens can be refreshed during the call to acquire() eliminating the need of a periodic task
I think this draft is now ready for review. IMO a good plan can be to merge this PR which has the rate limits implementation and then open another PR with the actual rate limits settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. The plan of adding the default limits in a separate PR sounds reasonable.
{ | ||
let labels = [peer_msg.msg_variant()]; | ||
metrics::PEER_MESSAGE_RECEIVED_BY_TYPE_TOTAL.with_label_values(&labels).inc(); | ||
metrics::PEER_MESSAGE_RECEIVED_BY_TYPE_BYTES | ||
.with_label_values(&labels) | ||
.inc_by(msg.len() as u64); | ||
if !self.received_messages_rate_limits.is_allowed(&peer_msg, now) { | ||
metrics::PEER_MESSAGE_RATE_LIMITED_BY_TYPE_TOTAL.with_label_values(&labels).inc(); | ||
tracing::debug!(target: "network", "Peer {} is being rate limited for message {}", self.peer_info, peer_msg.msg_variant()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Food for thought, what purpose do you envision this log message serving?
I guess one thing it helps us to understand is which peer is sending the spam, something which the aggregate metrics (which are only labelled by variant) do not capture. Another way to check which peer is spamming us could be to look at the receive rate (in bytes) per connection, something which is reported via debug api and visible on debug pages. However, that would not be fine-grained by message variant.
Just trying to think through how this might be used and whether there might be a more efficient way to represent the same information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - the purpose would be to understanding which peer is spamming. As you said there are other ways to know know the source of the attack.
I think by default we ran at INFO level is that right? I don't think it hurts to have this debug but I don't think it brings a lot of value either
let tokens_to_add = duration.as_secs_f64() * self.refill_rate as f64; | ||
let tokens_to_add = (tokens_to_add * TOKEN_PARTS_NUMBER as f64) as u64; | ||
// Update `last_refill` and `size` only if there's a change. This is done to prevent | ||
// losing token parts to clamping if the duration is too small. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
} | ||
|
||
#[cfg(test)] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Props on thorough coverage
impl RateLimits { | ||
/// Creates all buckets as configured in `config`. | ||
/// See also [TokenBucket::new]. | ||
pub fn from_config(config: &Config, start_time: Instant) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would make sense to print some kind of warning if the override is lower than the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this configuration is kinda tricky, I don't see many operators doing that - they need to know what they are doing. Is emitting warning for config changes a pattern we use in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hmm why do you say the configuration is tricky to change? If I want to bump up my receive rate for a certain message type, do I need to do more than adding a line to the config.json in the right place?
To answer your question, I didn't find an example of warns specifically but there are some examples here and here of checks on config values. Of course a bail!
is impossible to miss (node won't start) while a warning may not even be seen, so if it's not super easy to add I wouldn't waste time on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's easy once you know what to do, a simple addition to config.json.
It's just that if I put myself in the shoes of an average operator it might non-trivial to know what values to use. A knowledge problem rather than a technical one.
Personally, I think we can go without the warning and keep the bail!
if validation fails, for now at least.
@@ -311,6 +315,10 @@ impl PeerActor { | |||
// That likely requires bigger changes and account_id here is later used for debug / logging purposes only. | |||
account_id: network_state.config.validator.account_id(), | |||
}; | |||
let received_messages_rate_limits = messages_limits::RateLimits::from_config( | |||
&network_state.config.received_messages_rate_limits, | |||
clock.now(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it help to pass a Clock instance (clock.clone()) to RateLimits and call it for getting current time through different methods during the rate limiting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer keeping the parameter an Instant
which has less responsibility than a clock, because calling now()
is all we do with the clock
Implementation for rate limits of incoming network messages.
Original issue: #11617. Also supersedes #11618.
Note: rate limits are implemented but not defined with this PR; in practice, nothing should change for a node.
PR summary
This PR adds:
token_bucket.rs
)messages_limits.rs
)peer_actor.rs
to implement the rate limits on received messagesLeftovers
Make rate limits configurable, likely through config files with overridesdoneUse more accurate token allocation for some network messages, in particular the ones containing a dynamic number of elements. For reference: analysisto be done in a another PR