Add Gossip Loop metrics #26195

Merged 18 commits on Jun 29, 2022
Changes from 14 commits
5 changes: 5 additions & 0 deletions gossip/src/cluster_info.rs
@@ -1556,6 +1556,7 @@ impl ClusterInfo {
         sender: &PacketBatchSender,
         generate_pull_requests: bool,
     ) -> Result<(), GossipError> {
+        let _st = ScopedTimer::from(&self.stats.gossip_transmit_loop_time);
         let reqs = self.generate_new_gossip_requests(
             thread_pool,
             gossip_validators,
@@ -1573,6 +1574,7 @@ impl ClusterInfo {
                 .add_relaxed(packet_batch.len() as u64);
             sender.send(packet_batch)?;
         }
+        self.stats.gossip_transmit_loop_itrs_since_last_report.add_relaxed(1);
         Ok(())
     }

@@ -2435,6 +2437,7 @@ impl ClusterInfo {
             stakes,
             response_sender,
         );
+        self.stats.process_gossip_packets_itrs_since_last_report.add_relaxed(1);
         Ok(())
     }

@@ -2490,6 +2493,7 @@ impl ClusterInfo {
         last_print: &mut Instant,
         should_check_duplicate_instance: bool,
     ) -> Result<(), GossipError> {
+        let _st = ScopedTimer::from(&self.stats.gossip_listen_loop_time);
         const RECV_TIMEOUT: Duration = Duration::from_secs(1);
         const SUBMIT_GOSSIP_STATS_INTERVAL: Duration = Duration::from_secs(2);
         let mut packets = VecDeque::from(receiver.recv_timeout(RECV_TIMEOUT)?);
@@ -2528,6 +2532,7 @@ impl ClusterInfo {
             submit_gossip_stats(&self.stats, &self.gossip, &stakes);
             *last_print = Instant::now();
         }
+        self.stats.gossip_listen_loop_itrs_since_last_report.add_relaxed(1);
         Ok(())
     }
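
The hunks above pair a scoped timer at the top of each loop body with an iteration counter at the bottom. A minimal, self-contained sketch of that pattern follows. The `Counter` and `ScopedTimer` definitions here are simplified stand-ins written for illustration, not the actual types from this repository, and `LoopStats` and `run_once` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical stand-in for the `Counter` type used by `GossipStats`.
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    /// Adds `x` to the counter with relaxed ordering.
    pub fn add_relaxed(&self, x: u64) {
        self.0.fetch_add(x, Ordering::Relaxed);
    }

    /// Returns the current value and resets the counter to zero.
    pub fn clear(&self) -> u64 {
        self.0.swap(0, Ordering::Relaxed)
    }
}

/// RAII guard (assumed shape): when dropped, adds the elapsed
/// microseconds since construction to the wrapped counter.
pub struct ScopedTimer<'a> {
    start: Instant,
    metric: &'a Counter,
}

impl<'a> From<&'a Counter> for ScopedTimer<'a> {
    fn from(metric: &'a Counter) -> Self {
        Self { start: Instant::now(), metric }
    }
}

impl Drop for ScopedTimer<'_> {
    fn drop(&mut self) {
        let micros = self.start.elapsed().as_micros() as u64;
        self.metric.add_relaxed(micros);
    }
}

/// Hypothetical pair of metrics for a single loop.
#[derive(Default)]
pub struct LoopStats {
    pub loop_time: Counter,
    pub loop_iterations: Counter,
}

/// Runs one loop iteration: time the whole body, count it only on success.
pub fn run_once(
    stats: &LoopStats,
    work: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    // Bound to `_st` (not `_`) so the guard lives until the function returns.
    let _st = ScopedTimer::from(&stats.loop_time);
    // An early return via `?` still drops the guard and records the time,
    // but skips the iteration counter below.
    work()?;
    stats.loop_iterations.add_relaxed(1);
    Ok(())
}
```

In this sketch an iteration that returns early through `?` contributes to the accumulated time but not to the iteration count, so the two metrics can diverge when errors or timeouts are frequent.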

30 changes: 30 additions & 0 deletions gossip/src/cluster_info_metrics.rs
@@ -113,6 +113,10 @@ pub struct GossipStats {
     pub(crate) gossip_pull_request_verify_fail: Counter,
     pub(crate) gossip_pull_response_verify_fail: Counter,
     pub(crate) gossip_push_msg_verify_fail: Counter,
+    pub(crate) gossip_transmit_loop_time: Counter,
+    pub(crate) gossip_transmit_loop_itrs_since_last_report: Counter,
Review comment (Contributor):

nit: itrs -> iterations. We usually spell out all words in full, without abbreviations.

+    pub(crate) gossip_listen_loop_time: Counter,
+    pub(crate) gossip_listen_loop_itrs_since_last_report: Counter,
     pub(crate) handle_batch_ping_messages_time: Counter,
     pub(crate) handle_batch_pong_messages_time: Counter,
     pub(crate) handle_batch_prune_messages_time: Counter,
@@ -138,6 +142,7 @@ pub struct GossipStats {
     pub(crate) packets_sent_pull_responses_count: Counter,
     pub(crate) packets_sent_push_messages_count: Counter,
     pub(crate) process_gossip_packets_time: Counter,
+    pub(crate) process_gossip_packets_itrs_since_last_report: Counter,
Review comment (Contributor):

Can you please keep the fields in this struct alphabetically sorted? That makes it easier to resolve merge conflicts and to skim through the list of metrics.

     pub(crate) process_prune: Counter,
     pub(crate) process_pull_requests: Counter,
     pub(crate) process_pull_response: Counter,
@@ -237,6 +242,11 @@ pub(crate) fn submit_gossip_stats(
             stats.process_gossip_packets_time.clear(),
             i64
         ),
+        (
+            "process_gossip_packets_itrs_since_last_report",
+            stats.process_gossip_packets_itrs_since_last_report.clear(),
+            i64
+        ),
Review comment (Contributor):

For easier queries, can you please report this in the same table as the other ones?

         (
             "verify_gossip_packets_time",
             stats.verify_gossip_packets_time.clear(),
@@ -385,6 +395,26 @@ pub(crate) fn submit_gossip_stats(
             stats.gossip_pull_request_dropped_requests.clear(),
             i64
         ),
+        (
+            "gossip_transmit_loop_time",
+            stats.gossip_transmit_loop_time.clear(),
+            i64
+        ),
+        (
+            "gossip_transmit_loop_itrs_since_last_report",
+            stats.gossip_transmit_loop_itrs_since_last_report.clear(),
+            i64
+        ),
+        (
+            "gossip_listen_loop_time",
+            stats.gossip_listen_loop_time.clear(),
+            i64
+        ),
+        (
+            "gossip_listen_loop_itrs_since_last_report",
+            stats.gossip_listen_loop_itrs_since_last_report.clear(),
+            i64
+        ),
     );
     datapoint_info!(
         "cluster_info_stats4",
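
The review comments on `cluster_info_metrics.rs` ask for unabbreviated names, alphabetically sorted fields, and all loop metrics reported in one table. The sketch below illustrates one way those three requests could fit together. It does not use the repository's `datapoint_info!` macro; `Datapoint`, `GossipLoopStats`, `loop_datapoint`, `mean_micros_per_iteration`, and the table name `gossip_loop_stats_sketch` are all hypothetical, and `Counter` is a simplified stand-in.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the `Counter` type used by `GossipStats`.
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    /// Adds `x` to the counter with relaxed ordering.
    pub fn add_relaxed(&self, x: u64) {
        self.0.fetch_add(x, Ordering::Relaxed);
    }

    /// Returns the current value and resets the counter to zero.
    pub fn clear(&self) -> u64 {
        self.0.swap(0, Ordering::Relaxed)
    }
}

/// Loop metrics with spelled-out names, kept in alphabetical order.
#[derive(Default)]
pub struct GossipLoopStats {
    pub gossip_listen_loop_iterations_since_last_report: Counter,
    pub gossip_listen_loop_time: Counter,
    pub gossip_transmit_loop_iterations_since_last_report: Counter,
    pub gossip_transmit_loop_time: Counter,
    pub process_gossip_packets_iterations_since_last_report: Counter,
    pub process_gossip_packets_time: Counter,
}

/// Hypothetical in-memory representation of one reported table row.
pub struct Datapoint {
    pub name: &'static str,
    pub fields: Vec<(&'static str, i64)>,
}

/// Drains every loop counter into a single datapoint, so that a query can
/// relate time and iteration counts without joining across tables.
pub fn loop_datapoint(stats: &GossipLoopStats) -> Datapoint {
    let take = |counter: &Counter| counter.clear() as i64;
    Datapoint {
        name: "gossip_loop_stats_sketch", // assumed table name
        fields: vec![
            (
                "gossip_listen_loop_iterations_since_last_report",
                take(&stats.gossip_listen_loop_iterations_since_last_report),
            ),
            ("gossip_listen_loop_time", take(&stats.gossip_listen_loop_time)),
            (
                "gossip_transmit_loop_iterations_since_last_report",
                take(&stats.gossip_transmit_loop_iterations_since_last_report),
            ),
            ("gossip_transmit_loop_time", take(&stats.gossip_transmit_loop_time)),
            (
                "process_gossip_packets_iterations_since_last_report",
                take(&stats.process_gossip_packets_iterations_since_last_report),
            ),
            ("process_gossip_packets_time", take(&stats.process_gossip_packets_time)),
        ],
    }
}

/// Mean time per iteration for one reporting window, or `None` if the loop
/// did not complete any iteration in that window.
pub fn mean_micros_per_iteration(total_micros: i64, iterations: i64) -> Option<i64> {
    if iterations > 0 {
        Some(total_micros / iterations)
    } else {
        None
    }
}
```

With both values in the same row, a derived quantity such as the mean time per iteration can be computed per reporting window; whether that is the intended query is an assumption here, since the pull request does not state it.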