-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Gossip Loop metrics #26195
Add Gossip Loop metrics #26195
Conversation
- fix nav link - add bounty split policy for duplicate reports
…a-labs#25688) * Define shuffle to prep using same shuffle for multiple slices * Determine transaction indexes and plumb to execute_batch * Pair transaction_index with transaction in TransactionStatusService * Add new ReplicaTransactionInfoVersion * Plumb transaction_indexes through BankingStage * Prepare BankingStage to receive transaction indexes from PohRecorder * Determine transaction indexes in PohRecorder; add field to WorkingBank * Add PohRecorder::record unit test * Only pass starting_transaction_index around PohRecorder * Add helper structs to simplify test DashMap * Pass entry and starting-index into process_entries_with_callback together * Add tx-index checks to test_rebatch_transactions * Revert shuffle definition and use zip/unzip * Only zip/unzip if randomize * Add confirm_slot_entries test * Review nits * Add type alias to make sender docs more clear
finish filling out the table....
…a into ftr-gossip-loop-metrics
gossip/src/cluster_info_metrics.rs
Outdated
@@ -113,6 +113,10 @@ pub struct GossipStats { | |||
pub(crate) gossip_pull_request_verify_fail: Counter, | |||
pub(crate) gossip_pull_response_verify_fail: Counter, | |||
pub(crate) gossip_push_msg_verify_fail: Counter, | |||
pub(crate) gossip_transmit_loop_time: Counter, | |||
pub(crate) gossip_transmit_loop_itrs_since_last_report: Counter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: itrs
-> iterations
we usually spell out all words without abbreviations
gossip/src/cluster_info_metrics.rs
Outdated
( | ||
"process_gossip_packets_itrs_since_last_report", | ||
stats.process_gossip_packets_itrs_since_last_report.clear(), | ||
i64 | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for easier queries, can you please report this in the same table as the other ones?
gossip/src/cluster_info_metrics.rs
Outdated
@@ -138,6 +142,7 @@ pub struct GossipStats { | |||
pub(crate) packets_sent_pull_responses_count: Counter, | |||
pub(crate) packets_sent_push_messages_count: Counter, | |||
pub(crate) process_gossip_packets_time: Counter, | |||
pub(crate) process_gossip_packets_itrs_since_last_report: Counter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please keep the fields in this struct alphabetically sorted ?
that makes it easier to resolve merge conflicts or skim through the list of metrics.
Pull request has been modified.
Problem
There is little insight into how well the gossip protocol is running. We need gossip timing metrics for:
run_gossip()
run_listen()
process_packets()
Summary of Changes
Added two timers the measure the loop time of the
run_gossip()
andrun_listen()
methods respectively.gossip_transmit_loop_time
gossip_listen_loop_time
Note a timer metric called:
process_gossip_packets_time
already exists forprocess_packets()
.Added three loop counters that count the total number of loop iterations since the last metric report for all the of the above loops/
gossip_transmit_loop_itrs_since_last_report
gossip_listen_loop_itrs_since_last_report
process_gossip_packets_itrs_since_last_report
Fixes #