metrics: report region flow #1517
@@ -1372,6 +1375,7 @@ impl Peer {
}

metrics.keys_written += ctx.wb.count() as u64;
self.bytes_written += ctx.wb.data_size() as u64;
Why not put it into RaftReadyMetrics? We may also need to count the size of the entries written into RocksDB.
If we put it into RaftReadyMetrics, each region needs a variable.
I don't think so. RaftReadyMetrics is stored in Store.
I think it is OK here; we may report this to PD later.
continue;
}

REGION_BYTES_WRITTEN_HISTOGRAM.observe(peer.bytes_written as f64);
Use a local histogram and then flush it; see LocalHistogram in rust-prometheus.
LocalHistogram is not exported yet.
Merged; you can use it now.
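For readers unfamiliar with the pattern suggested in this thread, here is a minimal std-only sketch of "observe locally, then flush". It is not the rust-prometheus API: `SharedHistogram`, `Totals`, and `LocalBuffer` are hypothetical names, and buckets are omitted so that only the count and sum are tracked.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical running totals (buckets omitted for brevity).
#[derive(Default, Clone, Copy, Debug, PartialEq)]
pub struct Totals {
    pub count: u64,
    pub sum: f64,
}

/// Hypothetical stand-in for a shared, lock-protected histogram.
#[derive(Default)]
pub struct SharedHistogram {
    inner: Mutex<Totals>,
}

impl SharedHistogram {
    /// Snapshot of the totals that have been flushed so far.
    pub fn totals(&self) -> Totals {
        *self.inner.lock().unwrap()
    }
}

/// Local buffer: `observe` touches no lock; the lock is taken only in `flush`.
pub struct LocalBuffer {
    shared: Arc<SharedHistogram>,
    pending: Totals,
}

impl LocalBuffer {
    pub fn new(shared: Arc<SharedHistogram>) -> Self {
        LocalBuffer { shared, pending: Totals::default() }
    }

    /// Record one observation in the local buffer only.
    pub fn observe(&mut self, value: f64) {
        self.pending.count += 1;
        self.pending.sum += value;
    }

    /// Merge the buffered observations into the shared histogram and reset.
    pub fn flush(&mut self) {
        if self.pending.count == 0 {
            return;
        }
        let mut totals = self.shared.inner.lock().unwrap();
        totals.count += self.pending.count;
        totals.sum += self.pending.sum;
        self.pending = Totals::default();
    }
}
```

The point of the design is that a loop over many peers pays for synchronization once per flush rather than once per observation.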
@@ -140,4 +140,11 @@ lazy_static! {
"tikv_engine_keys_written_count",
"Count of keys has been written for this interval"
).unwrap();

pub static ref REGION_BYTES_WRITTEN_HISTOGRAM: Histogram =
We can check the KV count too.
@@ -34,6 +34,7 @@ pub enum Tick {
SnapGc,
CompactLockCf,
ConsistencyCheck,
ReportWriteBytes,
It is not only for bytes now.
@@ -113,6 +115,8 @@ pub struct Config {

// Interval (ms) to check region whether the data is consistent.
pub consistency_check_tick_interval: u64,

pub report_write_bytes_interval: u64,
Use a short value in tests so we can cover this feature.
};
}

fn on_report_region_flow(&mut self, event_loop: &mut EventLoop<Self>) {
Please cover it in a test.
Rest LGTM
LGTM
@@ -28,6 +28,7 @@ use super::sync_storage::SyncStorage;

fn new_raft_storage() -> (Cluster<ServerCluster>, SyncStorage, Context) {
let mut cluster = new_server_cluster_with_cfs(0, 1, ALL_CFS);
cluster.cfg.raft_store.report_region_flow_interval = 100; // 100ms
Initialize this in new_store_cfg directly.
PTAL @BusyJay
@@ -23,7 +23,7 @@ pub struct RaftReadyMetrics {
 pub commit: u64,
 pub append: u64,
 pub snapshot: u64,
-pub keys_written: u64,
+pub store_keys_written: u64,
peer.bytes_written peer.keys_written
store.region_bytes_written store.region_keys_written
metrics.store_keys_written
The prefix levels look fine to me.
…s-region-write-bytes
4d3ce7a
to
8f337fe
PING

fn on_report_region_flow(&mut self, event_loop: &mut EventLoop<Self>) {
for (_, peer) in &mut self.region_peers {
if !peer.is_leader() {
If the peer steps down, all records will be lost.
This is OK; we don't need an exact result.
@@ -143,9 +143,23 @@ lazy_static! {
 &["cf"]
 ).unwrap();

-pub static ref STORE_KEYS_WRITTEN_COUNTER: Counter =
+pub static ref STORE_WRITTEN_KEYS_COUNTER: Counter =
I think just REGION_WRITTEN_KEY_HISTOGRAM is enough.
Em, here we may just want to know the rate of written keys; if it is too high, the store may be under heavy load, so a Counter is OK.
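To illustrate why a monotonically increasing counter is enough for this purpose, here is a small sketch of deriving a write rate from two readings of the counter. `rate_per_sec` is a hypothetical helper, not part of this PR; in practice the monitoring side would normally do this calculation.

```rust
use std::time::Duration;

/// Hypothetical helper: average rate between two readings of a
/// monotonically increasing counter taken `interval` apart.
pub fn rate_per_sec(prev: u64, curr: u64, interval: Duration) -> f64 {
    let secs = interval.as_secs_f64();
    if secs == 0.0 {
        return 0.0;
    }
    // saturating_sub guards against a counter reset (e.g. process restart).
    curr.saturating_sub(prev) as f64 / secs
}
```

A histogram, by contrast, would answer a different question: how the written keys are distributed across regions.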
PTAL @BusyJay
continue;
}

self.region_written_bytes.observe(peer.written_bytes as f64);
Why observe it here rather than every time we handle ready?
+1. Then we can remove the timer here.
on_raft_ready knows nothing about time; what we want is the region flow over a period of time.
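A simplified sketch of the split being argued for here: the ready path only accumulates, and a periodic tick turns the accumulated totals into one sample per interval. `Peer`, `on_write`, and `report_region_flow` below are hypothetical, std-only stand-ins for the real types, plain `Vec<f64>`s stand in for the histograms, and resetting the counters after each report is an assumption that is not visible in the hunks above.

```rust
/// Hypothetical, simplified peer: only the fields relevant to flow reporting.
pub struct Peer {
    pub is_leader: bool,
    pub written_bytes: u64,
    pub written_keys: u64,
}

impl Peer {
    /// Ready path: accumulate only; there is no notion of time here.
    pub fn on_write(&mut self, keys: u64, bytes: u64) {
        self.written_keys += keys;
        self.written_bytes += bytes;
    }
}

/// Tick path: record one sample per leader per interval, then reset
/// (assumed) so that the next sample covers only the next interval.
pub fn report_region_flow(
    peers: &mut [Peer],
    bytes_samples: &mut Vec<f64>,
    keys_samples: &mut Vec<f64>,
) {
    for peer in peers.iter_mut() {
        if !peer.is_leader {
            continue; // only leaders are reported, as in the diff above
        }
        bytes_samples.push(peer.written_bytes as f64);
        keys_samples.push(peer.written_keys as f64);
        peer.written_bytes = 0;
        peer.written_keys = 0;
    }
}
```

Observing on every ready would instead record the size of each individual write batch, which is a different quantity from the flow per interval.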
LGTM
@siddontang @BusyJay PTAL