feat/raft stat describer #208

Merged 11 commits on Jul 29, 2019

@fengjiachun (Contributor) commented Jul 10, 2019

Motivation:

Add a SIGUSR2 signal handler (Windows is not supported)
Show state info for all Raft nodes, plus node metrics and RheaKV metrics

TODO:

  1. Support Windows signals?
  2. Add a description to the documentation

Modification:

Add JRaftSignalHandler based on SPI
- NodeMetricsSignalHandler for node metrics
- NodeDescribeSignalHandler for node describe
- RheaKVMetricsSignalHandler for RheaKV metrics

Add Describer to describe raft state
Add MetricReporter to print metric
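
For readers unfamiliar with the pattern, the SPI-based handler registration described above could look roughly like the sketch below. This is not the code from this PR: `DemoSignalHandler`, `DemoSignalSupport`, and all method names are hypothetical, and the only real APIs assumed are `java.util.ServiceLoader` and the JDK-internal `sun.misc.Signal` (which the PR's own `handle(final sun.misc.Signal signal)` method also relies on). The sketch assumes a single JVM-level handler fans out to every discovered implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ServiceLoader;

// Hypothetical SPI contract: every implementation reacts to the same signal.
interface DemoSignalHandler {
    void handle(String signalName);
}

final class DemoSignalSupport {

    // Assumption for this sketch: POSIX signals such as USR2 are unavailable on Windows.
    static boolean supportsSignal(String osName) {
        return !osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Discovers implementations listed in META-INF/services/DemoSignalHandler.
    static List<DemoSignalHandler> loadHandlers() {
        List<DemoSignalHandler> handlers = new ArrayList<>();
        for (DemoSignalHandler h : ServiceLoader.load(DemoSignalHandler.class)) {
            handlers.add(h);
        }
        return handlers;
    }

    // Registers one JVM-level handler that fans out to all SPI handlers.
    // Returns false when the platform is assumed not to support the signal.
    static boolean install(String signalName, List<DemoSignalHandler> handlers) {
        if (!supportsSignal(System.getProperty("os.name"))) {
            return false;
        }
        // sun.misc.Signal is an internal JDK API; javac prints a warning for it.
        sun.misc.Signal.handle(new sun.misc.Signal(signalName),
            sig -> dispatch(sig.getName(), handlers));
        return true;
    }

    // Calls every handler; a failure in one handler does not stop the others.
    static void dispatch(String signalName, List<DemoSignalHandler> handlers) {
        for (DemoSignalHandler h : handlers) {
            try {
                h.handle(signalName);
            } catch (Throwable t) {
                System.err.println("Handler failed for signal " + signalName + ": " + t);
            }
        }
    }
}
```

In this sketch, calling `DemoSignalSupport.install("USR2", DemoSignalSupport.loadHandlers())` once at startup would be enough; isolating each handler in its own try/catch is a design choice so that a failing metrics dump does not suppress the state dump.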

Result:

~ kill -s SIGUSR2 pid

example:

nodeId: <rhea_example--1/127.0.0.1:8181>
state: STATE_FOLLOWER
term: 16
conf: ConfigurationEntry [id=LogId [index=59, term=16], conf=127.0.0.1:8181,127.0.0.1:8182,127.0.0.1:8183, oldConf=]
electionTimer: 
  RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@519d2775, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=1000]
voteTimer: 
  RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000]
stepDownTimer: 
  RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=500]
snapshotTimer: 
  RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@3a3b5443, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000]
logManager: 
  storage: [1, 136]
  diskId: LogId [index=136, term=16]
  appliedId: LogId [index=136, term=16]
  lastSnapshotId: LogId [index=0, term=0]
fsmCaller: 
  StateMachine [Idle]
ballotBox: 
  lastCommittedIndex: 136
  pendingIndex: 0
  pendingMetaQueueSize: 0
snapshotExecutor: 
  lastSnapshotTerm: 0
  lastSnapshotIndex: 0
  term: 16
  savingSnapshot: false
  loadingSnapshot: false
  stopped: false
replicatorGroup: 
  replicators: []
  failureReplicators: []
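
The nested layout of the dump above (a top-level key followed by indented fields of a sub-component) suggests that each component describes itself and the node composes the results. A minimal sketch of that idea follows; `SketchDescriber`, `SketchBallotBox`, and `SketchNode` are hypothetical stand-ins written for illustration and are not the `Describer` type added by this PR.

```java
// Demo entry point: prints a small subset of the dump shown above.
final class SketchDescribeDemo {
    public static void main(String[] args) {
        SketchNode node = new SketchNode("<rhea_example--1/127.0.0.1:8181>",
            "STATE_FOLLOWER", 16, new SketchBallotBox(136, 0));
        StringBuilder out = new StringBuilder();
        node.describe(out);
        System.out.print(out);
    }
}

// Hypothetical contract: each component appends its own state to a shared buffer.
interface SketchDescriber {
    void describe(StringBuilder out);
}

// Illustrative sub-component with two of the fields from the sample output.
final class SketchBallotBox implements SketchDescriber {
    private final long lastCommittedIndex;
    private final long pendingIndex;

    SketchBallotBox(long lastCommittedIndex, long pendingIndex) {
        this.lastCommittedIndex = lastCommittedIndex;
        this.pendingIndex = pendingIndex;
    }

    @Override
    public void describe(StringBuilder out) {
        // Sub-component fields are indented by two spaces, as in the dump.
        out.append("  lastCommittedIndex: ").append(lastCommittedIndex).append('\n');
        out.append("  pendingIndex: ").append(pendingIndex).append('\n');
    }
}

// Illustrative node: prints its own fields, then delegates to its sub-component.
final class SketchNode implements SketchDescriber {
    private final String nodeId;
    private final String state;
    private final long term;
    private final SketchBallotBox ballotBox;

    SketchNode(String nodeId, String state, long term, SketchBallotBox ballotBox) {
        this.nodeId = nodeId;
        this.state = state;
        this.term = term;
        this.ballotBox = ballotBox;
    }

    @Override
    public void describe(StringBuilder out) {
        out.append("nodeId: ").append(nodeId).append('\n');
        out.append("state: ").append(state).append('\n');
        out.append("term: ").append(term).append('\n');
        out.append("ballotBox: ").append('\n');
        ballotBox.describe(out);
    }
}
```

Keeping the formatting inside each component means a new sub-component only has to implement the one method to show up in the dump.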

Fixes #103

@fengjiachun (Contributor Author)

file: node_metrics.log.USR2.2019-07-13_15-28-15

-- <rhea_example--1/127.0.0.1:8181> 19-7-13 15:28:15 ===============================================================

-- <rhea_example--1/127.0.0.1:8181> -- Gauges ----------------------------------------------------------------------
raft-rpc-client-thread-pool.active
             value = 0
raft-rpc-client-thread-pool.completed
             value = 1
raft-rpc-client-thread-pool.pool-size
             value = 1
raft-rpc-client-thread-pool.queued
             value = 0
raft-utils-closure-thread-pool.active
             value = 0
raft-utils-closure-thread-pool.completed
             value = 1
raft-utils-closure-thread-pool.pool-size
             value = 1
raft-utils-closure-thread-pool.queued
             value = 0

-- <rhea_example--1/127.0.0.1:8181> -- Histograms ------------------------------------------------------------------
append-logs-bytes
             count = 11
               min = 0
               max = 36
              mean = 33.11
            stddev = 9.79
            median = 36.00
              75% <= 36.00
              95% <= 36.00
              98% <= 36.00
              99% <= 36.00
            99.9% <= 36.00
append-logs-count
             count = 11
               min = 1
               max = 2
              mean = 1.08
            stddev = 0.27
            median = 1.00
              75% <= 1.00
              95% <= 2.00
              98% <= 2.00
              99% <= 2.00
            99.9% <= 2.00
fsm-apply-tasks-count
             count = 12
               min = 1
               max = 10
              mean = 2.36
            stddev = 3.22
            median = 1.00
              75% <= 1.00
              95% <= 10.00
              98% <= 10.00
              99% <= 10.00
            99.9% <= 10.00
handle-append-entries-count
             count = 210
               min = 0
               max = 2
              mean = 0.05
            stddev = 0.24
            median = 0.00
              75% <= 0.00
              95% <= 1.00
              98% <= 1.00
              99% <= 1.00
            99.9% <= 2.00

-- <rhea_example--1/127.0.0.1:8181> -- Timers ----------------------------------------------------------------------
append-logs
             count = 11
         mean rate = 0.55 calls/second
     1-minute rate = 0.29 calls/second
     5-minute rate = 0.22 calls/second
    15-minute rate = 0.21 calls/second
               min = 0.00 milliseconds
               max = 5.00 milliseconds
              mean = 0.77 milliseconds
            stddev = 1.34 milliseconds
            median = 0.00 milliseconds
              75% <= 1.00 milliseconds
              95% <= 5.00 milliseconds
              98% <= 5.00 milliseconds
              99% <= 5.00 milliseconds
            99.9% <= 5.00 milliseconds
fsm-apply-tasks
             count = 12
         mean rate = 0.60 calls/second
     1-minute rate = 0.45 calls/second
     5-minute rate = 0.41 calls/second
    15-minute rate = 0.40 calls/second
               min = 0.00 milliseconds
               max = 67.00 milliseconds
              mean = 5.29 milliseconds
            stddev = 17.62 milliseconds
            median = 0.00 milliseconds
              75% <= 1.00 milliseconds
              95% <= 67.00 milliseconds
              98% <= 67.00 milliseconds
              99% <= 67.00 milliseconds
            99.9% <= 67.00 milliseconds
fsm-commit
             count = 11
         mean rate = 0.55 calls/second
     1-minute rate = 0.29 calls/second
     5-minute rate = 0.22 calls/second
    15-minute rate = 0.21 calls/second
               min = 0.00 milliseconds
               max = 70.00 milliseconds
              mean = 5.98 milliseconds
            stddev = 19.07 milliseconds
            median = 0.00 milliseconds
              75% <= 1.00 milliseconds
              95% <= 70.00 milliseconds
              98% <= 70.00 milliseconds
              99% <= 70.00 milliseconds
            99.9% <= 70.00 milliseconds
fsm-start-following
             count = 1
         mean rate = 0.05 calls/second
     1-minute rate = 0.16 calls/second
     5-minute rate = 0.19 calls/second
    15-minute rate = 0.20 calls/second
               min = 2.00 milliseconds
               max = 2.00 milliseconds
              mean = 2.00 milliseconds
            stddev = 0.00 milliseconds
            median = 2.00 milliseconds
              75% <= 2.00 milliseconds
              95% <= 2.00 milliseconds
              98% <= 2.00 milliseconds
              99% <= 2.00 milliseconds
            99.9% <= 2.00 milliseconds
handle-append-entries
             count = 210
         mean rate = 10.40 calls/second
     1-minute rate = 10.10 calls/second
     5-minute rate = 10.03 calls/second
    15-minute rate = 10.01 calls/second
               min = 0.00 milliseconds
               max = 3.00 milliseconds
              mean = 0.17 milliseconds
            stddev = 0.45 milliseconds
            median = 0.00 milliseconds
              75% <= 0.00 milliseconds
              95% <= 1.00 milliseconds
              98% <= 1.00 milliseconds
              99% <= 2.00 milliseconds
            99.9% <= 3.00 milliseconds
pre-vote
             count = 1
         mean rate = 0.05 calls/second
     1-minute rate = 0.16 calls/second
     5-minute rate = 0.19 calls/second
    15-minute rate = 0.20 calls/second
               min = 27.00 milliseconds
               max = 27.00 milliseconds
              mean = 27.00 milliseconds
            stddev = 0.00 milliseconds
            median = 27.00 milliseconds
              75% <= 27.00 milliseconds
              95% <= 27.00 milliseconds
              98% <= 27.00 milliseconds
              99% <= 27.00 milliseconds
            99.9% <= 27.00 milliseconds
save-raft-meta
             count = 3
         mean rate = 0.15 calls/second
     1-minute rate = 0.47 calls/second
     5-minute rate = 0.57 calls/second
    15-minute rate = 0.59 calls/second
               min = 0.00 milliseconds
               max = 24.00 milliseconds
              mean = 8.33 milliseconds
            stddev = 11.09 milliseconds
            median = 1.00 milliseconds
              75% <= 24.00 milliseconds
              95% <= 24.00 milliseconds
              98% <= 24.00 milliseconds
              99% <= 24.00 milliseconds
            99.9% <= 24.00 milliseconds



@fengjiachun (Contributor Author)

file: rheakv_metrics.log.USR2.2019-07-13_15-28-15

-- rheakv 19-7-13 15:28:15 ===============================================================

-- rheakv -- Histograms ------------------------------------------------------------------
rhea-st-batch-write_-1
             count = 12
               min = 1
               max = 10
              mean = 2.36
            stddev = 3.22
            median = 1.00
              75% <= 1.00
              95% <= 10.00
              98% <= 10.00
              99% <= 10.00
            99.9% <= 10.00
send_batching_get_bytes
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
send_batching_get_keys
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
send_batching_get_only_safe_bytes
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
send_batching_get_only_safe_keys
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
send_batching_put_bytes
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
send_batching_put_keys
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00

-- rheakv -- Meters ----------------------------------------------------------------------
rhea-st-apply-qps_-1
             count = 30
         mean rate = 1.09 events/second
     1-minute rate = 0.40 events/second
     5-minute rate = 0.10 events/second
    15-minute rate = 0.03 events/second
rhea-st-apply-qps_-1_PUT
             count = 30
         mean rate = 1.50 events/second
     1-minute rate = 3.25 events/second
     5-minute rate = 3.84 events/second
    15-minute rate = 3.94 events/second

-- rheakv -- Timers ----------------------------------------------------------------------
rhea-db-timer_BATCH_PUT
             count = 2
         mean rate = 0.10 calls/second
     1-minute rate = 0.31 calls/second
     5-minute rate = 0.38 calls/second
    15-minute rate = 0.39 calls/second
               min = 0.06 milliseconds
               max = 2.12 milliseconds
              mean = 1.09 milliseconds
            stddev = 1.03 milliseconds
            median = 2.12 milliseconds
              75% <= 2.12 milliseconds
              95% <= 2.12 milliseconds
              98% <= 2.12 milliseconds
              99% <= 2.12 milliseconds
            99.9% <= 2.12 milliseconds
rhea-db-timer_PUT
             count = 10
         mean rate = 0.87 calls/second
     1-minute rate = 1.84 calls/second
     5-minute rate = 1.97 calls/second
    15-minute rate = 1.99 calls/second
               min = 0.01 milliseconds
               max = 0.58 milliseconds
              mean = 0.09 milliseconds
            stddev = 0.17 milliseconds
            median = 0.03 milliseconds
              75% <= 0.04 milliseconds
              95% <= 0.58 milliseconds
              98% <= 0.58 milliseconds
              99% <= 0.58 milliseconds
            99.9% <= 0.58 milliseconds
rhea-rpc-request-timer_-1
             count = 0
         mean rate = 0.00 calls/second
     1-minute rate = 0.00 calls/second
     5-minute rate = 0.00 calls/second
    15-minute rate = 0.00 calls/second
               min = 0.00 milliseconds
               max = 0.00 milliseconds
              mean = 0.00 milliseconds
            stddev = 0.00 milliseconds
            median = 0.00 milliseconds
              75% <= 0.00 milliseconds
              95% <= 0.00 milliseconds
              98% <= 0.00 milliseconds
              99% <= 0.00 milliseconds
            99.9% <= 0.00 milliseconds
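
Both dump files above follow the pattern `<base name>.<signal>.<timestamp>`, for example `node_metrics.log.USR2.2019-07-13_15-28-15`. As a rough sketch of how such a name can be built, assuming the timestamp pattern `yyyy-MM-dd_HH-mm-ss` inferred from the file names (the class and method names here are hypothetical, not taken from the PR):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DemoDumpFiles {

    // Timestamp pattern inferred from the file names shown in this thread.
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    // Builds "<baseName>.<signalName>.<timestamp>".
    static String dumpFileName(String baseName, String signalName, LocalDateTime when) {
        return baseName + "." + signalName + "." + TS.format(when);
    }

    public static void main(String[] args) {
        // One file per signal delivery, so earlier dumps are never overwritten.
        System.out.println(dumpFileName("node_metrics.log", "USR2", LocalDateTime.now()));
        System.out.println(dumpFileName("rheakv_metrics.log", "USR2", LocalDateTime.now()));
    }
}
```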



@fengjiachun requested a review from masaimu on July 17, 2019 06:43
final NodeMetrics nodeMetrics = node.getNodeMetrics();
final MetricRegistry registry = nodeMetrics.getMetricRegistry();
if (registry == null) {
continue;
Contributor

Suggest adding a warn log for the case where metrics are not enabled.

Contributor Author

Done.
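
The change requested in this review thread (warn instead of silently skipping a node whose metric registry is null) might look roughly like the following. `MetricsSource` and `DemoMetricsDump` are hypothetical stand-ins for the PR's node and handler types, and `java.util.logging` is used only to keep the sketch free of external dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a node that may or may not have metrics enabled.
final class MetricsSource {
    final String nodeId;
    final Object registry; // null when metrics are disabled for this node

    MetricsSource(String nodeId, Object registry) {
        this.nodeId = nodeId;
        this.registry = registry;
    }
}

final class DemoMetricsDump {
    private static final Logger LOG = Logger.getLogger("DemoMetricsDump");

    // Returns the ids of the nodes whose metrics were reported.
    static List<String> report(List<MetricsSource> sources) {
        List<String> reported = new ArrayList<>();
        for (MetricsSource source : sources) {
            if (source.registry == null) {
                // Tell the operator why this node is missing from the dump.
                LOG.warning("Metrics are not enabled for node " + source.nodeId + ", skipping.");
                continue;
            }
            reported.add(source.nodeId); // a real handler would print the registry here
        }
        return reported;
    }

    public static void main(String[] args) {
        List<MetricsSource> sources = Arrays.asList(
            new MetricsSource("node-1", new Object()),
            new MetricsSource("node-2", null));
        System.out.println(report(sources));
    }
}
```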

}
}
} catch (final Throwable t) {
LOG.warn("Fail to add signal.", t);
Contributor

This should be logged at error level, not warn.

Contributor Author

Done.

package com.alipay.sofa.jraft.util;

/**
*
Contributor

Add a class-level description.

Contributor Author

Done.

public void handle(final sun.misc.Signal signal) {
try {
if (!this.target.equals(signal)) {
LOG.info("Unexpected signal: {}.", signal);
Contributor

This log line seems unnecessary and could easily cause confusion.

Contributor Author

Done. Agreed, it should be removed; it could cause confusion.

@killme2008 merged commit cd66ebd into master on Jul 29, 2019
@killme2008 deleted the feat/raft_stat_view branch on July 29, 2019 07:43
@fengjiachun mentioned this pull request on Aug 15, 2019 (4 tasks)
Linked issue (may be closed by this pull request): Supports get_raft_stat service?