In addition to the DTrace tracepoints (hibari GH18), introduce a brick_metrics module, a folsom-based metrics system that provides statistics in production.
This will replace the DB operation counters in brick servers and add more statistical information, such as the 95th percentile and standard deviation of latencies in subsystems.
For example, this is a log message from the current Hibari 0.3-dev:
The metrics above use exdec as the sampling method, which exponentially decays less significant readings over time. Each metric keeps only the most recent 1028 readings to minimize performance impact. Note that both the sampling method and the number of readings are configurable.
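To make the idea of an exponentially decaying sample more concrete, here is a minimal Python sketch of a forward-decay reservoir. It is not folsom's Erlang implementation: the class name `ExpDecayReservoir`, the `alpha` default of 0.015, and the priority formula are assumptions chosen for illustration. Only the sample size of 1028 comes from the text above.

```python
import heapq
import math
import random


class ExpDecayReservoir:
    """Illustrative forward-decay reservoir (hypothetical, not folsom's code).

    Newer readings receive exponentially larger weights, so older readings
    are gradually pushed out of the bounded sample.
    """

    def __init__(self, size=1028, alpha=0.015, rng=None):
        self.size = size        # maximum number of readings kept
        self.alpha = alpha      # assumed decay factor; larger favors recent data
        self.rng = rng or random.Random()
        self.landmark = None    # timestamp of the first reading
        self._heap = []         # min-heap of (priority, sequence, value)
        self._seq = 0           # tie-breaker so values are never compared

    def update(self, value, now):
        """Record one reading taken at time `now` (in seconds)."""
        if self.landmark is None:
            self.landmark = now
        # The weight grows with time since the landmark. A production
        # implementation would rescale periodically to avoid overflow;
        # that step is omitted here.
        weight = math.exp(self.alpha * (now - self.landmark))
        # 1 - random() lies in (0, 1], so the division is always safe.
        priority = weight / (1.0 - self.rng.random())
        self._seq += 1
        item = (priority, self._seq, value)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, item)
        elif priority > self._heap[0][0]:
            # Evict the lowest-priority reading in favor of the new one.
            heapq.heapreplace(self._heap, item)

    def values(self):
        """Return the readings currently held in the sample."""
        return [value for _, _, value in self._heap]


# Usage: one reading per second; the reading is simply its own timestamp,
# which makes the bias toward recent data easy to see.
reservoir = ExpDecayReservoir(size=100, rng=random.Random(42))
for t in range(5000):
    reservoir.update(value=t, now=t)
sample = reservoir.values()
```

After the loop, `sample` holds at most 100 readings, and they cluster near the end of the time range, which is the behavior the decaying sample is meant to provide.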
From the log, you can tell:
All (recent) reads were served from the filesystem cache and none from disk, since the 95th percentile is less than 1 ms.
The disk drive (a single 2.5 inch, PC-grade drive) is overloaded by WAL sync (group commit).
Logging wait takes twice as long as WAL sync. I am rewriting the old WAL module (gmt_hlog) from scratch to improve this area.
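The reasoning above can be sketched in a few lines of Python. The latency values below are made up for illustration and are not taken from a real Hibari log. The nearest-rank percentile method and the helper names `percentile` and `summarize` are assumptions too; folsom may compute its percentiles differently.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of readings."""
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering pct percent of the readings.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Summary statistics for a list of at least two latency readings."""
    return {
        "mean": statistics.mean(latencies_ms),
        "stddev": statistics.stdev(latencies_ms),  # sample standard deviation
        "p95": percentile(latencies_ms, 95),
    }


# Hypothetical read latencies in milliseconds.
reads_ms = [0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.16, 0.13, 0.90, 0.17]
stats = summarize(reads_ms)

# If 95% of the reads finish in under 1 ms, a spinning disk is unlikely
# to have been involved, so the reads were probably served from cache.
served_from_cache = stats["p95"] < 1.0
```

The same check applied to WAL sync and logging wait latencies would show which subsystem dominates the write path.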
Early work is done in this commit: hibari/gdss-brick@2e52fc5fc5a64
Notes about the metrics system and DTrace
brick_metrics will provide continuous performance statistics in production. This is good for monitoring a brick server's resource usage and operation latencies.
DTrace tracepoints will be used to drill down into performance issues in production, e.g. to draw a latency histogram for a subsystem.