Add brick_metrics module - a Folsom based metrics system in production #36

tatsuya6502 · 2014-01-30T00:04:54Z

In addition to the DTrace tracepoints (hibari GH18), introduce a brick_metrics module, a Folsom-based metrics system that provides statistics in production.

This will replace the DB operation counters in brick servers and add more statistical information, such as the 95th percentile and standard deviation of latencies in subsystems.
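As a rough illustration of the kind of summary mentioned above, here is a minimal Python sketch that computes a median, 95th percentile, and standard deviation from a list of latencies. It is not how brick_metrics or Folsom computes them; the function name `summarize` and the sample values are made up for the example.

```python
import statistics


def summarize(latencies_ms):
    """Return median, 95th percentile, and sample standard deviation."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],
        "stdev": statistics.stdev(latencies_ms),
    }


# Hypothetical readings: 1.0 ms through 101.0 ms in 1 ms steps.
sample = [float(x) for x in range(1, 102)]
summary = summarize(sample)
```

A plain counter cannot show a latency distribution like this, which is why percentiles are useful for spotting slow subsystems.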

For example, this is a log message from the current Hibari 0.3-dev:

2014-01-30 07:35:34.420 [info] <0.672.0>@brick_metrics:process_stats:132 statistics report
    (read)  sqflash priming  median: 0.15 ms, 95 percentile: 0.244 ms
    (write) logging wait     median: 60.627 ms, 95 percentile: 100.651 ms
    (write) wal sync         median: 38.854 ms, 95 percentile: 66.769 ms, reqs 1, 4

The metrics above use the exdec sampling method, which exponentially decays less significant readings over time. Each metric keeps only 1028 recent readings to minimize performance impact. Note that both the sampling method and the number of readings are configurable.
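To make the idea of exponentially decaying sampling concrete, here is a simplified Python sketch of a size-bounded reservoir that favors recent readings. It is an illustration only, not Folsom's exdec code: the class name `DecayingReservoir`, the `alpha` value of 0.015, and the explicit timestamps are assumptions, and the periodic rescaling a production implementation needs is left out.

```python
import heapq
import math
import random


class DecayingReservoir:
    """Simplified exponentially decaying sample (not Folsom's actual code)."""

    def __init__(self, size=1028, alpha=0.015, rng=None):
        self.size = size        # maximum number of readings kept
        self.alpha = alpha      # decay factor (assumed value)
        self.rng = rng or random.Random()
        self._heap = []         # min-heap of (log_priority, seq, value)
        self._seq = 0           # tie-breaker so values are never compared

    def update(self, value, timestamp_s):
        # priority = exp(alpha * t) / u, with u uniform in (0, 1].
        # Working with the logarithm avoids floating-point overflow.
        u = 1.0 - self.rng.random()
        log_priority = self.alpha * timestamp_s - math.log(u)
        self._seq += 1
        heapq.heappush(self._heap, (log_priority, self._seq, value))
        if len(self._heap) > self.size:
            heapq.heappop(self._heap)  # evict the lowest-priority reading

    def values(self):
        return [value for _, _, value in self._heap]


# Hypothetical usage: 500 old readings, then 500 readings 10,000 s later.
reservoir = DecayingReservoir(size=100, alpha=0.015, rng=random.Random(42))
for i in range(500):
    reservoir.update(("old", i), timestamp_s=0.0)
for i in range(500):
    reservoir.update(("new", i), timestamp_s=10_000.0)
```

With such a large time gap the newer readings push out every older one. With readings that are close together in time, the reservoir holds a mix weighted toward the recent ones.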

From the log, you can tell:

All (recent) reads were served from the filesystem cache and none from disk, as the 95th percentile is less than 1 ms.
The disk drive (a single 2.5-inch, PC-grade drive) is overloaded by WAL sync (group commit).
logging wait takes roughly 1.5 times as long as wal sync (median 60.6 ms versus 38.9 ms). I am rewriting the old WAL module (gmt_hlog) from scratch to improve this area.
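The readings above can also be checked mechanically. The following Python sketch parses report lines shaped like the sample log and applies the same rules of thumb. The regular expression, the function name `parse_report`, and the 1 ms cache threshold as a hard cutoff are assumptions for illustration; the real report format may differ.

```python
import re

# Matches lines such as:
#   (read)  sqflash priming  median: 0.15 ms, 95 percentile: 0.244 ms
LINE_RE = re.compile(
    r"\((?P<kind>read|write)\)\s+(?P<name>.+?)\s+"
    r"median:\s*(?P<median>[\d.]+) ms,\s*"
    r"95 percentile:\s*(?P<p95>[\d.]+) ms"
)


def parse_report(text):
    """Extract one dict per metric line; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append({
                "kind": m["kind"],
                "name": m["name"],
                "median_ms": float(m["median"]),
                "p95_ms": float(m["p95"]),
            })
    return rows


REPORT = """\
    (read)  sqflash priming  median: 0.15 ms, 95 percentile: 0.244 ms
    (write) logging wait     median: 60.627 ms, 95 percentile: 100.651 ms
    (write) wal sync         median: 38.854 ms, 95 percentile: 66.769 ms, reqs 1, 4
"""

rows = parse_report(REPORT)
by_name = {row["name"]: row for row in rows}

# Rule of thumb from the text: a read p95 under 1 ms suggests cache hits.
reads_from_cache = all(r["p95_ms"] < 1.0 for r in rows if r["kind"] == "read")

# How much longer a write waits for logging than the sync itself takes.
wait_to_sync_ratio = (
    by_name["logging wait"]["median_ms"] / by_name["wal sync"]["median_ms"]
)
```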

Early work is in this commit: hibari/gdss-brick@2e52fc5fc5a64

Notes about the metrics system and DTrace

brick_metrics will provide continuous performance statistics in production, which is useful for monitoring a brick server's resource usage and operation latencies.

DTrace tracepoints will be used to drill down into performance issues in production, for example to draw a latency histogram for a subsystem.
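For a sense of what such a latency histogram looks like, here is a small Python sketch that groups latencies into power-of-two buckets, similar in spirit to the distribution DTrace's quantize() aggregation prints. It does not use DTrace or the Hibari tracepoints; the function names and the sample microsecond values are hypothetical.

```python
from collections import Counter


def power_of_two_histogram(latencies_us):
    """Count integer latencies into buckets 0, [1,2), [2,4), [4,8), ..."""
    buckets = Counter()
    for value in latencies_us:
        # The bucket's lower bound is the largest power of two <= value.
        lower = 0 if value < 1 else 1 << (int(value).bit_length() - 1)
        buckets[lower] += 1
    return dict(sorted(buckets.items()))


def render(hist, width=40):
    """Draw one text bar per bucket, scaled to the fullest bucket."""
    peak = max(hist.values())
    lines = []
    for lower, count in hist.items():
        bar = "@" * round(width * count / peak)
        lines.append(f"{lower:>8} | {bar} {count}")
    return "\n".join(lines)


# Hypothetical latencies in microseconds.
hist = power_of_two_histogram([0, 1, 2, 3, 4, 7, 8, 150, 244])
chart = render(hist)
```

A histogram like this shows whether latencies cluster in one range or split into fast and slow groups, which a median and a single percentile can hide.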


tatsuya6502 · 2014-01-30T00:05:53Z

Set target milestone v0.3.0.

ghost assigned tatsuya6502 Jan 30, 2014

tatsuya6502 added the New feature label Mar 29, 2015
