Monitoring Reliability

Reliability is a function of mean time to failure (MTTF) and mean time to recovery (MTTR).

MTTF measures how long a process typically runs before it fails, so a low MTTF is a sign that the process is not reliable.
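
One common way to combine the two numbers is steady-state availability, MTTF / (MTTF + MTTR). The sketch below assumes both values are measured in the same unit. The function name `availability` and the sample figures are illustrative only, not part of these notes.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time a service is expected to be up.

    Uses the steady-state formula MTTF / (MTTF + MTTR).
    Both arguments must use the same time unit (hours here).
    """
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF and MTTR cannot both be zero")
    return mttf_hours / total


if __name__ == "__main__":
    # Hypothetical service: fails about every 999 hours, takes 1 hour to recover.
    print(f"{availability(999, 1):.3%}")
```

Under this formula, halving MTTR improves availability in much the same way as doubling MTTF, which is why both numbers matter.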

MTTR

MTTR is important in determining how quickly you can resolve an issue and limit the impact on a service.
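
As a rough illustration, MTTR can be estimated by averaging recovery times over past incidents. The sketch below assumes each incident is recorded as a hypothetical `(started, resolved)` pair of timestamps. Real incident trackers will store this differently.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average the time from incident start to resolution.

    `incidents` is a list of (started, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum(
        (resolved - started for started, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical incident log, for illustration only.
    log = [
        (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 30)),
        (datetime(2024, 2, 9, 22, 0), datetime(2024, 2, 9, 23, 30)),
    ]
    print(mean_time_to_recovery(log))
```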

Humans add latency to recovery, so automated systems help reduce the time it takes to resolve an issue. Playbooks (also called runbooks) help reduce MTTR as well.