Site Reliability learnings and resources, focussed on Node.js.
Automation • Simplicity • Monitoring • Microservices • Availability • Scalability • Deployment
Introduction
TODO
Automation
Anything that can be automated should be automated.
Simplicity
Simplicity is an important goal for SREs, as it strongly correlates with reliability: simple software breaks less often and is easier and faster to fix when it does break. Simple systems are easier to understand, easier to maintain, and easier to test. - The Site Reliability Workbook (O'Reilly)
Monitoring
Collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, error counts and types, processing times, and server lifetimes. - Google SRE
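As a rough illustration of that definition in Node.js, the sketch below keeps counts, error counts by type, processing times, and process uptime in memory. The `Metrics` class and its method names are hypothetical and made up for this example; it uses only Node.js built-ins (`process.hrtime.bigint()` and `process.uptime()`) and only times synchronous functions. A real service would more likely export these numbers to a dedicated monitoring system.

```javascript
// Minimal in-memory metrics registry (illustrative sketch, not a production library).
// The class name and metric naming scheme are assumptions made for this example.
class Metrics {
  constructor() {
    this.counters = new Map(); // metric name -> running count
    this.timings = new Map();  // metric name -> array of durations in milliseconds
  }

  // Add `by` to the named counter, creating it on first use.
  increment(name, by = 1) {
    this.counters.set(name, (this.counters.get(name) || 0) + by);
  }

  // Record one duration (in milliseconds) under the named timing.
  observe(name, ms) {
    if (!this.timings.has(name)) this.timings.set(name, []);
    this.timings.get(name).push(ms);
  }

  // Run a synchronous function, counting the call, any error (by type),
  // and how long it took. Errors are re-thrown to the caller.
  time(name, fn) {
    const start = process.hrtime.bigint();
    this.increment(`${name}.count`);
    try {
      return fn();
    } catch (err) {
      const type = err instanceof Error ? err.name : 'unknown';
      this.increment(`${name}.errors.${type}`);
      throw err;
    } finally {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      this.observe(`${name}.duration_ms`, elapsedMs);
    }
  }

  // Aggregate everything collected so far into a plain object for display.
  snapshot() {
    const timings = {};
    for (const [name, values] of this.timings) {
      const sum = values.reduce((a, b) => a + b, 0);
      timings[name] = {
        count: values.length,
        avg: sum / values.length,
        max: Math.max(...values),
      };
    }
    return {
      uptimeSeconds: process.uptime(), // how long this process has been alive
      counters: Object.fromEntries(this.counters),
      timings,
    };
  }
}

// Usage: one successful "query" and one that fails with a TypeError.
const metrics = new Metrics();
metrics.time('query', () => 42);
try {
  metrics.time('query', () => { throw new TypeError('bad input'); });
} catch (_) {
  // The error is already counted; the caller decides how to handle it.
}
console.log(metrics.snapshot());
```

Keeping the registry as a small class with a `snapshot()` method separates collection from display, so the same data could later be printed, logged, or served over HTTP without changing the code that records it.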
Microservices
TODO
Availability
TODO
Scalability
TODO
Deployment
Resources
Many quotes are adapted from the excellent SRE resources below.
