A curated list of awesome tech postmortem resources, inspired by and templated on awesome-python.
Table of Contents generated with DocToc
- Awesome Tech Postmortems
- Other lists of postmortems
The postmortems, studies, and resources curated here are those which enable learning from incidents.
PSA: Incident Analysis is completely different from the standard postmortem process that you see written about in the Google SRE book and other incident marketing materials. It is a whole field of study and practice on extracting valuable data from incidents focusing on how.
The postmortems linked generally do not meet the bar that Jones suggests for incident analysis -- because very few (no?) tech organizations publish analyses of that kind. However, many of the studies and resources linked do line up with the incident analysis perspective.
- Incident Report: Running Dry on Memory Without Noticing
- I sat in on a Honeycomb incident review and you won't believe what we learned next (2019-11-08, Jacob Scott)
- The Consul outage that never happened (2019-08-07, GitLab)
- We had issues with Monzo on 29th July (2019-07-29, Monzo)
- Details of the Cloudflare outage on July 2, 2019 (2019-07-02, Cloudflare)
- What We Learned from the Recent Mandrill Outage (2019-03-26, Mailchimp)
- Postmortem: Azure DevOps Service Outages in October 2018 (2018-10-16, Azure DevOps Service)
- Incident review: API and Dashboard outage on 10 October 2017 (2017-10-10, GoCardless)
- Postmortem of database outage of January 31 (2017-01-31, GitLab)
- Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems (2018, Marisa Grayson)
- Report from the SNAFUcatchers Workshop on Coping With Complexity (2017-03-16, SNAFUcatchers)
- Counterfactual Thinking, Rules, and The Knight Capital Accident (2013-10-29, John Allspaw)
Bulk incident analysis
- What bugs cause production cloud incidents (2019-05-13, various)
- Incidents — Trends from the Trenches (2019-02-26, Subbu Allamaraju/Expedia)
How to approach postmortems
Other lists of postmortems
Your contributions are always welcome! Please take a look at the contribution guidelines first.
I will keep some pull requests open if I'm not sure whether those resources are truly awesome tech postmortems; you can vote for them on the pull request itself.
If you have any questions about this opinionated list, do not hesitate to contact me @jhscott on Twitter or open an issue on GitHub.