Incident Retrospective
Do before the meeting:
- Link to ticket, external resources
- Build timeline
- Choose facilitator
- Choose note taker
- Confirm participant attendance
- Book room/video conference
Purpose
This is a collaborative discussion of an incident and our response to it. We know that many things have to go wrong to produce an incident, and we seek to document them all so that we can fail better in the future.
Link to incident ticket
If you track this incident in another system, link it here.
Roll call (00:01)
Write down the people who are present
If critical people are missing, deputize someone to try to engage them
What do you want to get out of this Retro? (00:03)
Roadsign: We’ll spend the next 2 minutes discussing what attendees hope to get out of this retro.
Timeline (summary) (00:05)
Roadsign: We’ll spend the next 5 minutes talking over the timeline of the incident to make sure it represents what happened and includes the perspective of all responders. Filling in the timeline before the retrospective meeting helps the retro go more smoothly.
Incident responders:
Timeline:
What happened? (00:10)
Roadsign: What are the technical details of the failure? Have one or two responders give a brief factual overview.
Impact (00:15)
Roadsign: The next 10 minutes are for discussing the impact of this event on our customers.
- How many customers or apps were impacted?
- What did they feel?
- And for how long?
Root Cause Analysis (5 Whys, Infinite Hows, something) (00:25)
Roadsign: The rest of the meeting is for discussing the whys of what happened. When asking why, ideas for remediation should naturally surface. Remember to consider engineering tasks as well as documentation improvements and process improvements when looking at remediation ideas.
Notes from RCA discussions here:
Gather action items generated here:
Have we seen this before? (00:45)
Roadsign: Have we seen this incident or root cause before?
What we did well (00:50)
Roadsign: Let’s take a little time to discuss what went well. We need to keep track of what we’re doing right so we can keep doing it.
How did this retrospective go? (00:55)
Roadsign: Take the last two minutes to think about this retrospective. Did we cover all the territory? What should we talk about that we aren’t? Did we miss any key people or perspectives?
Follow-up:
Email a summary to:
Include a brief description of:
- customer impact
- underlying causes (including root cause)
- remediation plan
This is based on the retrospective template used at Heroku, written by Courtney Eckhardt, Joy Scharmen, and Charles Hooper.