Compilation of public failure/horror stories related to Kubernetes

Kubernetes Failure Stories

A compiled list of links to public failure stories related to Kubernetes. The most recent publications are listed first.


Kubernetes is a fairly complex system with many moving parts. Its ecosystem is constantly evolving and adding even more layers (service mesh, ...) to the mix. Considering this environment, we don't hear enough real-world horror stories to learn from each other! This compilation of failure stories should make it easier for people dealing with Kubernetes operations (SRE, Ops, platform/infrastructure teams) to learn from others and reduce the unknown unknowns of running Kubernetes in production. For more information, see the blog post.


Please help the community and share a link to your failure story by opening a Pull Request! Failure stories can take any form: blog posts, conference/meetup talks, incident postmortems, tweetstorms, and so on.

I would also be glad to hear about your failure stories on Twitter: my handle is @try_except_


Thanks to all contributors and everybody who writes public Kubernetes postmortems! 👏

Thanks to Joe Beda for contributing his domain for this project! 👏
