IO, resource contention notes, docs and tools
Latest commit bbf1cda Feb 12, 2020


kubernaughty

This is a collection of documentation, how-tos, tools and other information on identifying and debugging Kubernetes/container workload failures, along with performance and reliability considerations.

This investigation started with user-reported failures at the DNS, networking and application levels. Through the analysis, however, the actual causes of these failures turned out to be severe resource saturation & contention, IO throttling, kernel panics, etc. For an overview, see Part 1: Summary.

Through the investigation, I've discovered a lack of operational / systems knowledge, tracking and general awareness of the worker nodes / Linux hosts that make up Kubernetes clusters (including filesystem incompatibility).
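
As one illustration of the kind of host-level signal that is easy to miss, here is a minimal sketch that reads the Linux pressure stall information (PSI) file for IO on a worker node. This is an assumption for illustration, not a tool from this repository: it presumes a kernel with PSI enabled (4.20 or later), and the function name `io_pressure_some_avg10` is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the 10-second "some" IO pressure average, i.e. the
# percentage of recent wall-clock time in which at least one task was
# stalled waiting on IO.
# Assumption: PSI is enabled and exposed at /proc/pressure/io. The path is
# a parameter so the function can also read a saved copy of the file.
io_pressure_some_avg10() {
    psi_file="${1:-/proc/pressure/io}"
    if [ ! -r "$psi_file" ]; then
        echo "cannot read $psi_file (is PSI enabled on this kernel?)" >&2
        return 1
    fi
    # Expected line format:
    #   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }' "$psi_file"
}
```

Running `io_pressure_some_avg10` with no arguments directly on a node prints a single number. A value that stays well above zero suggests workloads on that host are regularly waiting on IO, which is worth checking before chasing DNS or network symptoms.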

There are many gotchas, mud pits and blind spots in running distributed systems, and Kubernetes is no different. My goal with this is to step through the past 20 years of my career (that is, to show everyone my mistakes and what I learned from them).

Hopefully, this stuff helps you and your team.

This is an ongoing project / labor of love. It is not complete by any means.

Roadmap

Contents:

Screencasts

Kubernaughty 1: IO saturation and throttling
