Explore K8s Pod restarts, failures, anomaly detections, handlers/hooks & remedial actions in Kubernetes #465

Closed
AmitKumarDas opened this issue Oct 3, 2017 · 4 comments


AmitKumarDas commented Oct 3, 2017

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Why this feature request?

This should help us organize our thoughts into actionable tasks (issues that can be implemented & closed in a couple of days, or a week at most) w.r.t. OpenEBS volume High Availability. One can expect a lot of actionable tasks that will go on to form the HA implementation of OpenEBS.

What should this feature request do?

  • Explore the various mechanisms of Pod & other K8s objects w.r.t this feature request
  • Explore the various K8s components involved w.r.t this feature request
  • Explore the various config options in these K8s components w.r.t this feature request
  • Explore if custom failure detection logic/components can fit into the existing K8s components
  • Explore the communication protocols used between the components w.r.t this feature request
  • Explore the endpoints of these components w.r.t this feature request
  • Build a workflow diagram that explains the life-cycle w.r.t this feature request.

AmitKumarDas commented Oct 3, 2017

Here are the summarized details w.r.t Pod Restarts, Failures, Anomaly Detection, & Remedial Actions

Liveness Probes:

  • Kubelet uses liveness probes to know when to restart a Container
  • Kubelet kills the Container & restarts it
  • Kubelet calls the probe handlers implemented by Containers
  • Liveness probes are executed by the kubelet, so all requests are made in the kubelet network namespace.
  • Types of Probe Handlers:
    • ExecAction, HTTPGetAction, TCPSocketAction
  • Results of Probe Handlers:
    • Success, Failure, Unknown
  • If the Container's process crashes or exits on its own when unhealthy, a liveness probe is not strictly necessary; the kubelet automatically performs the corrective action in accordance with the Pod's restartPolicy
  • If you want the Container to be killed & restarted if a probe fails, then specify a liveness probe, & specify a restartPolicy of Always or OnFailure
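A minimal sketch of a liveness probe using the HTTPGetAction handler; the name, image, path, and port below are illustrative assumptions, not from this issue:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo            # hypothetical name
spec:
  restartPolicy: Always          # so a failed probe leads to kill & restart
  containers:
  - name: app
    image: example/app:1.0       # assumed image
    livenessProbe:
      httpGet:                   # HTTPGetAction; ExecAction & TCPSocketAction also exist
        path: /healthz           # assumed health endpoint
        port: 8080
      initialDelaySeconds: 15    # give the container time to start before probing
      periodSeconds: 10          # kubelet probes every 10 seconds
      failureThreshold: 3        # kill & restart after 3 consecutive failures
```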

Readiness Probes:

  • Kubelet uses readiness probes to know when a Container is ready to start accepting traffic
  • Failure in readiness probe triggers the endpoints controller to remove the Pod IP address from the endpoints of all Services that match the Pod.
  • Note that if you just want to be able to drain requests when the Pod is deleted, you do not necessarily need a readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the Containers in the Pod to stop.
  • Use-Case: It can act as a signal to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
  • Use-Case: If you want your Container to be able to take itself down for maintenance, you can specify a readiness probe that checks a specific readiness endpoint. This endpoint should be different from the liveness endpoint.
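A hedged sketch of the matching readiness probe; this snippet would sit alongside the livenessProbe in the container spec of the sketch above, and the endpoint is deliberately distinct from the liveness endpoint (both paths are assumptions):

```yaml
    readinessProbe:
      httpGet:
        path: /ready             # assumed readiness endpoint, distinct from /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5           # while this fails, the endpoints controller removes the Pod IP
```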

PostStart Container Hook:

  • This hook executes immediately after a container is created
  • There is no guarantee that the hook will execute before the container ENTRYPOINT
  • Q - Is this invoked during a container restart?

PreStop Container Hook:

  • This hook is called immediately before a container is terminated.
  • It is blocking, meaning it is synchronous, so it must complete before the call to delete the container can be sent.
  • Q - Is this PreStop hook invoked during a container restart?
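A sketch showing both hooks on one container; the name, image, and commands are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hooks-demo               # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:1.0       # assumed image
    lifecycle:
      postStart:
        exec:                    # runs right after creation; no ordering guarantee w.r.t. ENTRYPOINT
          command: ["/bin/sh", "-c", "echo started >> /tmp/hooks.log"]
      preStop:
        exec:                    # blocking: must complete before the container delete is sent
          command: ["/bin/sh", "-c", "sleep 5"]   # e.g. let in-flight requests drain
```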

Pod RestartPolicy

  • Pod with single container exits with a Success
    • Log completion event
    • restartPolicy = Always
      • Container is restarted & Pod phase stays Running
    • restartPolicy = OnFailure
      • Pod phase becomes Succeeded
    • restartPolicy = Never
      • Pod phase becomes Succeeded
  • Pod with single container exits with a Failure
    • Log failure event
    • restartPolicy = Always
      • Container is restarted & Pod phase stays Running
    • restartPolicy = OnFailure
      • Container is restarted & Pod phase stays Running
    • restartPolicy = Never
      • Pod phase becomes Failed
  • Pod with single container runs out of memory
    • Log OOM event
    • restartPolicy = Always
      • Container is restarted & Pod phase stays Running
    • restartPolicy = OnFailure
      • Container is restarted & Pod phase stays Running
    • restartPolicy = Never
      • Log failure event. Pod phase becomes Failed
  • Pod is running & a DISK dies
    • Log appropriate event
    • Pod phase becomes Failed
    • If running under a controller, Pod is recreated elsewhere
  • Pod is running & its node is segmented out
    • Node controller waits for timeout
    • Node controller sets Pod phase to Failed
    • If running under a controller, Pod is recreated elsewhere
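A small sketch that exercises the OnFailure rows above; the name, image, and command are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo             # hypothetical name
spec:
  restartPolicy: OnFailure       # restarted on failure; phase becomes Succeeded on a clean exit
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "exit 1"]   # non-zero exit, so the kubelet restarts it with back-off
```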

Init Containers

  • They are run before the app containers are started
  • There can be multiple init containers per pod that are run in a pre-determined sequence
  • Useful w.r.t start-up related code
  • Useful w.r.t security settings
  • Useful to include utilities e.g. sed, awk, etc
  • Useful w.r.t setting up config files
  • Used in StatefulSets
  • Use-Case: frontend app waits for backend app & backend app waits for DB app
  • NOTE - Check the various resource quotas given to a Pod
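A sketch of the frontend-waits-for-backend use-case; the Service name "backend" and the images are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo                # hypothetical name
spec:
  initContainers:                # run to completion, in declaration order, before app containers
  - name: wait-for-backend
    image: busybox               # ships the usual utilities (sh, nslookup, sed, awk)
    command: ["sh", "-c", "until nslookup backend; do echo waiting; sleep 2; done"]
  containers:
  - name: frontend
    image: example/frontend:1.0  # assumed image
```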

Involuntary Disruptions to a Pod

  • Hardware failure of the physical machine backing the node
  • Cloud Provider or Hypervisor failures makes VM disappear
  • Kernel panic
  • Node disappears from the cluster due to cluster network partition
  • Eviction of pod due to node being out of resources

Voluntary Disruptions to a Pod (may be human triggered or automated via some intelligent tool)

  • Deleting the Deployment
  • Updating the Deployment's Pod template, causing a restart
  • Directly deleting a Pod
  • Draining a Node for repair/upgrade/scale down
  • Removing a Pod from a node to permit something else to fit on that node

Disruption Budgets

  • Kubernetes offers features to help run highly available applications even in the face of frequent voluntary disruptions.
  • This set of features is called Disruption Budgets.
  • PodDisruptionBudget (PDB) object can be created for an app
  • PDB can limit the no. of pods of a replicated app that are down simultaneously from voluntary disruptions
  • NOTE: Tools should respect PDB by calling Eviction API instead of directly deleting Pods.
    • e.g. kubectl drain
    • Kubernetes-on-GCE cluster upgrade script
  • PDB helps in separation of Cluster Owner Role & Application Owner Role
  • Use-Case: A quorum-based app would like to ensure the no. of running replicas is above certain value
  • Use-Case: A web front end would like to ensure no. of replicas serving the load never falls below a certain percentage of total replicas
  • Use-Case: A highly available stateless app with 90% uptime
    • use PDB with minAvailable 90%
  • Use-Case: A highly available single instance stateful app
    • Do not use PDB & tolerate occasional downtime
    • or Set PDB with maxUnavailable=0
      • NOTE: This will block Node drain. Need to remove the PDB when Node drain is required.
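A sketch of the minAvailable 90% use-case; policy/v1beta1 was the PDB API group current when this issue was filed, and the name and labels are assumptions:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                  # hypothetical name
spec:
  minAvailable: "90%"            # voluntary disruptions may not drop availability below 90%
  selector:
    matchLabels:
      app: web-frontend          # assumed label on the replicated app's Pods
```

Tools such as kubectl drain then go through the Eviction API, which refuses evictions that would violate this budget.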

Eviction Policy

  • Kubelet can proactively monitor for & guard against total starvation of a compute resource
  • In such cases, the kubelet can proactively fail one or more pods in order to reclaim the starved resource

Eviction Signals

  • Kubelet triggers eviction decisions based on eviction signals
  • Eviction Signals:
    • memory.available
    • nodefs.available
    • nodefs.inodesFree
    • imagefs.available
    • imagefs.inodesFree
  • Deriving the value of memory.available
    • its value is derived from cgroups instead of free -m
    • Why? Because free does not work inside a container

Eviction Threshold

  • Evictions are categorized as Soft Eviction or Hard Eviction based on whether a grace period is associated with the threshold
  • For a soft eviction, the kubelet uses the lesser of pod.Spec.TerminationGracePeriodSeconds and the max allowed grace period (--eviction-max-pod-grace-period)
  • For a hard eviction (no grace period specified), the kubelet kills pods immediately with no graceful termination
  • Kubelet evaluates eviction thresholds at its monitoring interval, i.e. housekeeping-interval
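A hedged sketch of the kubelet flags involved; the flag names are real kubelet flags, but the values are illustrative only (quotes keep the shell from treating < as redirection):

```sh
kubelet \
  --eviction-soft='memory.available<500Mi' \
  --eviction-soft-grace-period='memory.available=1m' \
  --eviction-max-pod-grace-period=60 \
  --eviction-hard='memory.available<100Mi,nodefs.available<10%'
```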

Node Conditions & Eviction Signals

  • MemoryPressure as a Node Condition
    • memory.available is the Eviction Signal
  • DiskPressure as a Node Condition
    • nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree are the Eviction Signals
  • Kubelet reports node status updates at a frequency specified by
    • --node-status-update-frequency which defaults to 10s

Oscillating Node Conditions

  • What is it?
    • Frequent oscillation above & below a soft eviction threshold, without ever exceeding its associated grace period
  • This will cause poor scheduling decisions as a consequence
  • Protection against this oscillation can be done by the following flag:
    • --eviction-pressure-transition-period
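Sketched with an illustrative value:

```sh
# a node condition must stay clear of the threshold for this long before it transitions back
kubelet --eviction-pressure-transition-period=5m
```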

Node Allocatable

  • Allocatable on a Kubernetes node is defined as the amount of compute resources that are available for pods.
  • The scheduler does not over-subscribe Allocatable.
  • CPU, memory and ephemeral-storage are supported as of now.
  • Read more on cgroup flags & cgroup driver for the details/implementation
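Roughly, per the node allocatable design (a simplification):

```
Allocatable = Node Capacity - kube-reserved - system-reserved - hard eviction thresholds
```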

Kube Reserved

  • kube-reserved is meant to capture resource reservation for kubernetes system daemons like:
    • kubelet,
    • container runtime,
    • node problem detector, etc.
  • It is not meant to reserve resources for system daemons that are run as pods.
  • kube-reserved is typically a function of pod density on the nodes.
  • Refer to general guidelines on kube-reserved & system-reserved
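A sketch of the flags involved; the values are illustrative assumptions, not recommendations:

```sh
kubelet \
  --kube-reserved=cpu=500m,memory=1Gi \
  --system-reserved=cpu=500m,memory=500Mi \
  --enforce-node-allocatable=pods   # enforce Allocatable on pods via cgroups
```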


Useful Notes

  • restartPolicy only refers to restarts of the Containers by the Kubelet on the same node

  • Failed Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes; the back-off is reset after ten minutes of successful execution

  • If a node dies or is disconnected from the rest of the cluster, Kubernetes applies a policy for setting the phase of all Pods on the lost node to Failed.

  • Pods with a phase of Succeeded or Failed for more than some duration (determined by the master) will expire and be automatically destroyed.

  • A user-controlled Pod delete can be issued with a grace period

    • If the grace period expires before the PreStop hook completes, the kubelet grants a small extended grace period
    • The default grace period for delete is 30 seconds
  • kubelet supports only two filesystem partitions:

    • The nodefs filesystem that kubelet uses for volumes, daemon logs, etc.
    • The imagefs filesystem that container runtimes use for storing images and container writable layers.
  • kubelet auto-discovers these filesystems using cAdvisor.

AmitKumarDas commented

Since this issue acts as a reference point, it will be updated as & when required even though it is marked as closed.

loren-osborn commented
Just some updated links for anyone arriving here late. The tickets above have new numbers:

Implementation Specific Tasks:

Verification Specific Tasks:
