iorchard/asklepios

Asklepios

Asklepios is an auto-healing service for stateful pods running on Kubernetes nodes that enter the NotReady state.

Asklepios periodically checks the state of each node. When a node has stayed in the NotReady state for the kickout time, Asklepios takes the following actions so that the pods on that node can be moved to other nodes:

  • Cordon the node to prevent pods from being scheduled.
  • Add a node.kubernetes.io/out-of-service=nodeshutdown:NoExecute taint so that pods stuck on the node can be quickly moved to other nodes.

When the node is healthy and Ready again, Asklepios reverts the kickout actions after the configured kickin time has elapsed:

  • Uncordon the node to allow pods to be scheduled.
  • Remove the node.kubernetes.io/out-of-service=nodeshutdown:NoExecute taint to allow pods to run on the node.
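The kickout and kickin rules above can be sketched as a small decision function. This is a minimal illustration and not the Asklepios implementation: the names `Action` and `decide`, the `seconds_in_state` input, and the use of `>=` for the threshold comparison are all assumptions.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    KICKOUT = "kickout"  # cordon the node and add the out-of-service taint
    KICKIN = "kickin"    # uncordon the node and remove the out-of-service taint


def decide(ready: bool, kicked_out: bool, seconds_in_state: float,
           kickout: float = 60, kickin: float = 60) -> Action:
    """Pick the action for one node on one health-check pass (hypothetical helper).

    seconds_in_state is how long the node has been in its current
    Ready/NotReady state; kicked_out records whether kickout already ran.
    """
    if not ready and not kicked_out and seconds_in_state >= kickout:
        return Action.KICKOUT
    if ready and kicked_out and seconds_in_state >= kickin:
        return Action.KICKIN
    return Action.NONE
```

In this sketch, a node that has been NotReady for 30 seconds is left alone under the default settings, and a node that was never kicked out is never kicked in.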

The configuration options are as follows:

  • sleep: How often to check node health (in seconds, Default: 10 seconds)
  • kickout: Time to wait after a node becomes NotReady before running the kickout process (in seconds, Default: 60 seconds)
  • kickin: Time to wait before running the kickin process after the node becomes Ready (in seconds, Default: 60 seconds)
  • balancer: Rebalance rabbitmq/mariadb pods after node recovery (true/false, Default: false)
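For illustration, the options and defaults listed above could be modelled as follows. This is only a sketch: the `Config` class, the `load` helper, and the positive-value check are assumptions, and it does not reflect how Asklepios actually reads its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    sleep: int = 10          # seconds between node health checks
    kickout: int = 60        # seconds NotReady before the kickout process runs
    kickin: int = 60         # seconds Ready before the kickin process runs
    balancer: bool = False   # rebalance rabbitmq/mariadb pods after recovery

    def __post_init__(self):
        # Assumed sanity check: the time-based options must be positive.
        for name in ("sleep", "kickout", "kickin"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive number of seconds")


def load(overrides: dict) -> Config:
    """Build a Config from a mapping; unspecified options keep their defaults."""
    return Config(**overrides)
```

For example, `load({"kickout": 120})` keeps the defaults for every option except kickout.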

If you do not want Asklepios to check a node, add the node.kubernetes.io/asklepios=skip:NoExecute taint to the node. Asklepios will then stop checking that node's status:

kubectl taint nodes NODE_NAME node.kubernetes.io/asklepios=skip:NoExecute

To have Asklepios check the node again, remove the node.kubernetes.io/asklepios=skip:NoExecute taint:

kubectl taint nodes NODE_NAME node.kubernetes.io/asklepios=skip:NoExecute-
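As a rough sketch of how the skip taint might be detected, the function below inspects a node's taints in the shape of the Kubernetes Node `spec.taints` field (a list of entries with `key`, `value`, and `effect`). The name `is_skipped` is hypothetical, and matching on all three fields rather than only the key is an assumption.

```python
SKIP_TAINT = {
    "key": "node.kubernetes.io/asklepios",
    "value": "skip",
    "effect": "NoExecute",
}


def is_skipped(taints) -> bool:
    """Return True if the skip taint is present.

    taints is a list of dicts shaped like a Node's spec.taints entries;
    None or an empty list means the node has no taints.
    """
    return any(
        t.get("key") == SKIP_TAINT["key"]
        and t.get("value") == SKIP_TAINT["value"]
        and t.get("effect") == SKIP_TAINT["effect"]
        for t in (taints or [])
    )
```

A node carrying only the out-of-service taint is not treated as skipped in this sketch, so it would still be checked.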

Build

To build a container image:

./build.sh

Deploy

To deploy Asklepios in a Kubernetes cluster:

kubectl apply -f k8s

Check that the Asklepios pod is running:

kubectl get po -n kube-system -l app=asklepios
NAME                         READY   STATUS    RESTARTS   AGE
asklepios-7fd4d69f95-2rz5c   1/1     Running   0          129m
