App-centric Troubleshooting
Deployment failures in Kubernetes are cryptic and hard to debug. Users have to piece the error together from pod logs, pod events (`kubectl describe pod`), pod status, the resources associated with the pod, and so on. They go back and forth querying multiple resources, effectively executing a flowchart in their heads, which makes troubleshooting cumbersome, time-consuming and repetitive.
When an issue occurs during deployment, an abstraction helps users troubleshoot it rapidly. Such an abstraction automatically runs the necessary steps behind the scenes and reports the likely cause of the error in general developer terminology. HyScale provides an app-centric abstraction that handles these complexities.
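The abstraction can be pictured as an ordered list of diagnostic steps run over a snapshot of the pod's state, returning the first developer-friendly explanation that applies. The sketch below is hypothetical: `PodSnapshot`, `diagnose` and the two checks are illustrative names, not HyScale's API, and the messages are paraphrased from the error table in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PodSnapshot:
    """Hypothetical summary of the signals a user would otherwise query by hand."""
    phase: str                               # pod status phase, e.g. "Running", "Pending"
    waiting_reason: Optional[str] = None     # e.g. "CrashLoopBackOff"
    terminated_reason: Optional[str] = None  # last termination reason, e.g. "OOMKilled", "Error"
    exit_code: Optional[int] = None          # exit code of the last terminated container
    events: List[str] = field(default_factory=list)  # event messages, as in `kubectl describe pod`


# A diagnostic step inspects the snapshot and returns a message, or None if it does not apply.
Step = Callable[[PodSnapshot], Optional[str]]


def check_oom(pod: PodSnapshot) -> Optional[str]:
    # Kubernetes records "OOMKilled" as the termination reason when a container exceeds its memory limit.
    if pod.terminated_reason == "OOMKilled":
        return "Out of memory errors. Increase the memory limits in the service spec and redeploy."
    return None


def check_start_failure(pod: PodSnapshot) -> Optional[str]:
    # A non-zero exit with reason "Error" suggests the start command itself failed.
    if pod.terminated_reason == "Error" and pod.exit_code not in (None, 0):
        return (f"Service startup commands failed with exit code {pod.exit_code}. "
                "Check startCommands in the service spec or ENTRYPOINT/CMD in the Dockerfile.")
    return None


def diagnose(pod: PodSnapshot, steps: List[Step]) -> str:
    """Run each step in order and report the first cause found."""
    for step in steps:
        message = step(pod)
        if message is not None:
            return message
    # No rule matched: fall back to the raw Kubernetes signal.
    return pod.waiting_reason or pod.terminated_reason or pod.phase


STEPS: List[Step] = [check_oom, check_start_failure]
```

For example, `diagnose(PodSnapshot(phase="Running", terminated_reason="OOMKilled", exit_code=137), STEPS)` returns the out-of-memory explanation instead of the bare `OOMKilled` status. Keeping the checks as a plain list makes it easy to add new causes without touching `diagnose`.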
For example, when a pod fails with `CrashLoopBackOff`, the user is not told what exactly went wrong. The CrashLoopBackOff might be caused by a failing health check, a failing start command, a missing entrypoint in the image, multiple restarts of the pod, and so on.
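A minimal sketch of how these CrashLoopBackOff causes might be told apart, assuming the restart count, the last exit code and the number of consecutive liveness-probe failures have already been collected. `CrashInfo` and `explain_crash_loop` are hypothetical, and treating exit code 0 as a sign of a missing or short-lived ENTRYPOINT/CMD is an assumption. The threshold of 3 matches the Kubernetes default `failureThreshold` for probes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashInfo:
    """Hypothetical inputs gathered for a pod stuck in CrashLoopBackOff."""
    restart_count: int
    last_exit_code: Optional[int]    # from the container's last terminated state
    consecutive_probe_failures: int  # e.g. counted from "Liveness probe failed" events


# Kubernetes restarts a container after `failureThreshold` consecutive liveness
# probe failures; the default threshold is 3.
PROBE_FAILURE_THRESHOLD = 3


def explain_crash_loop(info: CrashInfo) -> str:
    if info.consecutive_probe_failures >= PROBE_FAILURE_THRESHOLD:
        return (f"Health checks specified for service failed "
                f"{info.consecutive_probe_failures} times in succession.")
    if info.last_exit_code == 0:
        # Assumption: the process finished cleanly but keeps being restarted,
        # so the image probably has no long-running ENTRYPOINT/CMD.
        return ("Service container exited abruptly. Possible errors in ENTRYPOINT/CMD "
                "in Dockerfile or missing ENTRYPOINT.")
    if info.last_exit_code is not None:
        # Non-zero exit: the configured start command is failing.
        return ("Service observed to be crashing. Please verify the startCommands "
                "in hspec or CMD in Dockerfile.")
    return f"Service restarted {info.restart_count} times; no further cause identified."
```

Health checks are evaluated first because a container killed by a failing liveness probe also shows a non-zero exit code, which would otherwise be misreported as a start-command failure.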
Another example is a pod stuck in `Pending`. What could be the problem? In this case, HyScale determines whether it is due to a pending volume that has not been attached to the pod, insufficient memory in the cluster, or some other reason.
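A similar sketch for a Pending pod classifies the scheduler's `FailedScheduling` event messages. The helper `explain_pending` is hypothetical, and matching on substrings such as `Insufficient memory` and `unbound immediate PersistentVolumeClaims` assumes the wording emitted by the default Kubernetes scheduler, which may differ between versions.

```python
from typing import List


def explain_pending(event_messages: List[str]) -> str:
    """Classify a Pending pod from its scheduling event messages (hypothetical helper)."""
    for msg in event_messages:
        # The volume-binding step reports claims that could not be bound yet.
        if "unbound immediate PersistentVolumeClaims" in msg:
            return "Pod is waiting for a volume that has not been bound or attached."
        # Resource-fit failures list the exhausted resource per node.
        if "Insufficient memory" in msg or "Insufficient cpu" in msg:
            return "Cannot accommodate new services as the cluster is full."
    # No scheduling failure recorded: the pod is most likely still starting up.
    return "Deployment is still in progress, service is not yet ready."
```

For instance, a message such as `0/3 nodes are available: 3 Insufficient memory.` maps to the "cluster is full" explanation.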
The table below lists several complex Kubernetes error messages alongside the possible causes determined by HyScale, expressed in simple, understandable terms.
Kubernetes error message | HyScale error message |
---|---|
CrashLoopBackOff | Service observed to be crashing. Please verify the startCommands in hspec or CMD in Dockerfile |
CrashLoopBackOff | Service container exited abruptly. Possible errors in ENTRYPOINT/CMD in Dockerfile or missing ENTRYPOINT. |
CrashLoopBackOff ⇄ Running | Health checks specified for service failed 3 times in succession. |
ImagePullBackOff | Incorrect registry credentials |
ImagePullBackOff/ErrImagePull | Invalid Image name or tag provided. Recheck the image name or tag in <myimage> service spec. |
ImagePullBackOff/ErrImagePull | Missing target registry credentials for <myregistry>. |
Pending | Cannot accommodate new services as the cluster is full. Please contact your cluster administrator to add cluster capacity or deploy to a different cluster. |
Pending | Cannot provision new volumes, no storage class configured in your cluster. Please contact your cluster administrator. |
Pending | Deployment is still in progress, service is not yet ready. Try querying after some time. |
OOMKilled | Out of memory errors. Not enough memory to run <myservice>. Increase the memory limits in service spec and try redeploying. |
Error | Service startup commands failed with exit code 1. Possible errors in startCommands in service spec or ENTRYPOINT/CMD in Dockerfile |
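As the table shows, a single Kubernetes status such as ImagePullBackOff can stem from several causes. The sketch below shows one hypothetical way to separate the three image-pull rows using the pull error's event message and whether an image pull secret is configured. `explain_image_pull` and its hint substrings are assumptions; real messages vary by container runtime and registry (Docker Hub, for example, can report `pull access denied` both for a private repository and for one that does not exist).

```python
from typing import Optional

# Assumed substrings; real pull-error messages vary by runtime and registry.
AUTH_HINTS = ("unauthorized", "authentication required", "pull access denied")
NOT_FOUND_HINTS = ("manifest unknown", "not found")


def explain_image_pull(event_message: str, has_pull_secret: bool,
                       service: str, registry: str) -> Optional[str]:
    """Map an image-pull failure to a friendlier cause (hypothetical helper)."""
    msg = event_message.lower()
    if any(hint in msg for hint in AUTH_HINTS):
        # Authentication failed: either the credentials are wrong or none were supplied.
        if has_pull_secret:
            return "Incorrect registry credentials"
        return f"Missing target registry credentials for {registry}."
    if any(hint in msg for hint in NOT_FOUND_HINTS):
        return (f"Invalid Image name or tag provided. "
                f"Recheck the image name or tag in {service} service spec.")
    # Unrecognised message: leave it to other checks or the raw status.
    return None
```

Returning `None` for unrecognised messages lets such a check sit in a larger chain of diagnostics without masking causes it does not understand.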
The following flowchart shows the troubleshooting flow for a deployment in Kubernetes.
![](https://github.com/hyscale/hyscale/raw/master/docs/images/troubleshooting.jpg)
Concept Image Credit: learnk8s.io
The greyed-out boxes indicate error conditions that occur in plain Kubernetes deployments but are prevented when deploying with HyScale, because HyScale automates the generation of Kubernetes YAML.
The conditions in pink are checked automatically by HyScale as part of its troubleshooting flow.
The blue boxes are the actions that need to be performed by the user once the issue is identified.
The rest of the conditions are planned for future implementation.