
App centric Troubleshooting

Anoop P Balakuntalam edited this page Jul 23, 2020 · 1 revision

Deployment failures in Kubernetes are cryptic and hard to debug. Users need to identify errors from pod logs, pod events (describe pod), pod status, resources associated with pods, and so on. They have to go back and forth querying multiple resources, effectively executing a flowchart in their minds, which makes troubleshooting cumbersome, time-consuming and repetitive.


Simplified Troubleshooting

When an issue occurs during deployment, an abstraction helps users troubleshoot rapidly. Such an abstraction automatically runs the necessary steps behind the scenes and reports the potential cause of the error in general developer terminology. HyScale provides this kind of app-centric abstraction to hide the complexities involved.

Example Scenario 1 - CrashLoopBackOff

For example, when a particular pod fails with a “CrashLoopBackOff”, the user does not learn what exactly went wrong. The CrashLoopBackOff might be due to a health check failure, start command failures, a missing entrypoint in the image, multiple restarts of the pod, etc.
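To make the scenario concrete, here is a minimal Python sketch of how the causes above could be told apart from a container status and the pod's events. It is an illustration under stated assumptions, not HyScale's actual implementation: the function name `diagnose_crash_loop` and the returned phrases are hypothetical. The input dictionaries follow the shape of `kubectl get pod -o json` output (`state.waiting.reason`, `lastState.terminated.exitCode`) and of pod events (`reason`, `message`). The exit code rules are common shell conventions used here as heuristics.

```python
from typing import Optional


def diagnose_crash_loop(container_status: dict, events: list) -> Optional[str]:
    """Return a plain-language cause for a CrashLoopBackOff, or None if not applicable."""
    waiting = (container_status.get("state") or {}).get("waiting") or {}
    if waiting.get("reason") != "CrashLoopBackOff":
        return None

    # Failed liveness probes are reported as pod events with reason "Unhealthy".
    for event in events:
        message = event.get("message", "")
        if event.get("reason") == "Unhealthy" and "Liveness probe failed" in message:
            return "health check failure"

    # The previous run of the container records how it terminated.
    terminated = (container_status.get("lastState") or {}).get("terminated") or {}
    exit_code = terminated.get("exitCode")
    if exit_code is None:
        return "cause unknown (no previous termination recorded)"
    if exit_code in (126, 127):
        # Shell convention: 126 = not executable, 127 = command not found.
        return "start command or entrypoint missing or not executable"
    if exit_code == 0:
        # Heuristic: a clean exit usually means no long-running process was started.
        return "container exited without error; no long-running process"
    return f"start command failed with exit code {exit_code}"


# Hypothetical sample data for illustration only.
sample_status = {
    "restartCount": 5,
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"exitCode": 127, "reason": "Error"}},
}
print(diagnose_crash_loop(sample_status, events=[]))
```

Checking events before exit codes is a design choice in this sketch: a container killed by a failed health check can exit with a code that would otherwise look like a start command failure.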

Example Scenario 2 - Pending Pod

Another example is a ‘Pending Pod’: what could be the problem? In this case, HyScale determines whether it is due to a “pending volume that has not been attached to the pod”, “insufficient memory in the cluster”, or some other reason.
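The Pending case can be sketched in the same spirit. The Python snippet below inspects the pod's `PodScheduled` condition, which the scheduler sets to `False` with an explanatory message when it cannot place the pod. The function name `diagnose_pending` and the returned phrases are hypothetical, and the substring checks rely on typical scheduler messages, which may differ between Kubernetes versions. This is an assumption-laden illustration rather than HyScale's own logic.

```python
def diagnose_pending(pod: dict) -> str:
    """Explain in plain language why a pod is stuck in the Pending phase."""
    status = pod.get("status") or {}
    if status.get("phase") != "Pending":
        return "pod is not pending"

    for condition in status.get("conditions") or []:
        unscheduled = (
            condition.get("type") == "PodScheduled"
            and condition.get("status") == "False"
        )
        if not unscheduled:
            continue
        message = condition.get("message", "")
        # Typical scheduler wording; treat these substrings as assumptions.
        if "PersistentVolumeClaim" in message:
            return "pending volume that has not been attached to the pod"
        if "Insufficient memory" in message:
            return "insufficient memory in the cluster"
        return "unschedulable: " + message

    # Pending but already scheduled: images may still be pulling.
    return "deployment still in progress"


# Hypothetical sample data for illustration only.
sample_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: 3 Insufficient memory.",
            }
        ],
    }
}
print(diagnose_pending(sample_pod))
```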


The table below shows several complex error messages in Kubernetes and the possible causes as determined by HyScale, expressed in a simple and understandable way.

| K8S Error message | HyScale Error message |
| --- | --- |
| CrashLoopBackOff | Service observed to be crashing. Please verify the startCommands in hspec or CMD in Dockerfile |
| CrashLoopBackOff | Service container exited abruptly. Possible errors in ENTRYPOINT/CMD in Dockerfile or missing ENTRYPOINT. |
| CrashLoopBackOff ⇄ Running | Health checks specified for service failed 3 times in succession. |
| ImagePullBackOff | Incorrect registry credentials |
| ImagePullBackOff/ErrImagePull | Invalid Image name or tag provided. Recheck the image name or tag in <myimage> service spec. |
| ImagePullBackOff/ErrImagePull | Missing target registry credentials for <myregistry>. |
| Pending | Cannot accommodate new services as the cluster is full. Please contact your cluster administrator to add cluster capacity or deploy to a different cluster. |
| Pending | Cannot provision new volumes, no storage class configured in your cluster. Please contact your cluster administrator. |
| Pending | Deployment is still in progress, service is not yet ready. Try querying after sometime. |
| OOMKilled | Out of memory errors. Not enough memory to run <myservice>. Increase the memory limits in service spec and try redeploying. |
| Error | Service startup commands failed with exitcode 1. Possible errors in startCommands in service spec or ENTRYPOINT/CMD in Dockerfile |
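Because one Kubernetes status maps to several messages, a translation of this kind needs a second key, the diagnosed cause, in addition to the raw status. The Python sketch below shows one way to model that as a lookup table. The cause labels (`start_command`, `bad_credentials`, and so on), the function name `friendly_message`, and the fallback text are hypothetical names introduced for illustration. The message strings are copied from the table above. How HyScale stores its messages internally is not described here.

```python
# Keys are (Kubernetes status, diagnosed cause); the cause labels are hypothetical.
FRIENDLY_MESSAGES = {
    ("CrashLoopBackOff", "start_command"): (
        "Service observed to be crashing. "
        "Please verify the startCommands in hspec or CMD in Dockerfile"
    ),
    ("ImagePullBackOff", "bad_credentials"): "Incorrect registry credentials",
    ("Pending", "cluster_full"): (
        "Cannot accommodate new services as the cluster is full. "
        "Please contact your cluster administrator to add cluster capacity "
        "or deploy to a different cluster."
    ),
    ("Pending", "no_storage_class"): (
        "Cannot provision new volumes, no storage class configured in your cluster. "
        "Please contact your cluster administrator."
    ),
}


def friendly_message(k8s_status: str, cause: str) -> str:
    """Look up the user-facing message, falling back to the raw Kubernetes status."""
    fallback = f"Unrecognized failure: {k8s_status}"
    return FRIENDLY_MESSAGES.get((k8s_status, cause), fallback)


print(friendly_message("ImagePullBackOff", "bad_credentials"))
print(friendly_message("Pending", "no_storage_class"))
```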

The following flow chart explains the troubleshooting flow for any deployment in Kubernetes.

Concept Image Credit: learnk8s.io

The greyed-out boxes indicate error conditions that occur in plain Kubernetes deployments but are prevented when deploying with HyScale, because it automates Kubernetes YAML generation.

The conditions in pink are checked automatically by HyScale as part of its troubleshooting flow.

The blue boxes are the actions that need to be performed by the user once the issue is identified.

The rest of the conditions are planned for future implementation.