"Cluster health check failed: cluster agent is not ready" error seen during cluster provisioning after which it proceeds to succeed. #28836
Comments
This is not unexpected behavior when first setting up a cluster. When provisioning a custom cluster, this will just run the base agent image to get the cluster connected into Rancher. After the cluster connects, Rancher determines whether the current agent has all the required features/settings/etc., and if not, redeploys the agent with the desired configuration. Trace logging can be enabled and the difference between the configs can be inspected to see what Rancher is changing in the agent's config.
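For anyone who wants to try that comparison: a minimal sketch of raising the log level, assuming a Docker-installed Rancher server and the loglevel helper that ships in the server container (the container name rancher-server is a placeholder):

```sh
# Raise the Rancher server log level to trace.
# "rancher-server" is an example name; substitute your container ID.
docker exec rancher-server loglevel --set trace

# Watch the server logs for the agent config changes while the cluster connects.
docker logs -f rancher-server 2>&1 | grep -i "cluster-agent"

# Revert to the default level afterwards, since trace is very verbose.
docker exec rancher-server loglevel --set info
```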
@dramich This issue seems to be happening recently on master and happens quite often. This was not seen before. Reopening the issue to see why that would be the case.
01490ab |
Closing the issue based on comments #28836 (comment) and #28836 (comment) |
@sowmyav27 @izaac please re-open
Not working, even after waiting 15 minutes. Test node docker ps:
Rancher Logs
kube-apiserver
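The command output did not survive in this thread; information like the above is typically gathered along these lines (the rancher-server name is a placeholder, kube-apiserver follows the RKE default container name):

```sh
# On the affected node: list the containers that RKE/Rancher started.
docker ps

# Rancher server logs (Docker install; container name is an example).
docker logs --tail 200 rancher-server

# kube-apiserver logs on a control-plane node of the downstream cluster.
docker logs --tail 200 kube-apiserver
```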
I have the same kind of issue when provisioning a new custom cluster. I suspect it is the same kind of issue as the one I described in #16454 (the cluster's DNS cannot resolve my Rancher hostname since it's not public). The added trouble is that, unlike in 2.4.8 where I could edit the cattle-cluster-agent deployment to change the dnsConfig, this time I can't, rendering the provisioned cluster useless.
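For reference, the 2.4.8-era workaround being described looks roughly like the sketch below; the nameserver and search domain are placeholder values, and on 2.5 Rancher may redeploy the agent and revert the edit:

```sh
# Point the cattle-cluster-agent at a DNS server that can resolve the
# (non-public) Rancher hostname. 203.0.113.10 and example.internal
# are placeholders.
kubectl -n cattle-system patch deployment cattle-cluster-agent -p '
{
  "spec": { "template": { "spec": {
    "dnsPolicy": "None",
    "dnsConfig": {
      "nameservers": ["203.0.113.10"],
      "searches": ["example.internal"]
    }
  }}}
}'
```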
Same issue here on the newer version 2.5.1. It worked fine on version 2.4.8.
@slash1387 Did the cluster recover after the error? The error mentioned in the original issue was intermittent (always reproducible, and the cluster would recover after about 5 minutes). If the error persisted, you could log a new issue to track it.
@sowmyav27 Like I said, I got the exact same problem, and after several hours (about 6h) it still did not recover. So in my opinion the bug reported by @slash1387 still exists.
@kwims The error mentioned in the original issue was intermittent (always reproducible, and the cluster would recover after about 5 minutes). If the error persists, could you please log a new issue to track it?
As this is a transient message, this is expected.
After an offline conversation with @deniseschannon, changing this to 2.5.7 so it can be looked at.
An average user seeing this error last for more than a minute would not interpret it as a transient message, but would rather assume that cluster provisioning has failed.
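To tell the transient case apart from a real failure, the agent's state on the downstream cluster is the thing to watch; a sketch, assuming kubectl access to that cluster:

```sh
# The health check passes once the cluster agent pod is Ready.
kubectl -n cattle-system get pods -l app=cattle-cluster-agent

# Wait for the rollout instead of polling; fails if it never becomes Ready.
kubectl -n cattle-system rollout status deployment/cattle-cluster-agent --timeout=5m

# If it stays unready, the agent logs usually name the cause
# (e.g. the Rancher hostname not resolving).
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50
```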
2.5 PR: #31039 |
The bug fix is validated on the following two Rancher versions:
Steps:
Results:
What kind of request is this (question/bug/enhancement/feature request): bug
Steps to reproduce (least amount of steps as possible):

The cluster errors out with

Cluster health check failed: cluster agent is not ready

after it has finished provisioning. Nodes will be seen as Active in this case, but the cluster is shown in an error state.

Other details that may be helpful:
Environment information
Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): master-head - c63f8fda
Cluster information
Kubernetes version (use kubectl version):