Dirty data of helm release causes cluster-agent to crash #35971
Comments
Ran into the same issue on Rancher v2.6.3. Modified the above script for Helm 3 charts and was able to find the bad release data.
Deleting the secret it returns allowed the cattle-cluster-agent pods to start without crashing.
I ran into this issue on Rancher v2.5.12... upgraded to v2.6.3, v2.6.4, and then v2.6.5... I didn't understand why the downstream
Hello, as we are trying to update Rancher from version 2.6.3 to 2.6.6, we are getting this error in the cattle-cluster-agent pod of a live cluster: starting /v1, Kind=Secret controller" goroutine 4699 [running]: We tried the above solutions, found some empty secrets in the local cluster, and deleted those empty-data secrets.
I just hit this bug with Rancher 2.6.9.
We also had this issue in Rancher 2.6.9 - thanks for the workarounds! To other people using the code for Helm 3, please note the ┆ symbol in the bash script: if [[ $? != "0" ]]; then
┆ echo "Got a dirty data: $ns--$name"
fi
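For copy-paste purposes, the ┆ in that fragment is rendering residue and stands for ordinary indentation. A minimal standalone form of the check (the variable names `ns`, `name`, and `data` are placeholders carried over from the thread's script, not part of the original issue) might look like:

```shell
# Hypothetical standalone version of the dirty-data check from the thread:
# report a release secret whose base64 payload fails to decode.
check_secret() {
  local ns="$1" name="$2" data="$3"
  printf '%s' "$data" | base64 -d > /dev/null 2>&1
  if [[ $? != "0" ]]; then
    echo "Got a dirty data: $ns--$name"
  fi
}
```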
Good catch there! I've updated my comment above, so it should be more copy-pasteable now. 😁
You are a godsend @syndr, you just saved me from having a little bit of a freakout after having the exact same issue. That solved it! Though I am curious how you get "dirty data" from a Helm release...
How do I run this? In Kubelet? Gives me: The connection to the server localhost:8080 was refused - did you specify the right host or port?
In Rancher? It's not clear - I can't run it in Rancher as it's not connected to the cluster... so can someone explain where I run the bash script?
Run it on your own computer. Change your kubectl context to point at one of the master nodes directly instead of going via "rancher".
@voarsh2 The bash script collects the secrets and validates the base64 data automatically with kubectl via the Kubernetes API. As a pre-flight check you can list all Helm chart Secrets in the whole cluster: $ kubectl get secrets -A | grep helm.sh/release.v1 The DATA field should contain at least one value. Our problem occurred when one secret contained no data (for whatever reason).
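Building on that explanation, here is a hedged sketch of a cluster-wide scan. Only the small validator is concrete; the commented-out kubectl loop is an assumption about the secret layout (the `release` data key and the `helm.sh/release.v1` type are helm3 conventions) and should be adapted to your cluster:

```shell
# Flag a helm3 release payload that is empty or not valid base64.
release_data_ok() {
  local data="$1"
  [[ -n "$data" ]] && printf '%s' "$data" | base64 -d > /dev/null 2>&1
}

# Example scan (requires a working kubectl context; left commented out):
# kubectl get secrets -A --field-selector type=helm.sh/release.v1 \
#   -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' |
# while read -r ns name; do
#   data="$(kubectl get secret -n "$ns" "$name" -o jsonpath='{.data.release}')"
#   release_data_ok "$data" || echo "Got a dirty data: $ns--$name"
# done
```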
I believe I also have the same issue in a 2.6.9 downstream cluster.
To everyone who might still have this problem.
Rancher Server Setup
Information about the Cluster
Describe the bug
This is not a fresh install. The rancher-server has undergone multiple version upgrades.
The cluster-agent cannot be started; check the logs as follows:
To Reproduce
There is no specific reproduction step, but the log confirms that it is caused by dirty data in a helm2 release.
Result
Because the cluster-agent is not available, Rancher cannot be used.
Additional context
We checked the code and used the following script to troubleshoot dirty data:
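The script itself is not reproduced in this excerpt. As a hedged sketch of the idea for helm2 specifically: Tiller's default storage backend keeps release records in ConfigMaps rather than Secrets, and the payload is conventionally a base64-encoded gzip blob. The gzip round-trip below and the `OWNER=TILLER` label are assumptions based on that convention, not the issue's actual script:

```shell
# Flag a helm2 release payload that is empty, not valid base64,
# or not valid gzip after decoding.
release_payload_ok() {
  local payload="$1"
  [[ -n "$payload" ]] || return 1
  printf '%s' "$payload" | base64 -d 2>/dev/null | gzip -d > /dev/null 2>&1
}

# Example listing of helm2 release ConfigMaps (requires kubectl access;
# namespace and label may differ in your setup, so it is left commented):
# kubectl get configmaps -n kube-system -l "OWNER=TILLER" \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
```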