Dirty data of helm release causes cluster-agent to crash #35971

Closed
niusmallnan opened this issue Dec 27, 2021 · 12 comments
Labels: area/agent, kind/bug, priority/1, team/mapps

Comments

@niusmallnan

niusmallnan commented Dec 27, 2021

Rancher Server Setup

  • Rancher version: v2.5.11
  • Installation option (Docker install/Helm Chart): Docker install
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: default k3s in rancher-server
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Imported

Describe the bug

This is not a fresh install; the rancher-server has undergone multiple version upgrades.
The cluster-agent cannot start, and its logs show the following:

panic: runtime error: slice bounds out of range [:3] with capacity 0

(screenshot: cattle-cluster-agent panic log)
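For anyone hitting this, the same panic can be read straight from the agent on the downstream cluster; a minimal sketch, assuming the default cattle-system namespace and the standard cattle-cluster-agent deployment name:

# Current logs of the agent deployment
kubectl -n cattle-system logs deployment/cattle-cluster-agent
# For a crash-looping pod, the pre-crash output is under --previous
kubectl -n cattle-system logs <cattle-cluster-agent-pod-name> --previous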

To Reproduce

There are no specific reproduction steps, but the log confirms that the crash is caused by dirty data in a Helm 2 release.

Result
Because the cluster-agent is not available, Rancher cannot be used.

Additional context
We checked the code and used the following script to track down the dirty data:

#!/bin/bash

# Helm 2 stores each release as a ConfigMap labeled OWNER=TILLER;
# .data.release is a base64-encoded, gzipped payload.
helm2_releases=$(kubectl get configmaps -A -l OWNER=TILLER -o=jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{"\n"}{end}')

for release in $helm2_releases; do
        ns=$(echo "$release" | cut -f1 -d,)
        name=$(echo "$release" | cut -f2 -d,)
        kubectl get cm -n "$ns" "$name" -o jsonpath='{.data.release}' | base64 -d | gunzip > /dev/null
        if [[ $? != "0" ]]; then
                echo "Got a dirty data: $ns--$name"
        fi
done

# thanks to [syndr](https://github.com/syndr)
# Helm 3 stores each release as a Secret of type helm.sh/release.v1;
# the payload is gzipped, base64-encoded by Helm, then base64-encoded
# again by the Secret API, hence the double decode.
helm3_releases=$(kubectl get secrets -A --field-selector type=helm.sh/release.v1 -o=jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{"\n"}{end}')

for release in $helm3_releases; do
        ns=$(echo "$release" | cut -f1 -d,)
        name=$(echo "$release" | cut -f2 -d,)
        kubectl get secret -n "$ns" "$name" -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip > /dev/null
        if [[ $? != "0" ]]; then
                echo "Got a dirty data: $ns--$name"
        fi
done

@niusmallnan added the kind/bug and area/agent labels on Dec 27, 2021
@samjustus added this to the v2.6.x milestone on Jan 11, 2022
@syndr

syndr commented Jan 27, 2022

Ran into the same issue on Rancher v2.6.3. I modified the above script for Helm 3 releases and was able to find the bad release data.

#!/bin/bash

# Helm 3 releases live in Secrets of type helm.sh/release.v1
helm3_releases=$(kubectl get secrets -A --field-selector type=helm.sh/release.v1 -o=jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{"\n"}{end}')

for release in $helm3_releases; do
        ns=$(echo "$release" | cut -f1 -d,)
        name=$(echo "$release" | cut -f2 -d,)
        kubectl get secret -n "$ns" "$name" -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip > /dev/null
        if [[ $? != "0" ]]; then
                echo "Got a dirty data: $ns--$name"
        fi
done

Deleting the secret it returns allowed the cattle-cluster-agent pods to start without crashing.
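For reference, that cleanup is a one-liner; the namespace and secret name below are placeholders for whatever the script reports (Helm 3 release secrets follow the sh.helm.release.v1.<release>.v<revision> naming scheme):

# Substitute the namespace/name pair flagged as dirty by the script above
kubectl delete secret -n <namespace> sh.helm.release.v1.<release-name>.v<revision>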

@oxr463

oxr463 commented Jun 7, 2022

I ran into this issue on Rancher v2.5.12, then upgraded to v2.6.3, v2.6.4, and v2.6.5. I didn't understand why the downstream cattle-cluster-agent kept crashing until I ran the Helm v2 script above and found a cp-schema-registry.v2 ConfigMap with some Tiller metadata. Once I deleted it, the cattle-cluster-agent was fine.
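In case it helps, the Helm 2 cleanup is the same idea against a ConfigMap; the namespace below is only an assumption (Tiller commonly stored releases in kube-system), so use whichever namespace the script printed:

# Assumed namespace - adjust to the one reported by the helm2 script
kubectl delete configmap cp-schema-registry.v2 -n kube-system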

@Himansh-ilink

Himansh-ilink commented Aug 5, 2022

Hello,
We are also facing this issue with one of our downstream clusters.
Details:
Rancher version = 2.6.3
Local cluster = k3s version 1.23.9
Downstream cluster K8s version = 1.21.6

We are trying to upgrade Rancher from version 2.6.3 to 2.6.6.

We are getting this error in the cattle-cluster-agent pod of the live cluster:
starting /v1, Kind=Secret controller"
E0804 09:55:45.612622 39 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 4699 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x37e9880, 0x6ea0100})
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0099ef5f0})
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/runtime/runtime.go:48 +0x75
panic({0x37e9880, 0x6ea0100})
/usr/lib64/go/1.17/src/runtime/panic.go:1038 +0x215
github.com/rancher/dynamiclistener/factory.(*TLS).Merge(0x34, 0x0, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/factory/gen.go:79 +0x70
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).saveInK8s(0xc0001f2c40, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:134 +0x96
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).Update(0xc0001f2c40, 0xc009943b70)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:169 +0xb3
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).init.func1({0x0, 0x0}, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:83 +0xa5
github.com/rancher/wrangler/pkg/generated/controllers/core/v1.FromSecretHandlerToHandler.func1({0xc00bff0360, 0x1e14d34f513eba}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/wrangler@v0.8.11-0.20220411195911-c2b951ab3480/pkg/generated/controllers/core/v1/secret.go:102 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc0099ef5e0, {0xc00bff0360, 0xc010633080}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/sharedcontroller.go:29 +0x38
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000cc9ae0, {0xc00bff0360, 0x1a}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/sharedhandler.go:75 +0x23f
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000d749a0, {0xc00bff0360, 0x1a})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:220 +0x93
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000d749a0, {0x35b2e60, 0xc0099ef5f0})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:201 +0x10e
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000d749a0)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:178 +0x46
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:167
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fbb1a657fe8)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0018db500, {0x4781a40, 0xc00f15a270}, 0x1, 0xc000cf5f80)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x80, 0x440c45)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xc00978e758, 0xc00978e778, 0xc00978e768)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:90 +0x25
created by github.com/rancher/lasso/pkg/controller.(*controller).run
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:135 +0x2c6
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb8 pc=0x2168f10]

goroutine 4699 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0099ef5f0})
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x37e9880, 0x6ea0100})
/usr/lib64/go/1.17/src/runtime/panic.go:1038 +0x215
github.com/rancher/dynamiclistener/factory.(*TLS).Merge(0x34, 0x0, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/factory/gen.go:79 +0x70
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).saveInK8s(0xc0001f2c40, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:134 +0x96
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).Update(0xc0001f2c40, 0xc009943b70)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:169 +0xb3
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).init.func1({0x0, 0x0}, 0xc00c117c80)
/go/pkg/mod/github.com/rancher/dynamiclistener@v0.3.1-0.20210616080009-9865ae859c7f/storage/kubernetes/controller.go:83 +0xa5
github.com/rancher/wrangler/pkg/generated/controllers/core/v1.FromSecretHandlerToHandler.func1({0xc00bff0360, 0x1e14d34f513eba}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/wrangler@v0.8.11-0.20220411195911-c2b951ab3480/pkg/generated/controllers/core/v1/secret.go:102 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc0099ef5e0, {0xc00bff0360, 0xc010633080}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/sharedcontroller.go:29 +0x38
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000cc9ae0, {0xc00bff0360, 0x1a}, {0x47b3720, 0xc00c117c80})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/sharedhandler.go:75 +0x23f
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000d749a0, {0xc00bff0360, 0x1a})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:220 +0x93
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000d749a0, {0x35b2e60, 0xc0099ef5f0})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:201 +0x10e
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000d749a0)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:178 +0x46
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:167
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fbb1a657fe8)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0018db500, {0x4781a40, 0xc00f15a270}, 0x1, 0xc000cf5f80)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x80, 0x440c45)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xc00978e758, 0xc00978e778, 0xc00978e768)
/go/pkg/mod/k8s.io/apimachinery@v0.23.3/pkg/util/wait/wait.go:90 +0x25
created by github.com/rancher/lasso/pkg/controller.(*controller).run
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220412224715-5f3517291ad4/pkg/controller/controller.go:135 +0x2c6

We tried the solutions above, found some empty secrets in the local cluster, and deleted those empty-data secrets.
We have tried this on Rancher versions 2.6.4, 2.6.5, and 2.6.6.
Can someone please help with this issue?
Thank you

@Raboo

Raboo commented Dec 22, 2022

I just hit this bug with Rancher 2.6.9.
@niusmallnan & @syndr thank you for the work-around.

@puffitos

We also had this issue in Rancher 2.6.9 - thanks for the workarounds!

To other people using the Helm 3 snippet: please note the stray ┆ character in the bash script, which merges the then and echo lines:

        if [[ $? != "0" ]]; thenecho "Got a dirty data: $ns--$name"
        fi
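
After removing that character, the check should read as in the script earlier in the thread:

        if [[ $? != "0" ]]; then
                echo "Got a dirty data: $ns--$name"
        fi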

@syndr

syndr commented Jan 12, 2023

To other people using the code for helm3, please note the ┆ symbol in the bash script:

Good catch there! I've updated my comment above, so it should be more copy-pasteable now. 😁

@jbilliau-rcd

You are a godsend, @syndr; you just saved me from a bit of a freakout after hitting the exact same issue. That solved it! Though I am curious how you end up with "dirty data" in a Helm release....

@voarsh2

voarsh2 commented Jan 25, 2023

How do I run this?

In the kubelet container? That gives me: The connection to the server localhost:8080 was refused - did you specify the right host or port?

  • I just docker exec into the kubelet container.....

In Rancher? It's not clear - I can't run it from Rancher since the cluster isn't connected.... so can someone explain where to run the bash script?

@Raboo

Raboo commented Jan 25, 2023

Run it on your own computer. Change your kubectl context to talk to the cluster directly (e.g. one of the master nodes) instead of going through Rancher.
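
Something like this, for example (the context name is made up; use whatever your kubeconfig calls the downstream cluster, and save the script from the first comment as a local file first):

kubectl config get-contexts
kubectl config use-context my-downstream-cluster   # example name, not from this thread
./check-helm-releases.sh                           # the script from the first comment, saved locally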

@eumel8

eumel8 commented Jan 26, 2023

@voarsh2 The bash script collects the secrets and validates the base64 data automatically with kubectl via the Kubernetes API. As a pre-flight check you can list all Helm chart secrets in the whole cluster:

$ kubectl get secrets -A | grep helm.sh/release.v1

The DATA column should show at least one entry. Our problem occurs when one of the secrets contains no data (for whatever reason).
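
A minimal sketch of that pre-flight check which only prints the suspicious entries (not from the thread, just a jsonpath/awk variant of the script above): any Helm 3 release secret whose release payload is empty gets listed.

# Print namespace/name of helm.sh/release.v1 secrets with an empty .data.release
kubectl get secrets -A --field-selector type=helm.sh/release.v1 \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.data.release}{"\n"}{end}' \
  | awk 'NF < 3 {print "empty release data: " $1 "/" $2}'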

@MKlimuszka modified the milestones: v2.6.x, 2023-Q2-v2.7x on Feb 1, 2023
@dbravo0531

I believe I also have the same issue on a 2.6.9 downstream cluster.

E0308 20:02:20.873468 59 runtime.go:79] Observed a panic: runtime.boundsError{x:3, y:0, signed:true, code:0x2} (runtime error: slice bounds out of range [:3] with capacity 0)
goroutine 2226 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3dd5d60?, 0xc003876000})
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0002cc060?})
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x3dd5d60, 0xc003876000})
/usr/lib64/go/1.19/src/runtime/panic.go:884 +0x212
github.com/rancher/rancher/pkg/catalogv2/helm.decodeHelm3({0x0?, 0x0?})
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/helm3.go:124 +0x1b1
github.com/rancher/rancher/pkg/catalogv2/helm.fromHelm3Data({0x0?, 0xc00288a690?}, 0x40cc71c?)
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/helm3.go:23 +0x25
github.com/rancher/rancher/pkg/catalogv2/helm.ToRelease({0x49eda50?, 0xc004cc8980}, 0x0?)
/go/src/github.com/rancher/rancher/pkg/catalogv2/helm/release.go:74 +0x3eb
github.com/rancher/rancher/pkg/controllers/dashboard/helm.(*appHandler).OnSecretChange(0xc005439720, {0xc0040ba2c0, 0xf}, 0xc004cc8980)
/go/src/github.com/rancher/rancher/pkg/controllers/dashboard/helm/apps.go:170 +0xa5
github.com/rancher/wrangler/pkg/generated/controllers/core/v1.FromSecretHandlerToHandler.func1({0xc0040ba2c0?, 0x0?}, {0x49eda50?, 0xc004cc8980?})
/go/pkg/mod/github.com/rancher/wrangler@v1.0.1-0.20220520195731-8eeded9bae2a/pkg/generated/controllers/core/v1/secret.go:102 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x4067720?, {0xc0040ba2c0?, 0x4131658?}, {0x49eda50?, 0xc004cc8980?})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/sharedcontroller.go:29 +0x38
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000ca79a0, {0xc0040ba2c0, 0xf}, {0x49eda50, 0xc004cc8980})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/sharedhandler.go:75 +0x23f
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0008b7ce0, {0xc0040ba2c0, 0xf})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/controller.go:233 +0x93
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0008b7ce0, {0x364b220?, 0xc0002cc060?})
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/controller.go:214 +0x105
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0008b7ce0)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/controller.go:191 +0x46
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(0xc005363ea0?)
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/controller.go:180 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x49dde20, 0xc004428960}, 0x1, 0xc0012d62a0)
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.2/pkg/util/wait/wait.go:90 +0x25
created by github.com/rancher/lasso/pkg/controller.(*controller).run
/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20220628160937-749b3397db38/pkg/controller/controller.go:148 +0x2a7
panic: runtime error: slice bounds out of range [:3] with capacity 0 [recovered]
panic: runtime error: slice bounds out of range [:3] with capacity 0

@nicholasSUSE

To everyone who might still have this problem: it is fixed in version 2.7.2. Please update to the latest version.
