
operator crash loop due to nil pointer #699

Open
Lobo75 opened this issue Mar 26, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Lobo75 commented Mar 26, 2024

Report

A user error in applying a cr.yaml that was missing the proxy section caused the stack trace seen below. It appears there is no check for whether the proxy section is nil.
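A defensive nil check at the top of Default() would avoid the panic. Below is a minimal sketch of that guard; the types here (ProxySpec, PGBouncerSpec and their fields) are simplified stand-ins for illustration, not the operator's actual CRD definitions:

```go
package main

import "fmt"

// Simplified stand-ins for the operator's CRD types (illustrative only).
type PGBouncerSpec struct{ Replicas int32 }
type ProxySpec struct{ PGBouncer *PGBouncerSpec }
type PerconaPGClusterSpec struct{ Proxy *ProxySpec }
type PerconaPGCluster struct{ Spec PerconaPGClusterSpec }

// Default fills in defaults, guarding against a missing proxy section
// instead of dereferencing a nil pointer.
func (c *PerconaPGCluster) Default() {
	if c.Spec.Proxy == nil {
		c.Spec.Proxy = &ProxySpec{}
	}
	if c.Spec.Proxy.PGBouncer == nil {
		c.Spec.Proxy.PGBouncer = &PGBouncerSpec{Replicas: 1}
	}
}

func main() {
	c := new(PerconaPGCluster) // proxy section omitted, as in the bad cr.yaml
	c.Default()                // with the guard, this no longer panics
	fmt.Println(c.Spec.Proxy.PGBouncer.Replicas)
}
```

With this pattern the reconciler survives a spec that omits the section, instead of crash-looping.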

More about the problem

2024-03-21T19:32:01.194Z INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"rxtest","namespace":"postgres-operator"}, "namespace": "postgres-operator", "name": "rxtest", "reconcileID": "0ecffd68-d97a-4d13-af64-9eafd015dd10"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1678ace]
goroutine 459 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1a233e0?, 0x2ddbe70?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2.(*PerconaPGCluster).Default(0xc000cdc380)
/go/src/github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:179 +0x22e
github.com/percona/percona-postgresql-operator/percona/controller/pgcluster.(*PGClusterReconciler).Reconcile(0xc00045ef30, {0x1fcc410?, 0xc000d2b530}, {{{0xc00005ddb8?, 0x5?}, {0xc00083f6f6?, 0xc00044cd48?}}})
/go/src/github.com/percona/percona-postgresql-operator/percona/controller/pgcluster/controller.go:170 +0x1c5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1fcf718?, {0x1fcc410?, 0xc000d2b530?}, {{{0xc00005ddb8?, 0xb?}, {0xc00083f6f6?, 0x0?}}})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004e4aa0, {0x1fcc448, 0xc0003a99a0}, {0x1abf5c0?, 0xc000971140?})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004e4aa0, {0x1fcc448, 0xc0003a99a0})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 89
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:223 +0x565

Steps to reproduce

Apply a cr.yaml that is missing the proxy section. Here is a simple test case verifying that the problem is triggered by an incorrect yaml.

package v2_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"gopkg.in/yaml.v2"

	v2 "github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2"
)

func TestPerconaPGCluster_Default(t *testing.T) {
	a := assert.New(t)

	cluster := new(v2.PerconaPGCluster)

	err := yaml.Unmarshal(postgrescluster_empty_proxy, cluster)
	a.NoError(err)

	cluster.Default()
}

var postgrescluster_empty_proxy []byte = []byte(`
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.3-2
  postgresVersion: 15
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteMany"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.45-2
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteMany"
              resources:
                requests:
                  storage: 1Gi
        - name: repo2
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteMany"
              resources:
                requests:
                  storage: 1Gi
  proxy:
`)
Versions

  1. Kubernetes 1.2.7
  2. Operator 2.3.1 (I suspect 2.3.0 has the same issue)

Anything else?

Even though this was pure user error, it caused a serious situation: the operator went into a hard crash loop with no way I could find to break it out. The operator would not run long enough to even try to reapply the corrected yaml; deleting and restarting it, and even uninstalling the operator (everything except the CRDs), did not help.

Thank you.

@Lobo75 Lobo75 added the bug Something isn't working label Mar 26, 2024
spron-in (Collaborator) commented Apr 8, 2024
