Open
Labels
bug (Something isn't working)
Description
Describe the bug
- If the user mistakes `override.statefulSet.spec.volumeClaimTemplates` for a `merge` operation rather than a `replace`, the cluster ends up permanently unable to reconcile.
- If the user omits `spec.resources.requests.storage`, the operator interprets it as `0`.
- The error logged by the operator is: `shrinking persistent volumes is not supported`
- The error doesn't aid in debugging this configuration error, and troubleshooting isn't straightforward if you only inspect the `StatefulSet` and `PVC`; you'd need to check the helm output and/or the Cluster CR.
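To illustrate why a partial override surfaces as a "shrinking" error, here is a minimal Go sketch. It is not the operator's code: `pvcTemplate` and `checkResize` are hypothetical names, and sizes are plain byte counts rather than Kubernetes quantities. It only assumes what is described above, namely that an omitted storage request is read as `0` and then compared against the existing PVC capacity.

```go
package main

import (
	"errors"
	"fmt"
)

// pvcTemplate is a hypothetical, simplified stand-in for a volume claim
// template; the real operator works with Kubernetes API types.
type pvcTemplate struct {
	Name           string
	StorageRequest int64 // bytes; stays 0 when spec.resources.requests.storage is omitted
}

var errShrink = errors.New("shrinking persistent volumes is not supported")

// checkResize mimics a capacity comparison (an assumption about the mechanism):
// a desired size below the existing size is treated as a shrink request.
func checkResize(existingBytes int64, desired pvcTemplate) error {
	if desired.StorageRequest < existingBytes {
		return errShrink
	}
	return nil
}

func main() {
	const tenGiB = int64(10) << 30

	// Override written as if it would be merged with the defaults: no storage request.
	partial := pvcTemplate{Name: "persistence"}
	fmt.Println(checkResize(tenGiB, partial))

	// Override that restates the full template, including the storage request.
	complete := pvcTemplate{Name: "persistence", StorageRequest: tenGiB}
	fmt.Println(checkResize(tenGiB, complete))
}
```

Under these assumptions the partial override always fails the comparison, which would explain why the error repeats on every reconcile until the CR itself is corrected.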
Symptoms:
- The operator reconciliation loop is continuously failing (every ~15 minutes based on those logs)
- Any changes to the RabbitMQCluster CR won't be applied (operator can't reconcile)
- Scaling (adding/removing nodes) would likely fail or behave unexpectedly
- Helm upgrades might appear successful but some changes won't take effect
Fixes suggested:
- Implement validation at the CRD level to prevent incomplete `VolumeClaimTemplate` overrides
- Make the documentation explicit that `override` is a `replace` rather than a `merge` (yes, this is implied by the name, but LLMs are gonna LLM, and devs are going to use them 🙃)
- Add helpful error messages in the operator logs to aid in troubleshooting configuration errors.
Fixes applied:
- All of the above: Fix Issue 2023: Validate VolumeClaimTemplate overrides contain `spec` and provide helpful error messages when they don't #2024
Logs
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "shrinking persistent volumes is not supported",
"level": "error",
"msg": "Reconciler error",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:353\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "hit an error while scaling PVC capacity: shrinking persistent volumes is not supported",
"level": "error",
"msg": "Failed to scale PVCs: shrinking persistent volumes is not supported",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "github.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).reconcilePVC\n\t/workspace/controllers/reconcile_persistence.go:21\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).Reconcile\n\t/workspace/controllers/rabbitmqcluster_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "unsupported operation",
"level": "error",
"msg": "shrinking persistent volumes is not supported",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "github.com/rabbitmq/cluster-operator/v2/internal/scaling.PersistenceScaler.Scale\n\t/workspace/internal/scaling/scaling.go:52\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).reconcilePVC\n\t/workspace/controllers/reconcile_persistence.go:18\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).Reconcile\n\t/workspace/controllers/rabbitmqcluster_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
Expected behavior
- Refuse invalid cluster specs at deploy time, rather than logging errors during reconciliation.
- Helpful error messages in the case of misconfiguration not caught by CRDs.
Version and environment information
- RabbitMQ: 4.1.3
- RabbitMQ Cluster Operator: 2.16.1
- Kubernetes: 1.33.5
- Cloud provider or hardware configuration: Azure AKS