
Large logs & resource usage when operator-managed secret is deleted #551

Closed
WAPeterLindsten opened this issue Feb 9, 2023 · 3 comments
Labels: bug (Something isn't working)

WAPeterLindsten commented Feb 9, 2023

Describe the bug

Very large log lines are written and CPU usage maxes out (300m) on the operator pod when an operator-managed user-credentials secret is deleted.

Log snippet from the middle of one such line:

user credentials secret from status; user.status: {1 [{Ready False 2023-02-09 12:00:36 +0000 UTC FailedCreateOrUpdate failed to retrieve user credentials secret from status; user.status: {1 [{Ready True 2023-02-09 09:57:41 +0000 UTC SuccessfulCreateOrUpdate }] &LocalObjectReference{Name:pubsub-user-user-credentials,} N2u4URbX5TgA689t4VLRLSr3w68E8nTa}}] &LocalObjectReference{Name:pubsub-user-user-credentials,}

This part repeats at the start of the line:

2023-02-09 12:00:36 +0000 UTC FailedCreateOrUpdate failed to retrieve user credentials secret from status; user.status: {1 [{Ready False

This part repeats at the end of the line:

&LocalObjectReference{Name:pubsub-user-user-credentials,} N2u4URbX5TgA689t4VLRLSr3w68E8nTa}}]

These fragments repeat many thousands of times within a single line. Counting the & in the trailing repeated part, one particular line contained 3190 occurrences.
That particular line ends with:

...,"stacktrace":"github.com/rabbitmq/messaging-topology-operator/controllers.(*TopologyReconciler).Reconcile\n\t/workspace/controllers/topology_controller.go:94\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/internal/controller/controller.go:235"}

The start of the line reads:

{"level":"error","ts":"2023-02-09T12:06:55Z","msg":"failed to declare user","controller":"user","controllerGroup":"rabbitmq.com","controllerKind":"User","User":{"name":"pubsub-user","namespace":"service"},"namespace":"service","name":"pubsub-user","reconcileID":"2ea98521-a285-4ec0-9b5c-010b3fbcebe9","error":"failed to retrieve user credentials secret from status; user.status: {1 [{Ready False 2023-02-09 12:00:36 +0000 UTC Fail ...

To Reproduce

Steps to reproduce the behavior:

  1. Create a User without specifying an import secret:
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: pubsub-user
spec:
  rabbitmqClusterReference:
    name: rabbitmq
  2. Wait for the operator to create the secret with username/password:
$ kubectl get secret
NAME                           TYPE                             DATA   AGE
pubsub-user-user-credentials   Opaque                           2      12m
  3. Run kubectl delete on the secret:
$ kubectl delete secret/pubsub-user-user-credentials
  4. Watch operator logs & resource usage on the pod (see the commands below).
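
For step 4, commands along these lines work (the rabbitmq-system namespace and the deployment name are assumptions based on a default install; kubectl top also requires metrics-server):

$ kubectl -n rabbitmq-system logs -f deploy/messaging-topology-operator
$ kubectl -n rabbitmq-system top pod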


Expected behavior
The operator should not consume large amounts of resources or produce this much log output.
It would be nice if it simply re-created/rotated the secret, but that is more of a feature request than part of this bug.


Version and environment information

  • Messaging Topology Operator: docker.io/rabbitmqoperator/messaging-topology-operator:1.10.1
  • RabbitMQ: docker.io/library/rabbitmq:3.10.2-management
  • RabbitMQ Cluster Operator: docker.io/rabbitmqoperator/cluster-operator:2.1.0
  • Kubernetes: 1.24.6
  • Cloud provider or hardware configuration: Elastx (OpenStack)

Additional context

kube-apiserver resource usage also increased while this was ongoing, so I suspect the operator was hammering the API with requests.

WAPeterLindsten added the bug label Feb 9, 2023
DanielePalaia self-assigned this Feb 27, 2023
ChunyiLyu (Contributor) commented

If the amount of log output is not ideal, the team could potentially address this by requeuing the request less often and by changing this log line's level from info to debug.
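
A rough sketch of both mitigations, assuming controller-runtime conventions (illustrative only, not the actual fix; the helper name and requeue interval are made up):

package controllers

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// handleMissingCredentialsSecret is a hypothetical helper. Instead of
// returning an error (which makes controller-runtime retry almost
// immediately and emit an error-level log line each time), it logs at
// debug verbosity and asks for a requeue after a fixed delay.
func handleMissingCredentialsSecret(ctx context.Context) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	// V(1) messages are only emitted when verbose/debug logging is enabled.
	logger.V(1).Info("user credentials secret not found; requeuing")
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}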

As for rotating credentials automatically, I think that could cause unwanted effects and would potentially not be user friendly.

If we recreate the secret with a new username and password, that's a new user in RabbitMQ, and the previous user is no longer tracked by a custom resource. It would be the user's responsibility to manage or delete the previously created user in RMQ. And if client apps consume the user.rabbitmq.com secret to authenticate with RMQ, they would now authenticate as a different user, which could cause permission issues.

WAPeterLindsten (Author) commented

Secrets handling is a side issue. The main issue here is that the log lines are so large (many kilobytes) that our log system rejects them, and they appear to be produced by runaway recursion. In this condition the operator effectively locks up, with no time to do anything else.

DanielePalaia (Contributor) commented

Hi @WAPeterLindsten, we created a fix for this issue. It should be included in the next release of the operator. I'll close this now. Thank you for reporting the issue!
