Large log lines are logged and CPU usage maxes out (300m) on the operator pod when a user-credential secret is deleted
Log snippet from the middle of the line:
user credentials secret from status; user.status: {1 [{Ready False 2023-02-09 12:00:36 +0000 UTC FailedCreateOrUpdate failed to retrieve user credentials secret from status; user.status: {1 [{Ready True 2023-02-09 09:57:41 +0000 UTC SuccessfulCreateOrUpdate }] &LocalObjectReference{Name:pubsub-user-user-credentials,} N2u4URbX5TgA689t4VLRLSr3w68E8nTa}}] &LocalObjectReference{Name:pubsub-user-user-credentials,}
This part repeats at the start of the line:
2023-02-09 12:00:36 +0000 UTC FailedCreateOrUpdate failed to retrieve user credentials secret from status; user.status: {1 [{Ready False
The repeated part occurs many thousands of times in a single line. Counting the & in the part that repeats at the end, one particular line came to 3190 occurrences.
Start of the line reads:
{"level":"error","ts":"2023-02-09T12:06:55Z","msg":"failed to declare user","controller":"user","controllerGroup":"rabbitmq.com","controllerKind":"User","User":{"name":"pubsub-user","namespace":"service"},"namespace":"service","name":"pubsub-user","reconcileID":"2ea98521-a285-4ec0-9b5c-010b3fbcebe9","error":"failed to retrieve user credentials secret from status; user.status: {1 [{Ready False 2023-02-09 12:00:36 +0000 UTC Fail ...
Expected behavior
Not large amounts of resources and log output
It would be nice if it simply re-created/rotated the secret - but that is more of a feature request than part of this bug.
If the amount of log output is the concern, the team could potentially address this by requeuing the request less often and changing this log line's level to debug rather than info.
As for rotating credentials automatically, I think that could have unwanted effects and might not be user-friendly.
If we recreate the secret and create a new username and password, that is a new user in RabbitMQ, and the previous user is no longer tracked by a custom resource. It would be the user's responsibility to manage or delete the previously created user in RMQ. If there are client apps that consume the user.rabbitmq.com secret to authenticate with RMQ, the app will now use a different user, which could cause permission issues.
Secrets handling is a side issue. The main issue here is that log lines are so large that our log system rejects them (many kilobytes), and they seem to be created by recursion going wild. Effectively the operator locks up in the described condition, not having time to do anything else.
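The nesting visible in the log snippets is consistent with an error message that embeds the whole status while the status in turn stores the previous error message. The Go sketch below reproduces that pattern under stated assumptions: the types `condition` and `userStatus` and the function `reconcileOnce` are hypothetical stand-ins, not the operator's actual code.

```go
package main

import "fmt"

// condition and userStatus are simplified, hypothetical stand-ins for the
// status types of the User custom resource.
type condition struct {
	Type, Status, Reason, Message string
}

type userStatus struct {
	ObservedGeneration int
	Conditions         []condition
}

// reconcileOnce simulates one failed reconcile: the error embeds the full
// status, and the error text is then written back into the status condition.
func reconcileOnce(s *userStatus) error {
	err := fmt.Errorf("failed to retrieve user credentials secret from status; user.status: %v", *s)
	s.Conditions = []condition{{
		Type:    "Ready",
		Status:  "False",
		Reason:  "FailedCreateOrUpdate",
		Message: err.Error(), // the next error will contain this one in full
	}}
	return err
}

func main() {
	s := &userStatus{
		ObservedGeneration: 1,
		Conditions:         []condition{{Type: "Ready", Status: "True", Reason: "SuccessfulCreateOrUpdate"}},
	}
	// Every retry wraps the previous message once more, so the message only grows.
	for i := 1; i <= 5; i++ {
		err := reconcileOnce(s)
		fmt.Printf("reconcile %d: error message is %d bytes\n", i, len(err.Error()))
	}
}
```

If the real cause has this shape, logging a fixed message plus the secret name, rather than formatting the whole status into the error, would keep the message size constant across retries.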
Hi @WAPeterLindsten, we created a fix for this issue. It should be included in the next release of the operator. I'll close this now. Thank you for reporting the issue.
To Reproduce
Steps to reproduce the behavior:
Run kubectl delete on the secret.
Version and environment information
Additional context
kube-apiserver resource usage also increased while this was ongoing, so I'm guessing that the operator was hammering the API with requests.