User and Permission stuck in Terminating #324

Closed
LucasBoisserie opened this issue Mar 1, 2022 · 3 comments
Labels
bug Something isn't working

@LucasBoisserie
Contributor

Describe the bug

We use ArgoCD to deploy our code to Kubernetes. We create and delete multiple review apps each hour; each one includes exchanges, queues, permissions, and users.
We have noticed that User and Permission objects sometimes get stuck in the Terminating phase.

In the operator log, we found the following error message: failed to retrieve user credentials secret from status

To Reproduce

Steps to reproduce the behavior:

  1. Create User
  2. Create Permission with userReference
  3. Remove User
  4. Remove Permission (stuck in Terminating phase)

We have noticed that step 3 (removing the User) sometimes also gets stuck in the Terminating phase.

---
# User.yaml
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  finalizers:
  - deletion.finalizers.users.rabbitmq.com
  labels:
    argocd.argoproj.io/instance: sga-qualif-rec
  name: user-sga-hub-rec
spec:
  rabbitmqClusterReference:
    name: rabbitmq-qualif
    namespace: rabbitmq-qualif
---
#Permission.yaml
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  finalizers:
  - deletion.finalizers.permissions.rabbitmq.com
  labels:
    argocd.argoproj.io/instance: sga-qualif-rec
  name: permission-hub-sga-rec
spec:
  permissions:
    configure: .*
    read: .*
    write: .*
  rabbitmqClusterReference:
    name: rabbitmq-qualif
    namespace: rabbitmq-qualif
  userReference:
    name: user-sga-hub-rec
  vhost: vhost-hub-rec

Expected behavior
All components are removed

Version and environment information

  • Messaging Topology Operator: [1.2.1-scratch-r0] (deployed with bitnami chart)
  • RabbitMQ: [3.8.23-debian-10-r35]
  • RabbitMQ Cluster Operator: [1.11.0-scratch-r0] (deployed with bitnami chart)
  • Kubernetes: [e.g. 1.22.7]
  • Cloud provider : Scaleway Kapsule

Additional context

After investigating the logs and the operator's source code, we found two issues:

Permission with userReference stuck in Terminating if User is removed first

If the User is removed before the Permission, the Permission can't be removed because the operator can't find the RabbitMQ username, which is stored in the User's credentials secret (https://github.com/rabbitmq/messaging-topology-operator/blob/main/controllers/permission_controller.go#L63).
It returns an error and requeues the object.

As a workaround, we added a sync-wave to force ArgoCD to remove the Permission before the User. However, if someone manually deletes a User before its Permission, we will hit the problem again.
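
A minimal sketch of such a sync-wave workaround, using ArgoCD's `argocd.argoproj.io/sync-wave` annotation. The wave numbers are illustrative assumptions, not values from our manifests, and the sketch assumes ArgoCD removes resources in the reverse order of their waves, which is what the workaround relies on.

```yaml
# Sketch only: the wave values are illustrative assumptions.
# User in an earlier wave, so it is created first.
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: user-sga-hub-rec
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  rabbitmqClusterReference:
    name: rabbitmq-qualif
    namespace: rabbitmq-qualif
---
# Permission in a later wave, so it is created after the User
# (and, under the assumption above, removed before it).
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: permission-hub-sga-rec
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  vhost: vhost-hub-rec
  userReference:
    name: user-sga-hub-rec
  permissions:
    configure: ".*"
    read: ".*"
    write: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-qualif
    namespace: rabbitmq-qualif
```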

We can open a PR to fix the check in the permission_controller:

  • by adding an owner reference from the User to the Permission object
  • and by adding a condition: when the User doesn't exist and the Permission is being deleted, we skip removing the permission from RabbitMQ.
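
For illustration, the first point could result in Permission metadata like the fragment below. This is a hedged sketch of the proposal, not what the operator sets today: the `uid` is a placeholder, and the `blockOwnerDeletion` value is an assumption.

```yaml
# Sketch of the proposed owner reference; not the operator's current behavior.
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: permission-hub-sga-rec
  ownerReferences:
  - apiVersion: rabbitmq.com/v1beta1
    kind: User
    name: user-sga-hub-rec
    uid: "<uid-of-the-User-object>"   # placeholder, taken from the live User
    blockOwnerDeletion: false         # assumption
```

With such a reference, the Kubernetes garbage collector would delete the Permission once its owning User is gone, and the second point would let the Permission finalizer complete even though the username can no longer be looked up.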

User deletion needs the user credentials secret

We deploy and destroy multiple review apps each hour, and we noticed that User deletion is sometimes triggered as expected but then gets stuck in Terminating.

We looked at the user credentials secret and found that ownerReference.blockOwnerDeletion is set to false (https://github.com/rabbitmq/messaging-topology-operator/blob/main/controllers/user_controller.go#L169).

ArgoCD uses foreground deletion by default, and sometimes the user credentials secret is removed before the User. The User CR doesn't hold the RabbitMQ username, because the username is generated inside the secret.
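
For reference, the owner reference on the credentials secret looks roughly like the fragment below. The secret name and `uid` are placeholders and `controller: true` is an assumption; only `blockOwnerDeletion: false` is taken from the observation above.

```yaml
# Sketch of the credentials secret's metadata; names and uid are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: "<user-credentials-secret>"   # placeholder for the generated secret's name
  ownerReferences:
  - apiVersion: rabbitmq.com/v1beta1
    kind: User
    name: user-sga-hub-rec
    uid: "<uid-of-the-User-object>"   # placeholder
    controller: true                  # assumption
    blockOwnerDeletion: false         # as observed on the secret
```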

The result is the same as in #277, where the secret is removed before the User.

We can include a fix for this in the same PR.
We propose adding a field to the User status that holds the real username, so that deletion no longer depends on the secret.
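
As a sketch of that proposal, the User status could carry the generated username next to the reference to the credentials secret. The field names `credentials` and `username` and the placeholder values are assumptions for illustration, not a final API.

```yaml
# Hypothetical status layout; field names and values are illustrative assumptions.
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: user-sga-hub-rec
status:
  credentials:
    name: "<user-credentials-secret>"   # placeholder for the generated secret's name
  username: "<generated-username>"      # placeholder; would be copied from the secret at creation
```

On deletion, the controller could then read the username from the status and remove the RabbitMQ user even if the secret is already gone.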

Do you have any better recommendations?

@Zerpet
Contributor

Zerpet commented Mar 2, 2022

Hey there! Thank you for submitting this issue and PR #325. To add a bit more context: the controller has not blocked owner deletion since e09d8df, because we found incompatibilities with OpenShift when it was set to true. We tried numerous RBAC rules to make it work, with no success, and finally decided not to block owner deletion.

I like the proposed approach 👍 I'll have a look at the PR and leave some feedback later today.

@alex1989hu

Hey @LucasBoisserie, I also ran into this issue and thought I had made a mistake. Thanks for the PR 👍

@Zerpet
Contributor

Zerpet commented Mar 23, 2022

Fixed by PR #325
