AtlasDatabaseUser - message - unable to list: test because of unknown namespace for the cache #1515

qtranton · 2024-04-16T12:18:41Z

We run operator v1.7.1 and decided to upgrade to the latest version in the cluster.
To reproduce, I created a local environment:

k8s: Docker Desktop, v1.25.4
operator v1.7.1
Add AtlasDeployment and AtlasDatabaseUser
Upgrade to v2.2.0 (helm upgrade of the CRDs, then upgrade of the operator)
Fix AtlasDeployment
Check the operator logs and get an error like:

{"level":"INFO","time":"2024-04-16T12:12:14.543Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}

What did you expect?
After all these steps the operator should just work as expected.

What happened instead?
The AtlasDatabaseUser status is always in the False state.

Operator Information

1.7.1 -> 2.2.0

Kubernetes Cluster Information

Docker Desktop
1.25.4

Additional context
I am trying to figure out why the AtlasDatabaseUser resource fails.
It creates the proper secrets and creates the users in the Atlas UI, but the resource itself is always in the Ready: False state.

status:
  conditions:
  - lastTransitionTime: "2024-04-16T12:03:17Z"
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ResourceVersionIsValid
  - lastTransitionTime: "2024-04-16T11:44:08Z"
    status: "True"
    type: ValidationSucceeded
  - lastTransitionTime: "2024-04-16T12:03:18Z"
    message: 'unable to list: test because of unknown namespace for the cache'
    reason: DatabaseUserStaleConnectionSecrets
    status: "False"
    type: DatabaseUserReady

If possible, please include:

{"level":"DEBUG","time":"2024-04-16T12:17:12.709Z","msg":"Ensured connection Secret up-to-date","atlasdatabaseuser":"test/operator-upgrade-test","secretname":"HIDDEN"}
{"level":"INFO","time":"2024-04-16T12:17:12.709Z","msg":"Status update","atlasdatabaseuser":"test/operator-upgrade-test-","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: test because of unknown namespace for the cache"}}

josvazg · 2024-04-17T13:31:42Z

Thanks for reporting this issue @qtranton !

Could you give us a minimum YAML sample we could use to reproduce the issue?
Does not need to be your original complete setup, just the definitions that reproduce the same failure.

qtranton · 2024-04-18T09:06:13Z

Sure, I have cleaned up my YAML, I guess. Here it is:

apiVersion: v1
kind: Secret
metadata:
  labels:
    app: operator-upgrade
    atlas.mongodb.com/type: credentials
    env: dev
  name: operator-upgrade-test
  namespace: test
stringData:
  password: testpassword


---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupPolicy
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  items: 
    - frequencyInterval: 12
      frequencyType: hourly
      retentionUnit: days
      retentionValue: 1
    - frequencyInterval: 1
      frequencyType: daily
      retentionUnit: days
      retentionValue: 7
    - frequencyInterval: 6
      frequencyType: weekly
      retentionUnit: weeks
      retentionValue: 1
    - frequencyInterval: 40
      frequencyType: monthly
      retentionUnit: months
      retentionValue: 1
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasBackupSchedule
metadata:
  name: operator-upgrade-test
  namespace: test
  annotations:
    mongodb.com/atlas-resource-policy: "keep"
spec:
  autoExportEnabled: false
  referenceHourOfDay: 21
  referenceMinuteOfHour: 2
  policy:
    name: operator-upgrade-test
    namespace: test
---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: operator-upgrade-test
  labels:
    app: "operator-upgrade"
    env: dev
  #   mongodb.com/atlas-resource-policy: "keep"
spec:
  roles:
  - roleName: readWrite
    databaseName: Application
  scopes:
  - type: CLUSTER
    name: operator-upgrade-test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  username: operator-upgrade-test
  databaseName: admin
  passwordSecretRef:
    name: "operator-upgrade-test"

---
# Source: app-resources/templates/mongodb_atlas.yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: operator-upgrade-test
  namespace: test
  labels:
    app: "operator-upgrade"
    env: dev
  # annotations:
  #   mongodb.com/atlas-resource-policy: "keep"
spec:
  backupRef:
    name: operator-upgrade-test
    namespace: test
  projectRef:
    name: project-name
    namespace: mongodb-operator
  advancedDeploymentSpec:
    mongoDBMajorVersion: "6.0"
    clusterType: REPLICASET
    backupEnabled: true
    pitEnabled: false
    name: operator-upgrade-test
    replicationSpecs:
      - regionConfigs:
        - electableSpecs:
              instanceSize: M10
              nodeCount: 3
          providerName: GCP
          backingProviderName: GCP
          regionName: "EASTERN_US"
          # Priority description https://www.mongodb.com/docs/atlas/reference/atlas-operator/atlasdeployment-custom-resource/#mongodb-setting-spec.advancedDeploymentSpec.replicationSpecs.regionConfigs.priority
          priority: 7
          autoScaling:
            compute:
              enabled: false

s-urbaniak · 2024-04-19T07:42:50Z

cc @roothorp

s-urbaniak · 2024-04-26T11:26:09Z

@qtranton can you check if you happen to have the WATCH_NAMESPACE environment variable set for your operator deployment? i.e. if you could submit the output of kubectl -n <operator_namespace> get pod <operator_name> here?

s-urbaniak · 2024-04-26T11:30:15Z

i.e. it looks like the test namespace is not being watched by the operator, overridden by the WATCH_NAMESPACE env variable.
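
For readers following along, here is a minimal sketch of how a cache that is limited to a fixed set of namespaces can end up reporting this kind of message. It is a toy model and not the operator's or controller-runtime's actual code: the `namespacedCache` type and its methods are hypothetical, and only the error text is copied from the logs in this thread.

```go
package main

import "fmt"

// namespacedCache is a hypothetical stand-in for a cache that only
// indexes objects from a fixed set of namespaces.
type namespacedCache struct {
	byNamespace map[string][]string // namespace -> cached object names
}

// newNamespacedCache registers the namespaces the cache knows about.
func newNamespacedCache(namespaces ...string) *namespacedCache {
	c := &namespacedCache{byNamespace: make(map[string][]string)}
	for _, ns := range namespaces {
		c.byNamespace[ns] = nil
	}
	return c
}

// List returns the cached names for ns, or an error when ns was never
// registered. The error text mirrors the message seen in the logs above.
func (c *namespacedCache) List(ns string) ([]string, error) {
	names, ok := c.byNamespace[ns]
	if !ok {
		return nil, fmt.Errorf("unable to list: %v because of unknown namespace for the cache", ns)
	}
	return names, nil
}

func main() {
	// Assumption: the cache was set up for the operator namespace only.
	c := newNamespacedCache("mongodb-operator")
	if _, err := c.List("test"); err != nil {
		fmt.Println(err)
	}
}
```

In this model the listing fails for any namespace that was not registered when the cache was built, regardless of whether the objects exist in the cluster.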

qtranton · 2024-04-29T08:18:30Z

In the Helm chart I see this:

{{- if .Values.watchNamespaces }}
          - name: WATCH_NAMESPACE
            value: "{{ join "," .Values.watchNamespaces }}"
          {{- end }}
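
The template above joins the configured namespaces with commas, so whatever reads WATCH_NAMESPACE has to split the value again. A minimal sketch of such parsing follows, assuming that an empty result means "watch all namespaces". The `parseWatchNamespaces` helper is hypothetical and is not taken from the operator source.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWatchNamespaces splits a comma-separated value, trims whitespace
// and drops empty entries. An empty result is treated by the caller as
// "all namespaces" (an assumed convention for this sketch).
func parseWatchNamespaces(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if ns := strings.TrimSpace(part); ns != "" {
			out = append(out, ns)
		}
	}
	return out
}

func main() {
	fmt.Println(parseWatchNamespaces("test, mongodb-operator"))
	fmt.Println(len(parseWatchNamespaces("")))
}
```

Dropping empty entries matters because `strings.Split("", ",")` returns a slice with one empty string rather than an empty slice, so an unset variable would otherwise look like a list containing a single unnamed namespace.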

So I checked the pod:

 Readiness:  http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      OPERATOR_POD_NAME:   mongodb-atlas-operator-5df9ff6978-tqznx (v1:metadata.name)
      OPERATOR_NAMESPACE:  mongodb-operator (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4kgq7 (to)

qtranton · 2024-04-29T08:20:33Z

In the roles I also see some mention of this variable, but since it is empty, no additional roles were created.

mongodb-operator   mongodb-atlas-operator                           
mongodb-operator   mongodb-atlas-operator-leader-election-role

Plus it works on the older version, so I guess the older version could read the secrets.

qtranton · 2024-04-30T12:50:37Z

I validated the secrets as well.
When I remove the label

atlas.mongodb.com/type: credentials

I get an error like

"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"InternalError","message":"Secret \"operator-upgrade-test\" not found"}}

With the label back in place I get this error

"msg":"Status update","atlasdatabaseuser":"tester/operator-upgrade-test","lastCondition":{"type":"DatabaseUserReady","status":"False","lastTransitionTime":null,"reason":"DatabaseUserStaleConnectionSecrets","message":"unable to list: tester because of unknown namespace for the cache"}}

qtranton · 2024-05-27T13:20:33Z

@josvazg @s-urbaniak hey, I had some time to debug the issue. On my local cluster, for some reason, on version 2.2.2 I do not see the status.name parameter.
I just put a lot of println calls in a local branch :D

    #############################
    operator-upgrade-test
    cleanupStaleSecrets: Failed to list connection Secrets 
    #############################

The prints were added to:

if user.Status.UserName != user.Spec.Username {
	// Note, that we pass the username from the status, not from the spec
	fmt.Println("#############################")
	fmt.Println(user.Status.UserName, user.Spec.Username)
	fmt.Println("cleanupStaleSecrets: Failed to list connection Secrets")
	fmt.Println("#############################")
	return RemoveStaleSecretsByUserName(ctx.Context, k8sClient, projectID, user.Status.UserName, user, ctx.Log)
}
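
To make the observation concrete, here is a minimal sketch of the comparison in the snippet above. It assumes that a resource whose status carries no user name yet compares as "different" from the spec and therefore enters the cleanup branch, which is where the secret listing (and its error) happens. The `databaseUser`, `userSpec` and `userStatus` types and the `needsStaleSecretCleanup` helper are hypothetical, trimmed-down stand-ins and not the operator's real types.

```go
package main

import "fmt"

// Hypothetical stand-ins for the AtlasDatabaseUser spec and status.
type userSpec struct{ Username string }
type userStatus struct{ UserName string }

type databaseUser struct {
	Spec   userSpec
	Status userStatus
}

// needsStaleSecretCleanup mirrors the comparison from the snippet:
// cleanup runs whenever the status name differs from the spec name,
// which includes the case where the status name is still empty.
func needsStaleSecretCleanup(u databaseUser) bool {
	return u.Status.UserName != u.Spec.Username
}

func main() {
	// Status name not populated yet, as observed on 2.2.2 in this thread.
	missing := databaseUser{Spec: userSpec{Username: "operator-upgrade-test"}}
	// Status name populated and equal to the spec.
	synced := databaseUser{
		Spec:   userSpec{Username: "operator-upgrade-test"},
		Status: userStatus{UserName: "operator-upgrade-test"},
	}
	fmt.Println(needsStaleSecretCleanup(missing))
	fmt.Println(needsStaleSecretCleanup(synced))
}
```

Under these assumptions the first call reports that cleanup is needed and the second does not, which matches the debug output above where the status name printed as empty.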

Here
https://github.com/mongodb/mongodb-atlas-kubernetes/blob/main/pkg/controller/connectionsecret/connectionsecrets.go#L126
Now I am trying to figure out why I get an error related to the secret when the user name is not set.
Meanwhile the resource looks like this:

status:
    conditions:
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "True"
      type: ResourceVersionIsValid
    - lastTransitionTime: "2024-05-27T11:30:38Z"
      status: "True"
      type: ValidationSucceeded
    - lastTransitionTime: "2024-05-27T11:30:39Z"
      message: 'unable to list: tester because of unknown namespace for the cache'
      reason: DatabaseUserStaleConnectionSecrets
      status: "False"
      type: DatabaseUserReady
    observedGeneration: 1
    passwordVersion: "3017702"

qtranton · 2024-05-27T13:28:19Z

Update: I rechecked on v1.7 and the name in status appears.

josvazg · 2024-05-27T17:28:02Z

Thanks for your reports. I managed to reproduce the same. I am debugging it now.

josvazg · 2024-05-27T18:08:39Z

Seems we found the issue, we are working on a fix.

In the meantime, you could pass the list of namespaces you want the operator to watch, e.g.:

helm install ... --set watchNamespaces=test,...

qtranton · 2024-05-28T09:07:49Z

@josvazg on a local machine yeah, but on the main cluster we have too many namespaces :) I will wait, it is not so critical.

josvazg · 2024-05-29T09:37:07Z

BTW this #1619 already fixes the issue but it includes unrelated refactors. I am working on a specific test to cover this bug which was not previously detected by our test suite.

qtranton · 2024-05-30T13:43:35Z

I will check a local build then :)

qtranton · 2024-05-30T14:01:23Z

@josvazg jfyi

{"level":"ERROR","time":"2024-05-30T14:00:05.322Z","msg":"LeaderElectionID must be configuredunable to start operator"}

Get this error now

josvazg · 2024-05-31T14:50:18Z

> @josvazg jfyi
>
> {"level":"ERROR","time":"2024-05-30T14:00:05.322Z","msg":"LeaderElectionID must be configuredunable to start operator"}
>
> Get this error now

I do not think this is related. BTW this PR #1621 should fix the original issue.

As for this new error, do you have a sample to reproduce it?

qtranton · 2024-05-31T15:10:42Z

@josvazg I just built it and put the Docker image into Helm chart 2.2.2; nothing else changed in the deployment.

qtranton · 2024-06-03T14:03:15Z

@josvazg After few additional crd ( not in upstream yet :D ) user status becomes true. We will do some additional tests according to our infra.
Maybe you know when will it be released?

josvazg · 2024-06-03T16:14:42Z

> @josvazg After few additional crd ( not in upstream yet :D ) user status becomes true. We will do some additional tests according to our infra. Maybe you know when will it be released?

We are aiming for a release soon, maybe this week. I should be merging PR #1621 tomorrow

* Add reproducing test

Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>

* Fix cache and predicate setup
* test/e2e/cache_watch_test.go: improve e2e test
* Fix gets labels and ns names

Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>

* Rename controller predicates helper
* Move trim to env reading line

---------

Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>
Co-authored-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>

josvazg added a commit that referenced this issue May 29, 2024

Add test to reproduce issue #1515

b96d68b

josvazg mentioned this issue May 29, 2024

CLOUDP-250918: Add test to reproduce issue #1515 #1621

Merged

2 tasks

josvazg added a commit that referenced this issue May 29, 2024

Add test to reproduce issue #1515

1f1928f

Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>

josvazg added a commit that referenced this issue May 29, 2024

Add test to reproduce issue #1515

c2efea2

Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>

josvazg closed this as completed in #1621 Jun 3, 2024
