
[SURE-7311] GitRepo fails with error "namespace not found" when using namespaceLabels: in existing namespaces #1994

skanakal opened this issue Dec 4, 2023 · 11 comments


skanakal commented Dec 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A GitRepo fails with the error "namespace not found" when its resources are deployed into an already existing namespace and the fleet.yaml file sets some namespaceLabels.

Expected Behavior

Fleet should apply the namespaceLabels to already existing namespaces.

Steps To Reproduce

  1. Create namespace test03 in downstream cluster

  2. Create a GitRepo using this branch:
    https://github.com/mrolmedo/fleet-examples/blob/fleetv08nscreated/enlugardelamanchadecuyonombrenoquieroacoerdarme/fleet.yaml

  3. The GitRepo fails to deploy with the error:

Error: namespace test03 not found
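
For reference, a minimal fleet.yaml sketch of the options involved, reconstructed from the BundleDeployment spec in the Logs section below (the actual linked fleet.yaml may differ):

~~~
# Reconstructed from the BundleDeployment options below; not the
# exact contents of the linked fleet.yaml.
defaultNamespace: test03
namespaceLabels:
  pod-security.kubernetes.io/enforce: privileged
  pod-security.kubernetes.io/enforce-version: latest
~~~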

Environment

- Architecture: 
- Fleet Version: fleet-103.1.0+up0.9.0
- Cluster: rancher-2.8.0-rc3
  - Provider: local-RKE2
  - Options:
  - Kubernetes Version: v1.24

Logs

~~~
root@master-0:~# kubectl get bundledeployment -n cluster-fleet-default-test-d143efe17fbc   test03-enlugardelamanchadecuyonombrenoquieroaco-7bd3a -o yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: BundleDeployment
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/9yTT4/TMBDFvwryuSl206RJJC5oLwgEF77A2H5uzTp2sJ2iatXvjtyy1S50V4uWE7f8Gb9585vnOzYik6ZMbLhj5H3IlG3wqbwG+Q0qJ+RltGGpKGeHpQ1vrWYDk7PXDmzxZFX44RGr7f6WDcw4ID/4txfkph2JxZuP1ut371+m5WkEG1hGyryu4N28pajhaCSvdqSh5kPwYZQRPnyfLWIgFaqN1DW9SDxNpHBvt9IwNLvMjgumIk5YvtoRKdM4scHPzi2YIwl3gvX7iGc+r/X8jOp1s38eUW5OGfGXiacL/k40jKPNbGB9KwDRi5o3XdsZVXPTqJoa6jtovRF9zTugrq9ojORpi5KlHGdcKYiYwmOAz2xxR2lXYil62sh1C5JcKrESrRSaTPmG9RpNv264XmldtvrK3Tzkdc/wEbeqKFdarGsYiI2RqnRNE1QJjAoxQuWbaE1mw91xwTQmFw4jfP5wwwaWKqxasQHv5FoYXndtzwkrI3VX6xaSS94aBcUbIVtuBJeGOiE3q57WetUMBdZ0uczX2p1cfn4wxoXyDm48V9mtDxHn58vEny6xn4KuEtQcbT4sb2eJ6JGRykbgTYgn2SnavXUoy1684ES1R0w2eDYwR6fQHgu3XNJy888ZnXW//DekzqjyfGqqbZocHYqp4/FnAAAA///TuD+77QUAAA
    objectset.rio.cattle.io/id: bundle
    objectset.rio.cattle.io/owner-gvk: fleet.cattle.io/v1alpha1, Kind=Bundle
    objectset.rio.cattle.io/owner-name: test03-enlugardelamanchadecuyonombrenoquieroaco-7bd3a
    objectset.rio.cattle.io/owner-namespace: fleet-default
  creationTimestamp: "2023-12-04T10:06:16Z"
  generation: 1
  labels:
    fleet.cattle.io/bundle-name: test03-enlugardelamanchadecuyonombrenoquieroaco-7bd3a
    fleet.cattle.io/bundle-namespace: fleet-default
    fleet.cattle.io/cluster: test
    fleet.cattle.io/cluster-namespace: fleet-default
    fleet.cattle.io/commit: 961ee191305868fc30f5c3a5a98edd719308ee33
    fleet.cattle.io/managed: "true"
    fleet.cattle.io/repo-name: test03
    objectset.rio.cattle.io/hash: b19a7b46eab0bc1216b1dafa7b4e44e59450d2dd
  name: test03-enlugardelamanchadecuyonombrenoquieroaco-7bd3a
  namespace: cluster-fleet-default-test-d143efe17fbc
  resourceVersion: "213699"
  uid: e8e45949-87d0-409c-bfa4-36b6636a2a3e
spec:
  correctDrift: {}
  deploymentID: s-e2617e08b41f038690ae2fbd83d6eb0b06fcec051b60f10bfa81b729a4d25:00db4a9cdaecffdee93c9d3f9b872c1e3a97c3679495735ccb2d4861858201ee
  options:
    correctDrift: {}
    defaultNamespace: test03
    helm: {}
    ignore: {}
    namespaceLabels:
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/enforce-version: latest
  stagedDeploymentID: s-e2617e08b41f038690ae2fbd83d6eb0b06fcec051b60f10bfa81b729a4d25:00db4a9cdaecffdee93c9d3f9b872c1e3a97c3679495735ccb2d4861858201ee
  stagedOptions:
    correctDrift: {}
    defaultNamespace: test03
    helm: {}
    ignore: {}
    namespaceLabels:
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/enforce-version: latest
status:
  conditions:
  - lastUpdateTime: "2023-12-04T10:06:16Z"
    message: namespace test03 not found
    reason: Error
    status: "False"
    type: Deployed
  - lastUpdateTime: "2023-12-04T10:06:16Z"
    status: "True"
    type: Monitored
  display:
    deployed: 'Error: namespace test03 not found'
    monitored: "True"
    state: ErrApplied
root@master-0:~# 
~~~

The problem here is that the already existing namespace `test03` doesn't have the label `name=test03`; when Fleet creates a namespace itself, it adds a `name=<namespace-name>` label.

The workaround is to apply the label manually:

~~~
root@server-1:~# kubectl get ns test03 -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cattle.io/status: '{"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2023-12-04T10:04:36Z"},{"Type":"InitialRolesPopulated","Status":"True","Message":"","LastUpdateTime":"2023-12-04T10:04:36Z"}]}'
    lifecycle.cattle.io/create.namespace-auth: "true"
  creationTimestamp: "2023-12-04T10:04:35Z"
  finalizers:
  - controller.cattle.io/namespace-auth
  labels:
    kubernetes.io/metadata.name: test03
  name: test03
  resourceVersion: "68335"
  uid: d01a3197-7476-4a00-8f0e-3ea86a6c08c4
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
root@server-1:~# 
root@server-1:~# 
root@server-1:~# kubectl label ns test03 name=test03
namespace/test03 labeled
root@server-1:~# 
root@server-1:~# 
root@server-1:~# kubectl get ns test03 -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cattle.io/status: '{"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2023-12-04T10:04:36Z"},{"Type":"InitialRolesPopulated","Status":"True","Message":"","LastUpdateTime":"2023-12-04T10:04:36Z"}]}'
    lifecycle.cattle.io/create.namespace-auth: "true"
  creationTimestamp: "2023-12-04T10:04:35Z"
  finalizers:
  - controller.cattle.io/namespace-auth
  labels:
    kubernetes.io/metadata.name: test03
    name: test03                                                  <<<<<-------
    pod-security.kubernetes.io/enforce: privileged                <<<<<-------
    pod-security.kubernetes.io/enforce-version: latest            <<<<<-------
  name: test03
  resourceVersion: "69690"
  uid: d01a3197-7476-4a00-8f0e-3ea86a6c08c4
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
root@server-1:~# 
~~~

Anything else?

SURE-7311


skanakal commented Dec 4, 2023

It can be worked around using kustomize and takeOwnership: https://github.com/skanakal/fleet-experiments/blob/SURE-7311/SURE-7311/fleet.yaml
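
A minimal sketch of what such a fleet.yaml could look like, assuming the bundle also ships a kustomization.yaml referencing a Namespace manifest for test03 with the desired pod-security labels (the linked example is the authoritative version):

~~~
# Hypothetical sketch of the workaround; see the linked fleet.yaml
# for the actual contents. The namespace is shipped as a kustomize
# resource instead of relying on namespaceLabels.
defaultNamespace: test03
kustomize:
  dir: .
helm:
  # Allow Helm to take ownership of resources it did not create,
  # such as the pre-existing test03 namespace.
  takeOwnership: true
~~~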


mrolmedo commented Dec 4, 2023

@skanakal thanks for your help and explanation.


weyfonk commented Dec 5, 2023

Reproduced the issue using standalone Fleet 0.9.0:

~~~
$ kc get bundledeployment -A
NAMESPACE                                                 NAME                               DEPLOYED                            MONITORED   STATUS
cluster-fleet-default-cluster-58984b679db2-580e2da6f32a   fleet-agent-cluster-58984b679db2   True                                True
cluster-fleet-default-cluster-58984b679db2-580e2da6f32a   gitrepo-1994-test-1994-1922f51b    Error: namespace test03 not found   True
cluster-fleet-local-local-1a3d67d0a899                    fleet-agent-local                  True                                True
~~~

Interestingly, the frontend pod used in the example was still deployed:

~~~
$ kc get pods --context=k3d-downstream -n test03
NAME                        READY   STATUS    RESTARTS   AGE
frontend-79f47c87d6-5dnmn   1/1     Running   0          23s
~~~

Applied GitRepo definition:

~~~
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: gitrepo-1994
  namespace: fleet-default

spec:
  repo: https://github.com/weyfonk/fleet-examples
  branch: test-1994
  paths:
    - test_1994
  targets:
    # Match everything
    - clusterSelector: {}
    # Selector ignored
    - clusterSelector: null
~~~


manno commented Dec 6, 2023

Maybe we're able to relax the fetchNamespace(..) conditions. Since we're only returning one namespace, and we know the "name" label is set by Helm to the actual name of the namespace, we could just "get" the namespace instead of listing:

https://github.com/rancher/fleet/blob/release/v0.9/internal/cmd/agent/controllers/bundledeployment/controller.go#L197-L227
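
For illustration, the two lookups expressed as kubectl equivalents (a namespace created with plain kubectl carries no name label, so the first query comes back empty):

~~~
# Label-based lookup, as fetchNamespace does today: returns nothing
# for a namespace created outside Helm/Fleet.
kubectl get ns -l name=test03

# Direct lookup by name, as suggested: always finds the namespace.
kubectl get ns test03
~~~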

@raulcabello

We are using the `name` label to retrieve the namespace. This label is added to all namespaces created by Fleet (and all namespaces created with Helm). However, it is missing from namespaces created in other ways (e.g. with kubectl).

It's fixed by #2009.
As a workaround, you can add the `name` label manually.
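
For example, as already shown in the issue description:

~~~
kubectl label ns test03 name=test03
~~~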

@raulcabello

QA Template

Solution

Use the kubernetes.io/metadata.name label for filtering, since this label is added by Kubernetes to all namespaces.
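
Expressed as a kubectl equivalent, the new filter matches any namespace, however it was created:

~~~
# kubernetes.io/metadata.name is set by the API server on every
# namespace, so this also matches namespaces created with kubectl.
kubectl get ns -l kubernetes.io/metadata.name=test03
~~~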

Testing

Follow the steps described in the issue and verify that you don't see any error.

Additional info

Needs a new Fleet RC.


manno commented Feb 29, 2024

This has not been backported to 0.9 yet.

Reading the function again, the comment seems strange:
// fetchNamespace gets the namespace matching the release ID. Returns an error if none is found.

So, is this func just used to return the namespace for the Helm release secret?
I can't find the Helm label on any other namespace, so I think it's only added to the --create-namespace namespace?

Should we convert the List into a Get?
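
A quick way to check this on a live cluster is an existence selector on the name label key:

~~~
# Lists only namespaces that carry a "name" label, e.g. release
# namespaces created via helm install --create-namespace.
kubectl get ns -l name --show-labels
~~~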

@kkaempf kkaempf modified the milestones: v2.8.3, v2.8-Next1 Mar 1, 2024
@manno manno changed the title [SURE-7311] GitRepo fails with error "namespace not found" when using namespaceLabels: in existing namespaces [v0.9][SURE-7311] GitRepo fails with error "namespace not found" when using namespaceLabels: in existing namespaces Mar 1, 2024
@manno manno changed the title [v0.9][SURE-7311] GitRepo fails with error "namespace not found" when using namespaceLabels: in existing namespaces [SURE-7311] GitRepo fails with error "namespace not found" when using namespaceLabels: in existing namespaces Mar 1, 2024
@manno manno modified the milestones: v2.8-Next1, v2.9.0 Mar 1, 2024

manno commented Mar 4, 2024

> The label name is added to all namespaces created by Fleet (and all namespaces created with helm). However, this label is missing for namespaces created in other ways (e.g. kubectl).

Clarification: Helm always adds a "name" label when creating the release namespace (CreateNamespace). It's not added to any namespaces that are in the templates folder.
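
For example (hypothetical chart and release names; output abridged and illustrative):

~~~
$ helm install demo ./chart --namespace demo-ns --create-namespace
$ kubectl get ns demo-ns --show-labels
NAME      STATUS   AGE   LABELS
demo-ns   Active   5s    kubernetes.io/metadata.name=demo-ns,name=demo-ns
~~~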


jhoblitt commented Mar 8, 2024

I think I'm seeing this with 0.9.0 as well. What is strange is that it isn't 100% repeatable. I was able to add namespaceLabels to several existing bundles without triggering an error, and with this current one, I deleted and re-added the label to the bundle and the error went away for half a day.


@mmartin24

QA report

System info

- Tested on fresh Rancher v2.9-2deb6a1e64956e35d02aeaa5e179090ebfeb0d6a-head with K3S_VERSION=v1.28.7+k3s1 in local and downstream clusters
- Tested upgrade from Rancher 2.7.9 with K3S_VERSION=v1.26.10+k3s2 to v2.9-2deb6a1e64956e35d02aeaa5e179090ebfeb0d6a-head


Performed a similar test as in the backported version here.

  1. Checked on a fresh Rancher v2.9-head install that a local GitRepo using the defaultNamespace: nginx example (link here) deploys and that no "namespace not found" error appears. Tested in both local and downstream clusters. Screenshot here of success in 2.9 vs the problem in 2.7.9.

  2. On a second test, it seemed that after upgrading the faulty 2.7.9 version to 2.9-head, the error persisted (even after Force Update), as opposed to the 2.8 backport mentioned here.

Although not sure at the moment if this is related: looking at the Apps repositories in local clusters, an SSL error appears:

error: exit status 128, detail: fatal: unable to access 'https://git.rancher.io/partner-charts/': SSL certificate problem: unable to get local issuer certificate



mmartin24 commented Apr 23, 2024

About point 2 previously mentioned: after performing the same upgrade test on a GCP environment with Rancher v2.9-3a426b5ba059358f8874573c0a98d01bfda25a47-head / fleet:104.0.0+up0.10.0-rc.11, it worked well; the error was gone, as was the SSL error mentioned earlier.


So the error mentioned was probably a local problem with my cluster setup.

Closing ticket as it seems to be working as expected.

Edit:
The reason it worked in the cloud is that the flag "--set", "useBundledSystemChart=true" was passed in CI here, due to this other bug detected in Rancher.
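
That is, something along these lines in the CI install command (invocation assumed, not taken from the linked CI config):

~~~
helm upgrade --install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set useBundledSystemChart=true
~~~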
