Incorrect Namespace Reference #11883

Closed
coffee-time-design opened this issue Mar 13, 2023 · 19 comments

Comments

@coffee-time-design

Bug Report
I get the following error when navigating to the NFS dashboard page on the Ceph dashboard.
NFS-Ganesha is not configured
Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/nfs/module.py", line 154, in cluster_ls
    return available_clusters(self)
  File "/usr/share/ceph/mgr/nfs/utils.py", line 39, in available_clusters
    orchestrator.raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
    raise e
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '803e295f-e4c4-4cd9-b076-0f89f87fa312', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'd431fe1b-f287-4c56-a98d-85fc76dafa64', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b00a33f8-691c-4c54-b663-8ee129f18627', 'Date': 'Tue, 07 Mar 2023 21:21:35 GMT', 'Content-Length': '360'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"cephnfses.ceph.rook.io is forbidden: User \"system:serviceaccount:rook-rook-ceph:rook-ceph-mgr\" cannot list resource \"cephnfses\" in API group \"ceph.rook.io\" in the namespace \"rook-ceph\"","reason":"Forbidden","details":{"group":"ceph.rook.io","kind":"cephnfses"},"code":403}

Expected behavior:
It should display the page correctly.

How to reproduce it (minimal and precise):
I installed the latest Helm chart.
I accidentally installed it into the wrong namespace, rook-rook-ceph. I'm sure that if I had installed it into rook-ceph, this wouldn't be an issue.
It seems like the namespace is hardcoded somewhere as rook-ceph.
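
The mismatch is visible in the 403 body itself: the service account lives in rook-rook-ceph, while the request targets rook-ceph. A minimal Python sketch that pulls both namespaces out of the Status message follows. The helper name and the regular expressions are illustrative assumptions, not part of Ceph or Rook, and they depend on the exact wording of the message above.

```python
import json
import re

# Status body copied from the error above.
STATUS_BODY = r'''{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"cephnfses.ceph.rook.io is forbidden: User \"system:serviceaccount:rook-rook-ceph:rook-ceph-mgr\" cannot list resource \"cephnfses\" in API group \"ceph.rook.io\" in the namespace \"rook-ceph\"","reason":"Forbidden","details":{"group":"ceph.rook.io","kind":"cephnfses"},"code":403}'''


def namespaces_in_forbidden_message(body):
    """Return (service_account_namespace, requested_namespace).

    Hypothetical helper: it relies on the wording of the Forbidden
    message shown above and returns None for any part it cannot find.
    """
    message = json.loads(body)["message"]
    sa = re.search(r'system:serviceaccount:([^:"]+):', message)
    target = re.search(r'in the namespace "([^"]+)"', message)
    return (sa.group(1) if sa else None, target.group(1) if target else None)


if __name__ == "__main__":
    sa_ns, target_ns = namespaces_in_forbidden_message(STATUS_BODY)
    print(f"service account namespace: {sa_ns}")
    print(f"requested namespace:       {target_ns}")
    if sa_ns != target_ns:
        print("mismatch: the request targets a different namespace")
```

If the two values differ, the request is going to a namespace other than the one the mgr pod runs in, which is consistent with a hardcoded namespace.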

@travisn
Member

travisn commented Mar 13, 2023

Seems like a dashboard issue expecting the nfs CR to be in the rook-ceph namespace. @jmolmo ?

@parth-gr
Member

Is it hardcoded on the Ceph (mgr module) side?

@parth-gr
Member

parth-gr commented Mar 13, 2023

@coffee-time-design can you share the describe output for cephnfses?
It will show which namespace the resource was created in. Otherwise, you will have to query both namespaces to see where it was created.

@coffee-time-design
Author

Sorry, where can I find that resource?

user@555-mgr:/home/user# kubectl get cephnfses -n rook-rook-ceph -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""

@parth-gr
Member

Can you check with kubectl get cephnfses -n rook-ceph -o yaml?

@coffee-time-design
Author

Also:

apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""

As I have no such namespace.
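
The same two-namespace check can be scripted. This is a sketch only: the function name and the stub class are hypothetical, and the group, version and plural are assumed to be ceph.rook.io, v1 and cephnfses, matching the API group named in the error. The api argument is expected to behave like kubernetes.client.CustomObjectsApi.

```python
GROUP, VERSION, PLURAL = "ceph.rook.io", "v1", "cephnfses"


def cephnfs_names_by_namespace(api, namespaces):
    """Map each namespace to the names of the CephNFS resources found in it.

    `api` is any object exposing list_namespaced_custom_object with the
    same keyword arguments as kubernetes.client.CustomObjectsApi.
    """
    found = {}
    for namespace in namespaces:
        result = api.list_namespaced_custom_object(
            group=GROUP, version=VERSION, namespace=namespace, plural=PLURAL
        )
        found[namespace] = [
            item["metadata"]["name"] for item in result.get("items", [])
        ]
    return found


class EmptyClusterStub:
    """Stand-in client that mimics the empty lists shown above."""

    def list_namespaced_custom_object(self, group, version, namespace, plural):
        return {"apiVersion": f"{group}/{version}", "kind": "List", "items": []}


if __name__ == "__main__":
    namespaces = ["rook-rook-ceph", "rook-ceph"]
    print(cephnfs_names_by_namespace(EmptyClusterStub(), namespaces))
```

Against a real cluster, the api argument would be a kubernetes.client.CustomObjectsApi() created after kubernetes.config.load_kube_config().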

@parth-gr
Member

It looks like the cephnfses resource was not created, which is why you can't see the dashboard.

Can you share the operator logs?

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned May 26, 2023
@mkhpalm

mkhpalm commented Oct 13, 2023

I see this same issue with the dashboard module trying to call out to a non-existent rook-ceph namespace in v17.2.6

  • rook is in namespace A
  • ceph cluster is in namespace B
  • a namespace named rook-ceph does not exist
  • I have checked mgr's env and it has the correct POD_NAMESPACE (B)
  • The NFS section of the dashboard throws errors because the SA can't read cephnfses in namespace rook-ceph
  • I reviewed the role and rolemapping (common library from rook v1.11.11) and they both have the proper definitions for the namespace the cluster is in

So the RBAC seems to be set up correctly, despite the 403 from the API. I poked around a bit and noticed some things that are hardcoded to rook-ceph.

https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/rook/rook_cluster.py#L228

(there are a few more in there)

@parth-gr
Member

@rkachach @avanthakkar

@parth-gr
Member

Re-opening this, as it looks like it is still an issue. I will try to reproduce it.

@parth-gr parth-gr reopened this Oct 18, 2023
@parth-gr parth-gr self-assigned this Oct 18, 2023
@github-actions github-actions bot removed the wontfix label Oct 18, 2023
@rkachach
Contributor

rkachach commented Oct 20, 2023

@mkhpalm Just to understand/confirm your setup: do you mean that you are not using the namespace rook-ceph, and are instead using other namespaces for rook and for the ceph cluster?

@mkhpalm

mkhpalm commented Oct 20, 2023

Correct, there is no rook-ceph namespace defined in my k8s clusters. I'm using different namespaces for the ceph cluster(s) and a separate namespace for the rook operator. But, as in the bug ticket description, I'm getting the same API errors from the manager module trying to access k8s resources under a non-existent rook-ceph namespace.

Looking at module.py, RookEnv looks like it is intended to infer the correct namespaces.

https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/rook/module.py#L51

But if you import RookCluster and use it...

https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/rook/module.py#L41C27-L41C38

Throughout that RookCluster class, everything is hardcoded to namespace=rook-ceph for KubernetesResource API calls. The k8s API is going to throw 403 errors because the service account doesn't have access to that (nonexistent) namespace. Even if the namespace did exist for some reason, the service account wouldn't have access, because that's not where the cluster is or what the RBAC should say.

https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/rook/rook_cluster.py#L683

Search the file linked above for the string rook-ceph. Basically, it seems like the RookCluster class needs to use the RookEnv class for namespaces when making k8s API requests. Or it just needs to behave the same way and use the POD_NAMESPACE env variable for those k8s API requests.
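
A rough sketch of the second option, assuming nothing about the real RookCluster internals: resolve the namespace once from POD_NAMESPACE and build the request path from it instead of from a literal. The function names and the fallback default are hypothetical. The path layout is the standard Kubernetes one for namespaced custom resources, and v1 is assumed as the version for the ceph.rook.io group.

```python
import os

# Assumed fallback, only used when POD_NAMESPACE is unset or empty.
DEFAULT_NAMESPACE = "rook-ceph"


def cluster_namespace(environ=os.environ):
    """Namespace of the Ceph cluster, taken from POD_NAMESPACE when set."""
    return environ.get("POD_NAMESPACE") or DEFAULT_NAMESPACE


def cephnfses_path(namespace):
    """API path for listing CephNFS resources in a single namespace."""
    return f"/apis/ceph.rook.io/v1/namespaces/{namespace}/cephnfses"


if __name__ == "__main__":
    # Inside the mgr pod this would print the path for the pod's own namespace.
    print(cephnfses_path(cluster_namespace()))
```

With POD_NAMESPACE=B, the request would go to namespace B, which is where the role and binding already grant access.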

@rkachach rkachach assigned rkachach and unassigned jmolmo Oct 22, 2023
@rkachach
Contributor

rkachach commented Oct 23, 2023

@mkhpalm I think your analysis is correct, as there are a lot of places where the namespace rook-ceph is hardcoded in 17.2.6. Fortunately, most of these hardcoded references have been removed in recent versions. I reproduced your environment (by creating two different namespaces, one for the cluster and one for the operator) and was able to create and navigate the NFS view successfully, without experiencing the issues mentioned, by using quay.io/ceph/ceph:v18 (note: to use this version you have to manually apply ceph/ceph#53910).

I'll leave this issue open. If you have a test setup, it'd be awesome if you could test whether the image quay.io/ceph/ceph:v18 solves your issue. As I said, in my testing it works correctly.

@rkachach
Contributor

rkachach commented Oct 23, 2023

The PR ceph/ceph#54151 should fix this issue.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.


github-actions bot commented Jan 4, 2024

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned Jan 4, 2024
@rkachach
Contributor

rkachach commented Jan 8, 2024

This issue should be fixed in ceph v18.2.1 (which comes as the default with rook v1.13.1).
