Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr: enable rook orchestrator mgr module by default #13761

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

rkachach
Copy link
Contributor

@rkachach rkachach commented Feb 13, 2024

previously, we had kept the Rook orchestrator manager module disabled so far because it was causing a bunch of issues and errors on the dashboard. But with the recent changes made in the cephv v18.2.1 release, we've fixed those issues and made some overall improvements to the dashboard user experience when Rook is enabled. With all that in mind, it's time to switch on the Rook orchestrator by default.

Fixes: #13760

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@rkachach rkachach added the do-not-merge DO NOT MERGE :) label Feb 13, 2024
@rkachach rkachach requested review from travisn and a team February 13, 2024 11:41
deploy/examples/cluster.yaml Show resolved Hide resolved
deploy/examples/cluster-test.yaml Show resolved Hide resolved
Copy link
Contributor

@subhamkrai subhamkrai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought, can check in go code if dashboard is enabled then only enable mgr module, or there are other uses cases of mgr module?

IIRC in the past it was true only but we changed that to false since we have some issue probably in dashboard or somewhere. Those error may not exist today but just sharing the thoughts.

@travisn
Copy link
Member

travisn commented Feb 13, 2024

A thought, can check in go code if dashboard is enabled then only enable mgr module, or there are other uses cases of mgr module?

IIRC in the past it was true only but we changed that to false since we have some issue probably in dashboard or somewhere. Those error may not exist today but just sharing the thoughts.

Agreed, if the dashboard module is enabled, it will be good to automatically enable the rook module. Perhaps we don't even need to support enabling/disabling the rook module if it's listed in the modules section of the cluster CR. So this likely means when the configureDashboardModules() method is called to enable the dashboard, that method would call configureOrchestratorModules().

@rkachach
Copy link
Contributor Author

rkachach commented Feb 14, 2024

A thought, can check in go code if dashboard is enabled then only enable mgr module, or there are other uses cases of mgr module?
IIRC in the past it was true only but we changed that to false since we have some issue probably in dashboard or somewhere. Those error may not exist today but just sharing the thoughts.

Agreed, if the dashboard module is enabled, it will be good to automatically enable the rook module. Perhaps we don't even need to support enabling/disabling the rook module if it's listed in the modules section of the cluster CR. So this likely means when the configureDashboardModules() method is called to enable the dashboard, that method would call configureOrchestratorModules().

Still not clear what are the benefits of doing this in code as we have the support to enable/disable modules through configuration. I'd prefer to keep this kind of functionality in conf unless there's really a good reason to move it to code.

@travisn
Copy link
Member

travisn commented Feb 14, 2024

Still not clear what are the benefits of doing this in code as we have the support to enable/disable modules through configuration. I'd prefer to keep this kind of functionality in conf unless there's really a good reason to move it to code.

The main question for me is if we want to consider it's "required" to have the rook mgr module enabled when the dashboard is enabled. When it's not enabled, the dashboard will show messages about needing to enable it, so it really seems like it's required for full dashboard functionality.

In practice, it will likely be sufficient to just add the rook mgr module to the example manifests for enabling it as a separate module, so implementing it in code won't really change the scenario.

Copy link

mergify bot commented Feb 15, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @rkachach please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork

@rkachach rkachach marked this pull request as ready for review February 29, 2024 13:07
@travisn
Copy link
Member

travisn commented Mar 12, 2024

Since v18.2.2 is out, it would be great to enable this by default now. How about just rebasing to get the latest CI fixes?

@rkachach
Copy link
Contributor Author

Since v18.2.2 is out, it would be great to enable this by default now. How about just rebasing to get the latest CI fixes?

Done

@BlaineEXE BlaineEXE added the debug-ci run CI with debugging label Mar 13, 2024
previously, we had kept the Rook orchestrator manager module disabled
so far because it was causing a bunch of issues and errors on the
dashboard. But with the recent changes made in the cephv v18.2.1
release, we've fixed those issues and made some overall improvements
to the dashboard user experience when Rook is enabled. With all that
in mind, it's time to switch on the Rook orchestrator by default.

Fixes: rook#13760

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
@rkachach rkachach removed the do-not-merge DO NOT MERGE :) label Mar 13, 2024
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a permission is missing for tests where there is a cluster in a second namespace:

2024-03-13 19:29:22.737236 I | ceph-spec: ceph-file-controller: CephCluster "my-cluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[MGR_MODULE_ERROR:{Severity:HEALTH_ERR Message:Module 'rook' has failed: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6b518229-d6e1-49d1-a16b-247e2eab237f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '2609213b-bd67-4367-9804-46c4e37b0667', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f640617f-7d82-45fd-a24b-f5ffd9e34525', 'Date': 'Wed, 13 Mar 2024 19:19:40 GMT', 'Content-Length': '295'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes is forbidden: User \"system:serviceaccount:rook-ceph-secondary:rook-ceph-mgr\" cannot list resource \"nodes\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"nodes"},"code":403}

}] LastChecked:2024-03-13T19:28:50Z LastChanged:2024-03-13T19:20:42Z PreviousHealth:HEALTH_OK Capacity:{TotalBytes:6442450944 UsedBytes:28209152 AvailableBytes:6414241792 LastUpdated:2024-03-13T19:28:50Z} Versions:0xc00156fc00 FSID:3be4c34b-e180-4f6b-a346-0034aa37a77d}

@rkachach
Copy link
Contributor Author

The full stack trace from the error:

Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c70097f3-3d57-4f6b-b7ad-b10435fcda43', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options':
 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'd3b30d1e-d8d1-45ce-9279-fd7cf81515a9', 'X-Kubernetes-Pf-Prioritylevel-Uid': '84562724-007b-4670-afc5-1a77fb588000', 'Date': 'Thu, 14 Mar 2024 1
5:36:37 GMT', 'Content-Length': '295'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes is forbidden: User \"system:serviceaccount:rook-ceph-secondary:rook-ceph-mgr\" cannot
 list resource \"nodes\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"nodes"},"code":403}

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 137, in wrapper
    return OrchResult(f(*args, **kwargs))
  File "/usr/share/ceph/mgr/rook/module.py", line 244, in get_hosts
    return self.rook_cluster.get_hosts()
  File "/usr/share/ceph/mgr/rook/rook_cluster.py", line 1230, in get_hosts
    for node in self.nodes.items:
  File "/usr/share/ceph/mgr/rook/rook_cluster.py", line 302, in items
    resource_version = self._fetch()
  File "/usr/share/ceph/mgr/rook/rook_cluster.py", line 284, in _fetch
    response = self.api_func(**self.kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 13726, in list_node
    (data) = self.list_node_with_http_info(**kwargs)  # noqa: E501
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 13821, in list_node_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden

@rkachach rkachach force-pushed the fix_issue_13760 branch 2 times, most recently from 0fdd1a3 to 1bc7ced Compare March 14, 2024 16:48
deploy/examples/common-second-cluster.yaml Outdated Show resolved Hide resolved
deploy/examples/common-second-cluster.yaml Outdated Show resolved Hide resolved
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Copy link
Member

@parth-gr parth-gr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

deploy/examples/common-second-cluster.yaml Show resolved Hide resolved
@travisn travisn merged commit 9c0d795 into rook:master Mar 15, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debug-ci run CI with debugging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable rook orchestrator by default
5 participants