[Feature] Watch CR in multiple namespaces with namespaced RBAC resources #1106

Merged (9 commits) on Jun 5, 2023

@kevin85421 (Member) commented May 23, 2023

Why are these changes needed?

Currently, KubeRay supports the following options:

  • Watching all custom resources in the Kubernetes cluster: This is the default setting, which installs cluster-scoped RBAC resources (i.e., ClusterRole and ClusterRoleBinding).
  • Watching only the namespace where the operator is deployed: This is achieved by setting singleNamespaceInstall to true, which installs namespaced RBAC resources (i.e., Role and RoleBinding).

However, users who only have namespaced access would need to deploy one KubeRay operator per namespace. This increases the maintenance overhead, for example when upgrading the version of KubeRay for each deployed instance.

This PR lets users deploy a single KubeRay operator that watches multiple namespaces without installing ClusterRole and ClusterRoleBinding, as proposed in #1084.

There are two fields available to configure the KubeRay operator to achieve this:

  • singleNamespaceInstall:

    • Case 1: If set to false, the KubeRay operator will create ClusterRole and ClusterRoleBinding.
    • Case 2: If set to true and watchNamespace is set, it will create Role and RoleBinding for all namespaces listed in watchNamespace.
    • Case 3: If set to true and watchNamespace is not set, it will create Role and RoleBinding for the namespace where the operator is deployed.
    • Case 4: If singleNamespaceInstall is not set, the KubeRay operator will create ClusterRole and ClusterRoleBinding.
  • watchNamespace: A list of namespaces that the KubeRay operator will watch. In addition, it will be passed to KubeRay binary as the --watch-namespace flag.
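The four cases above can be sketched as a small decision function. This is only an illustration of the described behavior, not code from the chart: `rbacPlan` and `planRBAC` are hypothetical names, and a nil pointer is used here as an assumption to model "singleNamespaceInstall is not set".

```go
package main

import "fmt"

// rbacPlan is a hypothetical summary type used only for this illustration;
// it is not part of KubeRay or its Helm chart.
type rbacPlan struct {
	ClusterScoped bool     // true: ClusterRole + ClusterRoleBinding
	Namespaces    []string // namespaces that receive a Role + RoleBinding
}

// planRBAC models Cases 1-4 from the description.
// A nil singleNamespaceInstall stands for "not set" (an assumption of this sketch).
func planRBAC(singleNamespaceInstall *bool, watchNamespace []string, releaseNamespace string) rbacPlan {
	// Cases 1 and 4: false or unset means cluster-scoped RBAC resources.
	if singleNamespaceInstall == nil || !*singleNamespaceInstall {
		return rbacPlan{ClusterScoped: true}
	}
	// Case 2: namespaced RBAC resources in every listed namespace.
	if len(watchNamespace) > 0 {
		return rbacPlan{Namespaces: watchNamespace}
	}
	// Case 3: namespaced RBAC resources only where the operator is deployed.
	return rbacPlan{Namespaces: []string{releaseNamespace}}
}

func main() {
	enabled := true
	fmt.Printf("%+v\n", planRBAC(&enabled, []string{"n1", "n2"}, "default"))
	fmt.Printf("%+v\n", planRBAC(&enabled, nil, "default"))
	fmt.Printf("%+v\n", planRBAC(nil, nil, "default"))
}
```

The sketch covers only which RBAC resources are created; which namespaces the operator watches is governed separately by the --watch-namespace flag.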

Related issue number

Closes #1084
Closes #1083

#1094

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Case 1: If singleNamespaceInstall is set to false, the KubeRay operator will create ClusterRole and ClusterRoleBinding.

kind create cluster --image=kindest/node:v1.23.0
kind load docker-image controller:latest

# Create namespaces
kubectl create ns n1
kubectl create ns n2

# Install a KubeRay operator (use gist1 to replace values.yaml)
# (path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest

# Check ClusterRole
kubectl get clusterrole | grep kuberay
# kuberay-operator                                                       2023-05-23T22:02:31Z

# Check Role 
kubectl get role
# NAME                               CREATED AT
# kuberay-operator-leader-election   2023-05-23T22:02:31Z


# Install RayCluster in `default`, `n1`, `n2`
helm install raycluster kuberay/ray-cluster --version 0.5.0
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n1
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n2

# RayCluster in these 3 namespaces should be created.

Case 2: If singleNamespaceInstall is set to true and watchNamespace is set, it will create Role and RoleBinding for all namespaces listed in watchNamespace.

  • watchNamespace: n1, n2
kind create cluster --image=kindest/node:v1.23.0
kind load docker-image controller:latest

# Create namespaces
kubectl create ns n1
kubectl create ns n2

# Install a KubeRay operator (use gist2 to replace values.yaml)
# (path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest

# Check ClusterRole
kubectl get clusterrole | grep kuberay
# (nothing found)

# Check Role 
kubectl get role --all-namespaces | grep kuberay
# default       kuberay-operator-leader-election                 2023-05-23T22:27:37Z
# n1            kuberay-operator                                 2023-05-23T22:27:37Z
# n2            kuberay-operator                                 2023-05-23T22:27:37Z


# Install RayCluster in `default`, `n1`, `n2`
helm install raycluster kuberay/ray-cluster --version 0.5.0
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n1
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n2

# Only RayCluster in n1 and n2 will be created.

Case 3: If singleNamespaceInstall is set to true and watchNamespace is not set, it will create Role and RoleBinding for the namespace where the operator is deployed.

kind create cluster --image=kindest/node:v1.23.0
kind load docker-image controller:latest

# Create namespaces
kubectl create ns n1
kubectl create ns n2

# Install a KubeRay operator (use gist3 to replace values.yaml)
# (path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest

# Check ClusterRole
kubectl get clusterrole | grep kuberay
# (nothing found)

# Check Role 
kubectl get role --all-namespaces | grep kuberay
# default       kuberay-operator                                 2023-05-23T22:35:15Z
# default       kuberay-operator-leader-election                 2023-05-23T22:35:15Z


# Install RayCluster in `default`, `n1`, `n2`
helm install raycluster kuberay/ray-cluster --version 0.5.0
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n1
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n2

# Only RayCluster in `default` will be created.

Case 4: If singleNamespaceInstall is not set, the KubeRay operator will create ClusterRole and ClusterRoleBinding.

kind create cluster --image=kindest/node:v1.23.0
kind load docker-image controller:latest

# Create namespaces
kubectl create ns n1
kubectl create ns n2

# Install a KubeRay operator (use gist3 to replace values.yaml)
# (path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest

# Check ClusterRole
kubectl get clusterrole | grep kuberay
# kuberay-operator                                                       2023-05-23T22:40:18Z

# Check Role 
kubectl get role --all-namespaces | grep kuberay
# default       kuberay-operator-leader-election                 2023-05-23T22:40:18Z


# Install RayCluster in `default`, `n1`, `n2`
helm install raycluster kuberay/ray-cluster --version 0.5.0
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n1
helm install raycluster kuberay/ray-cluster --version 0.5.0 -n n2

# RayCluster in `default`, `n1`, and `n2` will be created.

})
}

if len(watchNamespaces) == 1 {
Member Author

It is not possible for len(watchNamespaces) == 0 to be true. The length of strings.Split("", ",") is still 1. See this link for more details.
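A minimal check of the behavior described in this comment, using only the standard library. The helper name `splitWatchNamespaces` is hypothetical; it simply wraps the strings.Split call mentioned in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWatchNamespaces wraps the parsing step discussed above.
// The function name is illustrative only.
func splitWatchNamespaces(flag string) []string {
	return strings.Split(flag, ",")
}

func main() {
	empty := splitWatchNamespaces("")
	// Splitting an empty string yields one element: the empty string itself.
	fmt.Println(len(empty), empty[0] == "") // prints "1 true"
	fmt.Println(splitWatchNamespaces("n1,n2")) // prints "[n1 n2]"
}
```

This is why the code branches on a length of 1 and then inspects watchNamespaces[0], rather than checking for a length of 0.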

Contributor

This seems worth including as a code comment.

Member Author

Added f209400.


if len(watchNamespaces) == 1 {
options.Namespace = watchNamespaces[0]
if watchNamespaces[0] == "" {
Member Author

Based on this doc, if Namespace is specified, it restricts the manager's cache to watch objects in the desired namespace. If the field is not specified, it defaults to all namespaces.

setupLog.Info(fmt.Sprintf("Only watch custom resources in the namespace: %s", watchNamespaces[0]))
}
} else {
options.NewCache = cache.MultiNamespacedCacheBuilder(watchNamespaces)
Member Author

To watch multiple namespaces, we can use MultiNamespacedCache. See this example and this doc for more details.
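The branching discussed in these comments can be sketched without pulling in controller-runtime. The `cachePlan` type and `planCache` function below are hypothetical stand-ins: `Namespace` plays the role of options.Namespace and `MultiNamespaces` stands for the list handed to cache.MultiNamespacedCacheBuilder in the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// cachePlan is a hypothetical stand-in for the manager options set in main.go.
type cachePlan struct {
	Namespace       string   // stand-in for options.Namespace ("" means all namespaces)
	MultiNamespaces []string // stand-in for the multi-namespace cache input
}

// planCache mirrors the single-namespace versus multi-namespace branch.
func planCache(watchNamespaceFlag string) cachePlan {
	watchNamespaces := strings.Split(watchNamespaceFlag, ",")
	if len(watchNamespaces) == 1 {
		// One entry: either a single namespace, or "" for cluster-wide watching.
		return cachePlan{Namespace: watchNamespaces[0]}
	}
	// Two or more entries: a cache that spans exactly the listed namespaces.
	return cachePlan{MultiNamespaces: watchNamespaces}
}

func main() {
	fmt.Printf("%+v\n", planCache(""))
	fmt.Printf("%+v\n", planCache("n1"))
	fmt.Printf("%+v\n", planCache("n1,n2"))
}
```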

@@ -30,3 +30,12 @@ rules:
- events
verbs:
- create
- apiGroups:
Member Author

This is necessary for leader election. We can refer to leader_election_role.yaml in the kubebuilder repo for further details.

@@ -32,4 +31,13 @@ rules:
- events
verbs:
- create
- apiGroups:
Member Author

Necessary for leader election.

Contributor

For my education, how did you notice that this was necessary? (Did some test fail?)

Member Author

I attempted to skip the installation of role.yaml, but it resulted in the failure of the leader election due to permission issues. Then, I found leader_election_role.yaml.

@@ -42,10 +42,10 @@ spec:
{{- $argList = append $argList "--enable-batch-scheduler" -}}
{{- end -}}
{{- $watchNamespace := "" -}}
-{{- if .Values.singleNamespaceInstall -}}
+{{- if and .Values.singleNamespaceInstall (not .Values.watchNamespace) -}}
{{- $watchNamespace = .Release.Namespace -}}
Member Author

Case 1: If singleNamespaceInstall is set to true and watchNamespace is not specified, the KubeRay operator will only watch the namespace where the operator is deployed.

Case 2: If watchNamespace is specified, the KubeRay operator will watch all namespaces listed in the variable. In this case, we concatenate all namespaces into a string separated by commas. Then, the string will be parsed by strings.Split(watchNamespace, ",") in main.go.

Contributor

Based on this, it looks like singleNamespaceInstall and watchNamespace are existing fields in KubeRay. The new behavior for both is clearly explained by Cases 1-4. The old behavior for singleNamespaceInstall is also described. But I think the old behavior of watchNamespace is missing from the PR description, do you mind adding it? (Maybe the only change is that it now supports multiple namespaces?)

Contributor

Just to double check, are commas (,) forbidden characters in namespaces?

Member Author

But I think the old behavior of watchNamespace is missing from the PR description, do you mind adding it?

The behavior is described by the comments in values.yaml. The comments are deleted by this PR.

# kuberay operator will only watch the resource events from the "watchNamespace" namespace.
# this option has no effect if singleNamespaceInstall is true, because we assume there are no
# permissions outside of the current namespace
# watchNamespace: ray-user-namespace

Just to double check, are commas (,) forbidden characters in namespaces?

The flag --watch-namespace is a list of namespaces, separated by commas. It will be parsed by watchNamespaces := strings.Split(watchNamespace, ",") in main.go.

Contributor

Right, but if a namespace is allowed to have a comma in it then breaking by commas won't work. Anyways I checked and it looks like namespaces can only have alphanumeric characters, so it's fine.

Member Author

Right, but if a namespace is allowed to have a comma in it then breaking by commas won't work.

Oh, I understand now. I missed that part. Thank you for pointing it out!
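For reference, Kubernetes namespace names must be valid RFC 1123 DNS labels: lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, at most 63 characters. A comma can therefore never appear in a name, which is what makes the comma-separated flag safe. A minimal sketch of that rule follows; `isValidNamespaceName` is a hypothetical helper, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches an RFC 1123 DNS label (length is checked separately).
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidNamespaceName is an illustrative check, not a Kubernetes function.
func isValidNamespaceName(name string) bool {
	return len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	for _, name := range []string{"n1", "ray-user-namespace", "n1,n2", "-bad"} {
		fmt.Printf("%q valid: %v\n", name, isValidNamespaceName(name))
	}
}
```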

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
labels: {{ include "kuberay-operator.labels" $ | nindent 4 }}
Member Author

In a range block, . no longer refers to the top level scope, so use $ to access the top-level variables.

@kevin85421 kevin85421 mentioned this pull request May 23, 2023
@kevin85421 kevin85421 changed the title Watch multiple namespaces [Feature] Watch CR in multiple namespaces with namespaced RBAC resources May 23, 2023
@kevin85421 kevin85421 marked this pull request as ready for review May 23, 2023 22:43
@kevin85421
Member Author

cc @msumitjain @anshulomar @Yicheng-Lu-llll would you mind reviewing this PR? Thanks!

@kevin85421
Member Author

Follow up:

  • Add consistency check for multiple_namespaces_*.yaml
  • Revisit the RBAC YAML files and examine potential overlaps between role.yaml and ray_*_editor_role.yaml.

@architkulkarni architkulkarni self-assigned this May 30, 2023
Contributor

@architkulkarni left a comment

Don't have prior experience with this area so I don't really have substantial critical feedback, just comments/questions.

As a general note, the PR comments are really helpful, and I think most of them could be included directly as code comments to improve readability with little or no downside.

{{- if .Values.singleNamespaceInstall }}
kind: Role
{{- else }}
Contributor

What's the effect of this change? It looks like now we don't create these roles if singleNamespaceInstall is false, why don't we need them anymore in this case?

Contributor

(Is it because we're replacing these roles with the new multiple_namespaces_role?)

Member Author

What's the effect of this change? It looks like now we don't create these roles if singleNamespaceInstall is false, why don't we need them anymore in this case?

The permissions of this file are a subset of those in role.yaml. Therefore, the KubeRay operator does not need it, either before or after this PR. In my understanding, there is no RoleBinding to link these Roles with any ServiceAccount. These RBAC YAML files are generated by kubebuilder. I guess they are used to define users' access.

# permissions outside of the current namespace
# watchNamespace: ray-user-namespace
# The KubeRay operator will watch the custom resources in the namespaces listed in the "watchNamespace" parameter.
# watchNamespace:
Contributor

Is this the source of truth for documentation, or is there a corresponding doc page that also needs to be updated?

Member Author

Yes, it is the source of truth. There is no document related to it.

@architkulkarni
Contributor

Also, is this a breaking change?

Contributor

@msumitjain left a comment

LGTM

@kevin85421
Member Author

I think most of them could be included directly as code comments to improve readability with little or no downside.

Adding comments in Helm chart template YAML files may cause issues with the Helm parser.

Also, is this a breaking change?

Nope.

@anshulomar left a comment

Thanks @kevin85421 for putting up this PR! It looks good to me, but I have a few comments:

  1. (Functional) While the changes are fine for a manual deploy, for automated deploys there could be challenges. For example, an Argo CD project can deploy to only a single namespace. In such a case, the Argo CD won't be able to apply multiple_namespaces_role and multiple_namespaces_rolebinding to any namespace other than the one it is configured with - where it deploys the operator. So, Argo CD projects for respective namespaces (where RayServices run) will have to apply the role and rolebinding to allow the operator to manage RayService in their namespaces. Hence, to facilitate such use-cases, we might want to introduce a new config attribute, say multipleNamespacesRbacEnabled. If its value is true, only then multiple_namespaces_role/rolebinding should be applied. What do you think?
  2. (Config Readability) I think the field name singleNamespaceInstall becomes confusing after this change. This is because it could mean installing KubeRay and its roles/rolebindings either in a single namespace or in multiple namespaces (defined in watchNamespaces list). One way to resolve this confusion could be to replace this field with another field, say clusterScopedInstall. When clusterScopedInstall: true is set, then ClusterRole and ClusterRoleBinding should be created. When clusterScopedInstall: false is set, then roles and rolebindings should be created in the namespaces listed in watchNamespaces. That said, I do understand that singleNamespaceInstall field may be required for backward-compatibility. If that is the case, please ignore this point.

@kevin85421
Member Author

Thanks, @anshulomar, for the insight!

  1. For example, an Argo CD project can deploy to only a single namespace. In such a case, the Argo CD won't be able to apply multiple_namespaces_role and multiple_namespaces_rolebinding to any namespace other than the one it is configured with - where it deploys the operator.

Wow, I wasn't aware that ArgoCD has this restriction.

Argo CD projects for respective namespaces (where RayServices run) will have to apply the role and rolebinding to allow the operator to manage RayService in their namespaces. Hence, to facilitate such use-cases, we might want to introduce a new config attribute, say multipleNamespacesRbacEnabled. If its value is true, only then multiple_namespaces_role/rolebinding should be applied. What do you think?

This is a bit confusing for me. Could you please provide more details or an example to help clarify? Thanks!

Q: Which RBAC resources (Role / RoleBinding / ClusterRole / ClusterRoleBinding / ServiceAccount) should be created by the KubeRay operator Helm chart when multipleNamespacesRbacEnabled is set to false?

Current situation:

(1) Users can set rbacEnable: false and serviceAccount.create: false to prevent the creation of any RBAC resources.

(2) Users can use the singleNamespaceInstall option to determine whether to create namespaced RBAC resources (e.g., Role and RoleBinding) or cluster-scoped RBAC resources (e.g., ClusterRole and ClusterRoleBinding).

(3) If singleNamespaceInstall is set to true, users can use the watchNamespace option to determine in which namespaces the Role and RoleBinding should be installed. If watchNamespace is not set, the Role and RoleBinding will be installed in the same namespace as the KubeRay operator.

Based on (1)(2)(3), users seem to have enough power? I do not have any experience with ArgoCD. I may miss some points.

@kevin85421
Member Author

@anshulomar

(Config Readability) I think the field name singleNamespaceInstall becomes confusing after this change.

I agree that this can be confusing. Breaking backward compatibility can be a challenging decision. For example, even if there is a typo like miniReplicas, we still need to make accommodations to maintain backward compatibility through the following hacks.

minReplicas: {{ $values.minReplicas | default (default 1 $values.miniReplicas) }}
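The fallback chain in that template line can be modeled in Go as follows. This is a sketch under one stated assumption: Helm's `default` function (from the Sprig library) treats a numeric 0 the same as an unset value, so both are represented as 0 here. `defaultInt` and `effectiveMinReplicas` are hypothetical names.

```go
package main

import "fmt"

// defaultInt mimics `default fallback value` for integers:
// a zero value counts as "empty" and is replaced by the fallback.
func defaultInt(fallback, value int) int {
	if value == 0 {
		return fallback
	}
	return value
}

// effectiveMinReplicas models:
//   $values.minReplicas | default (default 1 $values.miniReplicas)
// Unset fields are represented as 0 in this sketch.
func effectiveMinReplicas(minReplicas, miniReplicas int) int {
	return defaultInt(defaultInt(1, miniReplicas), minReplicas)
}

func main() {
	fmt.Println(effectiveMinReplicas(0, 0)) // neither set: falls back to 1
	fmt.Println(effectiveMinReplicas(0, 3)) // only the legacy typo field set: 3
	fmt.Println(effectiveMinReplicas(2, 3)) // the correctly spelled field wins: 2
}
```

The correctly spelled field takes precedence, the misspelled legacy field is still honored, and 1 is the last resort.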

{{- $watchNamespaces := default (list .Release.Namespace) .Values.watchNamespace }}
{{- range $namespace := $watchNamespaces }}
---
kind: Role
Contributor

@Yicheng-Lu-llll commented Jun 2, 2023

I'm wondering if we could add the scenario where singleNamespaceInstall is set to false or not set here, so that there is no need to synchronize multiple_namespaces_role.yaml and role.yaml (and likewise for the rolebindings).

More specifically, when singleNamespaceInstall is false:

  • set $watchNamespaces to a placeholder value (so that the loop only iterates once)
  • create a ClusterRole instead of a Role.
  • omit the metadata.namespace field

Member Author

@kevin85421 commented Jun 2, 2023

Originally, I intended to achieve this by updating role.yaml and rolebinding.yaml instead of creating new files (multi_namespaces_*.yaml). However, it can be extremely challenging or even impossible (?) to achieve that. The Helm parser is not smart enough.

@kevin85421 kevin85421 merged commit 36b112e into ray-project:master Jun 5, 2023
17 checks passed
@anshulomar commented Jun 6, 2023

Thanks @kevin85421 !

This is a bit confusing for me. Could you please provide more details or an example to help clarify? Thanks!

Sure. Given that an ArgoCD project can deploy to only one namespace, the ArgoCD project for the kuberay operator won't be able to apply multiple_namespaces_role/rolebinding to the other namespaces where RayServices will be deployed. Consider a Kubernetes cluster configured with Argo, and suppose that we need to deploy the kuberay operator in namespace n1, rayservice1 in n2, and rayservice2 in n3. In such a scenario, we will have three ArgoCD projects:

  • ArgoCD project1 for deploying kuberay operator in n1
  • ArgoCD project2 for deploying rayservice1 in n2
  • ArgoCD project3 for deploying rayservice2 in n3

With multipleNamespacesRbacEnabled: false, ArgoCD project1 won't apply multiple_namespaces_role/rolebinding at all, which avoids the errors that could arise from permission issues. We will then have to additionally configure ArgoCD project2 and project3 to apply the required role/rolebinding in their respective namespaces. End clients can do this additional configuration themselves; it doesn't require any changes in this repository.

Q: Which RBAC resources (Role / RoleBinding / ClusterRole / ClusterRoleBinding / ServiceAccount) should be created by the KubeRay operator Helm chart when multipleNamespacesRbacEnabled is set to false?

When multipleNamespacesRbacEnabled: false is set, this Helm chart should not create any resources in a namespace different from the one it is installed in. This restriction is clearly applicable to only namespace-scoped resources such as roles, rolebindings, serviceaccount and so on. ClusterRole and ClusterRoleBinding are cluster-scoped resources so this Helm chart should create them if singleNamespaceInstall: false.

Based on (1)(2)(3), users seem to have enough power? I do not have any experience with ArgoCD. I may miss some points.

Yes, users have enough power but for automated deployment tools like Argo, there could be permission restrictions for applying manifests to multiple namespaces as explained in this comment above.

Does it help? Or is it still confusing?

@kevin85421
Member Author

@anshulomar Thank you for your reply! I understand your point clearly, and I will proceed to draft a pull request.
