Add basic RBAC #172
Conversation
Do we need to change the rbac entry in the jupyter-config.yaml file as well?
Yes, thanks for catching that. I'll fix.
If you would like me to squash into a single commit, I can do that. Please let me know. Would also be good to have @jacobtomlinson review if he has time.
We can always squash with the green button. No reason to do so manually.
🙌
This looks good! I've made a few comments.
One thing to note is I still haven't managed to get access to worker logs in dask-kubernetes with this Role. So it will require some tweaking when I figure it out.
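For context, a rough sketch of the kind of rule that would probably be needed for log access, assuming the Role already covers basic pod operations; the resource names and verbs below are illustrative, not the contents of this PR.

```yaml
# Illustrative RBAC rule only: the pods/log subresource usually has to be
# granted explicitly before worker logs can be read through the API.
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
```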
gce/jupyter-config.yaml
 hub:
   extraConfig: |
-    c.KubeSpawner.singleuser_service_account = 'default'
+    c.KubeSpawner.singleuser_service_account = 'daskkubernetes'
You could move this into the helm config itself instead of the extraConfig code, as this config option is supported in z2jh:

singleuser:
  serviceAccountName: daskkubernetes
I was unable to launch worker pods when the service account was set within the singleuser section. I never figured out why. Should I do more testing?
Perhaps. This is working on our stack.
To be honest I'm not worried either way.
Just a thought: did you remove the c.KubeSpawner.singleuser_service_account = 'default' line when setting it within the singleuser section? Things in the extraConfig will override the helm config values.
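To illustrate that ordering, a minimal sketch of the values file with the extraConfig override removed so the helm-level setting takes effect; this is an assumption about the intended layout, not the file in this PR.

```yaml
# Sketch of gce/jupyter-config.yaml with the KubeSpawner override removed;
# the helm-level singleuser.serviceAccountName then applies unopposed.
hub:
  extraConfig: |
    # no c.KubeSpawner.singleuser_service_account line here
singleuser:
  serviceAccountName: daskkubernetes
```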
Pretty sure I did, but I can try again.
@@ -1,5 +1,5 @@
 # Start cluster on Google cloud
-gcloud container clusters create pangeo --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.9.3-gke.0
+gcloud container clusters create pangeo --no-enable-legacy-authorization --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.9.3-gke.0
Yay! 🎉
@@ -17,6 +17,10 @@ helm repo update
 # Install JupyterHub and Dask on the cluster
 helm install jupyterhub/jupyterhub --version=v0.6.0-9701a90 --name=jupyter --namespace=pangeo -f jupyter-config.yaml -f secret-config.yaml

+# create the daskkubernetes service account and role bindings
+echo "Installing service account for daskkubernetes."
+kubectl create -f dask-kubernetes-serviceaccount.yaml
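For readers following along, a rough sketch of what a manifest like dask-kubernetes-serviceaccount.yaml typically contains: a ServiceAccount plus a namespaced Role and RoleBinding. The names, namespace, and verbs below are assumptions for illustration and may differ from the file added in this PR.

```yaml
# Illustrative only: ServiceAccount + Role + RoleBinding for dask-kubernetes.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: daskkubernetes
  namespace: pangeo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: daskkubernetes
  namespace: pangeo
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: daskkubernetes
  namespace: pangeo
subjects:
  - kind: ServiceAccount
    name: daskkubernetes
    namespace: pangeo
roleRef:
  kind: Role
  name: daskkubernetes
  apiGroup: rbac.authorization.k8s.io
```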
TL;DR This is an anti-pattern, but probably fine for now.

I'm uncomfortable having additional manifests added manually, as you can no longer simply run helm delete <release> --purge to remove Pangeo. It would be better to have the whole deployment contained in a single helm installation. This could be a good opportunity to create a pangeo helm chart.

We currently have a jadepangeo helm chart which has jupyterhub/jupyterhub as a dependency and adds some extra manifests. I would like to get to a position where there is an official chart (maybe called pangeo/pangeo) which depends on jupyterhub/jupyterhub, and then we can make jadepangeo depend on pangeo/pangeo. However, this is beyond the scope of this PR, so I'll raise a new issue and start work on a PR.

We'll probably rename jadepangeo to metoffice/pangeo or something more sensible, as the Pangeo project has superseded the Jade project now.
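If a pangeo chart does get created, the dependency wiring could look roughly like the Helm 2 style requirements.yaml below; the version matches the install command above, but the repository URL and chart layout are assumptions.

```yaml
# Hypothetical requirements.yaml for a pangeo chart (Helm 2 style),
# pulling in jupyterhub/jupyterhub as a dependency.
dependencies:
  - name: jupyterhub
    version: v0.6.0-9701a90
    repository: https://jupyterhub.github.io/helm-chart/
```

With something like that in place, the extra service-account manifests could live in the chart's templates/ directory, so a single helm delete removes everything.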
See #178
gce/jupyter-config.yaml
@@ -24,11 +24,11 @@ singleuser:
   enabled: true

 rbac:
-  enabled: false
+  enabled: true
This defaults to true so you could omit this.
@jacobtomlinson, one question I have is, does the role as currently defined limit the pod operations only to the pods that the user owns, or can these operations (verbs) be applied to any pod in the cluster?
They can be applied to any pod in the namespace. Therefore it might be sensible to have a separate namespace for dask workers.
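A hedged sketch of what that separation might look like: a dedicated namespace for workers, with the pod permissions bound only there. The names below are placeholders, and a matching Role would also need to exist in that namespace.

```yaml
# Illustrative only: restrict the dask-kubernetes permissions to a
# dedicated worker namespace rather than the hub namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: dask-workers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: daskkubernetes
  namespace: dask-workers       # binding is scoped to this namespace only
subjects:
  - kind: ServiceAccount
    name: daskkubernetes
    namespace: pangeo           # notebook pods still run in the hub namespace
roleRef:
  kind: Role                    # a Role of this name must be defined in dask-workers
  name: daskkubernetes
  apiGroup: rbac.authorization.k8s.io
```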
Okay. A separate namespace for workers sounds like it would be a significant increase in security. Another question I have is, what are the steps to running containers as unprivileged?
Well, it looks like the only requirement for privileged containers currently is to allow users to mount FUSE filesystems within their notebook containers. On our AWS deployment we are using custom FUSE flex volume drivers, which means that S3 is mounted onto the host instead and then volumed into an unprivileged container. So we could create equivalent drivers for the Google platform. Then the containers could be run as unprivileged. The other option is to stop using FUSE. Our goal is for our tools to work directly with object stores and remove the requirement for FUSE altogether, but we are a way off that currently.
Is it a crazy idea to launch every notebook pod into its own unique namespace, where it has control over the pods in that namespace but zero control over any other pods in any other namespace?
Not crazy at all. I have considered this before. It will require changes to the […]. Being able to delete a namespace is a pretty big deal as it deletes all resources within it, so giving the […]
cc @yuvipanda
This PR is definitely a step in the right direction. Even more can be done going forwards. I'm keen for it to be merged ASAP so I can incorporate it into my helm PR.
@jacobtomlinson both you and @tjcrone should have merge privileges. I encourage you both to use them liberally. This system is still experimental. I think it's more important to move quickly than to keep things from breaking, especially in this direction. If folks are waiting on me to merge things like this we'll probably end up waiting a while. I'll be especially busy in the coming month, and so am keen to ensure that others feel empowered to make large changes here.
@mrocklin I'm happy to merge but I am currently unable to test on GCE or update the demo platform. I'm not sure how you want to handle that? Perhaps it would be good to set up Travis CI or some other CI/CD to run tests and keep the platform up to date?
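A minimal sketch of what such a CI step could look like, assuming Travis CI and a plain YAML syntax check; the file paths are assumptions based on this PR, and real deployment automation would additionally need cluster credentials and helm upgrade steps.

```yaml
# Hypothetical .travis.yml: validate that the config files at least parse
# on every push; this does not deploy anything.
language: python
python:
  - "3.6"
install:
  - pip install pyyaml
script:
  - python -c "import yaml; yaml.safe_load(open('gce/jupyter-config.yaml'))"
  - python -c "import yaml; list(yaml.safe_load_all(open('gce/dask-kubernetes-serviceaccount.yaml')))"
```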
Roger that @mrocklin. This PR has had enough discussion and review that I am comfortable merging. Cheers.
@jacobtomlinson I have just given jacob.tomlinson@informaticslab.co.uk edit rights on the project.
This PR has been merged. But has the cluster been re-deployed with the new settings?
Not as far as I'm aware.
@jacobtomlinson how do you manage deployment at the met office? Do you manually update your cluster every time you change the configuration, or do you have some automated way to do this? (see #131) I worry that we are making updates to this configuration without actually testing it live.
As per suggestions by @jacobtomlinson, add basic RBAC setup.