
Add basic RBAC #172

Merged 4 commits into pangeo-data:master on Mar 23, 2018

Conversation

@tjcrone (Contributor) commented Mar 21, 2018

As per suggestions by @jacobtomlinson, add basic RBAC setup.
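
For reference, the dask-kubernetes-serviceaccount.yaml added in this PR defines a service account plus a namespaced Role and RoleBinding along roughly these lines (a sketch; the exact rules in the PR may differ):

    # dask-kubernetes-serviceaccount.yaml (sketch; actual rules may differ)
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: daskkubernetes
      namespace: pangeo
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: daskkubernetes
      namespace: pangeo
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "list", "watch", "create", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: daskkubernetes
      namespace: pangeo
    subjects:
      - kind: ServiceAccount
        name: daskkubernetes
        namespace: pangeo
    roleRef:
      kind: Role
      name: daskkubernetes
      apiGroup: rbac.authorization.k8s.io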

@mrocklin (Member)

Do we need to change the rbac entry in the jupyter-config.yaml file as well?

@tjcrone (Contributor, author) commented Mar 21, 2018

Yes, thanks for catching that. I'll fix.

@tjcrone (Contributor, author) commented Mar 21, 2018

If you would like me to squash into a single commit, I can do that. Please let me know. Would also be good to have @jacobtomlinson review if he has time.

@mrocklin (Member)

We can always squash with the green button. No reason to do so manually.

@rabernat (Member)

🙌

@jacobtomlinson (Member) left a review

This looks good! I've made a few comments.

One thing to note: I still haven't managed to get access to worker logs in dask-kubernetes with this Role, so it will require some tweaking once I figure that out.


 hub:
   extraConfig: |
-    c.KubeSpawner.singleuser_service_account = 'default'
+    c.KubeSpawner.singleuser_service_account = 'daskkubernetes'
Member:

You could move this into the helm config itself instead of the extraConfig code as this config option is supported in z2jh.

singleuser:
    serviceAccountName: daskkubernetes

Contributor (author):

I was unable to launch worker pods when the service account was set within the singleuser section. I never figured out why. Should I do more testing?

Member:

Perhaps. This is working on our stack.

To be honest I'm not worried either way.

Member:

Just a thought: did you remove the c.KubeSpawner.singleuser_service_account = 'default' line when setting this within the singleuser section? Things in extraConfig will override the helm config values.
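
Concretely, the end state would set the account only via the helm values, roughly like this (a sketch; other keys omitted):

    # jupyter-config.yaml (sketch; other keys omitted)
    singleuser:
      serviceAccountName: daskkubernetes
    # with no c.KubeSpawner.singleuser_service_account line left in hub.extraConfig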

Contributor (author):

Pretty sure I did, but I can try again.

@@ -1,5 +1,5 @@
 # Start cluster on Google cloud
-gcloud container clusters create pangeo --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.9.3-gke.0
+gcloud container clusters create pangeo --no-enable-legacy-authorization --num-nodes=10 --machine-type=n1-standard-2 --zone=us-central1-b --cluster-version=1.9.3-gke.0
@jacobtomlinson (Member) commented Mar 23, 2018

Yay! 🎉

@@ -17,6 +17,10 @@ helm repo update
 # Install JupyterHub and Dask on the cluster
 helm install jupyterhub/jupyterhub --version=v0.6.0-9701a90 --name=jupyter --namespace=pangeo -f jupyter-config.yaml -f secret-config.yaml

+# create the daskkubernetes service account and role bindings
+echo "Installing service account for daskkubernetes."
+kubectl create -f dask-kubernetes-serviceaccount.yaml
@jacobtomlinson (Member) commented Mar 23, 2018

TL;DR This is an anti-pattern, but probably fine for now.

I'm uncomfortable with additional manifests being added manually, as you can no longer simply run helm delete <release> --purge to remove Pangeo. It would be better to have the whole deployment contained in a single helm installation. This could be a good opportunity to create a pangeo helm chart.

We currently have a jadepangeo helm chart which has jupyterhub/jupyterhub as a dependency and adds some extra manifests. I would like to get to a position where there is an official chart (maybe called pangeo/pangeo) which depends on jupyterhub/jupyterhub, and then we can make jadepangeo depend on pangeo/pangeo. However, this is beyond the scope of this PR, so I'll raise a new issue and start work on a PR.
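
To illustrate, such a pangeo chart would declare jupyterhub as a dependency in its requirements.yaml, roughly like this (a sketch; the chart layout is hypothetical and the version is just the one pinned in this repo's install script):

    # requirements.yaml of a hypothetical pangeo chart (sketch)
    dependencies:
      - name: jupyterhub
        version: v0.6.0-9701a90
        repository: https://jupyterhub.github.io/helm-chart/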

We'll probably rename jadepangeo to metoffice/pangeo or something more sensible as the Pangeo project has superseded the Jade project now.

Member:

See #178

@@ -24,11 +24,11 @@ singleuser:
   enabled: true

 rbac:
-  enabled: false
+  enabled: true
Member:

This defaults to true so you could omit this.

@tjcrone (Contributor, author) commented Mar 23, 2018

@jacobtomlinson, one question I have: does the Role as currently defined limit the pod operations to only the pods that the user owns, or can these operations (verbs) be applied to any pod in the cluster?

@jacobtomlinson (Member)

They can be applied to any pod in the namespace. Therefore it might be sensible to have a separate namespace for dask workers.
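
To sketch what that would look like, the worker namespace would get its own Role and RoleBinding, with the notebook service account bound into it (the namespace and role names here are hypothetical):

    # sketch: pod permissions scoped to a dedicated worker namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      name: pangeo-workers   # hypothetical name
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: dask-worker-manager
      namespace: pangeo-workers
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "list", "watch", "create", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: dask-worker-manager
      namespace: pangeo-workers
    subjects:
      - kind: ServiceAccount
        name: daskkubernetes
        namespace: pangeo      # the service account used by notebook pods
    roleRef:
      kind: Role
      name: dask-worker-manager
      apiGroup: rbac.authorization.k8s.io

dask-kubernetes would then need to be told to launch workers into that namespace, so the pod verbs above no longer touch other users' notebook pods.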

@tjcrone (Contributor, author) commented Mar 23, 2018

Okay. A separate namespace for workers sounds like it would be a significant increase in security. Another question: what are the steps to running the containers unprivileged?

@jacobtomlinson (Member) commented Mar 23, 2018

Well, it looks like the only current requirement for privileged containers is allowing users to mount FUSE filesystems within their notebook containers.

On our AWS deployment we are using custom FUSE flex volume drivers, which means that S3 is mounted onto the host instead and then volume-mounted into an unprivileged container. We could create equivalent drivers for the Google platform, and then the containers could be run unprivileged.

The other option is to stop using FUSE. Our goal is for our tools to work directly with object stores and remove the requirement for FUSE altogether, but we are a way off from that currently.

@tjcrone (Contributor, author) commented Mar 23, 2018

Is it a crazy idea to launch every notebook pod into its own unique namespace, where it has control over the pods in that namespace but zero control over any other pods in any other namespace?

@jacobtomlinson (Member)

Not crazy at all. I have considered this before.

It will require changes to the KubeSpawner, as it will need to be able to create and remove namespaces too. It also makes assumptions about how the Kubernetes cluster is being managed. For example, some people may have been given access to a single namespace on a shared cluster in their org, while others may have dedicated clusters like those used in this project.

Being able to delete a namespace is a pretty big deal, as it deletes all resources within it, so giving the KubeSpawner that power is not something to do lightly.

@mrocklin (Member)

It will require changes to the KubeSpawner as it will need to be able to create and remove namespaces too

cc @yuvipanda

@jacobtomlinson (Member)

This PR is definitely a step in the right direction. Even more can be done going forwards.

I'm keen for it to be merged ASAP so I can incorporate it into my helm PR.

@mrocklin (Member)

@jacobtomlinson both you and @tjcrone should have merge privileges. I encourage you both to use them liberally. This system is still experimental. I think it's more important to move quickly than to keep things from breaking, especially in this direction.

If folks are waiting on me to merge things like this, we'll probably end up waiting a while. I'll be especially busy in the coming month, so I am keen to ensure that others feel empowered to make large changes here.

@jacobtomlinson (Member)

@mrocklin I'm happy to merge but I am currently unable to test on GCE or update the demo platform.

I'm not sure how you want to handle that. Perhaps it would be good to set up Travis CI or some other CI/CD to run tests and keep the platform up to date?
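
As a very rough illustration of what that might look like (entirely hypothetical and not part of this PR; the deploy step assumes helm and cluster credentials are provisioned in the CI environment):

    # .travis.yml (hypothetical sketch)
    language: python
    install:
      - pip install pyyaml
    script:
      # sanity check that the config file parses as YAML
      - python -c "import yaml; yaml.safe_load(open('jupyter-config.yaml'))"
    deploy:
      provider: script
      script: helm upgrade jupyter jupyterhub/jupyterhub --namespace=pangeo -f jupyter-config.yaml -f secret-config.yaml
      on:
        branch: master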

@tjcrone (Contributor, author) commented Mar 23, 2018

Roger that @mrocklin. This PR has had enough discussion and review that I am comfortable merging. Cheers.

@jacobtomlinson merged commit 129a482 into pangeo-data:master on Mar 23, 2018
@mrocklin (Member)

@jacobtomlinson I have just given jacob.tomlinson@informaticslab.co.uk edit rights on the project.

@jacobtomlinson mentioned this pull request on Mar 23, 2018
@tjcrone deleted the rbac branch on March 24, 2018
@rabernat (Member)

This PR has been merged. But has the cluster been re-deployed with the new settings?

@jacobtomlinson (Member)

Not as far as I'm aware.

@rabernat (Member)

@jacobtomlinson how do you manage deployment at the Met Office? Do you manually update your cluster every time you change the configuration, or do you have some automated way to do this? (see #131)

I worry that we are making updates to this configuration without actually testing it live.

@jacobtomlinson (Member)

I do upgrades manually using helm. As @mrocklin suggests in #131, this could be automated in CI/CD.
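
For this deployment that upgrade would look roughly like the following, mirroring the install command in the setup script (release name, namespace, and version taken from there; exact flags may differ):

    helm repo update
    helm upgrade jupyter jupyterhub/jupyterhub --version=v0.6.0-9701a90 --namespace=pangeo -f jupyter-config.yaml -f secret-config.yaml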
