Configure JupyterHub spawner to allow root operations by default #300

cwbeitel · 2018-02-26T22:03:51Z

Regarding notebook server spawner options https://github.com/kubeflow/kubeflow/blob/master/kubeflow/core/jupyterhub_spawner.py#L79 currently c.KubeSpawner.args is being provided the arg"--allow-root" presumably because the original implementer meant for users to have root access in containers. But currently this is not available.

By adding "GRANT_SUDO=yes" to the environment users do have sudo privs in spawned notebooks, e.g. can successfully run sudo apt-get update:

###################################################
# Spawner Options
###################################################
c.JupyterHub.spawner_class = KubeFormSpawner
c.KubeSpawner.singleuser_image_spec = 'gcr.io/kubeflow/tensorflow-notebook'
c.KubeSpawner.cmd = 'start-singleuser.sh'
c.KubeSpawner.args = ['--allow-root']
# First pulls can be really slow, so let's give it a big timeout
c.KubeSpawner.start_timeout = 60 * 10
c.KubeSpawner.environment = {
    "GRANT_SUDO": "yes"
}

If making this addition should determine if there is anything in c.KubeSpawner.environment to over-write and if so append the variable instead of setting all.

Related to #289

The text was updated successfully, but these errors were encountered:

jlewi · 2018-02-27T04:19:37Z

Seems useful as this would also allow people to install packages e.g. via apt-get.

Any info on what the security implications are?

cwbeitel · 2018-02-27T16:12:52Z

I agree that seems worth considering and potentially problematic but I don't have any expert understanding in that area to contribute.

From my cursory reading it seems that running containers with root credentials makes it more likely that a kernel vulnerability could be exploited to gain access to the node running that container, given that with Docker the kernels are shared?

jlewi · 2018-02-27T21:33:41Z

I think the use cases of

Mounting NFS volumes
Installing additional packages
Image building

Are convincing enough that I think its worth doing until we have more reason to believe its unsafe and should disable it.

aronchick · 2018-02-27T23:57:00Z

cc @mayakacz

mayakacz · 2018-02-28T00:32:52Z

Containers that don't need to run as root shouldn't run as root. See for example: https://docs.docker.com/engine/security/security/#linux-kernel-capabilities

My concern is the earlier comment, "its worth doing until we have more reason to believe its unsafe" - I strongly disagree. Making this kind of change later would be breaking, and it'll be painful to get users to migrate from an unsafe state to a safe state. If we can prevent this initially, that is preferred.

Ideally, we provide the container with only the capabilities it needs, following the principle of least privilege.

cwbeitel · 2018-02-28T01:39:39Z

I agree with seeking to provide the least priv possible and preventing this initially if possible and to prevent breaking changes.

I think the debate should be whether installing system packages is a requirement of the developer experience because people can pip install in a virtual environment, we can mount NFS on startup, and we can probably solve the docker problem without root (e.g. we could launch notebook pods with a docker daemon sidecar a-la-dind).

On the no side perhaps we want to provide a rich base image with all the common tools already installed (e.g. ffmpeg for rendering) and expect that installation of package requiring system-level privs. will be infrequent and involve re-building the workspace container (followed by non-root users launching the container).

On the yes side it seems like it would be nice to have a development environment with the full capabilities of local development.

The worst case scenario seems to be that the user installs malicious packages that lead to an attacker gaining access to the kubernetes credentials for the underlying cluster. It doesn't render the issue moot, perhaps just raising another, but currently this can already happen since people are pulling kube and gcloud credentials to the notebook so they can use kubectl, gcloud, gsutil, etc. utilities.

mayakacz · 2018-02-28T02:48:06Z

A few questions (apologies if I am missing context):

How common do we think this set of packages will be across developers? This seems to be a configuration question - if it's highly custom per deployment, then we might not want to include; but if it is very consistent, then providing the set of packages, that have already been vuln scanned with minimum permissions, is a great idea. For the 'yes' side, this is a commitment to maintain certain security standards and release frequencies with this image - potentially a lot of work.
You mention, "this can already happen since people are pulling kube and gcloud credentials to the notebook". Do you mean people are storing these creds directly in a Jupyter notebook? Is there no built-in/ recommended secret mgmt solution? (or rather, are you saying a solution exists, but people aren't following the best practice?)

cwbeitel · 2018-02-28T04:40:28Z

No worries sorry if I'm missing experience with security!

Yeah that's a good question in terms of system packages these should be pretty common. Everyone debugging their TensorFlow Agents renders in the notebook is going to need libav-tools and ffmpeg, see also extensions of the base notebook image. Users will often need to install pip packages or change the version of those already installed but guessing this should just be done in a conda or other dependency env.

@danijar I'm guessing you don't very often need to run sudo-level operations and that you're primarily installing and changing versions of pip packages, is that correct? How would you feel to develop in an environment without sudo?

@mayakacz Is there any concern about allowing users to install arbitrary python packages at the user level? Perhaps this gets at your second point. Here's an example of such a notebook where "gcloud container clusters --project={PROJECT} --zone={ZONE} get-credentials {CLUSTER}" was called to obtain kube credentials which were stored in the default location. Also this particular cluster was deployed with the cloud platform scope

gcloud beta container clusters create \
       $CLUSTER \
       --project ${PROJECT} \
       --zone=$ZONE \
       ...
       --scopes=https://www.googleapis.com/auth/cloud-platform

which is why you're already authenticated to make gcloud and gsutil calls.

Kubernetes has this mechanism for storing secrets but the end result is just the credential saved to disk the same as results from gcloud container clusters get-credentials...

Given the above would you want to do something like mirror a bunch of popular ML packages and regularly scan those?

cwbeitel · 2018-02-28T04:48:04Z

@jlewi You mentioned NFS, do you anticipate people needing to be able to mount additional NFS volumes to existing notebook containers that aren't pre-attached? Idk maybe you're thinking of a variety of different datasets on different NFS volumes available to mount?

mayakacz · 2018-02-28T15:58:15Z

@cwbeitel "Is there any concern about allowing users to install arbitrary python packages at the user level?"
Not particularly. I assume they know what they want and are then responsible for that security. My push was more if there is something that 50%+ (arbitrary #) are using, then we should do that for them.

Re: credentials storage in secrets, yes, I believe that is the best way to do it with what's available today.

"Given the above would you want to do something like mirror a bunch of popular ML packages and regularly scan those?"
Yes. Having a repository of scanned packages, with some commitment to patch would be great. (It is work though, and I understand if it's not a top priority.)

inc0 · 2018-02-28T17:13:38Z

Thing is, we don't have this security right now - if person wants to run notebook as root, they can specify their own image that'll run as root. In fact, we explicitly run pod as root now and su to notebook user as part of start.sh.

I'd go for running it all as root tbh.

cwbeitel · 2018-02-28T17:31:25Z

Ok cool so far it sounds like providing a rich base container without sudo is the way forward but interested to hear from the others.

Also @jlewi perhaps its not necessary to even build training containers - perhaps we could just share model code with training containers via NFS and either also share dependency code that way or have training containers in their first step pip install a requirements.txt file. This would extend whatever is available in the workspace environment with user's collection of custom packages. The strategy for this not slowing down the initiation of training jobs could be to have the workspace container already include packages like pybullet and tensorflow that take the longest to build with most other python packages being quick to install.

This related to whether users are doing more than installing pip packages in their workspace container because if they're not we don't really need to mess with all this container building infrastructure.

cwbeitel · 2018-02-28T17:37:19Z

@inc0 posted my last before seeing yours. Right, users aren't currently prevented from running root containers. Are you advocating root to avoid restricting your/the development experience? That seems like an important thing to consider.

inc0 · 2018-02-28T17:43:48Z

@cwbeitel as I said on slack, we can do 2 things:

restrict to jovyan user, that'd need us to fix jovyan UID on creation (good idea anyway) and run pod with this uid. That will require that everyone wanting to run their custom images to have this user with this image... hard.
just run as root, there is some security considerations but that's usually how people run pods anyway, containers aren't meant to be security features anyway.

I'd go with root at least for now, will make quick simplification and I can't of any real security issue (if someone jailbreaks out of containers, that's bad tho).

ad nfs share -Yeah, I was kinda wondering why do we want to build container images in containers:) not a typical usage. I'd rather put model on PV rather than nfs share. More k8s'y and would allow to use proper storage infra. NFS can be an option that we might not need to worry about, if someone creates StorageClass with nfs, they're good.

inc0 · 2018-02-28T18:20:39Z

So I'd add cleanups to this issue as well - we have a lot of messing with perms and users in our start.sh script and others - if we run as root both jupyter dockerfile and this could lose a lot of code.

jlewi · 2018-03-01T02:07:35Z

@mayakacz Convinced me that I was being too permissive. I think we should avoid granting sudo privileges so that we don't create more work in terms of locking down the container in the future.

Closes StatCan#31 . This --allow-root option seems unnecessary, as we don't run as root, and the option only seems to be suggested for the that case. See kubeflow/kubeflow#300 too.

Sarrouna · 2022-03-23T17:43:11Z

Hi, sorry for this comment, the issue is already closed, but really I need to you on my issue: jupyterhub/the-littlest-jupyterhub#805
please and thank you a lot

jlewi added help wanted area/jupyter Issues related to Jupyter priority/p2 labels Feb 27, 2018

cwbeitel mentioned this issue Feb 28, 2018

Investigate notebook container sidecar to enable in-notebook container builds #312

Closed

inc0 mentioned this issue Feb 28, 2018

Fix #285 and make few tweaks to notebook Dockerfile #311

Merged

cwbeitel mentioned this issue Feb 28, 2018

Consider including a scanned PyPI package mirror component #313

Closed

jlewi closed this as completed Mar 26, 2018

kimwnasptd mentioned this issue Jan 30, 2019

Prepare Notebook Controller for the new Jupyter UI #2349

Closed

This was referenced May 12, 2020

Restrict access to containers meant for notebook servers to prevent non-KF notebook usage StatCan/aaw#47

Closed

Remove --allow-root ? StatCan/aaw-kubeflow-containers#31

Closed

blairdrummond mentioned this issue May 27, 2020

fix: remove --allow-root option StatCan/aaw-kubeflow-containers#49

Merged

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021

Adding master pod check for default metric collector (kubeflow#300)

8e89813

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022

Update org.yaml (kubeflow#300)

b5527aa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configure JupyterHub spawner to allow root operations by default #300

Configure JupyterHub spawner to allow root operations by default #300

cwbeitel commented Feb 26, 2018

jlewi commented Feb 27, 2018

cwbeitel commented Feb 27, 2018

jlewi commented Feb 27, 2018

aronchick commented Feb 27, 2018

mayakacz commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

mayakacz commented Feb 28, 2018 •

edited

Loading

cwbeitel commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

mayakacz commented Feb 28, 2018

inc0 commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

inc0 commented Feb 28, 2018

inc0 commented Feb 28, 2018

jlewi commented Mar 1, 2018

Sarrouna commented Mar 23, 2022

Configure JupyterHub spawner to allow root operations by default #300

Configure JupyterHub spawner to allow root operations by default #300

Comments

cwbeitel commented Feb 26, 2018

jlewi commented Feb 27, 2018

cwbeitel commented Feb 27, 2018

jlewi commented Feb 27, 2018

aronchick commented Feb 27, 2018

mayakacz commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

mayakacz commented Feb 28, 2018 • edited Loading

cwbeitel commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

mayakacz commented Feb 28, 2018

inc0 commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

cwbeitel commented Feb 28, 2018

inc0 commented Feb 28, 2018

inc0 commented Feb 28, 2018

jlewi commented Mar 1, 2018

Sarrouna commented Mar 23, 2022

mayakacz commented Feb 28, 2018 •

edited

Loading