Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure JupyterHub spawner to allow root operations by default #300

Closed
cwbeitel opened this issue Feb 26, 2018 · 17 comments
Closed

Configure JupyterHub spawner to allow root operations by default #300

cwbeitel opened this issue Feb 26, 2018 · 17 comments
Labels

Comments

@cwbeitel
Copy link
Contributor

Regarding notebook server spawner options https://github.com/kubeflow/kubeflow/blob/master/kubeflow/core/jupyterhub_spawner.py#L79 currently c.KubeSpawner.args is being provided the arg"--allow-root" presumably because the original implementer meant for users to have root access in containers. But currently this is not available.

By adding "GRANT_SUDO=yes" to the environment users do have sudo privs in spawned notebooks, e.g. can successfully run sudo apt-get update:

###################################################
# Spawner Options
###################################################
c.JupyterHub.spawner_class = KubeFormSpawner
c.KubeSpawner.singleuser_image_spec = 'gcr.io/kubeflow/tensorflow-notebook'
c.KubeSpawner.cmd = 'start-singleuser.sh'
c.KubeSpawner.args = ['--allow-root']
# First pulls can be really slow, so let's give it a big timeout
c.KubeSpawner.start_timeout = 60 * 10
c.KubeSpawner.environment = {
    "GRANT_SUDO": "yes"
}

If making this addition should determine if there is anything in c.KubeSpawner.environment to over-write and if so append the variable instead of setting all.

Related to #289

@jlewi
Copy link
Contributor

jlewi commented Feb 27, 2018

Seems useful as this would also allow people to install packages e.g. via apt-get.

Any info on what the security implications are?

@cwbeitel
Copy link
Contributor Author

I agree that seems worth considering and potentially problematic but I don't have any expert understanding in that area to contribute.

From my cursory reading it seems that running containers with root credentials makes it more likely that a kernel vulnerability could be exploited to gain access to the node running that container, given that with Docker the kernels are shared?

@jlewi
Copy link
Contributor

jlewi commented Feb 27, 2018

I think the use cases of

  • Mounting NFS volumes
  • Installing additional packages
  • Image building

Are convincing enough that I think its worth doing until we have more reason to believe its unsafe and should disable it.

@jlewi jlewi added help wanted area/jupyter Issues related to Jupyter priority/p2 labels Feb 27, 2018
@aronchick
Copy link
Contributor

cc @mayakacz

@mayakacz
Copy link

Containers that don't need to run as root shouldn't run as root. See for example: https://docs.docker.com/engine/security/security/#linux-kernel-capabilities

My concern is the earlier comment, "its worth doing until we have more reason to believe its unsafe" - I strongly disagree. Making this kind of change later would be breaking, and it'll be painful to get users to migrate from an unsafe state to a safe state. If we can prevent this initially, that is preferred.

Ideally, we provide the container with only the capabilities it needs, following the principle of least privilege.

@cwbeitel
Copy link
Contributor Author

I agree with seeking to provide the least priv possible and preventing this initially if possible and to prevent breaking changes.

I think the debate should be whether installing system packages is a requirement of the developer experience because people can pip install in a virtual environment, we can mount NFS on startup, and we can probably solve the docker problem without root (e.g. we could launch notebook pods with a docker daemon sidecar a-la-dind).

On the no side perhaps we want to provide a rich base image with all the common tools already installed (e.g. ffmpeg for rendering) and expect that installation of package requiring system-level privs. will be infrequent and involve re-building the workspace container (followed by non-root users launching the container).

On the yes side it seems like it would be nice to have a development environment with the full capabilities of local development.

The worst case scenario seems to be that the user installs malicious packages that lead to an attacker gaining access to the kubernetes credentials for the underlying cluster. It doesn't render the issue moot, perhaps just raising another, but currently this can already happen since people are pulling kube and gcloud credentials to the notebook so they can use kubectl, gcloud, gsutil, etc. utilities.

@mayakacz
Copy link

mayakacz commented Feb 28, 2018

A few questions (apologies if I am missing context):

  • How common do we think this set of packages will be across developers? This seems to be a configuration question - if it's highly custom per deployment, then we might not want to include; but if it is very consistent, then providing the set of packages, that have already been vuln scanned with minimum permissions, is a great idea. For the 'yes' side, this is a commitment to maintain certain security standards and release frequencies with this image - potentially a lot of work.
  • You mention, "this can already happen since people are pulling kube and gcloud credentials to the notebook". Do you mean people are storing these creds directly in a Jupyter notebook? Is there no built-in/ recommended secret mgmt solution? (or rather, are you saying a solution exists, but people aren't following the best practice?)

@cwbeitel
Copy link
Contributor Author

No worries sorry if I'm missing experience with security!

Yeah that's a good question in terms of system packages these should be pretty common. Everyone debugging their TensorFlow Agents renders in the notebook is going to need libav-tools and ffmpeg, see also extensions of the base notebook image. Users will often need to install pip packages or change the version of those already installed but guessing this should just be done in a conda or other dependency env.

@danijar I'm guessing you don't very often need to run sudo-level operations and that you're primarily installing and changing versions of pip packages, is that correct? How would you feel to develop in an environment without sudo?

@mayakacz Is there any concern about allowing users to install arbitrary python packages at the user level? Perhaps this gets at your second point. Here's an example of such a notebook where "gcloud container clusters --project={PROJECT} --zone={ZONE} get-credentials {CLUSTER}" was called to obtain kube credentials which were stored in the default location. Also this particular cluster was deployed with the cloud platform scope

gcloud beta container clusters create \
       $CLUSTER \
       --project ${PROJECT} \
       --zone=$ZONE \
       ...
       --scopes=https://www.googleapis.com/auth/cloud-platform

which is why you're already authenticated to make gcloud and gsutil calls.

Kubernetes has this mechanism for storing secrets but the end result is just the credential saved to disk the same as results from gcloud container clusters get-credentials...

Given the above would you want to do something like mirror a bunch of popular ML packages and regularly scan those?

@cwbeitel
Copy link
Contributor Author

@jlewi You mentioned NFS, do you anticipate people needing to be able to mount additional NFS volumes to existing notebook containers that aren't pre-attached? Idk maybe you're thinking of a variety of different datasets on different NFS volumes available to mount?

@mayakacz
Copy link

@cwbeitel "Is there any concern about allowing users to install arbitrary python packages at the user level?"
Not particularly. I assume they know what they want and are then responsible for that security. My push was more if there is something that 50%+ (arbitrary #) are using, then we should do that for them.

Re: credentials storage in secrets, yes, I believe that is the best way to do it with what's available today.

"Given the above would you want to do something like mirror a bunch of popular ML packages and regularly scan those?"
Yes. Having a repository of scanned packages, with some commitment to patch would be great. (It is work though, and I understand if it's not a top priority.)

@inc0
Copy link

inc0 commented Feb 28, 2018

Thing is, we don't have this security right now - if person wants to run notebook as root, they can specify their own image that'll run as root. In fact, we explicitly run pod as root now and su to notebook user as part of start.sh.

I'd go for running it all as root tbh.

@cwbeitel
Copy link
Contributor Author

Ok cool so far it sounds like providing a rich base container without sudo is the way forward but interested to hear from the others.

Also @jlewi perhaps its not necessary to even build training containers - perhaps we could just share model code with training containers via NFS and either also share dependency code that way or have training containers in their first step pip install a requirements.txt file. This would extend whatever is available in the workspace environment with user's collection of custom packages. The strategy for this not slowing down the initiation of training jobs could be to have the workspace container already include packages like pybullet and tensorflow that take the longest to build with most other python packages being quick to install.

This related to whether users are doing more than installing pip packages in their workspace container because if they're not we don't really need to mess with all this container building infrastructure.

@cwbeitel
Copy link
Contributor Author

@inc0 posted my last before seeing yours. Right, users aren't currently prevented from running root containers. Are you advocating root to avoid restricting your/the development experience? That seems like an important thing to consider.

@inc0
Copy link

inc0 commented Feb 28, 2018

@cwbeitel as I said on slack, we can do 2 things:

  1. restrict to jovyan user, that'd need us to fix jovyan UID on creation (good idea anyway) and run pod with this uid. That will require that everyone wanting to run their custom images to have this user with this image... hard.

  2. just run as root, there is some security considerations but that's usually how people run pods anyway, containers aren't meant to be security features anyway.

I'd go with root at least for now, will make quick simplification and I can't of any real security issue (if someone jailbreaks out of containers, that's bad tho).

ad nfs share -Yeah, I was kinda wondering why do we want to build container images in containers:) not a typical usage. I'd rather put model on PV rather than nfs share. More k8s'y and would allow to use proper storage infra. NFS can be an option that we might not need to worry about, if someone creates StorageClass with nfs, they're good.

@inc0
Copy link

inc0 commented Feb 28, 2018

So I'd add cleanups to this issue as well - we have a lot of messing with perms and users in our start.sh script and others - if we run as root both jupyter dockerfile and this could lose a lot of code.

@jlewi
Copy link
Contributor

jlewi commented Mar 1, 2018

@mayakacz Convinced me that I was being too permissive. I think we should avoid granting sudo privileges so that we don't create more work in terms of locking down the container in the future.

@jlewi jlewi closed this as completed Mar 26, 2018
blairdrummond added a commit to blairdrummond/kubeflow-containers that referenced this issue May 27, 2020
Closes StatCan#31 . This --allow-root option seems unnecessary, as we don't
run as root, and the option only seems to be suggested for the that
case.

See kubeflow/kubeflow#300 too.
@Sarrouna
Copy link

Hi, sorry for this comment, the issue is already closed, but really I need to you on my issue: jupyterhub/the-littlest-jupyterhub#805
please and thank you a lot

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants