Support Singularity containers (authentication) #277

Closed
nathanweeks opened this issue Feb 9, 2018 · 10 comments

@nathanweeks

Singularity is becoming commonplace on High-Performance Computing (HPC) clusters for running containers. Importing rocker/rstudio (via singularity pull docker://rocker/rstudio:3.4.3) results in a functioning RStudio image (via singularity exec rstudio-3.4.3.simg rserver). This is of course insecure on a multiuser HPC cluster; however, setting the PASSWORD environment variable and invoking rserver with --auth-none 0 doesn't result in functioning authentication (note that Singularity containers lack a read/write overlay when not run as root, and a normal user on an HPC cluster won't have root access).
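
For reference, a minimal sketch of those two steps (the generated image filename may differ across Singularity versions):

singularity pull docker://rocker/rstudio:3.4.3
singularity exec rstudio-3.4.3.simg rserver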

A recently added GitHub repository (https://github.com/nickjer/singularity-rstudio) provides a single RStudio Server Singularity image with a PAM helper script that enables password-based authentication using a password set via an environment variable.
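
For illustration, such a helper might look like the following sketch (not the exact script from that repository): rserver invokes the helper with the username as its argument, writes the attempted password to the helper's stdin, and treats an exit status of 0 as successful authentication.

#!/bin/sh
# Hypothetical sketch of an --auth-pam-helper script: rserver passes the
# username as $1 and writes the attempted password to stdin; exit 0 = allow.
[ -n "${RSTUDIO_PASSWORD}" ] || exit 1  # refuse if no password was set
IFS= read -r password                   # password arrives on stdin
[ "$1" = "${USER}" ] && [ "${password}" = "${RSTUDIO_PASSWORD}" ]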

Would it be feasible to add support for such a Singularity-amenable authentication mechanism to Rocker images, so that the breadth of image types and versions provided by the Rocker project can be utilized in an HPC environment?

@eddelbuettel
Member

Part of me feels that there is something about a password in an environment variable that just doesn't taste right. Even if it is just for RStudio authentication.

@cboettig
Member

cboettig commented Feb 9, 2018

@nathanweeks Thanks for the report. This is worth exploring at least, but there are still some things I have to figure out.

I've only run rocker images on Singularity in R console mode, where RStudio authentication is not needed (I imagine most HPC users would be comfortable with terminal R), but I suppose RStudio could work over a port-forwarding tunnel? (I gather most HPC compute nodes would firewall any other incoming connections anyway.) I'm not exactly sure what you have in mind vis-à-vis the RStudio part here -- can you outline this a little more?

In any event, it may make more sense to bypass RStudio's proxy server and run rsession directly. I can follow up with our Singularity folks here when I get a chance.

@nathanweeks
Author

@cboettig : A number of potential users of the HPC cluster in question are scientists (mostly biologists) with (mainly) Windows laptops/desktops that aren't powerful enough for their data-analysis needs, or who would benefit from access to large shared data sets on the cluster, and who are not HPC experts (or even necessarily comfortable with the Unix command line). RStudio Server has been on the wish list for some of these users, but rather than buying additional nodes (on a scant budget) dedicated to RStudio Server that may sit idle or become overloaded, it would be preferable for them to be able to use the compute nodes of the modestly sized cluster like any other job.

Adding to the example in the https://github.com/nickjer/singularity-rstudio README, I've used a SLURM job script similar to the following to start rserver from the Singularity image on a compute node:

#!/bin/sh
#SBATCH --time=08:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8192
#SBATCH --output=/home/%u/rstudio-server.job.%j

export RSTUDIO_PASSWORD=$(openssl rand -base64 15)
# get unused socket per https://unix.stackexchange.com/a/132524
# tiny race condition between the python & singularity commands
readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
cat 1>&2 <<END
1. SSH tunnel from your workstation using the following command:

   ssh -N -L 8787:${HOSTNAME}:${PORT} ${USER}@LOGIN-HOST

   and point your web browser to http://localhost:8787

2. log in to RStudio Server using the following credentials:

   user: ${USER}
   password: ${RSTUDIO_PASSWORD}
END
singularity exec /path/to/singularity-rstudio.simg \
    rserver --www-port ${PORT} --auth-none 0 --auth-pam-helper rstudio_auth
printf 'rserver exited\n' 1>&2

Access is via SSH tunnel through the login node (in our case VPN is also an option).

The generated password should be usable only for the duration of the job, so I think this would be reasonably secure.

@cboettig
Member

@nathanweeks Okay, thanks for the clarification. I'm completely supportive of this use case, just wrapping my head around the details.

So I'm not a security professional, but in this setup, it's not clear to me why you care about having authentication at all. The user has an authenticated, secure SSH connection to the host machine with a port tunnel, right? Why not just do

 singularity exec docker://rocker/rstudio:3.4.3 rserver --auth-none 0

and log in with the default rstudio/rstudio user? This seems to work just fine in my test. I'm not sure what is really gained by the --auth-pam-helper script in this process?

@nathanweeks
Author

nathanweeks commented Feb 12, 2018

@cboettig : users can have a port-forwarding SSH connection to the login node, but direct SSH to a compute node (where RStudio Server would run) is not allowed on our cluster (or, AFAIK, on most HPC clusters); rather, interactive jobs are scheduled & run via SLURM's srun --pty or salloc commands, which lack a facility for port forwarding (though there appears to be a dated SLURM plugin for this: https://github.com/harvardinformatics/spunnel).
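
For example, interactive access typically looks like the following (options illustrative only), which allocates a shell on a compute node but forwards no ports:

srun --pty --time=01:00:00 --ntasks=1 --cpus-per-task=2 bash -l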

One approach that has been used for running Jupyter notebooks on an HPC cluster is reverse port forwarding from a random port on the compute node where the job is scheduled to the same port on the login node. However, this approach is insecure without authentication. It could be made more secure by SSH reverse port forwarding to a Unix domain socket on the login node that is accessible only to the user, though other (interactive) jobs on the same compute node could still connect to the rserver TCP port and pose a security risk (allocating the entire node to the RStudio Server job and listening on a port on localhost could prevent this, but would waste resources). And RStudio Server (in the community version) cannot listen on a Unix domain socket, so one couldn't do SSH reverse port forwarding from a Unix domain socket on the compute node to a Unix domain socket accessible from the login node.
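
As a sketch of that reverse-forwarding approach (LOGIN-HOST is a placeholder; PORT is the randomly chosen rserver port), the job script on the compute node would run something like:

# From the compute node, forward PORT on the login node back to the
# rserver instance listening on the same port locally:
ssh -N -R ${PORT}:localhost:${PORT} ${USER}@LOGIN-HOST &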

[Edit: the approach I described above for RStudio Server has apparently previously been used for Jupyter notebooks, with a different, Jupyter-specific approach to passwords.]

@cboettig
Member

@nathanweeks Thanks for the follow-up. Yeah, I appreciate the difficulty of the port-forwarding strategy when running on the compute node (i.e. since SLURM, and not a user ssh command, is handling the connection between the login node and the compute node), but I don't know enough about SLURM to see how to get around that.

It seems you have found a work-around to that issue(?), which I don't entirely follow. I think you're suggesting that the work-around could however pose a security risk, at least between other authenticated users running interactive jobs on the same compute node, and are thus suggesting a password for this. It looks like this should work 'out-of-the-box', so to speak, with the existing rocker containers (either using the PAM script or just changing the default password).

I must admit that it seems a little strange to think about user security issues on HPC compute nodes -- this would seem to be either a security issue common to any program on a shared compute node (in which case, why worry about this one in particular?) or unique to this use case (in which case I think I'd be making my sysadmin pretty uncomfortable if I were introducing a security vulnerability and then just promising to work around it with a password). Feels to me like there should be a more natural solution?

@nathanweeks
Author

@cboettig , I think the primary security risk of running rserver without authentication on an HPC compute node is that another user's process running on the same compute node (which is expected on HPC clusters that allow node sharing) could connect to the TCP port that rserver is listening on and manipulate files/data accessible to the user running rserver. It seems like a strong, temporary password that is known only to the user running the rserver process should suffice (but I'm certainly not a security expert).

If I'm interpreting the rocker-specific script that sets the password to the value of the PASSWORD environment variable (https://github.com/rocker-org/rocker-versioned/blob/master/rstudio/userconf.sh) correctly, it does this by executing privileged commands in the container, modifying files in /etc, etc. Unless run as root to enable persistent overlays (http://singularity.lbl.gov/docs-overlay), Singularity containers are immutable, so an unprivileged user can't modify files in /etc to change the password.

However, your suggestion of simply using the PAM helper script provided at https://github.com/nickjer/singularity-rstudio (setting --auth-pam-helper-path=/path/to/rstudio_auth.sh) seems to work with the rocker images. I'll have to test this further, but perhaps this is the best solution? If so, would it be problematic to bundle a similar script with rocker for convenience?
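
A rough sketch of such an invocation (paths are placeholders; the helper script is bind-mounted into the container and must be executable):

export RSTUDIO_PASSWORD=$(openssl rand -base64 15)
singularity exec --bind /path/to/rstudio_auth.sh:/usr/local/bin/rstudio_auth.sh \
    docker://rocker/rstudio:3.4.3 \
    rserver --auth-none 0 --auth-pam-helper-path=/usr/local/bin/rstudio_auth.sh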

@cboettig
Member

@nathanweeks Sounds like a good idea to me. Would you like to send a PR to add such a script to the rstudio image?

We also need to figure out how to document this approach. Would you be up for writing a quick overview .md that we could drop into https://github.com/rocker-org/website/tree/master/content/use so it appears on the rocker-project.org menu? (I'm still figuring out what documentation works best for most users; somehow people don't often find the wiki on rocker-org/rocker, but I'm not sure this is better.)

@nathanweeks
Author

Sure, I'll give it a go.

@cboettig
Member

Closing this out since @nathanweeks has provided a solution above, which has now been merged into the web docs.
