Process writes to host home directory which is mounted in the container #734

Open
skrakau opened this issue Sep 3, 2020 · 6 comments
Labels
template nf-core pipeline/component template

Comments

@skrakau
Member

skrakau commented Sep 3, 2020

Hi, within the epitopeprediction pipeline, the tool mhcflurry tries to create a folder within the HOME directory to store some downloaded data:

However, the host HOME directory is mounted into the container, so tools can read from and write to it, potentially unnoticed. From a reproducibility perspective this is not ideal and should probably be avoided.

See also a longer discussion by @lkuchenb, @drpatelh, @pontus and @pditommaso about this topic on the slack help channel.

@pontus
Contributor

pontus commented Sep 3, 2020

My analysis here is that when running docker with -u $(id -u):$(id -g), there is no user information available inside the container, so $HOME will likely be set to /, and anything that relies on creating files there should fail (as far as I understand, this should also happen with Docker on Linux).
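One way to check this from inside a container is a small diagnostic that reports the state of $HOME. The following is only a sketch: `home_status` is a hypothetical helper name, not part of Nextflow, nf-core or Docker, and it only reports what the current shell sees.

```shell
# Report what $HOME looks like from inside the current environment.
# home_status is a hypothetical helper name used for illustration only.
home_status() {
    if [ -z "${HOME:-}" ]; then
        echo "unset"            # HOME is empty or not defined
    elif [ ! -d "$HOME" ]; then
        echo "missing"          # HOME points at a path that is not a directory
    elif [ -w "$HOME" ]; then
        echo "writable"         # tools can create files under HOME
    else
        echo "not-writable"     # tools that write to HOME should fail
    fi
}
```

Running `home_status` as the first command of a task would show whether a tool such as mhcflurry could create files under $HOME in that environment. If $HOME is / and the container user is unprivileged, "not-writable" would be the expected result, but that depends on the image.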

@pditommaso

pditommaso commented Sep 3, 2020 via email

@lkuchenb

lkuchenb commented Sep 3, 2020

IMHO having the host home mounted and set as $HOME in the container is a general source of potential side effects, e.g. RC files for any tool used inside the container may break reproducibility. Mounting a dedicated, empty host folder as $HOME in the container would avoid such issues.
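As a rough illustration of that idea, the sketch below prints (but does not run) a docker command that mounts an empty host directory and points $HOME at it. The function name `clean_home_docker_cmd` and the in-container path /clean_home are assumptions made for this example. The flags `--rm`, `-u`, `-v` and `-e` are standard `docker run` options. Paths containing spaces are not handled.

```shell
# Print a docker command that uses a dedicated, empty host directory as HOME.
# clean_home_docker_cmd and /clean_home are hypothetical names for illustration.
clean_home_docker_cmd() {
    image="$1"       # container image to run
    host_home="$2"   # dedicated, empty directory on the host
    printf 'docker run --rm -u %s:%s -v %s:/clean_home -e HOME=/clean_home %s\n' \
        "$(id -u)" "$(id -g)" "$host_home" "$image"
}
```

For example, `clean_home_docker_cmd some/image:tag "$(mktemp -d)"` prints a command in which the container sees only the empty directory as its home, so no RC files from the host are visible to tools.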

@pontus
Contributor

pontus commented Sep 4, 2020

I also think pointing HOME somewhere would be best (new temporary directory or possibly the current work directory for simplicity).
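A minimal, engine-independent sketch of the temporary-directory variant could look like the following. `run_with_clean_home` is a hypothetical wrapper name, and the cleanup step assumes nothing written to the temporary HOME needs to be kept after the command finishes.

```shell
# Run a command with HOME pointed at a fresh, empty temporary directory.
# run_with_clean_home is a hypothetical helper name for illustration.
# The function body runs in a subshell so its variables do not leak out.
run_with_clean_home() (
    clean_home="$(mktemp -d)" || exit 1
    status=0
    # env sets HOME only for the wrapped command
    env HOME="$clean_home" "$@" || status=$?
    # remove the temporary HOME and pass on the command's exit status
    rm -rf "$clean_home"
    exit "$status"
)
```

Usage would be along the lines of `run_with_clean_home some_tool --some-option`. Using the current work directory instead would mean replacing the `mktemp -d` call with a directory under `$PWD` and dropping the cleanup.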

Configuring each tool individually would rely on additional testing and would introduce unnecessary disparities between the various runtime environments (docker, singularity, conda).

As mentioned, tools reading RC files can break reproducibility in a way that seems difficult to catch in testing.

@ewels
Member

ewels commented Feb 17, 2021

This looks very similar to an issue mentioned on Slack by @maxibor

@skrakau - did you try creating an empty directory in the container and then setting the $HOME env variable to that in the Dockerfile? E.g. a super crude example that would probably break, but hopefully you get the idea:

RUN mkdir -p /home
RUN chmod 777 /home
ENV HOME=/home

If this is a general solution for this problem then we could add this to the nf-core base docker image that all other custom images are built from and it should work for everyone. Maybe. Might cause other problems.

@ewels ewels added the template nf-core pipeline/component template label Feb 17, 2021
@pontus
Contributor

pontus commented Feb 17, 2021

For consistency (with respect to not reading random configuration files), it's probably desirable to have docker and singularity behave similarly (ignore bound directories/set up home). That likely means a static solution runs into issues with read-only mounts for other engines (e.g. singularity).

I'm not sure if we could set something up that will be sourced by the called process, but don't immediately see a solution to do it for docker.


5 participants