Docker Snakemake container includes Ancient Singularity Version #652

Closed
nhartwic opened this issue Oct 1, 2020 · 17 comments

Comments

@nhartwic
Contributor

nhartwic commented Oct 1, 2020

I'm trying to use snakemake with tibanna. As part of this, I have some containers that let me run my workflows. I had been using singularity-hub to host my containers, but it has a quota/limit on number of pulls per week. So singularity hub is basically useless to me.

My initial solution was to just host the images in my AWS bucket and pull them using https addresses. New versions of singularity handle this just fine, at least as of 3.5.3, which is the version I tend to use. The version of singularity packaged with the snakemake docker container is 2.6.1. Is there some good reason that the singularity version used is so old, or has it just gone ignored for a while?

@vsoch
Contributor

vsoch commented Oct 1, 2020

Why not pull down a Docker URI to Singularity instead?

The version 2.6.1 was likely the last release that could be installed with apt. Aside from the GoLang refactor, most of the core functionality still works as you’d expect, which is why there hasn’t been a huge drive to change it. I also suspect other rootless container technologies are coming onto the scene that could be a good replacement.

@nhartwic
Contributor Author

nhartwic commented Oct 1, 2020

> Why not pull down a Docker URI to Singularity instead?

Can you elaborate on this? I'm not sure what you mean.

@vsoch
Contributor

vsoch commented Oct 1, 2020

When you specify a container with a docker:// uniform resource identifier (URI) and --use-singularity, the layers are pulled from Docker Hub (or your other OCI registry) down into a Singularity container.
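
As a rough illustration of the approach described above, here is a minimal sketch that writes a one-rule Snakefile whose container is given as a docker:// URI. The rule name, the output file name, and the helper `write_snakefile` are made up for this example; the image is just the public miniconda base image, not one from this thread.

```python
from pathlib import Path
from textwrap import dedent

# Assumption: any image on Docker Hub (or another OCI registry) can go here.
IMAGE_URI = "docker://continuumio/miniconda3"

# A minimal Snakefile. The `container:` directive tells Snakemake which image
# to run the rule in when it is invoked with --use-singularity.
SNAKEFILE = dedent(
    f"""\
    rule conda_version:
        output:
            "conda_version.txt"
        container:
            "{IMAGE_URI}"
        shell:
            "conda --version > {{output}}"
    """
)


def write_snakefile(directory="."):
    """Write the example Snakefile into `directory` and return its path."""
    path = Path(directory) / "Snakefile"
    path.write_text(SNAKEFILE)
    return path
```

After calling `write_snakefile()`, running `snakemake --cores 1 --use-singularity` in that directory should make Singularity pull the Docker layers and run the rule inside the resulting image.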

@vsoch
Contributor

vsoch commented Oct 1, 2020

@nhartwic
Contributor Author

nhartwic commented Oct 1, 2020

Can dockerhub host singularity images? Or are you just suggesting that I convert all of my singularity recipes to dockerfiles and then use dockerhub to host the docker images? To be clear, I'd prefer to stick with singularity since I have more experience with it.

@vsoch
Contributor

vsoch commented Oct 1, 2020

No, it is in fact Singularity pulling down Docker layers and assembling them into a SIF at runtime. So you’d just build a Docker container for your analysis instead, and Singularity can use it.

@nhartwic
Contributor Author

nhartwic commented Oct 1, 2020

I agree that converting my existing images to docker images and hosting them on docker hub is a solution. I'd prefer not to, as I already have a bunch of singularity containers/recipes written and would really prefer to not have to redo them all. I may end up going with this solution, but as an exercise, I'm currently working on building a new snakemake container that includes a modern version of singularity. I can then direct snakemake to use this container instead of the default container when spinning up on aws instances using the "--container-image" flag. If that works, I can just move my existing singularity images to s3 and pull them with https addresses. If it doesn't work, I'll at least have got some more experience dicking around with docker containers.

@vsoch
Contributor

vsoch commented Oct 1, 2020

Sounds like a good plan! If you do work to update the singularity version in the container and think it would be useful to others, please contribute a PR. For Docker Hub, make sure to read about their updated purge policy and build/pull your containers with some frequency so they aren’t purged (GitHub Actions is good for this).

@nhartwic nhartwic closed this as completed Oct 1, 2020
@nhartwic nhartwic reopened this Oct 1, 2020
@nhartwic
Contributor Author

nhartwic commented Oct 1, 2020

Background info, I'm testing this crap using a rule that runs a busco analysis. (as a further test, I used a rule that runs miniasm and that worked completely fine)

I now have a docker container that includes snakemake and a new version of singularity. Snakemake+tibanna correctly uses the container I specify with the "--container-image" flag. My singularity image with busco, call it the Assembly Image, is then pulled successfully from s3 via https and my job starts running. Busco then fails because Busco/Augustus can't write to a file location inside the container. This is a quirk of how Augustus was implemented: model files basically have to be placed in the Augustus install directory.

Here is the thing though, the same image worked fine with the default Snakemake Image. To summarize:

  1. Default Snakemake Image + Assembly Image (built by shub) : works fine and somehow permits writing to the assembly image
  2. New Snakemake Image + Assembly Image (built locally and pushed to s3) : Fails because it somehow can't write to the assembly image

...Any ideas why this is happening? For reference, you can find my Snakemake Image at...

https://github.com/nhartwic/docker-containers
docker://nhartwic/docker-containers

@vsoch
Contributor

vsoch commented Oct 1, 2020

Your Dockerfile doesn't look much different than what snakemake uses, except you've pinned the version, is that correct?

I think if you have two cases, one working and one not, the approach I'd take is to figure out the differences between the two. Perhaps what you can do is:

  1. Use the --skip-script-cleanup flag and debug to grab the exact command that is issued to the container. When I'm debugging I also just add print() statements to a local install directly (e.g., pip install -e . will install a development version from the local folder)
  2. Not using snakemake, pull both your shub image, and the image from Docker, and issue the command.

See if you can reproduce the bug with the above. Then you'll have an idea of how that's related to snakemake (or possibly not).
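
The second step could be scripted along these lines. This is only a sketch: the helper names `exec_argv` and `run_in_images` are made up, and the real command Snakemake issues will usually carry extra options (bind mounts and so on), so the command you paste in should come from step 1.

```python
import subprocess


def exec_argv(image, command):
    """Build the argv for running `command` through bash inside `image`."""
    return ["singularity", "exec", image, "bash", "-c", command]


def run_in_images(images, command):
    """Run the same command in each image; return {image: (exit code, stderr)}."""
    results = {}
    for image in images:
        proc = subprocess.run(
            exec_argv(image, command),
            capture_output=True,
            text=True,
        )
        results[image] = (proc.returncode, proc.stderr.strip())
    return results
```

Calling `run_in_images` with the paths of the two pulled images and the captured command gives a side-by-side view of which image fails and with what message, with no Snakemake involved.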

@nhartwic
Contributor Author

nhartwic commented Oct 2, 2020

I've verified that my Assembly image from shub doesn't work with my Snakemake image. Meaning some difference in the Snakemake image seems to be responsible for this change in behavior...

> Your Dockerfile doesn't look much different than what snakemake uses, except you've pinned the version, is that correct?

The differences basically amount to...

  1. Remove all the logic related to installing singularity
  2. Install singularity using conda along with snakemake
  3. Install snakemake from conda, instead of from source (mostly because it's simpler as my repo isn't connected to snakemake)

...Since this list is so short, it seems like the only candidate here is that singularity 3.X behaves differently from singularity 2.X when it comes to writes to file locations in the image. So apparently there was a good reason snakemake never updated?

@vsoch
Contributor

vsoch commented Oct 2, 2020

I don’t know if that is true, but it’s a good theory if that is indeed the only difference. What exactly is the error message?

@nhartwic
Contributor Author

nhartwic commented Oct 2, 2020

Here is the relevant log snippet...

INFO:   ***** Start a BUSCO v4.0.6 analysis, current time: 10/02/2020 00:57:14 *****
INFO:   Configuring BUSCO with /opt/conda/envs/myenv/config/config.ini
INFO:   Mode is genome
INFO:   Input file is Ta1014.a03.genome.fasta
INFO:   Downloading information on latest versions of BUSCO data...
INFO:   Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/eudicots_odb10.2020-08-05.tar.gz'
INFO:   Decompressing file '/data1/snakemake/busco_downloads/lineages/eudicots_odb10.tar.gz'
ERROR:  Cannot write to Augustus species folder, please make sure you have write permissions to /opt/conda/envs/myenv/config/species
ERROR:  BUSCO analysis failed !
ERROR:  Check the logs, read the user guide, and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues

I'm not sure if you are familiar with Busco, but basically, as part of execution, Busco creates an Augustus gene prediction model and then tries to add it to Augustus by moving the model files into the Augustus config directory. This directory exists inside the singularity image and apparently can't be written to when singularity 3.X is used but can be written to when singularity 2.X is used, assuming my current understanding of what is happening is correct.
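
One way to test that theory outside of Busco is a small write probe run inside each container. The sketch below assumes Python is available in the image; `is_writable` is a made-up helper name, and the directory to probe would be the species folder from the error message above.

```python
import tempfile


def is_writable(directory):
    """Return True if a file can actually be created in `directory`."""
    try:
        # Creating (and immediately discarding) a temporary file exercises the
        # same permission and read-only-filesystem checks a real write would.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories, permission errors and read-only mounts.
        return False
```

Running this under both singularity versions against the same image would show whether the difference really is in how the image filesystem is mounted. A commonly used workaround, not tried in this thread, is to copy the Augustus config directory to a writable location and point the AUGUSTUS_CONFIG_PATH environment variable at the copy.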

@nhartwic
Contributor Author

nhartwic commented Oct 3, 2020

Since my intended solution is apparently not going to work, on to solution two: build docker containers instead of singularity containers. Does anyone have experience wrapping conda envs in a docker container in a way that actually works with singularity as used by snakemake?

My initial attempt is found here...

https://hub.docker.com/repository/registry-1.docker.io/nolanhartwick/salk-containers/builds/097767b4-deae-403d-9b91-2ec9093238b3

...And was derived from...

https://pythonspeed.com/articles/activate-conda-dockerfile/

...But this isn't correctly activating the conda environment inside the singularity container that gets created by a call to "singularity pull". Any advice would be appreciated. Any documentation detailing what actually happens when singularity pulls a docker container would be appreciated.

@vsoch
Contributor

vsoch commented Oct 4, 2020

Hey @nhartwic I can try to help! Can you please provide a dummy Snakefile and any other dependencies, along with the command you are running to reproduce the issue? If I can reproduce and understand there’s a good chance I can help.

@nhartwic
Contributor Author

nhartwic commented Oct 4, 2020

I got there with a bit of trial and error this morning. Here is an example of the dockerfile I'm currently using to wrap a conda env that seems to cooperate with singularity...

FROM continuumio/miniconda3

WORKDIR /app

# specify  environment yml:
COPY edta.yml environment.yml

# build provided env
RUN conda env create -f environment.yml -n myenv
RUN echo "source activate myenv" > ~/.bashrc

# force bash to be used instead of sh
RUN rm /bin/sh && ln -s /bin/bash /bin/sh

# build singularity dir structure to hopefully hack singularity functionality when relevant
RUN mkdir -p /.singularity.d/env
# RUN echo '#!/bin/bash' >> /.singularity.d/env/91-environment.sh
RUN echo '. /opt/conda/etc/profile.d/conda.sh' >> /.singularity.d/env/91-environment.sh
RUN echo 'conda activate myenv' >> /.singularity.d/env/91-environment.sh

# Make RUN commands use the new environment (for docker execution):
SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "-c"]

...A few notes on this...

  1. My hack to force singularity to use bash by default feels dangerous to me. It works though
  2. I'm basically building part of the singularity directory structure even though docker ignores it. I'm also exploiting the implementation of singularity containers. I don't think we are supposed to be directly writing to "91-environment.sh", but whatever, it works
  3. I've no idea if this actually works if you just run the container using docker

...All that said, the end result of this is that any command singularity sends to the container it builds based on the docker container that this docker file specifies will be executed with "myenv" activated using the typical activation method of conda, which should work for essentially any conda env.
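
A quick way to confirm that claim for a given image is to check the environment variable conda sets on activation. This is a sketch under the assumption that the script is run inside the container (for example through singularity exec); `active_conda_env` is a made-up helper name.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda env, or None if none is active.

    `conda activate` exports CONDA_DEFAULT_ENV, so its presence is a simple
    signal that the activation script in /.singularity.d/env was sourced.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")
```

If the activation script was picked up, `active_conda_env()` should return "myenv" when called inside the container; a result of `None` means commands are running without the environment activated.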

@nhartwic
Contributor Author

nhartwic commented Oct 4, 2020

At this point I think I have a solution that works, and the original purpose of this issue has been explored, if not completely. I'm willing to call this issue closed.

@nhartwic nhartwic closed this as completed Oct 4, 2020