
Implement Singularity Support #4175

Merged
merged 1 commit into galaxyproject:dev from jmchilton:singularity on Jun 10, 2017

Conversation

@jmchilton (Member) commented Jun 9, 2017

See http://singularity.lbl.gov/ for more information on Singularity and see the comments added to job_conf.xml.sample_advanced for information on setting up job runners to exploit Singularity.

The biggest current caveat is probably that these container images need to be set up manually by the admin and the paths hard-coded for each tool in job_conf.xml. There are people who do such manual setups for Docker (https://github.com/phnmnl/container-galaxy-k8s-runtime/blob/develop/config/job_conf.xml) - so it wouldn't be surprising if someone wanted to set this up for Singularity as well. That said, I'm sure this will be followed up by magic to fetch and convert Docker containers and leverage published Singularity containers (such as those mulled can now produce, galaxyproject/galaxy-lib#64 - thanks to @bgruening). Work on maintaining Singularity image caches would really benefit from completing #3673 for Docker first IMO.

@bgruening (Member) commented Jun 10, 2017

Welcome to the @jmchilton games! :-) I will get the Galaxy Docker testing done.
What do you think about a simple tool_deps/_singularity/ dir to start with, where Galaxy searches for mulled names and picks them up automatically?

@jmchilton (Member, Author) commented Jun 10, 2017

@bgruening Well, I'd like to auto-convert Docker images as needed and also auto-build Singularity images as needed, so how about...

database/container_cache/singularity/mulled

This way

database/container_cache/singularity/from_docker

can be used for auto-converted Docker images and down the road we can use

database/container_cache/docker

for #3673.

I'll redo two of the three mulled container resolvers for Singularity - the first one will just check that directory for existing images (which you can pre-populate for testing if you'd like), and the second will build an image as needed using your work from yesterday. Down the road a third would check for a published Singularity mulled image - but we need to publish them first, I think :).
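
A rough sketch of that first resolver (not code from this PR - the function name and cache location below are illustrative only): it just checks the proposed cache directory for an existing image and returns its path if found.

```python
import os

# Illustrative cache location following the layout proposed above.
MULLED_SINGULARITY_CACHE = "database/container_cache/singularity/mulled"


def resolve_cached_singularity_image(image_identifier, cache_dir=MULLED_SINGULARITY_CACHE):
    """Return the path of a pre-built Singularity image for ``image_identifier``,
    or None if the cache (which an admin can pre-populate) does not contain one.

    In practice the mulled identifier would need normalizing into a safe
    filename; this sketch assumes it is already filename-safe.
    """
    candidate = os.path.join(cache_dir, "%s.img" % image_identifier)
    return candidate if os.path.exists(candidate) else None
```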

@jmchilton (Member, Author) commented Jun 10, 2017

@bgruening I'm going to keep this PR clean because this part doesn't require the mulled enhancements we've been working on in #4173 - so I've started a new branch for testing mulled+singularity (https://github.com/jmchilton/galaxy/tree/singularity_mulled) which has this commit added on top of #4173. I'll let you know if I make any progress.

@bgruening (Member) commented Jun 10, 2017

John, what do you mean by 'from_docker'? Singularity2Docker is an ugly hack; I'm not sure it is worth supporting. And why in database/? This seems more like tool_deps, no?

@jmchilton (Member, Author) commented Jun 10, 2017

I may be wrong that we need a cache for Docker images - I haven't looked into this closely. But the newest Singularity supports this conversion natively, right (http://singularity.lbl.gov/docs-docker)? If there are intermediate files that can be cached to improve the performance of that conversion, we could use that directory for them. Regardless, I'd like to maintain a cache for non-mulled Singularity options - going forward I'd like to support other, more ad hoc modalities of container specification, the way we do for Docker.
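
As a hedged illustration of that conversion idea (not part of this PR - the helper name, cache path, and filename scheme are hypothetical, and the pull options assume a Singularity 2.3-era CLI), an auto-conversion step could shell out to Singularity's native Docker support and drop the result into the proposed from_docker directory:

```python
import os
import subprocess

# Hypothetical cache location following the layout proposed above.
FROM_DOCKER_CACHE = "database/container_cache/singularity/from_docker"


def singularity_image_from_docker(docker_identifier):
    """Convert a Docker image to a Singularity image once and reuse it afterwards.

    Relies on Singularity's native Docker support (singularity pull docker://...),
    see http://singularity.lbl.gov/docs-docker; the --name option is assumed to
    behave as in Singularity 2.3 and may differ in other versions.
    """
    if not os.path.isdir(FROM_DOCKER_CACHE):
        os.makedirs(FROM_DOCKER_CACHE)
    # e.g. "quay.io/biocontainers/samtools:1.4" -> "quay.io_biocontainers_samtools_1.4.img"
    image_name = docker_identifier.replace("/", "_").replace(":", "_") + ".img"
    image_path = os.path.join(FROM_DOCKER_CACHE, image_name)
    if not os.path.exists(image_path):
        subprocess.check_call(
            ["singularity", "pull", "--name", image_name, "docker://%s" % docker_identifier],
            cwd=FROM_DOCKER_CACHE,
        )
    return image_path
```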

@bgruening (Member) commented Jun 10, 2017

For sure we should support more use cases than the mulled-based ones. I thought you were talking about https://github.com/singularityware/docker2singularity.
I'm not aware of any intermediate flat files, but I'm also not an expert. There is one folder under /usr/local/var/singularity/mnt/container which is used, but that path is hardcoded at compile time. I'm in contact with the devs to make it configurable, but afaik it also does not store Docker images, and as soon as the image is generated it is no longer needed. There is a strict distinction between build time and run time, and the images can be shared and transferred via NFS for example, which is not possible with Docker images, afaik.

Regarding database/container_cache/docker, I'm also not aware of any way to cache Docker images and distribute them across the cluster.

@bgruening (Member) commented Jun 10, 2017

Looks great, thanks John!

@bgruening merged commit 3bff7f3 into galaxyproject:dev on Jun 10, 2017

5 checks passed:

api test - Build finished. 279 tests run, 0 skipped, 0 failed.
continuous-integration/travis-ci/pr - The Travis CI build passed.
framework test - Build finished. 150 tests run, 0 skipped, 0 failed.
integration test - Build finished. 34 tests run, 0 skipped, 0 failed.
toolshed test - Build finished. 579 tests run, 0 skipped, 0 failed.

@jmchilton (Member, Author) commented Jun 12, 2017

Thanks for your wisdom regarding Singularity, and I'm happy you are open to non-mulled Singularity deployment options. I'm sure we both agree the mulled use case should be the best practice and the primary initial focus, though.

> Regarding database/container_cache/docker, I'm also not aware of any way to cache Docker images and distribute them across the cluster.

Galaxy can already be configured to do this (89f8d36) - the implementation is a bit hacky but it seemed like a very important optimization for large workloads and doing Docker at scale. Kyle's use case back when we added this made a lot of sense - if you have 40 workers each pulling the same image down independently, it is probably much slower (and maybe more expensive) than pulling it down once and then sharing it over faster local NFS. From a reproducibility standpoint it should also be clear that it is better if you know every worker is going to use the same container. I really want to take this further - add a GUI and a management API, always prefetch containers the way we prebuild dependencies, etc.
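
To make that optimization concrete (this is only a sketch of the idea, not the implementation in 89f8d36 - the shared directory and helper below are hypothetical), one node can export a pulled image to a shared mount with docker save and every other worker can docker load it from there instead of hitting the registry:

```python
import os
import subprocess

# Hypothetical shared (e.g. NFS) location for exported image tarballs.
SHARED_IMAGE_DIR = "/galaxy/shared/docker_image_cache"


def ensure_image(image):
    """Make ``image`` available locally, pulling from the registry only if no
    worker has exported it to the shared cache yet."""
    tarball = os.path.join(
        SHARED_IMAGE_DIR, image.replace("/", "_").replace(":", "_") + ".tar"
    )
    if os.path.exists(tarball):
        # Later workers read from the fast local mount instead of quay.io/Docker Hub.
        subprocess.check_call(["docker", "load", "-i", tarball])
    else:
        subprocess.check_call(["docker", "pull", image])
        subprocess.check_call(["docker", "save", "-o", tarball, image])
```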

Also, I didn't respond to why IMO this should be in database/ instead of the tool_dependency_dir. Nesting makes some sense conceptually, but in practice it seems very likely that you'd want some nodes to run container images and other nodes to run with Conda dependencies - so you'd potentially want them on different mounts accessible to different clusters, which seems easier if they aren't nested. Also, the containers are generated from the dependencies but they (usually) cause the dependencies to be ignored - so even from a conceptual standpoint I'm not convinced they should be nested.

Anyway - thanks for the merge and I'll follow up with a second PR in a bit adding the mulled stuff you graciously debugged and tested.

@jmchilton deleted the jmchilton:singularity branch on Jun 12, 2017

@bgruening (Member) commented Jun 12, 2017

> Galaxy can already be configured to do this (89f8d36) - the implementation is a bit hacky but it seemed like a very important optimization for large workloads and doing Docker at scale. [...]

I don't think this is very practical in production, and it comes with a few implications. I think what people will do instead is host a local registry and mirror important images on the local network. But fair enough to keep this functionality.

> Also, I didn't respond to why IMO this should be in database/ instead of the tool_dependency_dir. Nesting makes some sense conceptually, but in practice it seems very likely that you'd want some nodes to run container images and other nodes to run with Conda dependencies [...]

I think you assume too much knowledge about the containers from the admin. Imho people will see containers, pretty much like Conda, as a way of providing dependencies. They should not care, and don't need to know, where they were derived from. Intuitively I would look for such things in tool_deps and not in database. If you want an advanced setup that splits Conda and Singularity images across different nodes, you can change the default config path in your galaxy.ini, right? We are talking here about the standard/default location, aren't we? Imho this should be intuitive by default - complex for advanced setups.

@jmchilton (Member, Author) commented Jun 13, 2017

> I don't think this is very practical in production, and it comes with a few implications. I think what people will do instead is host a local registry and mirror important images on the local network. But fair enough to keep this functionality.

Perhaps - I'd like to hear the implications. A local registry may be better - I hadn't considered it and I'm unaware of the pros and cons versus disk images. I think we can agree that, at large scale, the current setup of talking to quay.io and Docker Hub directly doesn't scale well.

> We are talking here about the standard/default location, aren't we? Imho this should be intuitive by default - complex for advanced setups.

Indeed - though I don't find the current setup any less intuitive than nesting the container caches under the tool dependencies - and perhaps I even think it is more intuitive to have them separate.

@bgruening (Member) commented Jun 13, 2017

You basically lose all the advantages of the layered file system. Consider a node pulling two images that share a common base image: with a registry you will only pull the diff. Deploying a registry also seems to be as easy as:

docker run -p 5000:5000 --restart=always registry:2

But I feel uncomfortable arguing about this - I'm not an expert here, just saying what people tend to use.
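
For illustration (not from this thread - the image tag in the usage comment is a placeholder), mirroring an upstream image into such a local registry so workers only transfer layer diffs over the local network is just a pull/tag/push:

```python
import subprocess

LOCAL_REGISTRY = "localhost:5000"  # matches the registry started above


def mirror_to_local_registry(image, registry=LOCAL_REGISTRY):
    """Pull an upstream image once, then push it to the local registry so
    workers can pull it (and shared layers) from the local network."""
    local_name = "%s/%s" % (registry, image.split("/", 1)[-1])
    subprocess.check_call(["docker", "pull", image])
    subprocess.check_call(["docker", "tag", image, local_name])
    subprocess.check_call(["docker", "push", local_name])
    return local_name


# Example usage (hypothetical image tag); workers would then docker pull the returned name:
# mirror_to_local_registry("quay.io/biocontainers/samtools:1.4--0")
```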

> Indeed - though I don't find the current setup any less intuitive than nesting the container caches under the tool dependencies - and perhaps I even think it is more intuitive to have them separate.

Why? My argument is that Conda, Docker, and Singularity are all the same stuff - they power our tools, just based on different technologies - so why are we making this arbitrary split? Taking your argument, why should data be in the same folder as an image cache? People probably want to have that separated as well.
