Syncing with galaxy-docker-project #2
Hi guys! I gave this a first try today (well, actually, this would be like my third try so far; I have attempted this in the past but have been a bit overwhelmed by the amount of detail needed in contrast to my availability and other priorities), but it is going to require some work. Basically what we are doing currently in the phenomenal helm chart setup is:
These conditional variables are currently taken care of by logic inside the phenomenal container
On the build side of our container (so not in the runtime of the container in k8s), it currently does:
I think this covers most of it. I have started efforts to duplicate the env variables injected via helm to cover most of the needed GALAXY_CONFS_*, but beyond deciding when to trigger most of that functionality, I need to be able to use a particular git revision of a defined galaxy git repo (fork). Sometimes my PRs for k8s don't make it into the galaxy releases in time, so I need to use certain releases with some of my commits on top to deliver functionality for our own releases on schedule. This is easy in the current scenario where I control our galaxy container, but would be more complex (or I fail to see if it is possible) when moving to the docker-galaxy-stable ones.

I have started a flavour inside compose for a galaxy-k8s image which is derived from galaxy-base and has most of galaxy-web's functionality, but gets rid of slurm and other schedulers that we (in our project) don't need. Maybe we should discuss where in the hierarchy of compose images this should go; some containers further down the line could later add the required pieces for other schedulers. confd sounds like a good idea, but I would go by parts I guess, to have something functional soon and later introduce more sophistication. Orchestrating other containers (like the ftp part) shouldn't be a problem, but I would first aim to sort out all the issues above.

The main complexity is that there is some loose coupling between the helm chart version and the galaxy container to be used, and I need to keep maintaining the working ones as well (happily both objects are versioned, so this shouldn't be an issue). While I'm eager to integrate more with the docker-galaxy-stable compose containers as discussed with @bgruening, moving away from helm is a no-no for me (for the reasons I showed to @bgruening yesterday in Paris) and would seriously hamper my ability to pursue this further integration (as my main responsibility is to have something working on PhenoMeNal). I hope that this is useful! |
@pcm32 Thanks for this excellent write-up Pablo, it's very detailed and really helpful! I tend to agree that we should start with something simple and functional, and refactor as necessary to introduce more complexity as required. I'm hoping that the Helm chart will completely insulate us from those issues, and allow us to refactor the container arrangement as necessary.

One issue that comes to mind is how to handle different flavours of Galaxy. This is particularly important for the GVL/Galaxy-on-the-cloud, to achieve the desired level of feature parity with existing CloudMan deployments. In the current incarnation of GVL/Galaxy-on-the-cloud, this is achieved by using a VM image, which is matched at runtime to a user-selected tarball containing Galaxy plus the tool database, resulting in the "flavour". In addition, it can also have prepopulated datasets and whatever else is desired. This has the added advantage that the database schema does not have to be created at runtime, making startup much faster. Although we've been using a single compressed postgres database so far, @jmchilton suggested that a different approach would be to have the tool database in sqlite, which could simply be mounted into the Galaxy container. I think that sounds like a much better option. A drawback is that we won't be able to have any other pre-populated artefacts, like workflows, shared histories etc.

So far, the current phnmnl container loads the workflows into the database at runtime, and does not use any toolshed tools, is that correct? My previous experience has been that it's not very practical to install tools from the toolshed at runtime - it takes a really long time to install a considerable number of tools - although I'm not sure whether this can be significantly cut down by using the Dockerized tool versions. @bgruening What approach are you using? What would be a good way to achieve an effect similar to the above? How can we dynamically extract a tarball containing the database and link it up to the container? Or is there a more desirable option? |
I will try to answer all points but will skip the
I guess this can be fixed by changing the configs in
Solved, with GALAXY_CONFIG
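For context, a minimal sketch (assuming the GALAXY_CONFIG_* override convention of the compose images, with illustrative option and value names) of how such settings could be injected from a Helm-templated Deployment:

```yaml
# Sketch: env block of a container in a Helm-templated Deployment (option names and values are illustrative)
containers:
  - name: galaxy-web
    image: "{{ .Values.galaxy.image }}"
    env:
      # GALAXY_CONFIG_<OPTION> overrides the matching galaxy.ini option at container startup
      - name: GALAXY_CONFIG_ADMIN_USERS
        value: "{{ .Values.galaxy.adminUsers }}"
      - name: GALAXY_CONFIG_BRAND
        value: "PhenoMeNal"
      - name: GALAXY_CONFIG_ALLOW_USER_CREATION
        value: "true"
```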
This is already possible. If you mount files into the /export/ directory they will be picked up.
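For illustration, a hedged compose-style sketch of such a mount (host path and service/image names are assumptions):

```yaml
# Sketch: bind-mounting a host directory into /export/ (paths and names are illustrative)
services:
  galaxy-web:
    image: quay.io/bgruening/galaxy-web   # image name is an assumption
    volumes:
      - /data/galaxy-export:/export/      # files placed here on the host are picked up by the container
```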
Or we should use this. The startup script is really complex nowadays ... on the other hand ansible + python is quite big. Not sure about this point.
Jupp, let's add this.
On the build side of our container (so not in the runtime of the container in k8s), it currently does:
Is this a big problem? I'm happy to update our images; the only reason I have not done this (or am conservative about it) is that users would need to migrate their database to a potentially new postgresql version.
We can add those during runtime with
Would be nice to use this to also slim down our images.
You can do this with:
Oh yes, this should be fixed. Ideally in Galaxy main.
Do they need to be in the container, or can we mount them into /export/, assuming we still need manually edited config files?
We have solved this here:
I'm wondering why this is needed.
This could be used:
Yeah, this should be part of the Phenomenal flavor.
What is this? If this is general testing, can we move it to a separate testing container, as here for example: https://github.com/bgruening/docker-galaxy-stable/tree/master/test
This should be upstreamed into Galaxy. You can also mount this in via
Let me know if I can help here. In general everything should be configurable via ENV.
Easy to do with: https://github.com/bgruening/docker-galaxy-stable/blob/master/compose/buildlocal.sh#L5 or with Docker ARGs: https://github.com/bgruening/docker-galaxy-stable/blob/master/galaxy/Dockerfile#L14
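As a hedged illustration, a compose-style sketch of pinning a fork and revision at build time (the ARG names follow the linked Dockerfile but should be treated as assumptions):

```yaml
# Sketch: build args selecting a Galaxy fork and revision (ARG names assumed, values illustrative)
services:
  galaxy-web:
    build:
      context: ./galaxy
      args:
        GALAXY_REPO: https://github.com/pcm32/galaxy    # fork carrying the pending k8s commits
        GALAXY_RELEASE: release_17.09_plus_k8s          # illustrative branch/tag name
```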
Please make a proposal, this should be easy to sort out I hope.
Fully agree.
I'm wondering if we could generate a helm chart out of the yaml and the metadata which we already have. So that we provide some small tool ... |
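For instance, such a tool might emit a small values.yaml along these lines (a sketch only; all field and image names are illustrative):

```yaml
# Sketch: values.yaml a compose-to-chart generator could produce (names illustrative)
web:
  image: quay.io/bgruening/galaxy-web
  tag: latest
  port: 80
postgres:
  image: quay.io/bgruening/galaxy-postgres
  tag: latest
```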
I don't see why this is different from the current state. Flavors are just adding tools/workflows to a bare, high-quality instance.
Have a look at how a flavor is created here.
It never gets created at startup time.
I guess I'm missing something here. Why is this so complicated? We create predefined flavors and pull these containers down if needed, no?

So far, the current phnmnl container loads the workflows into the database at runtime, and does not use any toolshed tools, is that correct? My previous experience has been that it's not very practical to install tools from the toolshed at runtime - it takes a really long time to install a considerable number of tools - although I'm not sure whether this can be significantly cut down by using the Dockerized tool versions. @bgruening What approach are you using?

We install tools during build time. This is fast, as it is just downloading the conda packages. However, it makes the image big/huge. If size matters we can put the conda-envs into CVMFS and share it, or install them simply via Conda at tool-runtime.
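For reference, a hedged sketch of the kind of tool list that is consumed at build time when baking a flavor (tool entries are illustrative):

```yaml
# Sketch: build-time tool list for a flavor (entries are illustrative)
tools:
  - name: fastqc
    owner: devteam
    tool_panel_section_label: "Quality Control"
  - name: bowtie2
    owner: devteam
    tool_panel_section_label: "Mapping"
```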
I think this is way too complicated. Let's create prebuilt images and pull them down as needed. |
Thanks @bgruening and @nuwang for the excellent feedback... I'll try to advance based on the suggestions that @bgruening added in-line. I think I have most of the helm changes needed by now. |
@bgruening I don't mind about ubuntu:14.04, I was just listing all the steps for the sake of completion. |
We are going to have to think of ways to handle database upgrades transparently, and while I don't think we need to solve that problem during the first iteration, it would be good to keep a discussion going on strategies for handling this. This is a discussion on the official Postgres docker image: docker-library/postgres#37
Ok, so you are using a bundled sqlite database as the tool database, and installing all tools into the Galaxy container? If this scales well, I don't see an insurmountable problem. For comparison, how big is the tool database on Galaxy main?
A few reasons - my understanding was that these container sizes were very large, in the order of 10GB+ when there are a sizeable number of tools. That means we are looking at a 20+ minute install time, vs < 7 minutes, which has generally been our performance target (up from the original 2 minutes that CloudMan took in the days of using volume snapshots). The CVMFS option sounds good. The second reason is that it doesn't address how we can have a pre-populated user database, say workflows. Again, importing at startup may be ok, but it impacts startup time, which is why I was wondering whether there was a good solution for handling these kinds of pre-populated databases that would generalise well to any database-connected app, in addition to Galaxy. This also relates to the upgrade problem above.
Agree, this seems like the sane approach to start with. |
If we are here talking about upgrading postgresql databases, this is only relevant for persistent Galaxy instances running over many years. And in this case I assume that an admin knows how to upgrade a postgresql database from one version to the other. I don't think this is a problem we need to solve, especially not now.
No sqlite. Everything is in postgresql. And these are just entries in a DB, some rows, maybe a few thousand. This is imho negligible in size.
The container size is only large if you include "tool-dependencies". This is not a must anymore.
Correct me if I'm wrong, but at any time you need to pull some data down or set something up. If you store a pre-calculated tarball somewhere that does magic things, you can also store a pre-compiled Docker image somewhere.
|
Distributing bundles (shed_tool_conf entries, tool directories, sqlite for the tool shed install database, and potentially conda dependencies) as tarballs would allow reuse of these flavors outside of Docker / containerized Galaxies. It would also allow combining several flavors together, which I don't think can be done with Docker. Maybe think of it as a "compiled" version of the ephemeris tool lists? I like this idea a lot, but I'm not sure if resources would be better spent on this, on improving ephemeris, or on setting up CVMFS for all tool dependencies. |
Wouldn't this mean that we would need to keep the postgres database container version frozen in the helm chart, because the moment we upgrade to a newer postgres container, the data will no longer be usable? We should aim for a minimal admin system. "helm upgrade galaxy" and you are done, dependencies and all - no need to know anything about systems admin, CloudMan can even run "helm upgrade" on behalf of the user. I think it would be good to plan a pathway to make upgrades possible, even if we don't implement upgrades in the first iteration, or we risk locking people into specific versions.
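One way to make that pinning explicit is to keep the database image tag as a chart value, so an upgrade of the chart never silently bumps the postgres major version; a sketch, with illustrative names:

```yaml
# Sketch: values.yaml pinning image tags (names and versions are illustrative)
postgres:
  image: postgres
  tag: "9.6"        # change deliberately, together with a planned data migration
galaxy:
  image: quay.io/bgruening/galaxy-web
  tag: "17.09"
```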
I'm probably missing something here, is the postgres database bundled with the Galaxy container? If it's separate, do you have a separate build of postgres for each flavour?
Ok, in that case, I guess we can start off with this minimal approach, where dependencies are pulled in at runtime, and then transition to having a globally shared CVMFS with dependencies. Sound ok?
I was thinking of any app with a database, say lovd: http://www.lovd.nl/3.0/home |
Sure. But this does not exist yet and it solves a problem which we don't have at the moment. @nuwang is this the problem you were referring to? I understood that it was purely a disk-space concern?
This is partially working with Docker. You can base one bundle on top of the other, but you cannot freely mix them. Freely mixing them means you allow replication.
I think the CVMFS approach could work today.
Yes. But it is not as bad as it sounds. You usually keep your postgresql database stable for many years. You need to migrate data, no matter what.
And this will work. It will upgrade Galaxy. It will not upgrade postgresql from 9.3 to 10.0. You want to at least do a backup before this. @nuwang how many database upgrades have you done with GVL until now?
Here is the plan to upgrade an image: https://github.com/bgruening/docker-galaxy-stable#upgrading-images--toc, it's all documented. You could make this automatic, but I'm not sure you want to. It's important data and you do this once every 3 years or so.
It's separate: https://github.com/bgruening/docker-galaxy-stable/tree/master/compose/galaxy-proftpd
Jupp!
I can only think of these two: ship a tarball of the pg database or do an SQL import at the beginning. |
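As an illustration of the second option, a hedged sketch of doing the SQL import via a Kubernetes init container before Galaxy starts (image, host, database and paths are all assumptions):

```yaml
# Sketch: seeding the database before Galaxy starts (all names and paths are illustrative)
initContainers:
  - name: seed-database
    image: postgres:9.6
    command: ["sh", "-c", "psql -h galaxy-postgres -U galaxy -d galaxy -f /seed/galaxy-seed.sql"]
    volumeMounts:
      - name: seed-data
        mountPath: /seed
```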
I think this summarises things really well. We probably have enough content to start merging the helm chart with the common docker container. I think there are some possible solutions to the database upgrade issue too.
The concern was more about how the initial data population was done. I tried running the compose setup, and I have a better understanding of how things work, but it still isn't 100% clear how/why things are working the way they are. It looks like the /export directory on the host is being populated by the contents of the docker image, which means that we can propagate default data from the container to the host. However, I don't really understand why this works, because the docs seem to suggest that this should only happen for volume mounts, not bind mounts: https://docs.docker.com/engine/admin/volumes/bind-mounts/ - and you are using a bind mount, correct? In fact, I even tried modifying a single file in /export on the host, and deleting everything else. Docker restored all files from the container except the modified one. This is excellent, but I must be missing something, since the docs state that only empty volumes will be populated this way.

Also, it still doesn't quite answer how pre-populated data in the database will be handled. I was under the impression that any tools installed from the tool-shed will be stored in the tool database - i.e. Postgres. If the database is empty, the tools will not function, is that correct? If so, since the database and galaxy are being built separately, who is installing tool data into the database, and how will we ensure that, when we recreate the compose setup elsewhere, the pre-installed data is restored without a build from scratch? The same question arises if we want pre-populated data like a workflow.

I also noticed that you have only mentioned a procedure for building the containers and then running them. Can't we just run from pre-built containers, without having to build? If so, can we provide a pre-populated database with workflows, histories etc.?
This is super useful. I think that helm should be able to transparently handle the upgrade process, including taking a backup of the database, upgrading it, and rolling everything back if things go south: https://docs.helm.sh/developing_charts/#hooks |
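For example, a backup Job could be registered as a pre-upgrade hook along these lines (a sketch; image, command and resource names are assumptions):

```yaml
# Sketch: Helm pre-upgrade hook that dumps the database before the chart upgrade proceeds (names illustrative)
apiVersion: batch/v1
kind: Job
metadata:
  name: galaxy-db-backup
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pg-dump
          image: postgres:9.6
          command: ["sh", "-c", "pg_dump -h galaxy-postgres -U galaxy galaxy > /backup/galaxy.sql"]
          volumeMounts:
            - name: backup
              mountPath: /backup
      volumes:
        - name: backup
          persistentVolumeClaim:
            claimName: galaxy-backup-pvc
```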
Here is an attempt to replicate the Galaxy docker-swarm/compose implementation on a Kubernetes (k8s) cluster |
I'm pleased to mention that I have a first working version of helm charts with compose containers. Still some bits to go, but it's mostly there. I'm managing to avoid the config file completely, but I think that, at least for my use case, avoiding injecting the job_conf file will be very difficult, since dynamic destinations for resource usage limits are needed. |
Hello there. Jumping in as part of work to get galaxy-kubernetes working on Azure. We ("we" being @abhi2cool ) are having an issue scaling HTCondor jobs, getting an error 'Job has not been considered by the matchmaker'. Which is an interesting message, to be sure. Do you have any suggestions about how we might interrogate/debug such an issue? ( @bgruening you were suggested as being wise in the ways of these things). Thank you! |
@rc-ms would it serve your final purpose to simply use the job dispatching/scheduling of Kubernetes instead of using condor on top of Kubernetes? |
After a few more days of work on this, examples of usage for general Galaxy deployments and for our PhenoMeNal deployment with compose-based images are available here. The non-phenomenal one is failing due to this issue; the PhenoMeNal one works fine (because I inject our job_config as part of making my derived ini container). I would say though that until this PR on the Galaxy side is done, I wouldn't use this for heavy analysis loads, as clusters can get choked. |
Thank you @pcm32 I think it would help, since HTCondor is what is blocking us right now. Oh, and filesystems :). Let me check with team and get back. |
@rc-ms can you give me more information on how you run the containers and how you submit jobs? Our travis testing can currently run Condor jobs. Any way I can reproduce this would be fantastic! |
@rc-ms: if we need to discuss specific aspects of your deployment, probably best to email me directly (find my email here) |
@rc-ms maybe open a new issue for your use case of Galaxy with k8s; this issue is being overused for too many parallel discussions. |
will do @pcm32 . @abhi2cool will you start the thread? @bgruening Abhik will share his configuration and issues there. |
Hello @pcm32 and @bgruening, I created new issue #4 to discuss our configuration / operational issues. @abhi2cool will upload logs and configuration info forthwith. Thanks! |
I think that this has been working for a while as well, so I will close this. Let me know if you lack documentation (and where) to make it work locally. |
I'd like to start/document a discussion about what it will take to make it possible to integrate and/or interchange resources from this repo with the resources from the https://github.com/bgruening/docker-galaxy-stable/tree/master/compose repo.
A few things come to mind and please add others:
I'm sure there's more, but those seem like the minimal set given my current familiarity with the two efforts. Please comment and let's see if and how this can be accomplished.