Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes job runner #2314

Merged
merged 62 commits into from Jun 5, 2016

Conversation

Projects
None yet
4 participants
@pcm32
Copy link
Member

commented May 5, 2016

Introduces a first working version of a Kubernetes job runner, plus documentation for the runner in the job_conf.xml.sample_advanced. This runner requires a shared filesystem accesible by Galaxy and Kubernetes (details explained in the documentation). It relies on pykube for communicating with the Kubernetes REST API, but since not all the required functionalities (pulled requests) have been approved there, this version relies on an unofficial fork of pykube. It requires a Kubernetes version that supports Jobs in the extensions API (tested with current version 1.2). The runner inherits from AsynchronousJobRunner.

Testing these changes requires a Kubernetes cluster to be accesible, shared filesystem and setting up persistent volumes and persistent volume claims in Kubernetes against this shared filesystem as explained in job_conf.xml.sample_advanced. Then galaxy tools need to be linked to the runner as explained in that same file (see destinations and tools parts). Docker container images for the tools that are linked need to be available at an accessible docker registry for Kubernetes to use them.

The contents of the PR have been tested against a Kubernetes cluster with simple jobs, but there might still be borderline cases to test. This PR addresses issue #1902 .

pcm32 added some commits Mar 15, 2016

Removes url_to_destination, as this is only for legacy destinations (…
…which I guess our's isn't, wouldn't have any URL to transform either)

@martenson martenson removed the triage label May 6, 2016

@martenson martenson added this to the 16.07 milestone May 6, 2016

@jmchilton

This comment has been minimized.

Copy link
Member

commented May 16, 2016

Just a follow up - I had very similar comments regarding condor work that also moves Docker resolution up to the runner level - #2278 (comment). My work with Galaxy abstractions for that PR can be found here - bgruening#10. That PR helps illustrate why setting these abstractions up in a slightly deeper part of the framework will allow innovations of this PR such as decomposing the Docker image identifier - to be reused instantly in other components of Galaxy.

@pcm32

This comment has been minimized.

Copy link
Member Author

commented May 20, 2016

Hi @jmchilton, I'm sorry but I haven't had time to come back to this unfortunately. I intend to do today. However, I'm looking at your #2278, I see that you introduce this find_container( job_wrapper ) method, which I cannot find implemented anywhere (nor could pycharm). I looked within the condor runner class and its parent classes as well (Async job runner and base runner), but couldn't find it. The only closest I found was a _find_container( job_wrapper ) on base runner. Is this the method?

If I understand correctly, what you would like me to do is to use that find_container( job_wrapper ) method to resolve the docker container that the k8s runner should use, adding to it the ability to understand both the single container name convention used by most galaxy runners and the fragmented container name that is introduced on this PR, using the defaults and override that you suggested. Correct?

pcm32 added some commits May 20, 2016

Adds the ability to build container ids for pieces for the centralise…
…d galaxy containers class and takes into account the default and override modes, suggested by jmchilton.
Moves the kubernetes runner to use the centralised galaxy way of pick…
…ing containers, allowing destinations to override a tool, use a tool, or have a default container if the tool doesn't provide one.
Additional new required parameter for destinations (that will be used…
… with the kubernetes runner) to play nicely with existing galaxy handling of containers.
@pcm32

This comment has been minimized.

Copy link
Member Author

commented May 20, 2016

These new set of commits should address the proposal from @jmchilton regarding default and override modes for containers. This forces though a slight change in the idiom (docker_repo_default instead of docker_default_repo), but only for the case where the docker container id is produced out of its components, as it is proposed on this PR. Current implementation also can use docker containers defined by tools.

I'm seeing some errors/badly handled exceptions on some borderline cases, like manual (through kubectl) deletions of jobs in kubernetes (that are sent from galaxy), or trying to create jobs with ids that already exist. So this is probably not completely ready, but I wonder whether this kind of things should be depurated with more users testing this after being merged or not.

@jmchilton

This comment has been minimized.

Copy link
Member

commented May 25, 2016

@galaxybot test this

pcm32 added some commits Jun 1, 2016

@jmchilton

This comment has been minimized.

Copy link
Member

commented Jun 2, 2016

Is this out of "work in progress" state then?

@pcm32

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2016

I was just updating my branch to be in sync with galaxyproject/dev. However, today I tried it again and a new error appeared, maybe there were some modifications to the internal galaxy container part in the meantime. But besides this, I would merge if you are happy. I'll come back and report ASAP so that you can merge.

@jmchilton jmchilton changed the title [WIP] Kubernetes job runner Kubernetes job runner Jun 5, 2016

@jmchilton jmchilton removed the status/WIP label Jun 5, 2016

@jmchilton jmchilton merged commit a931b4b into galaxyproject:dev Jun 5, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jun 8, 2016

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jun 8, 2016

Make pykube an optional dependency since we don't have wheels for it.
I should have requested this as a change to galaxyproject#2314.

Rebased with improved job conf sample comment thanks to suggestions by @nsoranzo.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.