Offloading jobs to Kubernetes #1902

Closed
pcm32 opened this issue Mar 10, 2016 · 13 comments

pcm32 (Member) commented Mar 10, 2016

Hi there,

We are interested in offloading jobs to Kubernetes, meaning that each job is executed within a Docker container. For our use case we can rely on a shared filesystem, as we intend to run Galaxy itself within Kubernetes (as a pod). I was wondering how I should proceed: should I just try to write a job runner, or should I go through the more complex route of Pulsar/LWR? My understanding is that Pulsar/LWR is the option when there is no shared filesystem, and that if there is one I should refrain from using them.

Basically, to send a job to Kubernetes you need to write a JSON or YAML file defining the job, where a crucial part is specifying which container image will run it. Different Galaxy tools will normally use different Docker images; we already have many tools that we want to run containerized. So, for the Galaxy use case, we need a mapping from tools to Docker images (that is, which tool is executed within which container). I wonder how this tool-to-container mapping fits within the normal Galaxy structure (where would you place something like this, considering that it might depend on the local Kubernetes cluster you're using?). It could well be fetched from a container in the cluster holding those definitions/mappings.
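
For illustration, a minimal Job manifest of that kind might look like this (a sketch only; the image, command, and volume names are placeholders, not anything Galaxy generates today):

apiVersion: batch/v1
kind: Job
metadata:
  name: blankfilter-example         # job names must be unique within a namespace
spec:
  template:
    spec:
      containers:
      - name: blankfilter
        image: docker-registry.local:50000/phnmnl/ex-blankfilter:latest
        command: ["Rscript", "BlankFilter.r", "/mnt/glusterfs/input1.xls", "/mnt/glusterfs/output1.xls"]
        volumeMounts:
        - name: glusterfsvol
          mountPath: /mnt/glusterfs
      volumes:
      - name: glusterfsvol
        hostPath:
          path: /mnt/glusterfs       # shared filesystem visible to Galaxy and the nodes
      restartPolicy: Never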

We currently have a proof of principle working with an older Galaxy version, but it entails modifying the Galaxy tool wrappers, which is of course a bad idea. It essentially interfaces with kubectl (the CLI provided by Kubernetes) to send jobs and wait for completion/failure. But we want to move to a more mainstream usage of Galaxy.

I did look around for something like this, but couldn't find anything.

What would be the best way to proceed in terms of implementation? Write a job runner or something else? Thanks!

dannon (Member) commented Mar 10, 2016

It's great to hear that someone else is working on this; we're also very interested! Is the source hosted somewhere so we could see the approach?

Does the current 'container' requirement tag (example: https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/catDocker.xml#L4) convey enough information for a job runner to create the necessary config files?
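
For reference, that tag sits inside the tool's requirements block and points at a Docker image directly, something like this (the image name here is just a placeholder):

<requirements>
    <container type="docker">some-registry.example.org/owner/image:tag</container>
</requirements>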

pcm32 (Member, Author) commented Mar 10, 2016

Well, our proof of concept is simply a wrapper around kubectl, nothing fancy. I am only now starting to look at the Galaxy code, and I don't have anything to show yet (I just forked the project). As you might be aware, Kubernetes is normally abbreviated k8s, so I'll use that abbreviation from here on.

Consider a tool that initially looks like this (no k8s integration):

<tool id="upps_blankfilter" name="BlankFilter_Regular" version="0.1.0">
    <requirements>
        <requirement type="package">Rscript</requirement>
    </requirements>
    <command><![CDATA[
    Rscript BlankFilter.r "$input1" "$output1"
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="xls" />
    </inputs>
    <outputs>
        <data name="output1" format="xls" />
    </outputs>
    <help><![CDATA[
        TODO: Fill in help.
    ]]></help>
</tool>

The same tool, when modified to be able to use this wrapper, looks like this (but bear in mind that this is what we are moving away from):

<tool id="upps_blankfilter" name="BlankFilter" version="0.1.0">
    <requirements>
        <requirement type="package">submit_k8s_job</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" />
    </stdio>
    <command><![CDATA[
        submit_k8s_jobs 
                   -j blankfilter
                   -n blankfilter
                   -c blankfilter
                   --cimgrepos docker-registry.local:50000
                   --cimgowner phnmnl 
                   --cimgname ex-blankfilter
                   --cimgver latest
                   --volpath /mnt/glusterfs
                   --volname glusterfsvol
                   --glusterfspath scratch
                   --
                   "$input1" "$output1"
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="xls" />
    </inputs>
    <outputs>
        <data name="output1" format="xls" />
    </outputs>
    <help><![CDATA[
        TODO: Fill in help.
    ]]></help>
</tool>

So in the command, everything before the -- defines how to set up the job on k8s, and everything after the -- is passed as arguments to the tool running inside the container.

You can get the bigger picture from this internal demo I wrote about this:
https://github.com/pcm32/k8s_demo/tree/master/embassy_demo

But as you can see, this is a lot of tool modification, which is what we want to move away from. Adding the <requirement> tag specifically to be able to run it in Docker looks to me like modifying the tool as well. This would also mean adding a requirement in the correct Galaxy path for each container that we want to use, right? What I would like to have, maybe stored in the same k8s cluster within a mapping container, is something like this:

- exec: blankfilter
  containers:
     - image: blankfilter_container
       image_version: 16
       image_owner: phnmnl
- exec: tool2
  containers:
     - image: cont_for_tool2
     ...

So when the k8s Galaxy job runner (which doesn't exist yet) encounters a command whose executable is blankfilter, it queries this mapping and knows that it needs to use blankfilter_container and pass the rest of the command as arguments to the k8s job being created. If the mapping is not found, it would default to another (local?) runner. This would allow running tools on k8s without having to touch the tool wrapper in Galaxy. Would this make sense in the Galaxy context, or would you advise modifying or writing new tool wrappers specifically meant to run on k8s?

But yes, to answer your question, what you have there in the requirement would suffice, supposing that either the executable matches the image's entrypoint and you pass the rest as arguments, or that the given executable can be run within the image.

pcm32 (Member, Author) commented Mar 15, 2016

To follow up on this, our understanding so far (@RJMW, @korseby, @sneumann and me; for our use case, with a shared filesystem available) is that we should implement a Kubernetes job runner in lib/galaxy/jobs/runners which inherits from AsynchronousJobRunner and uses pykube to interface with the k8s REST API.
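
Roughly, the skeleton we have in mind looks like this (a sketch only: method names are based on our reading of the existing runners, the pykube Job object is the part our pending pykube pull requests add, and all paths and names are illustrative):

# Hypothetical sketch of a Kubernetes job runner for Galaxy, not a final implementation.
from galaxy.jobs.runners import AsynchronousJobRunner, AsynchronousJobState

from pykube.config import KubeConfig
from pykube.http import HTTPClient
from pykube.objects import Job


class KubernetesJobRunner(AsynchronousJobRunner):
    """Offloads Galaxy jobs to a Kubernetes cluster as k8s Job objects."""
    runner_name = "KubernetesRunner"

    def __init__(self, app, nworkers, **kwargs):
        super(KubernetesJobRunner, self).__init__(app, nworkers, **kwargs)
        # k8s_config_path comes from the <plugin> params in job_conf.xml below.
        self._pykube_api = HTTPClient(KubeConfig.from_file(self.runner_params["k8s_config_path"]))
        self._init_monitor_thread()
        self._init_worker_threads()

    def queue_job(self, job_wrapper):
        """Build a k8s Job spec from the destination params and submit it."""
        if not self.prepare_job(job_wrapper):
            return
        params = job_wrapper.job_destination.params
        job_name = "galaxy-%s" % job_wrapper.get_id_tag()  # must be unique and DNS-compatible
        spec = {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "metadata": {"name": job_name},
            "spec": {"template": {"spec": {
                "containers": [{
                    "name": job_name,
                    "image": "%s/%s/%s:%s" % (params["repo"], params["owner"],
                                              params["image"], params["tag"]),
                    "command": ["/bin/sh", "-c", self.build_command_line(job_wrapper)],
                }],
                "restartPolicy": "Never",
            }}},
        }
        Job(self._pykube_api, spec).create()
        # Hand the job over to the monitor thread, which polls check_watched_item().
        ajs = AsynchronousJobState(files_dir=job_wrapper.working_directory, job_wrapper=job_wrapper)
        ajs.job_id = job_name
        self.monitor_queue.put(ajs)

    def check_watched_item(self, job_state):
        """Query the k8s REST API for the Job's status and update job_state accordingly."""
        raise NotImplementedError("status polling goes here")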

In addition to this, we should add an entry in the config/job_conf.xml file so that it looks something like this:

<plugin id="k8s" type="runner" load="galaxy.jobs.runners.kubernetes:kubernetes">
            <param id="k8s_config_path">/path/to/kubeconfig</param>
            ...
</plugin>

a destination for each Docker container that we want to use:

<destination id="blankfilter-container" runner="k8s">
    <param id="repo">docker-registry.lan:80000</param>
    <param id="owner">bfcreator</param>
    <param id="image">ex-blankfilter</param>
    <param id="tag">latest</param>
</destination>

and then pair the tool to the destination and runner in the same file within <tools> </tools> with:

<tool id="blankfilter" destination="blankfilter-container"/>

Would this be the correct way of adding this feature to Galaxy? We have some code written here. Thanks for the feedback @dannon!

bgruening (Member) commented

For me this sounds like the correct way of doing it.
Ping @abdulrahmanazab, as he is working on similar things and might be interested.

Awesome work! Don't forget to register for a talk at the next GCC :)

pcm32 (Member, Author) commented Mar 15, 2016

Thanks @bgruening! We will continue on this track then; I would certainly be interested in joining efforts with @abdulrahmanazab, as we are only just starting on this.

dannon (Member) commented Mar 15, 2016

@pcm32 Sorry for the lag here; most of us are at a meeting this week. I like the approach you've outlined above.

pcm32 (Member, Author) commented Mar 15, 2016

No problem, @dannon, and thanks for confirming that this is the way to go!

pcm32 (Member, Author) commented Apr 25, 2016

@bgruening, @dannon I now have something that works here for Kubernetes (k8s) and Galaxy. I still haven't tried all the edge cases (restarting failed jobs, etc.), but the basic functionality is there (it submits jobs to k8s, monitors progress, signals jobs when done or failed, and fills in the stdout/stderr files). I have made some assumptions, though, and some things still need to improve. My questions to you are:

  • I currently set the k8s job name to "galaxy-" + job_wrapper.get_id_tag() so that it is DNS-compatible. However, I'm considering adding something more distinctive to the Kubernetes job name, like the Galaxy job's UUID (see the naming sketch after this list). The issue is that Job names cannot be repeated in k8s, and I can imagine a scenario in which the Galaxy job counter is reset (or more than one Galaxy instance is sending jobs), so get_id_tag() is no longer reliable for identifying the job. I still want to be able to track the job run in k8s from Galaxy for auditing purposes. Any suggestions on what to use from Galaxy to set the k8s job name (the name really acts like an ID)?
  • I also intend to use k8s namespaces (which helps with the previous issue); however, I'm wondering what to use for this from Galaxy. I would like something that identifies the current Galaxy instance (not the session: if it is restarted it should still be the same), but I'm not sure whether there is something like this in Galaxy. Any suggestions? My idea is that if multiple users are allowed to fire up their own Galaxy instance on Kubernetes, I would like a separate k8s namespace for each instance's jobs (currently the Kubernetes Job names are based on get_id_tag(), which would clash between different running Galaxy instances). Maybe the Galaxy user as a namespace might work.
  • In k8s there is currently no way to tell apart the stderr and stdout of a Pod (this is where the container actually runs); it is all mixed together in the Pod's log (kubectl logs <pod-name>). I'm currently dumping this content, for each container in the pod, into the job_state.out_file that Galaxy expects. I'm writing the expected job_state.error_file with errors/warnings related to the Job submission to k8s (but not the tool's errors). I hope this is reasonable, as I think it is the best I can do in the current situation.
  • Where do I document the usage of the runner? I see that you have some autogenerated .rst files, but I'm not sure whether I'm supposed to put documentation in the code files with some formatting, which then fills in that section of the .rst automatically, or whether you edit the .rst files manually after the headers are generated. In any case, I'm referring mostly to how to write the job_conf.xml sections required to use the k8s runner, how to write the appropriate destinations, and what the main assumptions of the implementation currently are. So the runner's code might not be the best place for this. Where should this documentation go?
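
For the first point, one option (just an illustration; whether the UUID is reachable as job_wrapper.get_job().uuid is an assumption here) would be something like:

import re

def k8s_job_name(job_wrapper, prefix="galaxy"):
    # Combine the per-instance id tag with the Galaxy job's UUID so names stay
    # unique even if the job counter is reset or several Galaxy instances share
    # the cluster. Reaching the UUID via get_job().uuid is an assumption.
    uuid = str(job_wrapper.get_job().uuid)
    name = "%s-%s-%s" % (prefix, job_wrapper.get_id_tag(), uuid)
    # k8s object names must be lowercase alphanumerics and dashes, max 63 chars.
    name = re.sub(r"[^a-z0-9-]", "-", name.lower())
    return name[:63].rstrip("-")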

Currently, installation doesn't work out of the box because my pull requests to pykube haven't been accepted yet, but I'll work on that before finally making a pull request here. Thanks for the feedback!

bgruening (Member) commented

@pcm32 this sounds awesome. Can't wait to try this!
Concerning the job name, I think what you outlined above is fine. @dannon can probably help you more.
I would store the documentation in https://github.com/galaxyproject/galaxy/blob/dev/config/job_conf.xml.sample_advanced. Extra points for a Galaxy wiki page :)

In related news, we managed to convince HTCondor to submit Docker containers. According to @abdulrahmanazab this scales better than Kubernetes. The PR is here: #2278

pcm32 (Member, Author) commented May 4, 2016

@bgruening, @dannon I'm still waiting for my PR to be accepted in pykube (the library I use to communicate with k8s through its REST API) before I can make a pull request here. Is it acceptable to make the pull request here in Galaxy if it means I need to add a GitHub source for the pykube package in the requirements file (doing something like this) instead of the current official pip install of pykube? I would fix this once my changes are merged into pykube and a new release is made available on PyPI.
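
In other words, the requirements entry would temporarily point at a Git source rather than a PyPI release, along these lines (the fork URL and branch are placeholders, not the actual PR):

# temporary: install pykube from a fork until the upstream PR is merged and released
git+https://github.com/pcm32/pykube.git@k8s-jobs#egg=pykube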

bgruening (Member) commented

@pcm32 Galaxy has already shipped modified Python packages in the past, so I think this is possible. You can also label your PR as WIP so we can look for testers and reviewers.
Also, we probably need to create a wheel for it: https://github.com/galaxyproject/starforge/tree/master/wheels

bgruening (Member) commented

@pcm32 this can be closed, correct? Hope your talk went well!

pcm32 (Member, Author) commented Nov 6, 2016

Thanks! Yes we can close it!

pcm32 closed this as completed Nov 6, 2016