With the current push, I have the most basic skeleton of running a hello world batch Job (starting with this before venturing into other CRD types, because there are many small problems to solve) with Snakemake via Kueue. This is the log from running the snakemake step in the container:
The Kubernetes Job succeeds: JobStatus.SUCCEEDED corresponds to the Job's succeeded pod count reaching its requested completions.
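As a minimal sketch (not the operator's actual code), the completion check described above boils down to comparing the Job's `.status.succeeded` count against `.spec.completions`:

```python
def job_succeeded(status_succeeded, spec_completions):
    """Sketch of the success check: a Job counts as succeeded once its
    .status.succeeded pod count reaches the requested .spec.completions.
    Both fields can be unset; completions defaults to 1 for a basic Job."""
    return (status_succeeded or 0) >= (spec_completions or 1)
```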
But of course, since Snakemake expects output files and we have neither a shared filesystem nor a way to get them back immediately, the Snakemake run fails. There are several problems:
I'm using a ConfigMap for the Snakefile, which disregards any additional context from the local working directory
The working directory where the job runs needs write access, so I use an emptyDir volume to avoid setting up anything complicated for development.
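For illustration, the writable working directory via an emptyDir volume looks roughly like this fragment of a Job's pod spec, expressed here as a Python dict with manifest-style keys (the image name is a placeholder, not what the prototype necessarily uses):

```python
# Hypothetical pod-spec fragment: an emptyDir volume mounted at the working
# directory gives the Snakemake step somewhere writable, with no PVC or
# storage class to provision during development.
pod_spec = {
    "containers": [{
        "name": "snakemake",
        "image": "snakemake/snakemake",  # placeholder image for illustration
        "workingDir": "/workdir",
        "volumeMounts": [{"name": "workdir", "mountPath": "/workdir"}],
    }],
    "volumes": [{"name": "workdir", "emptyDir": {}}],
}
```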
Thus we need solutions for:
Handling a working directory and getting it into a pod or some volume for pods.
That volume needing write access
The results of the run needing to be passed back / saved somewhere.
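One way to sketch the three points above is to wrap the job's command with stage-in and stage-out steps, so the working directory is pulled from a remote before the run and results are pushed back afterwards. This is only a sketch assuming a `gsutil`-style CLI is available in the container; the function and URL here are hypothetical:

```python
def wrap_with_staging(cmd, workdir_url, workdir="/workdir"):
    """Sketch: build a shell command that copies the working directory in
    from a remote URL (e.g. gs://bucket/run), runs the given command, and
    copies results back out. Assumes gsutil exists in the container image."""
    stage_in = f"gsutil -m cp -r {workdir_url}/* {workdir}/"
    stage_out = f"gsutil -m cp -r {workdir}/* {workdir_url}/"
    return f"{stage_in} && {cmd} && {stage_out}"
```

The wrapped string would become the container's command, e.g. `wrap_with_staging("snakemake --cores 1", "gs://bucket/run1")`.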
I wanted to open these questions for discussion. For early prototyping, I'm thinking we could require output files to be sent to some remote (e.g., gs://), and then maybe there could be a way to not expect an actual file but to accept some text result in the logs instead?
On a higher level, this is likely going to be a bigger problem as we try to support many different environments with Snakemake, where everyone uses a different cloud / different storage / different file types. My experience with Kubernetes is that setting up storage is really hard (and adds developer and user toil that we should want to avoid). For Google Life Sciences + Snakemake we assume Google Storage, and for the working directory we upload it at submission and download it on startup. We could do something similar here, but then we face the challenge of needing to support multiple clouds. Paulo from Nextflow is working on Fusion, which can at minimum do a flexible FUSE-style mount, but it's not an open source project. Services like Google Filestore are also really nice, but again, we can't assume being on Google Cloud.
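To stay cloud-agnostic rather than baking in Google Storage, one option is to dispatch the staging step on the remote URL's scheme. A minimal sketch, under the assumption that each supported backend has a copy CLI in the container image (the function names and the tool table here are hypothetical):

```python
# Map URL scheme -> recursive-copy command prefix. Extending support to a
# new cloud would mean registering one more entry here.
STAGING_TOOLS = {
    "gs": "gsutil -m cp -r",            # Google Cloud Storage
    "s3": "aws s3 cp --recursive",      # Amazon S3
}

def stage_in_command(remote_url, local_dir):
    """Sketch: build the stage-in command for whichever cloud the remote
    working-directory URL points at, failing loudly on unknown schemes."""
    scheme = remote_url.split("://", 1)[0]
    try:
        tool = STAGING_TOOLS[scheme]
    except KeyError:
        raise ValueError(f"no staging tool registered for scheme {scheme!r}")
    return f"{tool} {remote_url} {local_dir}"
```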
Let me know your thoughts - and let's chat tomorrow @johanneskoester!
This is handled with a remote (e.g., AWS). I still haven't gotten this working on the MPI Operator - if the design means the output is generated elsewhere while it's expected to be on the launcher, we might run into an issue - will try testing again.
@johanneskoester and @alculquicondor I want to bring you in for discussion and updates here!