
submission: auxiliary one-off data as files mounted into job container #45

Open
lukasheinrich opened this issue May 29, 2017 · 1 comment
@lukasheinrich (Member) commented May 29, 2017

There is a need to mount data into the job container that is not part of the workflow work directory, but rather encapsulated information needed only by the specific job.

Examples:

  1. Normally, we submit a container and a cmd to the job controller, where the cmd is prepared by the workflow controller (it constructs the cmd from a template and workflow-specific data, such as file paths that are only known at run-time). Sometimes the cmd is quite long and a one-off multi-line script is a better choice. The script can be constructed by the workflow controller, but needs to be mounted into the container by the job controller.

Example:

cat /path/only/known/at/runtime/by/wflowcontroller/input.txt
echo some
echo very
echo long
echo script
cat /path/only/known/at/runtime/by/wflowcontroller/output.txt

We would like this script to be mounted at some well-defined location in the container, say /reana/script, so that we can submit a job with the command bash /reana/script.

The job manifest could look like this:

experiment: ATLAS
docker_img: my_atlas_analysis
cmd: bash /reana/script
aux_mounts: 
   -  mountpath: /reana/script
      data: |
         echo some
         echo very
         echo long
         echo script
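On the job-controller side, a manifest entry like the one above could be translated into a Kubernetes ConfigMap plus a volume mount. The following is a minimal sketch of that translation using plain dicts; the helper name `aux_mounts_to_configmap`, the ConfigMap naming scheme, and the `aux-N` key convention are all assumptions for illustration, not part of any existing REANA API.

```python
# Hypothetical sketch: translate `aux_mounts` entries from the job manifest
# into a Kubernetes ConfigMap and the pod volume/volumeMount specs that
# expose each entry as a single file inside the container.

def aux_mounts_to_configmap(job_name, aux_mounts):
    """Build a ConfigMap plus volume/volumeMount specs (as plain dicts)."""
    cm_name = f"{job_name}-aux"  # naming scheme is an assumption
    data, volume_mounts = {}, []
    for i, mount in enumerate(aux_mounts):
        key = f"aux-{i}"
        data[key] = mount["data"]
        volume_mounts.append({
            "name": "aux",
            "mountPath": mount["mountpath"],
            "subPath": key,  # mount a single key as a file, not a directory
        })
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": cm_name},
        "data": data,
    }
    volume = {"name": "aux", "configMap": {"name": cm_name}}
    return configmap, volume, volume_mounts


configmap, volume, mounts = aux_mounts_to_configmap(
    "job-123",
    [{"mountpath": "/reana/script", "data": "echo some\necho very\n"}],
)
```

Using `subPath` keeps unrelated files at the mount point intact, since each aux entry is projected as one file rather than replacing the whole directory.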

  2. A related example deals with situations where the command/script becomes too large and we'd like to mount some of the data into the container. Take the example of merging 500 ROOT files into a single output file. For a few files this is possible via hadd merged.root inputA.root inputB.root. For large lists (of absolute paths) this becomes unworkable, and we'd rather write a script invoked as merge.py merged.root inputfiles.json. The inputfiles.json can be constructed by the workflow controller and submitted like so:
experiment: ATLAS
docker_img: my_atlas_analysis
cmd: merge.py /reana/inputfiles.json /workdir/location/merged.root
aux_mounts: 
   -  mountpath: /reana/inputfiles.json
      data: |
         {"inputfiles": [
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
           ... 100s of more file paths
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file"
           ]
        }
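The workflow-controller side of this second example can be sketched in a few lines: serialise the long file list into a small JSON payload instead of passing hundreds of absolute paths on the command line. The paths below are placeholders, not real dataset locations.

```python
import json

# Sketch: the workflow controller serialises a (potentially very long)
# input file list into a JSON payload. Paths are placeholders.
input_paths = [f"/eos/atlas/dataset/file_{i}.root" for i in range(500)]
payload = json.dumps({"inputfiles": input_paths})

# The job controller would mount `payload` at /reana/inputfiles.json;
# inside the container, merge.py reads it back instead of parsing argv:
parsed = json.loads(payload)
n_files = len(parsed["inputfiles"])  # 500, independent of ARG_MAX limits
```

This sidesteps the kernel's command-line length limit entirely, since the file list travels as mounted data rather than as arguments.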

Implementation:

Kubernetes should support this transparently via either Secrets or ConfigMaps.

@diegodelemos diegodelemos added this to the Someday milestone May 29, 2017
@lukasheinrich (Member, Author) commented:

this is also relevant for reanahub/reana-workflow-engine-serial#17 (comment)

If we could mount the desired stdin as a one-off file, we could have a nicer command without the base64 hack, e.g.

command: ['sh','-c','root < /my/mounted/script']
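The stdin-redirection idea above can be demonstrated without a cluster: write the one-off script to a file and feed it to the interpreter on stdin, exactly as the mounted file would be. This is a local sketch only; `sh` stands in for `root`, and the temp-file path stands in for the mount point.

```python
import os
import subprocess
import tempfile

# Sketch of the comment's idea: instead of base64-encoding a script into
# the command line, place it in a one-off file and redirect it to the
# interpreter's stdin. `sh` is a stand-in for `root` here.
script = "echo merged\n"
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(script)
    path = f.name

# Equivalent of command: ['sh', '-c', 'root < /my/mounted/script']
result = subprocess.run(
    ["sh", "-c", f"sh < {path}"], capture_output=True, text=True
)
os.unlink(path)
# result.stdout now holds the script's output ("merged\n")
```

The container variant would be identical except that the file arrives via the aux mount rather than a temp file.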

@diegodelemos diegodelemos removed this from the Someday milestone Oct 6, 2019
@diegodelemos diegodelemos added this to To triage in Triage Oct 6, 2019