
Feature Request: Custom job names #3

Closed
agraubert opened this issue Aug 21, 2019 · 5 comments

Comments

@agraubert (Collaborator)

Allow users to specify custom names for array jobs instead of naming them by index.

The Firecloud adapter should default to providing entity names as the custom job names when the user does not supply any.

@agraubert (Collaborator, Author)

So right now, all jobs on SLURM are indexed by $SLURM_ARRAY_TASK_ID and perform job-specific setup by sourcing {staging_dir}/jobs/$SLURM_ARRAY_TASK_ID/setup.sh. That means one of three things needs to happen to make custom job names work:

  1. In pipelines with custom names, canine can generate a {staging_dir}/aliases file, with one job alias per line. Jobs can read line $SLURM_ARRAY_TASK_ID to determine their name, then continue setup by running {staging_dir}/jobs/$CANINE_JOB_ALIAS/setup.sh
  2. The jobs directory should jointly encode the task id and custom name (i.e. {staging_dir}/jobs/0_foo/) so that jobs can source {staging_dir}/jobs/${SLURM_ARRAY_TASK_ID}_*/setup.sh. This would let jobs jump straight to the correct directory while still keeping the layout human-readable.
  3. In pipelines with custom names, canine should symlink {staging_dir}/alias/{custom name} to {staging_dir}/jobs/{proper job id}. That way, jobs can continue to launch as normal, and humans can inspect the workspace by browsing the alias directory
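To make option 1 concrete, here's a minimal sketch of the lookup a job would do, assuming a 0-indexed task id and one alias per line; the paths and variable names are illustrative, not canine's actual layout, and the demo staging dir stands in for what canine would generate:

```shell
#!/bin/bash
# Sketch of option 1 (hypothetical paths/variable names): each job reads
# line $SLURM_ARRAY_TASK_ID of {staging_dir}/aliases to learn its alias,
# then sources the setup script under that alias.
set -euo pipefail

# Demo staging dir standing in for what canine would generate.
STAGING_DIR="$(mktemp -d)"
printf 'sample_A\nsample_B\n' > "${STAGING_DIR}/aliases"
mkdir -p "${STAGING_DIR}/jobs/sample_B"
echo 'export CANINE_JOB_ALIAS_OK=1' > "${STAGING_DIR}/jobs/sample_B/setup.sh"

# Inside a real job, SLURM sets SLURM_ARRAY_TASK_ID (0-indexed here,
# hence the +1 for sed's 1-indexed line addressing).
SLURM_ARRAY_TASK_ID=1
CANINE_JOB_ALIAS="$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" "${STAGING_DIR}/aliases")"
source "${STAGING_DIR}/jobs/${CANINE_JOB_ALIAS}/setup.sh"
echo "alias: ${CANINE_JOB_ALIAS}"
```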

At the moment, I'm leaning towards option 2, because it seems like the simplest change to achieve the desired goal. It also avoids any uniqueness requirements because the outputs/ folder could also follow the same id_alias naming scheme.
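Option 2 needs no lookup table at all; a glob on the task-id prefix resolves the directory. A minimal sketch, assuming the hypothetical "{task_id}_{alias}" naming (the demo staging dir again stands in for canine's):

```shell
#!/bin/bash
# Sketch of option 2 (hypothetical layout): jobs/ directories encode
# "{task_id}_{alias}", so a job can glob to its own directory by id
# without any alias lookup.
set -euo pipefail

STAGING_DIR="$(mktemp -d)"              # stand-in for canine's staging dir
mkdir -p "${STAGING_DIR}/jobs/0_sample_A" "${STAGING_DIR}/jobs/1_sample_B"
echo 'export CANINE_SETUP_DONE=1' > "${STAGING_DIR}/jobs/1_sample_B/setup.sh"

SLURM_ARRAY_TASK_ID=1                   # set by SLURM inside the array job
# The unquoted glob expands to the single directory with this id prefix.
source "${STAGING_DIR}/jobs/${SLURM_ARRAY_TASK_ID}"_*/setup.sh
echo "setup done: ${CANINE_SETUP_DONE}"
```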


@hurrialice @julianhess what are your thoughts?

@hurrialice

I like the first one best: I just want a table to trace my jobs, and that does not really need to be reflected in the file structure.

If we're going to have a table of aliases, is it possible to combine it with #5?
A possible table format could be:
<job_id> <custom_name> <job_status>

@agraubert (Collaborator, Author)

Okay, so it seems like overall, nobody really needs the jobs/ directory to be labeled with entity names, so here's my compromise:

  1. jobs/ stays numbered by the array task id
  2. Custom aliases are set within setup.sh like other canine variables
  3. The output/ folder will use custom aliases (which requires that the aliases all be unique)
  4. The job alias will be included in the output dataframe from Orchestrator.run_pipeline() a la Feature Request: Better output format #5
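A minimal sketch of what a job sees under this compromise; the variable and path names are illustrative, not canine's actual API, and the per-job setup.sh is written here by hand where the orchestrator would generate it:

```shell
#!/bin/bash
# Sketch of the compromise (hypothetical names): jobs/ stays numbered by
# task id, the alias is exported from setup.sh like other canine
# variables, and output/ is keyed by the (unique) alias.
set -euo pipefail

STAGING_DIR="$(mktemp -d)"            # stand-in for canine's staging dir
mkdir -p "${STAGING_DIR}/jobs/1"
cat > "${STAGING_DIR}/jobs/1/setup.sh" <<'EOF'
export CANINE_JOB_ALIAS="sample_B"    # written per-job by the orchestrator
EOF

SLURM_ARRAY_TASK_ID=1                 # set by SLURM inside the array job
source "${STAGING_DIR}/jobs/${SLURM_ARRAY_TASK_ID}/setup.sh"
mkdir -p "${STAGING_DIR}/output/${CANINE_JOB_ALIAS}"
echo "outputs in: output/${CANINE_JOB_ALIAS}"
```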

@hurrialice

That is beautiful! 👏

@agraubert (Collaborator, Author)

Closed in 63cc655.
