
Feature Request: Custom job names #3

Closed
agraubert opened this issue Aug 21, 2019 · 5 comments

Comments

@agraubert (Collaborator)

Allow users to specify custom names for array jobs instead of naming them by index.

The Firecloud adapter should default to providing entity names as the custom job names when the user does not supply any.

@agraubert (Collaborator, Author)

So right now, all jobs on SLURM are indexed by $SLURM_ARRAY_TASK_ID and perform job-specific setup by sourcing {staging_dir}/jobs/$SLURM_ARRAY_TASK_ID/setup.sh. That means one of three things needs to happen to make custom job names work:

  1. In pipelines with custom names, canine can generate a {staging_dir}/aliases file, with one job alias per line. Jobs can read line $SLURM_ARRAY_TASK_ID to determine their name, then continue setup by running {staging_dir}/jobs/$CANINE_JOB_ALIAS/setup.sh
  2. The jobs directory should jointly encode the task id and custom name (i.e. {staging_dir}/jobs/0_foo/) so that jobs can source {staging_dir}/jobs/${SLURM_ARRAY_TASK_ID}_*/setup.sh. This would let jobs jump straight to the correct directory while still keeping the layout human-readable.
  3. In pipelines with custom names, canine should symlink {staging_dir}/alias/{custom name} to {staging_dir}/jobs/{proper job id}. That way, jobs can continue to launch as normal, and humans can inspect the workspace by browsing the alias directory
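To make option 1 concrete, here's a minimal sketch of the lookup a job would do, assuming a 0-indexed task id and one alias per line; the paths and variable names are illustrative, not canine's actual layout, and the demo staging dir stands in for what canine would generate:

```shell
#!/bin/bash
# Sketch of option 1 (hypothetical paths/variable names): each job reads
# line $SLURM_ARRAY_TASK_ID of {staging_dir}/aliases to learn its alias,
# then sources the setup script under that alias.
set -euo pipefail

# Demo staging dir standing in for what canine would generate.
STAGING_DIR="$(mktemp -d)"
printf 'sample_A\nsample_B\n' > "${STAGING_DIR}/aliases"
mkdir -p "${STAGING_DIR}/jobs/sample_B"
echo 'export CANINE_JOB_ALIAS_OK=1' > "${STAGING_DIR}/jobs/sample_B/setup.sh"

# Inside a real job, SLURM sets SLURM_ARRAY_TASK_ID (0-indexed here,
# hence the +1 for sed's 1-indexed line addressing).
SLURM_ARRAY_TASK_ID=1
CANINE_JOB_ALIAS="$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" "${STAGING_DIR}/aliases")"
source "${STAGING_DIR}/jobs/${CANINE_JOB_ALIAS}/setup.sh"
echo "alias: ${CANINE_JOB_ALIAS}"
```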

At the moment, I'm leaning towards option 2, because it seems like the simplest change to achieve the desired goal. It also avoids any uniqueness requirements because the outputs/ folder could also follow the same id_alias naming scheme.
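Option 2 needs no lookup table at all; a glob on the task-id prefix resolves the directory. A minimal sketch, assuming the hypothetical "{task_id}_{alias}" naming (the demo staging dir again stands in for canine's):

```shell
#!/bin/bash
# Sketch of option 2 (hypothetical layout): jobs/ directories encode
# "{task_id}_{alias}", so a job can glob to its own directory by id
# without any alias lookup.
set -euo pipefail

STAGING_DIR="$(mktemp -d)"              # stand-in for canine's staging dir
mkdir -p "${STAGING_DIR}/jobs/0_sample_A" "${STAGING_DIR}/jobs/1_sample_B"
echo 'export CANINE_SETUP_DONE=1' > "${STAGING_DIR}/jobs/1_sample_B/setup.sh"

SLURM_ARRAY_TASK_ID=1                   # set by SLURM inside the array job
# The unquoted glob expands to the single directory with this id prefix.
source "${STAGING_DIR}/jobs/${SLURM_ARRAY_TASK_ID}"_*/setup.sh
echo "setup done: ${CANINE_SETUP_DONE}"
```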


@hurrialice @julianhess what are your thoughts?

@hurrialice

I like the first one best: I just want a table to trace my jobs, and that does not really need to be reflected in the file structure.

If we're going to have a table of aliases, is it possible to combine it with #5?
A possible table format could be:
<job_id> <custom_name> <job_status>

@agraubert (Collaborator, Author)

Okay, so it seems like overall, nobody really needs the jobs/ directory to be labeled with entity names, so here's my compromise:

  1. jobs/ stays numbered by the array task id
  2. Custom aliases are set within setup.sh like other canine variables
  3. The output/ folder will use custom aliases (which requires that the aliases all be unique)
  4. The job alias will be included in the output dataframe from Orchestrator.run_pipeline() a la Feature Request: Better output format #5
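A minimal sketch of what a job sees under this compromise; the variable and path names are illustrative, not canine's actual API, and the per-job setup.sh is written here by hand where the orchestrator would generate it:

```shell
#!/bin/bash
# Sketch of the compromise (hypothetical names): jobs/ stays numbered by
# task id, the alias is exported from setup.sh like other canine
# variables, and output/ is keyed by the (unique) alias.
set -euo pipefail

STAGING_DIR="$(mktemp -d)"            # stand-in for canine's staging dir
mkdir -p "${STAGING_DIR}/jobs/1"
cat > "${STAGING_DIR}/jobs/1/setup.sh" <<'EOF'
export CANINE_JOB_ALIAS="sample_B"    # written per-job by the orchestrator
EOF

SLURM_ARRAY_TASK_ID=1                 # set by SLURM inside the array job
source "${STAGING_DIR}/jobs/${SLURM_ARRAY_TASK_ID}/setup.sh"
mkdir -p "${STAGING_DIR}/output/${CANINE_JOB_ALIAS}"
echo "outputs in: output/${CANINE_JOB_ALIAS}"
```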

@hurrialice

That is beautiful! 👏

@agraubert (Collaborator, Author)

Closed in 63cc655.
