dry run? #15

Open
jayoung opened this issue Dec 7, 2022 · 4 comments

Comments

@jayoung

jayoung commented Dec 7, 2022

In the 'validate a workflow' section it says "It does not perform a “dry run” or check to see if any of your inputs are actually available, only that it can interpret what you told it."

to me, an obvious question here is - is there such a thing as a 'dry run' with Cromwell/WDL? I have not (yet?) tackled learning snakemake but I think I know enough to have an idea of what 'dry run' means, and it seems like a very appealing concept.

If there is a way to do a dry run, it would be great to tell us how. If not, it would also be helpful to say that it's just not a thing for WDLs.

same goes for the "check to see if any of your inputs are actually available" part - is there a capability for that?

@vortexing
Contributor

Good point!
Cromwell doesn't really have a dry run mode per se, but this validate process IS basically what those other tools consider to be a dry run. I edited it to be the following text?

This checks the format of your workflow files to make sure you have a valid file in a known format that Cromwell can interpret. This is called a "dry run" to ensure that your tasks are wired up correctly, but Cromwell does not try to see if any of your inputs are actually available, only that it can interpret what you told it. One reason is that, because Cromwell can pull files from local filesystems, AWS S3, Google buckets and Azure blobs, its ability to actually get your inputs is only tested when you run the workflow for the first time. Luckily, Cromwell only fetches the file inputs it needs at that moment, and if it can't get them it won't run that specific task (but it can continue with the other tasks it can do!).

@jayoung
Author

jayoung commented Dec 9, 2022

yes, that makes sense, and that's useful. Can I suggest a rewrite for clarity?

This checks your workflow files (WDL / JSON) to test:

  • are they in a known format that Cromwell can interpret?
  • are they formatted properly?
  • are the tasks wired up correctly?

This is called a "dry run".

Note that this does NOT test whether your input files are actually available, partly because Cromwell can pull files from local filesystems, AWS S3, Google buckets and Azure blobs. The process of testing input availability will only happen when you run the workflow for the first time. If some input files are missing, Cromwell will run tasks for the input files that ARE available, skipping tasks where inputs can't be found.

@jayoung
Author

jayoung commented Dec 9, 2022

I can see myself WANTING to check all the inputs before I start when a workflow is long, so that I can troubleshoot immediately rather than a day later. Is that something you sometimes do? e.g. I could imagine writing an additional task at the start of the workflow that checks for existence of ALL inputs from the workflow, and exits if one or more are missing.

Example - let's say in diy-cromwell-server/testWorkflows/tg-wdl-VariantCaller you want to check for the annovar inputs like known_indels_sites_VCFs upfront, so that you know you have all your ducks in a row before running the whole thing.

@vortexing
Contributor

I have put your text into WDL 101 now with PR fhdsl/FH_WDL101_Cromwell#35. However, the issue of actually testing for workflow inputs should live in WDL 102, I think. I'm not aware of a function in Cromwell that will do what you want, so I need to go explore a bit and see if it exists, and then document it in WDL 102 if it does. Or, like you say, make a hacky "input tester" task (I actually have this for another reason) that you could copy and use at the beginning of your workflow to force localization of all inputs prior to running anything.
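
In the meantime, one way to approximate the "check all inputs upfront" idea outside of Cromwell is a small pre-flight script run against the inputs JSON before submitting. The following is only a rough sketch, not a Cromwell feature: `check_inputs` and `REMOTE_PREFIXES` are hypothetical names, and it assumes that any string in the JSON that looks like an absolute path is a local file input. Remote locations (S3, Google buckets, Azure blobs) are not tested at all, only listed as unchecked.

```python
import json
import os
import sys

# Assumption: strings starting with these prefixes point at remote storage.
# They are reported as "unchecked" rather than tested.
REMOTE_PREFIXES = ("s3://", "gs://", "https://", "http://")


def iter_strings(value):
    """Yield every string found in a nested JSON value (lists and dicts included)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)


def check_inputs(inputs_json_path):
    """Return (missing, unchecked) lists of (input_name, location) pairs.

    Heuristic (an assumption, not Cromwell behavior): a string that is an
    absolute path is treated as a local file input and must exist on disk.
    """
    with open(inputs_json_path) as handle:
        inputs = json.load(handle)

    missing, unchecked = [], []
    for name, value in inputs.items():
        for text in iter_strings(value):
            if text.startswith(REMOTE_PREFIXES):
                unchecked.append((name, text))
            elif os.path.isabs(text) and not os.path.exists(text):
                missing.append((name, text))
    return missing, unchecked


if __name__ == "__main__" and len(sys.argv) > 1:
    missing, unchecked = check_inputs(sys.argv[1])
    for name, location in unchecked:
        print(f"not checked (remote): {name} -> {location}")
    for name, location in missing:
        print(f"MISSING: {name} -> {location}")
    # A non-zero exit status lets a wrapper script stop before submitting.
    sys.exit(1 if missing else 0)
```

Saved under a name of your choosing and run with the path to an inputs JSON as its only argument, this would flag something like a missing `known_indels_sites_VCFs` file before the workflow starts. Because the path heuristic can misfire on plain string inputs that happen to start with `/`, treat its output as a hint rather than a guarantee.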

@vortexing vortexing transferred this issue from fhdsl/FH_WDL101_Cromwell Dec 23, 2022