Do not add PFNs for subworkflow input files #4617
Open
While looking over #4535 I realised that we still had some oddities in how subworkflow file inheritance worked.
In particular, consider our separate jobs (think minifollowups) that use input files generated in the main workflow (e.g. TRIGGER_MERGE).
When these jobs run, they use tools like `resolve_url_to_file` to create file objects for the input files. This also adds a PFN to the minifollowups dax for that file. If these tools are run standalone, this is exactly what you want. However, if the minifollowups code is run within a workflow, you do not want to add a PFN, because pegasus (and our own code around it) will handle the file transfer for you. In particular, when the minifollowup dag creation job runs on a condor node, the PFNs it generates will be invalid and we get things like:

Now this doesn't break anything, because the first PFN is the right one (the one coming from pegasus), but it worries me that the second one is always there, and I want it gone.
This patch will not create PFNs for files like this that are being passed to subworkflows. resolve_url_to_file now has a new add_pfn option, which these codes set. There is also some sanity checking to avoid some illegal use cases, but hopefully people don't touch this unless they know what it does! I used the `dax-file-directory` option to decide when to set this new option to False. dax-file-directory needs to be set when running in this subworkflow mode anyway, and should never be set otherwise, so if it's set, you can assume this is a subworkflow.

I also noted that the seed option for the sbank workflow wasn't working as it should if that workflow is run as a top-level workflow. In that situation there will be cases where pegasus will not know where the actual file is. This should also use resolve_url_to_file.
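
To make the add_pfn behaviour concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this patch: `SketchFile`, `resolve_url_to_file_sketch` and `should_add_pfn` are hypothetical stand-ins invented for illustration, the example path is made up, and the sanity checking mentioned above is not modelled. The only things taken from the description are the add_pfn flag and the rule that a set dax-file-directory implies subworkflow mode.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchFile:
    """Hypothetical stand-in for a workflow file object (not the real class)."""
    name: str
    pfns: List[str] = field(default_factory=list)


def should_add_pfn(dax_file_directory: Optional[str]) -> bool:
    """Assumed rule from the description: if dax-file-directory is set,
    we are running as a subworkflow and must not register a PFN."""
    return dax_file_directory is None


def resolve_url_to_file_sketch(url: str, add_pfn: bool = True) -> SketchFile:
    """Illustrative only: build a file object for `url` and optionally
    attach the URL as a physical file name (PFN)."""
    name = url.rsplit("/", 1)[-1]
    fil = SketchFile(name)
    if add_pfn:
        # Standalone use: nothing else knows where the file lives,
        # so record its location ourselves.
        fil.pfns.append(url)
    # Subworkflow use: leave the PFN list empty and let the parent
    # workflow handle the file transfer.
    return fil


# Standalone run: dax-file-directory is not set, so the PFN is recorded.
standalone = resolve_url_to_file_sketch(
    "/data/run/TRIGGER_MERGE.hdf", add_pfn=should_add_pfn(None)
)

# Subworkflow run: dax-file-directory is set, so no PFN is added.
subworkflow = resolve_url_to_file_sketch(
    "/data/run/TRIGGER_MERGE.hdf", add_pfn=should_add_pfn("/some/dax/dir")
)
```

In this sketch the decision is made once, from whether dax-file-directory was supplied, and passed down as a flag, so the file-resolution helper itself stays unaware of how the workflow was launched.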