"output cleanup" in workflow appears to delete files more or less randomly #10449

Open
jancrichter opened this issue Oct 19, 2020 · 3 comments

@jancrichter
Contributor

Scenario: "Output cleanup" is activated in final step of workflow.

Expectation: All non-starred (or rather non-checked) and therefore hidden workflow outputs should be deleted.

Observation: Only 50 of around 180 files are deleted. Furthermore, there are "checked" or "starred" files among the deleted ones.

This occurs on 19.09, 20.05 and 20.09 (latest, from today). I have not tested this on any older versions.

I could find nothing in the logs that might be related to this.

Further question: Any chance of an option to directly "purge" the files in question?

Thanks for any feedback!

@dannon
Member

dannon commented Oct 19, 2020

Well, this doesn't sound right at all. Is this a workflow you could share (.ga format export is fine) so that I can look at it?

@jancrichter
Contributor Author

Hi,

unfortunately I cannot share the original workflow. If needed I can try and prepare an artificial example.

However, I think I first need some clarification on what I should expect. In the workflow editor, the "output cleanup" action is described as follows:

[image: description of "output cleanup" in the workflow editor]

I have until now interpreted this as "unmarked datasets from all steps of the workflow which are completed at time of running this step, no matter which branch of the workflow".

The description in the workflow launch screen seems to limit this to datasets which were "parent" to the step in which it is activated. To me, that could mean either only the datasets used as input to this step, or any datasets from steps with a direct or indirect connection to this step.

[image: description of "output cleanup" in the workflow launch screen]

If this is the case, the results I am getting suddenly make a lot more sense.

I would however still expect no deletion of datasets marked as Workflow Output.
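
To make the two possible readings of "parent" concrete, here is a toy sketch. It is purely illustrative: the step names and the `upstream` mapping are hypothetical and are not taken from Galaxy's code or from the workflow in question.

```python
# Hypothetical workflow graph: each step maps to the steps whose outputs it consumes.
upstream = {
    "input": [],
    "trim": ["input"],
    "align": ["trim"],
    "qc": ["input"],        # a separate branch, not connected to "report"
    "report": ["align"],
}


def direct_parents(step):
    """Reading 1: only the steps that feed directly into `step`."""
    return set(upstream[step])


def all_ancestors(step):
    """Reading 2: every step with a direct or indirect connection into `step`."""
    seen = set()
    stack = list(upstream[step])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(upstream[current])
    return seen


print(sorted(direct_parents("report")))  # ['align']
print(sorted(all_ancestors("report")))   # ['align', 'input', 'trim']
```

Under either reading, the unconnected `qc` branch is never touched by a cleanup attached to `report`, which would explain why only part of the files are deleted.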

@dannon
Member

dannon commented Oct 20, 2020

@jancrichter Yep, that's exactly right. This action is designed to execute at the time the step in question finishes, cleaning up parent datasets (which are no longer being used as inputs for other steps). You can use this multiple times to manage dataset outputs in a large workflow, or only at the final outputs for a workflow-level cleanup.
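
As a rough sketch of the rule described above, and not Galaxy's actual implementation, the selection could be modeled like this. The `Dataset` class, its fields, and `cleanup_candidates` are hypothetical names introduced only for illustration; the sketch also assumes that datasets marked as workflow outputs are meant to be kept.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical model of a workflow dataset, for illustration only.
    name: str
    is_workflow_output: bool = False
    consumers: set = field(default_factory=set)  # steps that use this dataset as input


def cleanup_candidates(parent_datasets, finished_steps):
    """Return names of parent datasets that could be deleted once a step finishes.

    A dataset is a candidate only if every step consuming it has finished
    and it is not marked as a workflow output (assumed behavior).
    """
    candidates = []
    for ds in parent_datasets:
        still_needed = bool(ds.consumers - finished_steps)
        if not still_needed and not ds.is_workflow_output:
            candidates.append(ds.name)
    return candidates


# Parent datasets of a hypothetical "report" step.
parents = [
    Dataset("trimmed", consumers={"report"}),
    Dataset("aligned", is_workflow_output=True, consumers={"report"}),
    Dataset("raw", consumers={"report", "qc"}),
]

print(cleanup_candidates(parents, {"report"}))  # ['trimmed']
```

Here `aligned` is kept because it is marked as an output, and `raw` is kept because the hypothetical `qc` step has not finished yet; once `qc` is also finished, `raw` becomes a candidate as well.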

That said, the fact that it is deleting things marked as outputs sounds like a bug -- do you think you could reproduce this with a more minimal example?
