Automatic task cleanup #3849
base: master
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
…emote paths) Signed-off-by: Ben Sherman <bentshermann@gmail.com>
As I mentioned in the other PR, this eager cleanup currently won't work correctly with file publishing because publishing is asynchronous. So for each task, we need to wait for any files to be published first...
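To make the ordering constraint above concrete, here is a minimal Python sketch of the bookkeeping it implies. It is not Nextflow's implementation: `TaskState`, `EagerCleanup`, and every method name are hypothetical, and the sketch assumes each publish operation is registered before the task is reported complete.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping (not a Nextflow class)."""
    name: str
    pending_publishes: int = 0  # async publish operations still in flight
    pending_consumers: int = 0  # downstream tasks that still need the outputs
    completed: bool = False
    deleted: bool = False


class EagerCleanup:
    """Marks a task directory as deletable only when nothing still needs it."""

    def __init__(self):
        self.tasks = {}

    def add_task(self, name, consumers=0):
        self.tasks[name] = TaskState(name, pending_consumers=consumers)

    def on_publish_start(self, name):
        # Assumption: called before on_task_complete for every published file.
        self.tasks[name].pending_publishes += 1

    def on_publish_done(self, name):
        self.tasks[name].pending_publishes -= 1
        self._maybe_delete(name)

    def on_consumer_done(self, name):
        self.tasks[name].pending_consumers -= 1
        self._maybe_delete(name)

    def on_task_complete(self, name):
        self.tasks[name].completed = True
        self._maybe_delete(name)

    def _maybe_delete(self, name):
        t = self.tasks[name]
        if (t.completed and not t.deleted
                and t.pending_publishes == 0
                and t.pending_consumers == 0):
            # A real implementation would remove the task directory here.
            t.deleted = True
```

The point of the sketch is that task completion alone is not enough: the directory only becomes deletable once the last in-flight publish and the last downstream consumer have both finished.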
81f7cb7 to 8a43489
If I understand correctly, your current approach waits for each process to produce its output and then performs cleanup. It may be better to wait until all processes in a run have finished and then delete. That would probably be simpler, since you would not need to track which files must be kept, and it would still save a significant amount of space. Example:
The cleanup strategy needs to be reworked due to the upcoming workflow output DSL #4670. If the publish definition is moved to the workflow level, the task has no idea which of its outputs will be published, and the cleanup observer can't delete a task until it knows that every output that is going to be published has been published.

A simple solution would be to mark an output for deletion when it is published (and downstream tasks are done with it, etc.). The downside is that outputs that are not published are never deleted.

Thinking further, the current POC of the output DSL just appends some "publish" operators to the DAG, so I might be able to trace each process output through the DAG to see whether it is connected to a publish op. That way we know when an output cannot be published and can delete it sooner. This still misses files that could be published but aren't at runtime, e.g. because they get filtered out.

Finally, we can always fall back to the existing strategy of "delete whatever is left at the end". As long as the eager cleanup can delete enough files early on, it should be enough to move many pipelines from un-runnable to runnable.
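The idea of tracing a process output through the DAG to see whether it connects to a publish op can be sketched as a plain reachability search. This is only an illustration in Python: the adjacency-dict representation and the node names (`ALIGN.out`, `filter`, `publish`, and so on) are made up, and Nextflow's actual DAG classes are not used.

```python
from collections import deque


def can_reach_publish(dag, start, publish_ops):
    """Return True if any path from `start` leads to a publish operator.

    `dag` maps each node to the list of its downstream nodes (hypothetical
    representation); `publish_ops` is the set of publish-operator nodes.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in publish_ops:
            return True
        for nxt in dag.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical workflow: ALIGN's output feeds a filter that is published,
# while SORT's output only feeds another process.
dag = {
    "ALIGN.out": ["filter", "SORT"],
    "filter": ["publish"],
    "SORT": [],
    "SORT.out": ["INDEX"],
    "INDEX": [],
}
publish_ops = {"publish"}
```

In this example `ALIGN.out` can reach a publish op, so its cleanup has to wait for publishing, whereas `SORT.out` cannot and could be deleted as soon as `INDEX` is done with it. As noted above, reachability is conservative: an output that can reach a publish op may still never be published at runtime.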
Alternative to #3818
Instead of adding a `temporary` option to output paths, this PR facilitates automatic cleanup through the `cleanup` config option. By setting `cleanup = 'eager'`, Nextflow will automatically delete task directories during the workflow run. Caveats are documented in the PR.

TODO: