Ability to stop running pipeline. #1441
Comments
You can just CTRL+C if it's running in the foreground, or kill it as any other job if it's running in the background.
@pditommaso Thanks for the quick response. The issue is that we have multiple pipelines running in the background simultaneously, and killing the driver pod does not kill the associated process pods. Is there a way to find the process pods associated with a particular driver?
It should. Which executor are you using?
Ah sorry, Kubernetes.
nextflow/modules/nextflow/src/main/groovy/nextflow/k8s/K8sTaskHandler.groovy (lines 347 to 360 at a5a6a1f)
Yes, if we do a run with 100 simultaneous processes, the running process pods are not killed when we delete the driver. We can delete all nf-* pods, but if there is another pipeline running, that kills its pods as well.
Does it happen only with a large number of pods? Is there nothing useful in the log file? It may be that K8s is unable to handle a large number of requests all at once.
I might know the issue. I realized that instead of killing the client process running in the background on our machines, we have been killing the driver pod on the K8s cluster. Another cause is a transient error with the driver pod (such as a network timeout) that makes it fail, leaving the process pods hanging. I will test and get back to you. If this is the issue, maybe there is a way for the client process to submit a "cleanup" pod if the driver pod fails or is unreachable, or to resubmit a driver pod that still tracks the running process pods.
Does the client process track the process pods, or just the driver pod? |
It tries to delete each job pod one by one. |
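A rough shell analogue of that one-by-one cleanup (the pod names and namespace below are made up for illustration; the real logic lives in K8sTaskHandler.groovy, and this sketch only prints the commands rather than running them):

```shell
# Illustration only: delete each tracked job pod individually, the way the
# client does on shutdown. Printing instead of executing keeps this a dry run.
cleanup_pods() {
  namespace="$1"; shift
  for pod in "$@"; do
    printf 'kubectl delete pod -n %s %s\n' "$namespace" "$pod"
  done
}

# Hypothetical pod names for demonstration:
cleanup_pods default nf-aaaa nf-bbbb
# -> kubectl delete pod -n default nf-aaaa
# -> kubectl delete pod -n default nf-bbbb
```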
Using version 19.07, neither the driver pod nor the job pods are deleted when the client process is killed, whether it is running in the foreground or the background. I will test a few more times to see if this bug persists. When I leave the client running but kill the driver pod, the driver is deleted but the job pods remain. I will also repeat this test a few times.
The bug persists; I killed the client with CTRL+C. These were runs with 10 simultaneous jobs, so I do not think the issue is caused by a large number of requests.
Please run a small test enabling the trace logging and including the resulting .nextflow.log file.
|
Here are the log files generated on the cluster and locally. Note that while the timestamps are very different, the driver pod is the same, so it is the same run. Also note that the local log is much smaller, presumably because the client was killed early. I had to wait for the pipeline to finish before the cluster log stopped being written to.
Nextflow is normally able to clean up after itself for most executors. For example, if you kill a Nextflow run on PBS, it deletes all of the submitted jobs. The problem, I think, is that killing a Nextflow run on k8s means deleting the submitter pod, so the nextflow process running on that pod might not get the CTRL-C signal that would normally trigger cleanup. It might have to be implemented as a lifecycle hook instead. @cbmckni In the meantime you can use a script I added to kube-runner: https://github.com/SystemsGenetics/kube-runner/blob/master/kube-pods.sh It lists each pod with its associated Nextflow run, so you can use it as an example of how to find the worker pods for a particular run.
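As a minimal sketch of the same idea, assuming the k8s executor labels its worker pods with a runName label (an assumption borrowed from the kube-pods.sh approach; verify what labels Nextflow actually applies on your cluster with `kubectl get pods --show-labels`), the selector for one run could be built like this:

```shell
# Build (but do not execute) a kubectl command that selects the worker pods
# of a single run. The runName label is an assumption, not confirmed behavior.
pods_for_run() {
  run_name="$1"
  namespace="${2:-default}"
  printf 'kubectl get pods -n %s -l runName=%s -o name\n' "$namespace" "$run_name"
}

# Hypothetical run name and namespace for demonstration:
pods_for_run hungry_einstein deepgtex-prp
# -> kubectl get pods -n deepgtex-prp -l runName=hungry_einstein -o name
```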
New feature
I would like the user to be able to stop a running pipeline on any of the supported executors.
Previous work would not be deleted, and the user would be able to resume the workflow as usual.
Usage scenario
It would be very useful to be able to stop a pipeline on command. This would allow users to stop pipelines that are running incorrectly, more easily replicate transient or intermittent faults, temporarily free up resources, etc.
Suggested implementation
Implementation would involve coding wrappers for each of the executors.
We primarily use Nextflow in a Kubernetes environment. In this case, Nextflow would keep track of running driver pods, then kill them along with all associated process pods. This could probably be implemented easily if the driver pods kept track of their associated process pods.
For this example, the command to kill a workflow submitted using
nextflow kuberun systemsgenetics/kinc-nf -v deepgtex-prp
would look something like
nextflow kuberun stop systemsgenetics/kinc-nf -v deepgtex-prp
or tags could be implemented, if they are not already:
nextflow kuberun systemsgenetics/kinc-nf -v deepgtex-prp -t workflow1
would look something like
nextflow kuberun stop workflow1
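Until something like this exists, a stop could be approximated by a wrapper that deletes every pod carrying the run's label, driver included. This is a dry-run sketch under stated assumptions: the runName label is not confirmed Nextflow behavior, and the function prints the command by default so nothing is deleted until you run it yourself.

```shell
# Hypothetical "stop": delete all pods belonging to one run, identified by a
# runName label (an assumption). With DRY_RUN=1 (the default) the command is
# only printed, so no pods are actually deleted.
stop_run() {
  run_name="$1"
  namespace="${2:-default}"
  cmd="kubectl delete pods -n ${namespace} -l runName=${run_name}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

stop_run workflow1 deepgtex-prp
# -> kubectl delete pods -n deepgtex-prp -l runName=workflow1
```

Because only pods matching the one run's label are selected, other pipelines running in the same namespace would be left untouched, which addresses the "kills those processes as well" problem above.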
Thoughts?
If this feature were added, it could open the door to other useful features (such as the ability to stop and then resume from an earlier process).