Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kill processes in container before copying output files #433

Open
PhilippWendler opened this issue Jun 19, 2019 · 2 comments
Open

Kill processes in container before copying output files #433

PhilippWendler opened this issue Jun 19, 2019 · 2 comments
Labels
container related to container mode

Comments

@PhilippWendler
Copy link
Member

Currently, the following happens at the end of a run in container mode:

  • Main process of tool is stopped (because it terminates or because it is killed due to a limit violation).
  • Wall time and energy are measured.
  • Output files are copied.
  • Container is killed (this also kills all subprocesses).
  • Limits are cancelled.
  • Subprocesses are killed via cgroups (has no effect).
  • CPU time is read.

This means that if subprocesses change output files after the main process has terminated, the output files can be in an inconsistent state. Furthermore, subprocesses can continue accumulating CPU time while output files are copied.

We should probably kill all subprocesses and potentially do even more cleanup before copying output files. The problem is that the best way to kill the subprocesses is to kill the whole container, but we still need the container for copying output files (cf. 4bba456 and 78a801a). Using cgroups to kill the subprocesses is more inefficient and can be really problematic if the freezer cgroup is not available and a fork bomb runs in the container.

@PhilippWendler PhilippWendler added the container related to container mode label Jun 19, 2019
@PhilippWendler
Copy link
Member Author

Another solution would be to open a file descriptor for accessing the directory in the container before starting the tool. As long as we keep this file descriptor open, we should be able to access the container's directories, so we can kill the container (and all subprocesses within it) immediately. This would probably be the cleanest solution, but requires rewriting containerexecutor._transfer_output_files with directory fds and works only on Python >= 3.3.

PhilippWendler added a commit that referenced this issue Jun 19, 2019
This is at least an improvement for #433,
although a proper solution should be implemented after support for
Python 2 was dropped.
@PhilippWendler
Copy link
Member Author

This is a lot of implementation effort unless we can make use of glob.iglob with directory fd's, which will land in Python 3.10 (cf. python/cpython#16075). So let's wait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
container related to container mode
Development

No branches or pull requests

1 participant