Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all child cgroups before removing parents #94

Merged
merged 1 commit into from
Feb 26, 2021

Conversation

jb3
Copy link
Member

@jb3 jb3 commented Feb 26, 2021

After #91 was merged we starting seeing errors with large stdout causing errors like the following:

2021-02-26 16:02:56,706 |    14 |                 snekbox.nsjail |     INFO | Executing code...
2021-02-26 16:02:56,741 |    14 |                 snekbox.nsjail |     INFO | Output exceeded the output limit, sending SIGKILL to NsJail.
2021-02-26 16:02:56,743 |    14 |                 snekbox.nsjail |     INFO | nsjail return code: 137
2021-02-26 16:02:56,744 |    14 |     snekbox.api.resources.eval |    ERROR | An exception occurred while trying to process the request
Traceback (most recent call last):
  File "/snekbox/snekbox/api/resources/eval.py", line 75, in on_post
    result = self.nsjail.python3(code)
  File "/snekbox/snekbox/nsjail.py", line 228, in python3
    Path(self.config.cgroup_mem_mount, cgroup).rmdir()
  File "/usr/local/lib/python3.9/pathlib.py", line 1352, in rmdir
    self._accessor.rmdir(self)
OSError: [Errno 16] Device or resource busy: '/sys/fs/cgroup/memory/snekbox-87fd27a1-d0e9-4387-8160-8d1a985d43e4'

I think that this external killing of nsjail does not let it clear up the child cgroup (which resides in a directory like /sys/fs/cgroup/memory/snekbox-87fd27a1-d0e9-4387-8160-8d1a985d43e4/NSJAIL.12). Once nsjail exits we then try to clear up the parent cgroup (in the example, /sys/fs/cgroup/memory/snekbox-87fd27a1-d0e9-4387-8160-8d1a985d43e4) which cannot succeed because the child cgroup exists.

This PR fixes that by removing the child cgroup before attempting to remove the parent, therefore recursively clearing the tree and ensuring that no cgroup can be busy.

My understanding is that it's safe to remove that cgroup (which I think technically removes the limits from nsjail) because by the point this code is reached the execution has either finished or has been killed.

@jb3 jb3 added area: backend Related to internal functionality and utilities area: nsjail Related to NsJail and its configuration priority: 2 - normal labels Feb 26, 2021
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.7%) to 56.855% when pulling 06173ee on recursive-remove-cgroups into af704c1 on master.

@jb3
Copy link
Member Author

jb3 commented Feb 26, 2021

Also worth noting that the RAM limits and PID limits are not affected by this, only evaluations which result in a large stdout.

Screenshot 2021-02-26 at 17 07 18

Copy link
Contributor

@Akarys42 Akarys42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does solve our current issue, thanks captain Joe!

@Akarys42 Akarys42 merged commit 59a1bf4 into master Feb 26, 2021
@Akarys42 Akarys42 deleted the recursive-remove-cgroups branch February 26, 2021 17:21
MarkKoz added a commit that referenced this pull request Mar 8, 2021
The branch needs the fixes from #94 to make the tests pass.
@MarkKoz MarkKoz mentioned this pull request Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: backend Related to internal functionality and utilities area: nsjail Related to NsJail and its configuration priority: 2 - normal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants