Limit STDOUT to prevent OOM events in container#81
Conversation
Recently, we discovered that for some code inputs, snekbox would get
into an OOM event on the container level, seemingly bypassing the memory
restrictions laid on code execution by NSJail.
After investigating the issue, we identified the culprit to be the
STDOUT pipe we use to get output back from NSJail: As output is piped
out of the jailed process, it will be gathered outside of the NSJail in
the main container process instead. This meant that our initial attempts
of limiting the allowed filesize within the NSJail failed, as the OOM
happened outside of the jailed environment.
To mitigate the issue, I've written a loop that consumes the STDOUT pipe
in chunks of 100 characters. Once the size of the accrued output reaches
a certain limit (currently set to 1 MB), we send a SIGTERM signal to
NSJail to terminate itself. The output up to that point will be relayed
back to the caller.
A minimal code snippet to trigger the event and the mitigation:
```py
while True:
print(" ")
```
I've included a test for this vulnerability in `tests/test_nsjail.py`.
| # Ask nsjail to terminate itself using SIGTERM | ||
| nsjail.terminate() |
There was a problem hiding this comment.
SIGKILL is better since it'll result in a consistent error on the front end. Furthermore, SIGKILL cannot be handled by the child process, so it's more robust in this case.
| # We'll consume STDOUT as long as the subprocess is running | ||
| while nsjail.poll() is None: | ||
| # Read 100 characters from the STDOUT stream | ||
| chars = nsjail.stdout.read(100) |
There was a problem hiding this comment.
100 seems really low. I suggest something in the 10's of kilobytes. Since an overflow of that magnitude is still insignificant, it should not discard the overflowed bytes it reads.
There was a problem hiding this comment.
Consider using io.DEFAULT_BUFFER_SIZE or setting a custom bufsize for Popen.
There was a problem hiding this comment.
I can increase the number of characters read from the stream, although this increases the uncertainty about the size in bytes we're reading due to UTF-8 characters being up to 4 bytes long.
There was a problem hiding this comment.
Consider using
io.DEFAULT_BUFFER_SIZEor setting a custombufsizeforPopen.
This is the default value for Popen due to the negative default value for bufsize (-1):
negative bufsize (the default) means the system default of io.DEFAULT_BUFFER_SIZE will be used.
There was a problem hiding this comment.
this increases the uncertainty about the size in bytes we're reading due to UTF-8 characters being up to 4 bytes long.
Well, even being 40 KB over is not a big deal.
This is the default value for Popen due to the negative default value for bufsize
That was worded poorly. I mean that when you read from the stream, use read(io.DEFAULT_BUFFER_SIZE) or set the bufsize to the same number you use to read from the stream. Not sure if it really makes a difference.
| output_size = 0 | ||
| output = [] | ||
|
|
||
| # We'll consume STDOUT as long as the subprocess is running | ||
| while nsjail.poll() is None: | ||
| # Read 100 characters from the STDOUT stream | ||
| chars = nsjail.stdout.read(100) | ||
| chars_size = sys.getsizeof(chars) | ||
|
|
||
| # Check if these characters take us over the output limit | ||
| if output_size + chars_size > OUTPUT_MAX: | ||
| # Ask nsjail to terminate itself using SIGTERM | ||
| nsjail.terminate() | ||
| break | ||
|
|
||
| output_size += chars_size | ||
| output.append(chars) |
There was a problem hiding this comment.
It'd help readability to move this to a separate function that returns the output.
|
Just to double check, there will always be a front-end indication of truncation since 1 MB is 250,000 - 1 million characters, but pastebin has a limit of 100,000 characters or 0.1 - 0.4 MB. Is this correct? |
That is correct. My idea was that I did not want to go lower than the pastebin limit, so I used a safe margin. Now I'm thinking that we could also count the number of characters we consume and limit that to, say, 99,900. That means we can always upload the results. We could even include a message like " Edit: Hmm, on second thought, that would mean that all the joke runs that generate a lot of output are uploaded to the bin as well, filling it up with large outputs that don't make sense. |
|
I think it's fine as-is, though it should be documented that it truncates at around 1 megabyte. |
The function now returns a single, joined string instead of a list of a list of strings. That way, we don't have to join the list in two different code branches.
Previously, the chunk of output that took us over the output limit was dismissed. As we've already consumed it and it's not going to have a very large size, we might as well include it in the final output we return.
I've increased the number of characters in each chunk we read from stdout to 10_000. This means we now read roughly 10 KB - 40 KB in each chunk.
This new behavior matches how other limiters terminate the subprocess, resulting in a more consistency in the front-end for the end users as well.
| """).strip() | ||
|
|
||
| result = self.nsjail.python3(stdout_flood) | ||
| self.assertEqual(result.returncode, -9) |
There was a problem hiding this comment.
Is -9 equivalent to 137? Is this some sort of two's complement thing around 127 or 1 byte?
There was a problem hiding this comment.
No, it's what you get back when you terminate a subprocess with a signal in Popen: the signal code, but negative. So, that's what we're getting here as well, a -9, because SIGKILL is 9.
There was a problem hiding this comment.
Then this needs to be adjusted somewhere so that the bot will understand that it's still SIGKILL, because right now I believe it will only understand 137.
There was a problem hiding this comment.
We could return a manual return code from the consume function:
- If we exceed the limit, we return a manual 137
- If we don't, we propagate the actual exit code.
There was a problem hiding this comment.
How about something more generic to normalise it:
if result.returncode < 0:
result.returncode = abs(result.returncode) + 128There was a problem hiding this comment.
I guess - can be used instead of abs()...
MarkKoz
left a comment
There was a problem hiding this comment.
This looks good. I'd just like a test for the output truncation. The test you added only tests for whether the process is killed.
When you send a signal `N` to a subprocess using Popen, it will return `-N` as its exit code. As the rest of the code returns signal exit codes as `128 + N`, we convert those negative exit codes into the standard form used by the rest of the code.
I've added a test that checks if output exceeding the limit is correctly truncated. To make the test more robust, I've defined a constant for the read chunk size.
| # When you send signal `N` to a subprocess to terminate it using Popen, it | ||
| # will return `-N` as its exit code. As we normally get `N + 128` back, we | ||
| # convert negative exit codes to the `N + 128` form. | ||
| returncode = -nsjail.returncode + 128 if nsjail.returncode < 0 else nsjail.returncode |
There was a problem hiding this comment.
Why here and not in the main function? I'd rather avoid returning a tuple.
There was a problem hiding this comment.
I'd rather avoid returning a tuple.
I'll change it, but what is your motivation for disliking it? Just personal preference?
There was a problem hiding this comment.
I tend to associate it with functions that are broader in scope than they should be. It could also be a sign of clumsiness, like when returning some sort of boolean in addition to data. It may indicate a function can be further refactored. It's also more difficult to document, especially when not using named tuples. Returning two ostensibly unrelated things in a tuple makes the purpose of the function less clear.
In fact, ideally this function would only be concerned with the output and not do anything to the process, but killing the process within doesn't seem avoidable. This also adds to the trouble of clearly documenting it: this function mainly returns the output, but, by the way, it may also kill the process. That may be an important distinction which wouldn't be clear at a glance.
|
|
||
| Once the subprocess has exited, either naturally or because it was terminated, | ||
| we return the exit code as an int and the output as a single string. | ||
| we rturn the output as a single string. |
There was a problem hiding this comment.
I don't know where the suggestion button went, but there is a typo here.
Recently, we discovered that for some code inputs, snekbox would get into an OOM event on the container level, seemingly bypassing the memory restrictions laid on code execution by NSJail.
After investigating the issue, we identified the culprit to be the STDOUT pipe we use to get output back from NSJail: As output is piped out of the jailed process, it will be collected outside of the NSJail in the main container process instead. This meant that our initial attempts of limiting the allowed filesize within the NSJail failed, as the OOM happened outside of the jailed environment.
To mitigate the issue, I've written a loop that consumes the STDOUT pipe in chunks of 100 characters. Once the size of the accrued output reaches a certain limit (currently set to 1 MB), we send a SIGTERM signal to NSJail to terminate itself. The output up to that point will be relayed back to the caller.
A minimal code snippet to trigger the event and the mitigation:
I've included a test for this vulnerability in
tests/test_nsjail.py.