
Rework Code Evaluation #195

Open

mbaruh opened this issue Aug 4, 2022 · 5 comments

Comments

@mbaruh
Member

mbaruh commented Aug 4, 2022

Currently, code evaluation suffers from possible exploits. These stem from how code evaluation works in the forms app:

  1. The user's code, as well as the pre-supplied unit tests, are sent to snekbox.
  2. The unit tests are run on the user's code inside snekbox.
  3. The results of the tests are supplied back from snekbox through stdout.

This means that if the user manages to control stdout, they can control the output of the tests, as far as the forms app can tell.
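For illustration, a submission could simply print something that looks like a successful test run and exit before the real tests report anything. The exact harness output faked below is hypothetical; the point is only that the submission and the tests share the same stdout.

# Hypothetical malicious submission: instead of solving the task, it prints
# output resembling a passing test run. Because the tests and the submission
# share stdout inside snekbox, the forms app cannot tell this apart from a
# genuine pass.
import os
import sys

def solution(x):
    return x  # not a real solution

print("5 Tests Passed")  # spoofed harness output
sys.stdout.flush()
os._exit(0)  # hard-exit so the real unit tests never run or print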

Several mitigations are already in place, but fully solving this ultimately requires a more thorough/drastic approach.

Solution 1

Separate user code from the tests.

  • The user code alone will be sent to snekbox.
  • The result of the user's code will be written to stdout.
  • The stdout will be read from snekbox.
  • The output will be compared to a pre-supplied string to check whether the user's code passes (similarly to how it works in Code Wars).
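As a minimal sketch of that comparison step on the backend side (function and variable names here are illustrative, not the actual forms-backend code), assuming the backend already has snekbox's stdout and the pre-supplied expected string:

# Illustrative only: compare snekbox's captured stdout to the expected string,
# Code Wars style. Surrounding whitespace is normalised so a trailing newline
# from print() doesn't cause a false failure.
def passes(snekbox_stdout: str, expected_output: str) -> bool:
    return snekbox_stdout.strip() == expected_output.strip()

# e.g. the question author supplies "15" and the submitter's code prints the answer:
assert passes("15\n", "15")
assert not passes("something else", "15")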

This has the disadvantage of limiting what we can evaluate, as currently we are able to inspect Python objects during testing.

Solution 2

We can see whether we can add an additional way to supply information from snekbox, specifically for the purpose of getting test responses, for example via an encrypted byte stream. Either way, it would have to be something that can't be accessed from the evaluated code. The details can be hashed out if this is something we want to do.
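To illustrate the idea of a channel the evaluated code can't forge (this is only a sketch of the concept, not a feature snekbox currently has, and keeping the secret out of reach of the evaluated code is exactly the unsolved part):

# Sketch: the test runner signs its verdict with a per-run secret and the
# backend verifies the signature instead of trusting raw stdout. Purely
# illustrative; names and the wire format are made up.
import hashlib
import hmac
import json

def sign_verdict(secret: bytes, verdict: dict) -> str:
    payload = json.dumps(verdict, sort_keys=True)
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"

def verify_verdict(secret: bytes, line: str) -> dict | None:
    payload, _, signature = line.rpartition("|")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return json.loads(payload) if hmac.compare_digest(signature, expected) else None

# Backend side: a fresh secret per evaluation, injected into the runner.
secret = b"per-run-secret"  # would be random per run in practice
line = sign_verdict(secret, {"passed": 5, "failed": 0})
assert verify_verdict(secret, line) == {"passed": 5, "failed": 0}
assert verify_verdict(secret, '{"failed": 0, "passed": 5}|forged') is None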

In any case, considering the protections we already have in place against something like that, and considering that Code Jam qualifiers are meant to assess a person's Python knowledge, even if someone is able to exploit the current system it's not the worst thing in the world (although preferably it wouldn't happen anyway). Therefore, this issue should be relatively low priority.

@ShakyaMajumdar
Member

I propose a slight modification to Solution 1.

To make Forms's unit testing as general as possible, the API could accept Python code in a setup field for each unittest form field. Forms evaluates[1] this setup code with the form submitter's response, then sends the result to snekbox. snekbox sends back stdout, and forms matches it against some output field.

The test template currently in forms is then moved to Code Jam backend, which sends a request like

{
    "setup": // essentially https://github.com/python-discord/forms-backend/blob/main/backend/routes/forms/unittesting.py#L104-L108
    "output": "5 Tests Passed" // i.e., output of the unittests patched in by setup
}

This maintains the current behaviour: you get to test object states and such, but it's possible for the submitter to control stdout. Not an issue, since

Code Jam qualifiers are meant to assess a person's Python knowledge, even if someone is able to exploit the current system, it's not the worst thing in the world

For Esoteric Challenges, a setup field won't be necessary - the code submitted by the user is directly sent to snekbox, and what that code prints to stdout is directly matched against the output field.

[1] The simplest way is to run it within the Forms backend, but that assumes we trust the people who create forms. A marginally more complicated way is to run the setup in snekbox too, with the minor downside of having two snekbox requests per test.
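To make the flow concrete, here is a rough sketch under one simplistic reading of the proposal, where the setup code is appended after the submitter's response and the combined module is run in snekbox in a single request. The endpoint, payload shape, and helper names are illustrative assumptions, not the real forms-backend or snekbox API.

# Illustrative sketch only; the snekbox URL, request/response shape, and
# function name are placeholders for the sake of the example.
import httpx

SNEKBOX_URL = "http://snekbox/eval"  # placeholder

async def run_unittest_question(response_code: str, setup: str, expected_output: str) -> bool:
    """Append the author-supplied setup after the submitter's code, run the
    combined module in snekbox, and match the captured stdout against the
    per-test output field."""
    combined = f"{response_code}\n\n{setup}\n"
    async with httpx.AsyncClient() as client:
        result = await client.post(SNEKBOX_URL, json={"input": combined})
    stdout = result.json().get("stdout", "")
    return stdout.strip() == expected_output.strip()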

@HassanAbouelela
Member

Running any user code directly on the backend is a no-go. The part I don’t understand about your proposal is what it ends up achieving. You still can’t check state the way you described because the same issue of not being able to safely communicate between the unit test suite and the backend remains.

Also to touch on Zig’s last point: the discussion yesterday was that this issue is currently sitting on low priority, and adding support for things that aren’t the code jam isn’t something we’ll look into unless we have a good idea.

@ShakyaMajumdar
Member

ShakyaMajumdar commented Aug 5, 2022

what it ends up achieving

It removes the possibility of exploiting stdout in tests where we directly match the user's prints, while letting you keep the current behaviour of code jams, where you check prints via unittest.

That is to say, you can decide on a per-event (or even per-test) basis whether you want (unexploitable stdout + no unittests) or (exploitable eval + unittests).
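A made-up configuration shape, purely to illustrate the per-test choice between the two modes (not an existing forms schema):

# Hypothetical per-test configuration. Each test picks either plain stdout
# matching (no unittests, not spoofable) or setup-driven unittests (richer
# checks, but the reported result remains spoofable via stdout).
tests = [
    {"mode": "stdout_match", "output": "12"},
    {"mode": "unittest", "setup": "<test template here>", "output": "5 Tests Passed"},
]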

@HassanAbouelela
Member

Isn’t that just suggestion 1?

@ShakyaMajumdar
Member

You don't get to keep the code jam's unit tests with suggestion 1.
