
Race condition when multiple users use isolate #26

Closed
seirl opened this issue Mar 7, 2017 · 11 comments

@seirl
Contributor

seirl commented Mar 7, 2017

There is a race condition when a lot of users are constantly using isolate in parallel, even if they try to play nice with each other.

Let's say you want to get a new box for your program. The "proper" way to do that is to list the directories in /var/lib/isolate/ and to --init with an available ID. But between the time you check that an ID is available and the time you call --init, another program might have run --init for the same ID, and both programs end up with the same cgroup and sandbox.

I see two solutions for that:

  • The first one is really simple to implement in isolate: an option like --require-empty (name suggestions welcome) that makes isolate --init fail if the box folder already exists (mkdir returns EEXIST).
  • The second one is to move the responsibility of assigning box IDs from the calling program to isolate. When calling --init, you would be able to specify, say, --box-id new. Isolate would then look for an available box ID among the directories in /var/lib/isolate, try to mkdir it, and on EEXIST keep trying new IDs until a directory has been successfully created. When --init finishes, it would print the box ID along with the path of the box, so that --run and --cleanup can be called with that ID.
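The second scheme leans on mkdir(2) being atomic: the first process to create the box directory owns it, and EEXIST means someone else got there first. A minimal shell sketch of that allocation loop, using a temporary directory as a stand-in for /var/lib/isolate (the claim_box helper and the 0-99 ID range are illustrative, not part of isolate):

```shell
# Atomic box-id allocation: mkdir either creates the directory or fails
# with EEXIST, so two racing processes can never claim the same id.
box_root=$(mktemp -d)          # stand-in for /var/lib/isolate in this demo

claim_box() {
  for id in $(seq 0 99); do
    if mkdir "$box_root/$id" 2>/dev/null; then
      echo "$id"               # this id is now exclusively ours
      return 0
    fi
  done
  return 1                     # no free box id
}

mkdir "$box_root/0"            # simulate a box another process already holds
claim_box                      # prints 1: id 0 was taken, id 1 was free
```

A real implementation would invoke `isolate --init` with the claimed ID right afterwards, and treat an init failure as "taken, try the next one".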

I'm happy to implement either solution, as long as you approve one of them and comment on the API and the short/long option names (and any other implementation details you can think of).

@seirl
Contributor Author

seirl commented Mar 7, 2017

If we use the first solution, I also need to know how to signal that the failure happened because the box already existed. A different exit code? Or just exit(1), requiring the user to parse the output message themselves to know that they need to retry?

@gollux
Member

gollux commented Mar 7, 2017 via email

@hermanzdosilovic
Contributor

I agree with @gollux: isolate should not care about how you use its boxes. @seirl, you described the "proper" way of handling this race condition:

The "proper" way to do that is to list the directories in /var/lib/isolate/ and to --init with an available ID. But between the time you check that an ID is available and the time you call --init, another program might have done a --init for the same ID and you get the same cgroup and sandbox for both.

I don't think this is the "proper" way, because the problem lies in your own approach to handling race conditions. You should change your solution, not isolate. 😄

As an example, please take a look at the Judge0 API, a REST API that directly uses isolate to run untrusted programs. There, this race problem is solved by building a Submission model stored in the database; its unique database ID is used as the box ID.

@seirl
Contributor Author

seirl commented Mar 7, 2017

All of what you are describing is already what we are doing.

The problem is when someone decides to run two instances of our program, for instance a server that handles the requests (like what judge0 is doing) is already running, and then someone else decides to run the unit tests on the same machine.

I'm pretty sure running Judge0 alongside an isolate test suite will cause the same problem; from what I've seen, they simply ignore it.

@hermanzdosilovic
Contributor

I now understand your problem better, but I still believe that isolate should not be the one fixing it. Isolate provides a simple interface for running untrusted code on your server/machine. If you have concurrency problems when creating isolate boxes, you should build another layer of abstraction above isolate that solves them: a program with the same (or at least a similar) interface as isolate, which uses isolate "under the hood".

A "quick" fix for your problem could be one of the following:

  • Run tests on another server, not production server
  • On your server, compile two isolates: a production isolate that uses the default box_root, and another that uses a different box_root (see default.cf). Then use the production isolate for the production server and the other one for the test environment.
  • Wrap your application inside Docker
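The second bullet amounts to building a separate test copy of isolate whose compiled-in configuration points elsewhere. A sketch of what the test build's default.cf could look like (the -test paths are illustrative, and cg_root only matters if you use the cgroup mode):

```
# default.cf of the test build: keep it away from the production box root
box_root = /var/lib/isolate-test
cg_root = /sys/fs/cgroup/memory/isolate-test
```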

Also worth mentioning: yes, the Judge0 API has the same problem if you install it on your server in the "old school" way. But its main strength is its mobility from machine to machine, which is achieved with Docker: every instance of Judge0 runs inside a Docker container that has its own isolate.

@seirl
Contributor Author

seirl commented Mar 7, 2017

While what you're saying makes sense, I think this is the only issue that prevents different programs from using isolate at the same time, and fixing it is a pretty minor change.

Adding a --require-empty option is trivial and has real value even for single programs: it can also surface bugs where you reuse a box ID that has not been properly --cleanup'd. For instance, if you forget to --cleanup the lingering box IDs when you start your frontend after a server hard reset, it is nice to have an extra check that tells you "hey, you're trying to --init something that already exists, are you sure you want to do that?". It would also pretty much solve the race condition for test suites and the like.

Are you really opposed to making that small change?

@gollux
Member

gollux commented Mar 7, 2017 via email

@seirl
Contributor Author

seirl commented Mar 7, 2017 via email

seirl added a commit to seirl/isolate that referenced this issue Mar 7, 2017
@seirl
Contributor Author

seirl commented Mar 7, 2017

Added pull request #27 as a follow-up to @gollux's first suggestion.

@niemela
Member

niemela commented Mar 7, 2017

I think having --init failing when the directory isn't empty is the best way to go.

+1

Silently "fixing" the error when assumptions seem to have been broken feels scary. Much better to fail early and in a controlled way.

@seirl
Contributor Author

seirl commented Mar 8, 2017

I also realized that another advantage of that PR is that it ensures people check the exit code of their --cleanup call instead of assuming it always works :-)
For instance, if the cgroup has not been deleted for some reason, you really don't want to run anything in that box again.

It also forces you to clean up everything when you start your frontend, instead of assuming the directory is empty.
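The startup pattern this encourages can be sketched in shell; cleanup_box here is a hypothetical stand-in for invoking `isolate --box-id=$id --cleanup` and checking its exit status:

```shell
box_root=$(mktemp -d)            # stand-in for /var/lib/isolate in this demo
mkdir "$box_root/4"              # a box left over from before a restart

# Hypothetical stand-in for `isolate --box-id=$1 --cleanup`; a real
# frontend would run the binary and check its exit status the same way.
cleanup_box() { rm -rf "$box_root/$1"; }

# On frontend startup, clean every leftover box and refuse to continue
# if any cleanup fails, instead of silently reusing a dirty box.
for dir in "$box_root"/*; do
  id=${dir##*/}
  if ! cleanup_box "$id"; then
    echo "cleanup of box $id failed; refusing to reuse it" >&2
    exit 1
  fi
  echo "box $id cleaned"
done
```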

@gollux gollux closed this as completed in b9ce5b6 Jul 31, 2017
stefano-maggiolo added a commit to stefano-maggiolo/cms that referenced this issue Oct 23, 2017
Since the last version (see ioi/isolate#26), isolate fails on init if the box already exists. We didn't clean up in two situations: when the box was kept around, and when the worker was terminated in the middle of an execution. The former is easy to fix, the latter is not, so we essentially revert to the previous behaviour by always calling cleanup before init.
stefano-maggiolo added a commit to stefano-maggiolo/cms that referenced this issue Oct 30, 2017