Race condition when multiple users use isolate #26
Comments
If we use the first solution, I also need to know how to signal that the failure was because a box already existed. A different exit code? Or just…
Hello!
So far, we did not think of isolate as a front-end for regular users,
but more as a back-end for another service, which provides high-level
program testing services to users. In such cases, allocation of box IDs
is naturally handled by the high-level service, as is waiting for a free
ID to become available (you usually do not want the number of sandboxes
executed simultaneously to exceed the number of logical CPUs).
Could you please describe your situation in more detail?
I agree with @gollux, isolate should not care about how you use its boxes. @seirl, you talked about the "proper" way of handling this race condition.
I don't think this is the "proper" way, because the problem lies inside your solution for handling race conditions. You should change your solution, not isolate. 😄 As an example, please take a look at the Judge0 API, a REST API that directly uses isolate to run untrusted programs. There, this race problem is solved by building a Submission model which is stored in the database. Its unique ID inside the database is used as the box ID.
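The Judge0-style approach described above can be sketched roughly as follows. This is a minimal illustration, not Judge0's actual code: the table layout and function name are hypothetical, and the key idea is only that a database primary key is unique, so two concurrent submissions can never receive the same box ID.

```python
import sqlite3

def create_submission(conn, source_code):
    """Insert a submission row; the auto-incremented primary key is
    unique, so it can safely double as the isolate box ID."""
    cur = conn.execute(
        "INSERT INTO submissions (source) VALUES (?)", (source_code,)
    )
    conn.commit()
    return cur.lastrowid  # unique -> safe to use as the box ID

# In-memory database for the sketch; a real service would persist this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, source TEXT)")

box_a = create_submission(conn, "print('hello')")
box_b = create_submission(conn, "print('world')")
```

The database serializes the inserts, so the allocation itself is race-free; the trade-off is that every isolate caller must go through the same database.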
All of what you are describing is already what we are doing. The problem is when someone decides to run two instances of our program: for instance, a server that handles the requests (like what Judge0 is doing) is already running, and then someone else decides to run the unit tests on the same machine. I'm pretty sure running Judge0 alongside an isolate test suite will cause the same problem; from what I've seen, they just ignore the problem completely.
I now better understand what your problem is, but I still believe that isolate should not be the one fixing it. Isolate provides a simple interface for running untrusted code on your server/machine. If you have concurrency problems when creating isolate boxes, then you should create another layer of abstraction above isolate which solves them. So you could create a program with the same interface as isolate (or at least a similar one) which solves your concurrency problem and uses isolate "under the hood". A "quick" fix for your problem could be one of the following:
Also worth mentioning: yes, the Judge0 API has the same problem if you install it on your server in the "old school" way. But its main strength is its mobility from machine to machine, and that is achieved with Docker. Every instance of Judge0 runs inside a Docker container which has its own isolate.
While what you're saying makes sense, I think this is the only issue that prevents different programs from using isolate at the same time, and it's pretty minor so it could be easily changed. Adding a … Are you really opposed to making that small change?
> Are you really opposed to making that small change?
I understand that people sometimes initialize a box with leftover files by
mistake. If we want to help them avoid that trap, then the right solution is
not to add a special option, but to make --init fail in all such cases.
Or perhaps to make --init do an implicit --cleanup.
Also, this would not solve the race conditions this thread started with...
there would still be a small time window between checking that the directory
is empty and populating it, when another instance of isolate can step in.
Having a real allocator of box IDs (either as a part of isolate or as
a stand-alone program) would help, but so far I have an impression that
automatic allocation of box IDs is far from trivial: in most cases, you
want to avoid running too many boxes simultaneously; also, you might want
to pin boxes to specific CPUs to improve consistency of results. So the
supply of available boxes is usually very limited and allocation requests
would often fail.
I will think about it a little bit more...
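The stand-alone allocator mentioned above could look something like this sketch. Everything here is an assumption for illustration: the state-file path, the pool size, and the idea of representing "in use" as a simple set rather than real box directories. An exclusive `flock()` serializes concurrent allocators, and a bounded pool models the "supply of available boxes is very limited" constraint.

```python
import fcntl
import os
import tempfile

POOL_SIZE = 4  # e.g. one box per logical CPU (arbitrary for this sketch)
STATE_FILE = os.path.join(tempfile.gettempdir(), "box-allocator.lock")

def allocate_box_id(in_use):
    """Pick a free ID from the pool under an exclusive lock.
    `in_use` stands in for real state such as existing box directories."""
    fd = os.open(STATE_FILE, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # serialize concurrent allocators
        for box_id in range(POOL_SIZE):
            if box_id not in in_use:
                in_use.add(box_id)       # reserve before releasing the lock
                return box_id
        return None                      # pool exhausted: caller must wait
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

used = set()
first = allocate_box_id(used)
second = allocate_box_id(used)
```

Returning `None` on exhaustion reflects the point above: with a small pool, allocation requests will often fail, and deciding whether to wait or give up belongs to the caller.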
2017-03-07 17:20 GMT+01:00 Martin Mareš <notifications@github.com>:
> Are you really opposed to making that small change?
> I understand that people sometimes initialize a box with leftover files by
> mistake. If we want to help them avoid that trap, then the right solution is
> not to add a special option, but to make --init fail in all such cases.
That would work for me, yes.
> Or perhaps to make --init do an implicit --cleanup.
I don't like this one at all, there might be important things inside
and it's not even guaranteed that the --cleanup will succeed. For
instance, if the sandbox crashes, it won't restore the permissions
because it requires a manual investigation when that happens.
I think having --init failing when the directory isn't empty is the
best way to go.
> Also, this would not solve the race conditions this thread started with...
> there would still be a small time window between checking that the directory
> is empty and populating it, when another instance of isolate can step in.
If the check is only "the directory should not already exist", then we
can do it on mkdir() which is atomic, and it solves the problem.
> Having a real allocator of box IDs (either as a part of isolate or as
> a stand-alone program) would help, but so far I have an impression that
> automatic allocation of box IDs is far from trivial: in most cases, you
> want to avoid running too many boxes simultaneously; also, you might want
> to pin boxes to specific CPUs to improve consistency of results. So the
> supply of available boxes is usually very limited and allocation requests
> would often fail.
Yes, I am now convinced that having an allocator of IDs as part of
isolate is a bad idea and that it should be the responsibility of the
caller.
--
Antoine Pietri
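The point made above about `mkdir()` can be demonstrated directly: the existence check and the creation are a single atomic system call, so exactly one of two racing callers can claim a box directory. A small sketch (the directory names are stand-ins, not real isolate paths):

```python
import os
import tempfile

root = tempfile.mkdtemp()        # stand-in for /var/lib/isolate
box = os.path.join(root, "0")

def try_claim(path):
    """Return True if we created (and thus claimed) the box directory.
    mkdir() either creates the directory or fails with EEXIST; there is
    no window between 'check that the box is free' and 'claim it'."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:      # somebody else claimed it first
        return False

winner = try_claim(box)   # first caller succeeds
loser = try_claim(box)    # second caller gets EEXIST
```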
+1 Silently "fixing" the error when assumptions seem to have been broken feels scary. Much better to fail early and in a controlled way.
I also realized that another advantage of that PR is that it will ensure people check the exit code of their … It also forces you to clean up everything when you start your frontend instead of assuming the directory is empty.
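Checking the exit code of `--init` could be sketched like the wrapper below. This is an assumption-heavy illustration: the wrapper function, the decision to treat any nonzero status as "box busy or broken", and the stub runner are all made up here; isolate's actual exit-code conventions may be finer-grained.

```python
import subprocess

def init_box(box_id, run=subprocess.run):
    """Hypothetical wrapper around `isolate --init` that fails loudly
    instead of silently reusing a dirty box. `run` is injectable so the
    logic can be exercised without isolate installed."""
    result = run(["isolate", f"--box-id={box_id}", "--init"])
    if result.returncode != 0:
        raise RuntimeError(f"box {box_id} already exists or init failed")
    return box_id

# Stub runner standing in for the real subprocess call.
class _Fake:
    def __init__(self, code):
        self.returncode = code

ok = init_box(3, run=lambda argv: _Fake(0))      # init succeeded
try:
    init_box(3, run=lambda argv: _Fake(2))       # init failed: box taken
    failed = False
except RuntimeError:
    failed = True
```

Failing with an exception (rather than returning a flag) matches the "fail early and controlled" sentiment above: a frontend that forgets to handle it crashes visibly instead of sharing a box.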
Since the last version (see ioi/isolate#26), isolate fails on init if the box already exists. We didn't clean up in two situations: when the box was kept around, and when the worker is terminated in the middle of an execution. The former is easy to fix, the latter is not, so we also essentially revert to the previous behaviour by always calling cleanup before init.
There is a race condition when a lot of users are constantly using isolate in parallel, even if they try to play nice with each other.
Let's say you want to get a new box for your program. The "proper" way to do that is to list the directories in `/var/lib/isolate/` and to `--init` with an available ID. But between the time you check that an ID is available and the time you call `--init`, another program might have done a `--init` for the same ID and you get the same cgroup and sandbox for both.

I see two solutions for that:

1. Add a flag `--require-empty` (name suggestions welcome) that makes isolate `--init` fail if the box folder already exists (mkdir returns EEXIST).
2. When calling `--init`, you'd be able to specify, say, `--box-id new`. Then, isolate would try to find an available box id by looking at the directories in `/var/lib/isolate`, would try to `mkdir` it, and if it gets EEXIST, it tries to find a new one until the directory has been properly created. Then, when `--init` is over, we print the box id along with the path of the box, so that `--run` and `--cleanup` can be called with this ID.

I'm okay to implement either solution as long as you give me approval for one of them and comments on the API & short/long option names (and any other implementation detail you might think of).