Broken timeouts #3
Unfortunately this doesn't resolve all of the issues that come up.
(1) If you allow extremely heavy nesting of timeouts and you don't handle the exceptions in the timeout, sometimes the "guarantee" that the timeout exception doesn't propagate up past the timeout command fails. This can mean that sometimes a bot can cause the entire tournament to crash!
(2) The "timeout" command uses threadDelay, which has no real guarantees w.r.t. timing - from the documentation:
Suspends the current thread for a given number of microseconds (GHC only).
There is no guarantee that the thread will be rescheduled promptly when the delay has expired, > but the thread will never continue to run earlier than specified.
I have a partial solution that partially resolves the Issue #3 as well as problems (1) and (2) above, but ultimately because of the nature of threading there is no guarantee that you'll get the same results every time you run the same pair of bots.
Basically,
i) Have an inner "time" function without the handleAll part, so that each timeout exception can only be caught by the associated timeout, and not by inner ones
ii) When running a bot for the tournament, then you wrap it in a handler so that causing an exception can't cause the tournament to crash.
iii) Instead of giving bots access to "runBot" the way they currently do, give them access to a runBot command that looks something like this:
runBot bot opp hist = do
threadDelay 10
runBot_ bot opp hist
That way, timeouts are much more likely to happen roughly when they're actually supposed to happen.
You still run into issues if two timeouts are scheduled at roughly the same time, though, because of race conditions - it becomes essentially indeterminate if the inner timeout will trigger, the outer timeout will trigger, or if both will trigger.
Here's one way to avoid the issues with threads and timing:
instead of timeouts, allow the user to limit the recursive simulation depth, or the number of calls to runBot. That would get you much more consistent results.
I've been poking around trying to figure out if there is a clean way to limit the total number of calls to runBot; so far I haven't come up with anything workable. If you have any ideas, I would absolutely love to hear them.
I think a recursion depth limit is probably better than a # of calls limit.
Anyways; here's some quick thoughts:
(1) Analogously to timeout from System.Timeout, you can wrap every call to runBot into a handler that handles only the exception associated with its unique ID.
(2) However, unlike "timeout", you don't have a separate timer thread for each timeout; instead, you just need to keep track of the sequence of calls.
(3) Every time runBot is called, you would want to add 1 to the depth / call count (if you're doing depth you also need to subtract 1 from the depth when the call finishes).
To make sure that the recursion works correctly, with inner bots having their own limits, you could store a stack of (ID, limit) pairs, where the top of the stack is always going to be the limit that is going to happen the soonest. As soon as the topmost limit is hit you trigger an exception with the associated ID; this will then be caught by the associated handler, returning a value of "Nothing" to signify that the computation terminated unsuccessfully.
Any computation that finishes, or is terminated by an exception, also needs to pop its entry off the stack (if it has one); fortunately that's pretty easy; you can do it using the bracket pattern
http://www.haskell.org/haskellwiki/Bracket_pattern
the same way as is done by System.Timeout.
I think having a recursion depth limit is useful and very desirable, but a # of calls limit is also necessary, because runMatch needs to ensure that bots don't hog all the computational resources, and a depth limit does not prevent a bot from running lots of simulations one after another.
And thank you for linking to the bracket pattern -- I had no idea that pattern was formalized and had a name.
On second thought -- I think I'm just going to re-implement "time" so that each function call throws its own custom exception that only it can catch, so that the behavior described in the README works properly; a tournament with a fixed number of runBot calls is something I'll probably run another time.
If you remove the handleAll in time, but keep using timeout, then I believe bots can still break time by calling handleAll themselves. (Of course, you can just say that this is an abuse of the rules and reject those bots.)
Actually, because handling exceptions can only be done within the IO monad, bots can't use "handleAll" unless explicitly given permission to do so within the BotEnvironment monad.
Interestingly enough, though, there's an asymmetry between catching and throwing exceptions, because although they can't catch them, bots can still throw exceptions. For example, a bot could call error "HA". That isn't really a problem, though, because the only thing you can achieve by doing that is to make bots defect against you where they otherwise wouldn't do so.
Actually, I do see that as a somewhat significant problem--if you anticipate that you can do no better than (Defect, Defect) against an opponent, you can force this outcome by calling error every time.
A somewhat related point: Is there some elegant way of generating exception types in Haskell, to do what I described in the comment above?
ErrorBot is strictly dominated by DefectBot because ErrorBot is the same as DefectBot but sometimes gets (D, D) where DefectBot would get (D, C).
As for generating unique exceptions, that's already how the "timeout" function in System.Timeout works. You can see a description here:
http://chimera.labs.oreilly.com/books/1230000000929/ch09.html#sec_timeout
Thank you - I haven't gotten around to reading Marlow's book yet. In this case, it seems like the simplest fix is simply to have time = timeout, and then put a more generic handler in runMatch.
I have had displayTournament fail to exit due to what I am assuming is issue being discussed.
I changed the code like so:
instance BotEnvironment IO where
rand = randomRIO (0.0, 1.0)
time i = timeout i . evalM
and things now seem to work as expected.
Sorry to seem impatient, but - do you have an ETA on a fix? I'm happy to design my bot using time i = timeout i . evalM, but I'd be more confident if I was running under the exact same code that the tournament itself will use.
with time i = timeout i . evalM something goes awry and this defect bot dominates due to timing issues:
quickMirrorBot :: Bot
quickMirrorBot = Bot (\op hist -> do
simulation <- time (1) . runBot op mirrorBot $ invert hist
return (case simulation of
Nothing -> Defect
Just move -> move))
Sorry for the delay, I've been rather busy. I will likely extend the deadline to compensate for that. I'm working on the issue now.
Please see the proposed fix--I did some testing and I think it solves the issues discussed, but I'm not 100% sure.
Agbell, that example fails because "time 1" is not enough time for any simulation to complete.
By handling all exceptions in your timeout function, you're actually
making timeouts work in an unreasonable way.
The "timeout" function works by raising an exception in the
calculating thread, and so in the case of nested timeouts, you want
the innermost timeout to only handle its own exception. Otherwise, it
might catch exceptions from outer levels, causing the outer timeouts
to fail to work as expected.
For example, if I wrote a bot that did this:
time (6 * 10 ^ 6) . [do something that takes a long time]
then the 5 * 10 ^ 6 limit will trigger before the 6 * 10 ^ 6 limit does, and so the supposed "limit" on how long the bot is allowed for its turn will be averted.
I'm pretty sure that's the core reason for the weirdness being discussed here:
http://lesswrong.com/r/discussion/lw/knd/announcing_the_2014_program_equilibrium_iterated/b6dg