Errors in timeouts #664

olexandr-konovalov · 2016-03-09T10:04:54Z

We recently merged #342, and it seems the stability of the timeout code has suffered as a result. I am creating a new issue so that this is not overlooked, and setting the 4.8.3 milestone, at least to revert #342 in the stable-4.8 branch. Here I am collecting reports of observed problems:

1. master branch, 32-bit build on Ubuntu 14.04:

########> Diff in /home/hudson/hudson/workspace/GAP-dev-compilers/GAPCOPTS/32b\ 
uild/GAPREADLINE/noreadline/label/clang33/GAP-git-snapshot/tst/testinstall/tim\
eout.tst:25
# Input is:
CallWithTimeout(100000, spin2,2000,50000);
# Expected output:
[ true, [ false ] ]
# But found:
Error, Could not set interval timer
########

2. Hanging builds, for example:
* Cygwin64 on Windows 
* 64-bit build with gcc 4.8.4 on Ubuntu 14.04


hulpke · 2016-03-09T17:23:32Z

This same issue seems to cause the build of #667 to fail. Is there any reason to make the timeout test a part of the standard install tests?

olexandr-konovalov · 2016-03-09T20:24:23Z

@hulpke timeouts were actually in the standard tests before #342, and caused no problems. It is the nested timeouts code that triggered this instability. I suggest taking it out of the stable branch for now.

Also, timeout.tst does increase the duration of testinstall, so maybe we should move it into teststandard. Because we use TestDirectory, this is as simple as git mv ....

Another frequently seen diff, this one involving a nested timeout:

########> Diff in /home/hudson/hudson/workspace/GAP-dev/GAPCOPTS/64build/GAPGM\
P/gmp/GAPTARGET/install/label/graupius/GAP-git-snapshot/tst/testinstall/timeout.tst:33
# Input is:
ct := 0;; CallWithTimeout(1000000, function() while true do spin2(500,500000);\
ct := ct+1; od; end);; 2 <= ct; ct <= 10;
# Expected output:
true
true
# But found:
false
true
########

olexandr-konovalov · 2016-03-10T17:12:48Z

I've submitted PR #668 to revert nested alarms in the stable-4.8 branch.

markuspf · 2016-03-10T19:22:16Z

Mhm. Even after an hour of

while true do
  Test("tst/testinstall/timeout.tst");
od;

on my DragonFlyBSD system I cannot get the timeout test to fail or hang. On yin (x86_64 Linux) the timeout test hangs every so often, and bizarrely does so in a call to getrusage (according to an attached gdb). I wonder whether Linux gets something into a twist there wrt syscalls and signals? I'll try next on the Ubuntu cloud machine...

markuspf · 2016-03-10T19:31:51Z

Ok, so DragonFly seems to be using one of the implementations of timeouts, whereas Linux uses the other. Now to find out what the differences are.

markuspf · 2016-03-28T11:37:31Z

I fixed this in #672, the hangs were due to setting a timer with timeout 0.
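
For readers wondering why a zero timeout would cause a hang: with POSIX-style interval timers, a value of 0 disarms the timer instead of firing it immediately, so the signal that should end the call never arrives. The following is a minimal Python sketch of that behaviour on a Unix system. It is not the GAP kernel code, and the helper name `alarm_fires` is made up for this illustration.

```python
import signal
import time


def alarm_fires(delay_seconds, wait_seconds=0.2):
    """Arm ITIMER_REAL with delay_seconds and report whether SIGALRM arrived.

    Hypothetical helper for illustration only; Unix only, main thread only.
    """
    fired = []
    previous = signal.signal(signal.SIGALRM,
                             lambda signum, frame: fired.append(signum))
    try:
        # A delay of 0 clears the timer; it does not mean "fire right now".
        signal.setitimer(signal.ITIMER_REAL, delay_seconds)
        deadline = time.monotonic() + wait_seconds
        while not fired and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)
    return bool(fired)
```

Calling `alarm_fires(0.01)` sees the signal, while `alarm_fires(0)` waits out the full `wait_seconds` without ever being interrupted. Code that blocks until the alarm arrives would therefore wait forever in the zero case.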

olexandr-konovalov · 2016-03-28T11:55:57Z

#672 was made against the master branch, so I am re-assigning this issue to the GAP 4.9.0 milestone. I think this corresponds to the policy of adding new features in the master branch and bugfixes in the stable branch (with minor exceptions, if needed).

fingolfin · 2017-01-09T14:22:51Z

See also issue #765

So, is this still occurring? And on both Ubuntu and Windows?

markuspf · 2017-01-09T16:24:32Z

No, I fixed the hangs in #672 (at least for any system that uses POSIX timers)

fingolfin · 2017-03-16T22:50:38Z

We (i.e. @alex-konovalov @markuspf and me) discussed this in France, and came to the conclusion that it may be impossible to fix all the issues with CallWithTimeout -- in particular, it can leave you in an inconsistent state, and the only way to fix that in general is to restart GAP -- else you may be doing invalid computations without even realizing.

So perhaps we should really consider phasing this feature out again in 4.9, and telling people not to use it anymore...

ChrisJefferson · 2017-03-17T09:29:23Z

Just to say, I agree it's unfixable -- and the problem is real, I knocked up a quick "fake experiment" and found I quickly got nonsense results due to groups having broken stabilizer chains.
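
To make the failure mode concrete, here is a toy Python sketch (Unix only). It is not GAP code and not the experiment mentioned above; `CachedList` and this `call_with_timeout` are hypothetical stand-ins. An object keeps two fields that must agree, and a signal-based timeout aborts the update between the two writes, so the object survives in a state that its own invariant check rejects.

```python
import signal
import time


class Timeout(Exception):
    """Raised by the alarm handler to abort the running call."""


class CachedList:
    """Toy object with the invariant: total == sum(items)."""

    def __init__(self):
        self.items = []
        self.total = 0

    def add(self, value, work_seconds=0.0):
        self.items.append(value)   # step 1
        time.sleep(work_seconds)   # stand-in for a long computation
        self.total += value        # step 2, skipped if interrupted

    def is_consistent(self):
        return self.total == sum(self.items)


def _raise_timeout(signum, frame):
    raise Timeout()


def call_with_timeout(seconds, func, *args):
    """Hypothetical analogue: returns (True, result) or (False, None)."""
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            return (True, func(*args))
        except Timeout:
            return (False, None)
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)
```

After `call_with_timeout(0.05, c.add, 2, 0.5)` times out on a `CachedList` `c`, `c.is_consistent()` is false, yet `c` is still reachable and later code will use it without any warning.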

We can bring back similar functionality in HPC-GAP by just killing a whole thread off (as long as it doesn't have any locks at the time).

Also an "interested party" could add a similar function to the IO package, which uses fork -- I would be happy to discuss how to implement such a function but won't have time to do it myself.

fingolfin · 2017-03-17T09:34:37Z

Of course io already offers functionality for running background tasks, including e.g. ParTakeFirstResultByFork, and I assume this takes care of many needs -- well, on Unix at least, tough luck for Windows users.

But I am curious how you'd do timeouts using io without incurring similar problems -- at least if by that you mean that in the "main" GAP a callback handler is called when the timeout runs out -- as that callback would also potentially see corrupt data, depending on what the main GAP was doing at the moment (granted, the effect would be much smaller, and a careful callback should be able to avoid issues). But actually I am not sure it would even be worth the bother -- the existing work farm code in io should cover 95% of needs for this anyway, shouldn't it?

Anyway, as to removing this code (or rather: deprecating it): If we really want to do that, somebody should bring this up on the gap@ list ASAP for broader discussions.

ChrisJefferson · 2017-03-17T09:53:35Z

Sorry, this is off-topic, but I'll quickly answer.

It would be easy(ish) to extend the Par*Fork functions to take a timeout which applies to each problem individually -- that is (I think) what many people are using CallWithTimeout for -- to run a series of experiments. When the timeout runs out, we can just kill the forked GAP and make a new one if we want to run another test.
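
A rough sketch of that idea, written in Python rather than GAP and not based on the IO package's actual code: each task runs in a forked child, the parent waits on a pipe with a timeout, and on expiry the child is simply killed. The names `call_in_fork_with_timeout` and `slow_experiment` are invented for this example, and error handling for a crashing child is left out.

```python
import os
import pickle
import select
import signal
import time


def call_in_fork_with_timeout(func, args, timeout_seconds):
    """Run func(*args) in a forked child (Unix only).

    Returns (True, result), or (False, None) if the child was killed
    because it did not answer within timeout_seconds.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, send the pickled result back, then exit at once.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(*args), out)
        finally:
            os._exit(0)
    # Parent: wait until the result is readable or the timeout expires.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        try:
            ready, _, _ = select.select([inp], [], [], timeout_seconds)
            if not ready:
                # Only the child is discarded; the parent state is untouched.
                os.kill(pid, signal.SIGKILL)
                return (False, None)
            return (True, pickle.load(inp))
        finally:
            os.waitpid(pid, 0)  # reap the child in every case


def slow_experiment(seconds):
    """Stand-in for one experiment in a series."""
    time.sleep(seconds)
    return seconds
```

Because the interrupted computation lives in a separate process, whatever half-updated data it held is thrown away with it. The cost is that results have to be serialised back to the parent, and the approach depends on fork being available.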

fingolfin · 2017-06-11T16:00:42Z

Resolved by #1324.

olexandr-konovalov added this to the GAP 4.8.3 milestone Mar 9, 2016

olexandr-konovalov modified the milestones: GAP 4.8.4, GAP 4.8.3 Mar 16, 2016

olexandr-konovalov modified the milestones: GAP 4.9.0, GAP 4.8.4 Mar 28, 2016

fingolfin assigned markuspf Mar 16, 2017

This was referenced Mar 16, 2017

Heavy use of CallWithTimeout makes GAP crash / produce wrong results #695

Closed

CallWithTimeout broken on Windows #765

Closed

HPC-GAP doesn't support CallWithTimeout #377

Closed

Extend Timeouts with memory limits maybe? #373

Closed

fingolfin added a commit to fingolfin/gap that referenced this issue May 9, 2017

Remove the timeout code

e25278b

It had fundamental problems which seem unlikely to be resolved. See also issues gap-system#664, gap-system#695, gap-system#765.

fingolfin mentioned this issue May 9, 2017

Remove the timeout code #1324

Merged

fingolfin added a commit to fingolfin/gap that referenced this issue May 19, 2017

Remove the timeout code

f019951

It had fundamental problems which seem unlikely to be resolved. See also issues gap-system#664, gap-system#695, gap-system#765.

fingolfin added a commit to fingolfin/gap that referenced this issue Jun 8, 2017

Remove the timeout code

81c8330

It had fundamental problems which seem unlikely to be resolved. See also issues gap-system#664, gap-system#695, gap-system#765.

ChrisJefferson pushed a commit that referenced this issue Jun 11, 2017

Remove the timeout code

c1a673f

It had fundamental problems which seem unlikely to be resolved. See also issues #664, #695, #765.

fingolfin closed this as completed Jun 11, 2017
