
is_alive doctest failure in map_reduce #24241

Closed
videlec opened this issue Nov 19, 2017 · 26 comments

Comments

@videlec
Contributor

videlec commented Nov 19, 2017

Some patchbots report unstopped workers:

sage -t --long src/sage/parallel/map_reduce.py
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 1090, in sage.parallel.map_reduce.RESetMapReduce.start_workers
Failed example:
    all(w.is_alive() for w in S._workers)
Expected:
    True
Got:
    False
**********************************************************************
1 item had failures:
   1 of   9 in sage.parallel.map_reduce.RESetMapReduce.start_workers
    [296 tests, 1 failure, 28.70 s]
----------------------------------------------------------------------
sage -t --long src/sage/parallel/map_reduce.py  # 1 doctest failed
----------------------------------------------------------------------


CC: @hivert

Component: combinatorics

Keywords: random_fail

Author: Florent Hivert

Branch/Commit: 6eeda41

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/24241

@videlec videlec added this to the sage-8.1 milestone Nov 19, 2017

@embray
Contributor

embray commented Nov 22, 2017

comment:2

I'm not sure of an obvious way to reproduce this, but maybe we could go ahead and merge #21233 and see if that fixes it? I've been waiting forever for someone to just give it positive review (which it had previously, but Volker removed it...)

@videlec
Contributor Author

videlec commented Nov 22, 2017

comment:3

Replying to @embray:

I'm not sure of an obvious way to reproduce this, but maybe we could go ahead and merge #21233 and see if that fixes it?

+1. And I would love to have a way to grep through the patchbot reports easily!

@embray
Contributor

embray commented Nov 22, 2017

comment:4

Replying to @videlec:

Replying to @embray:

I'm not sure of an obvious way to reproduce this, but maybe we could go ahead and merge #21233 and see if that fixes it?

+1. And I would love to have a way to grep through the patchbot reports easily!

Open an issue on the patchbot GitHub project for that. I would love that too but it's probably not entirely trivial (if nothing else we'd want to index the report logs).

@embray
Contributor

embray commented Dec 21, 2017

comment:5

I'm not totally sure this was fixed by #21233. Now, on several of my Cygwin patchbot runs, this module fails on the initial test run, not quite in the way reported by this ticket, but possibly similar. I get

    sage -t --long src/sage/parallel/map_reduce.py  # Timed out after testing finished

which is something I've never seen before...

@vbraun
Member

vbraun commented Dec 25, 2017

comment:6

I'm also seeing this on the buildbot

@vbraun
Member

vbraun commented Dec 25, 2017

Changed keywords from none to random_fail

@jdemeyer

comment:7

This is just to say that I got this again.

@jdemeyer jdemeyer changed the title unstopped MapReduce workers is_alive doctest failure in map_reduce Jul 12, 2018

@jdemeyer

comment:9

The doctest failure looks like a race condition. If I'm understanding things correctly, the workers are started and will then stop naturally (after an unspecified amount of time). If they stop really quickly, then this doctest will fail:

            sage: from sage.parallel.map_reduce import RESetMapReduce
            sage: S = RESetMapReduce(roots=[])
            sage: S.setup_workers(2)
            sage: S.start_workers()
            sage: all(w.is_alive() for w in S._workers)
            True
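For illustration, here is a minimal standalone sketch of the same race using only Python's standard library (this is not Sage's worker code): a process with nothing to do can exit before the parent gets around to checking is_alive().

    import multiprocessing
    import time

    def worker():
        # Nothing to do: the process exits almost immediately after starting.
        pass

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker) for _ in range(2)]
        for p in procs:
            p.start()
        time.sleep(0.02)  # a tiny delay is enough for the workers to finish
        print(all(p.is_alive() for p in procs))  # typically False by now

Whether the check sees True or False depends purely on scheduling, which is exactly why the doctest fails only intermittently.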

@jdemeyer

comment:10

I can make this test fail pretty consistently with

    sage: sleep(0.02); all(w.is_alive() for w in S._workers)

If a doctest is sensitive to 20ms delays, it's a bad test.

@embray
Contributor

embray commented Jul 12, 2018

comment:11

Indeed; I see the problem now. When I originally commented on this ticket, I admit I didn't look very closely at the exact test that was failing.

If there's no work for the workers to do, then there's no guarantee that you'll ever find them all running simultaneously.

If you really wanted to test this, one possibility might be to set up a test logger that collects all log messages in a list, and then check that the expected log messages are found (e.g. one "Started" and one "Exiting" for each worker started).
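For instance, a rough sketch of such a collecting handler with Python's standard logging module (the logger name and the messages here are hypothetical, not the ones Sage's code actually emits):

    import logging

    class ListHandler(logging.Handler):
        """A handler that stores every emitted log message in a list."""
        def __init__(self):
            super().__init__()
            self.messages = []

        def emit(self, record):
            self.messages.append(record.getMessage())

    handler = ListHandler()
    logger = logging.getLogger("map_reduce_test")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)

    # In the real test the workers would emit these; done by hand here.
    logger.debug("Started")
    logger.debug("Exiting")

    # Timing-independent check: one "Exiting" for every "Started".
    assert handler.messages.count("Started") == handler.messages.count("Exiting")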

@hivert

hivert commented Jul 12, 2018

comment:12

Replying to @embray:

If you really wanted to test this, one possibility might be to set up a test logger that collects all log messages in a list, and then check that the expected log messages are found (e.g. one "Started" and one "Exiting" for each worker started).

Thanks to all of you for catching this one. I'm confirming jdemeyer's analysis: if there is no work to do, there is no robust lower bound on the time a worker stays alive.

@embray: there is a logger in the code, but its level is normally set so that these messages are not shown. Another possibility would be to give the workers a sleep(1) instruction as work.
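A sketch of that second idea as a doctest (illustrative only; it assumes the sleep(1) goes into the children function, which is not necessarily how the final patch does it):

    sage: from sage.parallel.map_reduce import RESetMapReduce
    sage: def children(x):
    ....:     sleep(1)  # keep each worker busy long enough to observe it
    ....:     return []
    sage: S = RESetMapReduce(roots=[1, 2], children=children)
    sage: S.setup_workers(2)
    sage: S.start_workers()
    sage: all(w.is_alive() for w in S._workers)  # workers are still sleeping
    True
    sage: S.finish()

With the workers pinned in sleep(1), the is_alive() check no longer races against their natural exit.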

@hivert

hivert commented Jul 12, 2018

Branch: u/hivert/is_alive_doctest_failure_in_map_reduce

@hivert

hivert commented Jul 12, 2018

comment:13

Sorry, I based my branch on the wrong base... Fixing it.

@jdemeyer

comment:14

The doctest fix looks good at first sight; I would still keep the sleep(1) in the last test, though.

I cannot really comment on the other changes, which seem to be related to Python 3.


New commits:

8778e24  Tentative fix of MapReduce.is_alive

@jdemeyer

Commit: 8778e24

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jul 12, 2018

Changed commit from 8778e24 to 9c1ab33

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jul 12, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

9c1ab33  Fixed wrong base

@hivert

hivert commented Jul 12, 2018

comment:16

Replying to @jdemeyer:

The doctest fix looks good at first sight; I would still keep the sleep(1) in the last test, though.

I cannot really comment on the other changes, which seem to be related to Python 3.

Sorry, I based my branch on the wrong base... Should be fixed now.


New commits:

9c1ab33  Fixed wrong base

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jul 12, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

6eeda41  Put back timeout to 1

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jul 12, 2018

Changed commit from 9c1ab33 to 6eeda41

@hivert

hivert commented Jul 12, 2018

Author: Florent Hivert

@jdemeyer

Reviewer: Jeroen Demeyer

@hivert

hivert commented Jul 13, 2018

comment:20

Replying to @jdemeyer:
Thanks Jeroen

@vbraun
Member

vbraun commented Aug 5, 2018

Changed branch from u/hivert/is_alive_doctest_failure_in_map_reduce to 6eeda41
