is_alive doctest failure in map_reduce #24241
Comments
comment:2
I'm not sure of an obvious way to reproduce this, but maybe we could go ahead and merge #21233 and see if that fixes it? I've been waiting forever for someone to just give it positive review (which it had previously, but Volker removed it...)
comment:4
Replying to @videlec:
Open an issue on the patchbot GitHub project for that. I would love that too, but it's probably not entirely trivial (if nothing else, we'd want to index the report logs).
comment:5
I'm not totally sure this was fixed by #21233. Now, on several of my Cygwin patchbot runs, this module fails on the initial test run, not quite in the way reported by this ticket, but possibly similar. I get
comment:6
I'm also seeing this on the buildbot
Changed keywords from none to random_fail
comment:7
This is just to say that I got this again.
comment:9
The doctest failure looks like a race condition. If I'm understanding things correctly, the workers are started and will then stop naturally (after an unspecified amount of time). If they stop really quickly, then this doctest will fail:
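The race described above can be sketched in a few lines. This is a self-contained illustration, not Sage's actual `map_reduce` code: it uses threads instead of worker processes, and all names here (`worker`, `tasks`, `all_alive`) are made up for the example. The point is that a worker with no work exits almost immediately, so checking `is_alive()` on all workers shortly after starting them can come back `False`.

```python
# Illustration of the race: a worker that finds its task queue empty
# returns at once, so an is_alive() check on all workers right after
# start() can lose the race against the workers' natural exit.
import queue
import threading
import time

def worker(tasks):
    # With nothing queued, the worker returns immediately.
    while True:
        try:
            tasks.get_nowait()
        except queue.Empty:
            return

tasks = queue.Queue()  # deliberately left empty: no work to do
workers = [threading.Thread(target=worker, args=(tasks,)) for _ in range(4)]
for w in workers:
    w.start()

time.sleep(0.02)  # even a ~20ms delay can be enough to lose the race
all_alive = all(w.is_alive() for w in workers)  # may already be False
for w in workers:
    w.join()
```

A doctest asserting `all_alive` is `True` therefore depends entirely on scheduling timing, which is exactly the flakiness reported here.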
comment:10
I can make this test fail pretty consistently with
If a doctest is sensitive to 20ms delays, it's a bad test.
comment:11
Indeed; I see the problem here. When I originally commented on this ticket, I admit I don't think I looked very closely at the exact test that was failing. If there's no work for the workers to do, then there's no guarantee that you'll ever find them all running simultaneously. If you really wanted to test this, one possibility might be to set up a test logger that collects all log messages in a list, and then check that the expected log messages are found (e.g. one "Started" and one "Exiting" for each worker started).
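The logger-based approach suggested above could look roughly like this. It is a hedged sketch: the handler class, the logger name `map_reduce_demo`, and the "Started"/"Exiting" message texts are illustrative stand-ins, not Sage's actual logging setup. The key idea is that counting lifecycle log records is robust no matter how quickly the workers finish, unlike polling `is_alive()`.

```python
# Sketch of a test logger that collects records in a list, so a test can
# assert one "Started" and one "Exiting" message per worker regardless
# of how fast the workers exited.
import logging

class ListHandler(logging.Handler):
    """Collect every log record's formatted message in a plain list."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

logger = logging.getLogger("map_reduce_demo")
logger.setLevel(logging.DEBUG)
handler = ListHandler()
logger.addHandler(handler)

def run_worker(worker_id):
    # Stand-in for a worker's lifecycle: log on start and on exit.
    logger.debug("Started worker %d", worker_id)
    logger.debug("Exiting worker %d", worker_id)

n_workers = 3
for i in range(n_workers):
    run_worker(i)

# The timing-independent assertion: every worker both started and exited.
started = [m for m in handler.messages if m.startswith("Started")]
exiting = [m for m in handler.messages if m.startswith("Exiting")]
assert len(started) == n_workers and len(exiting) == n_workers
```

Because the check runs after the fact on collected records, it cannot be defeated by workers exiting quickly.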
comment:12
Replying to @embray:
Thanks to all of you for catching this one. I'm confirming jdemeyer's analysis: if there is no work to do, there is no robust lower bound on the time a worker stays alive. @embray: there is a logger in the code, but the level is normally set too low to see the messages. Another possibility would be to give as work to the workers a
comment:13
Sorry, I based my file on the wrong branch... Fixing it
comment:14
The doctest fix looks good at first sight; I would still keep the [...]. I cannot really comment on the other changes, which seem to be related to Python 3. New commits:
Commit:
Branch pushed to git repo; I updated commit sha1. New commits:
comment:16
Replying to @jdemeyer:
Sorry, I based my file on the wrong branch... Should be fixed now. New commits:
Branch pushed to git repo; I updated commit sha1. New commits:
Author: Florent Hivert
Reviewer: Jeroen Demeyer
comment:20
Replying to @jdemeyer:
Changed branch from u/hivert/is_alive_doctest_failure_in_map_reduce to |
Some patchbots report unstopped workers
see
CC: @hivert
Component: combinatorics
Keywords: random_fail
Author: Florent Hivert
Branch/Commit: 6eeda41
Reviewer: Jeroen Demeyer
Issue created by migration from https://trac.sagemath.org/ticket/24241