AsyncResultTest.test_wait_for_send fails in 1-core VM #380
Comments
|
Hi! I’m going through and cleaning up old/stale issues on this repo. Sorry for not responding in a reasonable amount of time! I'd recommend skipping the test in that case. Race conditions are hard to rigorously eliminate in asynchronous distributed code, and this example does exactly that. It could probably be mocked sufficiently to fake the behavior, but that wouldn't accurately test the relevant code anymore. |
|
Aren't the tests there to be able to find and fix such race conditions, especially because it is so hard to do? |
|
To be clear, this test failure is not due to a bug in the code. This is a test for low-level machinery coordinating information between libzmq's C++ io threads and Python. The race is in the test itself, not the code. The code is behaving correctly even in the failure case. The error is a failure to produce the intended test scenario, due to the configuration of the VM. The case being tested:
The race is because libzmq immediately begins attempting to process the send in another GIL-less thread. Thread scheduling means this is nondeterministic, but in any realistic scenario, it takes a finite amount of time. I guess
The right thing to do is skip this test when run in an environment that cannot reproduce the test scenario. If there is an obvious way to detect this, I'd add the skip automatically. If there were a mechanism to force a delay into libzmq's underlying send, that would help. But I'm not aware of such a mechanism. |
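One possible way to detect such an environment, as a rough sketch only: check how many CPUs the process is actually allowed to run on and skip when there is just one. This assumes that a single usable CPU is what prevents the scenario from being reproduced, which is not confirmed here. `usable_cpu_count`, `skip_if_single_cpu`, and the test class are hypothetical names, not part of ipyparallel.

```python
import os
import unittest


def usable_cpu_count():
    """Return how many CPUs this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects restrictions such as taskset, unlike os.cpu_count()
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the total CPU count
    return os.cpu_count() or 1


# Assumption: with only one usable CPU the send can complete before the
# test observes it, so the intended scenario cannot be produced.
skip_if_single_cpu = unittest.skipIf(
    usable_cpu_count() < 2,
    "needs more than one usable CPU to reproduce the test scenario",
)


class ExampleTest(unittest.TestCase):
    @skip_if_single_cpu
    def test_wait_for_send(self):
        # Placeholder body; the real test lives in test_asyncresult.py
        pass
```

`unittest.skipIf` is used so the sketch needs nothing beyond the standard library; pytest also honours it.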
While working on reproducible builds for openSUSE, I found that our python-ipyparallel-6.2.4 package failed to build on 1-core Linux VMs because ipyparallel/tests/test_asyncresult.py:357 expects a timeout(0) to trigger an exception, but that depends on the scheduling, which happens differently in 1-core VMs.
It seems one can trigger this behaviour by running the tests under taskset 1
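For anyone who wants to reproduce this from Python rather than the shell, here is a minimal Linux-only sketch that does roughly what taskset with a single-CPU mask does: it starts a child process restricted to one CPU. `run_on_one_cpu` is a hypothetical helper, and it picks the lowest CPU the parent is allowed to use rather than always CPU 0.

```python
import os
import subprocess
import sys


def run_on_one_cpu(cmd):
    """Run cmd in a child process restricted to a single CPU (Linux only)."""
    # Pick one CPU that the current process is already allowed to use
    cpu = min(os.sched_getaffinity(0))

    def pin():
        # Executed in the child after fork and before exec
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(cmd, preexec_fn=pin, capture_output=True, text=True)


if __name__ == "__main__":
    # Sanity check: the child reports how many CPUs it may use
    check = [sys.executable, "-c",
             "import os; print(len(os.sched_getaffinity(0)))"]
    print(run_on_one_cpu(check).stdout.strip())
```

Replacing the sanity-check command with the project's test invocation for ipyparallel/tests/test_asyncresult.py should then run the tests under the same restriction; the exact test runner command is left out because it is not stated in this thread.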