Fix random test failure #84
63cab0f to 3bd6bd7
rebased to master and added more sleeps to make tests pass.
Often the behaviour of 1-core-VMs can be simulated by prefixing a command with
I don't really know
i'm no longer working on execnet |
testing/test_gateway.py
Outdated
@@ -90,6 +91,7 @@ def test_gateway_status_busy(self, gw):
         ch2.waitclose()
         for i in range(10):
             status = gw.remote_status()
gw.remote_status() is a synchronous operation -- no time.sleep() should be necessary.
but it was already repeated 10 times - so it is waiting on something.
See also the test run from the first commit:
https://travis-ci.org/pytest-dev/execnet/jobs/427581580
i believe the introduction of the execmodel backends changed the behaviour these tests expect, so a wait-for with some sleep time is needed to accommodate the extra delay
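A bounded wait-for of the kind mentioned above could look roughly like the sketch below. It is an illustration only, not execnet code: `wait_for` and `FakeGateway` are hypothetical names, and the fake merely imitates a status value that settles after a few polls.

```python
import time


def wait_for(predicate, timeout=1.0, interval=0.01):
    """Poll predicate() until it is truthy or the timeout expires.

    Returns the last value predicate() produced.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)


class FakeGateway:
    """Hypothetical stand-in whose numexecuting drops to zero after a few polls."""

    def __init__(self, busy_polls):
        self._busy_polls = busy_polls

    def numexecuting(self):
        current = self._busy_polls
        if self._busy_polls > 0:
            self._busy_polls -= 1
        return current


gw = FakeGateway(busy_polls=3)
# Returns as soon as the condition holds instead of sleeping a fixed amount.
settled = wait_for(lambda: gw.numexecuting() == 0, timeout=1.0, interval=0.001)
```

Compared with a fixed time.sleep(), such a loop returns as soon as the condition holds and still reports failure once the timeout is reached, so a real timing problem is not silently hidden.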
Do i see it correctly that all test failures are about "nice" settings and verifications? If so, i'd say there is some flakiness there and the time.sleep() calls just make it less likely that issues occur.
testing/test_xspec.py
Outdated
@@ -137,6 +138,7 @@ def getnice(channel):
     gw.exit()
     if remotenice is not None:
         gw = makegateway("popen//nice=5")
+        time.sleep(0.1)
         remotenice2 = gw.remote_exec(getnice).receive()
         assert remotenice2 == remotenice + 5
do you have a supporting traceback for changing this line?
It is part of the first commit's message.
I uploaded a full log at https://www.zq1.de/~bernhard/temp/python-execnet-build.log
To trigger it, I generated some parallel load with perl -e 'fork;fork;fork;while(1){}'
2e1d05d to aedc878
Added 4th sleep because of https://travis-ci.org/pytest-dev/execnet/jobs/432459677
ping. Anything left to do?
Thanks for the ping @bmwiedemann, I'll let @hpk42 decide on this one.
Can any of these test failures (without your sleeps) be reproduced on python3?
And then marking the offending tests as
@hpk42 https://travis-ci.org/pytest-dev/execnet/jobs/432459677 from above said so, so I think it is a general timing problem.
thanks for the clarification. i don't see a difference between @FlakyTest and time.sleep -- in fact silently passing a timing problem (time.sleep) is worse than marking a test and getting it reported as "x" or "X".
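For illustration, one way a timing-sensitive test gets reported as "x" or "X" instead of being padded with sleeps is pytest's built-in xfail marker in non-strict mode. This is a sketch under assumptions: the thread does not confirm which marker the PR finally used, and `remote_nice_delta` is a hypothetical stand-in for the gateway round trip.

```python
import pytest


def remote_nice_delta():
    """Hypothetical stand-in for the gateway round trip in test_popen_nice."""
    return 5


# Non-strict xfail: a failure is reported as "x" (xfailed) and an
# unexpected pass as "X" (xpassed), so the timing problem stays visible.
@pytest.mark.xfail(reason="timing-sensitive on loaded or 1-core machines", strict=False)
def test_popen_nice_delta():
    assert remote_nice_delta() == 5
```

The marker only attaches metadata, so the test body keeps running on every build and the report shows how often it actually fails.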
74ba598 to d80684e
Updated to use
While working on reproducible builds for openSUSE, I noticed
that test_popen_nice would randomly fail on 1-core-VMs
    remotenice = gw.remote_exec(getnice).receive()
    gw.exit()
    if remotenice is not None:
        gw = makegateway("popen//nice=5")
        remotenice2 = gw.remote_exec(getnice).receive()
>       assert remotenice2 == remotenice + 5
E       assert 0 == (0 + 5)
In travis-ci it did
test_status_with_threads
E AssertionError: numexecuting didn't drop to zero
test_gateway_status_busy
E Failed: did not get correct remote status
test_popen_stderr_tracing
> assert slave_line in err
E AssertionError: assert '[2361] creating slavegateway' in 'ay_base.Popen2IO object at 0x7f19024bba58>\n[2361] gw0-slave [serve] spawning receiver thread\n[2361] gw0-slave [serv...1 lendata=6>\n[2361] gw0-slave execution finished\n[2361] gw0-slave sent <Message CHANNEL_CLOSE channel=1 lendata=0>\n'