-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_pty: test_basic() fails randomly on Travis CI #75341
Comments
https://travis-ci.org/python/cpython/jobs/262606831 0:13:26 load avg: 134.81 [256/403] test_pty
test test_pty failed -- Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/test_pty.py", line 109, in test_basic
self.assertEqual(b'For my pet fish, Eric.\n', normalize_output(s2))
AssertionError: b'For my pet fish, Eric.\n' != b'For m' |
I don't know why test_pty fails. #3037 makes sure that all data is written into the master fd using a new write_all() helper function which loops until all data is written: handle os.write() partial write. |
Checking for short writes is worthwhile, but in bpo-29070 it looks like Cornelius identified the main problem was short _reads_. See the parts of his patch to do with “_os_read_exactly” and related functions. |
I've observed the exact same issue on Fedora at random times. |
I observed the same issue, but the problem occurs mainly when reading data. In my proposed patch in bpo-29070, I use the existing pty._writen() to make sure all data is written. As Martin mentioned, reading is a problem. My patch proposes _os_timeout_read(), _os_readline(), _os_read_exactly(), _os_read_exhaust_exactly() to give fine-grained, deterministic control. Cheeky advertisement: Anybody cares to review bpo-29070 or cherry pick the pty._writen() and _os_read* parts? :) |
Failure on x86 Gentoo Refleaks 3.x too: ====================================================================== Traceback (most recent call last):
File "/buildbot/buildarea/3.x.ware-gentoo-x86.refleak/build/Lib/test/test_pty.py", line 105, in test_basic
self.assertEqual(b'For my pet fish, Eric.\n', normalize_output(s2))
AssertionError: b'For my pet fish, Eric.\n' != b'For m' |
Oh, in fact, it's not that hard to reproduce the failure on Linux: haypo@selma$ ./python -m test test_pty -m test_basic -F
Run tests sequentially
0:00:00 load avg: 2.17 [ 1] test_pty
0:00:00 load avg: 2.17 [ 2] test_pty
0:00:00 load avg: 2.17 [ 3] test_pty
(...)
0:00:00 load avg: 2.17 [ 15] test_pty
0:00:00 load avg: 2.17 [ 16] test_pty
test test_pty failed -- Traceback (most recent call last):
File "/home/haypo/prog/python/master/Lib/test/test_pty.py", line 105, in test_basic
self.assertEqual(b'For my pet fish, Eric.\n', normalize_output(s2))
AssertionError: b'For my pet fish, Eric.\n' != b'For m' test_pty failed 1 test failed: Total duration: 486 ms |
Regarding PR 3802:
I also fixed these issues in one of my (unmerged) pull requests (bpo bpo-29070). Unfortunately, the patch I propose does way too many things at the same time and I don't have time currently to untangle it :( Feel free to cherry pick the _os_readline function: Cheers! |
This string TEST_STRING_1 is used for a single os.write() call, whereas TEST_STRING_2 is splitted and written in two parts with two os.write() calls. I prefer minimal change, since I don't know well the pty module.
Yeah, I looked at this function. normalize_output() can return a string shorter than the input: len(normalize_output(s2)) <= len(s2). So I think that len(s2) < len(expected) is correct. |
I like the idea of minimal change, too :D Yet, I think your patch does not solve the core problem of read/write being nondeterministic. In theory, a pty is similar to a pipe (with termios processing in the middle). So any os.write to a pty fd is nondeterministic and may put less bytes into the pty buffer than given to the write call (see the return value of os.write, which test_pty.py does not check). Multiple writes are buffered by the kernel, until the buffer is full. So the kernel already accumulates the chunked writing for us. Usually, this works fine. Similarly, a os.read may also be nondeterministic, depending on how many bytes are ready in the pty buffer. This may have nothing to do with the chunked writing of the test_pty.py module because the kernel is doing the read/write syscalls and handling the pty buffer. Here is a PoC: I checked out your code and stressed my system with it. I have 2 physical and 2 virtual cores and started 8 instances of the test to stress my kernel with scheduling, locking of kernel buffers (the pty buffer), and making read/write more nondeterministic. ./python -b -m test -j 8 test_pty -m test_basic -F Here is what I got:
0:00:13 load avg: 3.25 [119/1] test_pty failed
test test_pty failed -- Traceback (most recent call last):
File "XXX/cpython/Lib/test/test_pty.py", line 99, in test_basic
normalize_output(s1))
AssertionError: b'I wish to buy a fish license.\n' != b'I wish to buy a fish license.'
Warning, obscure corner cases ahead. |
Oh, one more thing, since it is unclear whether read or write is causing the error. I replaced 0:01:09 load avg: 6.17 [597/1] test_pty failed
test test_pty failed -- Traceback (most recent call last):
File "XXX/cpython/Lib/test/test_pty.py", line 99, in test_basic
normalize_output(s1))
AssertionError: b'I wish to buy a fish license.\n' != b'I wish to buy a fish license.' So the main problem is a nondeterministic read. But I'm only testing on Linux. Other systems with different kernel may behave differently. And the documentation allows a kernel to make both short reads and writes. |
I also added a pull request which I cannot get to fail on my system. What do you think, haypo? Sorry for the noise! |
Cornelius:Here is a PoC: I checked out your code and stressed my system with it. (...) AssertionError: ..." Did you get this failure with my PR 3802? Cornelius: "We have b'For my pet fish, Eric.\r\n' in the pty buffer. We read b'For my pet fish, Eric.\r' from the buffer into s2. Now len(s2) == len(expected) but a b'\n' is still unread in the buffer. This would make the test fail. I admit, this is a corner case, but also an argument that a clean test case may want to have a readline function." Right. I can just modify my condition from "len(s2) < len(expected)" to "b'\n' not in s2". Cornelius: "I also added a pull request which I cannot get to fail on my system. What do you think, haypo?" Your code calls read(1) in a loop until it gets the newline character b'\n'. Is it better to os.read(1024) in a loop until the output string contains b'\n'? I have no strong preference. |
Hum, let me clarify: my main goal is to get a stable CI. I don't really care of the exact implementation of one specific unit test in test_pty ;-) |
Yes, I get the AssertionError with the latest version of PR 3802. From the high load avg of my system, you can see that the error occurs very rarely and that I need to stress my system to trigger it. With PR 3802, the error occurs way less frequently than without it. So we are definitely moving into the right direction. With PR 3808, I could not trigger any error (on my system). Changing the conditions to "b'\n' not in s2" should work. But we actually mean to read a line, so this should be better reflected in the code. I prefer calling a readline() function directly, which is almost self-documenting code.
Personal thought: I care about good code in the unit tests because people might look at this as reference how to use a module and copy&paste from it. I want the tests to be deterministic, which---as long as the tests pass---implies a stable CI ;-) |
I prefer Cornelius’s current proposal (rev 4f8137b) because it fixes both sites, rather than just patching the immediate problem site. I don’t think read(1) is a big problem, just less efficient. But if you prefer to do larger reads, that should be fine too. You could even use os.fdopen(...).readline(), which would use BufferedReader. It is not documented, but BufferedReader should do large reads and return the line without waiting to fill its internal buffer. FWIW I think it would be okay to remove the “chunked output” test. It isn’t exercising the “pty” module any more than the “Writing to slave_fd” test above it. All we need to do is verify that the “master_open” function returns a PTY master and that “slave_open” returns the corresponding slave terminal file descriptor. Checking “isatty” and sending one line of data through seems sufficient. |
Thanks Cornelius Diekmann for your fix ;-) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: