Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test fail in writev: basic blocking #726

Closed
olafhering opened this issue Sep 28, 2019 · 16 comments · Fixed by #742
Closed

test fail in writev: basic blocking #726

olafhering opened this issue Sep 28, 2019 · 16 comments · Fixed by #742
Labels
Milestone

Comments

@olafhering
Copy link
Contributor

I got this spurious error in devel:languages:ocaml/ocaml-lwt with ocaml-4.05 on x86_64:

[   86s] Testing library 'unix'...
[   86s] ........................SSS.................................S.................................................................................S................SS...............F...................
[   86s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'
[   86s] error: Bad exit status from /var/tmp/rpm-tmp.TKxkvY (%check)

On other hosts, and other ocaml versions, runtest succeeded.

@aantron
Copy link
Collaborator

aantron commented Sep 28, 2019

Am I correct in understanding that this error occurs only sometimes, but the tests usually pass? I vaguely recall us having occasional problems with this test here in the main repo as well. I will try to debug this test in detail. However, will this test occasionally failing be an ongoing problem for your package? E.g., will the tests run on every install, or only during packaging, when you are able to make a decision to ignore a flakey test?

@aantron aantron added this to the 4.3.2 milestone Sep 28, 2019
@aantron aantron added the bug label Sep 28, 2019
@olafhering
Copy link
Contributor Author

Yes, it happens only sometimes. I think just once.

If a package fails to build (a failed built-in dune runtest counts as failure) it could be easily triggered for rebuild. So it is not really an issue I think.

In the buildsevice packages will be scheduled for rebuild if their dependencies changed for whatever reason. The dune runtest is part of the build process, so spurious failures may happen.

@aantron
Copy link
Collaborator

aantron commented Sep 28, 2019

I'll try to reproduce this issue, but I might not be able to fix it in a reasonable amount of time, because it has been very rare for me. So, I suggest to go on with it unfixed for now.

In the past, I considered it not a problem, because only Lwt developers were running the test suite. Hence the multiple test bugs you've reported :) However, I'll now focus on fixing spurious test failures whenever I observe them, so the tests can be reliable in more contexts.

aantron added a commit that referenced this issue Sep 29, 2019
@aantron
Copy link
Collaborator

aantron commented Sep 29, 2019

I've instrumented the tests in the above commit, so that they can be fixed the next time the error is observed. If they are failing, they will now print something like:

writev: expected to write 6 bytes; wrote 5

I'm going to close this issue to keep the repo tidy. The underlying problem will get a fix whenever we observe it again.

@aantron aantron closed this as completed Sep 29, 2019
@aantron
Copy link
Collaborator

aantron commented Sep 29, 2019

I did the same for all other tests that fail occasionally.

@olafhering
Copy link
Contributor Author

Is there supposed to be some debug for the reported failure? I get nothing but the error result:

[   78s] Command [78] exited with code 1:
[   78s] $ (cd _build/default/test/unix && ./main.exe)
[   78s] Testing library 'unix'...
[   78s] ................SSS......................................................................................................SS..............................................F.......SS................
[   78s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'

@aantron
Copy link
Collaborator

aantron commented Oct 24, 2019

Yes, there should be. I may be overlooking something, but...

This is the final condition for that test:

Lwt_list.for_all_s (fun t -> t ())
[writer ~blocking:true write_fd io_vectors 9;
reader read_fd "foobarbaz"]);

writer should be printing something if it resolves with false:

if bytes_written <> Bytes.length data then
Printf.eprintf "\nwritev: expected to write %i bytes; wrote %i\n"
(Bytes.length data) bytes_written;
Lwt.return (bytes_written = Bytes.length data)

The same for reader:

if bytes_read <> expected_count then
Printf.eprintf "\nreadv: expected to read %i bytes; read %i\n"
expected_count bytes_read;
if actual <> expected_data then
Printf.eprintf "\nreadv: expected to read %s; read %s\n"
expected_data actual;
Lwt.return (actual = expected_data && bytes_read = expected_count)

Can you confirm that this code is present in the ref you are using?

@aantron
Copy link
Collaborator

aantron commented Oct 24, 2019

Ah my bad, I see the instrumentation is in the wrong function! These functions are shadowed later. I will add a commit.

@olafhering
Copy link
Contributor Author

Thanks. It was just failing a few times and I wonder if perhaps some other path failed. I will add a patch to my pkg once it is ready.

aantron added a commit that referenced this issue Oct 24, 2019
@aantron
Copy link
Collaborator

aantron commented Oct 24, 2019

The latest master should have the patch for this. More precisely, the issue was that there were separate reader and writer helpers for the readv and the writev tests, and I did not notice, and instrumented the readv helpers. Thanks for reporting it again.

@olafhering
Copy link
Contributor Author

current failure: aarch64
[ 213s] $ (cd _build/default/test/unix && ./main.exe)
[ 213s] Testing library 'unix'...
[ 213s] ...........................SSS..................................................................................................SS.............................................F....SS................
[ 213s]
[ 213s] readv: expected to read foobarbaz; read foobar���
[ 213s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'

@olafhering
Copy link
Contributor Author

00000020: 2066 6f6f 6261 7262 617a 3b20 7265 6164   foobarbaz; read
00000030: 2066 6f6f 6261 72ef bfbd efbf bdef bfbd   foobar.........
00000040: 0a5b 2020 3231 3373 5d20 5465 7374 2027  .[  213s] Test '

@aantron
Copy link
Collaborator

aantron commented Oct 25, 2019

Just want to confirm that I'm working on this, it just looks like it will take some time due to the rarity of the problem. I can reproduce it occasionally, once every few hours of testing. I'm trying to get it to happen more frequently at the moment.

I also made some other tests more reliable or instrumented them in hopes of doing so if they fail again later, the commits for that are now in master.

@aantron aantron reopened this Oct 25, 2019
@aantron
Copy link
Collaborator

aantron commented Oct 25, 2019

By the way, the three ef bf bd sequences in the hex dump are actually the UTF-8 encoding of the Unicode replacement character, which must have been inserted by the web UI or other program that was interpreting the console output. See here (search the page for "UTF-8"). The actual byte output was only three bytes, of course junk.

@olafhering
Copy link
Contributor Author

In case the junk is important, should it perhaps print some sort of hexdump?
I added commit 7dafec5 for the time being.

@aantron
Copy link
Collaborator

aantron commented Oct 25, 2019

I'm almost 100% sure the junk isn't important, at least not for the present bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants