test fail in writev: basic blocking #726

olafhering · 2019-09-28T19:33:00Z

I got this spurious error in devel:languages:ocaml/ocaml-lwt with ocaml-4.05 on x86_64:

[   86s] Testing library 'unix'...
[   86s] ........................SSS.................................S.................................................................................S................SS...............F...................
[   86s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'
[   86s] error: Bad exit status from /var/tmp/rpm-tmp.TKxkvY (%check)

On other hosts, and other ocaml versions, runtest succeeded.

The text was updated successfully, but these errors were encountered:

aantron · 2019-09-28T20:47:33Z

Am I correct in understanding that this error occurs only sometimes, but the tests usually pass? I vaguely recall us having occasional problems with this test here in the main repo as well. I will try to debug this test in detail. However, will this test occasionally failing be an ongoing problem for your package? E.g., will the tests run on every install, or only during packaging, when you are able to make a decision to ignore a flakey test?

olafhering · 2019-09-28T21:02:37Z

Yes, it happens only sometimes. I think just once.

If a package fails to build (a failed built-in dune runtest counts as failure) it could be easily triggered for rebuild. So it is not really an issue I think.

In the buildsevice packages will be scheduled for rebuild if their dependencies changed for whatever reason. The dune runtest is part of the build process, so spurious failures may happen.

aantron · 2019-09-28T21:09:42Z

I'll try to reproduce this issue, but I might not be able to fix it in a reasonable amount of time, because it has been very rare for me. So, I suggest to go on with it unfixed for now.

In the past, I considered it not a problem, because only Lwt developers were running the test suite. Hence the multiple test bugs you've reported :) However, I'll now focus on fixing spurious test failures whenever I observe them, so the tests can be reliable in more contexts.

Part of #726.

aantron · 2019-09-29T11:55:59Z

I've instrumented the tests in the above commit, so that they can be fixed the next time the error is observed. If they are failing, they will now print something like:

writev: expected to write 6 bytes; wrote 5

I'm going to close this issue to keep the repo tidy. The underlying problem will get a fix whenever we observe it again.

aantron · 2019-09-29T12:06:39Z

I did the same for all other tests that fail occasionally.

olafhering · 2019-10-24T07:10:22Z

Is there supposed to be some debug for the reported failure? I get nothing but the error result:

[   78s] Command [78] exited with code 1:
[   78s] $ (cd _build/default/test/unix && ./main.exe)
[   78s] Testing library 'unix'...
[   78s] ................SSS......................................................................................................SS..............................................F.......SS................
[   78s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'

aantron · 2019-10-24T07:17:10Z

Yes, there should be. I may be overlooking something, but...

This is the final condition for that test:

lwt/test/unix/test_lwt_unix.cppo.ml

Lines 481 to 483 in 336566d

    
           Lwt_list.for_all_s (fun t -> t ()) 
        
             [writer ~blocking:true write_fd io_vectors 9; 
        
              reader read_fd "foobarbaz"]);

writer should be printing something if it resolves with false:

lwt/test/unix/test_lwt_unix.cppo.ml

Lines 307 to 310 in 336566d

    
           if bytes_written <> Bytes.length data then 
        
             Printf.eprintf "\nwritev: expected to write %i bytes; wrote %i\n" 
        
               (Bytes.length data) bytes_written; 
        
           Lwt.return (bytes_written = Bytes.length data)

The same for reader:

lwt/test/unix/test_lwt_unix.cppo.ml

Lines 329 to 336 in 336566d

    
           if bytes_read <> expected_count then 
        
             Printf.eprintf "\nreadv: expected to read %i bytes; read %i\n" 
        
               expected_count bytes_read; 
        
           if actual <> expected_data then 
        
             Printf.eprintf "\nreadv: expected to read %s; read %s\n" 
        
               expected_data actual; 
        
           Lwt.return (actual = expected_data && bytes_read = expected_count)

Can you confirm that this code is present in the ref you are using?

aantron · 2019-10-24T07:19:27Z

Ah my bad, I see the instrumentation is in the wrong function! These functions are shadowed later. I will add a commit.

olafhering · 2019-10-24T07:23:03Z

Thanks. It was just failing a few times and I wonder if perhaps some other path failed. I will add a patch to my pkg once it is ready.

See #726.

aantron · 2019-10-24T07:33:25Z

The latest master should have the patch for this. More precisely, the issue was that there were separate reader and writer helpers for the readv and the writev tests, and I did not notice, and instrumented the readv helpers. Thanks for reporting it again.

olafhering · 2019-10-24T14:00:57Z

current failure: aarch64
[ 213s] $ (cd _build/default/test/unix && ./main.exe)
[ 213s] Testing library 'unix'...
[ 213s] ...........................SSS..................................................................................................SS.............................................F....SS................
[ 213s]
[ 213s] readv: expected to read foobarbaz; read foobar��
[ 213s] Test 'writev: basic blocking' in suite 'lwt_unix' produced 'false'

olafhering · 2019-10-24T14:10:03Z

00000020: 2066 6f6f 6261 7262 617a 3b20 7265 6164   foobarbaz; read
00000030: 2066 6f6f 6261 72ef bfbd efbf bdef bfbd   foobar.........
00000040: 0a5b 2020 3231 3373 5d20 5465 7374 2027  .[  213s] Test '

aantron · 2019-10-25T05:09:26Z

Just want to confirm that I'm working on this, it just looks like it will take some time due to the rarity of the problem. I can reproduce it occasionally, once every few hours of testing. I'm trying to get it to happen more frequently at the moment.

I also made some other tests more reliable or instrumented them in hopes of doing so if they fail again later, the commits for that are now in master.

aantron · 2019-10-25T05:16:27Z

By the way, the three ef bf bd sequences in the hex dump are actually the UTF-8 encoding of the Unicode replacement character, which must have been inserted by the web UI or other program that was interpreting the console output. See here (search the page for "UTF-8"). The actual byte output was only three bytes, of course junk.

olafhering · 2019-10-25T08:19:43Z

In case the junk is important, should it perhaps print some sort of hexdump?
I added commit 7dafec5 for the time being.

aantron · 2019-10-25T08:54:30Z

I'm almost 100% sure the junk isn't important, at least not for the present bug.

aantron added this to the 4.3.2 milestone Sep 28, 2019

aantron added the bug label Sep 28, 2019

aantron added a commit that referenced this issue Sep 29, 2019

Instrument writev tests

1b61b5b

Part of #726.

aantron closed this as completed Sep 29, 2019

aantron added a commit that referenced this issue Oct 24, 2019

Instrument writev tests

f0aa560

See #726.

aantron reopened this Oct 25, 2019

aantron mentioned this issue Oct 30, 2019

Retain references to buffers while doing blocking I/O #742

Merged

aantron closed this as completed in #742 Oct 30, 2019

aantron modified the milestones: 4.4.0, 4.4.1 Oct 30, 2019

aantron mentioned this issue Dec 15, 2019

Lwt 4.5.0 - promises and concurrent I/O ocaml/opam-repository#15536

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test fail in writev: basic blocking #726

test fail in writev: basic blocking #726

olafhering commented Sep 28, 2019

aantron commented Sep 28, 2019

olafhering commented Sep 28, 2019

aantron commented Sep 28, 2019

aantron commented Sep 29, 2019

aantron commented Sep 29, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 24, 2019

aantron commented Oct 24, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 24, 2019

olafhering commented Oct 24, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 25, 2019

aantron commented Oct 25, 2019

olafhering commented Oct 25, 2019

aantron commented Oct 25, 2019

test fail in writev: basic blocking #726

test fail in writev: basic blocking #726

Comments

olafhering commented Sep 28, 2019

aantron commented Sep 28, 2019

olafhering commented Sep 28, 2019

aantron commented Sep 28, 2019

aantron commented Sep 29, 2019

aantron commented Sep 29, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 24, 2019

aantron commented Oct 24, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 24, 2019

olafhering commented Oct 24, 2019

olafhering commented Oct 24, 2019

aantron commented Oct 25, 2019

aantron commented Oct 25, 2019

olafhering commented Oct 25, 2019

aantron commented Oct 25, 2019