test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

refack · 2017-05-10T19:45:45Z

Version: master
Platform: macOS,freeBSD
Subsystem: test

https://ci.nodejs.org/job/node-test-commit-freebsd/8933/nodes=freebsd11-x64/tapResults/
https://ci.nodejs.org/job/node-test-commit-osx/9696/nodes=osx1010/tapResults/

738	parallel/test-net-connect-local-error	
duration_ms	
0.162
severity	
fail
stack	
Mismatched onError function calls. Expected 1, actual 0.
    at Object.<anonymous> (/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/parallel/test-net-connect-local-error.js:15:27)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at Function.Module.runMain (module.js:605:10)
    at startup (bootstrap_node.js:144:16)
    at bootstrap_node.js:537:3

The text was updated successfully, but these errors were encountered:

sebastianplesciuc · 2017-05-10T19:56:25Z

Can I take this one?

refack · 2017-05-10T20:10:14Z

@sebastianplesciuc you like these ones 😉 . dibs is yours.
P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.
There's a general goal to use 0 as port (if possible) and then figure out which port the OS assigned.

refack · 2017-05-10T20:13:58Z

Ref: #12376

sebastianplesciuc · 2017-05-10T20:17:58Z

I think that's the case here as well.

I think there are two issues with this, one is that server.close() might be called before the client's callback which is the error on line 15 I believe.

Another issue might arise with port + 1. We have no guarantee that this is an empty port when this runs. A way of fixing this might be creating another server with port 0 and assigning it's port to the client socket in it's close callback. Thoughts? Is it guaranteed to be in use while the callback is running from net.Server close event?

refack · 2017-05-10T21:40:57Z

The close is definitely in a bad place
When net.Server.close call the cb the server will not answer anymore, so it's a good time to try to connect to it (to get an error)
Another option is to loop port + 1...5, if that's simpler.

Worse case we can move it to /sequential/ where it will run alone.

refack · 2017-05-12T20:20:21Z

@sebastianplesciuc You might appreciate this. I've been looking at this issue's brother #12951.
These tests are run in parallel first this one, then the other.
In the other one there's a server that's supposed to receive 6 connections, instead it received 7.
I wonder where that 7th request comes from 🤣

sebastianplesciuc · 2017-05-13T07:53:38Z

@refack I'm not sure I get it :) You're suggesting that the connection in that test comes from this one? How does that work?

sebastianplesciuc · 2017-05-13T09:45:55Z

@refack OOOOH! I think I get it now. So when that test fails, this test fails because it connects from this test to the server from that test. So fixing this test would fix that one?

refack · 2017-05-13T12:47:54Z

Yep

aqrln · 2017-05-14T00:50:05Z

@refack

P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.
There's a general goal to use 0 as port (if possible) and then figure out which port the OS assigned.

Using 0 to choose a random port can be fragile on BSD systems (including macOS) too, because of how SO_REUSEADDR works in their implementations of TCP. Recently I observed test/parallel/test-http-extra-response failing on my machine trying to communicate with the server of an autocompletion plugin for Vim instead of the server created in the test itself. The problem is that when a process binds to port 0 with SO_REUSEADDR, the kernel may give it a port that another process is already listening on if one of them binds to a concrete address (like 127.0.0.1) and the other — to a wildcard one. The logic that the kernel uses to route requests between them is not very clear to me.

/cc @nodejs/testing

Trott · 2017-05-14T01:31:46Z

P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.

Hmmm, this sounds like there's a misunderstanding somewhere. There's nothing macOS or FreeBSD specific about the common.PORT stuff as far as I know. And there's nothing fragile about common.PORT.

The reason we don't typically want to use common.PORT in parallel is to prevent this scenario:

Test A uses port 0 for something. Test B uses common.PORT for something. Test A starts to run first. The OS gives it an open port that just happens to be the same value as common.PORT will be when Test B runs. Test B starts to run but Test A is already using the port it requests via common.PORT and so Test B fails.

If Test B were in sequential rather than parallel, then there wouldn't be a problem.

The port collision scenario described above may sound unlikely, but it has been observed in CI results (or at least things that look suspiciously like it).

Trott · 2017-05-14T01:39:15Z

Also: common.PORT and common.PORT + 1 in sequential is safe (assuming no non-Node.js processes grab the ports) so you could just consider moving to sequential if the logic gets too hairy otherwise. It's probably not anyone's first choice for a solution, but it probably shouldn't be the last choice either :-D

Also also, and this starts to enter rant territory so feel free to tune out right now: There seems to be a reluctance to move tests to sequential out of concerns for making the test suite run longer. I know I've certainly been reluctant. I believe now that these concerns are massively overblown. First, a lot of CI hosts (most?) don't run any tests in parallel and these include our absolute far-and-away slowest hosts. So moving something to sequential will have no impact whatsoever on the test suite duration for the slowest hosts. And on the faster hosts, we're generally talking about shaving off something on the order of a tenth of a second. And a lot of that benefit is probably lost by the introduction of somewhat more complicated code to make it safe to run in parallel. So we're making the code more complex but not actually getting any benefit out of it. Or at least, that's my theory. It would be fun to compare how long the test suite takes if everything in parallel were moved to sequential. I might do that right now.

UPDATE (still in rant mode, so feel free to skip): Ran the test benchmark described above. Normal parallel test run on my computer was about 90 seconds. Running the same tests sequentially took about 250 seconds. That's pretty good, but it's worth noting that with 1333 tests, each test moved from parallel to sequential will cost about 120ms. The question becomes: Is the added code complexity worth saving 120ms on a test suite that (once you add in the addon tests and tests in sequential and so on) takes about 2.5 minutes to run. For any significant complexity increase, the answer is always going to be "no".

refack · 2017-05-15T16:42:34Z

There seems to be a reluctance to move tests to sequential ... each test moved from parallel to sequential will cost about 120ms

@Trott IMHO it's not just the perf cost; it's the implicit assumption that testing in parallel is more vigorous, hence more rigorous. which is kinda true. A good mechanism could be to add a fourth flag to parallel.status: TRY_SEQ, to flag the harness that for a particular test is it fail in parallel, decision should be relegated until the test is run alone.
[that was me being grandiloquent]

Fixed test-net-connect-local-error by moving the test from parallel to sequential. PR-URL: nodejs#12964 Fixes: nodejs#12950 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>

Fixed test-net-connect-local-error by moving the test from parallel to sequential. PR-URL: #12964 Fixes: #12950 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>

refack added macos Issues and PRs related to the macOS platform / OSX. test Issues and PRs related to the tests. net Issues and PRs related to the net subsystem. labels May 10, 2017

refack changed the title ~~test: flaky parallel/test-net-connect-local-error on macOS~~ test: flaky parallel/test-net-connect-local-error on macOS & freeBSD May 10, 2017

refack added the freebsd Issues and PRs related to the FreeBSD platform. label May 10, 2017

sebastianplesciuc mentioned this issue May 11, 2017

test: fix flaky test-net-connect-local-error #12964

Closed

3 tasks

refack mentioned this issue May 11, 2017

src: add HAVE_OPENSSL guard to crypto providers #12967

Closed

2 tasks

refack mentioned this issue May 15, 2017

test: flaky parallel/test-net-connect-options-allowhalfopen on macOS & freeBSD #12951

Closed

refack closed this as completed in cf30d5e May 16, 2017

This was referenced May 16, 2017

test: replace port in net immediate finish test #12996

Closed

test: improve net-immediate-finish-test #13062

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

refack commented May 10, 2017 •

edited

Loading

sebastianplesciuc commented May 10, 2017

refack commented May 10, 2017

refack commented May 10, 2017

sebastianplesciuc commented May 10, 2017

refack commented May 10, 2017

refack commented May 12, 2017

sebastianplesciuc commented May 13, 2017

sebastianplesciuc commented May 13, 2017

refack commented May 13, 2017

aqrln commented May 14, 2017

Trott commented May 14, 2017

Trott commented May 14, 2017 •

edited

Loading

refack commented May 15, 2017

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

Comments

refack commented May 10, 2017 • edited Loading

sebastianplesciuc commented May 10, 2017

refack commented May 10, 2017

refack commented May 10, 2017

sebastianplesciuc commented May 10, 2017

refack commented May 10, 2017

refack commented May 12, 2017

sebastianplesciuc commented May 13, 2017

sebastianplesciuc commented May 13, 2017

refack commented May 13, 2017

aqrln commented May 14, 2017

Trott commented May 14, 2017

Trott commented May 14, 2017 • edited Loading

refack commented May 15, 2017

refack commented May 10, 2017 •

edited

Loading

Trott commented May 14, 2017 •

edited

Loading