test: fix flaky test-net-connect-local-error #12964

sebastianplesciuc · 2017-05-11T06:28:09Z

Fixed test-net-connect-local-error by moving the test from parallel to sequential.
Reverted to commit https://github.com/nodejs/node/blob/eeae3bd07145a770209e4899a9d40f67109d3d01/test/parallel/test-net-connect-local-error.js. Added a few more assertions.

Fixes: #12950

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

test

refack

One small request; other than that, let's see what the CI says

refack · 2017-05-11T23:54:24Z

test/parallel/test-net-connect-local-error.js

-    `${err.localAddress} !== ${common.localhostIPv4} in ${err}`
-  );
+getUnassignedPort(common.mustCall((unassignedPort) => {
+  assert(unassignedPort);


Use assert.ok, or even assert.strictEqual(typeof unassignedPort, 'number')

@refack Fixed! Thanks! I've also moved the getUnassignedPort call closer to where the value is actually used.

refack · 2017-05-11T23:58:05Z

CI: https://ci.nodejs.org/job/node-test-commit/9824/

santigimeno · 2017-05-12T08:20:51Z

test/parallel/test-net-connect-local-error.js

+  server.listen({port: 0}, common.mustCall(() => {
+    // When the server is closed this port will no longer be assigned
+    const unassignedPort = server.address().port;
+    server.close(common.mustCall(() => {


TBH I don't see a completely safe way of calling net.connect() to a free port. I would probably just move the original test (before the port changes) to sequential.
/cc @nodejs/testing @thefourtheye

@santigimeno thanks for the feedback
IMHO port + 1 will always be flaky even in sequential.
@nodejs/testing It seems like we need a way to find a deterministically erring port for a few other tests as well... Re: #12996

FWIW For Windows (and according to RFC 793) the closed socket will enter TIME_WAIT state for 2*MSL and will not SYN,ACK and will not be reused by the OS.

Funny story: I've been looking at this issue's brother #12951.
These tests are run in parallel: first this one, then the other.
In the other one there's a server that's supposed to receive 6 connections, instead it received 7.
I wonder where that 7th request comes from 🤣

IMHO port + 1 will always be flaky even in sequential.

I'm not sure I follow. Can you elaborate?
If you mean that there can be another test using common.PORT + 1 (or common.PORT for that matter), I agree, but I think we'll be fine as long as there's no test on sequential listening/binding to 0 port.

IMHO port + 1 will always be flaky even in sequential.

I think that's wrong, unless you're just arguing that some other process can always use that port. But I don't think we generally concern ourselves with that. I'm with @santigimeno: Moving it to sequential seems like the simpler and better option.

Thirded @santigimeno's suggestion, moving to sequential.

refack · 2017-05-12T16:02:57Z

Stress CI:
https://ci.nodejs.org/job/node-stress-single-test/1210/nodes=freebsd10-64/
https://ci.nodejs.org/job/node-stress-single-test/1211/nodes=win2016/

refack · 2017-05-12T20:45:36Z

test/parallel/test-net-connect-local-error.js

+        common.localhostIPv4,
+        `${err.localAddress} !== ${common.localhostIPv4} in ${err}`
+      );
+      server.close();


I just thought about it, you don't need getUnassignedPort. There is no need for a server to be alive for testing that a connection to an empty port will fail.
Do all the testing in the server.close() callback. Use the closed server's assigned port like you did in getUnassignedPort

@refack You're right. I've used 8080 like I did here since no connection is actually made and this issue suggests that in the future there will be a linting rule against using common.PORT in parallel tests.

I've changed it now, thanks!

refack · 2017-05-13T13:02:32Z

test/parallel/test-net-connect-local-error.js

+  const port = server.address().port;
+  server.close(common.mustCall(() => {
+    const client = net.connect({
+      port: 8080,


I think we should test the other way around as well: {port: port, localPort: 8080} (with a new client)

Actually also need to assert the other properties of err like

assert.strictEqual(err.syscall, 'connect');
assert.strictEqual(err.code, 'ECONNREFUSED');
assert.strictEqual(err.message, `connect ECONNREFUSED ${err.address}:${err.port} - Local (${err.localAddress}:${err.localPort})`);

refack · 2017-05-13T13:38:32Z

IMHO we have a good solution for the flakiness of the test.
My comments are improvements, and could go into a different PR if you don't have the time.

refack · 2017-05-13T13:54:05Z

Stress on macOS: https://ci.nodejs.org/job/node-stress-single-test/1217/
Stress on freeBSD: https://ci.nodejs.org/job/node-stress-single-test/1218/nodes=freebsd10-64/

sebastianplesciuc · 2017-05-13T14:18:44Z

@refack I could work on the requested changes in this PR either today or tomorrow. It will need a new CI. Let me know how to proceed. If you need to land this fast, I can make the changes in another PR.

refack · 2017-05-13T14:26:42Z

It's the weekend, there's no rush. If you have the time and energy, add the assertions and the reversed connection to this PR.

sebastianplesciuc · 2017-05-13T14:56:49Z

@refack Made the changes! Thanks :)

Trott

I'm very uncomfortable with all of the PRs lately that add non-trivial amounts of code and complexity for tests where moving to sequential is the better solution. The marginal cost of having the test in sequential is negligible (maybe 150 ms on a few platforms?). Our slowest CI platforms don't benefit at all from having tests in parallel. (They run them sequentially anyway.) I'd much rather have simple, short, straightforward, easy-to-understand, easy-to-maintain tests. The time taken to do the whole reserve-a-port-then-close-the-server dance probably largely negates any benefit from having the test in parallel anyway.

santigimeno · 2017-05-13T16:49:10Z

@Trott I have to agree with you (even though I was at first supporting those kinds of complex changes)

refack · 2017-05-13T17:44:36Z

I'm very uncomfortable with all of the PRs lately that add non-trivial amounts of code and complexity for tests where moving to sequential is the better solution. The marginal cost of having the test in sequential is negligible (maybe 150 ms on a few platforms?). Our slowest CI platforms don't benefit at all from having tests in parallel. (They run them sequentially anyway.) I'd much rather have simple, short, straightforward, easy-to-understand, easy-to-maintain tests. The time taken to do the whole reserve-a-port-then-close-the-server dance probably largely negates any benefit from having the test in parallel anyway.

The original test is flaky even sequentially.
I agree about non-trivial code in tests, so we removed the "reserve-a-port-then-close-the-server-dance". Most of what was added by e8eabd2 are extra assertions, and a new test case.
The only non trivial code change was moving the logic into the server.on('close') callback.

@Trott PTAL

~~P.S. If you'd have written the review as a comment I would given it~~ 👍 found it.

refack · 2017-05-13T17:48:43Z

New CI: https://ci.nodejs.org/job/node-test-commit/9867/
( I have a hunch it'll fail on Windows :( )

Trott · 2017-05-13T20:48:16Z

test/parallel/test-net-connect-local-error.js

@@ -3,25 +3,46 @@ const common = require('../common');
 const assert = require('assert');
 const net = require('net');

+const fixedPort = 8080;


Why are we hard-coding 8080 and not using common.PORT or common.PORT + 1 or whatever? Just to avoid moving to sequential?

This needs to change.

@Trott I thought to use it because of your comment on this: #12639

Also as I understood from the common.PORT in parallel tests issue, it was planned to use a linting rule against using common.PORT in parallel tests.

@sebastianplesciuc I think they convinced me to move the test to /sequential/; there it's OK.

I thought to use it because of your comment on this: #12639

@sebastianplesciuc Not sure which comment you mean.

Also as I understood from the common.PORT in parallel tests issue, it was planned to use a linting rule against using common.PORT in parallel tests.

If/when that happens, eslint-disable comments can be used for any remaining valid common.PORT uses in parallel. Changing them now to accommodate a rule that may never come to pass is probably putting the cart before the horse.

Regardless, none of that applies if the test is moved to sequential. :-D

Lastly: I hope none of this is too frustrating for you. I appreciate all the work you're doing and I know it's not fun to get contradictory suggestions from people.

Oh, hooray, we're all kinda sorta on the same page (or getting there) after all. :-D

@sebastianplesciuc Oh, I think I see the test/comment you are referring to. In that case, other (intentional and for testing purposes) errors in the code prevent that port from ever being in use. If I understand what's going on in this test (and I may not!), that port (the one that is now 8080) does in fact get used. A connection is attempted there and ECONNREFUSED is expected, meaning nothing is listening on that port. So if something else is using that port, bad things happen. Again, I may be misunderstanding the test, but that's the way it seems to me. (Massively divided attention right now, apologies if I'm hurting more than I'm helping by participating.)

@Trott I understood why this should move to sequential. I'm not defending this, I understand why this is the case and I agree with your review. I just wanted to explain why I thought to use 8080 there.

Frankly, I'm not really sure what happens on every platform in a server's close callback. I just thought you guys might know and determine if this is an acceptable solution. I'm satisfied with the outcome and also I've learned some things along the way.

So, thanks for that :)

Trott · 2017-05-13T20:54:26Z

I think we should revert the changes in this file that were included in 94eed0f and move this file to sequential. That commit introduced port + 1 to replace common.PORT + 1 and that change is a bug.

port + 1 could be in use by another test (and is extremely likely to be because operating systems seem to often or always supply these ports in sequential order)
port + 1 could be an invalid port number if the operating system supplies port 65535 for port

I'm not sure what the nature is of the flakiness that's being seen, but that seems very likely to resolve it. (The first bullet point is the more important one in this regard. If another test running in parallel uses port 0 somewhere shortly after this test does, it is exceedingly likely to get port + 1 assigned resulting in a collision and flakiness in one or both tests.)
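The second bullet point can be made concrete with a small sketch. The helper names `isValidPort` and `nextPortOrNull` are hypothetical and exist only for illustration; the test itself has no such guard.

```javascript
'use strict';

// TCP port numbers are 16-bit, so 65535 is the largest valid value.
const MAX_PORT = 65535;

// Hypothetical helper: true for a port number a client could connect to.
function isValidPort(port) {
  return Number.isInteger(port) && port > 0 && port <= MAX_PORT;
}

// Hypothetical helper: returns port + 1, or null when that would fall
// outside the valid range.
function nextPortOrNull(port) {
  const next = port + 1;
  return isValidPort(next) ? next : null;
}

console.log(nextPortOrNull(8080));   // 8081
console.log(nextPortOrNull(65535));  // null
```

Such a guard would only address the invalid-port case; it does nothing about the collision described in the first bullet point.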

refack · 2017-05-13T21:13:22Z

port + 1 could be in use by another test (and is extremely likely to be because operating systems seem to often or always supply these ports in sequential order)

Yeah, I found which one: #12951. There, a server receives 7 requests when the test clearly only issues 6 🤣

But this test needs some fixin' since it's flaky even sequentially (with the server.listen(), and server.close() run synchronously 🤦‍♂️ )

@sebastianplesciuc we need to rethink this test: it fails on Windows :(, and there's the hard-coded 8080 port. So anyway, I agree we need to move the test to /sequential/

sebastianplesciuc · 2017-05-14T06:41:17Z

@refack I'll take a look at the code before the bind to 0 commit and try to make a PR with the move to sequential if that's ok. Should we close this PR?

refack · 2017-05-14T14:49:35Z

@refack I'll take a look at the code before the bind to 0 commit and try to make a PR with the move to sequential if that's ok. Should we close this PR?

~~IMHO we should take current changes with us to /sequential/, old test format was just 👎~~

refack · 2017-05-14T14:54:41Z

I'm not sure what the nature is of the flakiness that's being seen

@Trott this fragment

+const server = net.createServer();
+server.listen(0);
+const port = server.address().port;
 const client = net.connect({
-  port: common.PORT + 1,
+  port: port + 1,

port can be undefined in high load. reverting 94eed0f will solve that.

@sebastianplesciuc scratch my last comment, if you revert 94eed0f and move to /sequential/ that should give a nice test. (still can be this PR, discussion was good).

P.S. revert 94eed0f just on test/parallel/test-net-connect-local-error.js, yes?

Trott · 2017-05-14T19:01:28Z

port can be undefined in high load. reverting 94eed0f will solve that.

Moving the test to sequential also solves that. Add it to the list of reasons to avoid common.PORT in parallel tests.

Trott · 2017-05-14T19:02:09Z

P.S. revert 94eed0f just on test/parallel/test-net-connect-local-error.js, yes?

Agreed, not the whole commit, just the change to this file from that commit.

Trott · 2017-05-16T02:56:47Z

@Trott do you have any further comments? Our CI is green, and landing this will stop the false negative CIs on macOS & freeBSD...

@refack LGTM

Trott

LGTM if CI is green

Fixed test-net-connect-local-error by moving the test from parallel to sequential. PR-URL: nodejs#12964 Fixes: nodejs#12950 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>

refack · 2017-05-16T03:03:25Z

Landed in cf30d5e

refack · 2017-05-16T03:09:03Z

Post land CI: https://ci.nodejs.org/job/node-test-commit/9910/

refack · 2017-05-16T03:30:16Z

Relanded in 0c2edd2 (forgot the missing LF)

refack · 2017-05-16T03:41:13Z

Post reland CI: https://ci.nodejs.org/job/node-test-commit/9913/

Trott · 2017-05-16T04:02:22Z

Relanded in 0c2edd2 (forgot the missing LF)

Not a fan of running CI after landing. You could just push your fix to their branch and run CI against the PR with your fixes in place.

refack · 2017-05-16T04:04:46Z

Not a fan of running CI after landing. You could just push your fix to their branch and run CI against the PR with your fixes in place.

After landing I run against master (and follow up, reverting if needed)

gibfahn · 2017-05-16T15:12:49Z

Not a fan of running CI after landing. You could just push your fix to their branch and run CI against the PR with your fixes in place.

@Trott To be clear, I think what @refack does is make sure CI ran before landing, then run CI again after landing to make sure no last minute other conflicting PR might have caused an issue. If this is the case, it's an example of extra rigour around the release process, and I'm quite impressed that anyone makes the effort.

Trott · 2017-05-16T15:49:45Z

@Trott To be clear, I think what @refack does is make sure CI ran before landing, then run CI again after landing to make sure no last minute other conflicting PR might have caused an issue. If this is the case, it's an example of extra rigour around the release process, and I'm quite impressed that anyone makes the effort.

Ah! I see now. Yes, that's awesome. 👍 Thanks for the clarification.

Trott · 2017-05-17T13:03:24Z

Thinking a bit more on this, I would ask that you (and everyone) please please please at least run make jslint/vcbuild jslint before pushing to master.

Our docs ask that people run make test/vcbuild test before pushing to master, but I know not everyone (especially those who land a lot of pull requests) does that.

JS linting is comparatively fast and would catch most of the "oops, I shouldn't have pushed to master" things that seem to come up from time to time, including this one.

sebastianplesciuc · 2017-05-17T13:17:37Z

@Trott I apologize for not doing this, but I didn't expect the PR to land until after I'd fixed make test on my machine. As you can see above, I didn't tick the "make -j4 test (UNIX), or vcbuild test (Windows) passes" item because it didn't pass. I thought you guys might give me some input on how to fix it, after which I would fix it and commit the final version.

refack · 2017-05-17T13:27:14Z

Thinking a bit more on this, I would ask that you (and everyone) please please please at least run make jslint/vcbuild jslint before pushing to master.

This is on me. It was a known lint failure I said I'd fix before landing #12964 (comment). I broke my own rule and landed this after 10PM 🤦‍♂️
[edit] I wanted to land this ASAP because of all the false negatives on the CI [/edit]

Re: nodejs/build#705 IMHO we should strive to move all the automatable (read: boring, repetitive, and human-error prone) tasks to the CI.

refack · 2017-05-17T13:39:57Z

P.S. a git hook that lints only git changed files:

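#!/c/node/node
// Pre-commit hook: run JSHint on the staged .js files only.
var cmd = require('child_process');
// List staged files that were added, copied, or modified, keeping names ending in ".js".
cmd.exec('git diff --cached --name-only --diff-filter=ACM | grep "\\.js$"', function (err, stdout) {
    // Nothing staged that matches: nothing to lint.
    if (stdout.length === 0) return;
    var args = stdout.split('\n');
    // The leading empty entry stands in for the program name in the argument list.
    args.unshift('');
    // Drop the empty string left after the trailing newline.
    args.pop();
    var cli = require("jshint/src/cli.js");
    cli.getBufferSize = function () { return 0; };
    cli.interpret(args);
});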

gibfahn · 2017-05-17T13:59:26Z

P.S. a git hook that lints only git changed files:

This wouldn't catch things that are already committed, right? make jslint should be pretty fast if you run it regularly (due to the caching), so it's probably worth running it on everything.

Trott · 2017-05-17T18:12:19Z

@sebastianplesciuc I was addressing people with commit bits on the repo. You didn't do anything wrong. (For that matter, @refack's mistake was minor, lots of folks have done it, and he was eager to fix CI.)

Everything's good. We can always improve though. Automation and git pre-commit hooks are both great things to apply here.

nodejs-github-bot added the test Issues and PRs related to the tests. label May 11, 2017

mscdex added the net Issues and PRs related to the net subsystem. label May 11, 2017

refack self-assigned this May 11, 2017

refack suggested changes May 11, 2017

santigimeno reviewed May 12, 2017

refack approved these changes May 12, 2017

refack mentioned this pull request May 12, 2017

src: whitelist new options for NODE_OPTIONS #13002

Merged

refack suggested changes May 12, 2017

refack reviewed May 13, 2017

refack approved these changes May 13, 2017

refack mentioned this pull request May 13, 2017

util,console: guard against overwritten util functions #13011

Closed

Trott requested changes May 13, 2017

Trott reviewed May 13, 2017

Trott approved these changes May 16, 2017

refack closed this May 16, 2017

This was referenced May 16, 2017

Investigate flaky sequential/test-net-connect-local-error on FreeBSD #13055

Closed

test: improve net-immediate-finish-test #13062

Closed

refack mentioned this pull request May 17, 2017

test: fix sequential test-net-connect-local-error #13064

Closed

jasnell mentioned this pull request May 28, 2017

8.0.0 Release Proposal #12220

Closed

gibfahn mentioned this pull request Jun 15, 2017

Auditing for 6.11.1 nodejs/Release#230

Closed

MylesBorins added the land-on-v6.x label Jun 22, 2017

MylesBorins mentioned this pull request Jul 18, 2017

v6.11.2 proposal #14356

Merged

refack removed their assignment Oct 20, 2018
