Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: ignore spurious 'EMFILE' #12698

Merged
merged 1 commit into from May 19, 2017

Conversation

@refack
Copy link
Member

commented Apr 27, 2017

As discussed in most recently on #10286 test/parallel/test-child-process-fork-regr-gh-2847.js is a little flaky
Last sighting was

c:\workspace\node-test-binary-windows\RUN_SUBSET\1\VS_VERSION\vcbt2015\label\win10\test\parallel\test-child-process-fork-regr-gh-2847.js:61
            throw err;
            ^

Error: write EMFILE
    at exports._errnoException (util.js:1026:11)
    at ChildProcess.target._send (internal/child_process.js:663:20)
    at ChildProcess.target.send (internal/child_process.js:547:19)
    at Worker.send (internal/cluster/worker.js:54:28)
    at Socket.<anonymous> (c:\workspace\node-test-binary-windows\RUN_SUBSET\1\VS_VERSION\vcbt2015\label\win10\test\parallel\test-child-process-fork-regr-gh-2847.js:33:14)
    at Object.onceWrapper (events.js:312:19)
    at emitNone (events.js:105:13)
    at Socket.emit (events.js:207:7)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1102:10)

Possible explanation #3635 (comment)
So as a follow up to #5422 I suggest we also ignore EMFILE raised from the connection.

Ref: #3635
Ref: #5178
Ref: #5179
Ref: #4005
Ref: #5121
Ref: #5422
Ref: #12621 (comment)
Fixes: #10286

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

test

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2017

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 27, 2017

@Trott
Copy link
Member

left a comment

Without a reasonable explanation as to why EMFILE is being raised, I'm not thrilled with the idea of adding it just to make the test pass. The other errors listed make sense to me based on the comment above, but EMFILE seems surprising. This could just be my own ignorance....

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2017

Sorry, I missed adding that reference: #3635 (comment)
P.S. The test passes most of the time without this change, so IMHO it's not forcing it to pass, but making is less flaky.

@gibfahn
Copy link
Member

left a comment

@@ -37,7 +37,8 @@ const server = net.createServer(function(s) {
// is still happening while the server has been closed.
s.on('error', function(err) {
if ((err.code !== 'ECONNRESET') &&
((err.code !== 'ECONNREFUSED'))) {
(err.code !== 'ECONNREFUSED') &&
(err.code !== 'EMFILE')) {

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 27, 2017

Member

@refack could you put a brief two line explanation of why this error occurs in a comment above? When reading through tests mysteriously ignored errors are always a bit of code smell.

This comment has been minimized.

Copy link
@refack

refack May 19, 2017

Author Member

ack.
PTAL

@refack refack force-pushed the refack:more-for-10286 branch Apr 27, 2017

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2017

@Trott @gibfahn
I tried to document both the original purpose of this test, and the possible causes for the "EMFILE".
test/parallel/test-child-process-fork-regr-gh-2847.js has a very long history of being flaky.
As far as I can tell it's original purpose is to generate a very specific condition that pre #2847 used to cause a segfault, because this is based on asynchronous tasks, sometimes we just hit spurious connection errors.
It specific error was last seen on Windows, but I don't know if this is a Windows only scenario.

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2017

This exclusion would be a continuation to #5422
/cc @santigimeno

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 27, 2017

Let's see if a stress test can reproduce the error (or other errors) and if load on the machine matters at all. If the answer here is "move to sequential", I'd rather do that than have a bunch of exceptions in the code. Maybe we can even remove an exception or two if we go that route. Or not. Let's see...

Stress test on win10 with just one test running at a time:
https://ci.nodejs.org/job/node-stress-single-test/1180/nodes=win10/console

Stress test on win10 with 100 tests running at a time:
https://ci.nodejs.org/job/node-stress-single-test/1181/nodes=win10/console

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

Stress tests seem to show that EMFILE happens under load. Looks like moving the test to sequential may be the way to go therefore? Although I'd be interested in understanding why load causes EMFILE here. Maybe it's obvious once I look at the test, but I gotta log off right now....

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

Maybe windows is actually running out of file handles?
More probably high load causes the scheduler to be more aggressive, hence races.
Is there a way to test is sequentially but under external synthetic load?

IMHO if we want to just keep the original intention of the test (protect from regressing #2847) we could ignore all errors, except a SEGFAULT. If you think this test is useful for covering other cases, then it's worth the extra time...

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

Hope I'm not being presumptuous or anything, but on my own fork/branch: I removed all the guard code except the one that I"m sure is needed, moved to sequential, opened #12713 but labeled it in progress and started a CI stress test using the ALL label at https://ci.nodejs.org/job/node-stress-single-test/1182/ to see if any other guard code needs to be restored.

I'm in favor of having this be specific to that regression. I like my tests narrowly focused, especially ones designed for oddball regressions. :-D

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

Why not have them both, one sequential and unfiltered, and one parallel with the filters? All well documented of course.

@gibfahn

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

Why not have them both, one sequential and unfiltered, and one parallel with the filters? All well documented of course.

Why would we? Moving a test to sequential just means it can't be run in parallel, which is a minor annoyance as it makes our test runs slightly slower. Duplicating the test seems unnecessary.

Is there a way to test is sequentially but under external synthetic load?

I don't think we have a built in way to do that, I normally just build Node in the background, seems to do the trick (not the Node I'm testing obviously 😁 ). Maybe running benchmarks at the same time would provide enough load?

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

@gibfahn sorry if I'm not RTFM but how do I trigger a stress test (like on the CI) locally?

@gibfahn

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

@refack tools/test.py --repeat 1000 parallel will run the parallel test suite 1000 times.

Probably worth doing something like:

tools/test.py --repeat 1000 -p tap parallel/test-child-process-fork-regr-gh-2847 | tee out.tap

and then grepping for not ok.

You could use Powershell's Tee-Object on Windows.

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

Why would we? Moving a test to sequential just means it can't be run in parallel, which is a minor annoyance as it makes our test runs slightly slower. Duplicating the test seems unnecessary.

@gibfahn: @Trott's premise is that in sequential mode the exception exclusions are unnecessary (#12713). Although they are obviously needed in parallel mode. So having them tested in both modes tests this scenario under two different conditions. "Can't have too much testing"?
Obviously these reasoning should be well documented within the test.

P.S. @Trott you had a typo (sequential}/test-child-process-fork-regr-gh-2847) so I'm re-runing: https://ci.nodejs.org/job/node-stress-single-test/1186/

@gibfahn

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

"Can't have too much testing"?

I disagree. Every test should be testing something useful, otherwise you end up with a huge backlog of tests, and it's not sure what a test is actually for or why it failed.

So having them tested in both modes tests this scenario under two different conditions.

But EMFILE is only thrown very occasionally, and by ignoring this error as well we're not actually testing the EMFILE exception. If we think we should see this EMFILE error under some conditions, then we should try to get a test that reliably reproduces it, and the test for that probably doesn't need to also test this regression.

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

@refack tools/test.py --repeat 1000 parallel will run the parallel test suite 1000 times.

To stress test a parallel test locally, I also use -j to specify how many tests to run simultaneously.

$ tools/test.py --repeat=92 -j 92 parallel/test-foo-bar

@Trott Trott dismissed their stale review Apr 28, 2017

Just got EMFILE in a sequential run, and reading the explanation linked above, it seems to be an unexpected if infrequent thing to happen in this test. Clearing my objection.

@Trott

This comment has been minimized.

Copy link
Member

commented Apr 28, 2017

Argh, can't edit the above (I don't think) but I meant to say expected if infrequent rather than unexpected if infrequent.

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) {

// Errors can happen if this connection
// is still happening while the server has been closed.
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683
// If a previous request caused the worker __and__ the server to close
// a following "send" request could emit EMFILE, this is a sporius

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

sporius -> spurious

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) {

// Errors can happen if this connection

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

Maybe:
- ECONNREFUSED or ECONNRESET errors can happen if this connection

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) {

// Errors can happen if this connection
// is still happening while the server has been closed.
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683
// If a previous request caused the worker __and__ the server to close

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

- If a previous request...

Basically can it be really clear which explanation is for which errors?

@gibfahn
Copy link
Member

left a comment

Mostly LGTM modulo nits

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -1,3 +1,7 @@
/* Description:
* Before #2847 a child process trying (asynchronously) to use a the closed

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

#2847 -> https://github.com/nodejs/node/pull/2847

a the -> a

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -1,3 +1,7 @@
/* Description:
* Before #2847 a child process trying (asynchronously) to use a the closed
* channel to it's creator, would have caused a segfault.

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

to connect to ?

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -1,3 +1,7 @@
/* Description:

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

Probably don't need the Description:

@refack refack force-pushed the refack:more-for-10286 branch Apr 28, 2017

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

@gibfahn, commands addressed PTAL.

@santigimeno
Copy link
Member

left a comment

LGTM with one nit

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
// is still happening while the server has been closed.
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683
// ECONNREFUSED or ECONNRESET errors can happen if this connection can
// happen if this connection is still happening while the server has closed.

This comment has been minimized.

Copy link
@santigimeno

santigimeno Apr 28, 2017

Member

can happen if this connection is repeated twice

@jasnell
Copy link
Member

left a comment

Generally LGTM with a couple of nits

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
* Before https://github.com/nodejs/node/pull/2847 a child process
* trying (asynchronously) to use the closed channel to it's creator
* would have caused a segfault.
*/

This comment has been minimized.

Copy link
@jasnell

jasnell Apr 28, 2017

Member

Comment block should use the typical // style.

test/parallel/test-child-process-fork-regr-gh-2847.js Outdated
@@ -6,6 +11,7 @@ const assert = require('assert');
const cluster = require('cluster');
const net = require('net');


This comment has been minimized.

Copy link
@jasnell

jasnell Apr 28, 2017

Member

Unnecessary whitespace change

@refack refack force-pushed the refack:more-for-10286 branch Apr 28, 2017

@refack

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2017

Comments fixed, PTAL.

[joke]
You are all a bunch of nit-pickers
[/joke]
JK. Love you all 💌

@gibfahn
Copy link
Member

left a comment

You are all a bunch of nit-pickers

Speaking of which...

(I see it as a game, can you minimize the number of nits you'll get per PR)

@@ -1,3 +1,5 @@
// Before https://github.com/nodejs/node/pull/2847 a child process trying
// (asynchronously) to use the closed channel to it's creator caused a segfault.

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

it's -> its

@@ -1,3 +1,5 @@
// Before https://github.com/nodejs/node/pull/2847 a child process trying

This comment has been minimized.

Copy link
@gibfahn

gibfahn Apr 28, 2017

Member

Sorry, should be an issue, so:

https://github.com/nodejs/node/pull/2847->https://github.com/nodejs/node/issues/2847

Also can you order this as in the guide? Should look like:

'use strict';
const common = require('../common');

// Before https://github.com/nodejs/node/issues/2847 a child process trying
// (asynchronously) to use the closed channel to it's creator caused a segfault.
 
const assert = require('assert');

@refack refack force-pushed the refack:more-for-10286 branch 3 times, most recently Apr 28, 2017

test: ignore spurious 'EMFILE'
According to the explanation in #3635#issuecomment-157714683
And as a continuation to #5422 we also ignore EMFILE
"No more file descriptors are available,so no more files can be opened"

PR-URL: #12698
Fixes: #10286
Refs: #3635 (comment)
Refs: #5178
Refs: #5179
Refs: #4005
Refs: #5121
Refs: #5422
Refs: #12621 (comment)
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>

@refack refack force-pushed the refack:more-for-10286 branch to 6f21671 May 19, 2017

@refack refack merged commit 6f21671 into nodejs:master May 19, 2017

@refack

This comment has been minimized.

Copy link
Member Author

commented May 19, 2017

Landed in 6f21671

@refack

This comment has been minimized.

Copy link
Member Author

commented May 19, 2017

@refack refack deleted the refack:more-for-10286 branch May 19, 2017

@jasnell jasnell referenced this pull request May 28, 2017
@gibfahn gibfahn referenced this pull request Jun 15, 2017
2 of 3 tasks complete
MylesBorins added a commit that referenced this pull request Jul 17, 2017
test: ignore spurious 'EMFILE'
According to the explanation in #3635#issuecomment-157714683
And as a continuation to #5422 we also ignore EMFILE
"No more file descriptors are available,so no more files can be opened"

PR-URL: #12698
Fixes: #10286
Refs: #3635 (comment)
Refs: #5178
Refs: #5179
Refs: #4005
Refs: #5121
Refs: #5422
Refs: #12621 (comment)
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@MylesBorins

This comment has been minimized.

Copy link
Member

commented Jul 17, 2017

I've landed this on v6.x... please lmk if this was a mistake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.