test: fix test-cluster-dgram-1 flakiness #8383

Closed
wants to merge 2 commits into
base: master
from

7 participants
@santigimeno
Member

santigimeno commented Sep 2, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

test

Description of change

Check for the number of messages received in the exit event listener
instead of the disconnect listener.

Fixes: #8380

santigimeno added some commits Sep 2, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380

Contributor

cjihrig commented Sep 2, 2016

LGTM

Member

mhdawson commented Sep 2, 2016

Test results on AIX

For the original before the refactor I got 0 failures out of 200 runs.

After the refactor I get 46/150 failures

With this fix the failure rate drops to about 3/150, but the remaining failures consistently show the output below. (It says parallel/test-cluster-dgram-3 rather than parallel/test-cluster-dgram-1 simply because I copied the new version into a different file for testing.)

Mismatched function calls. Expected 10, actual 0.
    at worker (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:82:31)
    at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:20:3)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)
    at startup (node.js:139:18)
    at node.js:974:3
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: 0 === 10
    at Worker.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:70:14)
    at Worker.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:401:15)
    at emitTwo (events.js:87:13)
    at Worker.emit (events.js:172:7)
    at ChildProcess.<anonymous> (cluster.js:364:14)
    at ChildProcess.g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at ChildProcess.emit (events.js:172:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)

So the net is that it makes things better but does not completely resolve the flakiness, at least on AIX.

Member

santigimeno commented Sep 2, 2016

@mhdawson I have pushed a fix. Can you try again? Thanks!

Member

mhdawson commented Sep 2, 2016

@santigimeno that seems to do the trick: 0 failures out of 450 runs, so LGTM.

Member

jasnell commented Sep 2, 2016

FWIW, given how this is being fixed, I could be wrong, but it looks like the refactor didn't so much break the test as expose a failure that had already been happening but hadn't been caught.

Member

jasnell commented Sep 2, 2016

LGTM

Member

jasnell commented Sep 2, 2016

I'd say given the breakage that the changes in the test are causing in CI, if this is non-controversial we shouldn't need to wait the 48 hours to land. /cc @Trott

mhdawson pushed a commit that referenced this pull request Sep 2, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
PR-URL: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>

Member

Trott commented Sep 2, 2016

Agreed on the "landing sooner than 48 hours" suggestion.

Member

mhdawson commented Sep 2, 2016

landed as 2d2a2d7

@mhdawson mhdawson closed this Sep 2, 2016

Member

santigimeno commented Sep 2, 2016

I understand the hurry, but I was counting on amending the commit message before merging because, after the fixup commit, the fix explanation was different.

@Fishrock123 Fishrock123 referenced this pull request Sep 6, 2016

Closed

v6.6.0 pre-proposal #8428

Fishrock123 added a commit to Fishrock123/node that referenced this pull request Sep 8, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: nodejs#8380
PR-URL: nodejs#8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>

Fishrock123 added a commit that referenced this pull request Sep 9, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
PR-URL: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>

Member

MylesBorins commented Sep 30, 2016

This does not land cleanly in LTS. Added the dont-land label. Please feel free to backport manually.

santigimeno added a commit to santigimeno/node that referenced this pull request Oct 15, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: nodejs#8380
Ref: nodejs#8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>

Member

santigimeno commented Oct 15, 2016

@thealphanerd backport to 4.x here: #9109

MylesBorins added a commit that referenced this pull request Oct 24, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
Ref: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>

MylesBorins added a commit that referenced this pull request Oct 26, 2016

test: fix test-cluster-dgram-1 flakiness
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
Ref: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>