Investigate flaky test-worker-exit-code on Windows CI #25847

Closed
Trott opened this issue Jan 31, 2019 · 4 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. windows Issues and PRs related to the Windows platform. worker Issues and PRs related to Worker support.

Comments

@Trott
Member

Trott commented Jan 31, 2019

https://ci.nodejs.org/job/node-test-binary-windows/23427/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=0/console

test-rackspace-win2008r2-x64-4

07:23:03 not ok 535 parallel/test-worker-exit-code
07:23:03   ---
07:23:03   duration_ms: 120.167
07:23:03   severity: fail
07:23:03   exitcode: 1
07:23:03   stack: |-
07:23:03     timeout
07:23:03     ok - 0 exited with 42
07:23:03     ok - 1 exited with 42
07:23:03     ok - 2 exited with 0
07:23:03     ok - 4 exited with 99
07:23:03     Error: ok
07:23:03         at Object.exitWithOneOnUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:47:9)
07:23:03         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
07:23:03         at Object.onceWrapper (events.js:285:13)
07:23:03         at MessagePort.emit (events.js:197:13)
07:23:03         at MessagePort.onmessage (internal/worker/io.js:68:8)
07:23:03     ok - 3 exited with 1
07:23:03     ok - 5 exited with 0
07:23:03     ok - 6 exited with 97
07:23:03     Error: ok
07:23:03         at Object.changeCodeInExitWithUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:93:9)
07:23:03         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
07:23:03         at Object.onceWrapper (events.js:285:13)
07:23:03         at MessagePort.emit (events.js:197:13)
07:23:03         at MessagePort.onmessage (internal/worker/io.js:68:8)
07:23:03     Error: ok
07:23:03         at Object.exitWithZeroInExitWithUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:107:9)
07:23:03         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
07:23:03         at Object.onceWrapper (events.js:285:13)
07:23:03         at MessagePort.emit (events.js:197:13)
07:23:03         at MessagePort.onmessage (internal/worker/io.js:68:8)
07:23:03     ok - 7 exited with 98
07:23:03     ok - 8 exited with 0
07:23:03   ...
@Trott Trott added windows Issues and PRs related to the Windows platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI. worker Issues and PRs related to Worker support. labels Jan 31, 2019
@Trott Trott changed the title Investigate flaky test-worker-exit-code on Windwos CI Investigate flaky test-worker-exit-code on Windows CI Jan 31, 2019
@Trott
Member Author

Trott commented Jan 31, 2019

https://ci.nodejs.org/job/node-test-binary-windows/23456/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=0/console

test-rackspace-win2008r2-x64-5

10:09:57 not ok 536 parallel/test-worker-exit-code
10:09:57   ---
10:09:57   duration_ms: 120.103
10:09:57   severity: fail
10:09:57   exitcode: 1
10:09:57   stack: |-
10:09:57     timeout
10:09:57     ok - 0 exited with 42
10:09:57     ok - 1 exited with 42
10:09:57     ok - 2 exited with 0
10:09:57     Error: ok
10:09:57         at Object.exitWithOneOnUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:47:9)
10:09:57         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
10:09:57         at Object.onceWrapper (events.js:285:13)
10:09:57         at MessagePort.emit (events.js:197:13)
10:09:57         at MessagePort.onmessage (internal/worker/io.js:68:8)
10:09:57     ok - 3 exited with 1
10:09:57     ok - 4 exited with 99
10:09:57     ok - 5 exited with 0
10:09:57     ok - 6 exited with 97
10:09:57     Error: ok
10:09:57         at Object.changeCodeInExitWithUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:93:9)
10:09:57         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
10:09:57         at Object.onceWrapper (events.js:285:13)
10:09:57         at MessagePort.emit (events.js:197:13)
10:09:57         at MessagePort.onmessage (internal/worker/io.js:68:8)
10:09:57     Error: ok
10:09:57         at Object.exitWithZeroInExitWithUncaught [as func] (c:\workspace\node-test-binary-windows\test\fixtures\process-exit-code-cases.js:107:9)
10:09:57         at MessagePort.parentPort.once (c:\workspace\node-test-binary-windows\test\parallel\test-worker-exit-code.js:23:54)
10:09:57         at Object.onceWrapper (events.js:285:13)
10:09:57         at MessagePort.emit (events.js:197:13)
10:09:57         at MessagePort.onmessage (internal/worker/io.js:68:8)
10:09:57     ok - 7 exited with 98
10:09:57     ok - 8 exited with 0
10:09:57   ...

@Trott
Member Author

Trott commented Jan 31, 2019

@nodejs/platform-windows @nodejs/workers @nodejs/testing

@addaleax
Member

I ran a stress test on LinuxOne + 3 Windows-es: https://ci.nodejs.org/job/node-stress-single-test/2145/

Looks like this might be only flaky on win2008 (25/1000) + win2012 (10/1000) but not win2016?

This is the 4th flaky test revolving around exiting workers on Windows, besides #25702, #24005, #23873. If this is an OS-specific problem that occurs only on older versions of Windows, I’m not sure what the best strategy to figure this out is?
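For readers less familiar with what "exiting workers" involves, here is a minimal sketch of the kind of check at stake: start a Worker, wait for its `'exit'` event, and look at the reported code. The helper name `exitCodeOf` and the timeout value are assumptions made for illustration; this is not the code of `test-worker-exit-code.js`.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Hypothetical helper (not from the Node.js test suite): run `source` in a
// Worker and resolve with the code reported by its 'exit' event. Rejects if
// the worker has not exited within `timeoutMs`, which is roughly how a hang
// would surface as a "timeout" in CI output.
function exitCodeOf(source, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    const timer = setTimeout(() => {
      worker.terminate();
      reject(new Error(`worker did not exit within ${timeoutMs} ms`));
    }, timeoutMs);
    // Uncaught exceptions from the worker are emitted as 'error' here;
    // without a listener they would be thrown in the parent thread.
    worker.on('error', () => {});
    worker.on('exit', (code) => {
      clearTimeout(timer);
      resolve(code);
    });
  });
}

async function main() {
  console.log('explicit exit:', await exitCodeOf('process.exit(42)'));
  console.log('clean exit:', await exitCodeOf('void 0'));
  console.log('uncaught error:', await exitCodeOf('throw new Error("ok")'));
}

main();
```

A flaky failure of the kind reported above would correspond to the `'exit'` event never arriving for one of the cases, so the promise would only settle through the timeout path.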

@gireeshpunathil
Member

My thinking (not proven with evidence) is that these systems vary in latency for thread life cycle, disk access, and scheduling, which exposes the race to different degrees. It is not necessarily related to functional differences between the Windows versions, which would have made the failures more consistent.

I will spend some time next week attempting to recreate one or more of these.

addaleax added a commit to addaleax/node that referenced this issue Feb 3, 2019
The Windows ETW code is not written to be compatible with multi-threading,
and in particular it relies on global state like a single static
`uv_async_t`. Adding that to multiple threads would corrupt the
corresponding loops' handle queues.

This addresses the flakiness of at least `test-worker-exit-code` and
very likely other flaky tests that relate to Worker threads on Windows as well.

Fixes: nodejs#25847
Fixes: nodejs#25702
Fixes: nodejs#24005
Fixes: nodejs#23873
addaleax added a commit that referenced this issue Feb 6, 2019
The Windows ETW code is not written to be compatible with
multi-threading, and in particular it relies on global state like a
single static `uv_async_t`. Adding that to multiple threads
would corrupt the corresponding loops' handle queues.

This addresses the flakiness of at least
`test-worker-exit-code` and very likely other flaky tests that
relate to Worker threads on Windows as well.

Fixes: #25847
Fixes: #25702
Fixes: #24005
Fixes: #23873

PR-URL: #25907
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
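
To illustrate why adding one static handle to several loops is a problem, here is a toy model of an intrusive circular queue, where the link pointers live inside the queued node itself. It is only an analogy written in JavaScript: the class `ToyQueue` and its methods are invented for this sketch and are not libuv's implementation or the ETW code. The assumption is simply that a handle carries one set of links, so it can belong to one queue at a time.

```javascript
'use strict';

// Toy intrusive circular doubly linked queue (illustrative only).
class ToyQueue {
  constructor() {
    // Sentinel head node that initially points at itself.
    this.head = {};
    this.head.next = this.head;
    this.head.prev = this.head;
  }

  // Link `node` in before the head, overwriting the node's own pointers.
  insertTail(node) {
    node.next = this.head;
    node.prev = this.head.prev;
    node.prev.next = node;
    this.head.prev = node;
  }

  // Walk forward from the head; a healthy queue gets back to its own head.
  isConsistent(limit = 100) {
    let current = this.head.next;
    for (let i = 0; i < limit; i++) {
      if (current === this.head) return true;
      current = current.next;
    }
    return false;
  }
}

// One shared "static" handle inserted into the queues of two loops.
const sharedHandle = {};
const loopA = new ToyQueue();
const loopB = new ToyQueue();
loopA.insertTail(sharedHandle);
loopB.insertTail(sharedHandle); // overwrites the links that loopA relies on

// One handle per loop keeps every queue intact.
const loopC = new ToyQueue();
const loopD = new ToyQueue();
loopC.insertTail({});
loopD.insertTail({});

console.log('shared handle:', loopA.isConsistent(), loopB.isConsistent());
console.log('handle per loop:', loopC.isConsistent(), loopD.isConsistent());
```

In this model the second insertion rewrites the shared node's links, so a walk starting from the first queue's head never returns to that head. That is the general shape of corruption the commit message describes, although the real failure mode in libuv may look different.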