Investigate flaky test-vm-timeout.js #6727

Closed
Trott opened this issue May 13, 2016 · 12 comments
Labels
test, vm, windows

Comments

@Trott
Member

Trott commented May 13, 2016

  • Version: master
  • Platform: CentOS 5 64-bit
  • Subsystem: test vm

Example failure: https://ci.nodejs.org/job/node-test-commit-linux/3362/nodes=centos5-64/tapTestReport/test.tap-1025/

not ok 1025 test-vm-timeout.js
# FATAL ERROR: v8::FromJust Maybe value is Nothing.
@Trott Trott added the vm and test labels May 13, 2016
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue May 13, 2016
Print a C backtrace on fatal errors to make it easier to debug issues
like nodejs#6727.
@mscdex
Contributor

mscdex commented Jun 18, 2016

This same test recently failed on Windows, but with a different error:

not ok 267 parallel/test-vm-timeout
# vm.js:35
#     return realRunInContext.call(this, contextifiedSandbox, options);
#                             ^
# Error: Script execution interrupted.
#     at Error (native)
#     at ContextifyScript.Script.runInContext (vm.js:35:29)
#     at ContextifyScript.Script.runInNewContext (vm.js:41:15)
#     at Object.exports.runInNewContext (vm.js:72:17)
#     at context.runInVM (C:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-vm-timeout.js:29:10)
#     at evalmachine.<anonymous>:1:1
#     at ContextifyScript.Script.runInContext (vm.js:35:29)
#     at ContextifyScript.Script.runInNewContext (vm.js:41:15)
#     at Object.exports.runInNewContext (vm.js:72:17)
#     at C:\workspace\node-test-binary-windows\RUN_SUBSET\3\VS_VERSION\vcbt2015\label\win10\test\parallel\test-vm-timeout.js:32:6

@addaleax
Member

That error message is due to #6635, but it doesn’t seem to make sense that it fails in this particular way in test-vm-timeout.js. I’m trying to reproduce this locally, but no success on Linux (x64 Ubuntu 16.04) so far.

@Trott Trott added the windows label Jun 19, 2016
@Trott
Member Author

Trott commented Jun 19, 2016

This one has gotten relentless on Windows recently. /cc @nodejs/platform-windows @nodejs/testing

@Trott
Member Author

Trott commented Jun 20, 2016

Looks like there's some hope/intention that #6734 will provide more information, but it currently seems to print the exact same backtrace we already have: https://ci.nodejs.org/job/node-test-binary-windows/2575/RUN_SUBSET=3,VS_VERSION=vcbt2015,label=win10/tapTestReport/test.tap-267/

@bnoordhuis
Member

@Trott It's not for augmenting JS stack traces (although I may do that in a follow-up PR) but it's for getting a meaningful C++ stack trace on fatal errors (e.g. the FATAL ERROR: v8::FromJust Maybe value is Nothing. you posted.)

Trott added a commit to Trott/io.js that referenced this issue Jun 22, 2016
@Trott
Member Author

Trott commented Jun 22, 2016

@addaleax Your comment about #6635 refers to the Windows issue (with Script execution interrupted message) and not the Linux issue (with FATAL ERROR: v8::FromJust Maybe value is Nothing.), right?

You've probably noticed this but just in case: It's failing consistently this way on one (but only one!) of the Windows variations on CI and doesn't seem to fail this way anywhere else.

@Trott
Member Author

Trott commented Jun 22, 2016

@nodejs/build Am I correct to observe that the Windows issue here is happening exclusively on the Azure hosts and never on the Rackspace hosts? Is there anything unusual about the Azure hosts that might cause a SIGINT to show up on Azure but not Rackspace, perhaps in some unexpected way? (Like maybe the terminal mode is in raw mode on Rackspace but not on Azure or something like that? I'm reaching here, but desperate times call for desperate uninformed rambling on GitHub. It's the law.)

@addaleax
Member

@Trott There’s a good chance the Windows issue is the Linux issue; my PR just changed the error message (because the assumption is that an interrupted script + no timeout indicated = received SIGINT). No part of test-vm-timeout installs any SIGINT handler, though.

@addaleax
Member

@Trott Could you try running a stress test or something with the outer timeout (line 32) increased from 100 to some significantly higher value, 100000 or so? It should not make the test run any longer than it does.
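A minimal sketch of the nested-timeout scenario being discussed, with the outer timeout raised well above the inner one. It uses only the public `vm.runInNewContext(code, contextObject, { timeout })` API; the 10ms inner and 100000ms outer values follow this comment, while `runNestedTimeout` and the other names are hypothetical and this is not the actual contents of test-vm-timeout.js.

```javascript
'use strict';
const vm = require('vm');

// Illustrative sketch (names are hypothetical): run an endless loop in an
// inner vm call with a short timeout, from inside an outer vm script whose
// own timeout is much larger, so only the inner watchdog should fire.
function runNestedTimeout(innerMs, outerMs) {
  const context = {
    // Invoked from inside the outer script; re-enters vm with its own timeout.
    runInVM(timeout) {
      vm.runInNewContext('while (true) {}', context, { timeout });
    },
  };
  try {
    vm.runInNewContext(`runInVM(${innerMs})`, context, { timeout: outerMs });
  } catch (err) {
    // The inner timeout error propagates out through the outer script.
    return err;
  }
  throw new Error('expected the inner timeout to interrupt the loop');
}

// Inner 10ms, outer raised from 100 to 100000.
const err = runNestedTimeout(10, 100000);
console.log(err.message);
```

When the inner timeout fires as intended, the outer call returns as soon as the error propagates out, so the large outer budget never has to elapse.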

Trott added a commit to Trott/io.js that referenced this issue Jun 22, 2016
PR-URL: nodejs#7359
Refs: nodejs#6727
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Joao Reis <reis@janeasystems.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@joaocgreis
Member

This fails only on Windows 10 machines, which we only have in Azure and which are quite slow. This fails with any compiler (actually, VCBT, the one used by normal CI for W10, seems to be the only one that makes it not fail sometimes). (Experiments here for reference)

@addaleax @Trott the test still fails with 150ms, but I didn't see a failure with 200ms (tested a few times on one CI machine). We could raise this timeout to 1000ms or so. Would that make the test incorrect?
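To get a feel for how much slack a given machine needs, one could time how long a short vm timeout actually takes to stop an endless loop and compare the worst case against the outer timeout. A rough diagnostic sketch, assuming `process.hrtime.bigint()` (Node.js 10.7 and later); `measureInterruptDelay` is a hypothetical helper, and the numbers it prints will vary by machine and load.

```javascript
'use strict';
const vm = require('vm');

// Hypothetical diagnostic: how long does a vm timeout of `timeoutMs` really
// take to stop `while (true) {}` on this machine? Returns milliseconds,
// including the cost of creating the new context.
function measureInterruptDelay(timeoutMs) {
  const start = process.hrtime.bigint();
  try {
    vm.runInNewContext('while (true) {}', {}, { timeout: timeoutMs });
  } catch (err) {
    // Anything other than the timeout error is unexpected here.
    if (!/timed out/.test(err.message)) throw err;
  }
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Take a few samples; on a slow or heavily loaded host the worst case can sit
// well above the requested timeout, which eats into the outer budget.
const samples = [];
for (let i = 0; i < 5; i++) samples.push(measureInterruptDelay(10));
const worst = Math.max(...samples);
console.log(`requested 10ms, worst observed ${worst.toFixed(1)}ms`);
```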

@Trott
Member Author

Trott commented Jun 22, 2016

@joaocgreis I'm not familiar with this part of the code base, so I definitely defer to @addaleax or @bnoordhuis on whether that's still a valid test. Seems like it to me.

Ignorant speculation on my part, but maybe on a sufficiently slow Windows machine, the 10ms timeout and the 100ms timeout fire out of order (because it takes more than 100ms to get out of the current tick and now both timeouts are ready to fire?). So the outer vm times out first, and the inner vm hasn't yet set its timedout boolean to true or whatever, and it has code along the lines of:

if (timedout) {
    /* code that should run here but does not */ 
} else {
    /* code that actually runs here but shouldn't and assumes SIGINT */ 
}
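The out-of-order hypothesis can be made concrete with a small deterministic model. This is not Node.js internals (the real watchdogs live in C++); `runNested` and the watchdog objects are hypothetical, and the message rule mirrors addaleax's description that an interrupted script whose own timeout was not flagged is reported as a SIGINT.

```javascript
'use strict';

// Hypothetical model, not Node.js internals. Each nested vm call gets its own
// watchdog; whichever watchdog runs first terminates *all* running JS and sets
// only its own `timedOut` flag.
function runNested(firstToFire) {
  const watchdogs = {
    inner: { timeoutMs: 10, timedOut: false },
    outer: { timeoutMs: 100, timedOut: false },
  };
  watchdogs[firstToFire].timedOut = true;

  // The endless loop runs in the inner call, so the inner call observes the
  // termination and picks the message: interrupted while its own watchdog
  // did not time out is assumed to mean SIGINT.
  return watchdogs.inner.timedOut
    ? 'Script execution timed out.'
    : 'Script execution interrupted.';
}

// Fast machine: the 10ms watchdog runs first.
console.log(runNested('inner'));
// Machine stalled past 100ms: both are due and the outer one may run first.
console.log(runNested('outer'));
```

In this model, only the second ordering produces the "interrupted" message seen in the Windows failure.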

@addaleax
Member

addaleax commented Jun 22, 2016

@Trott I can’t reproduce the bug locally (neither on Windows nor on Linux), at least with test-vm-timeout.js, but yeah, that makes a lot of sense; I should have a PR ready soon.

addaleax added a commit to addaleax/node that referenced this issue Jun 23, 2016
Likely fix the flaky parallel/test-vm-timeout. Increase the outer
timeout in the test checking for nested timeouts with `vm` scripts
so that its firing won’t interfere with the inner timeout.

Fixes: nodejs#6727
Fishrock123 pushed a commit that referenced this issue Jun 27, 2016
PR-URL: #7359
Refs: #6727
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Joao Reis <reis@janeasystems.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Fishrock123 pushed a commit that referenced this issue Jun 27, 2016
Likely fix the flaky parallel/test-vm-timeout. Increase the outer
timeout in the test checking for nested timeouts with `vm` scripts
so that its firing won’t interfere with the inner timeout.

Fixes: #6727
PR-URL: #7373
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Jun 29, 2016
Print a C backtrace on fatal errors to make it easier to debug issues
like nodejs#6727.

PR-URL: nodejs#6734
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Fishrock123 pushed 3 commits that referenced this issue Jul 5, 2016 (PR-URLs: #7359, #7373, #6734; commit messages identical to those above)
MylesBorins pushed a commit that referenced this issue Jul 11, 2016 (PR-URL: #7359)
MylesBorins pushed 4 commits that referenced this issue Jul 12, 2016 (PR-URLs: #7359, #7373)
addaleax added a commit to addaleax/node that referenced this issue Jul 12, 2016 (PR-URL: nodejs#7373)
MylesBorins pushed 4 commits that referenced this issue Jul 14, 2016 (PR-URLs: #7359, #7373)
5 participants