Investigate flaky sequential/test-performance #23291

Closed
Trott opened this issue Oct 6, 2018 · 25 comments
Labels
flaky-test: Issues and PRs related to the tests with unstable failures on the CI.
performance: Issues and PRs related to the performance of Node.js.
test: Issues and PRs related to the tests.

Comments

@Trott
Member

Trott commented Oct 6, 2018

https://ci.nodejs.org/job/node-test-binary-arm/3710/RUN_SUBSET=4,label=pi1-docker/console

00:40:41 not ok 330 sequential/test-performance
00:40:41   ---
00:40:41   duration_ms: 5.141
00:40:41   severity: fail
00:40:41   exitcode: 1
00:40:41   stack: |-
00:40:41     { name: 'node',
00:40:41       entryType: 'node',
00:40:41       startTime: 0,
00:40:41       duration: { around: 4609.206676006317 },
00:40:41       nodeStart: { around: 0 },
00:40:41       v8Start: { around: 0 },
00:40:41       bootstrapComplete: { around: 4517.247398018837 },
00:40:41       environment: { around: 0 },
00:40:41       loopStart: -1,
00:40:41       loopExit: -1 }
00:40:41     assert.js:349
00:40:41         throw err;
00:40:41         ^
00:40:41     
00:40:41     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1233.9948890209198 >= 1111
00:40:41         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:40:41         at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:104:1)
00:40:41         at Module._compile (internal/modules/cjs/loader.js:703:30)
00:40:41         at Object.Module._extensions..js (internal/modules/cjs/loader.js:714:10)
00:40:41         at Module.load (internal/modules/cjs/loader.js:613:32)
00:40:41         at tryModuleLoad (internal/modules/cjs/loader.js:552:12)
00:40:41         at Function.Module._load (internal/modules/cjs/loader.js:544:3)
00:40:41         at Function.Module.runMain (internal/modules/cjs/loader.js:756:12)
00:40:41         at startup (internal/bootstrap/node.js:303:19)
00:40:41         at bootstrapNodeJSCore (internal/bootstrap/node.js:865:3)
00:40:41   ...

Looks like a case of testing elapsed time (as a proxy for performance?). Is this a case of the test being unreliable, or is it a case of the test flagging something we care about? The test doesn't have a comment explaining itself, but it was added in 67269fd / #14680 and there's context there.

/ping @jasnell

@Trott Trott added test Issues and PRs related to the tests. performance Issues and PRs related to the performance of Node.js. labels Oct 6, 2018
@Trott
Member Author

Trott commented Oct 6, 2018

@nodejs/testing @nodejs/performance

@joyeecheung
Member

(The performance team is for changes related to /benchmark, not the perf hooks)

@Trott
Member Author

Trott commented Nov 4, 2018

https://ci.nodejs.org/job/node-test-commit-smartos/21373/nodes=smartos17-64/console

00:52:03 not ok 2418 sequential/test-performance
00:52:03   ---
00:52:03   duration_ms: 2.224
00:52:03   severity: fail
00:52:03   exitcode: 1
00:52:03   stack: |-
00:52:03     { name: 'node',
00:52:03       entryType: 'node',
00:52:03       startTime: 0,
00:52:03       duration: { around: 1893.748501777649 },
00:52:03       nodeStart: { around: 0 },
00:52:03       v8Start: { around: 0 },
00:52:03       bootstrapComplete: { around: 1887.9314036369324 },
00:52:03       environment: { around: 0 },
00:52:03       loopStart: -1,
00:52:03       loopExit: -1 }
00:52:03     assert.js:351
00:52:03         throw err;
00:52:03         ^
00:52:03     
00:52:03     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1796.7190556526184 >= 1111
00:52:03         at checkNodeTiming (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:92:7)
00:52:03         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:104:1)
00:52:03         at Module._compile (internal/modules/cjs/loader.js:722:30)
00:52:03         at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
00:52:03         at Module.load (internal/modules/cjs/loader.js:620:32)
00:52:03         at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
00:52:03         at Function.Module._load (internal/modules/cjs/loader.js:552:3)
00:52:03         at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
00:52:03         at startup (internal/bootstrap/node.js:308:19)
00:52:03         at bootstrapNodeJSCore (internal/bootstrap/node.js:878:3)
00:52:03   ...

@Trott
Member Author

Trott commented Nov 16, 2018

I want to run a stress test on smartos 17, but it doesn't look like it's a choice in our stress test job on CI. Can someone add it? @nodejs/build

@Fishrock123
Member

Happened again in #25656
https://ci.nodejs.org/job/node-test-binary-arm/5663/RUN_SUBSET=2,label=pi1-docker/testReport/(root)/test/sequential_test_performance/

Failed on a Raspberry Pi 1, I think:
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 2265.783347129822 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: -1,
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3541.9893350601196 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3574.7279500961304 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: { around: 3575.0439462661743 } }
assert.js:351
    throw err;
    ^

AssertionError [ERR_ASSERTION]: bootstrapComplete: 1085.6422290802002 >= 1068
    at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
    at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
    at listOnTimeout (timers.js:324:15)
    at processTimers (timers.js:268:5)

@richardlau
Member

richardlau commented Jan 23, 2019

I want to run a stress test on smartos 17, but it doesn't look like it's a choice in our stress test job on CI. Can someone add it? @nodejs/build

@Trott Not sure if still required, since this was from a few months back, but I've just added it (seemed like a reasonable thing to have there): https://ci.nodejs.org/view/Stress/job/node-stress-single-test/jobConfigHistory/showDiffFiles?timestamp1=2018-12-07_11-55-34&timestamp2=2019-01-23_14-21-08

@Fishrock123
Member

For some reason this has affected all three runs of #25656 - does this just always happen on master right now?

@Fishrock123
Member

Failing on my other rebased PR too.

@Trott
Member Author

Trott commented Jan 24, 2019

Looks like it's barely-but-relatively-consistently failing due to arbitrary-ish timing constraints. The test lacks a comment explaining its purpose, which is unfortunate given its bland name that is open to multiple interpretations. I guess it's testing perf_hooks generally?

What is this delta and delay stuff? Should this use common.platformTimeout() to adjust some values on slower machines? I mean, that's a band-aid and not a fix, but I'll certainly take it. Another workaround-but-not-a-solution-but-whatever-close-enough might be "change 1000 to 2000" or "change 1000 to 500". ¯\_(ツ)_/¯
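
For what it's worth, a minimal sketch of what the platformTimeout band-aid could look like (assuming the test keeps its current shape; the 111 and 1000 figures are just the ones quoted in this thread, and the variable names are guesses):

// Sketch only: scale the hard-coded constants on slow platforms.
// common.platformTimeout() is meant to scale its argument on slower
// machines such as the Pi 1.
const common = require('../common');

let timeoutDelay = common.platformTimeout(111); // was: 111
const delay = common.platformTimeout(1000);     // assumed constant, was: 1000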

pinging folks who have authored lines in the test: @BridgeAR @jasnell @TimothyGu @apapirovski @addaleax

@Trott
Member Author

Trott commented Jan 24, 2019

This happened for the first time on node-daily-master today, so something in the last 24 hours seems to have caused this to be perma-red (or perma-ish-red) on Pi 1.

@nodejs/testing

https://ci.nodejs.org/job/node-test-binary-arm/5673/RUN_SUBSET=3,label=pi1-docker/console

test-requireio_continuationlabs-debian9-armv6l_pi1p-1

00:44:32 not ok 342 sequential/test-performance
00:44:32   ---
00:44:32   duration_ms: 3.785
00:44:32   severity: fail
00:44:32   exitcode: 1
00:44:32   stack: |-
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 2250.0830249786377 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: -1,
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3522.481032371521 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3554.94070148468 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: { around: 3555.3136978149414 } }
00:44:32     assert.js:351
00:44:32         throw err;
00:44:32         ^
00:44:32     
00:44:32     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1075.962013244629 >= 1073
00:44:32         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:44:32         at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
00:44:32         at listOnTimeout (timers.js:324:15)
00:44:32         at processTimers (timers.js:268:5)
00:44:32   ...

Failed by just 2ms or so?

@Trott Trott pinned this issue Jan 24, 2019
@Trott
Member Author

Trott commented Jan 24, 2019

This gets weird. https://ci.nodejs.org/job/node-test-binary-arm/5610/ was green. Hooray? But looking at the fanned-out jobs for Pi 1, sequential/test-performance was not run anywhere. And the PR (#25612) doesn't remove it or anything. The test is still in master and that PR is against master. ?????!!!!!! @nodejs/build

@joyeecheung
Member

joyeecheung commented Jan 24, 2019

I am not entirely sure what the test is actually doing, but I guess it was failing because:

const inited = performance.now();

is too far away from the bootstrapComplete milestone? Then there could be two remedies:

  1. Get this value even earlier; do not take it only after common is required, as common is not exactly small (see the sketch after this list).
  2. Make the milestone somewhat more accurate: process: split execution entry points into main scripts #25667 handles part of that, but it's hard to say at what point we should consider the bootstrap complete, unless we take care of every branch involved during the whole bootstrap process and place the milestones at the precise points for every one of the possible conditions.
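
A minimal sketch of remedy 1 (a hypothetical reordering, not the actual test code):

// Take the perf_hooks reading before loading test/common, which pulls in a
// fair amount of code; only the ordering matters here.
const { performance } = require('perf_hooks');
const inited = performance.now();     // capture as early as possible
const common = require('../common');  // load the heavier helper afterwards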

@Trott
Member Author

Trott commented Jan 24, 2019

FWIW, the test seems to be passing again, at least for now, on Pi 1. See https://ci.nodejs.org/job/node-test-binary-arm/5677/RUN_SUBSET=3,label=pi1-docker/console for example. Not sure for how long, but I guess it's encouraging that it's not a big block of red for the moment...

@gireeshpunathil
Member

my take after reviewing the test case is:

  • though the value in this line
    let timeoutDelay = 111; // An extra of 111 ms for the first call.

    seems arbitrary to me, the fact that this test has passed over time makes it an empirically proven value representing an acceptable limit on bootstrap latency (a rough paraphrase of the check follows this list).
  • if that is agreed, then it is worthwhile to look at recent commits that increased the boot time.
  • if that commit (or those commits) holds merit on its own, then accept that, and move the baseline to accommodate it by computing a new value for timeoutDelay
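
To make the role of timeoutDelay concrete, here is a rough paraphrase (not the actual test source) of the kind of check that is failing above:

const assert = require('assert');
let timeoutDelay = 111; // extra slack, in ms, for the first call

// A measured milestone must stay below an expected value plus the slack;
// the failure messages above have exactly this "actual >= limit" shape.
function checkValue(name, actual, expected) {
  const limit = expected + timeoutDelay;
  assert(actual < limit, `${name}: ${actual} >= ${limit}`);
}

// e.g. the first failure in this issue corresponds to something like:
// checkValue('bootstrapComplete', 1233.9948890209198, 1000); // throws: ... >= 1111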

@gireeshpunathil
Member

gireeshpunathil commented Jan 25, 2019

this is what I did:

  • ran the test on a Linux box (the test does not fail there).
  • measured the actual time it takes to reach the assertion point.
  • for the bootstrapComplete milestone I got 45 millis as the latency.
  • reduced the timeoutDelay until it failed.
  • deemed that bad, and did a bisect.
  • arrived at d3806f9, beyond which the latency is only 35 millis.

these are small numbers on my fast Linux box, easily accommodated by the very conservative 1111 millis in the test. However, on the failing ARM machines I guess both the baseline and the increase in boot-up time due to the said commit could be significantly larger.

So:

  • is the boot-up time increase due to d3806f9 justifiable? if not, fix it;
  • if yes, how much does it increase on ARM, and is that acceptable? if not, fix it;
  • if yes, accommodate this in the test.

I guess the test is doing its intended work, and we should only be changing the heuristics, not the test logic itself.

/cc @joyeecheung

@joyeecheung
Member

joyeecheung commented Jan 25, 2019

@gireeshpunathil In theory d3806f9 should've decreased the boot time, because it delays the console extension until it is actually needed (when the inspector console is accessed), rather than doing that unconditionally during bootstrap. Maybe a benchmark run of misc/startup would help identify that.

@gireeshpunathil
Member

@joyeecheung - yes, reading the commit message I got the same impression - that this commit was a change meant to improve the boot time.

unfortunately I don't know how to run a benchmark test!

@Trott
Member Author

Trott commented Jan 25, 2019

@gireeshpunathil In theory d3806f9 should've decreased the boot time, because it delays the console extension until it is actually needed (when the inspector console is accessed), rather than doing that unconditionally during bootstrap. Maybe a benchmark run of misc/startup would help identify that.

I suppose it's possible that the increase might be a peculiarity/limitation of perf_hooks rather than a real increase? @jasnell @TimothyGu

It's not clear to me that this test is actually trying to flag bootstrap performance issues so much as it is trying to confirm that the perf_hooks module basically works. In that case, the fix might be #25695.

Speaking of perf_hooks, I guess the module should theoretically pass the tests at https://github.com/web-platform-tests/wpt/tree/master/performance-timeline. Those are the tests for the API spec that perf_hooks implements. It may be that another solution is to remove this test and replace it with those. (I don't actually know.)
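
For reference, a minimal sketch (not taken from the test itself) of what a "bare functionality of perf_hooks" check could look like, observing a user-created measure:

'use strict';
const assert = require('assert');
const { performance, PerformanceObserver } = require('perf_hooks');

// Confirm that marks/measures are recorded and delivered to an observer,
// without asserting anything about absolute timings.
const obs = new PerformanceObserver((list) => {
  const [entry] = list.getEntriesByName('A to B');
  assert.strictEqual(entry.entryType, 'measure');
  assert(entry.duration >= 0);
  obs.disconnect();
});
obs.observe({ entryTypes: ['measure'] });

performance.mark('A');
setTimeout(() => {
  performance.mark('B');
  performance.measure('A to B', 'A', 'B');
}, 10);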

@Trott
Member Author

Trott commented Jan 25, 2019

I guess I also don't feel like using Pi 1 as our performance canary is a great approach. I'm inclined to treat this as a "bare functionality of perf_hooks" test rather than a "watch performance in Node.js" test.

This is supported IMO by the fact that the test was added with the initial implementation of perf_hooks in 67269fd7f33 and not added after implementation as a performance test. We should probably rename it test-perf-hooks or something like that to avoid this confusion. @jasnell

@mhdawson
Member

@Trott I don't know about it passing, as it's failed twice in attempts for me today.

@sam-github
Contributor

FWIW, I got 2 failures in a row for #25688 as well, and going through the build history, @ofrobots and @danbev have also gotten failures on it today.

Trott pushed a commit to Trott/io.js that referenced this issue Jan 26, 2019
PR-URL: nodejs#25695
Fixes: nodejs#23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@Trott
Member Author

Trott commented Jan 26, 2019

Fixed by d1d357d

@Trott Trott closed this as completed Jan 26, 2019
@Trott Trott unpinned this issue Jan 26, 2019
@gireeshpunathil
Member

OK, now that #25695 has landed and this issue is closed, it makes little difference to raise my opinion, but I'm raising it anyway.

these points:

  • the failing Pis are slow machines with little RAM and a slow chip
  • the test is not supposed to validate performance

would have been considerations at the time the test was introduced, IMO. But now it has shown us something indicative of a change in boot-sequence latency. So the question is:

  • should we ignore what we saw and amend the test to do what it is designed to test,
    or
  • should we first make an effort to investigate what this failure means and implies?

we have many instances where unrelated tests threw light on serious issues in the code; the issues associated with #25007 are the latest examples - none of those tests were meant to catch race conditions, yet they were used to reproduce issues, troubleshoot, fix, and finally validate the correctness of the fix.

my summary is that the test surfaced something for us - by design or by chance; amending the code in a way that suppresses the issue, without looking at the root cause of the failure, may be like silently ignoring an error.

@Trott
Member Author

Trott commented Jan 26, 2019

should we ignore what we saw and amend the test to do what it is designed to test,
or
should we first make an effort to investigate what this failure means and implies?

I guess ideally we'd do both. I don't want to leave it broken, causing people to either not be able to land valid changes or else get into the habit of ignoring failing test suites.

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

I think the issue this brought up is probably more appropriate for a benchmark than a test. (And maybe there already is a benchmark in benchmark/misc/startup.js?) Perhaps we can, at some point, introduce nightly performance benchmarks that will fail if things fall below a certain threshold or if there is a drop in performance larger than a certain percentage. (I'd like to see something like that for coverage stats too.)
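
A sketch of what such a nightly threshold check could look like (everything here, including the run count and the 500 ms budget, is invented for illustration):

// Hypothetical nightly startup check: time bare `node -e 0` runs and fail
// if the average exceeds a fixed budget.
const { execFileSync } = require('child_process');

const RUNS = 10;
const BUDGET_MS = 500; // invented threshold

let totalMs = 0;
for (let i = 0; i < RUNS; i++) {
  const start = process.hrtime.bigint();
  execFileSync(process.execPath, ['-e', '0']);
  totalMs += Number(process.hrtime.bigint() - start) / 1e6;
}

const avgMs = totalMs / RUNS;
if (avgMs > BUDGET_MS) {
  console.error(`startup regression: ${avgMs.toFixed(1)} ms > ${BUDGET_MS} ms`);
  process.exit(1);
}
console.log(`startup ok: ${avgMs.toFixed(1)} ms`);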

@Trott
Member Author

Trott commented Jan 26, 2019

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

Perhaps the thing to do is to open a new issue about the slower startup, because I guess we are talking about two different things anyway. One issue is that a test is failing when it should not be. That's this issue; it's about fixing the test. But the second issue, which this one uncovered, is: why is startup slower than it used to be? These questions are related, but they can (and probably should?) be considered separately. Like, it's possible (as we've done here) to fix the first issue without successfully addressing the second.

addaleax pushed a commit that referenced this issue Jan 28, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@refack refack added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jan 30, 2019
BethGriggs pushed a commit that referenced this issue Apr 29, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>