
perf_hooks: add event loop delay sampler #25378

Closed. jasnell wants to merge 2 commits into master from jasnell:event-loop-delay

Conversation

@jasnell (Member) commented Jan 7, 2019

Inspired by (and code borrowed from) https://github.com/mafintosh/event-loop-delay

/cc @mafintosh @mcollina

Adds a simple event loop delay sampler to perf_hooks.

See included test for example use.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
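
For readers of the thread, here is a minimal usage sketch based on the API this PR ultimately landed with (`monitorEventLoopDelay`, shown in the commit message further down); earlier revisions discussed below exposed the feature on the `performance` object instead. The resolution value and timings are illustrative, not taken from the PR's included test.

```js
const { monitorEventLoopDelay } = require('perf_hooks');

// Sample the event loop delay every 10 milliseconds.
const h = monitorEventLoopDelay({ resolution: 10 });
h.enable();

setTimeout(() => {
  h.disable();
  // Histogram values are reported in nanoseconds.
  console.log(h.min, h.max, h.mean, h.stddev, h.percentile(99));
}, 5000);
```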

@Trott added and removed the semver-minor label Jan 7, 2019

@mcollina (Member) left a comment:

LGTM

@addaleax (Member) left a comment:

(just requesting changes to make sure this doesn’t land as-is)

@sam-github (Member) commented Jan 8, 2019

I could use some convincing that this time-based sampling approach is useful, or that I'm missing the point of the code.

Loop delays tend to be irregular, and to only block a single "turn" of the loop, don't they? Won't this approach miss those delays?

It seems it would be much more useful to measure every loop and track max and min (and possibly average) over some time interval.

@jasnell (Member, Author) commented Jan 8, 2019

@sam-github ... @mafintosh can likely better describe the rationale behind this particular algorithm (given that he's the one that wrote it :-) ...) ... For the most part, however, this is giving the accumulated delay if the length of time between iterations exceeds the threshold.

@jasnell (Member, Author) commented Jan 8, 2019

I've added some documentation. Essentially, this measures the accumulated delay if the previous turn of the event loop takes longer than a threshold determined by the resolution.

@sam-github ... what you're suggesting is a different way of measuring that we can also do relatively easily.
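
To make the described behaviour concrete, here is a rough JS-land sketch of the sampling idea (a hypothetical illustration only, not the PR's actual C++ implementation; the `threshold` formula is the one quoted from the docs further down in this thread):

```js
// A timer fires every `resolution` milliseconds; any interval that exceeds the
// threshold contributes its excess over `resolution` to the accumulated delay.
const resolution = 10;                       // ms
const threshold = 5e6 + (resolution * 1e6);  // ns, i.e. resolution + 5 ms
let delay = 0n;                              // accumulated delay, in ns
let last = process.hrtime.bigint();

setInterval(() => {
  const now = process.hrtime.bigint();
  const elapsed = now - last;
  last = now;
  if (elapsed > BigInt(threshold)) {
    delay += elapsed - BigInt(resolution * 1e6);
  }
}, resolution);
```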

@jasnell (Member, Author) commented Jan 8, 2019

Attempted to clarify a different way

@jasnell force-pushed the jasnell:event-loop-delay branch from 597b295 to 4e8dacd Jan 9, 2019


Review thread on doc/api/perf_hooks.md. The documentation text under review:

> The accumulated event loop delay (`delay`) is determined by:
>
> * Calculating a `threshold` in milliseconds as `5e6 + (resolution * 1e6)`.

@sam-github (Member) commented Jan 9, 2019:

If my unit conversions are right, isn't this an obscure way of saying the threshold is 1.5 times the resolution?

@jasnell (Member, Author) commented Jan 30, 2019:

Yep
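
For reference, working through the units (assuming `resolution` is expressed in milliseconds and the threshold in nanoseconds): at the default resolution of 10 ms, `threshold = 5e6 + (10 * 1e6) = 15e6 ns = 15 ms`, i.e. `resolution + 5 ms`. The "1.5 times the resolution" reading therefore only holds at the 10 ms default.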

@sam-github (Member) commented Jan 9, 2019

> Essentially, this measures the accumulated delay

Does it, though? The docs say it samples at `resolution` milliseconds, so it only measures periodically, making it easy to miss, for example, occasional calls to sync APIs.

It's not at all clear to me why the threshold is scaled from the sampling period, or why the sampling period is called `resolution`.

@mafintosh -- comments?

@jasnell (Member, Author) commented Jan 9, 2019

> Does it, though? The docs say it samples at resolution milliseconds, so it only measures periodically, making it easy to miss, for example, occasional calls to sync APIs.

Well, yes, using a sampling approach means the data will miss some things, but it also allows this to work with significantly less overhead. We certainly could also do the min/max/average calculation on every event loop tick, but at a decidedly higher overhead cost.

If you think it's worthwhile, I can implement both approaches, configurable via the options:

```js
// Sampling method
performance.startEventLoopTimer({ resolution: 10 })

// Per-event-tick method
performance.startEventLoopTimer({ resolution: 0 })
```

We can collect the min, max, and rolling average in addition to the accumulated delay.

@addaleax (Member) commented Jan 28, 2019

@nodejs/diagnostics Any thoughts on this?

@devsnek (Member) commented Jan 28, 2019

why is this on the performance object instead of the perf_hooks export?

@gireeshpunathil (Member) commented Jan 29, 2019

I guess the most common production workloads will only be interested in seeing a pattern (average delay over time), its steady-state values, and any outliers; not necessarily high-precision data that is accurate at the loop-iteration level. So a sampling method that strikes a balance between useful information and performance efficiency looks good to me.

On the other hand, having many options to configure may lead to:

  • a lot more usability issues, misconfigurations, and questions
  • unwanted comparisons at the micro level, and time spent debugging false positives

In short, I am good with the changes as is.

@jasnell (Member, Author) commented Feb 6, 2019

Build failed on AIX and Windows, for different reasons. CI: https://ci.nodejs.org/job/node-test-pull-request/20628/

@jasnell (Member, Author) commented Feb 6, 2019

@Flarna ... here are the results I get with your test:

james@dev:~/node/node$ ./node ~/tmp/t.js
start idle
h.min=10.07616ms
h.max=16.105471ms
h.mean=10.146332405953991ms
h.stddev=0.11367857455522092ms
james@dev:~/node/node$ ./node ~/tmp/t.js
start idle
h.min=10.067968ms
h.max=10.944511ms
h.mean=10.145753201623815ms
h.stddev=0.03507365367821823ms
james@dev:~/node/node$ ./node ~/tmp/t.js 1
start 1
h.min=9.05216ms
h.max=17.530879ms
h.mean=10.016954057448231ms
h.stddev=0.2987147925425346ms
james@dev:~/node/node$ ./node ~/tmp/t.js 1
start 1
h.min=9.003008ms
h.max=62.324735ms
h.mean=10.29134337405628ms
h.stddev=1.6001447495641974ms
james@dev:~/node/node$ ./node ~/tmp/t.js 10
start 10
h.min=9.035776ms
h.max=20.316159ms
h.mean=10.031161535785953ms
h.stddev=0.4252783480022525ms
james@dev:~/node/node$ ./node ~/tmp/t.js 10
start 10
h.min=9.043968ms
h.max=18.481151ms
h.mean=10.072717543836076ms
h.stddev=0.6447975217412186ms
james@dev:~/node/node$ ./node ~/tmp/t.js 100
start 100
h.min=9.13408ms
h.max=15.114239ms
h.mean=10.001436340113372ms
h.stddev=0.11477844280713213ms
james@dev:~/node/node$ ./node ~/tmp/t.js 100
start 100
h.min=6.115328ms
h.max=18.104319ms
h.mean=10.018922060808553ms
h.stddev=0.35788105433811207ms
james@dev:~/node/node$ ./node ~/tmp/t.js 1000
start 1000
h.min=9.076736ms
h.max=18.497535ms
h.mean=10.023860705882354ms
h.stddev=0.39576909826986406ms
james@dev:~/node/node$

@jasnell force-pushed the jasnell:event-loop-delay branch from 8ace147 to c1a2587 Feb 7, 2019

@jasnell (Member, Author) commented Feb 7, 2019

@nodejs/platform-windows @refack ...

Getting the following failure building on Windows (vs2017-x86)

20:37:39      Creating library c:\workspace\node-compile-windows\Release\node.lib and object c:\workspace\node-compile-windows\Release\node.exp
20:37:39 histogram.lib(hdr_histogram.obj) : error LNK2001: unresolved external symbol __BitScanReverse64 [c:\workspace\node-compile-windows\node.vcxproj]
20:37:39 c:\workspace\node-compile-windows\Release\node.exe : fatal error LNK1120: 1 unresolved externals [c:\workspace\node-compile-windows\node.vcxproj]

A bit of poking didn't reveal anything specific, but building on Windows is not my strong suit. Any ideas? (I was under the impression that BitScanReverse64 is available on x86, but... possibly not?)

Build is successful on win-vs2017

@richardlau (Member) commented Feb 7, 2019

> Getting the following failure building on Windows (vs2017-x86):
> histogram.lib(hdr_histogram.obj) : error LNK2001: unresolved external symbol __BitScanReverse64 [...]

https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64?view=vs-2017 suggests _BitScanReverse64 is only supported on ARM and x64.

jasnell added some commits Jan 7, 2019

perf_hooks: implement histogram based api
Add a sampling-based event loop delay monitor.

```js
const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay();

h.enable();

h.disable();

console.log(h.percentiles);
console.log(h.min);
console.log(h.max);
console.log(h.mean);
console.log(h.stddev);
console.log(h.percentile(50));
```
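
Note: the histogram values (`h.min`, `h.max`, `h.mean`, `h.stddev`, `h.percentile()`) are reported in nanoseconds, which is why the test scripts later in this thread divide by `1e6` to print milliseconds.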

@jasnell force-pushed the jasnell:event-loop-delay branch from b281780 to 386f06b Feb 7, 2019

@jasnell (Member, Author) commented Feb 7, 2019

Turns out there's a patch @mcollina didn't tell me about ;-D and since he's hanging out in Hawaii right now I gotta give him a hard time about it :-)

New CI: https://ci.nodejs.org/job/node-test-pull-request/20631/

@Flarna (Member) commented Feb 7, 2019

@jasnell I'm a little puzzled by your test results, since it looks as though 1000 ms tasks in the event loop would not be detected as such. Most likely this was caused by my test script, which didn't allow setting the busy time on the command line, only "test modes". I've updated the script and retested in other environments as well.

If the event loop is busy with tasks longer than the 10 ms sample interval, that is reliably detected in all setups. But measuring an idle Node.js app gives worse results than measuring an app busy with 10 ms tasks, even on the physical Linux box.

What size of "detectable delays" did you have in mind here? Maybe the docs should include some hints on the achievable resolution?
I think some testing on other platforms should also be done to check whether there are other limitations.
Or is the sample I use too simple, leading to misleading interpretations on my side?

The VM on my private notebook is certainly not a representative production setup (maybe the delays are caused by CPU sleep states, ...), so I think those results are not that important.

Linux VM on my old private notebook

start NaN
h.min=10.027008ms
h.max=161.349631ms
h.mean=13.187902367634125ms
h.stddev=5.629075695044025ms
start NaN
h.min=10.018816ms
h.max=32.292863ms
h.mean=11.474917361897475ms
h.stddev=2.308021168909449ms
start 0
h.min=9.232384ms
h.max=56.688639ms
h.mean=10.035087593038822ms
h.stddev=0.9816034647940286ms
start 0
h.min=9.33888ms
h.max=15.638527ms
h.mean=10.002511242161441ms
h.stddev=0.1461050645134329ms
start 10
h.min=10.010624ms
h.max=31.686655ms
h.mean=10.068520733131923ms
h.stddev=0.6172102376239719ms
start 10
h.min=10.002432ms
h.max=25.968639ms
h.mean=10.043363193569993ms
h.stddev=0.42167699874933723ms
start 100
h.min=100.007936ms
h.max=111.935487ms
h.mean=100.13551274666666ms
h.stddev=0.9452201842299545ms
start 100
h.min=100.007936ms
h.max=112.394239ms
h.mean=100.11191978666666ms
h.stddev=0.8513607073454165ms

Linux VM on my work PC

start NaN
h.min=10.010624ms
h.max=18.563071ms
h.mean=11.065823704905938ms
h.stddev=0.7107936502571325ms
start NaN
h.min=10.018816ms
h.max=15.474687ms
h.mean=11.278423034223392ms
h.stddev=0.759150300218026ms
start 0
h.min=9.0112ms
h.max=19.365887ms
h.mean=10.017919914495659ms
h.stddev=0.36377668699754595ms
start 0
h.min=9.125888ms
h.max=20.742143ms
h.mean=10.02488607486631ms
h.stddev=0.4253760072540978ms
start 10
h.min=10.010624ms
h.max=18.874367ms
h.mean=10.07886864516129ms
h.stddev=0.4418468141594953ms
start 10
h.min=10.010624ms
h.max=21.381119ms
h.mean=10.092213232839837ms
h.stddev=0.5521641804813799ms
start 100
h.min=100.007936ms
h.max=104.660991ms
h.mean=100.09597269333332ms
h.stddev=0.43897114293681383ms
start 100
h.min=100.007936ms
h.max=111.083519ms
h.mean=100.12764842666667ms
h.stddev=0.7444034454799928ms

Native Linux on a collegues PC

start NaN
h.min=10.027008ms
h.max=13.008895ms
h.mean=10.126542735989196ms
h.stddev=0.07465781045266873ms
start NaN
h.min=10.0352ms
h.max=19.038207ms
h.mean=10.136660093274756ms
h.stddev=0.17323677977705812ms
start 0
h.min=9.289728ms
h.max=10.092543ms
h.mean=10.000237177725909ms
h.stddev=0.015177624171784065ms
start 0
h.min=9.29792ms
h.max=10.166271ms
h.mean=9.999846562187397ms
h.stddev=0.016481384728421315ms
start 10
h.min=10.002432ms
h.max=12.623871ms
h.mean=10.012928427378965ms
h.stddev=0.08814869597128885ms
start 10
h.min=10.002432ms
h.max=12.967935ms
h.mean=10.012034905206942ms
h.stddev=0.08011965038652569ms
start 100
h.min=100.007936ms
h.max=103.350271ms
h.mean=100.05949098666666ms
h.stddev=0.228135157621938ms
start 100
h.min=100.007936ms
h.max=102.760447ms
h.mean=100.05796181333334ms
h.stddev=0.20355572563408403ms

Updated Script

```js
const { monitorEventLoopDelay } = require('perf_hooks');

// Busy time per task in ms; run with no argument (delay = NaN) to measure an idle loop.
let delay = +process.argv[2];
const duration = 1000 * 30;
const ns2ms = 1000 * 1000;

// Block the event loop for `delay` ms, then reschedule via setImmediate.
// With busy blocks of `delay` ms, the sampling timer can only fire between
// blocks, so every observed interval is at least roughly `delay` ms.
function busy() {
  if (delay > 0) {
    const now = process.hrtime();
    let d;
    do {
      d = process.hrtime(now);
    } while (d[0] * 1000 + d[1] / ns2ms < delay);
  }
  setImmediate(busy);
}

const h = monitorEventLoopDelay({ resolution: 10 });

function endMeasurement() {
  h.disable();
  // Histogram values are reported in nanoseconds; convert to ms for printing.
  console.log(`h.min=${h.min / ns2ms}ms`);
  console.log(`h.max=${h.max / ns2ms}ms`);
  console.log(`h.mean=${h.mean / ns2ms}ms`);
  console.log(`h.stddev=${h.stddev / ns2ms}ms`);
  process.exit();
}

function start() {
  console.log(`start ${delay}`);
  h.enable();
  if (delay >= 0) {
    busy();
  }
}

start();
setTimeout(endMeasurement, duration);
```

@jasnell (Member, Author) commented Feb 7, 2019

Resume CI due to unrelated failure: https://ci.nodejs.org/job/node-test-pull-request/20637/

@jasnell (Member, Author) commented Feb 7, 2019

@nodejs/build ... any ideas here?

10:26:58 Makefile:449: recipe for target 'clear-stalled' failed
10:26:58 make[1]: *** [clear-stalled] Error 123
10:26:58 Makefile:533: recipe for target 'run-ci' failed
10:26:58 make: *** [run-ci] Error 2
10:26:59 Build step 'Conditional steps (multiple)' marked build as failure
10:26:59 Performing Post build task...
10:26:59 Match found for : : True
10:26:59 Logical operation result is TRUE
10:26:59 Running script  : #/bin/bash

https://ci.nodejs.org/job/node-test-commit-arm/21996/nodes=ubuntu1604-arm64/console


@sam-github (Member) commented Feb 7, 2019

Like @mcollina, #25378 (comment), I agree that loop health is critical for production monitoring.

@jasnell There hasn't been any attempt to convince me that sampling has been shown to be effective in the wild, see #25378 (comment) (no comment from mafintosh), and this question is still open: #25378 (comment)

I've found @bnoordhuis's approach of measuring every loop, but only recording min/max/avg, to be low overhead (it's a timer/check, a comparison, and some math per loop; the overhead is dwarfed by the actual code that runs per loop, and most loops are not going to exceed the min/max anyhow), and very useful.

But if I'm such a big fan of an alternative approach, I should submit my own code to do it :-). It's you who are doing the work here, and any loop health stats are better than none. This approach doesn't prevent someone from trying another approach in the future, so I've no objection.
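
For comparison, a rough JS-land sketch of the per-iteration alternative described here (a hypothetical illustration only; the approach referred to is implemented natively, and this userland version itself keeps the loop spinning via `setImmediate`, which is part of the overhead trade-off discussed earlier):

```js
let prev = process.hrtime.bigint();
let min = Infinity;
let max = 0;
let sum = 0;
let count = 0;

// Measure the time between consecutive event loop turns and keep only
// min/max/average rather than every sample.
function tick() {
  const now = process.hrtime.bigint();
  const deltaNs = Number(now - prev);
  prev = now;
  if (deltaNs < min) min = deltaNs;
  if (deltaNs > max) max = deltaNs;
  sum += deltaNs;
  count++;
  setImmediate(tick);
}
setImmediate(tick);

// Later, e.g. on an interval or at shutdown:
// console.log({ minMs: min / 1e6, maxMs: max / 1e6, avgMs: sum / count / 1e6 });
```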


@jasnell added the semver-minor label Feb 8, 2019

jasnell added a commit that referenced this pull request Feb 8, 2019

perf_hooks: implement histogram based api
Add a sampling-based event loop delay monitor.


PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>

jasnell added a commit that referenced this pull request Feb 8, 2019

deps: float fix for building HdrHistogram on Win x86
From:
mcollina/native-hdr-histogram@c63e971:

PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@jasnell (Member, Author) commented Feb 8, 2019

Landed in bcdd228 and d999b55

@jasnell closed this Feb 8, 2019

addaleax added a commit that referenced this pull request Feb 8, 2019

perf_hooks: implement histogram based api

addaleax added a commit that referenced this pull request Feb 8, 2019

deps: float fix for building HdrHistogram on Win x86

@targos referenced this pull request Feb 14, 2019: v11.10.0 proposal #26098 (merged)

@freenice12 referenced this pull request Feb 20, 2019: translate v11.10.0 #808 (merged)
