process.hrtime() unreliable? #13102

Trott · 2017-05-18T16:58:49Z

Version: v8.0.0-pre
Platform: ubuntu1604-32
Subsystem: process

From https://ci.nodejs.org/job/node-test-commit-linux/9926/nodes=ubuntu1604-32/consoleText:

not ok 1431 sequential/test-benchmark-http
  ---
  duration_ms: 5.128
  severity: fail
  stack: |-
    
    http/bench-parser.js
    http/bench-parser.js n=1 len=1: 2,785.3291144881678
    
    http/check_invalid_header_char.js
    http/check_invalid_header_char.js n=1 key="": 2,725.8503290101344
    http/check_invalid_header_char.js n=1 key="1": 3,342.2013074691513
    http/check_invalid_header_char.js n=1 key="\t\t\t\t\t\t\t\t\t\tFoo bar baz": 2,301.1676124465553
    http/check_invalid_header_char.js n=1 key="keep-alive": 1,714.8099561866056
    http/check_invalid_header_char.js n=1 key="close": 1,836.9656266991933
    http/check_invalid_header_char.js n=1 key="gzip": 2,570.79321824749
    http/check_invalid_header_char.js n=1 key="20091": 2,455.1023163890354
    http/check_invalid_header_char.js n=1 key="private": 3,152.505611459989
    http/check_invalid_header_char.js n=1 key="text/html; charset=utf-8": 2,783.9023629763255
    http/check_invalid_header_char.js n=1 key="text/plain": 1,932.2065992584191
    http/check_invalid_header_char.js n=1 key="Sat, 07 May 2016 16:54:48 GMT": 2,396.380506882405
    http/check_invalid_header_char.js n=1 key="SAMEORIGIN": 1,960.0387303653122
    http/check_invalid_header_char.js n=1 key="en-US": 468.65297418893743
    http/check_invalid_header_char.js n=1 key="Here is a value that is really a folded header value\r\n  this should be      supported, but it is not currently": 2,780.4655611535595
    http/check_invalid_header_char.js n=1 key="中文呢": 2,113.7403692704424
    http/check_invalid_header_char.js n=1 key="foo\nbar": 2,560.6424139688165
    http/check_invalid_header_char.js n=1 key="�": 2,448.3100539852367
    
    http/check_is_http_token.js
    http/check_is_http_token.js n=1 key="TCN": 2,627.61677785865
    http/check_is_http_token.js n=1 key="ETag": 2,592.439926685799
    http/check_is_http_token.js n=1 key="date": 3,146.6628067602906
    http/check_is_http_token.js n=1 key="Vary": 2,676.5090827335725
    http/check_is_http_token.js n=1 key="server": 1,933.1665654976161
    http/check_is_http_token.js n=1 key="Server": 2,698.0431093328007
    http/check_is_http_token.js n=1 key="status": 3,234.8536067000286
    http/check_is_http_token.js n=1 key="version": 3,105.6575764069403
    http/check_is_http_token.js n=1 key="Expires": 3,122.765271102867
    http/check_is_http_token.js n=1 key="alt-svc": 2,156.3156328570753
    http/check_is_http_token.js n=1 key="location": 1,954.7093835823962
    /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:200
        throw new Error('insufficient time precision for short benchmark');
        ^
    
    Error: insufficient time precision for short benchmark
        at Benchmark.end (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:200:11)
        at main (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/http/check_is_http_token.js:51:9)
        at Benchmark.process.nextTick (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:34:28)
        at _combinedTickCallback (internal/process/next_tick.js:95:7)
        at process._tickCallback (internal/process/next_tick.js:161:9)
        at Function.Module.runMain (module.js:607:11)
        at startup (bootstrap_node.js:144:16)
        at bootstrap_node.js:561:3
    assert.js:92
      throw new AssertionError({
      ^
    
    AssertionError [ERR_ASSERTION]: 1 === 0
        at ChildProcess.child.on (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/test/sequential/test-benchmark-http.js:32:10)
        at emitTwo (events.js:125:13)
        at ChildProcess.emit (events.js:213:7)
        at Process.ChildProcess._handle.onexit (internal/child_process.js:197:12)
  ...

Relevant code

Quick and easy

test/benchmark/check_is_http_token.js: Pretty straightforward, but worth checking for anything funky.
lib/_http_common.js: The _checkIsHttpToken() function is also straightforward, but worth looking at just in case...

Slightly more involved

benchmark/common.js: Benchmark.prototype.start() and Benchmark.prototype.end() are pretty straightforward and seem unlikely sources of the problem. The throw in the stack trace above is on line 200 (currently). This is the "what in the world is going on?!?!?!" part. process.hrtime() is called in one function and saved to this._time. Then at a later time, process.hrtime(this._time) is called and the result is [0, 0]. ???? (I don't see anything funky going on with this but maybe there's a subtlety I'm missing?)

Deeper dive

lib/internal/process.js: setup_hrtime()
src/node.cc: Hrtime()
deps/uv/src/unix/linux-core.c: uv__hrtime()

Anything at all going on in the above that could somehow result in this odd behavior?

Possible causes

I'm having trouble coming up with an explanation that seems likely, but it's gotta be something, so...

Somehow, this fairly fast machine running Ubuntu 16.04 nonetheless lacks nanosecond precision.
Some subtle bug in the benchmark common.js that causes this._time to be set incorrectly.
Small but infrequently triggered bug in the bit-shifting arithmetic in the Deeper dive items listed above?
Bug in Linux
Expected behavior in Linux unexpected in our benchmark/core code. (Maybe calling clock_gettime() too many times in fast succession might return the same time even though it's supposed to have nanosecond precision in this situation?)
Bug/expected behavior specific to some container/virtual machine magic Digital Ocean is using?

Any other ideas?

/cc @nodejs/benchmarking @mscdex @joyeecheung @jasnell @nodejs/build @addaleax @saghul

Refs: #12934

The text was updated successfully, but these errors were encountered:

evanlucas · 2017-05-18T17:05:16Z

FWIW, I had this test fail for me locally on OS X yesterday...

bnoordhuis · 2017-05-18T17:11:56Z

Expected behavior in Linux unexpected in our benchmark/core code. (Maybe calling clock_gettime() too many times in fast succession might return the same time even though it's supposed to have nanosecond precision in this situation?)

That's probably it. clock_gettime() has a precision of up to one nanosecond but in practice it will be much coarser, possibly no more granular than ~1 millisecond, depending on the clock source.

sam-github · 2017-05-18T17:12:12Z

Just because the units of measurement is nano-seconds doesn't mean that its precise to anything like that. I think it used to be still limited by the kernel tick, 1 or 10 milliseconds IIRC (I quite likely do not). @Trott are you running Linux in a VM? I recall @bnoordhuis saying that VMs were emulating the RDTSC instruction causing process.hrtime() to be slow and unreliable in VMs, any chance you are running into that? EDIT: Speak of the devil... hi Ben!

Trott · 2017-05-18T23:07:36Z

That's probably it. clock_gettime() has a precision of up to one nanosecond but in practice it will be much coarser, possibly no more granular than ~1 millisecond, depending on the clock source.

@bnoordhuis The first process.hrtime() happens in Benchmark.prototype.start() and the second one happens in Benchmark.prototype.end(). Is it realistic that a function invocation followed by a little bit of code and then another function invocation could all happen in a duration such that process.hrtime() reveals no elapsed time? (Not contradicting you; an honest question!)

@Trott are you running Linux in a VM?

@sam-github The failure is from test-digitalocean-ubuntu1604-x86-1 in our CI infrastructure. I assume it's a virtual machine at Digital Ocean, but @jbergstroem or someone on @nodejs/build would be able to say with certainty.

Trott · 2017-05-18T23:10:20Z

If it's both likely and acceptable that running in a Linux VM can result in a reported elapsed time of 0 in this situation, then at least I think I have a clear solution. We already have an "allow 0 operations" option for these tests in the benchmark common module, so I could allow for this situation if that option is in use.

Trott · 2017-05-18T23:11:51Z

FWIW, I had this test fail for me locally on OS X yesterday...

@evanlucas I don't suppose you recall if it was a timeout or if it threw the error shown here or if t was something else, do you? I tried everything I could think of to get my macOS machine to trip this error and gave up.

evanlucas · 2017-05-18T23:15:49Z

Ah no never mind it was a timeout. Sorry for the noise.

Imprecision in process.hrtime() in some situations can result in a zero duration being used as a denominator in benchmark tests. This would almost certainly never happen in real benchmarks. It is only likely in very short benchmarks like the type we run in our test suite to just make sure that the benchmark code is runnable. So, if the environment variable that we use in tests to indicate "allow ludicrously short benchmarks" is set, convert a zero duration for a benchmark to 1 nano-second. Fixes: nodejs#13102

Imprecision in process.hrtime() in some situations can result in a zero duration being used as a denominator in benchmark tests. This would almost certainly never happen in real benchmarks. It is only likely in very short benchmarks like the type we run in our test suite to just make sure that the benchmark code is runnable. So, if the environment variable that we use in tests to indicate "allow ludicrously short benchmarks" is set, convert a zero duration for a benchmark to 1 nano-second. PR-URL: #13110 Fixes: #13102 Fixes: #12433 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

Trott added test Issues and PRs related to the tests. benchmark Issues and PRs related to the benchmark subsystem. libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. labels May 18, 2017

Trott mentioned this issue May 18, 2017

benchmark: allow no duration in benchmark tests #13110

Closed

2 tasks

Trott closed this as completed in c3067b5 May 22, 2017

gschier mentioned this issue Dec 18, 2017

[Bug] Inaccurate response time measurement Kong/insomnia#668

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

process.hrtime() unreliable? #13102

process.hrtime() unreliable? #13102

Trott commented May 18, 2017 •

edited

Loading

evanlucas commented May 18, 2017

bnoordhuis commented May 18, 2017

sam-github commented May 18, 2017 •

edited

Loading

Trott commented May 18, 2017

Trott commented May 18, 2017

Trott commented May 18, 2017

evanlucas commented May 18, 2017

process.hrtime() unreliable? #13102

process.hrtime() unreliable? #13102

Comments

Trott commented May 18, 2017 • edited Loading

Relevant code

Quick and easy

Slightly more involved

Deeper dive

Possible causes

evanlucas commented May 18, 2017

bnoordhuis commented May 18, 2017

sam-github commented May 18, 2017 • edited Loading

Trott commented May 18, 2017

Trott commented May 18, 2017

Trott commented May 18, 2017

evanlucas commented May 18, 2017

Trott commented May 18, 2017 •

edited

Loading

sam-github commented May 18, 2017 •

edited

Loading