Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process.hrtime() unreliable? #13102

Closed
Trott opened this issue May 18, 2017 · 7 comments
Closed

process.hrtime() unreliable? #13102

Trott opened this issue May 18, 2017 · 7 comments
Labels
benchmark Issues and PRs related to the benchmark subsystem. libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. test Issues and PRs related to the tests.

Comments

@Trott
Copy link
Member

Trott commented May 18, 2017

  • Version: v8.0.0-pre
  • Platform: ubuntu1604-32
  • Subsystem: process

From https://ci.nodejs.org/job/node-test-commit-linux/9926/nodes=ubuntu1604-32/consoleText:

not ok 1431 sequential/test-benchmark-http
  ---
  duration_ms: 5.128
  severity: fail
  stack: |-
    
    http/bench-parser.js
    http/bench-parser.js n=1 len=1: 2,785.3291144881678
    
    http/check_invalid_header_char.js
    http/check_invalid_header_char.js n=1 key="": 2,725.8503290101344
    http/check_invalid_header_char.js n=1 key="1": 3,342.2013074691513
    http/check_invalid_header_char.js n=1 key="\t\t\t\t\t\t\t\t\t\tFoo bar baz": 2,301.1676124465553
    http/check_invalid_header_char.js n=1 key="keep-alive": 1,714.8099561866056
    http/check_invalid_header_char.js n=1 key="close": 1,836.9656266991933
    http/check_invalid_header_char.js n=1 key="gzip": 2,570.79321824749
    http/check_invalid_header_char.js n=1 key="20091": 2,455.1023163890354
    http/check_invalid_header_char.js n=1 key="private": 3,152.505611459989
    http/check_invalid_header_char.js n=1 key="text/html; charset=utf-8": 2,783.9023629763255
    http/check_invalid_header_char.js n=1 key="text/plain": 1,932.2065992584191
    http/check_invalid_header_char.js n=1 key="Sat, 07 May 2016 16:54:48 GMT": 2,396.380506882405
    http/check_invalid_header_char.js n=1 key="SAMEORIGIN": 1,960.0387303653122
    http/check_invalid_header_char.js n=1 key="en-US": 468.65297418893743
    http/check_invalid_header_char.js n=1 key="Here is a value that is really a folded header value\r\n  this should be      supported, but it is not currently": 2,780.4655611535595
    http/check_invalid_header_char.js n=1 key="中文呢": 2,113.7403692704424
    http/check_invalid_header_char.js n=1 key="foo\nbar": 2,560.6424139688165
    http/check_invalid_header_char.js n=1 key="�": 2,448.3100539852367
    
    http/check_is_http_token.js
    http/check_is_http_token.js n=1 key="TCN": 2,627.61677785865
    http/check_is_http_token.js n=1 key="ETag": 2,592.439926685799
    http/check_is_http_token.js n=1 key="date": 3,146.6628067602906
    http/check_is_http_token.js n=1 key="Vary": 2,676.5090827335725
    http/check_is_http_token.js n=1 key="server": 1,933.1665654976161
    http/check_is_http_token.js n=1 key="Server": 2,698.0431093328007
    http/check_is_http_token.js n=1 key="status": 3,234.8536067000286
    http/check_is_http_token.js n=1 key="version": 3,105.6575764069403
    http/check_is_http_token.js n=1 key="Expires": 3,122.765271102867
    http/check_is_http_token.js n=1 key="alt-svc": 2,156.3156328570753
    http/check_is_http_token.js n=1 key="location": 1,954.7093835823962
    /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:200
        throw new Error('insufficient time precision for short benchmark');
        ^
    
    Error: insufficient time precision for short benchmark
        at Benchmark.end (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:200:11)
        at main (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/http/check_is_http_token.js:51:9)
        at Benchmark.process.nextTick (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/benchmark/common.js:34:28)
        at _combinedTickCallback (internal/process/next_tick.js:95:7)
        at process._tickCallback (internal/process/next_tick.js:161:9)
        at Function.Module.runMain (module.js:607:11)
        at startup (bootstrap_node.js:144:16)
        at bootstrap_node.js:561:3
    assert.js:92
      throw new AssertionError({
      ^
    
    AssertionError [ERR_ASSERTION]: 1 === 0
        at ChildProcess.child.on (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-32/test/sequential/test-benchmark-http.js:32:10)
        at emitTwo (events.js:125:13)
        at ChildProcess.emit (events.js:213:7)
        at Process.ChildProcess._handle.onexit (internal/child_process.js:197:12)
  ...

Relevant code

Quick and easy

  • test/benchmark/check_is_http_token.js: Pretty straightforward, but worth checking for anything funky.
  • lib/_http_common.js: The _checkIsHttpToken() function is also straightforward, but worth looking at just in case...

Slightly more involved

  • benchmark/common.js: Benchmark.prototype.start() and Benchmark.prototype.end() are pretty straightforward and seem unlikely sources of the problem. The throw in the stack trace above is on line 200 (currently). This is the "what in the world is going on?!?!?!" part. process.hrtime() is called in one function and saved to this._time. Then at a later time, process.hrtime(this._time) is called and the result is [0, 0]. ???? (I don't see anything funky going on with this but maybe there's a subtlety I'm missing?)

Deeper dive

  • lib/internal/process.js: setup_hrtime()
  • src/node.cc: Hrtime()
  • deps/uv/src/unix/linux-core.c: uv__hrtime()

Anything at all going on in the above that could somehow result in this odd behavior?

Possible causes

I'm having trouble coming up with an explanation that seems likely, but it's gotta be something, so...

  • Somehow, this fairly fast machine running Ubuntu 16.04 nonetheless lacks nanosecond precision.
  • Some subtle bug in the benchmark common.js that causes this._time to be set incorrectly.
  • Small but infrequently triggered bug in the bit-shifting arithmetic in the Deeper dive items listed above?
  • Bug in Linux
  • Expected behavior in Linux unexpected in our benchmark/core code. (Maybe calling clock_gettime() too many times in fast succession might return the same time even though it's supposed to have nanosecond precision in this situation?)
  • Bug/expected behavior specific to some container/virtual machine magic Digital Ocean is using?

Any other ideas?

/cc @nodejs/benchmarking @mscdex @joyeecheung @jasnell @nodejs/build @addaleax @saghul

Refs: #12934

@Trott Trott added test Issues and PRs related to the tests. benchmark Issues and PRs related to the benchmark subsystem. libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. labels May 18, 2017
@evanlucas
Copy link
Contributor

FWIW, I had this test fail for me locally on OS X yesterday...

@bnoordhuis
Copy link
Member

Expected behavior in Linux unexpected in our benchmark/core code. (Maybe calling clock_gettime() too many times in fast succession might return the same time even though it's supposed to have nanosecond precision in this situation?)

That's probably it. clock_gettime() has a precision of up to one nanosecond but in practice it will be much coarser, possibly no more granular than ~1 millisecond, depending on the clock source.

@sam-github
Copy link
Contributor

sam-github commented May 18, 2017

Just because the units of measurement is nano-seconds doesn't mean that its precise to anything like that. I think it used to be still limited by the kernel tick, 1 or 10 milliseconds IIRC (I quite likely do not). @Trott are you running Linux in a VM? I recall @bnoordhuis saying that VMs were emulating the RDTSC instruction causing process.hrtime() to be slow and unreliable in VMs, any chance you are running into that? EDIT: Speak of the devil... hi Ben!

@Trott
Copy link
Member Author

Trott commented May 18, 2017

That's probably it. clock_gettime() has a precision of up to one nanosecond but in practice it will be much coarser, possibly no more granular than ~1 millisecond, depending on the clock source.

@bnoordhuis The first process.hrtime() happens in Benchmark.prototype.start() and the second one happens in Benchmark.prototype.end(). Is it realistic that a function invocation followed by a little bit of code and then another function invocation could all happen in a duration such that process.hrtime() reveals no elapsed time? (Not contradicting you; an honest question!)

@Trott are you running Linux in a VM?

@sam-github The failure is from test-digitalocean-ubuntu1604-x86-1 in our CI infrastructure. I assume it's a virtual machine at Digital Ocean, but @jbergstroem or someone on @nodejs/build would be able to say with certainty.

@Trott
Copy link
Member Author

Trott commented May 18, 2017

If it's both likely and acceptable that running in a Linux VM can result in a reported elapsed time of 0 in this situation, then at least I think I have a clear solution. We already have an "allow 0 operations" option for these tests in the benchmark common module, so I could allow for this situation if that option is in use.

@Trott
Copy link
Member Author

Trott commented May 18, 2017

FWIW, I had this test fail for me locally on OS X yesterday...

@evanlucas I don't suppose you recall if it was a timeout or if it threw the error shown here or if t was something else, do you? I tried everything I could think of to get my macOS machine to trip this error and gave up.

@evanlucas
Copy link
Contributor

Ah no never mind it was a timeout. Sorry for the noise.

Trott added a commit to Trott/io.js that referenced this issue May 19, 2017
Imprecision in process.hrtime() in some situations can result in a zero
duration being used as a denominator in benchmark tests. This would
almost certainly never happen in real benchmarks. It is only likely in
very short benchmarks like the type we run in our test suite to just
make sure that the benchmark code is runnable.

So, if the environment variable that we use in tests to indicate "allow
ludicrously short benchmarks" is set, convert a zero duration for
a benchmark to 1 nano-second.

Fixes: nodejs#13102
@Trott Trott closed this as completed in c3067b5 May 22, 2017
jasnell pushed a commit that referenced this issue May 23, 2017
Imprecision in process.hrtime() in some situations can result in a zero
duration being used as a denominator in benchmark tests. This would
almost certainly never happen in real benchmarks. It is only likely in
very short benchmarks like the type we run in our test suite to just
make sure that the benchmark code is runnable.

So, if the environment variable that we use in tests to indicate "allow
ludicrously short benchmarks" is set, convert a zero duration for
a benchmark to 1 nano-second.

PR-URL: #13110
Fixes: #13102
Fixes: #12433
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
jasnell pushed a commit that referenced this issue May 23, 2017
Imprecision in process.hrtime() in some situations can result in a zero
duration being used as a denominator in benchmark tests. This would
almost certainly never happen in real benchmarks. It is only likely in
very short benchmarks like the type we run in our test suite to just
make sure that the benchmark code is runnable.

So, if the environment variable that we use in tests to indicate "allow
ludicrously short benchmarks" is set, convert a zero duration for
a benchmark to 1 nano-second.

PR-URL: #13110
Fixes: #13102
Fixes: #12433
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Issues and PRs related to the benchmark subsystem. libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants