New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timers: run nextTicks after each immediate and timer #22842

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
@apapirovski
Member

apapirovski commented Sep 13, 2018

In order to better match browser behaviour, run nextTicks (and subsequently the microtask queue) after each individual Timer and Immediate, rather than after the whole list is processed. The current behaviour is somewhat of a performance micro-optimization and also partly dictated by how timer handles were implemented.

This will have a significant impact on how promises are resolved/rejected (due to changes to the microtask queue timing) which is IMO a reasonably important change.

@Fishrock123 already started some work on this in #22316 but I spun this off into a separate PR as I had good understanding of what needed to happen to actually pass the tests.

I still likely need to update the docs and write a few tests. This is also definitely semver-major. This will also need a CitGM run.

Fixes: #22257

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 13, 2018

Ok, looks like test/parallel/test-domain-with-abort-on-uncaught-exception.js might be made flaky by this change. I'll have to dig into that test. Doesn't fail for me locally and it succeeded in the full CI hence the flaky assumption.

Stress test: https://ci.nodejs.org/job/node-stress-single-test/2014/ and against master: https://ci.nodejs.org/job/node-stress-single-test/2015/

Edit: Also parallel/test-child-process-ipc-next-tick might be made flaky by this. I'm guessing that these are both related to the change in timers.

@mscdex

This comment has been minimized.

Contributor

mscdex commented Sep 13, 2018

Can we get some benchmarks to show just what kind of performance hit we should expect from this?

@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 13, 2018

@mscdex Yup, I'm hoping to first figure out if we can get all the tests to pass and CitGM to work before I go down the road of trying to figure out performance and maybe optimize that.

@joaocgreis

This comment has been minimized.

Member

joaocgreis commented Sep 13, 2018

@apapirovski had to abort the CitGM run to restart Jenkins, two workers didn't finish. Can you restart it if needed? I started the master run after rebooting, link above is correct.

@@ -264,7 +264,7 @@ function listOnTimeout(list, now) {
debug('timeout callback %d', msecs);
var diff, timer;
var diff, ran, timer;

This comment has been minimized.

@addaleax

addaleax Sep 13, 2018

Member

ran what? :)

This comment has been minimized.

@apapirovski

apapirovski Sep 14, 2018

Member

I kno, I kno 😆 I'll make it more descriptive. But to answer the question: an iteration of the loop right after it.

This comment has been minimized.

@Fishrock123

Fishrock123 Sep 17, 2018

Member

= false?

@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 14, 2018

There are currently 3 test failures. They are:

  1. test/sequential/test-http-keepalive-maxsockets.js (see #16777)
    This is a flaky test that was moved to sequential. I'll dig in but I assume this test has a weird interaction with when nextTicks are executed. I think this test is flaky and needs fixing, rather than it highlighting anything wrong with this PR.

  2. test/parallel/test-child-process-ipc-next-tick.js
    This one might be showcasing a real bug. I will need to confirm.

  3. test/parallel/test-domain-with-abort-on-uncaught-exception.js
    I don't even know where to start but again, seems like potentially a real bug.

@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 14, 2018

CI: https://ci.nodejs.org/job/node-test-pull-request/17179/
CitGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/1544/

I believe that all 3 test failures above were related to incorrect error resolution inside processImmediates. I had to make some changes for the way the new nextTicks are run.

@starkwang

This comment has been minimized.

Contributor

starkwang commented Sep 14, 2018

Should we add some documents about this behavior change?

@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 14, 2018

I think we're in the clear on tests now so here's benchmarks:

https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/238/

I will be working on new tests for this behaviour and documentation later today or tomorrow. I'll also follow up with perf improvements as necessary.

Results:

 timers/immediate.js type='breadth1' n=5000000                         *     -3.25 %       ±2.61%  ±3.44%  ±4.41%
 timers/immediate.js type='breadth4' n=5000000                        **     -2.74 %       ±1.74%  ±2.30%  ±2.95%
 timers/immediate.js type='breadth' n=5000000                        ***     -7.91 %       ±2.09%  ±2.75%  ±3.53%
 timers/immediate.js type='clear' n=5000000                           **     -2.48 %       ±1.64%  ±2.17%  ±2.78%
 timers/immediate.js type='depth1' n=5000000                         ***     -3.80 %       ±1.39%  ±1.84%  ±2.36%
 timers/immediate.js type='depth' n=5000000                            *     -1.19 %       ±1.18%  ±1.55%  ±1.99%
 timers/set-immediate-breadth-args.js n=5000000                      ***     -5.80 %       ±1.83%  ±2.41%  ±3.09%
 timers/set-immediate-breadth.js n=10000000                          ***     -6.23 %       ±1.88%  ±2.48%  ±3.18%
 timers/set-immediate-depth-args.js n=5000000                        ***     -2.43 %       ±0.90%  ±1.19%  ±1.53%
 timers/timers-breadth.js n=5000000                                  ***     -7.07 %       ±3.33%  ±4.39%  ±5.64%
 timers/timers-cancel-pooled.js n=5000000                                    -5.77 %       ±7.09%  ±9.35% ±12.01%
 timers/timers-cancel-unpooled.js direction='end' n=1000000                   9.24 %      ±10.52% ±13.87% ±17.81%
 timers/timers-cancel-unpooled.js direction='start' n=1000000                 2.14 %       ±5.43%  ±7.16%  ±9.19%
 timers/timers-depth.js n=1000                                         *     -0.06 %       ±0.06%  ±0.08%  ±0.10%
 timers/timers-insert-pooled.js n=5000000                                    -1.71 %       ±2.13%  ±2.81%  ±3.61%
 timers/timers-insert-unpooled.js direction='end' n=1000000                  -0.99 %       ±1.39%  ±1.84%  ±2.36%
 timers/timers-insert-unpooled.js direction='start' n=1000000                -2.18 %       ±2.78%  ±3.66%  ±4.70%
 timers/timers-timeout-pooled.js n=10000000                          ***    -38.68 %       ±2.94%  ±3.88%  ±4.99%
 timers/timers-timeout-unpooled.js n=1000000                         ***    -37.78 %       ±2.35%  ±3.11%  ±3.99%

Given the perf improvements here #20555, I think we can live with the timers-timeout regressions but I'll be doing some work to minimize it.

let refCount = 0;
if (queue !== outstandingQueue) {
queue.head = queue.tail = null;
immediateInfo[kHasOutstanding] = 1;

This comment has been minimized.

@Fishrock123

Fishrock123 Sep 17, 2018

Member

Maybe we could leave a comment here about how this = null thing works? it's... pretty difficult to read/understand.

This comment has been minimized.

@devsnek

devsnek Sep 17, 2018

Member

we should disallow that in the lint rules. please split it into two.

if (ran)
runNextTicks();
else
ran = true;

This comment has been minimized.

@Fishrock123

Fishrock123 Sep 17, 2018

Member

Ditto on comments here and above where we use the same pattern, easy to misunderstand

@Fishrock123

Seems fine to me, but it certainly confusing, implementation-wise.

We should probably CITGM this, it seems slightly ecosystem-risky to me? Don't think it's necessarily major by default though.

Edit: Re - benchmarks; maybe we can do some extra optimization there. I'll try to take profiles.

@Fishrock123

This comment has been minimized.

Member

Fishrock123 commented Sep 17, 2018

oof, this change is.. kinda expensive.

Here are the numbers I got for timers/timers-timeout-pooled.js n=10000000

  • master: 14,202,832.177390624
  • this patch: 5,709,151.565629168

Profiles:

@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 17, 2018

^ that seems worse than the benchmark CI run. But it's not generally surprising, it now has to run nextTicks on each timer which necessitates crossing over to C++ for each timer. The question is how frequent of an occurrence this is and whether this is a sacrifice we're willing to make for correctness.

@mscdex

This comment has been minimized.

Contributor

mscdex commented Sep 18, 2018

Perhaps running these changes against some real-world scenarios (e.g. acmeair, etc.) might help answer some of this?

@Fishrock123

This comment has been minimized.

Member

Fishrock123 commented Sep 18, 2018

The question is how frequent of an occurrence this is

Promise-based code is the most likely candidate here. the question is, do people really do lots of async promise stuff in timeouts? I don't really know but I do feel like I usually don't do much async stuff in timeouts.

@devsnek devsnek referenced this pull request Sep 19, 2018

Merged

implement queueMicrotask #22951

3 of 3 tasks complete
@apapirovski

This comment has been minimized.

Member

apapirovski commented Sep 28, 2018

I'm planning to revisit this again over the weekend. Hopefully tidy up some of the perf and get some clearer practical results.

@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 13, 2018

@sam-github

This comment has been minimized.

Member

sam-github commented Oct 13, 2018

I think this is a good idea, it simplifies the API and makes it much more consistent.

nodejs/nodejs.org#1804 for example describes some confusion about when the tick queue is cleared, and some attempts to make the docs describe the behaviour.

Does this PR clear the tick queue only after timer callbacks, or does it do so after immediates are run, too? Both would be useful, IMO.

@Fishrock123

This comment has been minimized.

Member

Fishrock123 commented Oct 13, 2018

@sam-github this also includes Immediates.

@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 14, 2018

New perf results. IMO good enough:

confidence improvement accuracy (*)    (**)   (***)
 timers/immediate.js type='breadth1' n=5000000                       ***     -4.80 %       ±2.64%  ±3.48%  ±4.47%
 timers/immediate.js type='breadth4' n=5000000                               -1.39 %       ±2.01%  ±2.65%  ±3.41%
 timers/immediate.js type='breadth' n=5000000                          *     -2.69 %       ±2.57%  ±3.40%  ±4.36%
 timers/immediate.js type='clear' n=5000000                           **      2.90 %       ±1.87%  ±2.47%  ±3.17%
 timers/immediate.js type='depth1' n=5000000                          **     -1.96 %       ±1.30%  ±1.71%  ±2.20%
 timers/immediate.js type='depth' n=5000000                                  -0.40 %       ±0.96%  ±1.27%  ±1.63%
 timers/set-immediate-breadth-args.js n=5000000                      ***     -5.49 %       ±1.54%  ±2.03%  ±2.61%
 timers/set-immediate-breadth.js n=10000000                          ***     -4.73 %       ±2.01%  ±2.65%  ±3.40%
 timers/set-immediate-depth-args.js n=5000000                         **     -1.75 %       ±1.16%  ±1.53%  ±1.97%
 timers/timers-breadth.js n=5000000                                   **     -6.61 %       ±4.47%  ±5.89%  ±7.57%
 timers/timers-cancel-pooled.js n=5000000                                    -2.89 %       ±6.57%  ±8.66% ±11.12%
 timers/timers-cancel-unpooled.js direction='end' n=1000000                   3.15 %      ±10.51% ±13.86% ±17.80%
 timers/timers-cancel-unpooled.js direction='start' n=1000000                -0.42 %       ±5.45%  ±7.19%  ±9.24%
 timers/timers-depth.js n=1000                                        **      0.08 %       ±0.05%  ±0.07%  ±0.08%
 timers/timers-insert-pooled.js n=5000000                              *     -2.09 %       ±1.99%  ±2.62%  ±3.36%
 timers/timers-insert-unpooled.js direction='end' n=1000000                  -1.10 %       ±1.53%  ±2.01%  ±2.59%
 timers/timers-insert-unpooled.js direction='start' n=1000000                 1.89 %       ±2.45%  ±3.23%  ±4.15%
 timers/timers-timeout-pooled.js n=10000000                          ***    -23.54 %       ±3.88%  ±5.12%  ±6.59%
 timers/timers-timeout-unpooled.js n=1000000                         ***    -24.73 %       ±3.06%  ±4.04%  ±5.20%
@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 14, 2018

I’ll update docs and write some tests for all of this on my flight.

@devsnek

This comment has been minimized.

Member

devsnek commented Oct 14, 2018

@Fishrock123 if you really want to optimize it we could use %RunMicrotasks(), obviously it wouldn't be very compatible with chakra and our linting and our coverage and etc

@mscdex

This comment has been minimized.

Contributor

mscdex commented Oct 14, 2018

I think having a lot of expiring timers is not an uncommon scenario. Imagine a server that handles a lot of connections, many of which may be idle. If there's a recurring timeout on each connection, that could easily be affected by the changes in this PR.

@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 15, 2018

I think having a lot of expiring timers is not an uncommon scenario. Imagine a server that handles a lot of connections, many of which may be idle. If there's a recurring timeout on each connection, that could easily be affected by the changes in this PR.

To be affected, you would need close to tens of millions of timeouts per second. Given how many requests per second a Node.js server can handle, that doesn't seem likely.

Realistically speaking, a 25% decrease on performance of, say, 10,000 timeouts per second is pretty much completely meaningless given that it works out to like 2.5ms.


More to the point, I find the standard you hold every PR to to be untenable. The notion that we should ignore other performance improvements we made (well, in this case — I) and just compare to master is ridiculous.

@Fishrock123
timers/timers-timeout-nexttick.js n=5000000                         ***     70.80 %       ±4.74%  ±6.26%  ±8.04%

Ok there is the real side of the story IMO. nextTicks processed alongside these are now much more efficient, and in practice, there are likely nextTicks emitted in a timeout from any number of APIs used, and certainly from anything that is a stream and calls destroy(), which sockets do.

timers: run nextTicks after each immediate and timer
In order to better match the browser behaviour, run nextTicks (and
subsequently the microtask queue) after each individual Timer and
Immediate, rather than after the whole list is processed. The
current behaviour is somewhat of a performance micro-optimization
and also partly dictated by how timer handles were implemented.
@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 16, 2018

@jasnell would be good to pull this in to 11 if at all possible. This improves compatibility with the browsers and significantly impacts the microtask queue, and consequently promises. I would also like to avoid having the first release of this be a future LTS line.

@apapirovski

This comment has been minimized.

Member

apapirovski commented Oct 16, 2018

Why is travis checking 9d7895c? Can anyone explain?

@apapirovski apapirovski added this to the 11.0.0 milestone Oct 16, 2018

@jasnell

This comment has been minimized.

Member

jasnell commented Oct 17, 2018

@apapirovski ... with only a week to go until 11 is out, that's rather risky, it's a fairly significant change. I'm not necessarily opposed to it tho. @nodejs/tsc... any objections to including this in 11 if it lands in time?

];
console.log(lines);

This comment has been minimized.

@jasnell

jasnell Oct 17, 2018

Member

Is the console.log(lines) part of the test? If so, please leave a comment indicating so, if no, please remove it :-)

This comment has been minimized.

@BridgeAR

BridgeAR Oct 17, 2018

Member

I am pretty sure I had a PR that improved the logging of this in case of a failure. Currently it's hard to know what failed without this. I think I closed that one by accident when I wanted to land it 🤣 So I'll open another one for that.

This comment has been minimized.

@apapirovski

apapirovski Oct 18, 2018

Member

Nope. I’ll remove.

];
console.log(lines);

This comment has been minimized.

@BridgeAR

BridgeAR Oct 17, 2018

Member

I am pretty sure I had a PR that improved the logging of this in case of a failure. Currently it's hard to know what failed without this. I think I closed that one by accident when I wanted to land it 🤣 So I'll open another one for that.

@jasnell

This comment has been minimized.

Member

jasnell commented Oct 17, 2018

Note, this will have to land by Saturday in order to make it in to 11.0.0

@Trott

This comment has been minimized.

Member

Trott commented Oct 18, 2018

Why is travis checking 9d7895c? Can anyone explain?

@apapirovski Last time this came up, the explanation was #23307 (comment). That issue has since been fixed. Not sure why it failed here unless the Travis run was triggered some other way than as a pull request. Which I guess seems unlikely....

@Trott

This comment has been minimized.

Member

Trott commented Oct 18, 2018

(I'll try to fix it by having Travis skip the commit message linting if the environment variable is not populated.)

Trott added a commit to Trott/io.js that referenced this pull request Oct 18, 2018

tools: do not lint commit message if var undefined
Check that $TRAVIS_COMMIT_RANGE is set before trying to lint commit
messages in Travis CI.

Refs: nodejs#23572 (comment)
Refs: nodejs#22842 (comment)

@Trott Trott referenced this pull request Oct 18, 2018

Closed

tools: do not lint commit message if var undefined #23725

2 of 2 tasks complete

Trott added a commit to Trott/io.js that referenced this pull request Oct 18, 2018

timers: run nextTicks after each immediate and timer
In order to better match the browser behaviour, run nextTicks (and
subsequently the microtask queue) after each individual Timer and
Immediate, rather than after the whole list is processed. The
current behaviour is somewhat of a performance micro-optimization
and also partly dictated by how timer handles were implemented.

PR-URL: nodejs#22842
Fixes: nodejs#22257
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
@Trott

This comment has been minimized.

Member

Trott commented Oct 18, 2018

Landed in e7af983

@Trott Trott closed this Oct 18, 2018

refack added a commit that referenced this pull request Oct 18, 2018

tools: do not lint commit message if var undefined
Check that $TRAVIS_COMMIT_RANGE is set before trying to lint commit
messages in Travis CI.

Refs: #23572 (comment)
Refs: #22842 (comment)

PR-URL: #23725
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matheus Marchini <mat@mmarchini.me>

addaleax added a commit that referenced this pull request Oct 20, 2018

tools: do not lint commit message if var undefined
Check that $TRAVIS_COMMIT_RANGE is set before trying to lint commit
messages in Travis CI.

Refs: #23572 (comment)
Refs: #22842 (comment)

PR-URL: #23725
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matheus Marchini <mat@mmarchini.me>

@apapirovski apapirovski referenced this pull request Oct 21, 2018

Closed

11.0.0 proposal #23230

0 of 4 tasks complete

jasnell added a commit that referenced this pull request Oct 21, 2018

timers: run nextTicks after each immediate and timer
In order to better match the browser behaviour, run nextTicks (and
subsequently the microtask queue) after each individual Timer and
Immediate, rather than after the whole list is processed. The
current behaviour is somewhat of a performance micro-optimization
and also partly dictated by how timer handles were implemented.

PR-URL: #22842
Fixes: #22257
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>

jasnell added a commit that referenced this pull request Oct 21, 2018

tools: do not lint commit message if var undefined
Check that $TRAVIS_COMMIT_RANGE is set before trying to lint commit
messages in Travis CI.

Refs: #23572 (comment)
Refs: #22842 (comment)

PR-URL: #23725
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matheus Marchini <mat@mmarchini.me>

MylesBorins added a commit that referenced this pull request Oct 30, 2018

tools: do not lint commit message if var undefined
Check that $TRAVIS_COMMIT_RANGE is set before trying to lint commit
messages in Travis CI.

Refs: #23572 (comment)
Refs: #22842 (comment)

PR-URL: #23725
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matheus Marchini <mat@mmarchini.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment