Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UV_RUN_ONCE breaks the usual order of timers / poll in the event loop #3686

Closed
mmomtchev opened this issue Jul 14, 2022 · 14 comments
Closed

Comments

@mmomtchev
Copy link
Contributor

  • Version: 1.43.0 (Node.js 19 branch)
  • Platform: Linux ubuntu 5.4.0-110-generic 124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

This code:

if (mode == UV_RUN_ONCE) {

is meant to run when uv__io_poll() returns without having called any I/O so that it can run at least one of the timers.

However it makes no such checks at all - in fact it is simply a second entry point for running the timers.

This deviates from the normal flow where all timers run, then all sockets are polled.

In particular this otherwise valid Node.js unit test becomes flaky if used on an event loop with UV_RUN_ONCE - where the order can be timers - sockets or timers - timers - sockets: https://github.com/nodejs/node/blob/main/test/sequential/test-timers-block-eventloop.js

Is this really a bug, or is it simply undefined behaviour (on which Node.js has come to rely)?

mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 14, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 15, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 16, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
@bnoordhuis
Copy link
Member

The commit log of commit f6d8ba3 explains why it's necessary to run timers.

I think it'd be okay to change the current behavior as long as something runs. In the special case of uv__io_poll() timing out, that something are the expired timers.

@vtjnash
Copy link
Member

vtjnash commented Jul 18, 2022

I would possibly argue we could remove the timers from the top of the loop and make them unconditional at the end, since (except for the first iteration) it currently just runs the timers twice in a row (for UV_RUN_ONCE mode). But needs testing

@bnoordhuis
Copy link
Member

That's probably an observable change in behavior. Node's documentation is pretty explicit about what phases run when so chances are good that change would break someone's code.

I guess Node's test suite would uncover that since it's got pretty exhaustive event loop coverage.

@vtjnash
Copy link
Member

vtjnash commented Jul 19, 2022

My proposed theory is that the last phase of the loop and the first phase of the next loop are (or at least should be) indistinguishable to a sensible consumer, since they differ only by the placement of the while condition check

@mmomtchev
Copy link
Contributor Author

@vtjnash @bnoordhuis I just ran all of Node.js' unit tests after moving uv__run_timers after uv__io_poll both through the normal event loop and through my modified UV_RUN_ONCE event loop and they all seem to work except nodejs/node#43680 which goes from its current flaky state to reliable failure.
Still, I am afraid that a behavioural change of the event loop for the sake of elegance will be a tough sell to the Node.js team - but if you can convince them, it is probably the best solution. Otherwise I will probably add a return value to uv__io_poll with a single timer to run.

@bnoordhuis
Copy link
Member

Worst case we can always revert such a change but I'd feel kind of bad if someone ends up spending half a day debugging a timing issue because we were a little too cavalier. Apart from that no strong opinion.

@vtjnash
Copy link
Member

vtjnash commented Jul 20, 2022

Breaking a test that is already broken doesn't sound too bad. But we can make it a v2 fix, if it seems to breaking now.

mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 30, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 30, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Jul 30, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Aug 20, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Aug 31, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Aug 31, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Aug 31, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Aug 31, 2022
@stale
Copy link

stale bot commented Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the stale label Sep 21, 2022
@mmomtchev
Copy link
Contributor Author

I will submit a PR

@stale stale bot removed the stale label Sep 21, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Sep 22, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Sep 22, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Oct 10, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Oct 10, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Oct 10, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
mmomtchev added a commit to mmomtchev/node that referenced this issue Oct 10, 2022
mmomtchev added a commit to mmomtchev/node that referenced this issue Oct 10, 2022
Refs: libuv/libuv#3686

Waiting to see if this is supposed to be defined behavior
@mmomtchev
Copy link
Contributor Author

@bnoordhuis @vtjnash I made a dummy Node.js PR with this change - there is one ASAN test that timed out and I do not know if it is related or not - I found other PRs where this exact test timing out in the ASAN build was the only error but it is not known as a flaky test. I will try investigating further.

Still, I think that this change will make lots of people very nervous and I will definitely not be able to push this through. I am still hesitating if the best way to fix this is not to simply return the expired timer and then to run it.

@mmomtchev
Copy link
Contributor Author

Here is my raw libuv repro:

int timer_check_double_call_called = 0;
static void timer_check_double_call(uv_timer_t* handle) {
  ASSERT(1 != timer_check_double_call_called);
  timer_check_double_call_called = 1;
}

TEST_IMPL(timer_no_double_call) {
  uv_timer_t timer_handle;
  const uint64_t timeout_ms = 10;

  ASSERT(0 == uv_timer_init(uv_default_loop(), &timer_handle));
  ASSERT(0 ==
         uv_timer_start(&timer_handle, timer_check_double_call, timeout_ms, timeout_ms));
  uv_sleep(timeout_ms * 2);
  ASSERT(1 == uv_run(uv_default_loop(), UV_RUN_ONCE));
  ASSERT(1 == timer_check_double_call_called);

  uv_close((uv_handle_t*)&timer_handle, NULL);

  MAKE_VALGRIND_HAPPY();
  return 0;
}

@mmomtchev
Copy link
Contributor Author

#3791 is the alternative solution - it appears very invasive but in fact it has less consequences - instead of modifying the order of the event loop, it tracks if any of the event loop routines did any work, and then double calls the timer only if no work was done during the current call.

@mmomtchev
Copy link
Contributor Author

I am proposing #3791 as a less intrusive change - but should you decide otherwise - I can proceed with exchanging the event loop order - which is a two-line change that is somewhat riskier.

Ping me once you have set your priorities straight.

trevnorris added a commit to trevnorris/libuv that referenced this issue Mar 16, 2023
The maximum number of times timers should run when uv_run() is called
with UV_RUN_ONCE and UV_RUN_NOWAIT is 1. Do that by conditionally
calling timers before entering the while loop when called with
UV_RUN_DEFAULT.

The reason to always run timers at the end of the while loop, instead of
at the beginning, is to help enforce the conceptual event loop model.
Which starts when entering the event provider (e.g. calling poll).

Other than only allowing timers to be processed once per uv_run()
execution, the only other noticeable change this will show is if all the
following are true:
* uv_run() is called with UV_RUN_NOWAIT or UV_RUN_ONCE.
* An event is waiting to be received when poll is called.
* Execution time between the call to uv_timer_start() and entering the
  while loop is longer than the timeout.

If all these are true, then timers that would have executed before
entering the event provider will now be executed afterward.

Fixes: libuv#3686
Co-authored-by: Momtchil Momtchev <momtchil@momtchev.com>
@saghul
Copy link
Member

saghul commented Dec 18, 2023

Sorry to come here after all this time, but this change has broken my app (txiki.js) and I wonder if others could be affected too.

Here is how my code works in a nutshell:

  • There are 2 unref'd handles that check if there are promises to run (1 prepare, 1 check)
  • If there are any promises to run they start an idle handle, which is ref'd
  • Timers run on a uv_timer_t

After this change, timers run last, and thus my check handle hasn't run as the last thing, which is what I used to keep the loop alive by starting the idle handle. This means the loop could exit early, if new promises were created on a timer, since check handles no longer run afterwards.

I think this change in behavior is wrong.

saghul added a commit to saghul/txiki.js that referenced this issue Dec 18, 2023
In libuv/libuv#3686 libuv changed when timers
run respective to check handles.

Change the way we run the loop by running it again if necessary, untill
there is no more work to do of the loop has been stopped.

Fixes: #443
saghul added a commit to saghul/txiki.js that referenced this issue Dec 18, 2023
In libuv/libuv#3686 libuv changed when timers
run respective to check handles.

Change the way we run the loop by running it again if necessary, untill
there is no more work to do of the loop has been stopped.

Fixes: #443
saghul added a commit to saghul/txiki.js that referenced this issue Dec 18, 2023
In libuv/libuv#3686 libuv changed when timers
run respective to check handles.

Change the way we run the loop by running it again if necessary, untill
there is no more work to do of the loop has been stopped.

Fixes: #443
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants