Skip to content

<condition_variable>: wait_for with a Predicate used with floating point duration is extremely unsafe #6088

@CookiePLMonster

Description

@CookiePLMonster

Describe the bug

First of all: I don't know with absolute certainty if this is considered a bug, or surprising-yet-Standard behavior. MSVC STL and Clang STL are both affected by the same issue, so I am hoping to use this issue report as means of starting a conversation about the validity of wait_for's implementation.

This particular issue was observed with condition variables, but it affects everything using _To_absolute_time(). This makes std::this_thread::sleep_for just as likely to be affected.

This correctly-looking way of using a std::condition_variable:

condVar.wait_for(lock, std::chrono::duration<float>(TimeInSecs), predicate);

compiles with no warnings or errors on /W4, but can lead to non-obvious issues with the condition variable timing out prematurely due to floating point precision issues.

Following the chain of events:

  1. wait_for gets called.
  2. wait_for with a Predicate is implemented with wait_until:
    return wait_until(_Lck, _To_absolute_time(_Rel_time), _STD move(_Pred));
  3. _To_absolute_time tries to use a common type to store the result of std::steady_clock::now() + _Rel_time:
    decltype(_Now + _Rel_time) _Abs_time = _Now; // return common type

This, however, leads to highly negative side effects if _Rel_time is a floating-point duration. The result type of adding a long long (which now() is) and float is a float, but that type simply cannot precisely store a high enough value of now(), which is a system uptime in micro- or nano-seconds.

With high enough uptimes, _Abs_time <= _Now, and so the condition variable ends up timing out immediately if the predicate fails. This effectively turns this usage of CV into a busy loop, and that is exactly what we've observed in production.

Using std::chrono::duration<double> avoids this problem at least in the short term, as double has enough precision to store high values of std::steady_clock::now(). However, it's only a workaround, and a true fix involves not using any floating-point types for delays, regardless of width.

Command-line test case

N/A

Expected behavior

I expect .wait_for(lock, std::chrono::duration<float>(x), predicate) and sleep_for(std::chrono::duration<float>(x)) to do one of the two things:

  1. Emit a descriptive warning for the developer.
  2. Correctly delay execution without premature timeouts due to floating point precision issues.

STL version

The issue was initially observed on a 14.34 compiler with this STL version:
#define _MSVC_STL_UPDATE 202208L
but looking at the repository, the implementation is still identical in the latest main branch.

Additional context

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions