
Refactor timers (#11807) 2nd try #11850

Merged · 12 commits into mixxxdj:main · Sep 6, 2023

Conversation

@Swiftb0y (Member)

So, this is #11807 reopened after I reset it out of the branch. Unfortunately, this PR suffers from a significant issue: due to Qt API limitations, PerformanceTimer::difference cannot provide the same resolution as before.

Fortunately, the only real consumer of that function is VSyncThread::fromTimerToNextSyncMicros. Unfortunately, the resolution provided by Qt (milliseconds) does not suffice in the context of frame timings:

int VSyncThread::fromTimerToNextSyncMicros(const PerformanceTimer& timer) {
    int difference = static_cast<int>(m_timer.difference(timer).toIntegerMicros());
    // int math is fine here, because we do not expect times > 4.2 s
    return difference + m_waitToSwapMicros;
}

I'd prefer to just get rid of PerformanceTimer::difference, since it doesn't really have much to do with timers anyway, and just use std::chrono::time_point in the VSyncThread with the appropriate std::chrono clock. The alternative would be to drop the QElapsedTimer commit and just use the std::chrono-based implementation of PerformanceTimer. wdyt? @daschuer
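
For illustration, a minimal sketch of that direction, assuming a hypothetical m_referenceTime member of type std::chrono::steady_clock::time_point in place of m_timer (names are illustrative, not code from this PR):

#include <chrono>

// Hypothetical sketch: VSyncThread stores a plain time_point instead of a
// PerformanceTimer, and callers pass the time_point they captured. The sign
// convention is meant to mirror the original m_timer.difference(timer) call.
int VSyncThread::fromTimePointToNextSyncMicros(
        std::chrono::steady_clock::time_point reference) {
    const auto difference = std::chrono::duration_cast<std::chrono::microseconds>(
            reference - m_referenceTime);
    // int math is fine here, because we do not expect times > 4.2 s
    return static_cast<int>(difference.count()) + m_waitToSwapMicros;
}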

Marking the PR as draft until we've discussed a solution and the PR is ready to be merged.

@JoergAtGithub (Member)

I have a branch somewhere where I changed VSyncThread::fromTimerToNextSyncMicros to an absolute time reference. This absolute time reference was needed in the sound device implementation anyway, because Ableton Link needs a jitter-free absolute clock reference, and the filtering to get this jitter-free clock requires an absolute monotonic clock.

@Swiftb0y (Member Author)

Great, could you try to dig that up? Then we can merge it independently as a prerequisite.

@JoergAtGithub (Member) commented Aug 21, 2023

I found it: https://github.com/JoergAtGithub/mixxx/tree/visualPlaypositionAbsoluteTime
The clock filtering code itself is more refined in #10999.
But the code is not in a ready-for-review state; it was more an experiment to see whether this jitter-free clock makes a visual difference in the waveforms.

@Swiftb0y (Member Author)

Mhmm, okay, so what's your preferred plan of action? Just implement a more basic std::chrono-based mechanism in the vsync branch, independent of your visualPlaypositionAbsoluteTime branch?

@JoergAtGithub (Member)

The Ableton Link code uses the following platform-specific monotonic clock implementations:
https://github.com/Ableton/link/blob/master/include/ableton/platforms/darwin/Clock.hpp
https://github.com/Ableton/link/blob/master/include/ableton/platforms/linux/Clock.hpp
https://github.com/Ableton/link/blob/master/include/ableton/platforms/windows/Clock.hpp
My visualPlaypositionAbsoluteTime branch used these as the clock reference. In general, this code should work with any monotonic clock reference that returns std::chrono::microseconds.

@Swiftb0y (Member Author)

Right, so let's just trust the stdlib implementors and use std::chrono::steady_clock. I don't like the interface of the Ableton clock implementations.

@daschuer (Member)

I am confused. From https://doc.qt.io/qt-6/qelapsedtimer.html#nsecsElapsed it looks like QElapsedTimer gives us nanosecond resolution if possible.

Before this commit, Qt used the desired low-level API:
qt/qtbase@c9f4c0d#diff-8523988969292961dc389bb18a73fbe5e2328ee0fe49798c14b61850ed270c97

Does this involve a regression?

@daschuer (Member)

I think yes :-( It uses std::chrono::steady_clock, which is known to possibly have a low resolution: https://stackoverflow.com/questions/29551126/does-steady-clock-have-only-10ms-resolution-on-windows-cygwin
std::chrono::high_resolution_clock seems to be the better choice, but then we don't have guaranteed monotonic and overflow behavior.

Such high-level APIs can break at any time, just like QElapsedTimer did in Qt 6.6, because we rely on undefined behavior.

What was the issue with our original implementation? Can we just revert to the original low-level API call with guaranteed behavior and fix the issues there?

Alternatively, we may copy the Qt < 6.6 implementation into our source tree and extend it with a nsecsTo() or durationTo().

@Swiftb0y (Member Author)

I am confused. From https://doc.qt.io/qt-6/qelapsedtimer.html#nsecsElapsed it looks like QElapsedTimer gives us nanosecond resolution if possible.

Yes, but it only gives us the elapsed time at the time of calling. In other words, it only gives us the duration between when the timer was started and when nsecsElapsed is called. PerformanceTimer::difference(rhs), in contrast, gives us the difference between when this timer was started and when rhs was started. We cannot implement that ourselves because QElapsedTimer does not let us access the start time. QElapsedTimer::msecsTo does implement it, but only with millisecond precision for some reason. This gives us two/three options:

  1. Use our own timer implementation, not bound by the API limitations of QElapsedTimer.
  2. Remove PerformanceTimer::difference.
  3. Implement QElapsedTimer::nsecsTo upstream. Useless to us right now because it would only be picked up by Qt 6 anyway.

Right now I'm opting for 2., which entails a std::chrono-based rewrite of VSyncThread...

@Swiftb0y (Member Author) commented Aug 21, 2023

What was the issue with our original implementation? Can we just revert to the original low-level API call with guaranteed behavior and fix the issues there?

Nothing much; I just wanted to clean it up because nobody has looked at it in the last 10 years (and probably won't for another 10).

I think yes :-( It uses std::chrono::steady_clock, which is known to possibly have a low resolution: https://stackoverflow.com/questions/29551126/does-steady-clock-have-only-10ms-resolution-on-windows-cygwin

That's unfortunate, but in a proper implementation this should be checkable statically: static_assert(std::ratio_less_equal_v<std::chrono::steady_clock::period, std::micro>);
Though in the case of Cygwin it probably doesn't matter, because it's too low-quality an implementation anyway.
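
Spelled out, such checks might look like this (a sketch; note that period is the clock's declared tick ratio, not a guarantee about the effective OS resolution):

#include <chrono>
#include <ratio>

// Sketch of the proposed compile-time checks: reject builds where
// steady_clock is not monotonic or its declared tick is coarser than 1 µs.
static_assert(std::chrono::steady_clock::is_steady,
        "steady_clock must be monotonic");
static_assert(std::ratio_less_equal_v<std::chrono::steady_clock::period, std::micro>,
        "steady_clock must declare at least microsecond resolution");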

Such high-level APIs can break at any time, just like QElapsedTimer did in Qt 6.6, because we rely on undefined behavior.

Can you elaborate on the undefined behavior you are alluding to?

@daschuer (Member)

This gives us two/three options:

  1. I am in favor of our own timer using the low-level API, because this way we have guaranteed behavior at build time.
  2. We need difference(), aka durationTo(), because all other ns-precision functions require two calls to now().
  3. Fixing Qt 6.6's QElapsedTimer would be a nice project to make Mixxx's needs visible. There are some precision issues to fix, but since they picked std::chrono::steady_clock, the use case is probably not the high precision we want. I don't see an issue with waiting, because we can copy the files into our source tree like before.

Can you elaborate on the undefined behavior you are alluding to?

The standard defines neither a resolution nor an overflow behavior. Yes, it provides an API to query it like you proposed, but that needs to be done at runtime, because it depends on the OS and std implementation.
I am worried about std::chrono clocks, because there is no high_resolution_clock that is also guaranteed to be monotonic.

Given that we have reliable low-level API code, it does not make much sense to give away control to glue layers and then take the effort to investigate whether the implementation suits our needs. This is too much work IMHO, but it can of course be done.

So I would either tidy up our own established PerformanceTimer, or copy the pre-Qt 6.6 QElapsedTimer instead and add the missing function.

@Swiftb0y (Member Author) commented Aug 22, 2023

The standard defines neither a resolution nor an overflow behavior. Yes, it provides an API to query it like you proposed, but that needs to be done at runtime, because it depends on the OS and std implementation.

Why can't we do this at compile time? We know the target OS and the target std lib at compile time. After all, that's how the platform-specific APIs are switched out at compile time too (using #ifdefs).

@Swiftb0y (Member Author)

We need difference(), aka durationTo(), because all other ns-precision functions require two calls to now().

Not really; it could even be kept just as concise with a small helper function.
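
For example, a small helper along these lines (a sketch; name and defaults are illustrative):

#include <chrono>

// Hypothetical helper: measuring elapsed time against a stored time_point
// needs only one clock query per measurement; the "second now()" is just
// the subtraction operand.
template <typename Duration = std::chrono::nanoseconds>
Duration elapsedSince(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration_cast<Duration>(
            std::chrono::steady_clock::now() - start);
}

// usage: const auto micros = elapsedSince<std::chrono::microseconds>(start);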

I am worried about std::chrono clocks, because there is no high_resolution_clock that is also guaranteed to be monotonic.

Right, but high_resolution_clock is monotonic on Windows, and steady_clock is high-resolution enough on macOS and Linux (which can be asserted at compile time).
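
A sketch of what asserting that at compile time could look like (the platform claims mirror the comment above and are assumptions, not standard guarantees):

#include <chrono>
#include <ratio>

// Sketch: select the clock per platform at compile time.
#if defined(_WIN32)
using PerformanceClock = std::chrono::high_resolution_clock;
static_assert(PerformanceClock::is_steady,
        "assumption: high_resolution_clock is monotonic on this platform");
#else
using PerformanceClock = std::chrono::steady_clock;
static_assert(std::ratio_less_equal_v<PerformanceClock::period, std::micro>,
        "assumption: steady_clock declares at least microsecond resolution here");
#endif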

@Swiftb0y (Member Author) commented Aug 22, 2023

We need difference(), aka durationTo(), because all other ns-precision functions require two calls to now().

I still don't like difference, since it has nothing to do with the actual elapsed time being managed by QElapsedTimer. It simply compares the two timers' starting times.

While reworking VSyncThread, and in turn VisualPlayPosition, we're passing a PerformanceTimer object through multiple layers (possibly even across threads, unguarded!!!) just in order to eventually call difference on it. Simply passing a timestamp would be so much easier and safer; simply better.

There is no conceptual reason for difference to exist on a timer class directly. Abusing QElapsedTimer or PerformanceTimer for managing timestamps is wrong!

@daschuer (Member)

Why can't we do this at compile time?

Because we use the same binary on different OS versions.
The functions used are not constexpr. Sure, we can look up implementation details, assume behavior, and then assert it at runtime. It's just that we already have an established solution using the low-level API.

steady_clock is high-resolution enough

What is the resolution on Linux? What was the resolution of our low-level implementation? I don't think it is reasonable to pick a timer with a lower resolution plus a glue layer that shifts the overflow artificially. We should aim for the fastest solution with maximum precision.

I still don't like difference(). Qt 6.6 has durationTo(), which fulfils the same use case. Maybe we can introduce that.

PerformanceTimer is nothing more than a timestamp. It can be passed around as you like. durationTo() is just a renamed operator-() implementation.

@Swiftb0y (Member Author)

Okay, so let's come to a compromise: we keep the previous low-level implementation but implement the high-level interface using std::chrono primitives.

PerformanceTimer is nothing more than a timestamp. It can be passed around as you like. durationTo() is just a renamed operator-() implementation.

Under the hood it's a timestamp, yes, but it does not have the value semantics you would expect from a timestamp. As such, we should avoid passing it around as one!

What is the resolution on Linux? What was the resolution of our low-level implementation?

Let me investigate; I doubt there is any difference with libstdc++'s timer implementation.

@Swiftb0y (Member Author)

This is my crude test, suggesting that there is no difference between steady_clock and high_resolution_clock under Linux (I assume that's what gcc runs on on Compiler Explorer): https://compiler-explorer.com/z/ex6o3srTY

@daschuer (Member) commented Aug 22, 2023

Your test measures the duration of the loop. I have modified it to measure the duration of two consecutive timer calls, compared to the low-level API: https://compiler-explorer.com/z/1PnaTqWGE
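
The shape of that measurement, as a simplified sketch (not the exact Compiler Explorer snippet):

#include <chrono>
#include <cstdio>

// Simplified sketch of the consecutive-call test: the smallest nonzero delta
// between two back-to-back now() calls approximates the effective resolution.
int main() {
    using Clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < 100000; ++i) {
        const auto t1 = Clock::now();
        const auto t2 = Clock::now();
        const auto d = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
        if (d.count() > 0 && d < best) {
            best = d;
        }
    }
    std::printf("smallest nonzero delta: %lld ns\n",
            static_cast<long long>(best.count()));
}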

I think we see that we have ns resolution in both cases.

Okay, so let's come to a compromise: we keep the previous low-level implementation but implement the high-level interface using std::chrono primitives.

Thank you, that works for me.

As such, we should avoid passing it around as one!

The trick here is that we compare the highest-resolution value, which will naturally overflow. If you store the value scaled as a chrono ns type, you lose that property. That's why std::chrono::time_point exists; passing that around is equivalent to our original solution with difference() or Qt's durationTo(). So I am afraid we do not get around that or something similar.

@Swiftb0y (Member Author)

I think we see that we have ns resolution in both cases.

Right... so using chrono clocks instead of the low-level ones is an option after all?

That's why std::chrono::time_point exists; passing that around is equivalent to our original solution with difference() or Qt's durationTo(). So I am afraid we do not get around that or something similar.

Right; my point is that we should not compare and pass around timers but rather std::chrono::time_points, as those have explicit value semantics. The solutions are equivalent in practice, but abusing timers as timestamps is more brittle (since a timer can be invalid when it hasn't been started yet).

@Swiftb0y (Member Author)

fyi, digging deeper into the low-level APIs, there is no guarantee about the precision of timers on macOS either, as far as I can tell... I still think we should just use std::chrono::steady_clock. If we need to ensure some precision at runtime, let's just port the previous demo code to a unit test... Yes, the precision is undefined, but it's not the same kind of UB where the compiler will break our code if we rely on it. Platforms whose monotonic clock resolution is too low we simply can't support, as is already the case with the current PerformanceTimer implementation, evident from its dummy fallback:

// default implementation (no hi-perf timer) does nothing
void PerformanceTimer::start()
{
}

mixxx::Duration PerformanceTimer::elapsed() const
{
    return mixxx::Duration::fromNanos(0);
}

mixxx::Duration PerformanceTimer::restart() const
{
    return mixxx::Duration::fromNanos(0);
}

mixxx::Duration PerformanceTimer::difference(const PerformanceTimer& timer) const
{
    return mixxx::Duration::fromNanos(0);
}
#endif
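
A runtime check along the lines discussed above could look roughly like this (a sketch assuming GoogleTest; test name and bound are illustrative):

#include <algorithm>
#include <chrono>

#include <gtest/gtest.h>

// Sketch of the proposed runtime check as a unit test: fail on platforms
// whose steady_clock cannot resolve two consecutive calls within a
// deliberately loose bound.
TEST(PerformanceTimerTest, SteadyClockResolutionIsUsable) {
    using Clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < 10000; ++i) {
        const auto t1 = Clock::now();
        const auto t2 = Clock::now();
        const auto d = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
        if (d.count() > 0) {
            best = std::min(best, d);
        }
    }
    // loose bound on purpose; a tight bound would be brittle under CI load
    EXPECT_LT(best, std::chrono::microseconds(100));
}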

@daschuer (Member)

Right... so using chrono clocks instead of the low-level ones is an option after all?

What are your reasons to use it?

The solutions are equivalent in practice, but abusing timers as timestamps is more brittle (since a timer can be invalid when it hasn't been started yet).

Since our PerformanceTimer is basically a timestamp, I do not see an abuse here. But of course we can wrap the timestamp part of PerformanceTimer into a PerformanceTimeStamp class or such and pass that around instead.

std::chrono::time_points, as those have explicit value semantics.

I don't get that point. You always need to build the difference to get a human-readable value. That is exactly the same with PerformanceTimer::difference().

We have good experience with the low-level timers and no reported issues with them. So we need some good reasons to replace them with an additional abstraction layer that costs more CPU time, developer and review resources, and introduces additional dependencies we can't control.

@Swiftb0y (Member Author)

What are your reasons to use it?

Less code, less maintenance. It makes it easier to benefit from modern abstractions.

Since our PerformanceTimer is basically a timestamp, I do not see an abuse here. But of course we can wrap the timestamp part of PerformanceTimer into a PerformanceTimeStamp class or such and pass that around instead.

Yes, I'd be more okay with a method that extracts a std::chrono::time_point from a timer, which can then be passed around.
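
A minimal sketch of such an accessor (hypothetical names, not code from this PR):

#include <chrono>

// Hypothetical sketch: PerformanceTimer stays the stopwatch, but exposes its
// start as a plain std::chrono::time_point value that can be passed around.
class PerformanceTimer {
  public:
    void start() {
        m_start = std::chrono::steady_clock::now();
    }
    // the extracted timestamp: a value type with the expected semantics
    std::chrono::steady_clock::time_point startTime() const {
        return m_start;
    }

  private:
    std::chrono::steady_clock::time_point m_start{};
};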

@daschuer (Member) commented Aug 23, 2023

What are your reasons to use it?

Less code, less maintenance. It makes it easier to benefit from modern abstractions.

Not touching the code without an issue is the minimum of maintenance.

We don't make use of the modern abstractions, nor do we have any benefit or demand for them.

I think creating a chrono::time_point from get_time() is exactly what std::chrono::steady_clock does under the hood, so there is no reason to code it ourselves. Unfortunately, this does not come without cost. But in our case there is no reason for a common abstraction layer, so let's not spend that cost.

If you want to use the modern abstractions in places where precision and performance are not a concern, you can already use plain QElapsedTimer or the chrono functions. There is no reason to wrap them; I consider using a Mixxx-specific function more difficult than a Qt or std API function.

Did you consider splitting a PerformanceTimeStamp out of PerformanceTimer? To my understanding, that addresses your concern best. I fully agree that passing around something called a "Timer" looks odd.

Something like this:

class PerformanceTimeStamp
{
  public:
    PerformanceTimeStamp() {
        t1 = 0;
#if defined(Q_OS_UNIX) && !defined(Q_OS_MAC)
        t2 = 0;
#endif
    }

    mixxx::Duration operator-(const PerformanceTimeStamp& time_stamp) const {
#if defined(Q_OS_UNIX) && !defined(Q_OS_MAC)
        // Linux: two counters (seconds + nanoseconds), combined into nanos
        return mixxx::Duration::fromNanos(
                (t1 - time_stamp.t1) * Q_INT64_C(1000000000) + t2 - time_stamp.t2);
#else
        // macOS/Windows: a single tick counter converted to nanoseconds
        return mixxx::Duration::fromNanos(absoluteToNSecs(t1 - time_stamp.t1));
#endif
    }

  private:
    qint64 t1;
#if defined(Q_OS_UNIX) && !defined(Q_OS_MAC)
    qint64 t2;
#endif
};

For the rest we can use QElapsedTimer directly. 
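
Hypothetical usage of such a stamp type (the currentTimeStamp() capture helper is assumed, not part of the sketch above):

// take a stamp where the frame starts, pass the plain value through the
// layers, subtract where the duration is needed
const PerformanceTimeStamp frameStart = currentTimeStamp();
// ... hand frameStart across VSyncThread / VisualPlayPosition ...
const PerformanceTimeStamp syncPoint = currentTimeStamp();
const mixxx::Duration sinceFrameStart = syncPoint - frameStart;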

@Swiftb0y (Member Author)

I think creating a chrono::time_point from get_time() is exactly what std::chrono::steady_clock does under the hood, so there is no reason to code it ourselves. Unfortunately, this does not come without cost.

What cost?

Did you consider splitting a PerformanceTimeStamp out of PerformanceTimer?

Why not use std::chrono::time_point instead of reinventing the wheel?

@Swiftb0y (Member Author) commented Aug 31, 2023

I doubt that this will work on modern CPUs with power management - they vary the clock speed depending on CPU load and core temperature.

I agree; that test will get brittle really fast. I've already observed in the timers-benchmark branch that the median duration between subsequent now() calls is 200 nanoseconds!!! That's why I set the minimum resolution in the initial benchmark to such a high bound.

That said, I did spend/waste some time on my own attempt at implementing the benchmark you suggested in a robust manner. I'll push the PoC later. Never mind: that approach hinged on Google Benchmark exposing its CycleClock, which it doesn't, and I don't want to vendor that.

@daschuer (Member) left a comment

Looks good now. Before merging, you need to remove the draft state.

@Swiftb0y (Member Author) commented Sep 6, 2023

Thank you. Just tidied up the test a little and amended.

@Swiftb0y (Member Author) commented Sep 6, 2023

Of course, ancient AppleClang has no C++20 ranges support...

@Swiftb0y (Member Author) commented Sep 6, 2023

All green, no draft. Let's go.

@daschuer (Member) left a comment

Thank you. LGTM.

@JoergAtGithub merged commit 8a16387 into mixxxdj:main on Sep 6, 2023 · 13 checks passed
@Swiftb0y (Member Author) commented Sep 6, 2023

Thank you. And let's please reduce the arguing over unnecessary concerns in the future.

@Swiftb0y (Member Author) commented Sep 6, 2023

I will also take the liberty of removing the steady_clock test if it turns out to be brittle.

@Swiftb0y deleted the refactor/timers branch Sep 6, 2023
@Swiftb0y mentioned this pull request Sep 14, 2023