[linux] SystemClock is not reliable shortly after waking from sleep #4375

Open · jamshark70 opened this issue Apr 5, 2019 · 7 comments · 3 participants

@jamshark70 (Contributor) commented Apr 5, 2019

Short summary

sclang and scsynth follow std::chrono::high_resolution_clock.

In Linux, it appears that the high-resolution clock is, in fact, not monotonic. I've observed time jumps of several hundred milliseconds in an instant, when using SC shortly after waking my system from sleep.
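
For reference, the C++ standard does not require std::chrono::high_resolution_clock to be monotonic, and on libstdc++ (the usual Linux toolchain) it has, as far as I know, historically been an alias for system_clock, which can be stepped by NTP or on resume. A minimal standalone check (not SC code) of what a given platform/toolchain actually provides:

// check_clocks.cpp -- print whether each std::chrono clock promises monotonic behaviour
#include <chrono>
#include <iostream>

int main() {
    std::cout << std::boolalpha
              << "high_resolution_clock::is_steady = "
              << std::chrono::high_resolution_clock::is_steady << '\n'
              << "steady_clock::is_steady          = "
              << std::chrono::steady_clock::is_steady << '\n'
              << "system_clock::is_steady          = "
              << std::chrono::system_clock::is_steady << '\n';
}

Where this prints false for high_resolution_clock, a wall-clock step (NTP correction, resume from suspend) would show up directly in anything scheduled off that clock.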

Nobody noticed this in some 15+ years of the Linux version existing, probably because there have never been any serious efforts to sync SC with other software. Now we have LinkClock, so the problem becomes more visible.

Environment

  • SuperCollider version: 3.10.2
  • Operating system: Ubuntu Studio 18.04

Steps to reproduce

I can reproduce a discontinuity in SystemClock times fairly reliably this way. There is no fixed amount of time before the problem appears: it may be a couple of minutes, it may be 15-30 minutes.

  1. Suspend the OS session; sclang should not be running when the computer is put to sleep.

  2. Wake the computer from sleep.

  3. Launch SC.

  4. Start running some patterns. Eventually there will be a brief period of "late" messages from scsynth. In my experience, it recovers quickly.

Alternately, use a 3.11 test build and run a LinkClock alongside an external, non-SC Link peer.

TempoClock.default = t = LinkClock.new.latency_(0.2);
// run some patterns

When the problem happens, the SC LinkClock will desynchronize from the other peers. I've personally witnessed this behavior (SC LinkClock vs Ableton Live). The only way to recover is to reboot the interpreter (reinitializing the SC clocks).

Expected vs. actual behavior

Expected: 1 second = 1 second, always, within accepted tolerance.

Actual: I have testing data from my DDWOscSyncClocks, where:

Index  Local SystemClock.seconds  Remote SystemClock.seconds  local - remote
1      467.835693893              531.06219482422             -63.22650093122
2      468.5567713                531.17431640625             -62.61754510625

The remote clock advanced by c. 0.112 sec, and in the same amount of time, the local clock advanced by c. 0.721: over 600 ms ahead of where it should be. Allowing for clocks on different machines to drift slightly over time, it's pretty safe to say that a 500-600% jump ahead is neither within any reasonable tolerance, nor musically useful. Times remained consistent after the jump; clock times were permanently affected.

I'm not sure of all of the ramifications of switching SC in Linux to use a monotonic clock. But this finding is pretty bad. It may be worth some risk.

Windows, btw, is very likely using QueryPerformanceCounter which (I think) is monotonic. MacOS, I don't know. It's probably a Linux-only problem.

(Logging for sclang + scsynth + supernova, because their clocks must all agree.)

@muellmusik (Contributor) commented Apr 5, 2019

high_resolution_clock is monotonic on OSX I believe. On Windows, it is not on VS2012 at least:

https://solarianprogrammer.com/2012/10/14/cpp-11-timing-code-performance/

Linux, don't know.

Shifting linux to a monotonic clock (if it's not!) could help. At a glance a potential issue is that SC_PortAudio and SC_Jack seem to use different sync algorithms and a different system clock. I haven't had time to weed through all that and see if they're somehow equivalent to SC_CoreAudio, but it's worth a look.

Just to muddy the waters, I've also noticed sudden bursts of late messages on OSX in the last year or so, but can't say if they're related to waking from sleep.

Resyncing the scheduling clock with wall time after awaking might be a good idea, and could fix this regardless of the clocks.

That said, of course whatever clock is used still needs to (re-)sync to OSC time. If the OS NTP implementation is doing something funky with how it updates, none of this may solve the problem.

"A man with a watch knows what time it is. A man with two watches is never sure." ;-)

@sensestage (Contributor) commented Apr 5, 2019 (comment minimized)

@jamshark70 (Contributor, Author) commented Apr 6, 2019

Shifting linux to a monotonic clock (if it's not!) could help.

If it's skipping ahead 600 ms in an instant, it either simply isn't monotonic, or it's trying to be monotonic and there's a bug.

At a glance a potential issue is that SC_PortAudio and SC_Jack seem to use different sync algorithms and a different system clock.

Hm, yeah, that's a recipe for confusion. Would be good to unify that.

That said, of course whatever clock is used still needs to (re-)sync to OSC time. If the OS NTP implementation is doing something funky with how it updates, none of this may solve the problem.

Hm, that is tricky -- if the monotonic clock doesn't precisely follow the system clock, then you would have to boot the server very quickly after launching sclang.

For multiple machines, I'm starting to think we might need a command line switch. There are two cases:

  • 2+ machines, NTP synced, where sclang on machine A is sending timestamped bundles to sclang or scsynth on machine B. Would a monotonic clock be valid for this? Maybe not. If monotonic clocks on two different machines run at slightly different rates, they would drift apart, and the drift would probably be noticeable within a half hour (see the rough estimate after this list).

  • 2+ machines, not necessarily NTP synced, using LinkClock to stay together. Ableton Link uses a monotonic clock internally and assumes that apps will not disagree with that. In this case, I think SC components should use a monotonic clock for local scheduling, and rely on Link to adjust. (I can reproduce broken Link sync pretty much on demand.)
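
As a rough, back-of-the-envelope illustration of the first case (the 50 ppm figure is an assumption, not a measurement): ordinary crystal oscillators commonly differ by tens of parts per million, so two free-running monotonic clocks with a 50 ppm relative rate error separate by 0.00005 × 1800 s ≈ 90 ms over half an hour, which is already enough to hear.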

Or, just document that it's recommended not to perform with SC immediately after waking the machine from sleep -- wake the machine at least 30 minutes before your set.

Marije:

This was in 2012 - so the remark 15+ years of the linux version existing may be incorrect

Thanks for the clarification -- I hadn't checked, good eye. Still, 7 years is kind of a long time. (I didn't spot it until I had made a live coding system that made it feasible to open the computer, launch SC and start playing music within a minute or two.)

@muellmusik (Contributor) commented Apr 7, 2019

If it's skipping ahead 600 ms in an instant, it either simply isn't monotonic, or it's trying to be monotonic and there's a bug.

Aw sorry misunderstood. If it's definitely the high_resolution_clock (and not periodic correction to wall time) then you're right.

For multiple machines, I'm starting to think we might need a command line switch...

Well, I suggested just using the monotonic clock for both lang and server for local only cases. But I wouldn't get too complicated and use both for mixed situations. Either it's on or not I'd say.

Or, just document that it's recommended not to perform with SC immediately after waking the machine from sleep -- wake the machine at least 30 minutes before your set.

No, that's too much for me.

Forcing a resync of the offset between the monotonic clock and wall time on wake might be sensible. It might be possible to ask for an NTP hard sync as well, but I'm not sure whether that's possible, or whether it would affect other apps in a negative way. (Though it sounds like something is funny anyway...)

@jamshark70 (Contributor, Author) commented Apr 7, 2019

Aw sorry misunderstood. If it's definitely the high_resolution_clock (and not periodic correction to wall time) then you're right.

AFAICS there's a thread to sync the OSC offset to wall time, but we don't do the same for hrTimeOfInitialization. The latter is used for elapsedTime(), and elapsedTime() provides SystemClock.seconds. My data are comparing SystemClock.seconds between two machines. Unless I'm missing something, hrTimeOfInitialization is stable and the jump in time value could be coming only from the clock.

Also, when I tested against Link, I used the GUI in the LinkClock help file. The display update does not depend on OSC time in any way, but the desynchronization was plainly visible.

Unless there is code in there to resync hrTimeOfInitialization that I missed, it seems to be the hr clock.

Maybe a solution would be to resync hrTimeOfInitialization periodically as well.
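
A minimal sketch of that idea, assuming scheduling runs on the steady clock and only the conversion offset to wall/OSC time is refreshed (names and structure are illustrative, not SuperCollider's actual internals):

// resync_sketch.cpp -- keep scheduling on the monotonic clock, but periodically
// re-measure the steady-to-wall offset used when converting to OSC timestamps.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

using steady = std::chrono::steady_clock;
using wall   = std::chrono::system_clock;

// Offset in nanoseconds such that wall_time ~ steady_time + g_offsetNs.
static std::atomic<std::int64_t> g_offsetNs{0};

static std::int64_t measureOffsetNs() {
    using namespace std::chrono;
    // The two reads are not taken atomically together; for a sketch the
    // sub-microsecond skew between them does not matter.
    auto s = duration_cast<nanoseconds>(steady::now().time_since_epoch()).count();
    auto w = duration_cast<nanoseconds>(wall::now().time_since_epoch()).count();
    return w - s;
}

// Scheduling time: purely monotonic, never affected by wall-clock steps.
static double elapsedSeconds(steady::time_point since) {
    return std::chrono::duration<double>(steady::now() - since).count();
}

// Conversion for outgoing timestamps: picks up NTP steps / resume jumps
// within one refresh period instead of never.
static std::int64_t steadyToWallNs(steady::time_point t) {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(t.time_since_epoch()).count()
         + g_offsetNs.load(std::memory_order_relaxed);
}

int main() {
    const auto start = steady::now();
    g_offsetNs.store(measureOffsetNs());
    // Background resync thread, analogous to the existing OSC-offset sync thread.
    std::thread([] {
        for (;;) {
            g_offsetNs.store(measureOffsetNs(), std::memory_order_relaxed);
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }).detach();
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::printf("elapsed: %.6f s, now as wall ns: %lld\n",
                elapsedSeconds(start),
                static_cast<long long>(steadyToWallNs(steady::now())));
}

The design point is that a wall-clock step then only affects timestamp conversion, never the scheduler's notion of elapsed time.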

Agreed on everything else.

@jamshark70 (Contributor, Author) commented Apr 26, 2019

BTW: As a test, I'm currently running a local SC version where I've replaced references to high_resolution_clock with steady_clock, in both sclang and scsynth. (There aren't that many, in fact.)

So far, I haven't observed any bad timing behavior. OSC messaging latency behaves correctly. I tried it with ddwOSCSyncClocks and the behavior was at least no worse than before -- in some tests it was better, but that turned out not to be consistent.

I haven't tried it with Link yet.

So it might be reasonable, in Linux, to replace the default SystemClock reference, and maybe to introduce a commandline switch.
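
The replacement itself could be centralised behind a single alias so the choice lives in one place; a sketch (SC_USE_STEADY_CLOCK is a hypothetical build flag, not an existing option, and a true runtime/command-line switch would need an extra indirection on top of this):

// clock_choice_sketch.hpp -- all timing code uses sc_clock instead of naming a
// std::chrono clock directly, so the clock can be swapped in one place.
#include <chrono>

#ifdef SC_USE_STEADY_CLOCK
using sc_clock = std::chrono::steady_clock;            // monotonic, immune to wall-clock steps
#else
using sc_clock = std::chrono::high_resolution_clock;   // current behaviour
#endif

inline double sc_elapsed_seconds(sc_clock::time_point since) {
    return std::chrono::duration<double>(sc_clock::now() - since).count();
}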

Tests to be done:

  • Tuplets (if steady_clock's resolution is lower, does that mean triplets, quintuplets etc. would drift?)

  • Link, especially Link after waking my Linux machine from sleep.

@jamshark70 (Contributor, Author) commented May 19, 2019

I found recently that it isn't enough just to replace high_resolution_clock with steady_clock.

Occasionally I get streams of the following in the post window:

late 0.476187244
late 0.475691044
late 0.477262354
late 0.476695704
late 0.472331727
late 0.477599001
late 0.473280435
late 0.478532745
late 0.474173476
late 0.473699415
late 0.475085451
late 0.474630425

What's interesting is that these are all approximately the same value. If the interpreter were blocking while sequencing is running, you would see decreasing "late" amounts as the scheduler catches up; here the values stay roughly constant, so it isn't interpreter blockage.

So I guess what is happening is that the language's OSC time is getting separated from scsynth's OSC time, which would happen if the two processes' resync threads are waking up at different times... but this would also require that the clock "jumps" (if you have two clocks increasing monotonically, you wouldn't see things suddenly late by hundreds of ms) -- meaning also that even steady_clock might not be truly steady.

This is a major bummer. Is there actually no reliable timing mechanism in Linux?
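
One way to narrow this down would be a small standalone diagnostic (not SC code) that logs the offset between steady_clock and the wall clock once per second; a sudden step in the printed offset right after a suspend/resume shows which side moved. Note that on Linux steady_clock typically wraps CLOCK_MONOTONIC, which does not advance while the machine is suspended, so some offset step on resume is expected even when both clocks are otherwise well behaved.

// clock_offset_log.cpp -- print the wall-minus-steady offset once per second.
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    using namespace std::chrono;
    for (;;) {
        const double steadyS = duration<double>(steady_clock::now().time_since_epoch()).count();
        const double wallS   = duration<double>(system_clock::now().time_since_epoch()).count();
        std::printf("wall - steady: %.6f s\n", wallS - steadyS);
        std::this_thread::sleep_for(seconds(1));
    }
}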

Possible workaround, btw, might be to add a /resync server command. Then the language could tell the server when it's about to resync its OSC time and they both do it at about the same time.
