sclang crash on system time sync #399
Comments
|
What are you using to set the time on network connection? Does the hardware have an RTC? |
|
the hardware does not have an RTC. it didn't occur to me that this would ever be an issue unfortunately. |
|
Okay. Apparently you can use Something like I guess the question is what counts as a "substantial amount"? If the issue is that it crashes when going from 1st Jan 1970 to the current date, but not if jump is only a few weeks/months then Otherwise someone will need to fire up This might be handy for setting up a reproducible crash on a desktop machine: https://serverfault.com/questions/138325/faking-the-date-for-a-specific-shell-session |
|
@artfwo has already done a lot of work on this including with |
|
Bother. Tried using
(skipped some output)
The 2 calls to |
|
[now seems like i was wrong about that, sorry] |
|
and it might be related to other stuff we're doing, like grabbing bus values with the shared memory interface |
|
I did try booting the server, but not in depth. I've edited the bug title to say Just tried on my desktop again (no Norns yet).
(skipped output)
(skipped output)
No crash with a I think someone that can observe a crash will need to try and minimise it. If you've got edit: I probably should say I can't guarantee that |
|
this is reproducible only in part, even when you change the actual system time using and yeah, raspbian is already using fake-hwclock+systemd-timesyncd out of the box. we haven't touched those. |
|
updated title since we're not actually sure what the crash mechanism is, only that it can be (very sporadically) reproduced when enabling wifi |
|
it's related to timesync; this can be verified by watching syslog when the crash happens. it's also clear that the bug becomes less sporadic if you keep the box switched off for a significant amount of time (hours). |
|
Can confirm it's related to the time change, can reproduce this way:
This also happens when I just run The odd thing is that manually changing the date doesn't seem to trigger this issue. I'm not sure why yet. Maybe it's related to how systemd-timesyncd works/changes the time? There is mention of gradually adjusting the time here https://www.freedesktop.org/software/systemd/man/systemd-timesyncd.service.html
One other odd thing, I've run into the following a couple of times:
Not sure what's causing it. I did a diff and the unit that's loaded is no different than the unit on disk |
|
Apparently
https://www.freedesktop.org/software/systemd/man/systemd-timesyncd.service.html I'm wondering if there is some weird interplay between them. |
|
So, it seems this is a bug in SuperCollider: if the system time changes by a large enough amount, it'll consume 100% CPU for an extended period, which will eventually result in a kernel panic. I've been able to replicate this on my laptop as well, so it's not related to ARM/the CM3.
When using smaller time deltas the following happens:
I'll file a bug for SuperCollider for this. I think for now the simplest way to address this is to disable timesync/systemd-timesyncd. This does mean we won't have timesync of course, but I don't think the system time is used in many places. @artfwo mentioned system time is used for naming recordings, we should probably change that to use something else like an incrementing number for this. |
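A minimal sketch of the incrementing-number idea for recording names, so that file naming no longer depends on the system time. The function name `next_recording_path`, the `rec` prefix, and the `.wav` extension are hypothetical choices for illustration, not anything taken from the norns code.

```python
import re
from pathlib import Path


def next_recording_path(directory, prefix="rec", ext=".wav"):
    """Return directory/prefix-NNNN.ext, one above the highest existing index.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    directory = Path(directory)
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+){re.escape(ext)}$")
    highest = 0
    if directory.is_dir():
        for entry in directory.iterdir():
            match = pattern.match(entry.name)
            if match:
                highest = max(highest, int(match.group(1)))
    # Zero-pad so that a plain alphabetical listing stays in recording order.
    return directory / f"{prefix}-{highest + 1:04d}{ext}"
```

Scanning the directory on each call means the counter survives reboots without any extra state file, at the cost of one directory listing per recording.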
|
forgive my ignorance, but would that be the same system time used by posix we use that in metronomes: we also use |
|
@catfact Someone who knows C should probably answer that, but I think it sometimes is, depending on the first argument passed to it, see https://stackoverflow.com/a/12480485 |
|
thanks for checking. Ok, looks like metros should be fine (CLOCK_MONOTONIC) and |
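To illustrate why a CLOCK_MONOTONIC timer should be unaffected while anything based on wall-clock time is not, here is a small sketch in Python rather than C. It assumes Python's `time.monotonic()` and `time.time()`, which on Linux are typically backed by the monotonic and realtime clocks respectively. `wall_clock_jump` is a hypothetical helper name.

```python
import time


def sample_clocks():
    """Return (wall, mono): wall-clock seconds and monotonic seconds."""
    return time.time(), time.monotonic()


def wall_clock_jump(prev_wall, prev_mono, now_wall, now_mono):
    """Estimate how far the wall clock was stepped between two samples.

    The monotonic clock is not stepped when the system time is set, so any
    difference between the two elapsed times is attributed to a wall-clock
    adjustment (for example a time sync after boot).
    """
    return (now_wall - prev_wall) - (now_mono - prev_mono)


if __name__ == "__main__":
    wall0, mono0 = sample_clocks()
    time.sleep(1.0)
    wall1, mono1 = sample_clocks()
    # Close to 0.0 unless the system time was changed during the sleep.
    print(wall_clock_jump(wall0, mono0, wall1, mono1))
```

A deadline computed from the monotonic reading keeps its meaning across a time sync; a deadline computed from the wall-clock reading can suddenly be far in the past or future.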
|
Can someone who has repro'd attach an actual crash stack to this issue? |
|
i'll keep trying but haven't been able to get a nice stack trace or even a proper crash actually. what i get is arbitrarily long hangs where the server is just totally unresponsive. sclang output (in syslog since we're running it with systemd) is silent except when next will try restructuring things to work with a remote server and launching scsynth explicitly, maybe can catch more of its output. |
|
no, not getting kernel panics, just hangs. like last time i tried, it was running the
maybe i should take a snapshot of my filesystem since i haven't actually been using the update/sync process. :/ |
|
i've seen this jam-up of events happen typically with WIFI activation or attempted activation. this also happened with the old update method, where a long (20 seconds) os.execute was run, effectively stalling lua execution, and then all the metros would catch up and trigger at once. (os.execute should be run in a coroutine or the script should fork immediately, but the update routine has changed so this isn't an issue any longer) |
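The "all the metros catch up and trigger at once" behaviour can be avoided by skipping missed ticks after a stall instead of replaying them. The following is only a sketch of that policy. It is not how the norns metros are implemented, and `advance_metro` is a hypothetical name.

```python
def advance_metro(scheduled, period, now):
    """Decide what a periodic timer should do at time `now`.

    scheduled: deadline of the next pending tick (monotonic seconds)
    period:    tick interval in seconds (must be > 0)
    Returns (due, skipped, next_deadline).
    """
    if now < scheduled:
        # Nothing to do yet; keep the current deadline.
        return False, 0, scheduled
    # Whole periods that elapsed after the pending tick while we were stalled.
    skipped = int((now - scheduled) // period)
    # Fire once, then resume on the original grid at the first deadline after now.
    return True, skipped, scheduled + (skipped + 1) * period
```

For example, with a 0.5 s period, a tick scheduled at t = 10.0 and a stall until t = 30.25, the timer fires once, reports 40 skipped ticks, and schedules the next tick at t = 30.5 instead of firing 41 callbacks back to back.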
|
@simonvanderveldt @artfwo is there a strong reason not to simply disable systemd-timesyncd so we can move past this in the short term? |
@tehn I've been running with it disabled for the last two weeks. Haven't noticed any issues because of the time being out of sync. There might be some funky issues eventually, mainly looking at security/encryption related things which can depend on some form of a somewhat sane time. But for example installing deb packages worked fine. @artfwo I believe you were mainly concerned about stuff like log rotation, right? |
|
it would be worth testing the usb sync feature - iirc the rsync command being used is trying to do deltas in order to minimize the amount of data being synced. what i’m not sure of is whether rsync is using file modification timestamps as part of its selection criteria. |
|
(Non-local) OSC bundles require time sync to work - if you ever have an external device sending timing-sensitive OSC messages (or, for that matter, any OSC bundles that have a timestamp at all, whether or not you really care about timing), it'll be broken if your clocks aren't synced. Maybe not an immediate need, but definitely something to be aware of. |
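To make the dependency on synced clocks concrete: an OSC bundle carries a 64-bit NTP-style time tag (32 bits of seconds since 1900-01-01, 32 bits of fraction), and the receiver compares it against its own clock. The sketch below uses hypothetical helper names and ignores the special "immediately" time tag.

```python
import math

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def unix_to_osc_timetag(unix_seconds):
    """Convert Unix time (float seconds) to a 64-bit OSC time tag."""
    whole = math.floor(unix_seconds)
    fraction = int((unix_seconds - whole) * 2**32)
    return (((whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF) << 32) | fraction


def osc_timetag_to_unix(timetag):
    """Convert a 64-bit OSC time tag back to Unix time (float seconds)."""
    seconds = timetag >> 32
    fraction = timetag & 0xFFFFFFFF
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32


def delivery_delay(timetag, receiver_unix_now):
    """Seconds the receiver would wait before executing the bundle.

    Positive: the bundle looks like it is in the future.
    Negative: the bundle looks late.
    """
    return osc_timetag_to_unix(timetag) - receiver_unix_now
```

If the sender stamps a bundle with its (synced) current time and the receiver's clock is still near the 1970 epoch, `delivery_delay` comes out as decades rather than milliseconds, which is the kind of breakage described above.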
|
@scztt Thanks! We can try that [edit] as @catfact said, if we're going to use supernova this probably won't be relevant. |
|
... not if we're using supernova? seems totally different @tehn agree, i got one too a couple hours ago. |
|
@simonvanderveldt https://github.com/supercollider/supercollider/blob/d0c3e438ec9cefda19af08d01db0b536829458f5/server/supernova/sc/sc_osc_handler.cpp#L730 is the corresponding chunk for supernova. This fix could possibly be mocked up in a very loose fashion using the |
|
short term solution.
|
|
we'll have to do
|
@tehn Why doesn't that work? |
|
@simonvanderveldt @tehn it works for me |
|
three more points:
|
|
oh indeed it works, i made a mistake in my launching earlier.
!!! would be nice on shutdown as well. |
|
I've run into reproducible screen lock ups, button latency and audio drop outs when I turn on wifi. I'm worried that all these hard resets might make the filesystem on the eMMC get corrupt. So I've been doing a reboot via SSH, which doesn't actually reboot the screen or wifi. I wonder if disabling NTP will fix this problem? Now that I have a shell, I'm tempted to do "normal linux stuff". |
|
@lazzarello when you turn on wifi and NTP syncs you MUST reboot crone. otherwise you'll experience problems. i'm preparing an update which disables crone on wifi connection, and implemented a menu option to reboot crone. |
|
fyi: i'm running into problems with the new wifi logic.
|
|
thanks for the report. i need to prioritize a wifi overhaul. the only thing that changed is that upon wifi activation crone is shut down (to be manually re-activated after connection is established.) i should add a message when activating wifi that explains this. i'm not sure why it would take two attempts, and why wifi is still saying failed, however. |
|
@tehn Why was this closed? I don't think we have a solution for this yet, right? |
|
this was closed in the great-ticket-purge at the march meetup. likely as it wasn't reported as a problem for some time. have there been issues with this? (AFAIK, no?) |
|
It still happens every now and then that someone runs into this. Maybe we should keep this open to keep track of it? Although realistically there isn't much we'll be able to do about it ourselves. |
|
Mentioned by @lazzarello: “I wonder if disabling NTP will fix this problem?” This is probably one way to address this. Or have people manually sync upon confirming that the script/engine is restarted or something (weird “fix” but would probably work anyhow). |
|
And this is still messy: any engine or script using high-rate polls stalls upon a time change. I think there’s reason to reopen this. |
|
ah, i forgot about the poll-heavy crash on startup w/ wifi attached. |
sclang crashes when time/date change by a substantial amount