Cpu hog 100% by sclang when it's built without QT #2144
Comments
|
What OS, environment, compiled ... ? |
|
OS - Ubuntu-16.04, |
|
OS - Ubuntu-16.04, Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz |
|
@hardiksingh-rathore can you still reproduce this? |
|
Actually, you know what? I can repro this. Running:
consumes (at most) 2% of CPU and 2% of memory. Running But when I open Maybe there's something constantly polling? |
|
(I'm on Linux, and compiling from this week's master) |
|
This is not present in SC-3.6.6, so you can diff the code between SC-3.6 and SC-3.7 to find the change. |
|
I can try to git-bisect this but it's going to be a long slog since I have to recompile each time. Anyone have thoughts about where the issue might have come from? |
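
For a rough sense of how many rebuilds a bisect needs, here is a minimal Python sketch of the binary search that `git bisect` performs. It assumes a linear history with a single good-to-bad transition and that every tested commit builds (real `git bisect` offers `git bisect skip` for commits that don't). `first_bad`, `is_bad`, and the stand-in history are hypothetical, not the real sclang history.

```python
def first_bad(commits, is_bad):
    """Return (index of first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1                # one checkout + rebuild + manual check
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return hi, tested


# Stand-in history: 1000 commits, regression introduced at index 637.
history = list(range(1000))
index, builds = first_bad(history, lambda c: c >= 637)
print(index, builds)               # prints: 637 10
```

So even across a thousand commits, the search itself needs about ten rebuilds; the slog comes from each rebuild being slow.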
|
This is kinda rough, slow going. Most older commits don't build anymore. Today I've narrowed it down to between (Earliest known bad):
(Latest known good):
I'll be working on narrowing it down more tonight and tomorrow, but if anyone can think of an obvious code change between these points in time, that could help! |
|
(Unless "NotificationCenter" is part of the problem -- I've had to disable it in several of the builds, so if the problem lies there I'm bisecting in the wrong direction) |
|
Phew! I think I found it! Pretty sure the bug is in 96e56b5, although it's possible it's in its parent ( 871aa6e ). Can someone who is more familiar with sclang innards take a look at this? Repro steps are:
(As above, you can also call |
|
@timblechmann this code is fairly above my head, but maybe you have a better understanding of where it's starting to endlessly spin? (e.g. constant polling?) |
|
I ran into the same problem on a headless Raspberry Pi with Arch Linux. One workaround I've found is to shut down this routine by calling Since the aliveThread is down, you need to call |
|
@smiarx @vivid-synth I think we need one more test to clarify if it's the AppClock that's broken or all clocks. Disable the server's alive thread. Then, does this work?
Or this?
Or...?
I'm guessing 1 and 2 will be ok and 3 will die. |
|
That was my first guess too, but all three routines work correctly. I even tried doing something similar to the alive thread in the prompt
and it runs correctly too, without overloading the CPU. Maybe it only breaks in compiled classes and not in the prompt? |
|
Could you try also with this?
|
|
I tried and it makes the CPU go 100% again, but I noticed that I was wrong in my last comment, and it does the same with But I've found that the problematic part is the |
|
I have a feeling that the xruns I experienced during a performance last night were at least partially due to this bug. Does anyone know the way forward on this? |
|
I'd like to have a look at it after merging #2422. @smiarx interesting. So when you get rid of the defer of this function, it works? Only when you run Note (even if this is unrelated perhaps) that there are a lot of |
|
@vivid-synth I don't know the boost internals at all, so I can only help to continue to trace down the problem from the sclang side. As it seems, the problem depends on the running routine So maybe you can comment out lines of code to see which one causes it:
|
|
I can confirm that "s.statusWatcher.stopAliveThread" makes CPU usage drop again on a headless Raspberry Pi 3, scsynth 3.9dev built from branch 'master' [28713cd]. |
|
Since nobody ran my test, I just did it myself. First I reproduced the issue. Then I tried my small test. Result: no problem. So I ran other tests, and as far as I can see, the issue is calling
OK... wait... I got it.
If a routine is pending on AppClock ( Something about the main application loop, then? You can thank me in beer, later. |
|
beers forever! |
|
(Side note: the Qt "primitive not bound" issue referred to above is here: #1209. Non-IDE SC could definitely use some love!) |
|
@jamshark70 I'm guessing the issue is in Run it in a debugger and see where it's getting stuck. Each schedule update triggers a 'tick'. I can't quite see the problem, but I think it's getting stuck in a loop. The tick() method is not much different than the IDE one, but uses a different timer. It is possible that Also, note this:
Can't test myself at the moment, sorry... |
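
To make the "stuck in a loop" hypothesis concrete, here is a toy Python model of a timer-driven tick loop. It is not the sclang code. It assumes a scheduler that reports the next wake-up as an absolute time, and shows that if the timer reads that time against a different clock base (so the deadline always looks like it is in the past), every wait would return at once and the loop would never sleep. `run_ticks`, `next_due`, and `clock_offset` are made-up names for illustration.

```python
def run_ticks(next_due, now, clock_offset=0.0, max_ticks=5):
    """Simulate a few passes of a tick loop on a virtual clock.

    next_due:     seconds between scheduled wake-ups
    clock_offset: error between the scheduler's time base and the timer's
    Returns the list of sleep durations the timer would actually perform.
    """
    sleeps = []
    for _ in range(max_ticks):
        due = now + next_due                # scheduler: next wake-up, absolute
        timer_now = now + clock_offset      # timer: its own idea of "now"
        delay = max(0.0, due - timer_now)   # a past deadline fires at once
        sleeps.append(delay)
        now += delay                        # virtual time advances by the sleep
    return sleeps


print(run_ticks(0.7, now=100.0))                    # timer sleeps between ticks
print(run_ticks(0.7, now=100.0, clock_offset=1e6))  # every delay is 0.0: busy loop
```

A debugger breakpoint on the real tick handler, printing the computed delay, would show whether anything like this is going on.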
|
Well, I've got the environment set up, so it may be easier for me to check it out than for others, but... the extent of my gdb knowledge for this issue is To be honest, I would rather hand this off to people who understand how the clocks work. I'm good at differential testing and diagnosis, not so good with debuggers. |
|
@jamshark70 I'm not much help, I'm afraid. I'm lazy and usually use it with a GUI front end, so I always have to look it up if I use the command line. But there are lots of good tutorials if you google. (I'm sure there was a nice front end I used last time I had to do stuff on Linux...) Or maybe there's something like spindump that would let you sample? |
|
@jamshark70 I can recommend nemiver, a graphical frontend for gdb which doesn't require any setup; you use it like you would use gdb. I can also recommend QtCreator, which can open a cmake project. Just do new project -> import project -> import existing project and select the supercollider source project. Then you can use the built-in debugger front end. |
|
I tried to run this in gdb, but I get some funny results (despite building with
That said, I tried printing some values
and this floods my console once I do
Ultimately, this - quick and dirty - fixes the CPU hog, but I am not sure if it introduces any horrible side effects (note: I got the magic number 70000 from this old help file which specifies 0.7 seconds as the default time interval for
Note that I know basically nothing about SC, but if someone provides pointers as to what and how needs to be tested, I can spend some time on this (working on this for Bela, but a better solution than the above should benefit upstream SC as well). |
|
A few hours later: it seems that what is happening is that
So, it happens that I have never used NOTE: I am actually getting a
|
…This seems to be the solution for sensestage/supercollider/BelaPlatform#34 and supercollider/supercollider#2144. However, I have no explanation as to why this issue only manifests itself when QT is disabled
|
Anyone with knowledge of boost have thoughts about this one? (Maybe @brianlheim or @scztt ?) This is tempting: a nice small solution to a painful bug we've had for a long time. |
|
[trying to understand the logic behind the timer usage here]
The 'tick' function is being called multiple times because of the 'cancel'
in that same function, which was added here:
788b530
The relevant notes in the commit might be:
* avoid deadlock in sendSignal
* fix thread-safety issue with asio timer
Is 'cancel' supposed to be called *within* the service function itself, and
does it in fact need to be called? [I don't have a standalone build to try
this on at this moment]
I assume the deadlock is partially avoided by the change from 'dispatch' to
'post' in the same function. I'm not sure that this 'cancel' would help
regarding the thread-safety issue mentioned in the commit message.
Maybe there's some other method by which 'tick' calls were being racked up,
such that the 'cancel' was necessary?
Will forward this conversation to the author of the original fix.
On 4 September 2017 at 19:03, LFSaw wrote:
seems like @giuliomoro found a solution to this here: giuliomoro@79cb016
|
|
Tried it; removing the 'cancel' alone doesn't work as an alternate fix. This implies that either it's the correct thing to do in this spot (and @giuliomoro's fix is necessary too), or that the 'tick' routine is being queued up more than is needed. |
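
One possible reading of the two comments above, sketched as a toy Python model rather than real Boost.Asio code. In Asio, cancelling a timer (explicitly with `cancel()`, or implicitly by setting a new expiry time) completes any pending wait by invoking its handler with `operation_aborted`. If a handler like 'tick' re-arms the timer and does not check that error, each aborted completion could run another tick, which cancels and re-arms again. `ToyTimer`, `count_ticks`, and `budget` are invented for illustration, and whether this matches the actual sclang code path is an assumption.

```python
from collections import deque


class ToyTimer:
    """Rough stand-in for an Asio-style timer; not a real API."""

    def __init__(self, queue):
        self.queue = queue
        self.pending = None                 # handler of the outstanding wait

    def async_wait(self, handler):
        self.pending = handler

    def cancel(self):
        # A cancelled wait still completes: its handler is queued and will
        # be called with aborted=True (modelling operation_aborted).
        if self.pending is not None:
            handler, self.pending = self.pending, None
            self.queue.append(lambda: handler(aborted=True))


def count_ticks(signals, check_aborted, budget=50):
    """Count how often tick's body runs within at most `budget` queue steps."""
    queue = deque()
    timer = ToyTimer(queue)
    runs = 0

    def tick(aborted=False):
        nonlocal runs
        if check_aborted and aborted:
            return                          # completion caused by cancel(): ignore
        runs += 1
        timer.cancel()
        timer.async_wait(tick)              # re-arm: something is always scheduled

    timer.async_wait(tick)                  # a wait is pending before any signal
    for _ in range(signals):
        queue.append(tick)                  # external "schedule changed" signals
    steps = 0
    while queue and steps < budget:
        queue.popleft()()
        steps += 1
    return runs


print(count_ticks(3, check_aborted=True))   # 3: one tick per signal, then idle
print(count_ticks(3, check_aborted=False))  # 50: still busy when the budget runs out
```

In this model, dropping the explicit `cancel()` would not help if re-arming cancels the pending wait as well, which would be consistent with the test result reported above.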
|
@snappizz @GregBakker @vivid-synth what's the status of this bug? is anyone looking into it currently? |
|
I don't use sclang much these days so I haven't been working on it, but I'd be very happy for it to be fixed! |
|
@giuliomoro would you like to create a PR for this repo with your patch? it works wonderfully :) |
|
@giuliomoro's patch seems correct to me. i've filed it as a PR: #3772 i've also figured out the mystery of why sclang behaves differently with and without Qt. it's quite simple -- right at the bottom of
so when
turns out so there's a more fundamental problem of fragmentation of sclang with and without qt that probably should be addressed. (i'm guessing it's a historical artifact of the way QtCollider was originally developed.) but i do think this patch is okay nevertheless. |
|
Closed in #3772 |
sclang hogs the CPU in SuperCollider 3.7 when it is built without Qt: the sclang process never goes to sleep and uses 100% CPU after booting the server.