Q: 100% cpu for examples/platformer-tutorial/1-start #145

Closed
glycerine opened this issue Feb 4, 2021 · 8 comments

@glycerine

Hi, oak looks really neat.

I'm just worried a little about efficiency. I run the examples/platformer-tutorial/1-start example on ubuntu 18.04 under go1.15.7 and it consumes 100% of one cpu... while doing... what looks like nothing.

Is there a way to make it do "less" busy work?

The 6-complete does the same, even when I'm not pressing any key.

@200sc
Contributor

200sc commented Feb 4, 2021

There are a few ongoing routines in oak programs with default settings that will run up the CPU on purpose:

  1. the event handler, which loops continuously to resolve pending event bindings/unbindings, and
  2. the render loop, which draws to the window as fast as the configured frame rate allows. (Oak doesn't use C APIs by default on Windows / Linux, so work that would otherwise go to the GPU lands on the CPU; to my knowledge there isn't a cgo-less GPU library in Go right now.)

Both of these loops can be slowed down to reduce CPU pressure by providing a configuration that reduces the default rate (60 fps for each), with the (logical) FrameRate and DrawFrameRate config settings: https://github.com/oakmound/oak/blob/master/config.go#L58

There are a few other approaches-- if you think the default routines are still taking up too much CPU time, they can be swapped out with your own implementations. Obviously if you want a fully featured renderer from scratch that'll probably be tricky to get optimized, but you could theoretically get a long way there by starting with a library that calls out to the GPU with C bindings.

Finally there have been some optimizations made in develop that haven't made it back to a versioned release yet if you wanted to give it a shot, but I wouldn't imagine it would drastically change the behavior you're seeing.

We're also always open to PRs or suggestions if you see something that you would like to try to optimize, or if you have an additional configuration concept that would help your use case.

@glycerine
Author

Thanks @200sc. I'll play with those configurations. I do really like that oak doesn't depend on cgo.

@glycerine
Author

glycerine commented Feb 4, 2021

Changing the frame rates even to 1 per second had no impact. So I profiled 1-start for 10 seconds. I see that 92% of the cpu is being burned here in ResolvePending() during its calls to Bus.Flush() at the top of oak/event/resolve.go

// ResolvePending is a constant loop that tracks slices of bind or unbind calls
// and resolves them individually such that they don't break the bus
// Todo: this should be a function on the event bus itself, and should have a better name
// If you ask "Why does this not use select over channels, share memory by communicating",
// the answer is we tried, and it was cripplingly slow.
func (eb *Bus) ResolvePending() {
    eb.init.Do(func() {
        for {
            eb.Flush()
        }
    })
}

profile:

File: 1-start
Type: cpu
Time: Feb 4, 2021 at 4:34pm (CST)
Duration: 10.15s, Total samples = 10.34s (101.91%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 10.20s, 98.65% of 10.34s total
Dropped 60 nodes (cum <= 0.05s)
Showing top 10 nodes out of 16
      flat  flat%   sum%        cum   cum%
     9.57s 92.55% 92.55%      9.57s 92.55%  github.com/oakmound/oak/v2/event.(*Bus).Flush
     0.46s  4.45% 97.00%     10.03s 97.00%  github.com/oakmound/oak/v2/event.(*Bus).ResolvePending.func1
     0.10s  0.97% 97.97%      0.10s  0.97%  runtime.memmove
     0.07s  0.68% 98.65%      0.07s  0.68%  github.com/oakmound/shiny/driver/internal/swizzle.bgra16
         0     0% 98.65%      0.19s  1.84%  github.com/oakmound/oak/v2.drawLoop
         0     0% 98.65%      0.07s  0.68%  github.com/oakmound/oak/v2.glob..func1
         0     0% 98.65%     10.03s 97.00%  github.com/oakmound/oak/v2/event.(*Bus).ResolvePending
         0     0% 98.65%      0.07s  0.68%  github.com/oakmound/shiny/driver/internal/swizzle.BGRA
         0     0% 98.65%      0.06s  0.58%  github.com/oakmound/shiny/driver/x11driver.(*bufferImpl).postUpload
         0     0% 98.65%      0.07s  0.68%  github.com/oakmound/shiny/driver/x11driver.(*bufferImpl).upload
(pprof)

If I pause for 100ms between Flush() calls, the profile shows only 420ms of CPU time over that 10-second sample, which is much better. According to htop, an average of 5% of one CPU is being used.

Showing nodes accounting for 420ms, 100% of 420ms total
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
     260ms 61.90% 61.90%      260ms 61.90%  github.com/oakmound/shiny/driver/internal/swizzle.bgra16
      30ms  7.14% 69.05%       30ms  7.14%  runtime.runqgrab
      30ms  7.14% 76.19%       60ms 14.29%  runtime.runqsteal
      20ms  4.76% 80.95%       20ms  4.76%  runtime.epollwait
      20ms  4.76% 85.71%      130ms 30.95%  runtime.findrunnable
      20ms  4.76% 90.48%       20ms  4.76%  syscall.Syscall
      10ms  2.38% 92.86%       10ms  2.38%  runtime.(*randomEnum).next (inline)
      10ms  2.38% 95.24%       10ms  2.38%  runtime.memclrNoHeapPointers
      10ms  2.38% 97.62%       10ms  2.38%  runtime.notesleep
      10ms  2.38%   100%       20ms  4.76%  runtime.stopm
(pprof) 

The examples seem to play okay with a time.Sleep(100 * time.Millisecond) inserted before each eb.Flush().

func (eb *Bus) ResolvePending() {
    eb.init.Do(func() {
        for {
            time.Sleep(100 * time.Millisecond)
            eb.Flush()
        }
    })
}

Since I don't know the architecture, what would be the expected impact of inserting such a time.Sleep()?

(Obviously on my fork, not suggesting it makes sense for general use).

@200sc
Contributor

200sc commented Feb 5, 2021

I just ran our current project with this change and it seems like a huge performance improvement with, as far as I can tell, no detectable gameplay difference!

To answer your question:

The impact would more or less be that newly created or destroyed entities could take up to the length of that sleep to start or stop responding to events, respectively.

It seems like a good change in general-- the event bus was the first code we wrote for oak and it could definitely use some improvement. This could be a configuration option as well for the default bus-- EventRefreshRate or something of the like, defaulting to 0ms for backwards compatibility.

For some games that kind of sleep might make an impact (high speed games like a shoot-em-up), and some existing games might not work with such a sleep (they might assume that as soon as an entity is destroyed it will stop responding to events before the next tick), but as a configuration option I don't see any issue with it, and would probably default it to 50ms or higher in the next breaking release, assuming the event bus had a similar structure.

@glycerine
Author

glycerine commented Feb 5, 2021

Very interesting. The other thing I might suggest trying would be a simple condition variable. These can be more efficient than Go channels (which is why the Go ssh and http2 packages use them), and then there's never any busy waiting burning cpu if there's really nothing to do. At the same time, you should get fast response when it's needed.

@200sc
Contributor

200sc commented Feb 20, 2021

The upcoming release will add an EventRefreshRate configuration option for this purpose. I'm not sure how I would use a single condition variable for the loop being used (which is currently checking the length of five slices). In practice I found a setting of over a few hundred milliseconds gave a poor experience-- when clicking through menus, for example, buttons would take up to the refresh rate to react to clicks if the button was newly created by a dropdown / tab.

@glycerine
Author

Great. Looking forward to trying it.

@200sc
Contributor

200sc commented Feb 25, 2021

Flag added, merged and tagged. Feel free to reopen if there's a bug with the implementation.

@200sc 200sc closed this as completed Feb 25, 2021