
High CPU usage on Windows 10, even if vsync is enabled #257

Open
FeldrinH opened this issue Jul 15, 2021 · 15 comments

@FeldrinH commented Jul 15, 2021

When running a simple project that only clears the screen, the program seems to consume 100% of one CPU core at all times.

Both get_fps() and FRAPS show a stable 60 fps, which would indicate that vsync or some other frame limiting is enabled.
The program is compiled with cargo run --release.
Macroquad version is 0.3.7.
The system is a laptop with integrated graphics (Intel UHD 620) running Windows 10.

It seems that, for some reason, the program keeps a thread busy 100% of the time just waiting to draw the next frame.

The tested code is:

use macroquad::prelude::*;

#[macroquad::main("Test")]
async fn main() {
    loop {
        clear_background(RED);
        next_frame().await
    }
}
@louiidev

Maybe the author can clarify, but I've looked over the code and there doesn't seem to be any thread sleeping going on, so it would make sense that it just runs as fast as possible.

@not-fl3 (Owner) commented Jul 18, 2021

there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559

Maybe it is not enough?

@FeldrinH (Author)

The docs are somewhat vague on this, but from what I gather, yield_now behaviour is highly platform-specific, and in many cases it may not sleep at all if there are few other processes that need to run at the same time.
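
As a quick illustration (a standalone sketch, not macroquad code): on a mostly idle machine, a loop around yield_now typically spins flat out, since there is rarely another thread to yield to.

use std::thread;
use std::time::Instant;

fn main() {
    // Count how many times yield_now() returns within one second.
    // On an idle core it usually returns immediately, so this loop
    // burns ~100% of one core despite "yielding" on every iteration.
    let start = Instant::now();
    let mut calls: u64 = 0;
    while start.elapsed().as_secs() < 1 {
        thread::yield_now();
        calls += 1;
    }
    println!("yield_now() returned {} times in 1s", calls); // often millions
}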

@not-fl3 (Owner) commented Jul 18, 2021

Also, if it is Windows/Nvidia, it may be a driver issue: https://stackoverflow.com/questions/21925313/100-cpu-utilization-when-using-vsync-opengl/21929019#21929019 - it looks like the driver spins in a busy-wait loop while waiting for vsync.

@FeldrinH (Author)

I suppose it's possible. I have Intel integrated graphics, though.

Could also be related to this: rust-lang/rust#46774

Also, I tested on wasm as well, and the high CPU usage is absent there; CPU usage seems fairly minimal.

@not-fl3 (Owner) commented Jul 18, 2021

Oh, interesting. Try adding some sleep right before calling next_frame(); maybe it will help?

@louiidev

> there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559
>
> Maybe it is not enough?

Ah, my bad. I was looking for the sleep fn.

@FeldrinH (Author)

> Oh, interesting. Try adding some sleep right before calling next_frame(); maybe it will help?

Calling sleep does help with CPU usage; however, this presents two new issues:

  • How do I know how long to sleep? According to Notes on the time module #134, there is no way to find out how much time drawing actually takes. (A sketch of one possible workaround follows below.)
  • Thread sleep is not very portable (notably, it is not available on wasm). This breaks macroquad's stated goal of having all targets work the same with the exact same code.
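
For reference, a minimal sketch of the sleep workaround applied to the repro above, assuming a fixed 60 Hz display (the actual refresh rate isn't queried, which is exactly the first problem) and gating the sleep off wasm (the second problem):

use macroquad::prelude::*;
use std::time::{Duration, Instant};

#[macroquad::main("Test")]
async fn main() {
    // Assumed 60 Hz target; keep a ~1ms margin so we don't
    // oversleep past the vsync deadline and drop to 30 fps.
    let budget = Duration::from_secs_f64(1.0 / 60.0) - Duration::from_millis(1);
    let mut last_frame = Instant::now();
    loop {
        clear_background(RED);

        // Sleep off most of the remaining frame budget, so the
        // driver's busy-wait on vsync only spins briefly.
        #[cfg(not(target_arch = "wasm32"))]
        if let Some(remaining) = budget.checked_sub(last_frame.elapsed()) {
            std::thread::sleep(remaining);
        }

        last_frame = Instant::now();
        next_frame().await
    }
}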

@64kramsystem commented Aug 9, 2021

I'm experiencing the same issue, although on different hardware and OS.

I suspect that this is a Macroquad problem; on my system, this doesn't happen with several other game engines (Amethyst, Tetra, Ggez), and my hardware is different from @FeldrinH's.

@martinxyz

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in #257 (comment), this seems to be an Nvidia driver issue. A web search finds related issues in a couple of projects (e.g. GNOME), but it's not clear to me what the solution was.

Thread 1 "tiled-test" received signal SIGINT, Interrupt.
0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff5c8651d in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#2  0x00007ffff5c91624 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#3  0x00007ffff5c0e1cf in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#4  0x00007ffff5813f64 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
#6  0x0000555555689b1b in miniquad::frame (user_data=0x5555557566a0) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/miniquad-0.3.0-alpha.37/src/lib.rs:272
#7  0x0000555555690e27 in sapp_linux::_sapp_call_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2627
#8  0x0000555555690e47 in sapp_linux::_sapp_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2639
#9  0x0000555555691696 in sapp_linux::sapp_run (desc=0x7fffffffcb88) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2783
...
(gdb) up 5
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
646                         miniquad::gl::glFinish();
(gdb) l

@64kramsystem

> Got this on Linux.
>
> The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)
>
> As suggested in #257 (comment), this seems to be an Nvidia driver issue. A web search finds related issues in a couple of projects (e.g. GNOME), but it's not clear to me what the solution was.

With the information at hand (thanks @martinxyz), and after some digging, I think it's an oversight (a "bug-ish" sort of thing) in MQ.

I've checked the glFinish() invocation history, and it is used for telemetry:

commit 05e69c49c5e8b5281f7f6645882168628e45b89f
Author: Fedor Logachev <not.fl3@gmail.com>
Date:   Thu Jan 7 12:11:04 2021 -0600

    telemetry: More internal metrics

diff --git a/profiler/src/lib.rs b/profiler/src/lib.rs
index 0317101..4286e55 100644
--- a/profiler/src/lib.rs
+++ b/profiler/src/lib.rs
@@ -188,14 +188,20 @@ fn zone_ui(ui: &mut Ui, zone: &Zone, n: usize) {
                     }
                 }
 
-                ui.label(
-                    None,
-                    &format!(
-                        "Full frame time: {:.3}ms {:.1}(1/t)",
-                        get_frame_time(),
-                        (1.0 / get_frame_time())
-                    ),
-                );
+                if let Some(frame) = state
+                    .selected_frame
+                    .as_ref()
+                    .or_else(|| state.frames_buffer.get(0))
+                {
+                    ui.label(
+                        None,
+                        &format!(
+                            "Full frame time: {:.3}ms {:.1}(1/t)",
+                            frame.full_frame_time,
+                            (1.0 / frame.full_frame_time)
+                        ),
+                    );
+                }
 
                 if state.paused {
                     if ui.button(None, "resume") {
diff --git a/src/lib.rs b/src/lib.rs
index 486e01c..4e3bd7f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -247,6 +247,8 @@ fn draw(&mut self) {
             let _z = telemetry::ZoneGuard::new("Event::draw");
 
             if let Some(future) = unsafe { MAIN_FUTURE.as_mut() } {
+                let _z = telemetry::ZoneGuard::new("Main loop");
+
                 get_context().begin_frame();
 
                 if exec::resume(future) {
@@ -263,6 +265,16 @@ fn draw(&mut self) {
 
             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }
 
         telemetry::reset();

I've had a look around, and for example, Intel discourages using glFinish() in its OpenGL optimization guide.

Potentially, this could be a misuse of the OpenGL APIs, but I'm no expert in this.

Regardless, at the very least, telemetry should be set according to the build profile (e.g. disabled in release mode and enabled in debug mode), or at minimum be configurable; and if it is enabled in release builds, it should be clearly documented that it can hurt performance.
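
For illustration only, one way the block from the diff above could be gated (the `telemetry` feature name here is hypothetical, not an existing macroquad feature):

// Hypothetical: only compile the telemetry sync when an opt-in
// cargo feature is enabled, instead of unconditionally on Linux/wasm.
#[cfg(all(
    feature = "telemetry",
    any(target_arch = "wasm32", target_os = "linux")
))]
{
    let _z = telemetry::ZoneGuard::new("glFinish/glFlush");

    unsafe {
        miniquad::gl::glFlush();
        miniquad::gl::glFinish();
    }
}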

@64kramsystem

Note: the code above only compiles on Linux and wasm (it's guarded by a cfg), so it's very likely unrelated to the current issue (which is on Windows).

I've copied my comment to the Linux-specific issue (#275).

@martinxyz FYI

@flaribbit

Same problem.

Adding some sleep before next_frame() alleviates the problem, but it's odd that it's needed.

After trying several versions, I found that 0.3.26 does not have this problem, but 0.4.0 does. I ran a profile on 0.4.5; the results are below. I hope it helps fix the problem.

(two profiler screenshots attached)

By the way, I noticed that when I ran the program with 0.3.26, the console shows:

WGL: failed to create 3.2 context, trying 2.1
GL_VERSION: 4.6.0 - Build 26.20.100.7642

but nothing with 0.4.0. Maybe it's related to the WGL context version?

@SeedyROM

@flaribbit I can confirm the same result with both versions. I don't need anything fancy, so I'm opting to downgrade for now.

@brettchalupa commented Aug 4, 2024

I recognize that this issue is specifically about Windows, but I want to share that I noticed high CPU usage on macOS starting in v0.4.4 and pinpointed the specific commit using git bisect; it ended up being an issue in Miniquad. I logged the issue here: not-fl3/miniquad#470

@flaribbit have you ever used git bisect before? It's a useful tool for finding the exact commit that introduced a problem! For this case, you'd check out the Macroquad codebase and follow these steps:

  1. git bisect start — start the bisect process
  2. git checkout v0.4.0 — test the problem, like cargo run --example texture --release, and verify it's present
  3. git bisect bad — mark the commit as "bad"
  4. git checkout v0.3.26 — go to a working commit and test it out; if the problem isn't present then:
  5. git bisect good — mark the commit as "good"

Git will then bisect through the commits based on the good and bad markers. You test each commit, marking it as either git bisect good or git bisect bad. Eventually you'll land on the commit that introduced the regression, which can greatly help with fixing it!
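
As an aside, if the check can be scripted, git bisect run <command> automates the stepping: the command exits 0 for a good commit and non-zero (1-127, except 125, which means "skip") for a bad one, and git walks the history for you. For an eyeball check like CPU usage, the manual good/bad marking above is usually simpler.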
