
High CPU usage on Windows 10, even if vsync is enabled #257

Open
FeldrinH opened this issue Jul 15, 2021 · 15 comments

@FeldrinH commented Jul 15, 2021

When running a simple project that only clears the screen, the program seems to consume 100% of one CPU core at all times.

Both get_fps() and FRAPS show a stable 60 fps, which would indicate that vsync or some other frame limiting is enabled.
The program is compiled with cargo run --release.
Macroquad version is 0.3.7.
The system is a laptop with integrated graphics (Intel UHD 620) running Windows 10.

It seems that, for some reason, the program keeps a thread busy 100% of the time just waiting to draw the next frame.

The tested code is:

use macroquad::prelude::*;

#[macroquad::main("Test")]
async fn main() {
    loop {
        clear_background(RED);
        next_frame().await
    }
}
@louiidev

Maybe the author can clarify, but I've looked over the code and there doesn't seem to be any thread sleeping going on, so it would make sense that it just runs as fast as possible.

@not-fl3 (Owner) commented Jul 18, 2021

there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559

Maybe it is not enough?

@FeldrinH (Author)

The docs are somewhat vague on this, but from what I gather, yield_now behaviour is highly platform-specific, and in many cases it may not sleep at all if there are few other processes that need to run at the same time.
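
As a quick illustration (a standalone sketch, not macroquad code): on a mostly idle machine, a loop around yield_now typically spins flat out, since there is rarely another thread to yield to.

use std::thread;
use std::time::Instant;

fn main() {
    // Count how many times yield_now() returns within one second.
    // On an idle core it usually returns immediately, so this loop
    // burns ~100% of one core despite "yielding" on every iteration.
    let start = Instant::now();
    let mut calls: u64 = 0;
    while start.elapsed().as_secs() < 1 {
        thread::yield_now();
        calls += 1;
    }
    println!("yield_now() returned {} times in 1s", calls); // often millions
}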

@not-fl3 (Owner) commented Jul 18, 2021

Also, if it is Windows/Nvidia, it may be a driver issue: https://stackoverflow.com/questions/21925313/100-cpu-utilization-when-using-vsync-opengl/21929019#21929019 - it looks like the driver spins in a busy-wait loop while waiting for vsync.

@FeldrinH (Author)

I suppose it's possible. I have Intel integrated graphics, though.

Could also be related to this: rust-lang/rust#46774

Also, I tested on wasm as well, and the high CPU usage is absent there; CPU usage seems fairly minimal.

@not-fl3 (Owner) commented Jul 18, 2021

Oh, interesting. Try adding some sleep right before calling next_frame(); maybe it will help?

@louiidev

> there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559
>
> Maybe it is not enough?

Ah, my bad. I was looking for the sleep fn.

@FeldrinH (Author)

> Oh, interesting. Try adding some sleep right before calling next_frame(); maybe it will help?

Calling sleep does help with CPU usage; however, this presents two new issues:

  • How do I know how long to sleep? According to Notes on the time module #134, there is no way to find out how much time drawing actually takes. (A sketch of one possible workaround follows below.)
  • Thread sleep is not very portable (notably, it is not available on wasm). This breaks macroquad's stated goal of having all targets work the same with the exact same code.
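
For reference, a minimal sketch of the sleep workaround applied to the repro above, assuming a fixed 60 Hz display (the actual refresh rate isn't queried, which is exactly the first problem) and gating the sleep off wasm (the second problem):

use macroquad::prelude::*;
use std::time::{Duration, Instant};

#[macroquad::main("Test")]
async fn main() {
    // Assumed 60 Hz target; keep a ~1ms margin so we don't
    // oversleep past the vsync deadline and drop to 30 fps.
    let budget = Duration::from_secs_f64(1.0 / 60.0) - Duration::from_millis(1);
    let mut last_frame = Instant::now();
    loop {
        clear_background(RED);

        // Sleep off most of the remaining frame budget, so the
        // driver's busy-wait on vsync only spins briefly.
        #[cfg(not(target_arch = "wasm32"))]
        if let Some(remaining) = budget.checked_sub(last_frame.elapsed()) {
            std::thread::sleep(remaining);
        }

        last_frame = Instant::now();
        next_frame().await
    }
}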

@64kramsystem commented Aug 9, 2021

I'm experiencing the same issue, although on different hardware and OS.

I suspect that this is a Macroquad problem; on my system, this doesn't happen with several other game engines (Amethyst, Tetra, Ggez), and my hardware is different from @FeldrinH's.

@martinxyz

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in #257 (comment), this seems to be an Nvidia driver issue. A web search finds related issues in a couple of projects (e.g. GNOME), but it's not clear to me what the solution was.

Thread 1 "tiled-test" received signal SIGINT, Interrupt.
0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff5c8651d in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#2  0x00007ffff5c91624 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#3  0x00007ffff5c0e1cf in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#4  0x00007ffff5813f64 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
#6  0x0000555555689b1b in miniquad::frame (user_data=0x5555557566a0) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/miniquad-0.3.0-alpha.37/src/lib.rs:272
#7  0x0000555555690e27 in sapp_linux::_sapp_call_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2627
#8  0x0000555555690e47 in sapp_linux::_sapp_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2639
#9  0x0000555555691696 in sapp_linux::sapp_run (desc=0x7fffffffcb88) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2783
...
(gdb) up 5
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
646                         miniquad::gl::glFinish();
(gdb) l

@64kramsystem

> Got this on Linux.
>
> The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)
>
> As suggested in #257 (comment), this seems to be an Nvidia driver issue. A web search finds related issues in a couple of projects (e.g. GNOME), but it's not clear to me what the solution was.

With the information at hand (thanks @martinxyz), and after some digging, I think it's an oversight (a "bug-ish" sort of thing) in MQ.

I've checked the glFinish() invocation history, and it is used for telemetry:

commit 05e69c49c5e8b5281f7f6645882168628e45b89f
Author: Fedor Logachev <not.fl3@gmail.com>
Date:   Thu Jan 7 12:11:04 2021 -0600

    telemetry: More internal metrics

diff --git a/profiler/src/lib.rs b/profiler/src/lib.rs
index 0317101..4286e55 100644
--- a/profiler/src/lib.rs
+++ b/profiler/src/lib.rs
@@ -188,14 +188,20 @@ fn zone_ui(ui: &mut Ui, zone: &Zone, n: usize) {
                     }
                 }
 
-                ui.label(
-                    None,
-                    &format!(
-                        "Full frame time: {:.3}ms {:.1}(1/t)",
-                        get_frame_time(),
-                        (1.0 / get_frame_time())
-                    ),
-                );
+                if let Some(frame) = state
+                    .selected_frame
+                    .as_ref()
+                    .or_else(|| state.frames_buffer.get(0))
+                {
+                    ui.label(
+                        None,
+                        &format!(
+                            "Full frame time: {:.3}ms {:.1}(1/t)",
+                            frame.full_frame_time,
+                            (1.0 / frame.full_frame_time)
+                        ),
+                    );
+                }
 
                 if state.paused {
                     if ui.button(None, "resume") {
diff --git a/src/lib.rs b/src/lib.rs
index 486e01c..4e3bd7f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -247,6 +247,8 @@ fn draw(&mut self) {
             let _z = telemetry::ZoneGuard::new("Event::draw");
 
             if let Some(future) = unsafe { MAIN_FUTURE.as_mut() } {
+                let _z = telemetry::ZoneGuard::new("Main loop");
+
                 get_context().begin_frame();
 
                 if exec::resume(future) {
@@ -263,6 +265,16 @@ fn draw(&mut self) {
 
             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }
 
         telemetry::reset();

I've had a look around, and for example, Intel discourages using glFinish() in its OpenGL optimization guide.

Potentially, this could be a misuse of the OpenGL APIs, but I'm no expert in this.

Regardless, at the very least, telemetry should be set according to the build profile (e.g. disabled in release mode and enabled in debug mode), or at minimum be configurable; and if it is enabled in release builds, it should be clearly documented that it can hurt performance.
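
For illustration only, one way the block from the diff above could be gated (the `telemetry` feature name here is hypothetical, not an existing macroquad feature):

// Hypothetical: only compile the telemetry sync when an opt-in
// cargo feature is enabled, instead of unconditionally on Linux/wasm.
#[cfg(all(
    feature = "telemetry",
    any(target_arch = "wasm32", target_os = "linux")
))]
{
    let _z = telemetry::ZoneGuard::new("glFinish/glFlush");

    unsafe {
        miniquad::gl::glFlush();
        miniquad::gl::glFinish();
    }
}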

@64kramsystem

Note: the code above only compiles on Linux and wasm (it's guarded by a cfg), so it's very likely unrelated to the current issue (which is on Windows).

I've copied my comment to the Linux-specific issue (#275).

@martinxyz FYI

@flaribbit

Same problem.

Adding some sleep before next_frame() alleviates the problem, but it's odd that it's needed.

After trying several versions, I found that 0.3.26 does not have this problem, but 0.4.0 does. I ran a profile on 0.4.5; the results are below. I hope it helps fix the problem.

(two profiler screenshots attached)

By the way, I noticed that when I ran the program with 0.3.26, the console shows:

WGL: failed to create 3.2 context, trying 2.1
GL_VERSION: 4.6.0 - Build 26.20.100.7642

but nothing with 0.4.0. Maybe it's related to the WGL context version?

@SeedyROM

@flaribbit I can confirm the same result with both versions. I don't need anything fancy, so I'm opting to downgrade for now.

@brettchalupa commented Aug 4, 2024

I recognize that this issue is specifically about Windows, but I want to share that I noticed high CPU usage on macOS starting in v0.4.4 and pinpointed the specific commit using git bisect; it ended up being an issue in Miniquad. I logged the issue here: not-fl3/miniquad#470

@flaribbit have you ever used git bisect before? It's a useful tool for finding the exact commit that introduced a problem! For this case, you'd check out the Macroquad codebase and follow these steps:

  1. git bisect start — start the bisect process
  2. git checkout v0.4.0 — test the problem, like cargo run --example texture --release, and verify it's present
  3. git bisect bad — mark the commit as "bad"
  4. git checkout v0.3.26 — go to a working commit and test it out; if the problem isn't present then:
  5. git bisect good — mark the commit as "good"

Git will then bisect through the commits based on the good and bad markers. You test each commit, marking it as either git bisect good or git bisect bad. Eventually you'll land on the commit that introduced the regression, which can greatly help with fixing it!
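
As an aside, if the check can be scripted, git bisect run <command> automates the stepping: the command exits 0 for a good commit and non-zero (1-127, except 125, which means "skip") for a bad one, and git walks the history for you. For an eyeball check like CPU usage, the manual good/bad marking above is usually simpler.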
