High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

64kramsystem · 2021-08-09T21:08:33Z

Seems to be the same issue as #257, but leaving here for tracking purposes in case somebody searches.

On my system, any Macroquad program causes CPU to run at 100%+. This is unique to Macroquad - all the other engines I've tried don't have this problem (Amethist/Tetra/Ggez).

I'm on Ubuntu 20.04, and vsync is enabled.

ar37-rs · 2021-08-18T15:11:17Z

for temporary solution, put main thread to sleep for a while after next_frame:
next_frame().await;
std:: thread ::sleep(Duration::from_millis(10));

64kramsystem · 2021-10-10T17:59:40Z

(Cross-posted from #257, as this is the correct location).

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in #257 (comment) this seems to be a nvidia driver issue. A web search finds related issues in a couple of projects (e.g. gnome), but I don't understand what the solution was.

With the information at hand (thanks @martinxyz), and after some digging, I think it's an overlook ("bug-ish" sort of thing) in MQ.

I've checked the glFinish() invocation history, and it is used for telemetry:

commit 05e69c49c5e8b5281f7f6645882168628e45b89f
Author: Fedor Logachev <not.fl3@gmail.com>
Date:   Thu Jan 7 12:11:04 2021 -0600

    telemetry: More internal metrics

diff --git a/profiler/src/lib.rs b/profiler/src/lib.rs
index 0317101..4286e55 100644
--- a/profiler/src/lib.rs
+++ b/profiler/src/lib.rs
@@ -188,14 +188,20 @@ fn zone_ui(ui: &mut Ui, zone: &Zone, n: usize) {
                     }
                 }
 
-                ui.label(
-                    None,
-                    &format!(
-                        "Full frame time: {:.3}ms {:.1}(1/t)",
-                        get_frame_time(),
-                        (1.0 / get_frame_time())
-                    ),
-                );
+                if let Some(frame) = state
+                    .selected_frame
+                    .as_ref()
+                    .or_else(|| state.frames_buffer.get(0))
+                {
+                    ui.label(
+                        None,
+                        &format!(
+                            "Full frame time: {:.3}ms {:.1}(1/t)",
+                            frame.full_frame_time,
+                            (1.0 / frame.full_frame_time)
+                        ),
+                    );
+                }
 
                 if state.paused {
                     if ui.button(None, "resume") {
diff --git a/src/lib.rs b/src/lib.rs
index 486e01c..4e3bd7f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -247,6 +247,8 @@ fn draw(&mut self) {
             let _z = telemetry::ZoneGuard::new("Event::draw");
 
             if let Some(future) = unsafe { MAIN_FUTURE.as_mut() } {
+                let _z = telemetry::ZoneGuard::new("Main loop");
+
                 get_context().begin_frame();
 
                 if exec::resume(future) {
@@ -263,6 +265,16 @@ fn draw(&mut self) {
 
             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }
 
         telemetry::reset();

I've had a look around, and for example, Intel discourages using glFinish() in its OpenGL optimization guide.

Potentially, this could be a misuse of the OpenGL APIs, but I'm no expert in this.

Regardless, at the very least, telemetry should be preset according to the profile (e.g. disabled in release mode, and enabled in debug mode), or anyway, it should be configurable, and if enabled in release mode, it should be very clearly documented that it can potentially hurt performance.

martinxyz · 2021-10-10T21:59:16Z

Out of curiosity, I switched to the nouveau driver (open source) and the CPU load didn't disappear.

The traceback was still going through the same glFinish() call, though usually ending at poll() (which would be good) and almost never at sched_yield(). Not 100% trusting this. Anyway, strace still showed ~8'000 sched_yield calls per second.

When I comment out the glFlush(); glFinish(); lines added by the telemetry commit (see previous comment by @64kramsystem) then the sched_yield() syscalls go away.

My "Ctrl-C" tracebacks in gdb now go from miniquad::graphics::Context::clear() through dri_image_drawable_get_buffers() and end up in a poll() syscall.

CPU load also seems lower, but I'm having trouble measuring it reliably. Load average and top disagree, and it seems to depend on the window size. A larger window gives lower(?!) CPU load.

In case it matters, all my tests were on Gnome3 and Xorg (not Wayland).

mhcerri · 2023-05-30T21:11:06Z

@not-fl3, what exactly are you trying to achieve with this piece of code bellow? Is there any reason it is only used for linux and wasm32?

@@ -263,6 +265,16 @@ impl EventHandlerFree for Stage {

             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }

         telemetry::reset();

glFinish() is known for causing busy waits with Nvidia drivers. If that is really needed, do you think it would be possible to only call it when the profiler is enabled?

Thanks for the awesome project!

DissolveDZ · 2024-04-15T23:32:06Z

this is still an active issue, I might do a pull request soon if i get around to it

64kramsystem closed this as completed Oct 10, 2021

64kramsystem changed the title ~~High (100%) CPU usage, even with vsync enabled~~ High (100%) CPU usage on Linux/Nvidia, even with vsync enabled Oct 10, 2021

64kramsystem reopened this Oct 10, 2021

64kramsystem mentioned this issue Oct 10, 2021

High CPU usage on Windows 10, even if vsync is enabled #257

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

64kramsystem commented Aug 9, 2021 •

edited

Loading

ar37-rs commented Aug 18, 2021 •

edited

Loading

64kramsystem commented Oct 10, 2021

martinxyz commented Oct 10, 2021

mhcerri commented May 30, 2023

DissolveDZ commented Apr 15, 2024

High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

Comments

64kramsystem commented Aug 9, 2021 • edited Loading

ar37-rs commented Aug 18, 2021 • edited Loading

64kramsystem commented Oct 10, 2021

martinxyz commented Oct 10, 2021

mhcerri commented May 30, 2023

DissolveDZ commented Apr 15, 2024

64kramsystem commented Aug 9, 2021 •

edited

Loading

ar37-rs commented Aug 18, 2021 •

edited

Loading