

@almarklein
Member

@almarklein almarklein commented Nov 17, 2025

Revisiting #109 (and #107)

Intro

In #109 I opted for a solution that did some time.sleep() in the scheduler. This was a relatively simple solution, which fixed the problem of lower-than-max-fps framerates for all backends.

This morning I was happily planning to implement the polling for wgpu, so that we can properly leverage async, but I bumped into this problem again. I decided to fix the problem in a per-backend manner, so that we can await sleep() with precise timers everywhere (including for the wgpu polling).

Changes

  • Add loop.call_soon_threadsafe().
    • All loop backends must implement _rc_call_soon_threadsafe().
  • Implement _coreutils.call_later_from_thread()
    • This uses a single global thread to handle all call-later needs.
    • The thread is created on first use, because Pyodide cannot do threads.
    • Has about 1ms precision.
  • For asyncio and trio, use this util to implement precise sleep on Windows.
  • For the wx backend, use this util to implement precise call_later on Windows.
  • For the qt backend, use PreciseTimer to implement precise call_later on Windows.
    • Also tried the thread util, but the PreciseTimer seems even more precise.
  • The raw loop is now dead-simple, because the util takes care of all the scheduling.
  • The scheduler now uses a normal await sleep(..), without any trickery.
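To make the backend contract in the list above concrete, here is a minimal sketch, assuming a public `call_soon_threadsafe()` that forwards to the backend's `_rc_call_soon_threadsafe()`. The class names `BaseLoopSketch` and `AsyncioLoopSketch` and the `*args` forwarding are hypothetical, not rendercanvas's actual code. For asyncio, the backend method can simply delegate to asyncio's own thread-safe `call_soon_threadsafe()`.

```python
import asyncio


class BaseLoopSketch:
    """Hypothetical stand-in for a loop base class, reduced to one method."""

    def call_soon_threadsafe(self, callback, *args):
        # Public API: may be called from any thread; the backend does the work.
        self._rc_call_soon_threadsafe(lambda: callback(*args))

    def _rc_call_soon_threadsafe(self, callback):
        # Every backend must provide this.
        raise NotImplementedError()


class AsyncioLoopSketch(BaseLoopSketch):
    """Hypothetical asyncio backend."""

    def __init__(self, loop):
        self._loop = loop  # the asyncio event loop that runs the app

    def _rc_call_soon_threadsafe(self, callback):
        # asyncio already ships a thread-safe primitive, so just delegate.
        self._loop.call_soon_threadsafe(callback)
```

Backends without such a primitive (Qt, wx, the raw loop) need their own mechanism for this single method; the rest can be built on top of it.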

Some low-level details

Mostly to record design decisions. You can probably skip this.

Some more details about how polling led me here. I thought about a few different approaches to do the polling:

  • Let wgpu-py run the polling in a thread. But then that thread will call loop.call_soon(), which means that the call_soon() of all backends will have to be thread-safe, which is currently not the case.
  • I also considered a thread in rendercanvas to call a registered function at a precise interval, which has the same problem.
  • So then a nice and clean async task that sleeps at regular intervals. This works, but then we run into the issue that async sleep is imprecise... so here we are.
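A minimal sketch of that third option, an async task that polls at a regular interval, using plain `asyncio.sleep()`. `poll` is a hypothetical stand-in for the wgpu poll call, and the fixed `n_times` only keeps the example finite. With an imprecise sleep, each iteration on Windows could take a full timer tick instead of `interval`, which is the problem described above.

```python
import asyncio


async def poll_periodically(poll, interval, n_times):
    """Call poll() every `interval` seconds, `n_times` times.

    The real interval can never be more precise than the sleep itself.
    """
    count = 0
    for _ in range(n_times):
        poll()
        count += 1
        await asyncio.sleep(interval)
    return count
```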

Some details about an earlier posted solution that uses a thread pool (see this link).

When using a thread pool, one thread is used to sleep for `delay` seconds, and when it wakes up, it schedules a callback in the main loop. The downside is that when multiple tasks are running (and sleeping), there will be one thread per task. And when the pool runs out of threads, the tasks wait for each other and the sleep becomes much longer. That's why I came up with a technique that uses a single thread with a priority queue.
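A minimal sketch of the single-thread idea, filling in details the description above leaves open (these are assumptions): pending calls sit in a `heapq` priority queue ordered by due time, the thread is started lazily on first use, and because timed waits proved imprecise (see the findings below) the thread polls with short `time.sleep()` steps (1 ms by default, an assumed value) while anything is pending, and blocks on an event without a timeout when nothing is. `CallLaterThread` is a hypothetical name, and `call_later_from_thread()` here only mirrors the call signature used by the raw loop later in this thread; it is not the actual `_coreutils` implementation.

```python
import heapq
import itertools
import threading
import time


class CallLaterThread(threading.Thread):
    """A single daemon thread that invokes callbacks after a delay (sketch)."""

    def __init__(self, resolution=0.001):
        super().__init__(daemon=True)
        self._resolution = resolution  # assumed polling step while calls are pending
        self._heap = []  # entries: (due_time, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker, so callbacks are never compared
        self._lock = threading.Lock()
        self._wakeup = threading.Event()

    def call_later(self, delay, callback, *args):
        due = time.perf_counter() + delay
        with self._lock:
            heapq.heappush(self._heap, (due, next(self._seq), callback, args))
        self._wakeup.set()

    def run(self):
        while True:
            now = time.perf_counter()
            due_calls = []
            with self._lock:
                while self._heap and self._heap[0][0] <= now:
                    due_calls.append(heapq.heappop(self._heap))
                next_due = self._heap[0][0] if self._heap else None
                if next_due is None:
                    self._wakeup.clear()  # cleared under the lock, so no wakeup is lost
            for _, _, callback, args in due_calls:
                try:
                    callback(*args)
                except Exception as err:
                    print(f"Error in call_later callback: {err!r}")
            if next_due is None:
                # Nothing pending: block (no timeout) until call_later() signals.
                self._wakeup.wait()
            else:
                # Avoid timed waits; sleep in short, precise steps instead.
                time.sleep(min(self._resolution, next_due - now))


_thread = None
_thread_lock = threading.Lock()


def call_later_from_thread(delay, callback, *args):
    """Call callback(*args) from the shared timer thread after `delay` seconds."""
    global _thread
    with _thread_lock:
        if _thread is None:  # created lazily, e.g. because Pyodide has no threads
            _thread = CallLaterThread()
            _thread.start()
    _thread.call_later(delay, callback, *args)
```

The callback runs in the timer thread, so a backend would pass its thread-safe `_rc_call_soon_threadsafe` as the callback and the real callback as the argument, as the raw-loop snippet further down does.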

Some findings while I was experimenting

Mostly to record interesting findings. You can probably skip this.

I tried several things to find a mechanism that does not suffer from imprecise timers. BTW, ChatGPT is absolutely worthless for this topic.

Allow a thread to wait for a signal from another thread on a timeout

This is a simple mechanism that would have been useful. But it looks like it's impossible. Using time.sleep() is precise, but any method that has a timeout (threading.Event.wait(), Queue.get(), threading.Lock.acquire(), using select on a socket) is imprecise (on Windows), and waits 15.6 ms on average.

import time
import threading
from queue import Queue, Empty

import socket
import select


class PollThread(threading.Thread):

    def __init__(self):
        super().__init__()
        self.e = threading.Event()

        self.pair = socket.socketpair()
        self.q = Queue()
        self.lock = threading.Lock()


    def run(self):
        timeout = 0.002

        while True:
            x = []
            for i in range(20):

                t0 = time.perf_counter() 

                time.sleep(timeout)  # This is precise, the rest is not

                # self.e.wait(timeout)

                # try:
                #     self.q.get(True, timeout)
                # except Empty:
                #     pass

                # ready, _, _ = select.select([self.pair[0]], [], [], timeout)

                # self.lock.acquire(True, timeout)

                t1 = time.perf_counter()
                x.append(t1-t0)

            print(f"{1000*sum(x)/len(x):0.1f}  {1000*max(x):0.1f}")
            x.clear()

poller = PollThread()
poller.start()

Let a sub-thread invoke a method in the main thread

This actually works, and it is precise!

For asyncio, you can take an asyncio.Event, let a task wait for it, and then let the worker thread do loop.call_soon_threadsafe(event.set).
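A minimal, self-contained asyncio version of that mechanism: the worker thread does the precise `time.sleep()` and then schedules `event.set` on the loop with `call_soon_threadsafe()`, since an `asyncio.Event` must not be touched directly from another thread. The function name `wait_for_worker` is hypothetical.

```python
import asyncio
import threading
import time


async def wait_for_worker(delay):
    """Let a worker thread wake this task via call_soon_threadsafe()."""
    loop = asyncio.get_running_loop()
    event = asyncio.Event()

    def worker():
        time.sleep(delay)  # the precise sleep happens in the worker thread
        # asyncio.Event is not thread-safe: run set() in the loop's thread.
        loop.call_soon_threadsafe(event.set)

    t0 = time.perf_counter()
    threading.Thread(target=worker, daemon=True).start()
    await event.wait()
    return time.perf_counter() - t0
```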

For Qt it works with a little code like this:

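from PySide6.QtCore import QObject, Signal, Slot
from PySide6.QtGui import QGuiApplication


class InvokeMethod(QObject):
    """Invoke `method` in the main (GUI) thread; create this from a worker thread."""

    called = Signal()

    def __init__(self, method):
        super().__init__()
        # Move this object to the main thread, so its slot executes there.
        main_thread = QGuiApplication.instance().thread()
        self.moveToThread(main_thread)
        # Parent it to the app to keep it alive until the slot has run.
        self.setParent(QGuiApplication.instance())
        self.method = method
        # The receiver lives in another thread, so this emit is delivered as
        # a queued call that the main thread's event loop processes.
        self.called.connect(self.execute)
        self.called.emit()

    @Slot()
    def execute(self):
        self.method()
        self.setParent(None)  # drop the reference so the object can be garbage collected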

And then the worker thread simply does InvokeMethod(self.cb).

I considered an approach where rendercanvas would do the scheduling in a thread and then use this mechanism to make it do stuff in the main thread. But that would mean things need to be different on systems where we have no threads, like Pyodide.

A piece of code to check that it works:

import time
from rendercanvas.glfw import RenderCanvas
from rendercanvas.asyncio import loop   # use pyside6, asyncio, raw, wx, trio
from rendercanvas.utils.asyncs import sleep


RenderCanvas.select_loop(loop)
c = RenderCanvas()

async def main():
    while True:
        x = []
        for i in range(20):

            t0 = time.perf_counter()
            await sleep(0.002)
            t1 = time.perf_counter()
            x.append(t1-t0)

        print(f"{1000*sum(x)/len(x):0.1f}  {1000*max(x):0.1f}")
        x.clear()

loop.add_task(main)

loop.run()

@almarklein almarklein requested a review from Korijn November 17, 2025 15:31
@almarklein
Member Author

Note that for the trio and wx loops, this re-introduces #107, so to some degree this is a regression.

@almarklein
Member Author

cc @Vipitis

@Vipitis
Contributor

Vipitis commented Nov 17, 2025

hm, I checked out the branch and it doesn't seem to be much different than the previous improvement we had. Still better than what we had originally, but also not perfect.
My main monitor has a 165 Hz refresh rate... and with continuous you still seem to be limited by that, kinda. I don't have the time right now to run a full sweep of frame times to see if the curve is non-linear, so I will just share a few quick tests (I let it warm up for ~10 seconds and had the window highlighted):

unrestricted validation: (screenshot)
vsync, solid 165: (screenshot)
continuous (using the scheduler, vsync off): (four screenshots)

I wanted to try some external benchmarking tools like PresentMon anyway, to better understand the CPU vs GPU side, but I haven't gotten there yet, so I don't know if it even works with wgpu.

@almarklein
Member Author

it doesn't seem to be much different than the previous improvement we had.

That's also not the point. The purpose is that await sleep() is now actually precise (for asyncio/qt/raw). This makes the scheduler code a bit simpler, but more importantly, other tasks can make use of precise sleeps as well, e.g. a wgpu poller that we want to run faster than in steps of 15.6 ms.

@almarklein almarklein marked this pull request as ready for review November 19, 2025 11:33
@almarklein
Member Author

almarklein commented Nov 19, 2025

This is ready. All backends now have a precise timer. Mostly through a new threaded callback util. Qt through PreciseTimer. Tested on PyQt5, PyQt6, PySide2, PySide6. Also see the updated top post.

@Korijn
Contributor

Korijn commented Nov 20, 2025

I see a lot of notes about preciseness on Windows, what about the other platforms? Do they just sidestep this whole thing?

@almarklein almarklein changed the title Re-implement precise sleep Add loop.call_soon_threadsafe() and re-implement precise sleep Nov 20, 2025
@almarklein
Member Author

almarklein commented Nov 20, 2025

I see a lot of notes about preciseness on Windows, what about the other platforms? Do they just sidestep this whole thing?

Windows historically uses a timer that ticks 64 times per second, i.e. every 15.625 ms. Other platforms are "tickless" and have microsecond resolution.
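To see what a 15.625 ms tick means for short waits, here is a simplified model (an assumption, not a description of the actual Windows scheduler): every timed wait is rounded up to a whole number of ticks. Under this model a requested 2 ms wait takes 15.625 ms, close to the 15.6 ms average measured earlier, and a loop that waits once per frame tops out at 64 iterations per second.

```python
import math

TICK = 1 / 64  # seconds per tick: 15.625 ms


def tick_quantized(delay):
    """Simplified model: a timed wait ends on a whole number of ticks."""
    return math.ceil(delay / TICK) * TICK


print(tick_quantized(0.002) * 1000)  # 15.625 (ms)
print(tick_quantized(0.020) * 1000)  # 31.25 (ms)
print(1 / tick_quantized(0.002))     # 64.0 (waits per second)
```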

edit: I'll add this as a comment somewhere.

@almarklein
Member Author

almarklein commented Nov 20, 2025

I also added loop.call_soon_threadsafe(). This was easy to add since I already had the logic for each backend to implement the threaded timer. It opens up the possibility to interact with threads.

The real use-case is polling wgpu. I first thought I wanted to do that using threads, but we did not have a backend-agnostic way to let a thread invoke a call in the main thread. So I thought to poll using await sleep(), but for that the accuracy had to go up, otherwise the steps would be too big on Windows. Turned out the best way to implement a precise timer on Windows is using a thread, so I had to implement support to invoke a call in the main thread, for each backend. But now that we have that, I added call_soon_threadsafe(), and we went full circle :) Now we can actually poll wgpu in a thread, which is much better (more on that soon).
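A minimal sketch of polling in a thread once `call_soon_threadsafe()` is available, using asyncio as the main loop: the worker polls at a fixed interval and hands each result to the main thread. `start_poller`, `poll`, and `on_result` are hypothetical names, with `poll` standing in for whatever wgpu call would be polled.

```python
import threading
import time


def start_poller(loop, poll, interval, on_result, stop):
    """Poll in a worker thread; deliver results on the event loop's thread.

    `loop` is an asyncio event loop; `stop` is a threading.Event.
    """

    def run():
        while not stop.is_set():
            result = poll()
            if result is not None:
                # on_result runs in the main thread, not in this worker.
                loop.call_soon_threadsafe(on_result, result)
            time.sleep(interval)  # precise, unlike timed waits on Windows

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```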

@almarklein
Member Author

I ❤️ this piece of code:

class RawLoop(BaseLoop):
    ...

    def _rc_run(self):
        while not self._should_stop:
            callback = self._queue.get(True, None)
            try:
                callback()
            except Exception as err:
                logger.error(f"Error in RawLoop callback: {err}")

    def _rc_call_later(self, delay, callback):
        call_later_from_thread(delay, self._rc_call_soon_threadsafe, callback)

    def _rc_call_soon_threadsafe(self, callback):
        self._queue.put(callback)
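For anyone who wants to run the pattern above in isolation, here is a toy, self-contained version. `MiniRawLoop` is hypothetical, and for brevity its `_rc_call_later()` starts one sleeping thread per call instead of sharing a single timer thread the way `call_later_from_thread()` does. As in the original, `run()` blocks on `get()` without a timeout, so it never depends on the imprecise timed waits described earlier.

```python
import queue
import threading
import time


class MiniRawLoop:
    """Toy queue-driven loop (hypothetical, not rendercanvas's RawLoop)."""

    def __init__(self):
        self._queue = queue.SimpleQueue()
        self._should_stop = False

    def run(self):
        while not self._should_stop:
            callback = self._queue.get()  # blocks until something is queued
            try:
                callback()
            except Exception as err:
                print(f"Error in loop callback: {err!r}")

    def stop(self):
        # Queue the flag change, so it also wakes up a blocked get().
        self._rc_call_soon_threadsafe(self._set_stop)

    def _set_stop(self):
        self._should_stop = True

    def _rc_call_soon_threadsafe(self, callback):
        self._queue.put(callback)  # queue.SimpleQueue is thread-safe

    def _rc_call_later(self, delay, callback):
        # One thread per call for brevity; the PR shares one timer thread.
        def wait_then_queue():
            time.sleep(delay)
            self._rc_call_soon_threadsafe(callback)

        threading.Thread(target=wait_then_queue, daemon=True).start()
```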

@almarklein almarklein merged commit 79bae0c into main Nov 24, 2025
13 checks passed
@almarklein almarklein deleted the precise-sleep branch November 24, 2025 10:41

4 participants