Add `loop.call_soon_threadsafe()` and re-implement precise sleep #146

almarklein · 2025-11-17T14:51:03Z

Revisiting #109 (and #107)

Intro

In #109 I opted for a solution that did some time.sleep() in the scheduler. This is a relatively simple solution that fixed the problem of lower-than-max-fps framerates for all backends.

This morning I was happily planning to implement the polling for wgpu, so that we can properly leverage async, but I bumped into this problem again. I decided to fix the problem per backend, so that we can await sleep() with precise timers everywhere (including for the wgpu polling).

Changes

- Add `loop.call_soon_threadsafe()`.
  - All loop backends must implement `_rc_call_soon_threadsafe()`.
- Implement `_coreutils.call_later_from_thread()`.
  - This uses a single global thread to handle all call-later needs.
  - The thread is created on first use, because Pyodide cannot do threads.
  - Has about 1 ms precision.
- For asyncio and trio, use this util to implement precise sleep on Windows.
- For the wx backend, use this util to implement precise call_later on Windows.
- For the qt backend, use PreciseTimer to implement precise call_later on Windows.
  - Also tried the thread util, but the PreciseTimer seems even more precise.
- The raw loop is now dead simple, because the util takes care of all the scheduling.
- The scheduler now uses a normal `await sleep(..)`, without any trickery.
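The `call_later_from_thread()` util listed above can be sketched roughly as follows. This is a hypothetical sketch, not the actual `_coreutils` code: the argument order `(delay, caller, callback)` is taken from the `RawLoop` snippet later in this thread, while `_CallLaterThread`, the heap-based priority queue and the 1 ms sleep step are assumptions. A single lazily started thread keeps pending calls sorted by deadline and waits with `time.sleep()`, the primitive found to be precise; when a deadline passes, it hands the callback to `caller`, which would typically be a backend's thread-safe call-soon.

```python
import heapq
import itertools
import threading
import time


class _CallLaterThread(threading.Thread):
    """One worker thread that serves all call-later requests (hypothetical sketch)."""

    def __init__(self):
        super().__init__(daemon=True)
        self._heap = []  # (deadline, seq, caller, callback), earliest first
        self._seq = itertools.count()  # tie-breaker, so callbacks are never compared
        self._cond = threading.Condition()

    def add(self, delay, caller, callback):
        deadline = time.perf_counter() + max(0.0, delay)
        with self._cond:
            heapq.heappush(self._heap, (deadline, next(self._seq), caller, callback))
            self._cond.notify()  # wake the thread if it is idle

    def run(self):
        while True:
            with self._cond:
                while not self._heap:
                    self._cond.wait()  # idle: block without a timeout
                deadline = self._heap[0][0]
                now = time.perf_counter()
                item = heapq.heappop(self._heap) if now >= deadline else None
            if item is None:
                # time.sleep() is precise; sleeping in short steps means a newly
                # added, earlier deadline is noticed within about 1 ms.
                time.sleep(min(deadline - now, 0.001))
                continue
            _, _, caller, callback = item
            try:
                caller(callback)  # e.g. loop.call_soon_threadsafe(callback)
            except Exception as err:
                print(f"Error in call_later_from_thread: {err}")


_thread = None
_thread_lock = threading.Lock()


def call_later_from_thread(delay, caller, callback):
    """After `delay` seconds, call `caller(callback)` from the shared thread."""
    global _thread
    with _thread_lock:
        if _thread is None:
            # Created on first use, so environments without threads can still import this.
            _thread = _CallLaterThread()
            _thread.start()
    _thread.add(delay, caller, callback)
```

With `caller` set to something like a loop's `call_soon_threadsafe`, the callback ends up running in the main thread; the shared worker thread only does the waiting.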

Some low-level details

Mostly to record design decisions. You can probably skip this.

Some more details about how polling led me here. I thought about a few different approaches to do the polling:

- Let wgpu-py run the polling in a thread. But then that thread will call loop.call_soon(), which means that the call_soon() of all backends would have to be thread-safe, which is currently not the case.
- I also considered a thread in rendercanvas that calls a registered function at a precise interval, which has the same problem.
- So then: a nice and clean async task that sleeps at regular intervals. This works, but then we run into the issue that async sleep is imprecise... so here we are.

Some details about an earlier posted solution that uses a threadpool, see this link.

When using a thread pool, one thread is used to sleep for `delay` seconds, and when it wakes up, it schedules a callback in the main loop. The downside is that when multiple tasks are running (and sleeping), there will be one thread per task. And when the threads run out, tasks wait for each other and the sleep becomes much longer. That's why I came up with a technique that uses a single thread with a priority queue.
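The thread-pool downside is easy to reproduce. The sketch below is a hypothetical illustration, not the earlier posted solution itself: each sleep occupies one worker of a `concurrent.futures.ThreadPoolExecutor`, and with four concurrent 50 ms sleeps but only two workers, the last two cannot start until a worker frees up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Illustration only: every pending sleep blocks one pool thread.
pool = ThreadPoolExecutor(max_workers=2)


def threaded_sleep(delay):
    """Sleep in a pool thread and return the moment it actually finished."""
    time.sleep(delay)
    return time.perf_counter()


t0 = time.perf_counter()
# Four "tasks" each want to sleep 50 ms, but only two threads are available.
futures = [pool.submit(threaded_sleep, 0.05) for _ in range(4)]
durations = [f.result() - t0 for f in futures]
pool.shutdown()

for d in durations:
    print(f"{1000 * d:0.1f} ms")
```

The last two sleeps take roughly twice as long as requested, which is exactly what a single scheduling thread with a priority queue avoids.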

Some findings while I was experimenting

Mostly to record interesting findings. You can probably skip this.

I tried several things to find a mechanism that does not suffer from imprecise timers. BTW, ChatGPT is absolutely worthless for this topic.

Allow a thread to wait for a signal from another thread on a timeout

This is a simple mechanism that would have been useful. But it looks like it's impossible. Using time.sleep() is precise, but any method that has a timeout (threading.Event.wait(), Queue.get(), threading.Lock.acquire(), using select on a socket) is imprecise (on Windows), and waits 15.6 ms on average.

```python
import time
import threading
import socket
import select
from queue import Queue, Empty


class PollThread(threading.Thread):
    """Measure how long a 2 ms wait really takes, for several wait primitives."""

    def __init__(self):
        super().__init__()
        self.e = threading.Event()
        self.pair = socket.socketpair()
        self.q = Queue()
        self.lock = threading.Lock()
        self.lock.acquire()  # hold the lock, so a timed acquire() must time out

    def run(self):
        timeout = 0.002

        while True:
            x = []
            for _ in range(20):
                t0 = time.perf_counter()

                time.sleep(timeout)  # This is precise, the rest is not

                # self.e.wait(timeout)

                # try:
                #     self.q.get(True, timeout)
                # except Empty:
                #     pass

                # ready, _, _ = select.select([self.pair[0]], [], [], timeout)

                # self.lock.acquire(True, timeout)

                t1 = time.perf_counter()
                x.append(t1 - t0)

            # Mean and max duration, in ms
            print(f"{1000*sum(x)/len(x):0.1f}  {1000*max(x):0.1f}")


poller = PollThread()
poller.start()
```

Let a sub-thread invoke a method in the main thread

This actually does work precisely!

For asyncio, you can take an asyncio.Event, let a task wait for it, and then let the worker thread do loop.call_soon_threadsafe(event.set).
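A minimal, self-contained asyncio version of this pattern could look like the sketch below. `precise_sleep` is a hypothetical name, and it starts one thread per call purely for illustration (which brings back the one-thread-per-task downside discussed earlier).

```python
import asyncio
import threading
import time


def worker(loop, event, delay):
    # time.sleep() is the precise primitive. asyncio.Event is not thread-safe,
    # so the set() call is handed back to the event loop's thread.
    time.sleep(delay)
    loop.call_soon_threadsafe(event.set)


async def precise_sleep(delay):
    loop = asyncio.get_running_loop()
    event = asyncio.Event()
    threading.Thread(target=worker, args=(loop, event, delay), daemon=True).start()
    await event.wait()


async def main():
    t0 = time.perf_counter()
    await precise_sleep(0.002)
    return time.perf_counter() - t0


elapsed = asyncio.run(main())
print(f"slept {1000 * elapsed:0.1f} ms")
```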

For Qt it works with a little code like this:

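```python
from PySide6.QtCore import QObject, Signal, Slot
from PySide6.QtGui import QGuiApplication


class InvokeMethod(QObject):
    """Invoke a method in the main (GUI) thread, from any thread."""

    called = Signal()

    def __init__(self, method):
        super().__init__()
        # Move this object to the main thread, so its slots run there.
        main_thread = QGuiApplication.instance().thread()
        self.moveToThread(main_thread)
        # Parent it to the app, so it stays alive until execute() has run.
        self.setParent(QGuiApplication.instance())
        self.method = method
        # Emitted from a worker thread, this becomes a queued connection.
        self.called.connect(self.execute)
        self.called.emit()

    @Slot()
    def execute(self):
        self.method()
        self.setParent(None)  # trigger garbage collector
```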

And then the worker thread simply does InvokeMethod(self.cb).

I considered an approach where rendercanvas would do the scheduling in a thread and then use this mechanism to make it do stuff in the main thread. But that would mean things need to be different on systems where we have no threads, like Pyodide.

Piece of code to check that it works

```python
import time
from rendercanvas.glfw import RenderCanvas
from rendercanvas.asyncio import loop  # use pyside6, asyncio, raw, wx, trio
from rendercanvas.utils.asyncs import sleep


RenderCanvas.select_loop(loop)
c = RenderCanvas()


async def main():
    while True:
        x = []
        for i in range(20):
            t0 = time.perf_counter()
            await sleep(0.002)
            t1 = time.perf_counter()
            x.append(t1 - t0)

        # Mean and max sleep duration, in ms
        print(f"{1000*sum(x)/len(x):0.1f}  {1000*max(x):0.1f}")
        x.clear()


loop.add_task(main)

loop.run()
```

almarklein · 2025-11-17T15:31:24Z

Note that for the trio and wx loops, this re-introduces #107, so to some degree this is a regression.

almarklein · 2025-11-17T15:31:38Z

cc @Vipitis

Vipitis · 2025-11-17T17:11:15Z

hm, I checked out the branch and it doesn't seem to be much different from the previous improvement we had. Still better compared to what we had originally, but also not perfect.
My main monitor has a 165 Hz refresh rate... and with continuous you still seem to be limited by that, kinda. Don't have the time right now to run a full sweep of frame times to see if the curve is nonlinear. So I will just share a few quick tests (let it warm up for ~10 seconds and had the window highlighted):

unrestricted validation:

vsync, solid 165:

continuous (using the scheduler, vsync off):

I wanted to try some external benchmarking tools like PresentMon anyway to better understand the CPU vs GPU side, but haven't gotten there yet to see if it even works with wgpu.

almarklein · 2025-11-17T18:39:16Z

> it doesn't seem to be much different than the previous improvement we had.

That's also not the point. The purpose is that await sleep() is now actually precise (for asyncio/qt/raw). This makes the scheduler code a bit simpler, but more importantly, other tasks can make use of precise sleeps as well, e.g. a wgpu poller that we want to run at a finer granularity than steps of 15.6 ms.

almarklein · 2025-11-19T11:34:22Z

This is ready. All backends now have a precise timer. Mostly through a new threaded callback util. Qt through PreciseTimer. Tested on PyQt5, PyQt6, PySide2, PySide6. Also see the updated top post.

Korijn · 2025-11-20T10:23:42Z

I see a lot of notes about preciseness on Windows, what about the other platforms? Do they just sidestep this whole thing?

almarklein · 2025-11-20T11:54:56Z

> I see a lot of notes about preciseness on Windows, what about the other platforms? Do they just sidestep this whole thing?

Windows historically uses ticks that go at 64 ticks per second, i.e. 15.625 ms each. Other platforms are "tickless" and have microsecond resolution.

edit: I'll add this as a comment somewhere.
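To see why this matters for the 2 ms sleeps measured earlier, here is a deliberately simplified model. It is an assumption for illustration only (real Windows behavior also depends on where in the tick a wait starts and on the current system timer resolution): any timed wait is rounded up to a whole number of 1/64 s = 15.625 ms ticks.

```python
import math

TICK = 1 / 64  # 15.625 ms: the historical Windows timer tick


def tick_quantized(delay, tick=TICK):
    """Simplified model: a timed wait is rounded up to a whole number of ticks."""
    return math.ceil(delay / tick) * tick


for delay in (0.002, 0.010, 0.020):
    print(f"requested {1000 * delay:5.1f} ms -> waits ~{1000 * tick_quantized(delay):6.3f} ms")
```

Under this model a loop that waits 2 ms per iteration is capped at about 64 iterations per second, while on a tickless platform the wait stays close to the requested 2 ms.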

almarklein · 2025-11-20T12:01:22Z

I also added loop.call_soon_threadsafe(). This was easy to add since I already had the logic for each backend to implement the threaded timer. It opens up the possibility to interact with threads.

The real use-case is polling wgpu. I first thought I wanted to do that using threads, but we did not have a backend-agnostic way to let a thread invoke a call in the main thread. So I thought to poll using await sleep(), but for that the accuracy had to go up, otherwise the steps would be too big on Windows. It turned out the best way to implement a precise timer on Windows is using a thread, so I had to implement support to invoke a call in the main thread, for each backend. But now that we have that, I added call_soon_threadsafe(), and we went full circle :) Now we can actually poll wgpu in a thread, which is much better (more on that soon).
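The "poll in a thread, then hand results back to the main loop" idea can be sketched with asyncio, whose `loop.call_soon_threadsafe()` plays the role of the new rendercanvas method. Everything here is hypothetical: `PollingThread` is an illustrative name, `poll_fn` stands in for a wgpu poll call and `on_result` for whatever should run in the main thread; this is not the wgpu-py implementation.

```python
import asyncio
import threading
import time


class PollingThread(threading.Thread):
    """Call poll_fn() at a fixed interval; deliver results on the event loop."""

    def __init__(self, loop, poll_fn, on_result, interval=0.002):
        super().__init__(daemon=True)
        self._loop = loop
        self._poll_fn = poll_fn
        self._on_result = on_result
        self._interval = interval
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            result = self._poll_fn()
            if result is not None:
                # Never call on_result directly from this thread.
                self._loop.call_soon_threadsafe(self._on_result, result)
            time.sleep(self._interval)  # precise, unlike Event.wait(timeout)


async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    counter = iter(range(5))
    seen = []

    def fake_poll():
        # Stand-in for a real poll: returns 0..4, then None forever.
        return next(counter, None)

    def on_result(value):
        # Runs in the event loop's thread.
        seen.append(value)
        if value == 4:
            done.set()

    poller = PollingThread(loop, fake_poll, on_result)
    poller.start()
    await asyncio.wait_for(done.wait(), timeout=2)
    poller.stop()
    poller.join()
    return seen


seen = asyncio.run(main())
print(seen)
```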

almarklein · 2025-11-20T12:10:32Z

I ❤️ this piece of code:

```python
class RawLoop(BaseLoop):
    ...

    def _rc_run(self):
        while not self._should_stop:
            callback = self._queue.get(True, None)
            try:
                callback()
            except Exception as err:
                logger.error(f"Error in RawLoop callback: {err}")

    def _rc_call_later(self, delay, callback):
        call_later_from_thread(delay, self._rc_call_soon_threadsafe, callback)

    def _rc_call_soon_threadsafe(self, callback):
        self._queue.put(callback)
```
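To run the same idea in isolation, here is a standalone toy version of that loop. It is not rendercanvas code: `ToyRawLoop`, its `stop()` method and the per-call helper thread are hypothetical simplifications (the real `RawLoop` delegates `call_later` to the shared `call_later_from_thread` util). Note that the queue's `get()` blocks without a timeout, so the loop never relies on the imprecise timed waits measured earlier.

```python
import queue
import threading
import time


class ToyRawLoop:
    def __init__(self):
        self._queue = queue.SimpleQueue()
        self._should_stop = False

    def run(self):
        while not self._should_stop:
            callback = self._queue.get()  # blocks without a timeout
            try:
                callback()
            except Exception as err:
                print(f"Error in ToyRawLoop callback: {err}")

    def stop(self):
        # Queue the stop request like any other callback, so it wakes run().
        def _stop():
            self._should_stop = True

        self.call_soon_threadsafe(_stop)

    def call_soon_threadsafe(self, callback):
        self._queue.put(callback)

    def call_later(self, delay, callback):
        # Simplification: one helper thread per call; time.sleep() is precise.
        def _wait_then_call():
            time.sleep(delay)
            self.call_soon_threadsafe(callback)

        threading.Thread(target=_wait_then_call, daemon=True).start()


loop = ToyRawLoop()
events = []
loop.call_later(0.02, lambda: events.append("later"))
loop.call_soon_threadsafe(lambda: events.append("soon"))
loop.call_later(0.05, loop.stop)
loop.run()
print(events)
```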

almarklein added 4 commits November 17, 2025 15:36

Re-implement precise sleep

6dbb1e1

improve accuracy of raw loop

ecd3804

ruff

b91a4ac

docs

39903dc

almarklein requested a review from Korijn November 17, 2025 15:31

almarklein added 14 commits November 18, 2025 13:16

Try a scheduler thread util

dd3b330

improvinh

2b74282

Also apply for trio

8554cda

tiny tweak

493924e

Implement for wx

aba63cc

Make precise timers and threaded timers work for all qt backends

2591017

Clean up

19aa7a4

add comment

9a01953

simplify thread code a bit

6270abb

Avoid using Future.set_result, which we are not supposed to be calling

c06c5c5

clean

5326bab

comment

eeabd01

Using the thread, the raw loop can become dead simple

8f6bbda

cleanup

886e6d4

almarklein marked this pull request as ready for review November 19, 2025 11:33

Merge branch 'main' into precise-sleep

f1b89fb

almarklein added 2 commits November 20, 2025 12:34

Add loop.call_soon_threadsafe()

63350b2

Add comment

e25d1f3

almarklein changed the title ~~Re-implement precise sleep~~ Add loop.call_soon_threadsafe() and re-implement precise sleep Nov 20, 2025

almarklein changed the title ~~Add loop.call_soon_threadsafe() and re-implement precise sleep~~ Add loop.call_soon_threadsafe() and re-implement precise sleep Nov 20, 2025

reset changed flag while testing

39ad9cb

docstring, and little extra offset

7cfc6f6

almarklein mentioned this pull request Nov 21, 2025

Use a polling thread per device pygfx/wgpu-py#778

Merged

almarklein merged commit 79bae0c into main Nov 24, 2025
13 checks passed

almarklein deleted the precise-sleep branch November 24, 2025 10:41

4 participants

Add loop.call_soon_threadsafe() and re-implement precise sleep #146

Add loop.call_soon_threadsafe() and re-implement precise sleep #146

Uh oh!

Conversation

almarklein commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Intro

Changes

Some low-level details

Some findings while I was experimenting

Allow a thread to wait for a signal from another thread on a timeout

Let a sub-thread invoke a method in the main thread

Piece of code to check that it works

Uh oh!

almarklein commented Nov 17, 2025

Uh oh!

almarklein commented Nov 17, 2025

Uh oh!

Vipitis commented Nov 17, 2025

Uh oh!

almarklein commented Nov 17, 2025

Uh oh!

almarklein commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Korijn commented Nov 20, 2025

Uh oh!

almarklein commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

almarklein commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

almarklein commented Nov 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add `loop.call_soon_threadsafe()` and re-implement precise sleep #146

Add `loop.call_soon_threadsafe()` and re-implement precise sleep #146

almarklein commented Nov 17, 2025 •

edited

Loading

almarklein commented Nov 19, 2025 •

edited

Loading

almarklein commented Nov 20, 2025 •

edited

Loading

almarklein commented Nov 20, 2025 •

edited

Loading