
clocks: how to safely and efficiently schedule events from other threads? #2104

Open
jcelerier opened this issue Oct 11, 2023 · 9 comments

@jcelerier
Contributor

Hello,
I'm writing externals for devices that deliver their data from a callback thread, and I want to bring that data back into Pd's scheduling thread with as little latency and performance impact as possible.

I'm currently having a clock that does:

sys_lock();              /* acquire Pd's global lock */
clock_set(m_clock, 0);   /* schedule the clock tick */
sys_unlock();            /* release the lock */

whenever such a message is received, so that the clock callback goes and reads the data from a lock-free SPSC queue that the device callback writes into.

I found out the hard way that these clock objects aren't thread-safe. But of course we don't want a sys_lock / sys_unlock on every message the devices send (which could realistically be in the kHz range). So what's the most idiomatic way to tell Pd to process a message coming from another thread as fast as possible?

@umlaeute
Contributor

there's no "idiomatic" way.

but typically you would signal (to a socket) that new data has arrived, and poll that socket in the main thread with a clock

@Spacechild1
Contributor

Spacechild1 commented Oct 11, 2023

Generally, sys_lock and sys_unlock are not safe to use from within an external, as they can always deadlock:

  1. main thread calls sys_lock
  2. helper thread of object A calls sys_lock and blocks
  3. main thread calls destructor of object A
  4. object A tries to join helper thread -> deadlock!

We can only hope that some day we get a proper asynchronous task/messaging API, such as #1357.

In the meantime, there are several workarounds:

If you just want to notify the main thread, you could use a semaphore (or socket, if you're lazy) and poll it regularly with a clock.

If you also want to send data, you would need to manage your own thread-safe (ideally lock-free) queue and regularly poll it from the main thread with a clock.


whenever such a message is received to go read the data from a lock-free spsc queue where the device callback writes.

In that case you wouldn't need another thread in the first place! You can just regularly poll the spsc queue from the main thread with a clock. Ideally, you would allow users to set the polling interval, like [comport] does.

@jcelerier
Contributor Author

jcelerier commented Oct 11, 2023

You can just regularly poll

If the only solution is manual polling, there's no actual solution :( It may work for a couple of externals, but it will definitely waste a lot of resources if one has, e.g., a few hundred or thousand objects all polling concurrently; that's not how one designs scalable async systems... From what I can see, #1357 is indeed exactly what is needed.

@Spacechild1
Contributor

Spacechild1 commented Oct 11, 2023

but it definitely will waste a lot of resources if one has e.g. a few hundred / thousand objects which all poll concurrently

How many objects require regular (fast) polling? How many instances are needed? I'm pretty sure it's not hundreds or thousands. Anyway, clocks are rather cheap. Pd can run hundreds of [metro] objects at short intervals just fine. I wouldn't worry about it.

that's not how one designs scalable async systems ... from what I can see #1357 is indeed exactly what is needed.

I certainly agree with that :)

@jcelerier
Contributor Author

Pd can run hundreds of [metro] objects at short intervals just fine.

I really disagree with this. I just tried a basic test with a hundred-ish [metro] objects ticking at 1 ms and Pd is already sitting at 15% of a 4.5 GHz i7 CPU core...

@Spacechild1
Contributor

Spacechild1 commented Oct 11, 2023

Interesting. On my machine (AMD Ryzen 7 PRO 5850U), 100-200 [metro] objects at 1 ms produce basically no measurable load. With 300 objects I suddenly get 15%.

Generally, the implementation of Pd's scheduler is rather inefficient because it uses a linked list instead of a heap. If a clock is rescheduled, it needs to traverse the whole list until it finds the appropriate spot. The more clocks, the more expensive it gets. If many clocks are rescheduled at the same time, the complexity approaches O(n^2).

Two questions:

  1. do you really need to poll at 1 ms?
  2. how many instances of your object do you expect in a typical patch?

Again, I agree that polling is not ideal, but there's currently no other reliable solution and I don't think it is that much of a problem in practice.

Now, if you really expect users to create dozens or hundreds of instances and poll at very low intervals, you can use a shared clock for all your objects. You would create a "fake" object containing the clock and bind it to some (obscure) symbol. The object would maintain a list of owners that it has to notify; this list may also act as a reference count, so the clock gets freed when there are no clients anymore.

@jcelerier
Contributor Author

jcelerier commented Oct 11, 2023

do you really need to poll at 1 ms?

I really need the lowest possible jitter & latency, closer to the µs than the ms range. Ideally I wouldn't poll, because Pd would do it for me in a much more efficient way :) and I would just tell it "there's a message for this object, please process it ASAP".

In some cases right now the only solution is to put the data into the audio stream and work with DSP, but that really limits the kind of data that can be worked with and comes with its own set of problems. (Of course all this is assuming super-optimized Linux boxes running with isolated CPU cores dedicated to this processing.)

how many instances of your object do you expect in a typical patch?

with the specific set of objects I'm thinking of, I've seen Max patches with a number of instances in the low thousands (e.g. 1-5k) and I'd like to manage the same (or more, I was once asked "would it work with 20k objects") with Pd.

Most of these objects will rarely get a message, but it's important that they all process it in a timely way. So I guess what you are proposing with the "fake" object is the only way... better than nothing :)

@Spacechild1
Contributor

I really need the lowest possible jitter & latency, closer to the us than the ms range.

How should that work? External input is necessarily quantized to scheduler ticks. Also note that the scheduler is not ticked at regular intervals because the messaging system necessarily introduces jitter. It gets even worse when DSP is on; e.g. with a hardware buffer size of 256 samples, Pd would process 4 blocks in a row and then wait. You would need to run Pd in callback mode at 64 samples, but even then you cannot expect a precise tick rate.

I've seen Max patches with a number of instances in the low thousands (e.g. 1-5k)

:-O What kind of object is that? It might help if you could give more info on what your object actually does.

Most of these objects will rarely get a message, but it's important that they all process it in a timely way.

Ok, in this case polling would be really wasteful. That's an important detail ;-)

So I guess what you are proposing with the "fake" object is the only way... better than nothing :)

Probably.

@Spacechild1
Contributor

I really need the lowest possible jitter & latency

I also want to add that these two goals are mutually exclusive. If you need lowest possible latency, you have to tolerate jitter. Conversely, if you want to preserve the original timing, you need to timestamp and schedule in advance, which necessarily adds latency.
