Implement AudioWorklet #23807

Open
gterzian opened this issue Jul 19, 2019 · 38 comments

Comments

@gterzian
Member

gterzian commented Jul 19, 2019

https://webaudio.github.io/web-audio-api/#audioworklet

Any interest in this? @ferjm @Manishearth

It's using MessagePorts, coming soon to a neighborhood near you. #23637

Although this might be one of those "implicit ports" that can more easily be implemented with a channel (like in dedicated workers).

@Manishearth
Member

Interest, yes, but I'm not sure how to make the processing model work safely with spidermonkey and the audio thread. Should be fine with an audio thread; may be tricky if it's a process, which we might eventually do.

@gterzian
Member Author

gterzian commented Jul 19, 2019

the audio thread

Is that the "rendering thread" mentioned in the spec? https://webaudio.github.io/web-audio-api/#rendering-thread

And the "control thread" is basically a script-thread then, right?

may be tricky if it's a process, which we might eventually do.

What kind of process isolation do you have in mind? One rendering thread/process per origin? In that case, a worklet would be created from a "control thread" (script, I guess), whereas the AudioWorkletGlobalScope/AudioWorkletProcessor would run in the same process as the rendering thread (but on a different thread, I guess).

So, the interface used from script/control is the AudioWorkletNode, which comes with a port attribute.

The interface used "inside" the worklet scope, which could be running in a separate process, is then the AudioWorkletProcessor, which also comes with a port.

Those two ports are then the endpoints of a communication channel, which in the process-isolation case would have to cross processes.

So the good thing about the MessagePort implementation over at #23637 is that it's basically an ipc-channel available from script, allowing script to communicate port-to-port where each port can be in a different process (it works in-process as well, and in that case the ipc is just unnecessary overhead).

So having an AudioWorkletNode communicate directly with its corresponding AudioWorkletProcessor across processes would work out of the box.

@gterzian
Member Author

gterzian commented Jul 19, 2019

Ok, so I read the spec a bit more: the audio worklet would run on the same "rendering thread" as the base audio context. It's really just a way for script to run a script in that context; it's not an additional background worker (although it runs in the "background" rendering thread).

So you do need a separate worklet global scope (a new concept, separate from the worker global scope) in which to run the worklet processor code.

Having said that, if you were to run the "rendering thread" in a separate process from script (and share it on a per-origin basis with various script processes?), you could use MessagePort as the direct means of communication between the worklet processor running in the rendering process and the script process using the worklet node.

So the fact that the spec uses MessagePort means it can actually cross processes, and the currently in-progress implementation is basically written with that use case in mind.

@Manishearth
Member

[diagram: IMG_20190719_114130]

Rough diagram of all the processing units involved. Red arrows should be in the same process, ideally. Blue arrows can have a process boundary. Our task is to find the right red arrows to sacrifice.

@asajeffrey
Member

currently webaudio runs on the script thread, but ideally there's a process boundary, but worklets ruin things https://mozilla.logbot.info/servo/20190719#c16481450

@gterzian
Member Author

gterzian commented Jul 21, 2019

Ok thanks for the info.

currently webaudio runs on the script thread, but ideally there's a process boundary, but worklets ruin things https://mozilla.logbot.info/servo/20190719#c16481450

Why would worklets ruin introducing a process boundary?

Reading the chat, it seems there is a mental model of running the Audio worklet as some sort of worker in the script-process.

I have a different mental model, where the worklet is just a piece of JS code that is run on the same thread as where the audio processing happens, and not as a worker.

The spec has some examples:

// The main global scope
const context = new AudioContext();
context.audioWorklet.addModule('bypass-processor.js').then(() => {
  const bypassNode = new AudioWorkletNode(context, 'bypass-processor');
});
// bypass-processor.js script file, runs on AudioWorkletGlobalScope
class BypassProcessor extends AudioWorkletProcessor {
  process (inputs, outputs) {
    // Single input, single channel.
    const input = inputs[0];
    const output = outputs[0];
    output[0].set(input[0]);

    // Process only while there are active inputs.
    return false;
  }
};

registerProcessor('bypass-processor', BypassProcessor);

So if you were to move WebAudio to run in a separate process, it would require:

  1. Setting up an AudioWorkletGlobalScope, which would be a separate JS execution environment from the script-thread that is using the AudioContext. You could do that in a separate audio process, I think.
  2. Upon a call to context.audioWorklet.addModule, send an IPC message to audio, and from there run the steps at https://drafts.css-houdini.org/worklets/#dom-worklet-addmodule (step 12 involves queuing a task on the global scope's queue to fetch the script module).
  3. Send an ipc message back to script to resolve the promise.

Then, the spec says to synchronously call the process method of the AudioWorkletProcessor "at every render quantum, if AudioWorkletNode is actively processing", from the "render thread" (not the script-thread nor a background worker thread).

So I think that would require implementing AudioNodeEngine and somehow having its process method call into the process method of the AudioWorkletProcessor in the context of the AudioWorkletGlobalScope.

Also, AudioWorkletProcessor and AudioWorkletNode are connected via a MessagePort, which would also work over ipc.

Also, I was looking at this example https://github.com/GoogleChromeLabs/web-audio-samples/blob/master/audio-worklet/design-pattern/shared-buffer/shared-buffer-worklet-processor.js and it does seem that you need to run an event-loop for the AudioWorkletGlobalScope besides just calling process at each render quantum, but that event-loop would basically only handle potentially incoming messages from the port.

See for example how process simply returns until initialized has been set to true by handling a message on the port.

So I think you might have to run a separate thread for the AudioWorkletGlobalScope, and maybe the process method of the corresponding AudioNodeEngine would send a blocking message using an ack channel to call process of the AudioWorkletProcessor in the context of AudioWorkletGlobalScope? (you'd have to block since it's a sync processing step, not a parallel one).
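
Here is a minimal sketch of that ack-channel idea, using plain std channels and made-up types (Quantum, WorkletMsg); none of this is the actual servo-media or script-crate API:

use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for one render quantum of samples.
type Quantum = Vec<f32>;

enum WorkletMsg {
    // A message arriving on the processor's MessagePort.
    PortMessage(String),
    // The rendering thread asking for one quantum to be processed;
    // the result is sent back on the ack sender.
    Process { input: Quantum, ack: Sender<Quantum> },
}

fn spawn_worklet_scope() -> Sender<WorkletMsg> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // Stand-in for processor state touched by both onmessage and process.
        let mut gain = 1.0f32;
        for msg in rx {
            match msg {
                WorkletMsg::PortMessage(data) => {
                    // e.g. an onmessage handler updating processor state.
                    if let Ok(g) = data.parse::<f32>() {
                        gain = g;
                    }
                }
                WorkletMsg::Process { input, ack } => {
                    let out = input.iter().map(|s| s * gain).collect();
                    let _ = ack.send(out);
                }
            }
        }
    });
    tx
}

fn main() {
    let worklet = spawn_worklet_scope();
    worklet.send(WorkletMsg::PortMessage("0.5".into())).unwrap();

    // What the rendering thread would do at each render quantum:
    let (ack_tx, ack_rx) = channel();
    worklet.send(WorkletMsg::Process { input: vec![1.0; 128], ack: ack_tx }).unwrap();
    let rendered = ack_rx.recv().unwrap(); // blocks until the worklet has processed
    assert_eq!(rendered[0], 0.5);
}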

@Manishearth
Member

Reading the chat, it seems there is a mental model of running the Audio worklet as some sort of worker in the script-process.

The process boundary is kinda useless if we're allowing arbitrary JS into the audio process. I'm wary of letting gstreamer and spidermonkey share a process. It doesn't have to be the same process as script, but we should be careful about mixing it with gstreamer.

But yes, this is one option! All of the red arrows in the diagram I drew are things which would ideally be process boundaries. We need to get rid of some and consider the trade-offs, this is an acceptable choice.

@gterzian
Member Author

gterzian commented Jul 21, 2019

The process boundary is kinda useless if we're allowing arbitrary JS into the audio process.

Yes I agree.

If I understand it correctly, we currently have one instance of ServoMedia per content-process, initialized at

media_platform::init();

So one could imagine having one ServoMedia per origin, running in its own process. It could actually save us a few ServoMedia instances, since same-origin pages that aren't part of the same browsing-context group currently will not share a content-process (see how we share event-loops).

What would be inside one such "servo-media process" for a given origin?

  1. ServoMedia (with its running backend)
  2. One AudioWorkletGlobalScope per AudioContext for that origin, each running an arbitrary number of AudioWorkletProcessors (one per AudioWorkletNode).

Exactly one AudioWorkletGlobalScope exists for each AudioContext that contains one or more AudioWorkletNodes. https://webaudio.github.io/web-audio-api/#audioworkletglobalscope

So you could run one thread per AudioWorkletGlobalScope, in the same "Servomedia process".

In terms of actually plugging the running worklet into the overall processing of the node graph, I think you could try something like having the process method of an AudioWorkletNode (not the DOM object, something implementing AudioNodeEngine) send a message to the corresponding AudioWorkletGlobalScope thread and block the main rendering thread on a reply. The global would then call the process method of the corresponding AudioWorkletProcessor, and send the result back on a channel on which the "rendering thread" is blocking (since the spec mentions it should be a sync call).
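
Sketched as a simplified trait (the real servo-media AudioNodeEngine has a different shape; WorkletNodeEngine and the message types here are invented for illustration):

use std::sync::mpsc::{channel, Sender};

type Quantum = Vec<f32>;

// Simplified stand-in for servo-media's AudioNodeEngine trait.
trait AudioNodeEngine {
    fn process(&mut self, input: Quantum) -> Quantum;
}

enum WorkletMsg {
    Process { input: Quantum, ack: Sender<Quantum> },
}

// Engine backing an AudioWorkletNode: it owns a sender to the
// AudioWorkletGlobalScope thread and blocks on each quantum.
struct WorkletNodeEngine {
    scope: Sender<WorkletMsg>,
}

impl AudioNodeEngine for WorkletNodeEngine {
    fn process(&mut self, input: Quantum) -> Quantum {
        let (ack_tx, ack_rx) = channel();
        self.scope
            .send(WorkletMsg::Process { input, ack: ack_tx })
            .expect("worklet scope thread gone");
        // The rendering thread blocks here, as the spec's synchronous
        // process() call requires.
        ack_rx.recv().expect("worklet scope dropped the ack sender")
    }
}

fn main() {
    use std::thread;
    let (tx, rx) = channel();
    // Stand-in scope thread that just echoes the input back.
    thread::spawn(move || {
        for msg in rx {
            let WorkletMsg::Process { input, ack } = msg;
            let _ = ack.send(input);
        }
    });
    let mut engine = WorkletNodeEngine { scope: tx };
    assert_eq!(engine.process(vec![0.25; 128]).len(), 128);
}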


Where would those ServoMedia instances be stored and made accessible to a script wishing to start an AudioContext?

It could be in a HashMap<Origin, ServoMedia> on the constellation, where ServoMedia would be an ipc-sender to the actual process. And if there is no ServoMedia in place for a given origin, it would be started by the constellation and stored.

You could also re-consider the current implementation of create_audio_context, so that it would happen inside the "media process", and return a wrapper around an ipc-sender that would be made available directly to script as part of the resume_success task. That task would be enqueued in response to script receiving a constellation message containing a servo_media::audio::context::AudioContext, but instead of it being an Arc<Mutex<AudioContext>> it would essentially be a wrapper around a direct ipc-sender to the corresponding media process.

So the constellation would store a kind of "control sender" to each running media process, but a script for a given origin would store a direct ipc-sender in the form of an AudioContext (which could also just be a clone of the sender stored by the constellation).

One could imagine the below workflow:

// The main global scope
const context = new AudioContext();
context.audioWorklet.addModule('bypass-processor.js').then(() => {
  const bypassNode = new AudioWorkletNode(context, 'bypass-processor');
});
  1. Create a new AudioContext. The DOM object is immediately created, however it can't really do anything yet.
  2. Create an ipc-channel, setup a route.
  3. A message is sent to the constellation to "get an audio context", containing the ipc-sender created at 2.
  4. Constellation receives the message, and either starts a new "audio process", or has one already running for that origin.
  5. Constellation forwards the message, containing the ipc-sender created by script at 2, to the running media process for that origin.
  6. The media process handles the message, setting up a new context, and replying on the ipc-sender by sending a message containing an ipc-sender and other necessary data (essentially the audio::context::AudioContext).
  7. When script receives that message, it has a direct ipc-channel to its corresponding media process.
  8. context.audioWorklet.addModule('bypass-processor.js').then is done entirely over ipc between the media process and a content-process, bypassing the constellation. The media process creates a new AudioWorkletGlobalScope, sends an ipc-message back to resolve the promise, fetches the module in parallel from the AudioWorkletGlobalScope, and so on.
  9. const bypassNode = new AudioWorkletNode(context, 'bypass-processor'); creates a new AudioWorkletNode, and hooks it up(via more ipc-messages between script and media processes) to the corresponding AudioWorkletProcessor via a MessagePort.

Come to think of it, what's kinda interesting is that each AudioWorkletGlobalScope running inside a given media process would then also have its own ipc-sender to the constellation, like any other GlobalScope, and each AudioWorkletNode DOM object running in a script content-process, as well as its corresponding AudioWorkletProcessor running in the AudioWorkletGlobalScope, would also have a direct ipc-channel in the form of a MessagePort. And this part would work out-of-the-box if we simply make AudioWorkletGlobalScope be a GlobalScope as well...
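
Very roughly, and with entirely made-up message and type names (the real ServoMedia/constellation types differ), the plumbing could look something like this with ipc-channel:

// Assumed Cargo deps: ipc-channel and serde (with derive).
use std::collections::HashMap;
use ipc_channel::ipc::{self, IpcSender};
use serde::{Deserialize, Serialize};

// Stand-in for an origin key.
type Origin = String;

// Constellation -> media process (steps 4/5): start a context for an origin.
#[derive(Serialize, Deserialize)]
enum MediaControlMsg {
    NewContext { reply: IpcSender<MediaToScriptMsg> },
}

// Media process -> script (steps 6/7): the direct line script will keep.
#[derive(Serialize, Deserialize)]
enum MediaToScriptMsg {
    ContextReady { context: IpcSender<ScriptToMediaMsg> },
}

// Direct script -> media traffic, bypassing the constellation (steps 8/9).
#[derive(Serialize, Deserialize)]
enum ScriptToMediaMsg {
    AddModule { url: String, resolve: IpcSender<Result<(), String>> },
    CreateWorkletNode { name: String },
}

// Constellation-side bookkeeping: one media process control sender per origin,
// started lazily the first time a script for that origin asks for audio.
struct Constellation {
    media_processes: HashMap<Origin, IpcSender<MediaControlMsg>>,
}

fn main() {
    // In-process demo of the bookkeeping only; the real senders would cross
    // process boundaries.
    let (media_tx, _media_rx) = ipc::channel::<MediaControlMsg>().unwrap();
    let mut constellation = Constellation { media_processes: HashMap::new() };
    constellation
        .media_processes
        .entry("https://example.org".to_string())
        .or_insert(media_tx);
}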

@Manishearth
Member

If I understand it correctly, we currently have one instance of ServoMedia per content-process, initialized at

Right, currently we don't have a media-script process boundary, but it was designed with a plan to add such a process boundary eventually.

sending a message to the corresponding AudioWorkletGlobalScope thread and block the main rendering thread on a reply

This can work. It's unclear if the performance impact would be high.

OTOH, i'm not sure if we want to run JS code and gstreamer in the same process, ever, and those are really the only two options we have :)

@gterzian
Member Author

gterzian commented Jul 22, 2019

https://docs.google.com/presentation/d/1GZJ3VnLIO_Pw0jr9nRw6_-trg68ol-AkliMxJ6jo6Bo/edit#slide=id.g36f61837b7_1_1

It seems in Chromium the audio worklet is run on a separate thread, in the same process as the "main" script-thread. So that actually is the opposite of what I had in mind.

And, this approach seems to create issues:

  1. https://bugs.chromium.org/p/chromium/issues/detail?id=836306 (something about the main script-thread doing a GC which also stops the in-process worklet thread)
  2. https://bugs.chromium.org/p/chromium/issues/detail?id=813825 (something about the worklet thread not being "real-time priority" and people complaining about glitching, although the response from the Chromium team seems to be that it's not a very good idea to run JS on a high priority thread since that would allow for abuse of system resources as well).
  3. https://bugs.chromium.org/p/chromium/issues/detail?id=796330 (another one related to GC on the main script-thread).

i'm not sure if we want to run JS code and gstreamer in the same process, ever

So, in the light of the security issues related to abusing the worklet, it might indeed not be a good idea to run the AudioWorkletProcessor alongside the "native" nodes.

And it might still be a good idea to run the worklet, on a per-origin basis, in a separate process from the main content-process, using their own JS runtime, so as to isolate the GC? So basically making audio worklets "out-of-process" like service- and shared-workers?

So you'd end-up with:

  1. A "native" audio process (just one for the whole UA, or one per origin?)
  2. One process per origin, running several AudioWorkletGlobalScope threads, one per AudioContext for that origin.
  3. Content-processes, using both 1 and 2. With the constellation used for initial setup of ipc-channels, followed by direct ipc between AudioWorkletGlobalScope and the content-process using the corresponding AudioContext/AudioWorkletNode.

Alternatively 1 and 2 are combined into one process per origin, with 2 running on "lower priority" threads than the native audio code(but that means JS code is running in the same process as Gstreamer).

It might sound like a lot of processes, and yet it could still result in more re-use of audio instances, since you'd be sharing those per origin, as opposed to having one per content-process (which shares one audio instance per browsing-context group).

Or, we just run AudioWorkletGlobalScope in a thread on the content-process, while still isolating the audio in a dedicated process (shared on a per-origin basis across content-processes?).


I personally like the idea of the constellation starting to track ServiceWorker, SharedWorker, and perhaps Worklet, on a per-origin basis. While leaving DedicatedWorker "hidden" inside an EventLoop.

So the difference with Chromium is that they put Worklet inside EventLoop, basically.

Three arguments I can find in favor of making Worklet "out-of-process", at least the AudioWorklet variant, and storing it at the constellation level:

  1. Easier to isolate JS runtime performance (maybe this can be done with a thread as well?).
  2. Easier to do complicated initial setup coordination (with the audio backend), if the constellation is used as a broker. Otherwise script would have to start a process for the backend and so on.
  3. Easier to share stuff on a per-origin basis across the UA (versus "hiding" audio like a dedicated worker inside script).

Although 2 has less weight if we were to run the worklet "in-process" in script, while still isolating the audio backend in a process. In such a setup the constellation could be used to initialize the backend and set up the ipc with script, and each new worklet would just have to be hooked up when created, perhaps even bypassing the constellation if script has a direct line of comm with audio. But that would mean worklet-backend communication would go through the script-process, whereas if we isolate the worklet, it would involve backend to worklet direct communication (over ipc), bypassing script for the processing part.

@Manishearth
Member

And it might still be a good idea to run the worklet, on a per-origin basis, in a separate process from the main content-process, using their own JS runtime, so as to isolate the GC?

Less convinced by this, but if you feel it would be useful!

@asajeffrey
Member

The paint worklet does a lot of jumping through hoops to make sure that GC is never run by the worklet. (There are three worklet threads, each running its own SM instance, and when the active worklet detects GC pressure, it swaps itself out.) This depends on paint worklets being stateless though, if audio worklets are allowed to be stateful then this approach is out. They can still run on a separate thread from the main script thread though, so at least GC on the main script thread won't pause the worklet.

@gterzian
Member Author

gterzian commented Jul 23, 2019

Less convinced by this, but if you feel it would be useful!

I'm not sure; I'm starting to tend towards running the worklet in script. My initial understanding from reading the spec was that the intent of the worklet was really to run JS code on the audio thread (or at least right next to it in the same process, in the case of process isolation of audio), with the goal of making that JS code run "on par" with the native audio processing code.

It's sort of what the spec says pretty explicitly:

The AudioWorklet object allows developers to supply scripts (such as JavaScript or WebAssembly code) to process audio on the rendering thread.

But if in practice people run the worklet in a thread in the content-process, and then I assume call into the process method of the processors (via ipc then, where is audio run in practice?), then it's more of a script worker that happens to be "driven" by the audio thread, as opposed to running "on" it.

Actually I think the spec is not very clear about how calling process is interleaved with running an event-loop for the AudioWorkletGlobalScope, which is necessary to handle incoming messages on the port.

I've asked a question at WebAudio/web-audio-api#2008

This depends on paint worklets being stateless though, if audio worklets are allowed to be stateful then this approach is out.

Yes, not only are they stateful, they also come with a MessagePort which means they should somehow run their own event-loop to handle incoming messages, all the while the audio rendering thread also calls into their process method, accessing the same state as a potential onmessage handler, as part of a different "render loop".

See this example, for the interplay between onmessage and process: https://webaudio.github.io/web-audio-api/#vu-meter-mode

They can still run on a separate thread from the main script thread though, so at least GC on the main script thread won't pause the worklet.

That's the idea I am slowly starting to adopt, too.

However, if we run audio in a separate process, then that means the audio "rendering loop" will have to ipc to script at each render quantum to make a blocking call into the process method of the worklet processor running on a thread in the script process. And since it's a sync call, you'd have to make it blocking somehow; the easiest way to do this would imply:

  1. One ipc message sent to script (containing an ACK sender). Block on the reply.
  2. Router thread in script handles the message, queues a task on the event-loop of AudioWorkletGlobalScope.
  3. Wait until that task is handled (the processor might be busy handling an incoming message via onmessage).
  4. When the task is run, call process of the processor.
  5. Send the result back over IPC.

That seems like a lot of overhead for something that is supposed to happen "continuously".

And even if we run audio in the content-process, like is done now, you'd still have to wait for the worklet thread to handle your process message and send a reply back, and it might be handling an incoming message port message first.

Unless you somehow "pause" the event-loop of AudioWorkletGlobalScope and switch to a "only run process mode", and then only let it run again when the rendering thread is done with calling process on it?

The spec contains wording like:

For this reason, the traditional way of executing asynchronous operations on the Web Platform, the event loop, does not work here, as the thread is not continuously executing. (https://webaudio.github.io/web-audio-api/#processing-model)

@gterzian
Member Author

In Chromium, the audio rendering thread maintains its real-time priority unless the AudioWorklet system gets activated. When a user explicitly calls AudioContext.addModule(), the rendering thread is replaced with a thread with display priority. Note that the browser's main thread runs with the same display priority, which is the second-tier priority in the browser. Chromium engineers believed it is reasonable for AudioWorklet to use a higher priority thread because it helps glitch-free audio rendering and AudioWorklet is only available within SecureContext. By comparison, regular priority is given to general worker threads and this is for a security reason. Web Worker can run arbitrary user code from a non-secure domain. (source: https://hoch.io/assets/publications/icmc-2018-choi-audioworklet.pdf)

@gterzian
Member Author

gterzian commented Jul 23, 2019

Ok sorry for driving everyone crazy, but I have one more "idea":

  1. Separate the backend running behind an AudioSink from the render thread.
  2. Run the actual sink in a separate process (one per origin, running many sinks?), with communication between the actual sink and the rendering thread taking place via ipc.
  3. Run the render thread in script(like now).
  4. self.sink.push_data becomes an IPC call.
  5. self.sink.has_enough_data() could perhaps be replaced by adding a number of bytes to AudioRenderThreadMsg::SinkNeedData(max_bytes). How often does self.appsrc.get_max_bytes actually change?

Basically, you could break up AudioSink and AudioRenderThreadMsg to be one interface between the control-thread and the render-thread, both running in the script-process, and then have an ipc interface between the backend sink and the render-thread.
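
The resulting ipc interface between the render thread and the out-of-process sink could be as small as two enums; names below are invented except where they echo the existing ones mentioned above:

use ipc_channel::ipc::IpcSender;
use serde::{Deserialize, Serialize};

// Render thread -> backend sink (replaces direct calls like self.sink.push_data).
#[derive(Serialize, Deserialize)]
enum SinkMsg {
    Start,
    Stop,
    // One rendered quantum; shared memory could replace the Vec later.
    PushData(Vec<f32>),
}

// Backend sink -> render thread (replaces the has_enough_data() polling):
// the sink asks for at most max_bytes when it needs more, i.e. the
// AudioRenderThreadMsg::SinkNeedData(max_bytes) idea above.
#[derive(Serialize, Deserialize)]
enum RenderThreadMsg {
    SinkNeedData { max_bytes: usize, reply: IpcSender<Option<Vec<f32>>> },
}

fn main() {
    // Type sketch only: these would be the payloads of the ipc senders set up
    // when the context is created.
}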

(While the OfflineAudioSink could just keep running in script?)

So at that point you could run a worklet processor just like any other native audio node as part of self.graph.process(&info), and that wouldn't happen in the same process as where GStreamer is running.

It would happen in the script-process, in the rendering thread, and we'd find a way to run the event-loop of the AudioWorkletGlobalScope alongside the rendering loop, in the same rendering thread.

So that way no JS is running alongside GStreamer, yet you avoid cross-thread/process stuff between the rendering thread and a worklet processor.

And self.sink.push_data could initially be an ipc-message, later optimized to some fancy shared-memory?


So this could be done in two steps:

  1. Separate the backend from the rendering thread, involving some orchestration via the constellation to initialize both when required. Each script process would have its own rendering thread, and would share a backend per origin (running one or more sinks)?
  2. Implement AudioWorklet and somehow integrate AudioWorkletGlobalScope with the rendering loop. Depends on Continue message port #23637

@padenot

padenot commented Jul 23, 2019

In general, you cannot have context switches in this whole situation (they are too long and bring non-determinism, and that causes glitches, and that's not acceptable).

In the diagram that @Manishearth drew above, it's necessary to have the native Web Audio API nodes run synchronously alongside the AudioWorklet rendering (the process call), in the same thread, that has the highest priority possible.

It's customary to then ship the full rendered audio buffer to another process to hand it off to the system. You roughly have a couple of milliseconds to do all of the above, if the system is not loaded too high.

I'm always available to chat (Paris time); this is being implemented in Gecko at the minute, and spidermonkey has a couple of peculiarities that we'd have loved to know about before starting this.

@gterzian
Member Author

gterzian commented Jul 24, 2019

I'm always available to chat (Paris time); this is being implemented in Gecko at the minute, and spidermonkey has a couple of peculiarities that we'd have loved to know about before starting this.

Thanks! Yes it would be good to chat. Perhaps we could organize something with a few people? cc @Manishearth @asajeffrey @ferjm

you cannot have context switches in this whole situation

In the diagram that @Manishearth drew above, it's necessary to have the native Web Audio API nodes run synchronously alongside the AudioWorklet rendering (the process call), in the same thread, that has the highest priority possible.

Ok, so that means the rendering thread and worklet global scope need to run in the same thread, and their event-loops (rendering loop + event-loop of the global scope) need to be interleaved on that same thread.

It's customary to then ship the full rendered audio buffer to another process to hand it off to the system.

Ok, so then we'd have to split our current WebAudio implementation, put the actual backend in a separate process, while keeping the rendering thread in the script/content process (and have it run any worklets as well).

I've tried to express this potential change in the diagram:

[diagram: audio]

@gterzian
Member Author

I've looked at the current worklet code, mostly used for the paint worklet, and I think we can base the audio worklet on it. cc @asajeffrey

It's a bit of a pity the worklet would not live on the same thread as the audio rendering thread; however, I think using the threadpool and the gc/script-loading mechanism it brings is worth it, and I'm not sure we could run the worklet as a whole on the audio rendering thread anyway.

One thing I was wondering about is how we could call the process method of a registered audio processor JS class from within the worklet thread, and I see there is already a similar mechanism in place for the paint worklet, so I guess it shouldn't be too hard.

Then the rendering thread, when encountering a worklet node as part of the rendering graph, would simply need to make a blocking call to the worklet thread pool and instruct it to run a task which would call the process method of the corresponding processor, and send the result back. The idea is that this call would not actually block, since there should always be a thread ready to handle it (we might want to use the std::sync::mpsc channel for the blocking call, since that is the most efficient channel for one-shot cases).
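
The one-shot pattern described here, as a standalone sketch; a single plain thread stands in for Servo's worklet thread pool, and the closure stands in for the task that would call into SpiderMonkey:

use std::sync::mpsc::{channel, Sender};
use std::thread;

// A task queued on the worklet thread (pool); stands in for Servo's
// worklet task queue.
type WorkletTask = Box<dyn FnOnce() + Send>;

fn spawn_worklet_thread() -> Sender<WorkletTask> {
    let (tx, rx) = channel::<WorkletTask>();
    thread::spawn(move || {
        for task in rx {
            task();
        }
    });
    tx
}

fn main() {
    let worklet = spawn_worklet_thread();

    // Rendering thread side: queue a task that runs the processor's
    // process() and reply on a one-shot std::sync::mpsc channel.
    let (result_tx, result_rx) = channel();
    let input = vec![1.0f32; 128];
    worklet
        .send(Box::new(move || {
            // Here the real code would call into the registered
            // AudioWorkletProcessor's process method via SpiderMonkey.
            let output: Vec<f32> = input.iter().map(|s| s * 0.5).collect();
            let _ = result_tx.send(output);
        }))
        .unwrap();

    // Blocks only for as long as the task takes, assuming a worklet
    // thread is free to pick it up immediately.
    let output = result_rx.recv().unwrap();
    assert_eq!(output.len(), 128);
}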

Since audio processors (the JS class registered to run on the worklet) have internal state, I do wonder if we would need additional plumbing to replicate that state correctly across the worklet pool, since the threads switch roles and we'd want any new "primary" to always be up-to-date...

@gterzian
Member Author

Since audio processors (the JS class registered to run on the worklet) have internal state, I do wonder if we would need additional plumbing to replicate that state correctly across the worklet pool, since the threads switch roles and we'd want any new "primary" to always be up-to-date...

Maybe for this we can just put a mutex around the node-name-to-processor-constructor map that contains all the processors (similar to paint classes in the paint worklet), and share that with all threads in the pool, with the understanding that the mutex will not be contended since it will only be used by the primary?

@gterzian
Member Author

Since audio processors (the JS class registered to run on the worklet) have internal state, I do wonder if we would need additional plumbing to replicate that state correctly across the worklet pool, since the threads switch roles and we'd want any new "primary" to always be up-to-date...

Maybe for this we can just put a mutex around the node-name-to-processor-constructor map that contains all the processors (similar to paint classes in the paint worklet), and share that with all threads in the pool, with the understanding that the mutex will not be contended since it will only be used by the primary?

So whereas the PaintWorkletGlobalScope has a DomRefCell<HashMap<Atom, Box<PaintDefinition>>>, an AudioWorkletGlobalScope could have an Arc<Mutex<HashMap<Atom, Box<AudioProcessor>>>>?
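
In type terms, something like the following, with Atom and AudioProcessor as simplified stand-ins for the real script-crate types:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-ins for servo's Atom and for the registered processor definitions.
type Atom = String;
trait AudioProcessor: Send {
    fn process(&mut self, input: &[f32], output: &mut [f32]) -> bool;
}

// Shared across all threads in the worklet pool; only the current
// "primary" thread is expected to lock it, so contention should be rare.
type ProcessorMap = Arc<Mutex<HashMap<Atom, Box<dyn AudioProcessor>>>>;

fn main() {
    let map: ProcessorMap = Arc::new(Mutex::new(HashMap::new()));
    let _clone_for_another_pool_thread = Arc::clone(&map);
}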

@asajeffrey
Member

The problem is that Spidermonkey is keeping internal JS state, so even if you could replicate the worklet state, the JS state would get out of sync. So I think we'd need a single-threaded implementation of audio worklets.

@padenot

padenot commented Aug 19, 2019

It's a bit of a pity the worklet would not live on the same thread as the audio rendering thread; however, I think using the threadpool and the gc/script-loading mechanism it brings is worth it, and I'm not sure we could run the worklet as a whole on the audio rendering thread anyway.

It's the opposite: you really want to run this on a single thread, and certainly not on a thread pool (also you can't if you're using SM, as Jeffrey indicates below, because SM uses TLS). In general, multi-threaded real-time audio is a bad idea.

Having thread/process hops in between the audio rendering thread and the worklets means that it won't be possible to have it work reliably (it will work for small workloads on a fast machine, maybe).

@gterzian
Member Author

gterzian commented Aug 19, 2019

@padenot Ok, thanks for the info.

In Gecko, are you implementing the AudioWorkletGlobalScope by running SM on the rendering thread and simply calling into the global-scope as part of audio processing, on the same thread?

I'm wondering how you go from processing the "native" audio graph on the rendering thread, to calling into the process method of a AudioProcessor, in the context of a AudioWorkletGlobalScope, when you encounter an AudioWorkletNode as part of the graph.

@asajeffrey
Member

@padenot it sounds like there's a tension between security (which would indicate running the worklet code in the same process as the user content) and low-latency (which would indicate running the worklet code in the main process). Is the plan to resolve this by having a dedicated audio process?

@padenot

padenot commented Aug 19, 2019

For now, we're calling into SM from the render thread and that's it.

The worklet code is user content, so I don't see the difference it makes compared to a normal script?

The preferred architecture is to remote the audio system calls (and thus the audio callbacks) to a privileged process, as mentioned in #23807 (comment), and to forward the real-time callbacks using a synchronous IPC call. This way, there are only two context switches. This has been measured to be acceptable even under heavy load if the respective thread priorities are set to prevent priority inversions.

@gterzian
Member Author

gterzian commented Aug 19, 2019

It sounds like we essentially need two changes:

  1. Split servo/media into a backend running in its own process, and an audio rendering thread running in the content-process.
  2. Run SM on the audio rendering thread, and simply call into registered AudioProcessors in the context of an AudioWorkletGlobalScope from that same thread. That thread would also have to handle MessagePort messages and forward them to the right audio processors (we probably wouldn't need actual MessagePorts for that, since they're mostly optimized for cross-process; we can use "implicit ports" implemented as threaded channels, like is done for dedicated workers).

I think this issue is mostly about 2, and is also mostly orthogonal to 1, which should happen anyway regardless.

@gterzian
Member Author

@padenot One more question: on what basis do you make the privileged audio backend available to content? Is there one privileged process per origin, one for the entire UA, or something else?

@padenot

padenot commented Aug 19, 2019

The content process (where the js runs, and also the DSP code in C++ that backs the native audio nodes) is sandboxed. The system calls required to open and run an audio stream don't work there.

The audio stream is therefore opened in the parent process, that has the capability to open an audio stream. When a real-time callback is called in the parent process, we make a synchronous IPC call to the content process, and in particular to a specific thread in the content process, that has been prioritized appropriately. The DSP code runs there. When the correct number of frames has been processed, this synchronous IPC call returns.

Any content process can open an audio stream remotely in the parent.

@gterzian
Member Author

@padenot thank you.

Ok so I think this translates to running gstreamer in the "main" process, alongside the constellation, or we could put it in a separate process. In both cases we could have the constellation do the initial setup when an audiocontext is created by content, followed by setting up a direct ipc channel between the rendering thread in the content process, and the audio backend.

@Manishearth
Member

We really do not want to run gstreamer in the main process, it should be in its own process.

@padenot

padenot commented Aug 19, 2019

It does not really matter where the audio stream resides, as long as the context switches are kept to an absolute minimum (2 per audio callback).

@gterzian
Member Author

I think we can do that by setting up a direct ipc channel between the audio backend and the audio rendering thread, using the constellation only for initial orchestration when content creates an audio-context.

We would also probably need the ipc to happen without an ipc-router thread, since that adds a context-switch per message. We could do that if we made the control-thread (the script-thread that created the audio context) also communicate with the rendering thread over ipc, even though they are in the same process (the only reason we need a router thread is when we need to integrate ipc-messages with other multi-threaded messages in a single select).
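
A sketch of what that could look like with ipc-channel's IpcReceiverSet, so the rendering thread selects over control and sink messages itself instead of going through a router thread (message types are made up):

// Assumed Cargo deps: ipc-channel and serde (with derive).
use ipc_channel::ipc::{self, IpcReceiverSet, IpcSelectionResult};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
enum ControlMsg { CreateNode, Resume }

#[derive(Serialize, Deserialize, Debug)]
enum SinkMsg { NeedData { max_bytes: usize } }

fn main() {
    // Control messages from the script thread (same process, but still ipc,
    // so no router thread is needed to merge the two streams).
    let (control_tx, control_rx) = ipc::channel::<ControlMsg>().unwrap();
    // Data requests from the backend sink (other process).
    let (sink_tx, sink_rx) = ipc::channel::<SinkMsg>().unwrap();

    let mut set = IpcReceiverSet::new().unwrap();
    let control_id = set.add(control_rx).unwrap();
    let sink_id = set.add(sink_rx).unwrap();

    control_tx.send(ControlMsg::Resume).unwrap();
    sink_tx.send(SinkMsg::NeedData { max_bytes: 4096 }).unwrap();

    // The rendering thread's event loop: block until either sender has
    // something, with no intermediate router thread in between.
    for event in set.select().unwrap() {
        match event {
            IpcSelectionResult::MessageReceived(id, msg) if id == control_id => {
                let msg: ControlMsg = msg.to().unwrap();
                println!("control: {:?}", msg);
            }
            IpcSelectionResult::MessageReceived(id, msg) if id == sink_id => {
                let msg: SinkMsg = msg.to().unwrap();
                println!("sink: {:?}", msg);
            }
            IpcSelectionResult::MessageReceived(..) | IpcSelectionResult::ChannelClosed(_) => {}
        }
    }
}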

@asajeffrey
Member

@Manishearth I can understand how we might be able to run gstreamer audio in its own process, but doesn't gstreamer video need to be in the main process?

@Manishearth
Member

Does it? I thought we need to ship frames over to webrender anyway?

But yeah if gstreamer is forced to be in the main process anyway we can ignore this. I'd really like for it to be elsewhere, though.

@asajeffrey
Member

@Manishearth depends on whether we can share textures between processes.

@gterzian
Member Author

gterzian commented Aug 29, 2019

Ok I'm interested in working on this, not right now but let's say fall/winter.

I propose the following outline of major work items, for your consideration:

  1. Separate the backend from the rendering thread.
  2. Backend lives either in the constellation/embedder process, or in its own process. Switching between one or the other shouldn't in itself be difficult. Perhaps we start with just running the backend in the same process as the "main process"? We could still do the constellation/media backend communication over IPC, just so that it's easier to switch to either setup.
  3. Rendering thread lives in the content process, and starts when script starts using audio. Constellation is used for initial setup, followed by setting up a direct line of IPC communication between the rendering thread and the backend (do we want one backend, or one per origin?).
  4. The "AudioWorklet" consists of one or several AudioWorkletProcessors, which are just another implementation of media::audio::node::AudioNodeEngine. The difference with the others is that their process method calls into SpiderMonkey.
  5. We run SpiderMonkey on the audio rendering thread, probably using a similar "child runtime" setup as is done with dedicated workers.

I think doing 1 is going to require some changes across interfaces; I can't provide an exhaustive overview upfront, but here are some examples/thoughts:

AudioSink trait implementation will have to basically be a wrapper around IPC to the backend, and GStreamerAudioSink will have to be split up into one part of logic running before doing an IPC on the rendering thread, and another part on the receiving end of an IPC in the backend.

impl AudioBackend for GStreamerBackend doing an init_sink will similarly have to actually make an IPC call to the backend, and return the GStreamerAudioSink half that will act as a wrapper around IPC to the other half running in the backend (or maybe the sink, in the form of a wrapper around IPC, can just be provided by the constellation when doing the initial setup).
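
The rendering-thread half could then be little more than a wrapper around an ipc sender; a sketch assuming a drastically simplified AudioSink trait with just the methods discussed here (the real servo-media trait is larger and has different signatures):

use ipc_channel::ipc::IpcSender;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum SinkMsg {
    Start,
    Stop,
    PushData(Vec<f32>),
}

// Simplified stand-in for servo-media's AudioSink trait.
trait AudioSink {
    fn start(&self);
    fn stop(&self);
    fn push_data(&self, chunk: Vec<f32>);
}

// The half that stays on the rendering thread: just a wrapper around the
// ipc sender to the real GStreamer sink running in the backend process.
struct IpcAudioSink {
    backend: IpcSender<SinkMsg>,
}

impl AudioSink for IpcAudioSink {
    fn start(&self) { let _ = self.backend.send(SinkMsg::Start); }
    fn stop(&self) { let _ = self.backend.send(SinkMsg::Stop); }
    fn push_data(&self, chunk: Vec<f32>) {
        let _ = self.backend.send(SinkMsg::PushData(chunk));
    }
}

fn main() {
    // Type sketch only; the `backend` sender would come from the initial
    // constellation orchestration described above.
}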

I think we can replace the GstAppSrcCallbacks thread, with a blocking IPC call to the rendering thread right from within the need_data callback.

The "backend" part of GStreamerAudioSink will probably have to be a static ref, used from the need_data callback, as well as from an IPC router thread to handle start/stop messages from the rendering thread (by the way, does anyone know if there are any restrictions on what we can do inside need_data? Can we acquire locks and so on?).

We wouldn't need has_enough_data, since that workflow would be replaced by a sync IPC call from within the need_data callback for a precise amount of data.

Here is what a render quantum would look like:

  1. In the backend process, the need_data callback executes. This makes a blocking IPC call to the rendering thread.
  2. The rendering thread wakes up (unless already busy handling control messages), renders the graph, potentially calling into SM if there is an audio worklet in the graph.
  3. The rendering thread sends an IPC reply to the backend, an Option<Chunk>.
  4. The backend wakes up on the reply and runs the push_data logic (the "backend half" of it).

Note that the rendering thread would also have to handle control messages, and messageport messages, coming from the "control thread", the script-thread that started using audio. I think we probably want to make those messages go over IPC as well, even though control and rendering are in the same process, just so that we can cut out the IPC router thread from the equation for both control and data messages, while running them using the same IPC-driven event-loop.

The audio sink on the backend would be a static ref Arc<Mutex<GStreamerAudioSink>> or similar, shared by the need_data callback as well as an IPC router thread to handle start and stop messages. The idea is that the mutex wouldn't see any contention (you might have to deal somehow with a need_data happening while the rendering thread has already sent a "stop" message, probably by simply sending a None as Option<Chunk> in the reply, just so that the blocking IPC call wakes up, drops the mutex, and the router thread wanting to do a "stop" can then acquire it).
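
The backend side of such a render quantum, sketched with plain ipc-channel types; Chunk and the wiring into GstAppSrc's need_data are simplified placeholders:

use ipc_channel::ipc::{self, IpcSender};
use serde::{Deserialize, Serialize};

// Placeholder for audio::block::Chunk.
type Chunk = Vec<f32>;

#[derive(Serialize, Deserialize)]
enum RenderMsg {
    // Step 1: the need_data callback asks the rendering thread for data
    // and blocks on `reply` until the graph has been rendered.
    NeedData { max_bytes: usize, reply: IpcSender<Option<Chunk>> },
}

// What the backend would run from inside GstAppSrc's need_data callback.
fn need_data(render_thread: &IpcSender<RenderMsg>, max_bytes: usize) -> Option<Chunk> {
    let (reply_tx, reply_rx) = ipc::channel().unwrap();
    render_thread
        .send(RenderMsg::NeedData { max_bytes, reply: reply_tx })
        .ok()?;
    // Step 4: wake up on the reply and hand the chunk to push_data;
    // a None reply lets a pending "stop" proceed without deadlocking.
    reply_rx.recv().ok().flatten()
}

fn main() {
    // Demo: a stand-in "rendering thread" that renders silence.
    let (tx, rx) = ipc::channel::<RenderMsg>().unwrap();
    std::thread::spawn(move || {
        while let Ok(RenderMsg::NeedData { max_bytes, reply }) = rx.recv() {
            let _ = reply.send(Some(vec![0.0; max_bytes / 4]));
        }
    });
    let chunk = need_data(&tx, 4096);
    assert_eq!(chunk.map(|c| c.len()), Some(1024));
}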

Please let me know what you think.

@gterzian
Member Author

We could also have some fun and use shared-memory for the buffer of audio::block::Block.
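
ipc-channel already has a shared-memory primitive that could carry the sample data; a minimal sketch (the f32/byte conversion and the Block layout are glossed over):

use std::convert::TryInto;
use ipc_channel::ipc::{self, IpcSharedMemory};

fn main() {
    // One render quantum of samples, viewed as raw bytes.
    let samples = vec![0.25f32; 128];
    let bytes: Vec<u8> = samples.iter().flat_map(|s| s.to_ne_bytes().to_vec()).collect();

    // The shared-memory region is mapped, not copied, when sent over ipc.
    let shmem = IpcSharedMemory::from_bytes(&bytes);

    let (tx, rx) = ipc::channel::<IpcSharedMemory>().unwrap();
    tx.send(shmem).unwrap();

    let received = rx.recv().unwrap();
    let first = f32::from_ne_bytes(received[0..4].try_into().unwrap());
    assert_eq!(first, 0.25);
}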

@gterzian
Member Author

gterzian commented Sep 16, 2019

Good read: https://padenot.github.io/web-audio-perf/. It contains some info on architecture and perf considerations, though it's a few years old and doesn't cover worklets...
