
Audio Worklet support #81

Open
mmontag opened this issue Jun 18, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

mmontag (Owner) commented Jun 18, 2021

It would be really nice to use Audio Worklets.

ScriptProcessorNode renders audio on the UI thread and glitches during scrolling, window resizing, etc.
This is really not acceptable for a music player, and the ScriptProcessorNode deprecation warning has shown up in the Chrome console for a long time now.

Might solve some of the glitch reports too.

It's widely supported: https://caniuse.com/mdn-api_audioworklet
https://developers.google.com/web/updates/2018/06/audio-worklet-design-pattern

padenot commented Jul 28, 2023

Hi, I just found this project and it's really cool. What architecture would you prefer for this?

What I'd recommend is to run all the synths in a Web Worker, and to communicate via a wait-free ring buffer with a very simple AudioWorkletProcessor that handles interleaving/deinterleaving, sample format conversion, and the like.

This way, it works like a regular media file player (one thread decodes the audio, another plays it back) and is almost immune to glitches (except when the device is completely overloaded, of course).
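
To make that concrete, here is a minimal sketch of the playback end (the ring-buffer memory layout and all names here are assumptions for illustration, not something prescribed in this thread):

  // player-processor.js -- hypothetical AudioWorkletProcessor that only drains
  // a single-producer/single-consumer ring buffer living in a SharedArrayBuffer.
  // Assumed layout: two Int32 indices [read, write], then interleaved
  // Float32 stereo samples.
  class PlayerProcessor extends AudioWorkletProcessor {
    constructor(options) {
      super();
      const { sab, capacity } = options.processorOptions;
      this.indices = new Int32Array(sab, 0, 2);
      this.samples = new Float32Array(sab, 8, capacity);
      this.capacity = capacity;
    }
    process(inputs, outputs) {
      const out = outputs[0];                 // one Float32Array per channel
      const channels = out.length;
      const read = Atomics.load(this.indices, 0);
      const write = Atomics.load(this.indices, 1);
      const available = (write - read + this.capacity) % this.capacity;
      const frames = Math.min((available / channels) | 0, out[0].length);
      for (let i = 0; i < frames * channels; i++) {
        // Deinterleave while popping; out is pre-zeroed, so a starved
        // buffer simply plays silence instead of glitching.
        out[i % channels][(i / channels) | 0] = this.samples[(read + i) % this.capacity];
      }
      Atomics.store(this.indices, 0, (read + frames * channels) % this.capacity);
      return true;                            // keep being called even when starved
    }
  }
  registerProcessor('player-processor', PlayerProcessor);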

If this is a design that would work for you, I've written some material to help.

The only requirements for this to work are to serve the website with two headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

so that it's put in an isolated process in the web browser. This MDN link explains why this is unfortunately necessary.
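
For local development, setting the two headers can be as simple as this (an Express-style sketch; the equivalent configuration on any server or static host works just as well):

  // dev-server.js -- hypothetical Express static server that opts the page
  // into cross-origin isolation so SharedArrayBuffer becomes available.
  const express = require('express');
  const app = express();
  app.use((req, res, next) => {
    res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
    res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
    next();
  });
  app.use(express.static('public'));  // 'public' is an illustrative dir name
  app.listen(8080);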

Also, it would certainly be possible to keep the existing code and use an AudioWorkletProcessor when/if possible (recent browser, correct headers set, etc.). But as you say, AudioWorkletProcessor is generally available now, and so is SharedArrayBuffer.
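
A feature check for that fallback path could be as small as this (a sketch; how the app wires up the fallback is left open):

  // Decide at startup whether the worker + worklet path is viable; otherwise
  // keep the existing ScriptProcessorNode code path.
  const canUseWorkletPath =
    typeof AudioWorkletNode !== 'undefined' &&   // recent browser
    typeof SharedArrayBuffer !== 'undefined' &&  // COOP/COEP headers in effect
    self.crossOriginIsolated === true;           // isolation actually granted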

Also, I'd like to provide some context about the meaning of "deprecation" in the context of the Web Platform: https://lists.w3.org/Archives/Public/public-audio/2023JanMar/0003.html (tl;dr it's not going to be removed, no rush).

mmontag (Owner, Author) commented Jul 31, 2023

Hi @padenot thanks for sharing all of this! I appreciate your insights.

More than a year ago, I attempted to use Audio Worklets and I think my approach was wrong:
043894

As I recall, it felt like I was stacking up weird hacks, and I wrote at the time:

// After all this work, there were still audio glitches[...]
// The only way to avoid it is to fill a ring buffer on a *worker* thread that is also readable from
// AudioWorklet thread. And then getting into the world of shared array buffers which are still poorly
// supported, compounding the browser issues.

Ah, okay; a ring buffer and SharedArrayBuffer are needed, but I still have many questions.

To pick one example:

The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. For example, some MDX music files use PDX audio sample files in the same folder. The MDX C library uses file I/O to read the PDX file. (I preload the PDX into the virtual file system with a network fetch.) How do we use Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API? In my worklet branch, I stubbed all the Emscripten filesystem code to use MEMFS instead of IDBFS. But MEMFS is no good because it does not persist across sessions.

If these questions reveal a misunderstanding on my part, please do share.

padenot commented Jul 31, 2023

> The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. [...] How do we use Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API?

Web Workers can use IndexedDB and make network requests normally, so this shouldn't be a problem. I don't think it's a misunderstanding on your part; it's probably a lack of good documentation of the various moving parts here.
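
On the Emscripten side, IDBFS works inside a worker exactly as it does on the main thread. A sketch of the preload step, assuming the module is built with IDBFS support (-lidbfs.js); the paths and URL here are illustrative:

  // worker.js -- hypothetical preload inside the Web Worker. Emscripten's
  // FS, IDBFS, and FS.syncfs all work here because workers have IndexedDB.
  Module.onRuntimeInitialized = () => {
    FS.mkdir('/mdx');                    // illustrative mount point
    FS.mount(IDBFS, {}, '/mdx');
    // Populate the virtual FS from whatever was persisted in IndexedDB.
    FS.syncfs(/* populate */ true, (err) => {
      if (err) throw err;
      // Fetch a missing PDX into the virtual FS, then persist it back.
      fetch('/samples/example.pdx')      // illustrative URL
        .then((r) => r.arrayBuffer())
        .then((buf) => {
          FS.writeFile('/mdx/example.pdx', new Uint8Array(buf));
          FS.syncfs(false, () => {});    // write back to IndexedDB
        });
    });
  };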

In this model, the sound generation happens in the worker; the main thread is only concerned with rendering the UI and the visualization, handling user interaction, and orchestrating all of this (e.g., starting to load a tune and starting playback when an entry in the browser is clicked). The AudioWorkletProcessor just plays the audio samples generated by the worker.

If we describe a standard scenario of opening the web app and playing a tune, it would go like this (I tried to explain as many details as possible; some of it may be trivial; a code sketch of the worker's fill loop follows the list):

  • The web app loads.
  • The main thread creates an instance of the RingBuffer class, able to contain audio samples (we can configure its duration to trade memory usage against robustness when the machine is overloaded). This RingBuffer has two ends: the producing end, where writing happens, and the consuming end, where reading happens.
  • It then creates an AudioWorkletProcessor and hands off the consuming end of the ring buffer -- it then suspends the AudioContext to save resources. When an AudioContext is suspended, the process method of an AudioWorkletProcessor is not called, and everything is more or less in an idle state.
  • It then creates a Web Worker containing all the WASM stuff, this is all instantiated and set up very much like it's done on the main thread currently. In particular, it can use IndexedDB and fetch. The main thread hands off the producing end of the ring buffer.
  • The user clicks a tune to play it -- the main thread sends a message via postMessage to the Web Worker with information to play the tune, and resumes the AudioContext.
  • The AudioWorkletProcessor's process method starts being called. In this method, it checks whether there are any audio samples in the consuming end of the ring buffer. If there are none, it returns true. This plays silence, but by returning true, the method will be called again.
  • The worker thread fetches the various resources that it needs and prepares playback, potentially using fetch, potentially using IndexedDB, via Emscripten facilities -- this is all very similar to the current architecture, but in a worker
  • The worker inspects the ring buffer, sees it is empty, and starts to produce audio samples to start filling it up
  • At this point, without any explicit communication, the AudioWorkletProcessor notices that there are samples to play out in the ring buffer, and plays them out, by popping them from the ring buffer into its output buffer argument
  • The worker continuously checks if there's a need to produce more audio samples by looking at how much empty space there is in the ring buffer -- whenever the number of samples goes below a configurable threshold, it produces more samples, and writes them out. In a playback scenario like this, we can imagine buffering a few hundred milliseconds of audio at a time
  • If at any point the main thread becomes unresponsive (you mentioned resizing the window, but it could be because we're loading a big folder, or something else), the Web Worker and the AudioWorklet are still going to be called on time; they're not affected by the main thread load. Besides, the thread on which the AudioWorkletProcessor runs has the highest scheduling priority at the OS level, so it preempts everything to ensure smooth playback.
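
A sketch of that worker-side fill loop, using the same assumed SharedArrayBuffer layout as the processor sketch above (sab, capacity, and renderSamples are hypothetical names set up during init; renderSamples stands in for the existing WASM synthesis call):

  // worker.js -- hypothetical producer loop. Assumed layout: two Int32
  // indices [read, write], then Float32 interleaved samples, shared with
  // the AudioWorkletProcessor.
  const indices = new Int32Array(sab, 0, 2);
  const samples = new Float32Array(sab, 8, capacity);
  const THRESHOLD = 22050;  // refill below ~250 ms of buffered stereo at 44.1 kHz

  function fill() {
    const read = Atomics.load(indices, 0);
    const write = Atomics.load(indices, 1);
    const buffered = (write - read + capacity) % capacity;
    if (buffered < THRESHOLD) {
      // One slot stays empty so a full buffer is distinguishable from empty.
      const chunk = renderSamples(capacity - buffered - 1);
      for (let i = 0; i < chunk.length; i++) {
        samples[(write + i) % capacity] = chunk[i];
      }
      Atomics.store(indices, 1, (write + chunk.length) % capacity);
    }
    setTimeout(fill, 20);  // simple polling is enough in a playback scenario
  }
  fill();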

An alternative approach, which looks like what you've tried, is to do the sound generation within the AudioWorkletProcessor. This would be the preferred approach for a clean-sheet design, e.g. writing a new synthesizer, where we have complete control over how I/O is done and can preload everything ahead of time. This is because the AudioWorkletProcessor, by design, can only do real-time-safe operations, very much like in native code.

Here, because we're using a piece of code that already does everything (I/O, sound synthesis, etc., intermixed), we need to resort to running the code normally in a worker and then playing the audio out -- but we can move everything off the main thread to make the app very robust against load. The same architecture is used when running e.g. emulators on the web, or other pieces of code where the separation between real-time digital signal processing and everything else is not clear, maybe because back in the day it was all single-threaded in one big run loop.

In short, three pieces (a sketch of the main-thread wiring follows the list):

  • The main thread does UI stuff and all the coordination (play/pause/load/etc.)
  • The Worker does the heavy lifting -- fetching resources, doing "file I/O" via IndexedDB, and actually generating the samples
  • The AudioWorkletProcessor plays the audio out
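
And a sketch of the main-thread wiring that ties the three together (file names and the ring-buffer layout carry over from the sketches above; this assumes a module script so top-level await is available):

  // main.js -- hypothetical setup. The SharedArrayBuffer is visible to both
  // the worker (producer) and the AudioWorkletProcessor (consumer).
  const capacity = 48000;  // ~0.5 s of interleaved stereo samples at 48 kHz
  const sab = new SharedArrayBuffer(8 + capacity * 4);  // 2 Int32 + Float32s

  const ctx = new AudioContext();
  await ctx.audioWorklet.addModule('player-processor.js');
  const node = new AudioWorkletNode(ctx, 'player-processor', {
    outputChannelCount: [2],
    processorOptions: { sab, capacity },
  });
  node.connect(ctx.destination);
  await ctx.suspend();  // idle until the user picks a tune

  const worker = new Worker('worker.js');
  worker.postMessage({ cmd: 'init', sab, capacity });

  // Later, when an entry in the browser is clicked:
  function play(url) {
    worker.postMessage({ cmd: 'play', url });
    ctx.resume();
  }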

mmontag added the enhancement label Dec 13, 2023
mmontag (Owner, Author) commented Jun 18, 2024

@padenot I just wanted to say thanks again for the writeup, and I haven't forgotten about this.
