Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per-event multithreaded rendering #636

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

rcombs
Copy link
Member

@rcombs rcombs commented Jul 27, 2022

Long-awaited successor to #107.

TODO:

  • Make sure we require a new enough (thread-safe) fribidi
  • Gate CONFIG_PTHREAD on <stdatomic.h> availability
  • Implement the pthread functions used for win32 (anybody wanna take this one? this only really uses some fairly basic functions)
  • Detect CPU count on Win32
  • Provide an API to set thread count

And of course, this'll need some extensive testing (I've put it through some basics under asan/tsan, but I'm sure other folks have other ideas; stuff with a lot of weird fonts is likely to be tricky).

Note that this PR can be merged piecemeal; any initial series of commits will work fine without the rest. In particular, we could merge the portion prior to configure: detect pthreads ahead of the last few commits that actually add pthread-specific code.

@rcombs rcombs requested review from astiob, MrSmile and TheOneric and removed request for MrSmile July 27, 2022 09:50
@rcombs rcombs force-pushed the threading branch 3 times, most recently from bfcfbbb to 053f80e Compare July 27, 2022 10:13
libass/ass_rasterizer.h Outdated Show resolved Hide resolved
libass/ass_cache.c Outdated Show resolved Hide resolved
libass/ass_cache.c Show resolved Hide resolved
@rcombs rcombs force-pushed the threading branch 3 times, most recently from aecd930 to 0e138c3 Compare July 27, 2022 22:09
libass/ass_render.h Outdated Show resolved Hide resolved
configure.ac Outdated Show resolved Hide resolved
Copy link
Member

@astiob astiob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review: I’ve looked at all the commits before pthreads. They look good to me sans a few nits. 👍 That said, because I haven’t reviewed the threading commits yet, I can’t vouch that the set of things moved inside RenderContext is complete and minimal.

One commit message, “ass_render: take BitmapContext* in render_text()”, mistakenly refers to “BitmapContext” instead of RenderContext.

By the way: if we’re gonna use stdatomic.h, why not use C11 threads and mutexes as well? Is it that stdatomic.h is easier to polyfill provide our own fallback for? For reference, Visual C++/UCRT don’t have any of this yet, but it supposedly is all “on the roadmap”.

libass/ass_render.c Outdated Show resolved Hide resolved
libass/ass_bitmap.h Outdated Show resolved Hide resolved
@rcombs
Copy link
Member Author

rcombs commented Jul 30, 2022

C11 threads aren't supported on Darwin/macOS yet, and introduce a dependency on a relatively recent libc on Linux. Pthreads and atomics are both supported everywhere relevant (including Windows via MinGW). If we really cared about threading in win32 builds that don't bring their own winpthread wrapper, we could provide our own header-only polyfill just for that, and if we really cared about threading in MSVC builds (I still don't get why anyone would want to do this…), we could polyfill that as well. I don't think any of that needs to block this as long as MinGW builds against an external pthread package work, though.

@rcombs
Copy link
Member Author

rcombs commented Jul 30, 2022

@TheOneric There should be no need for -pthreads in CFLAGS or when linking against shared libass; no pthread structs or symbols are part of the public API or ABI as of this PR. So your patch looks reasonable, and I'll experiment with it a bit more soon.

@astiob, I've also fixed the commit message you mentioned.

More generally: this has improved overall performance on every case I've tried so far (though I haven't tested any that only ever have a single event per frame; those obviously won't see any benefit, but we should at least confirm the added overhead isn't too severe. If so, it might be worth falling back to the single-thread path when only one event is selected.). However, the improvement hasn't been as large as I'd have liked in some cases, and the added CPU usage from mutex overhead in the caches is sometimes substantial, so it's probably worth experimenting a bit more to see if we can reduce that pre-merge.

We might also want to tune the thread count a little more carefully (like, maybe we only benefit from using up to some particular number? Are there specific tweaks worth making for big.LITTLE systems, like core-affinity pinning?).

What do we think an API for setting thread count should look like? We'd either have to always spawn [NCORES] at renderer-setup and then potentially rejoin+restart them when a count change is requested, or lazy-spawn the threads only when a render call is made.

@DanOscarsson
Copy link
Contributor

As this does make things faster when there is one event per frame I think you have to ensure that that case is not much slower.
I use mpv and in it libass is used for subtitles and for "on screen display".
In the case "on screen display" there will often be more then one event per frame but thus case is only used now and then.
But for the normal case with subtitles will in most cases only have one event per frame.
The most common (I suspect just below 100 %) use of subtitles if for movies and TV series which almost always use .srt format. The same is valid for most video viewers that use libass for subtitle rendering. While .ass is used for anime the most common viewed videos use .srt and as libass is used by many video players to show subtitles it is important that libass will not be slower/use more CPU for that type of subtitles.

@rcombs rcombs force-pushed the threading branch 2 times, most recently from cc1e523 to 66cf76c Compare July 31, 2022 09:32
@rcombs
Copy link
Member Author

rcombs commented Jul 31, 2022

I just pushed a commit with some cache experiments that should make most hits lock-free (and misses lower-overhead). It reduces the amount of time we spend in mutex calls dramatically, but still needs some more cleanup (in particular, it currently doesn't build when pthreads are disabled).

@rcombs rcombs force-pushed the threading branch 2 times, most recently from 41440fe to bb31b71 Compare August 1, 2022 05:57
Copy link
Member

@TheOneric TheOneric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I didn't miss something, there's no special handling for external function calls (only the message callback?) from threads. Is your intent to change the API to require callbacks to be thread-safe in the future?

(Still not a complete review)

libass/ass_render.c Outdated Show resolved Hide resolved
@TheOneric TheOneric linked an issue Sep 15, 2022 that may be closed by this pull request
@rcombs
Copy link
Member Author

rcombs commented Apr 28, 2024

Replaced the FT_Size_Metrics cache with a hb_font_t* cache, which has negligible additional memory cost but saves some time creating and destroying sub-fonts.

@rcombs rcombs force-pushed the threading branch 6 times, most recently from d5f0948 to 908ed1a Compare April 30, 2024 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-threading!
9 participants