AUDIO: Optimise mixing and rate converters #7385
mikrosk wants to merge 8 commits into scummvm:master
What is the rationale behind changing typedef int16 st_sample_t?
Do we have 32-bit audio samples anywhere?
To prepare it for the 32-bit samples (
No, and that's why it works -- since ScummVM either upscales 8-bit samples or uses 16-bit samples, we can safely skip the clamping part with 32-bit
Apologies, I do not understand what you are saying here. You say we do not use 32-bit samples, yet we still prepare for them? 🧐
Not only prepare, we treat them as such -- adding 16-bit samples will not overflow a 32-bit number for any realistic number of inputs (it would take more than 65536 full-scale samples). So this is an option for those who prefer:
So yes, while we are not aiming at 32-bit sample values, we are aiming at 32-bit addition.
So make it explicit in code.
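The 32-bit addition argument from the exchange above can be sketched as follows. This is a minimal illustration, not code from the PR: `mixInto` and `clampToInt16` are hypothetical names, and the sketch assumes 16-bit input samples summed into a 32-bit accumulator, with a single clamp once all inputs have been added.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: add one 16-bit input into a 32-bit accumulator.
// No per-sample clamping is done here; the sum stays within int32_t
// as long as far fewer than 65536 inputs are mixed.
inline void mixInto(std::vector<int32_t> &acc, const int16_t *src, std::size_t n) {
    for (std::size_t i = 0; i < n && i < acc.size(); ++i)
        acc[i] += src[i];
}

// Hypothetical helper: clamp the 32-bit sums to the 16-bit output range,
// once, after all inputs have been mixed.
inline void clampToInt16(const std::vector<int32_t> &acc, int16_t *dst) {
    for (std::size_t i = 0; i < acc.size(); ++i) {
        const int32_t v = std::max<int32_t>(INT16_MIN, std::min<int32_t>(INT16_MAX, acc[i]));
        dst[i] = static_cast<int16_t>(v);
    }
}
```

Mixing the same full-scale buffer twice and clamping afterwards yields the saturated values 32767 and -32768 rather than wrapped ones.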
AudioStream always works with int16, so make it explicit. The only exception here is Audio32::writeAudioInternal(), where targetBuffer is actually expected to be of type Audio::st_sample_t, but that is just the Audio32 class taking advantage of an existing converter. It could easily take a byte or void pointer as input.
- RateConverter now accepts 32-bit input (although still with max 24-bit values)
- make clampedAdd suited for 8-24 bit samples
Also, introduce upscaleConvert for e.g. 11025, 22050 -> 44100 conversions.
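For readers unfamiliar with integer-ratio upsampling, here is a minimal sketch of the idea behind conversions such as 11025 -> 44100 (factor 4) or 22050 -> 44100 (factor 2). It is an assumption-laden illustration only: `upscaleByRepeat` is a hypothetical function, it handles mono input, and it simply repeats each sample, which is not necessarily how upscaleConvert in this PR works.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: upsample mono 16-bit audio by an integer factor
// (e.g. 4 for 11025 -> 44100) by repeating each input sample.
// Because the ratio is a whole number, no fractional position
// tracking is needed in the inner loop.
std::vector<int16_t> upscaleByRepeat(const std::vector<int16_t> &in, unsigned factor) {
    std::vector<int16_t> out;
    out.reserve(in.size() * factor);
    for (int16_t s : in)
        for (unsigned k = 0; k < factor; ++k)
            out.push_back(s);
    return out;
}
```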
This is my attempt to tackle a couple of bullet points from https://planka.scummvm.org/cards/1382186040223074043, namely:
The former has been fully implemented, the latter has been implemented as an option not to clamp a muted buffer. I had a version where sample writes were tracked (or even changed from add to assign at the first access) but the results weren't too convincing and the code was just a mess. It seems that leaving memset to clear an aligned 4 KB buffer performs much better than a few ifs and calling it for clearing an unaligned 2 KB buffer.

The commits should be pretty self-explanatory; all but the last one ("AUDIO: Template-optimize convert methods") keep the old behaviour and have zero side effects.
As for performance, I have done a small benchmark measuring the number of ms spent in the callback (on the backend side). I measured the baseline (no changes), "AUDIO: Make sample clamping optional" (as a safety check that nothing has regressed), the same commit plus clamping done on the backend side on 32-bit sample pairs, and finally the template optimisations. I tested all four (yes, there are four of them now) rate converters on the first 60s of ft-demo:

A few notes:
interpolateConv... as you can see, its separate implementation (basically an extended copyConv) in v4 drops by 74%

Complete numbers:
P.S. I rewrote v4 in m68k assembly and was shocked to see that the gains were negligible. With the template magic, the compiler was able to allocate all registers basically the same way as I did.
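The "template magic" can be illustrated with a minimal sketch, assuming it means turning per-stream flags into template parameters so that every instantiation gets a branch-free inner loop. `copyConvSketch` and its parameters are hypothetical and do not reproduce the PR's actual converters.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: the channel layout is a compile-time parameter,
// so each instantiation's loop contains only the code path it needs
// and the compiler can allocate registers for that single path.
// Adds the input into a 32-bit output buffer and returns the number
// of output samples written.
template<bool inStereo, bool outStereo>
std::size_t copyConvSketch(const int16_t *in, std::size_t inFrames, int32_t *out) {
    for (std::size_t i = 0; i < inFrames; ++i) {
        const int32_t l = in[inStereo ? 2 * i : i];
        const int32_t r = inStereo ? in[2 * i + 1] : l;
        if (outStereo) {            // constant condition, resolved at compile time
            out[2 * i] += l;
            out[2 * i + 1] += r;
        } else {
            out[i] += (l + r) / 2;  // downmix to mono
        }
    }
    return inFrames * (outStereo ? 2 : 1);
}
```

A caller would pick the instantiation once per stream, for example `copyConvSketch<false, true>` for mono input mixed into a stereo buffer, instead of testing the flags for every sample.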