Skip to content

AUDIO: Optimise mixing and rate converters#7385

Open
mikrosk wants to merge 8 commits intoscummvm:masterfrom
mikrosk:audio_opt_pr
Open

AUDIO: Optimise mixing and rate converters#7385
mikrosk wants to merge 8 commits intoscummvm:masterfrom
mikrosk:audio_opt_pr

Conversation

@mikrosk
Copy link
Copy Markdown
Contributor

@mikrosk mikrosk commented Apr 1, 2026

This is my attempt to tackle a couple of bullet points from https://planka.scummvm.org/cards/1382186040223074043, namely:

  • Optimise for cases where the volume is silent or at the maximum value.
  • Optimise for cases where the destination buffer is all or partially zeroed out.

The former has been fully implemented, the latter has been implemented as an option not to clamp a muted buffer. I had a version where sample writes were tracked (or even changed from add to assign at the first access) but the results weren't too convincing and the code was just a mess. It seems that leaving memset to clear an aligned 4KB buffer performs much better than a few ifs and calling it for clearing an unaligned 2 KB buffer.

The commits should be pretty self-explanatory, all but the last one ("AUDIO: Template-optimize convert methods") keep old behaviour and have zero side effects.

As for performance, I have done a small benchmark: measuring number of ms spent in the callback (on the backend side). I measure baseline (no changes), "AUDIO: Make sample clamping optional" (as a safety check that nothing has regressed), the same commit + clamping done on the backend side on 32-bit sample pairs and finally the template optimisations. I tested all four (yes, there is four of them now) rate converters on the first 60s of ft-demo:
image
A few notes:

  • I was suprised to see that the most used path on modern hardware (freq->44100) is left for interpolateConv... as you can see, its separate implementation (basically an extended copyConv) in v4 drops by 74%
  • muted path shows how much of a good idea it was to create a separate path for it
  • same for max volume -- the batch optimisations in v4 do some good but skipping volume multiplies helps, too

Complete numbers:

                 |    v1    |    v2    |    v3    |    v4    |  1→4     |
-----------------+----------+----------+----------+----------+----------+
 copyConv        |  16.61   |  16.53   |  14.57   |   9.56   |  -42.4%  |
 simpleConv      |  23.54   |  23.99   |  21.64   |  12.00   |  -49.0%  |
 upsampleConv    |  26.41   |  26.61   |  21.52   |   6.76   |  -74.4%  |
 interpolateConv |  31.35   |  31.26   |  25.97   |  19.74   |  -37.0%  |
 interp (muted)  |  30.43   |     -    |     -    |   9.94   |  -67.3%  |
-----------------+----------+----------+----------+----------+----------+

P.S. I have rewritten v4 into m68k assembly and I was shocked to see that the gains were abysmal. With the template magic the compiler was able to basically allocate all registers the same way as I did.

@mikrosk mikrosk force-pushed the audio_opt_pr branch 2 times, most recently from 7a106fa to 09fcca9 Compare April 2, 2026 01:41
@sev-
Copy link
Copy Markdown
Member

sev- commented Apr 5, 2026

What is the rationale behind changing Audio::st_sample_t to int16, when it is

typedef int16 st_sample_t;

Comment thread audio/rate.cpp Outdated
@sev-
Copy link
Copy Markdown
Member

sev- commented Apr 5, 2026

Do we have 32-bit audio samples anywhere?

@mikrosk
Copy link
Copy Markdown
Contributor Author

mikrosk commented Apr 6, 2026

@sev-

What is the rationale behind changing Audio::st_sample_t to int16, when it is

typedef int16 st_sample_t;

To prepare it for the 32-bit samples (outBuffer is declared as st_sample_t *, too).

Do we have 32-bit audio samples anywhere?

No and that's why it works -- since ScummVM either upscales 8-bit samples or uses 16-bit samples, we can safely skip the clamping part with 32-bit st_sample_t and do it as the last step.

@sev-
Copy link
Copy Markdown
Member

sev- commented Apr 6, 2026

No and that's why it works -- since ScummVM either upscales 8-bit samples or uses 16-bit samples, we can safely skip the clamping part with 32-bit st_sample_t and do it as the last step.

Apologies, I do not understand what you are saying here. You say, we do not use 32-bit samples, still we do prepare for them? 🧐

@mikrosk
Copy link
Copy Markdown
Contributor Author

mikrosk commented Apr 6, 2026

Apologies, I do not understand what you are saying here. You say, we do not use 32-bit samples, still we do prepare for them? 🧐

Not only prepare, we treat them as such -- adding 16-bit samples, no matter how many, will never overflow in a 32-bit number. So this is an option for those which prefer:

  • a) not early-clamped samples (imagine adding -32000, -15000 and 15000 from three channels: current implementation would produce (-32000 - 15000) + 15000 = -32768 + 15000 = -17768 while 32-bit adding would produce -32000).
  • b) doing clamping as the last step => huge speed optimisation

So yes, while we are not aiming at 32-bit sample values we aim at 32-bit adding.

mikrosk added 3 commits April 21, 2026 12:13
So make it explicit in code.
AudioStream always works with int16 so make it explicit. The only
exception here is Audio32::writeAudioInternal() where the targetBuffer
actually is expected to be of type Audio::st_sample_t but that is just
Audio32 class taking advantage of an existing converter. It could easily
take byte or void pointer as input.
mikrosk added 5 commits April 21, 2026 13:37
- RateConverter now accepts 32-bit input (although still with max 24-bit
  values)

- make clampedAdd suited for 8-24 bit samples
Also, introduce upscaleConvert for e.g. 11025, 22050 -> 44100
conversions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants