AUDIO: Optimise mixing and rate converters by mikrosk · Pull Request #7385 · scummvm/scummvm

mikrosk · 2026-04-01T14:02:17Z

This is my attempt to tackle a couple of bullet points from https://planka.scummvm.org/cards/1382186040223074043, namely:

Optimise for cases where the volume is silent or at the maximum value.
Optimise for cases where the destination buffer is all or partially zeroed out.

The former has been fully implemented, the latter has been implemented as an option not to clamp a muted buffer. I had a version where sample writes were tracked (or even changed from add to assign at the first access) but the results weren't too convincing and the code was just a mess. It seems that leaving memset to clear an aligned 4KB buffer performs much better than a few ifs and calling it for clearing an unaligned 2 KB buffer.

The commits should be pretty self-explanatory, all but the last one ("AUDIO: Template-optimize convert methods") keep old behaviour and have zero side effects.

As for performance, I have done a small benchmark: measuring number of ms spent in the callback (on the backend side). I measure baseline (no changes), "AUDIO: Make sample clamping optional" (as a safety check that nothing has regressed), the same commit + clamping done on the backend side on 32-bit sample pairs and finally the template optimisations. I tested all four (yes, there is four of them now) rate converters on the first 60s of ft-demo:

A few notes:

I was suprised to see that the most used path on modern hardware (freq->44100) is left for interpolateConv... as you can see, its separate implementation (basically an extended copyConv) in v4 drops by 74%
muted path shows how much of a good idea it was to create a separate path for it
same for max volume -- the batch optimisations in v4 do some good but skipping volume multiplies helps, too

Complete numbers:

                 |    v1    |    v2    |    v3    |    v4    |  1→4     |
-----------------+----------+----------+----------+----------+----------+
 copyConv        |  16.61   |  16.53   |  14.57   |   9.56   |  -42.4%  |
 simpleConv      |  23.54   |  23.99   |  21.64   |  12.00   |  -49.0%  |
 upsampleConv    |  26.41   |  26.61   |  21.52   |   6.76   |  -74.4%  |
 interpolateConv |  31.35   |  31.26   |  25.97   |  19.74   |  -37.0%  |
 interp (muted)  |  30.43   |     -    |     -    |   9.94   |  -67.3%  |
-----------------+----------+----------+----------+----------+----------+

P.S. I have rewritten v4 into m68k assembly and I was shocked to see that the gains were abysmal. With the template magic the compiler was able to basically allocate all registers the same way as I did.

sev- · 2026-04-05T22:58:56Z

What is the rationale behind changing Audio::st_sample_t to int16, when it is

typedef int16 st_sample_t;

sev- · 2026-04-05T23:01:12Z

Do we have 32-bit audio samples anywhere?

mikrosk · 2026-04-06T05:27:35Z

@sev-

What is the rationale behind changing Audio::st_sample_t to int16, when it is
typedef int16 st_sample_t;

To prepare it for the 32-bit samples (outBuffer is declared as st_sample_t *, too).

Do we have 32-bit audio samples anywhere?

No and that's why it works -- since ScummVM either upscales 8-bit samples or uses 16-bit samples, we can safely skip the clamping part with 32-bit st_sample_t and do it as the last step.

sev- · 2026-04-06T23:18:25Z

No and that's why it works -- since ScummVM either upscales 8-bit samples or uses 16-bit samples, we can safely skip the clamping part with 32-bit st_sample_t and do it as the last step.

Apologies, I do not understand what you are saying here. You say, we do not use 32-bit samples, still we do prepare for them? 🧐

mikrosk · 2026-04-06T23:26:32Z

Apologies, I do not understand what you are saying here. You say, we do not use 32-bit samples, still we do prepare for them? 🧐

Not only prepare, we treat them as such -- adding 16-bit samples, no matter how many, will never overflow in a 32-bit number. So this is an option for those which prefer:

a) not early-clamped samples (imagine adding -32000, -15000 and 15000 from three channels: current implementation would produce (-32000 - 15000) + 15000 = -32768 + 15000 = -17768 while 32-bit adding would produce -32000).
b) doing clamping as the last step => huge speed optimisation

So yes, while we are not aiming at 32-bit sample values we aim at 32-bit adding.

So make it explicit in code.

AudioStream always works with int16 so make it explicit. The only exception here is Audio32::writeAudioInternal() where the targetBuffer actually is expected to be of type Audio::st_sample_t but that is just Audio32 class taking advantage of an existing converter. It could easily take byte or void pointer as input.

- RateConverter now accepts 32-bit input (although still with max 24-bit values) - make clampedAdd suited for 8-24 bit samples

Also, introduce upscaleConvert for e.g. 11025, 22050 -> 44100 conversions.

mikrosk force-pushed the audio_opt_pr branch 2 times, most recently from 7a106fa to 09fcca9 Compare April 2, 2026 01:41

sev- reviewed Apr 5, 2026

View reviewed changes

Comment thread audio/rate.cpp Outdated

mikrosk force-pushed the audio_opt_pr branch from 09fcca9 to 9274f5d Compare April 6, 2026 05:24

mikrosk mentioned this pull request Apr 6, 2026

AUDIO: Introduce print_rate #7365

Merged

mikrosk added 3 commits April 21, 2026 12:13

AUDIO: Fix constness

bd540c9

AUDIO: Input stream always works with int16

51995b0

So make it explicit in code.

mikrosk force-pushed the audio_opt_pr branch from 9274f5d to 4ff549c Compare April 21, 2026 02:18

mikrosk added 5 commits April 21, 2026 13:37

AUDIO: Add support for 32-bit sample size

2416b23

- RateConverter now accepts 32-bit input (although still with max 24-bit values) - make clampedAdd suited for 8-24 bit samples

AUDIO: RateConverter now supports unclamped mixing

78c48c8

AUDIO: Make sample clamping optional

5327056

AUDIO: Don't clamp muted channels

ebc0ae1

AUDIO: Template-optimize convert methods

434e0e7

Also, introduce upscaleConvert for e.g. 11025, 22050 -> 44100 conversions.

mikrosk force-pushed the audio_opt_pr branch from 4ff549c to 434e0e7 Compare April 21, 2026 03:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AUDIO: Optimise mixing and rate converters#7385

AUDIO: Optimise mixing and rate converters#7385
mikrosk wants to merge 8 commits intoscummvm:masterfrom
mikrosk:audio_opt_pr

mikrosk commented Apr 1, 2026

Uh oh!

sev- commented Apr 5, 2026

Uh oh!

Uh oh!

sev- commented Apr 5, 2026

Uh oh!

mikrosk commented Apr 6, 2026

Uh oh!

sev- commented Apr 6, 2026

Uh oh!

mikrosk commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mikrosk commented Apr 1, 2026

Uh oh!

sev- commented Apr 5, 2026

Uh oh!

Uh oh!

sev- commented Apr 5, 2026

Uh oh!

mikrosk commented Apr 6, 2026

Uh oh!

sev- commented Apr 6, 2026

Uh oh!

mikrosk commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants