Optimize the decoders a bit #1

aras-p · 2022-06-22T20:19:30Z

The actual changes are fairly simple; most of them amount to "instead of doing work at the byte level, do it at the integer level".
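To illustrate the byte-level versus integer-level idea, here is a minimal sketch for writing out the 16 pixels of a BC1 block once the 4-entry palette is known. It is not the code from this PR: the function names `write_pixels_bytewise` and `write_pixels_wordwise` are hypothetical, and the palette is assumed to be already expanded to RGBA8.

```c
#include <stdint.h>

/* Byte-level variant: walk the four index bytes row by row and copy each
   colour channel individually. `palette` holds 4 RGBA entries (16 bytes). */
static void write_pixels_bytewise(const uint8_t palette[16],
                                  const uint8_t index_bytes[4],
                                  uint8_t out[64])
{
    for (int row = 0; row < 4; ++row) {
        uint8_t bits = index_bytes[row];
        for (int col = 0; col < 4; ++col) {
            const uint8_t *color = palette + ((bits >> (col * 2)) & 3) * 4;
            uint8_t *dst = out + (row * 4 + col) * 4;
            dst[0] = color[0];
            dst[1] = color[1];
            dst[2] = color[2];
            dst[3] = color[3];
        }
    }
}

/* Integer-level variant: gather all 32 index bits into one integer and
   store one whole 32-bit pixel per iteration. `palette` holds the same
   4 colours, each packed into a uint32_t in memory order. */
static void write_pixels_wordwise(const uint32_t palette[4],
                                  const uint8_t index_bytes[4],
                                  uint32_t out[16])
{
    /* Assemble the indices explicitly so the result does not depend on
       the host byte order. */
    uint32_t indices = (uint32_t)index_bytes[0]
                     | ((uint32_t)index_bytes[1] << 8)
                     | ((uint32_t)index_bytes[2] << 16)
                     | ((uint32_t)index_bytes[3] << 24);
    for (int i = 0; i < 16; ++i) {
        out[i] = palette[indices & 3];  /* lowest 2 bits select the colour */
        indices >>= 2;
    }
}
```

Both variants produce the same 64 output bytes; the second one does a single store per pixel and has no inner per-channel work.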

In my tests, on Windows (VS2022, Ryzen 5950X):

- BC1 821->1327 Mpix/s
- BC3 516->694
- BC6H 65->85
- BC7 91->143

On Mac (clang 13, M1 Max):

- BC1 804->2037
- BC3 585->1062
- BC6H 63->76
- BC7 113->212

With this speed bump, bcdec becomes one of the fastest BCn decoders out there (with exceptions in some formats). I plan to write about this in more detail elsewhere, but here's a sneak peek (higher numbers are better; bcdec is the upstream repo, bcdec_opt is the code from this PR).


iOrange · 2022-06-23T01:24:52Z

Thanks! The bit-pulling optimization is quite clever. I went the lazy way, as I got really tired of hard-coding all those partition tables :)
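For readers wondering what pulling bits at the integer level can look like, here is a minimal sketch. It assumes the 128-bit block is held as two 64-bit halves and that fields are read LSB first; the names `bit_reader`, `bit_reader_init`, `bit_reader_pull` and `pull_bitwise` are hypothetical and are not claimed to match the code in this PR.

```c
#include <stdint.h>

/* Baseline: pull `count` bits one at a time out of the byte array. */
static uint32_t pull_bitwise(const uint8_t *block, int *pos, int count)
{
    uint32_t value = 0;
    for (int i = 0; i < count; ++i, ++*pos)
        value |= (uint32_t)((block[*pos >> 3] >> (*pos & 7)) & 1) << i;
    return value;
}

/* Integer-level reader: the 16-byte block lives in two 64-bit halves. */
typedef struct bit_reader {
    uint64_t low;   /* bits 0..63 of the block   */
    uint64_t high;  /* bits 64..127 of the block */
} bit_reader;

static bit_reader bit_reader_init(const uint8_t block[16])
{
    bit_reader r = {0, 0};
    for (int i = 0; i < 8; ++i) {
        r.low  |= (uint64_t)block[i]     << (8 * i);
        r.high |= (uint64_t)block[i + 8] << (8 * i);
    }
    return r;
}

/* Pull `count` bits (1..32) with a mask and two shifts, no per-bit loop.
   count == 0 is not supported because it would shift by 64. */
static uint32_t bit_reader_pull(bit_reader *r, int count)
{
    uint64_t mask = ((uint64_t)1 << count) - 1;
    uint32_t bits = (uint32_t)(r->low & mask);
    /* Shift the low half down and refill its top from the high half. */
    r->low = (r->low >> count) | ((r->high & mask) << (64 - count));
    r->high >>= count;
    return bits;
}
```

The cost of each read is constant regardless of the field width, which matters for formats such as BC7 where a block is made of many small fields.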

Optimize the decoders a bit

cd3f93e


iOrange merged commit 3711543 into iOrange:main Jun 23, 2022
