Optimize the decoders a bit #1

Merged: 1 commit, Jun 23, 2022

aras-p (Contributor) commented Jun 22, 2022

The actual changes are fairly simple, and most of them are like "instead of doing work at byte level, do it at integer level".
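To illustrate the byte-level vs integer-level idea, here is a hypothetical sketch (not the PR's actual diff; both function names are made up) of extracting the per-pixel indices from a BC1 block. The last 4 bytes of a BC1 block hold sixteen 2-bit indices, one byte per row, with the first pixel in the lowest bits. The integer-level version assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Byte level: read one index byte per row, then pull out four 2-bit indices. */
static void bc1_indices_bytewise(const uint8_t block[8], uint8_t out[16]) {
    for (int row = 0; row < 4; ++row) {
        uint8_t row_bits = block[4 + row];
        for (int col = 0; col < 4; ++col) {
            out[row * 4 + col] = (uint8_t)((row_bits >> (2 * col)) & 3u);
        }
    }
}

/* Integer level: load all 32 index bits with one memcpy, then consume two
   bits per pixel. Assumes a little-endian host, matching the BCn layout. */
static void bc1_indices_u32(const uint8_t block[8], uint8_t out[16]) {
    uint32_t bits;
    memcpy(&bits, block + 4, sizeof bits);
    for (int i = 0; i < 16; ++i) {
        out[i] = (uint8_t)(bits & 3u);
        bits >>= 2;
    }
}
```

Both functions produce the same indices; the integer form replaces the per-row byte loads and nested loop with a single 32-bit load and one shift per pixel.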

In my tests on Windows (VS2022, Ryzen 5950X):

  • BC1 821->1327 Mpix/s
  • BC3 516->694
  • BC6H 65->85
  • BC7 91->143

On Mac (Clang 13, M1 Max):

  • BC1 804->2037
  • BC3 585->1062
  • BC6H 63->76
  • BC7 113->212
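To express the before/after figures above as speedup factors (after ÷ before), a small sketch follows. It treats every row as Mpix/s, the unit given on the BC1 line; the struct and function names are illustrative only.

```c
#include <stdio.h>
#include <stddef.h>

/* One before/after throughput pair from the PR description (Mpix/s). */
typedef struct {
    const char *format;
    double before;
    double after;
} bench_result;

/* Speedup factor: how many times faster the optimized build is. */
static double speedup(const bench_result *r) {
    return r->after / r->before;
}

/* Windows (VS2022, Ryzen 5950X) figures, copied from the list above. */
static const bench_result windows_results[] = {
    {"BC1", 821.0, 1327.0},
    {"BC3", 516.0, 694.0},
    {"BC6H", 65.0, 85.0},
    {"BC7", 91.0, 143.0},
};

/* Mac (Clang 13, M1 Max) figures, copied from the list above. */
static const bench_result mac_results[] = {
    {"BC1", 804.0, 2037.0},
    {"BC3", 585.0, 1062.0},
    {"BC6H", 63.0, 76.0},
    {"BC7", 113.0, 212.0},
};

/* Print one "platform format speedup" line per measurement. */
static void print_speedups(const char *platform, const bench_result *rs, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        printf("%-7s %-4s %.2fx\n", platform, rs[i].format, speedup(&rs[i]));
    }
}
```

Calling `print_speedups("Mac", mac_results, sizeof mac_results / sizeof mac_results[0]);` (and likewise for Windows) shows the gains ranging from about 1.21x (BC6H on M1 Max) to about 2.53x (BC1 on M1 Max).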

With the speed bump, this makes it one of the fastest BCn decoders out there (with some exceptions in some formats). I plan to write about this more somewhere, but here's a sneak peek (higher numbers are better; bcdec is the upstream repo, bcdec_opt is with this PR):
[Image: throughput chart comparing bcdec, bcdec_opt, and other BCn decoders]

iOrange merged commit 3711543 into iOrange:main on Jun 23, 2022

iOrange (Owner) commented Jun 23, 2022

Thanks! The bit-pulling optimization is quite clever. I went the lazy way; I got really tired of hard-coding all those partition tables :)
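BC6H and BC7 blocks pack variable-width fields into 128 bits, read starting from the least significant bit. One integer-level way to pull those bits, sketched here as a hypothetical example (the type and function names are made up, and this is not necessarily how the PR does it), is to keep the block in two 64-bit words and shift fields out of the low word. It assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit bit reader: the block lives in two 64-bit words and
   fields are consumed from the least significant bit upward. */
typedef struct {
    uint64_t low;
    uint64_t high;
} bits128;

static bits128 bits128_load(const uint8_t block[16]) {
    bits128 b;
    memcpy(&b.low, block, 8);       /* assumes a little-endian host */
    memcpy(&b.high, block + 8, 8);
    return b;
}

/* Return the next `count` bits (count must be 1..32) and shift them out,
   moving bits from the high word down so fields may straddle the 64-bit boundary. */
static uint32_t bits128_read(bits128 *b, int count) {
    uint64_t mask = (UINT64_C(1) << count) - 1;
    uint32_t value = (uint32_t)(b->low & mask);
    b->low = (b->low >> count) | (b->high << (64 - count));
    b->high >>= count;
    return value;
}
```

Each read is a mask plus a few shifts on whole words, instead of locating the right byte and bit offset for every field.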
