Further investigation into Windows ARGB #2

Open
3 of 6 tasks
sudara opened this issue Nov 8, 2023 · 0 comments

sudara commented Nov 8, 2023

I wasn't able to get better performance than the baseline (gin) for the ARGB version on Windows.

Routes tried:

  • Various IPP methods, such as separated convolution. These were faster, but broke down at higher radii, probably because under the hood they use a 2D kernel.
  • The "rotated" vector implementation (where the queue is vertical) with FloatVectorOperations. This was consistently slow.
  • Rotated vector implementation in IPP. Again, performance seems to be worse than 4x the single-channel version, no matter how it's implemented (looping over the channels, allocating four channels' worth of queue/temp storage, etc.).

Things to try:

  • Investigate whether the alpha channel needs to be blurred at all, given that the pixels are premultiplied.
  • Custom dot product with the kernel.
  • The "journey of a pixel" sliding kernel idea from my blog.
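
For the custom dot-product route, a minimal scalar sketch might look like the following. It is not the library's implementation: `convolveRowARGB` and `makeBoxKernel` are hypothetical names, the row is assumed to hold interleaved 8-bit 4-channel pixels, and edges are handled by repeating the outermost pixel. It needs C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a normalised box kernel of width 2 * radius + 1.
std::vector<float> makeBoxKernel (int radius)
{
    const auto size = static_cast<std::size_t> (2 * radius + 1);
    return std::vector<float> (size, 1.0f / static_cast<float> (size));
}

// Hypothetical helper: convolves one row of interleaved 4-channel 8-bit
// pixels with a 1D kernel. All four channels share the same weights, and
// samples past either end of the row repeat the edge pixel.
std::vector<std::uint8_t> convolveRowARGB (const std::vector<std::uint8_t>& row,
                                           const std::vector<float>& kernel)
{
    assert (row.size() % 4 == 0);     // whole pixels only
    assert (kernel.size() % 2 == 1);  // odd width, so there is a centre tap

    const int numPixels = static_cast<int> (row.size() / 4);
    const int radius = static_cast<int> (kernel.size() / 2);
    std::vector<std::uint8_t> out (row.size());

    for (int x = 0; x < numPixels; ++x)
    {
        float sum[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        // Dot product of the kernel with the pixels centred on x.
        for (int k = -radius; k <= radius; ++k)
        {
            const int sx = std::clamp (x + k, 0, numPixels - 1);
            const float w = kernel[static_cast<std::size_t> (k + radius)];
            const std::uint8_t* p = &row[static_cast<std::size_t> (sx) * 4];

            for (int c = 0; c < 4; ++c)
                sum[c] += w * static_cast<float> (p[c]);
        }

        // Round to nearest and clamp back into the 8-bit range.
        for (int c = 0; c < 4; ++c)
        {
            const float rounded = std::clamp (sum[c] + 0.5f, 0.0f, 255.0f);
            out[static_cast<std::size_t> (x) * 4 + static_cast<std::size_t> (c)]
                = static_cast<std::uint8_t> (rounded);
        }
    }

    return out;
}
```

Calling `convolveRowARGB (row, makeBoxKernel (radius))` once per row gives the horizontal pass; a vertical pass would need the same loop over columns. A scalar version like this is unlikely to be fast, but it could serve as a correctness reference when comparing vectorised attempts.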