Further investigation into Windows ARGB #2

sudara · 2023-11-08T09:12:58Z

I wasn't able to get better performance than the baseline (gin) for the ARGB version on Windows.

Routes tried:

Various IPP methods, such as separable convolution. These were faster, but broke down at higher radii, probably because under the hood they use a 2D kernel.
The "rotated" vector implementation (where the queue is vertical) with FloatVectorOperations: this was consistently slow.
The rotated vector implementation in IPP: again, performance seemed worse than 4x the single-channel time, no matter how it was implemented (looping over the channels, allocating four channels' worth of queue/temp storage, etc.).
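For reference, here is a minimal sketch of one reading of the "rotated" layout, where each queue entry is a whole row, so every add, subtract, and scale runs over one contiguous span of width * 4 floats. Assumptions: a plain box kernel stands in for the project's actual kernel, plain C++ loops stand in for FloatVectorOperations/IPP calls, pixels are interleaved floats, and the function name `verticalBoxBlurARGB` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: in-place vertical box blur of an interleaved 4-channel
// float image (width * height * 4 values, row-major). The queue is a ring
// buffer of original rows, so the inner loops are contiguous row operations.
// Edge rows are clamped. A box kernel is an assumption, not the real kernel.
void verticalBoxBlurARGB(std::vector<float>& img, int width, int height, int radius)
{
    assert(width > 0 && height > 0 && radius >= 0);
    assert(img.size() == static_cast<std::size_t>(width) * height * 4);

    const std::size_t rowLen = static_cast<std::size_t>(width) * 4;
    const int window = 2 * radius + 1;
    const float scale = 1.0f / static_cast<float>(window);

    std::vector<float> sum(rowLen, 0.0f);      // running sums, one per float in a row
    std::vector<float> queue(rowLen * window); // original rows currently in the window

    auto srcRow = [&](int y) {
        return img.data() + static_cast<std::size_t>(std::clamp(y, 0, height - 1)) * rowLen;
    };
    auto slot = [&](int k) {
        return queue.data() + static_cast<std::size_t>(((k % window) + window) % window) * rowLen;
    };

    // Prime the window with rows -radius..radius (clamped to the image).
    for (int k = -radius; k <= radius; ++k)
    {
        const float* in = srcRow(k);
        float* q = slot(k);
        for (std::size_t i = 0; i < rowLen; ++i) { q[i] = in[i]; sum[i] += in[i]; }
    }

    for (int y = 0; y < height; ++y)
    {
        float* out = img.data() + static_cast<std::size_t>(y) * rowLen;
        for (std::size_t i = 0; i < rowLen; ++i)
            out[i] = sum[i] * scale;

        if (y + 1 == height)
            break;

        // Slide down one row: drop row y - radius, add row y + radius + 1.
        // Both map to the same ring slot. The incoming (clamped) row is still
        // unmodified because it lies below every row written so far.
        float* q = slot(y - radius);
        const float* in = srcRow(y + radius + 1);
        for (std::size_t i = 0; i < rowLen; ++i)
        {
            sum[i] += in[i] - q[i];
            q[i] = in[i];
        }
    }
}
```

The ring buffer is what makes the in-place version work: by the time a row leaves the window it has already been overwritten with blurred output, so its original values have to come from the queue.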

Things to try:

Investigate whether alpha needs to be blurred at all, given that the pixels are premultiplied.
Custom dot-product with the kernel
The "journey of a pixel" sliding kernel idea from my blog
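As a starting point for the custom dot-product idea, here is a minimal sketch of a horizontal pass that takes a plain dot product of the kernel with each pixel's neighborhood on one row of interleaved, premultiplied ARGB floats. Assumptions: the kernel weights are supplied by the caller (odd length, 2 * radius + 1), edges are clamped, and the function name `convolveRowARGB` is hypothetical. As written, alpha goes through the same dot product as the color channels; dropping it from the inner loop is where the alpha experiment above would plug in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: horizontal pass of a 1D kernel over one row of
// interleaved, premultiplied ARGB floats (4 values per pixel).
// kernel.size() must be odd (2 * radius + 1). Edge pixels are clamped.
// All four channels share the same weights, so each tap is one
// multiply-add per channel.
void convolveRowARGB(const float* src, float* dst, int width, const std::vector<float>& kernel)
{
    assert(width > 0 && kernel.size() % 2 == 1);
    const int radius = static_cast<int>(kernel.size() / 2);

    for (int x = 0; x < width; ++x)
    {
        float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
        for (int k = -radius; k <= radius; ++k)
        {
            const int sx = std::clamp(x + k, 0, width - 1);
            const float w = kernel[static_cast<std::size_t>(k + radius)];
            const float* p = src + static_cast<std::size_t>(sx) * 4;
            for (int c = 0; c < 4; ++c)
                acc[c] += w * p[c];
        }
        for (int c = 0; c < 4; ++c)
            dst[static_cast<std::size_t>(x) * 4 + c] = acc[c];
    }
}
```

Unlike a running-sum queue, the cost here grows linearly with the radius, so it is mainly useful as a correctness reference or for small radii.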

sudara added the benchmarks and investigation needed labels on Nov 8, 2023
