The current implementation of drawPaletted() makes the following calls to sqDiff() in its hot path:
Line 628 in a31e0a4
sum := sqDiff(er, p[0]) + sqDiff(eg, p[1]) + sqDiff(eb, p[2]) + sqDiff(ea, p[3])
This line executes between width×height and width×height×(palette size) times per drawPaletted() call, depending on how often the per-pixel palette search stops early on an exact color match.
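To make that range concrete, here is a simplified, hypothetical model of the per-pixel palette search. It is not the actual drawPaletted() code: the function name nearestIndex, the [][4]int32 palette type, and the evals counter are assumptions made for illustration. It counts how many times the sum line runs for a single pixel.

```go
package main

import "fmt"

// sqDiff is the current (branchy) implementation quoted in this issue.
func sqDiff(x, y int32) uint32 {
	var d uint32
	if x > y {
		d = uint32(x - y)
	} else {
		d = uint32(y - x)
	}
	return (d * d) >> 2
}

// nearestIndex is a hypothetical, simplified model of the search in
// drawPaletted(): it scans the palette for the closest color and stops early
// on an exact match. It returns the chosen index and how many palette entries
// were compared, i.e. how many times the sum line ran for this pixel.
func nearestIndex(er, eg, eb, ea int32, palette [][4]int32) (best, evals int) {
	bestSum := uint32(1<<32 - 1)
	for i, p := range palette {
		evals++
		sum := sqDiff(er, p[0]) + sqDiff(eg, p[1]) + sqDiff(eb, p[2]) + sqDiff(ea, p[3])
		if sum < bestSum {
			best, bestSum = i, sum
			if sum == 0 {
				break // exact match: no closer color is possible
			}
		}
	}
	return best, evals
}

func main() {
	palette := [][4]int32{
		{0, 0, 0, 0xffff},                // opaque black
		{0xffff, 0xffff, 0xffff, 0xffff}, // opaque white
		{0xffff, 0, 0, 0xffff},           // opaque red
	}
	// An exact match on the first entry needs a single comparison.
	fmt.Println(nearestIndex(0, 0, 0, 0xffff, palette)) // 0 1
	// A mid-gray with no exact match is compared against every entry.
	fmt.Println(nearestIndex(0x8000, 0x8000, 0x8000, 0xffff, palette)) // 1 3
}
```

Every pixel costs at least one comparison and at most one per palette entry, which is where the width×height and width×height×(palette size) bounds come from.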
Here is how sqDiff() is currently implemented:
Lines 562 to 574 in a31e0a4
// sqDiff returns the squared-difference of x and y, shifted by 2 so that
// adding four of those won't overflow a uint32.
//
// x and y are both assumed to be in the range [0, 0xffff].
func sqDiff(x, y int32) uint32 {
	var d uint32
	if x > y {
		d = uint32(x - y)
	} else {
		d = uint32(y - x)
	}
	return (d * d) >> 2
}
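The doc comment's claim that the >>2 shift keeps a sum of four terms inside a uint32 can be checked with a little arithmetic. The sketch below is illustrative only; the names maxDiff and worstCaseSums are made up for this example, and it computes in uint64 so that nothing wraps while checking.

```go
package main

import (
	"fmt"
	"math"
)

// maxDiff is the largest |x - y| when x and y are both in [0, 0xffff].
const maxDiff = 0xffff

// worstCaseSums returns the largest possible sum of four sqDiff terms with
// and without the final >>2 shift, computed in uint64 so nothing wraps.
func worstCaseSums() (shifted, unshifted uint64) {
	sq := uint64(maxDiff) * uint64(maxDiff) // 0xfffe0001
	return 4 * (sq >> 2), 4 * sq
}

func main() {
	shifted, unshifted := worstCaseSums()
	fmt.Printf("with >>2:    %#x, fits in uint32: %v\n", shifted, shifted <= math.MaxUint32)
	fmt.Printf("without >>2: %#x, fits in uint32: %v\n", unshifted, unshifted <= math.MaxUint32)
}
```

It reports 0xfffe0000 (fits) for the shifted worst case and 0x3fff80004 (does not fit) for the unshifted one, so the shift is what makes the four-term sum on the hot-path line safe.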
It can be reduced to:
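func sqDiff(x, y int32) uint32 {
	// For x < y, uint32(x - y) wraps around to 2^32 - (y - x); squaring
	// modulo 2^32 then yields (y - x)^2, the same value as the branchy version.
	d := uint32(x - y)
	return (d * d) >> 2
}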
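This relies on unsigned integer wraparound, which the Go spec defines (unsigned +, -, and * are computed modulo 2^n), and produces the same result for all x and y in [0, 0xffff]; see https://play.golang.org/p/6q3Cvqk1k7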
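The equivalence can also be checked exhaustively in a standalone program, as a complement to the playground link. The reasoning behind the sweep: for x and y in [0, 0xffff], x - y never overflows an int32, and both versions depend only on x - y, so testing every difference in [-0xffff, 0xffff] covers every possible input pair. The names sqDiffBranch, sqDiffWrap, and mismatches are hypothetical, chosen only to tell the two versions apart.

```go
package main

import "fmt"

// sqDiffBranch is the current implementation.
func sqDiffBranch(x, y int32) uint32 {
	var d uint32
	if x > y {
		d = uint32(x - y)
	} else {
		d = uint32(y - x)
	}
	return (d * d) >> 2
}

// sqDiffWrap is the proposed branch-free implementation.
func sqDiffWrap(x, y int32) uint32 {
	d := uint32(x - y)
	return (d * d) >> 2
}

// mismatches sweeps every possible difference x - y for x, y in [0, 0xffff]
// and counts the inputs on which the two versions disagree.
func mismatches() int {
	n := 0
	for diff := int32(-0xffff); diff <= 0xffff; diff++ {
		// Choose x, y in [0, 0xffff] such that x - y == diff.
		x, y := diff, int32(0)
		if diff < 0 {
			x, y = 0, -diff
		}
		if sqDiffBranch(x, y) != sqDiffWrap(x, y) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("mismatches:", mismatches()) // mismatches: 0
}
```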
While the change itself is minuscule, sqDiff() sits in the hot path of drawPaletted(), so its effect is noticeable in benchmarks. For example, the QuantizedEncode benchmark from the image/gif package improves significantly with the change applied:
name               old time/op    new time/op    delta
QuantizedEncode-4     880ms ± 2%     482ms ± 2%  -45.20%  (p=0.008 n=5+5)

name               old speed      new speed      delta
QuantizedEncode-4  1.40MB/s ± 2%  2.55MB/s ± 2%  +82.52%  (p=0.008 n=5+5)

name               old alloc/op   new alloc/op   delta
QuantizedEncode-4     417kB ± 0%     417kB ± 0%     ~     (all equal)

name               old allocs/op  new allocs/op  delta
QuantizedEncode-4      13.0 ± 0%      13.0 ± 0%     ~     (all equal)
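For a quick look at sqDiff() in isolation, a micro-benchmark can be driven with testing.Benchmark from a plain main program, without "go test". This is a sketch, not the QuantizedEncode benchmark: the input pattern and the names benchBranch, benchWrap, and sink are assumptions, and results depend on the compiler and CPU (the compiler may turn either version into branch-free code), so it will not necessarily reproduce the numbers above.

```go
package main

import (
	"fmt"
	"testing"
)

// sqDiffBranch is the current, branchy implementation.
func sqDiffBranch(x, y int32) uint32 {
	var d uint32
	if x > y {
		d = uint32(x - y)
	} else {
		d = uint32(y - x)
	}
	return (d * d) >> 2
}

// sqDiffWrap is the proposed branch-free implementation.
func sqDiffWrap(x, y int32) uint32 {
	d := uint32(x - y)
	return (d * d) >> 2
}

// sink stores results so the compiler cannot drop the benchmarked calls.
var sink uint32

// benchBranch and benchWrap feed both versions the same inputs. Multiplying
// i by 7919 spreads y across the range, so whether x > y varies between calls.
func benchBranch(b *testing.B) {
	var acc uint32
	for i := 0; i < b.N; i++ {
		acc += sqDiffBranch(int32(i&0xffff), int32((i*7919)&0xffff))
	}
	sink = acc
}

func benchWrap(b *testing.B) {
	var acc uint32
	for i := 0; i < b.N; i++ {
		acc += sqDiffWrap(int32(i&0xffff), int32((i*7919)&0xffff))
	}
	sink = acc
}

func main() {
	fmt.Println("branch:", testing.Benchmark(benchBranch))
	fmt.Println("wrap:  ", testing.Benchmark(benchWrap))
}
```

The end-to-end comparison above is still the more meaningful measurement, since it captures sqDiff() inside the real palette search.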
Please let me know if this is something that can be accepted as a CL.