The commit 24b790d states:

> Replace SAD function with assembly version ~ 25% faster

However, the `ABSDIFF` macro (the assembly version) uses a hardcoded image width of 64 px, while `compute_sad_8x8` takes the image width as a parameter. So I ran an experiment: I added timing and debug output to the code and measured `compute_sad_8x8(..., (uint16_t) FRAME_SIZE)` vs `ABSDIFF(...)` vs `compute_sad_8x8(..., 64)`. The last one turned out to be the fastest. Also, the speedup from `compute_sad_8x8(..., (uint16_t) FRAME_SIZE)` to `ABSDIFF(...)` is far less than 25% for me, more like 8%.

Assuming somebody can reproduce my results, maybe we can just delete the `ABSDIFF` macro and use `compute_sad_8x8(..., 64)` instead?
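
For anyone who wants to try a similar comparison off-target, here is a minimal host-side sketch of the experiment described above. It is not the project's code: `sad_8x8` is a hypothetical plain-C stand-in for `compute_sad_8x8`, `bench` is an illustrative helper, and `clock()` is only a rough substitute for whatever on-target timer was actually used. The `volatile` stride is an assumption about how to keep the compiler from treating the run-time width as a constant.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for compute_sad_8x8: sum of absolute differences
 * over an 8x8 block, with the row stride (image width) passed as a parameter. */
static inline uint32_t sad_8x8(const uint8_t *a, const uint8_t *b, uint16_t stride)
{
    uint32_t acc = 0;
    for (int row = 0; row < 8; ++row) {
        for (int col = 0; col < 8; ++col) {
            int diff = (int)a[row * stride + col] - (int)b[row * stride + col];
            acc += (uint32_t)abs(diff);
        }
    }
    return acc;
}

/* Illustrative helper: times `iterations` SAD calls and returns elapsed CPU
 * seconds. Both buffers must hold at least one 64x64 frame.
 * constant_stride != 0: the stride is the literal 64, visible to the optimizer.
 * constant_stride == 0: the stride is read from a volatile, so the compiler
 * has to treat it as a run-time value. */
static double bench(const uint8_t *a, const uint8_t *b, int iterations,
                    int constant_stride, uint32_t *checksum)
{
    volatile uint16_t runtime_stride = 64;
    uint32_t sum = 0;
    clock_t start = clock();
    if (constant_stride) {
        for (int i = 0; i < iterations; ++i) {
            /* Shift the block start so each call reads different pixels. */
            sum += sad_8x8(a + (i & 7), b + (i & 7), 64);
        }
    } else {
        for (int i = 0; i < iterations; ++i) {
            sum += sad_8x8(a + (i & 7), b + (i & 7), runtime_stride);
        }
    }
    clock_t end = clock();
    *checksum = sum; /* returned so the loops cannot be optimized away */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `bench(a, b, n, 1, &sum)` and `bench(a, b, n, 0, &sum)` on the same two frames gives a constant-width and a run-time-width timing to compare. Absolute numbers from a desktop compiler will not match the microcontroller, so this only helps check whether a constant width lets the compiler generate tighter code.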