OCL: speeded up ocl::distanceToCenters #1759

Merged
merged 1 commit into from Nov 8, 2013

ilya-lavrenov
Contributor

@hgaspar
Contributor

hgaspar commented Nov 6, 2013

Guys, what is the current performance? Better than 20x slower than the CPU, I hope? Was this because something got broken at some point?

@ghost ghost assigned apavlenko Nov 7, 2013
@ilya-lavrenov
Contributor Author

@hgaspar, previous performance:

Test                                                                 CPU         OCL         CPU/OCL
distanceToCenters::distanceToCentersFixture::(256x256, NORM_L1)      2.244 ms    34.951 ms   0.06
distanceToCenters::distanceToCentersFixture::(256x256, NORM_L2SQR)   3.018 ms    35.044 ms   0.09
distanceToCenters::distanceToCentersFixture::(512x512, NORM_L1)      7.505 ms    155.863 ms  0.05
distanceToCenters::distanceToCentersFixture::(512x512, NORM_L2SQR)   7.150 ms    155.907 ms  0.05

current performance:

Test                                                                 CPU         OCL         CPU/OCL
distanceToCenters::distanceToCentersFixture::(256x256, NORM_L1)      1.961 ms    1.903 ms    1.03
distanceToCenters::distanceToCentersFixture::(256x256, NORM_L2SQR)   3.163 ms    1.742 ms    1.82
distanceToCenters::distanceToCentersFixture::(512x512, NORM_L1)      7.757 ms    10.135 ms   0.77
distanceToCenters::distanceToCentersFixture::(512x512, NORM_L2SQR)   7.152 ms    10.239 ms   0.70

The loop that calculates the vector distance (L1 or L2SQR) had a wrong condition, and as a result only 1/4 of all the calculations were vectorized. I also changed the calculation scheme: previously each work-item calculated the distance to every row of centers, e.g. if centers contains n rows, each work-item calculated n distances. I think that's too much work for a single work-item, so the OpenCL kernel now contains only one distance calculation.
Together, these fixes and optimizations make the OpenCL version 15-20x faster than before, but the CPU version is also very fast because it's optimized using TBB and SSE2.
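
A rough illustration of the two work decompositions described above, written as plain Python rather than OpenCL. It is a sketch under assumptions, not the kernel from this pull request: all function names are hypothetical, and the reading that the new scheme launches one work-item per (source row, center row) pair is an interpretation of the comment, not something confirmed by the source.

```python
# Sketch only: plain Python functions standing in for OpenCL work-items.
# Every name below is hypothetical and does not come from the OpenCV kernel.

def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2sqr(a, b):
    """Squared L2 distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def old_work_item(row_id, src, centers, dist):
    """Old scheme: one work-item per source row, looping over all n centers."""
    return [dist(src[row_id], c) for c in centers]

def new_work_item(row_id, center_id, src, centers, dist):
    """Assumed new scheme: one work-item per (row, center) pair, one distance."""
    return dist(src[row_id], centers[center_id])

def run_old(src, centers, dist):
    # len(src) work-items, each computing len(centers) distances.
    return [old_work_item(r, src, centers, dist) for r in range(len(src))]

def run_new(src, centers, dist):
    # len(src) * len(centers) work-items, each computing a single distance.
    out = [[0] * len(centers) for _ in src]
    for r in range(len(src)):
        for c in range(len(centers)):
            out[r][c] = new_work_item(r, c, src, centers, dist)
    return out

src = [[0, 0], [1, 2], [4, 4]]
centers = [[0, 0], [3, 3]]
print(run_new(src, centers, l1))     # [[0, 6], [3, 3], [8, 2]]
print(run_new(src, centers, l2sqr))  # [[0, 18], [5, 5], [32, 2]]
```

Both schemes produce the same distance matrix. The difference is only in how the work is split: the second exposes n times more independent work-items, each with less to do.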

@apavlenko
Contributor

👍

@opencv-pushbot opencv-pushbot merged commit 56d9433 into opencv:2.4 Nov 8, 2013
@ilya-lavrenov ilya-lavrenov deleted the ocl_distanceToCenters branch November 8, 2013 08:41
@pengx17
Contributor

pengx17 commented Nov 11, 2013

Oh, I had not noticed the bug in the previous dist implementation ... and it shocked me that finding the best match is much faster on the CPU.
Thanks for the insights 👍

@SpecLad SpecLad mentioned this pull request Nov 11, 2013
@opoplawski

It appears that this broke API and ABI compatibility for distanceToCenters. Isn't that a bad thing to do in a stable release series?

@apavlenko
Contributor

The 2.4.x releases kept backward binary compatibility, but this was not true for the ocl module, which was in an active development stage in 2013 (this was declared from the very beginning).


6 participants