Use T-API for critical parts of blobFromImagesWithParams #23894
@kallaballa, thank you for the pull request. I have some doubts regarding it.
Alright. At least I'll provide some numbers based on my machine.
I modified one of the V4D demos to track face detection time using In this scenario 1000 iterations of FaceDetectorYN::detect take:
I'll add flame graphs. I understand your arguments, but given the simplicity of the patch the risk/gain ratio isn't so bad :)
Btw, I made similar patches, e.g. for TrackerKCF, with considerable performance gain. I guess I should drop those?
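For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch. It is not the instrumentation used in the V4D demo: `timeIterationsMs` is a hypothetical helper built only on `std::chrono`, and the callable stands in for a call such as FaceDetectorYN::detect.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV or the PR): runs `fn` exactly
// `iterations` times and returns the total wall-clock time in milliseconds.
double timeIterationsMs(const std::function<void()>& fn, int iterations)
{
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int k = 0; k < iterations; ++k)
        fn();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}
```

Calling it once with a Mat input and once with a UMat input, each for 1000 iterations, would give a comparison of the two code paths. With OpenCL in play, a few warm-up iterations before timing are probably advisable so that kernel compilation does not skew the result.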
@kallaballa, thank you for the quick response. The acceleration in your case is noticeable, indeed! What's the resolution of the images that you feed to blobFromImagesWithParams? I think I can propose a compromise solution that will make everybody happy.
960x540
Sounds good. Will implement it like that.
fixed separate code paths for face detect
I think that's it. 00804cb
If you are alright with the general approach, I'd improve the implementation a bit more.
Also, I wrote a test that compares detection. There are differences (some frames are not detected with UMat) that I am trying to track down.
modules/dnn/src/dnn_utils.cpp
Outdated
  void getChannelFromBlob(UMat& m, InputArray blob, int i, int j, int rows, int cols, int type) {
      UMat ublob = blob.getUMat();
-     int offset = i * cols + j;
+     int offset = (i * ublob.step.p[0] + j * ublob.step.p[1]) / ublob.elemSize();
I forgot to take into account step() and elemSize() for the offset. Now it works on par.
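The difference between the two offset formulas can be sketched in isolation. The following is a minimal illustration using only the standard library, not the PR's code: `elementOffset` and `denseOffset` are hypothetical helpers, and the byte strides play the role of `step.p[0]` and `step.p[1]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: element index of position (i, j) when the strides of
// the two dimensions are given in BYTES, as they are in UMat::step.
std::size_t elementOffset(std::size_t i, std::size_t j,
                          std::size_t step0Bytes, std::size_t step1Bytes,
                          std::size_t elemSizeBytes)
{
    // Byte offset first, then converted to a count of elements.
    return (i * step0Bytes + j * step1Bytes) / elemSizeBytes;
}

// Naive variant: only correct when the data is densely packed, i.e. when
// one step along the first dimension spans exactly `cols` elements.
std::size_t denseOffset(std::size_t i, std::size_t j, std::size_t cols)
{
    return i * cols + j;
}
```

For 4-byte elements with 5 columns and rows padded to 32 bytes, position (2, 3) sits at element 19 by the stride-aware formula but at 13 by the naive one. The two agree only when the row step is 5 * 4 = 20 bytes, which is why the naive version can appear to work on continuous data.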
I have put more code on the fast path by porting NaryEltwiseLayer to UMat. I understand that, given the developments around 5.0, this doesn't have priority, but there is considerable gain. Should I post figures and make a PR? The only part left to port is ResizeLayer, to keep everything on the GPU all the time.
modules/dnn/src/dnn_utils.cpp
Outdated
if(blob_.kind() == _InputArray::UMAT)
    blob = blob_.getUMat();
else if(blob_.kind() == _InputArray::MAT) {
    blob = blob_.getMat().getUMat(flag);
The current UMat design has a limitation on storing the results of .getMat() / .getUMat() somewhere: they should be used locally only, so that the upstream lifetime check passes.
alright
so clone?
.clone() is overkill. blob_.getMat() returns a temporary object, and it has to stay alive until the result of .getUMat(flag) is released.
We don't need this method at all, as there is _InputArray::getUMat(): https://github.com/opencv/opencv/blob/4.8.0/modules/core/src/matrix_wrap.cpp#L126C5-L126C27
Thx!
modules/dnn/src/dnn_utils.cpp
Outdated
void blobFromImagesWithParams(InputArrayOfArrays images, OutputArray blob, const Image2BlobParams& param) {
CV_TRACE_FUNCTION();

if (images.kind() == _InputArray::STD_VECTOR_UMAT) {
broken indentation
ic
    padWithDivisor(input_image, pad_image);
    // Build blob from input image
    input_blob = dnn::blobFromImage(pad_image);
} else {
@vpisarev T-API states that we should not have such code separation on the "user" side.
@opencv-alalek @vpisarev Is it ready for merge?
Yay!
Pertaining Issue: opencv#5697

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake