[proto] Small optims for elastic op on bboxes #6897
I benchmarked your intermediate commits and what you have now (along with the proposed patch below) seems the fastest:
elastic_bounding_box, 1 thread, input (4,). Times are in microseconds (us).
device | format | main | bbeca | b32f7
cpu    | XYXY   |  613 |   575 |   490
cpu    | XYWH   |  674 |   636 |   551
cpu    | CXCYWH |  710 |   676 |   580
cuda   | XYXY   | 1370 |  1310 |   708
cuda   | XYWH   | 1470 |  1410 |   789
cuda   | CXCYWH | 1540 |  1480 |   858
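To put the timings above in relative terms, here is a minimal sketch that computes the speedup of each commit over `main`. The numbers are copied from the benchmark output; the names `timings_us`, `speedup`, and `summarize` are made up for this illustration and are not part of torchvision or the PR.

```python
# Timings in microseconds, copied from the benchmark output above.
# Keys are (device, bounding box format); the dict layout is an assumption
# made for this sketch only.
timings_us = {
    ("cpu", "XYXY"): {"main": 613, "bbeca": 575, "b32f7": 490},
    ("cpu", "XYWH"): {"main": 674, "bbeca": 636, "b32f7": 551},
    ("cpu", "CXCYWH"): {"main": 710, "bbeca": 676, "b32f7": 580},
    ("cuda", "XYXY"): {"main": 1370, "bbeca": 1310, "b32f7": 708},
    ("cuda", "XYWH"): {"main": 1470, "bbeca": 1410, "b32f7": 789},
    ("cuda", "CXCYWH"): {"main": 1540, "bbeca": 1480, "b32f7": 858},
}


def speedup(baseline_us, candidate_us):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us


def summarize(timings):
    """Map each (device, format) to the speedup of every commit over main."""
    return {
        key: {
            name: speedup(row["main"], us)
            for name, us in row.items()
            if name != "main"
        }
        for key, row in timings.items()
    }


if __name__ == "__main__":
    for (device, fmt), ratios in summarize(timings_us).items():
        cells = ", ".join(f"{name}: {r:.2f}x" for name, r in ratios.items())
        print(f"{device:4s} {fmt:6s} {cells}")
```

On these numbers, b32f7 is roughly 1.2x faster than `main` on CPU and about 1.8x to 1.9x faster on CUDA, while bbeca stays within about 1.1x of `main` everywhere.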
LGTM, thanks!
Hey @vfdev-5! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Summary:
* [proto] Small optims for elastic op on bboxes
* More inplace ops according to the review
* Create grid on device directly. This should be faster
* PR Review update. Apply ceil on float input

Reviewed By: NicolasHug
Differential Revision: D41265181
fbshipit-source-id: c9e203d5d039f9969b28951ed2a60299b116e1a1
cc @datumbox @bjuncek @pmeier