[proto] Speed-up h/v bboxes flip ops #6877

Merged (8 commits) on Nov 1, 2022

@vfdev-5 vfdev-5 commented Oct 31, 2022

[------------- horizontal_flip_bounding_box cpu BoundingBoxFormat.XYXY --------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  75.4                 |                 71              
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  75.8                 |                 70              

Times are in microseconds (us).

[------------- horizontal_flip_bounding_box cpu BoundingBoxFormat.XYWH --------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  120                  |                 40              
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  120                  |                 40              

Times are in microseconds (us).

[------------ horizontal_flip_bounding_box cpu BoundingBoxFormat.CXCYWH -------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  200                  |                 40              
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  200                  |                 40              

Times are in microseconds (us).

[------------ vertical_flip_bounding_box cpu BoundingBoxFormat.XYXY -------------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                  75                 |                71             
6 threads: -----------------------------------------------------------------------
      (4,)  |                  70                 |                70             

Times are in microseconds (us).

[------------ vertical_flip_bounding_box cpu BoundingBoxFormat.XYWH -------------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                 120                 |                40             
6 threads: -----------------------------------------------------------------------
      (4,)  |                 122                 |                40             

Times are in microseconds (us).

[----------- vertical_flip_bounding_box cpu BoundingBoxFormat.CXCYWH ------------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                 200                 |                40             
6 threads: -----------------------------------------------------------------------
      (4,)  |                 200                 |                40             

Times are in microseconds (us).

[------------- horizontal_flip_bounding_box cuda BoundingBoxFormat.XYXY -------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  163                  |                163              
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  163                  |                163              

Times are in microseconds (us).

[------------- horizontal_flip_bounding_box cuda BoundingBoxFormat.XYWH -------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  233                  |                85.1             
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  234                  |                84.9             

Times are in microseconds (us).

[------------ horizontal_flip_bounding_box cuda BoundingBoxFormat.CXCYWH ------------]
            |  horizontal_flip_bounding_box_old v2  |  horizontal_flip_bounding_box v2
1 threads: ---------------------------------------------------------------------------
      (4,)  |                  313                  |                74.3             
6 threads: ---------------------------------------------------------------------------
      (4,)  |                  313                  |                74.5             

Times are in microseconds (us).

[------------ vertical_flip_bounding_box cuda BoundingBoxFormat.XYXY ------------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                 163                 |               164             
6 threads: -----------------------------------------------------------------------
      (4,)  |                 163                 |               163             

Times are in microseconds (us).

[------------ vertical_flip_bounding_box cuda BoundingBoxFormat.XYWH ------------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                 234                 |               85.1            
6 threads: -----------------------------------------------------------------------
      (4,)  |                 234                 |               85.3            

Times are in microseconds (us).

[----------- vertical_flip_bounding_box cuda BoundingBoxFormat.CXCYWH -----------]
            |  vertical_flip_bounding_box_old v2  |  vertical_flip_bounding_box v2
1 threads: -----------------------------------------------------------------------
      (4,)  |                 313                 |               74.4            
6 threads: -----------------------------------------------------------------------
      (4,)  |                 312                 |               74.2            

Times are in microseconds (us).
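
One plausible reading of the tables above, consistent with the review diff further down that drops the conversion to XYXY, is that the new version flips each box directly in its own format, so XYWH and CXCYWH no longer pay for a round trip through XYXY while XYXY itself is unchanged. The following is a minimal plain-Python sketch of that idea, not the PR's tensor code: the function names, the string format tags, and the `image_width` / `image_height` parameters are assumptions made for illustration.

```python
from typing import List

Box = List[float]


def horizontal_flip_box(box: Box, fmt: str, image_width: float) -> Box:
    """Flip one box left-right without converting it to another format."""
    a, b, c, d = box
    if fmt == "XYXY":
        # (x1, y1, x2, y2): the mirrored right edge becomes the new left edge.
        return [image_width - c, b, image_width - a, d]
    if fmt == "XYWH":
        # (x, y, w, h): the new left edge is the mirrored right edge x + w.
        return [image_width - (a + c), b, c, d]
    if fmt == "CXCYWH":
        # (cx, cy, w, h): only the center coordinate moves.
        return [image_width - a, b, c, d]
    raise ValueError(f"unknown format: {fmt}")


def vertical_flip_box(box: Box, fmt: str, image_height: float) -> Box:
    """Flip one box top-bottom; same logic applied to the y coordinates."""
    a, b, c, d = box
    if fmt == "XYXY":
        return [a, image_height - d, c, image_height - b]
    if fmt == "XYWH":
        return [a, image_height - (b + d), c, d]
    if fmt == "CXCYWH":
        return [a, image_height - b, c, d]
    raise ValueError(f"unknown format: {fmt}")


# The same box expressed in the three formats, flipped in a 100-pixel-wide image.
print(horizontal_flip_box([10, 20, 30, 60], "XYXY", 100))    # [70, 20, 90, 60]
print(horizontal_flip_box([10, 20, 20, 40], "XYWH", 100))    # [70, 20, 20, 40]
print(horizontal_flip_box([20, 40, 20, 40], "CXCYWH", 100))  # [80, 40, 20, 40]
```

In this sketch the XYWH and CXCYWH branches touch a single coordinate, which fits the pattern in the tables where those two formats gain the most.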

Depends on #6876

cc @datumbox @bjuncek @pmeier

@datumbox datumbox left a comment

Looking great Victor, just one more optimization:

torchvision/prototype/transforms/functional/_geometry.py (outdated, resolved)
-    bounding_box = convert_format_bounding_box(
-        bounding_box.clone(), old_format=format, new_format=features.BoundingBoxFormat.XYXY, inplace=True
-    ).reshape(-1, 4)
+    bounding_box = bounding_box.clone().reshape(-1, 4)

@vfdev-5 vfdev-5 (Collaborator, Author) replied

Note that this change also brings an implicit "bug fix" (or feature): previously, transforming CXCYWH into XYXY and back with a long dtype caused a +/- 1 data loss. That no longer happens here, since the conversion is gone.
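
The data loss described above can be reproduced with plain integers. This is a hedged sketch only: the helper names are hypothetical, and it assumes the integer conversions halve with floor division, which may differ from the exact rounding torchvision used at the time.

```python
def cxcywh_to_xyxy_int(box):
    """Integer CXCYWH -> XYXY, assuming floor division when halving w and h."""
    cx, cy, w, h = box
    return [cx - w // 2, cy - h // 2, cx + w // 2, cy + h // 2]


def xyxy_to_cxcywh_int(box):
    """Integer XYXY -> CXCYWH, again with floor division for the center."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) // 2, (y1 + y2) // 2, x2 - x1, y2 - y1]


def hflip_via_xyxy(box, image_width):
    """Old-style path: convert to XYXY, flip, convert back."""
    x1, y1, x2, y2 = cxcywh_to_xyxy_int(box)
    flipped = [image_width - x2, y1, image_width - x1, y2]
    return xyxy_to_cxcywh_int(flipped)


def hflip_direct(box, image_width):
    """Direct path: mirror the center and leave width and height untouched."""
    cx, cy, w, h = box
    return [image_width - cx, cy, w, h]


box = [20, 40, 5, 7]  # odd width and height expose the rounding
print(hflip_via_xyxy(box, 100))  # [80, 40, 4, 6]
print(hflip_direct(box, 100))    # [80, 40, 5, 7]
```

With an odd width or height, the round trip shrinks the box by one unit per dimension, while the direct flip preserves it exactly.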

@datumbox datumbox left a comment

LGTM, thanks!

@vfdev-5 vfdev-5 merged commit 72c5952 into pytorch:main Nov 1, 2022
@vfdev-5 vfdev-5 deleted the proto-speedup-flips-bbox branch November 1, 2022 16:28

github-actions bot commented Nov 1, 2022

Hey @vfdev-5!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@vfdev-5 vfdev-5 added the module: transforms, Perf (for performance improvements), and prototype labels Nov 1, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
* [proto][tests] Added ref functions for h/v flips

* Better dtype handling in reference_affine_bounding_box_helper

* [proto] Speed-up h/v bboxes flip ops

* Use more inplace ops

* Removed _old methods

* Fixed jit issue using a bit slower version

Reviewed By: datumbox

Differential Revision: D41020548

fbshipit-source-id: de13c57a20c3dd7c3c6c41f6ad16fd59499bcb86