
Replace numpy transpose with torch permute to speed-up #9533

Merged
merged 12 commits into from Jan 4, 2023

Conversation

Min-Sheng
Copy link

Motivation

numpy.transpose() is much slower than torch.permute() according to my benchmarks in a Jupyter notebook:

  1. If the input image size is (1366, 800, 3), the typical size of a COCO dataset image:
import numpy as np
import torch
img = np.random.randn(1366, 800, 3)
%%timeit
img_np = np.ascontiguousarray(img.transpose(2, 0, 1))
input = torch.from_numpy(img_np)

Output: 7.69 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
input = torch.from_numpy(img).permute(2, 0, 1).contiguous()

Output: 1.65 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

  2. If the input image is large, especially when running inference on a large image ((3648, 5472, 3) in my case):
img = np.random.randn(3648, 5472, 3)
%%timeit
img_np = np.ascontiguousarray(img.transpose(2, 0, 1))
input = torch.from_numpy(img_np)

Output: 327 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
input = torch.from_numpy(img).permute(2, 0, 1).contiguous()

Output: 93.8 ms ± 4.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
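The two benchmarks above can be reproduced outside a notebook with a small self-contained script using the standard-library timeit module instead of the %%timeit magic (a sketch; the absolute timings will vary by machine, but the relative ordering is what matters):

```python
# Self-contained version of the benchmarks above, runnable as a plain script.
import timeit

import numpy as np
import torch

img = np.random.randn(1366, 800, 3)

def via_numpy():
    # transpose HWC -> CHW in numpy, then copy into a contiguous buffer
    img_np = np.ascontiguousarray(img.transpose(2, 0, 1))
    return torch.from_numpy(img_np)

def via_torch():
    # wrap the array first, then permute and make contiguous in torch
    return torch.from_numpy(img).permute(2, 0, 1).contiguous()

t_np = timeit.timeit(via_numpy, number=50)
t_torch = timeit.timeit(via_torch, number=50)
print(f"numpy path: {t_np / 50 * 1e3:.2f} ms/loop")
print(f"torch path: {t_torch / 50 * 1e3:.2f} ms/loop")

# Both paths must produce identical CHW tensors.
assert torch.equal(via_numpy(), via_torch())
```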

Modification

Replace the transpose operation numpy.transpose(2, 0, 1) with torch.permute(2, 0, 1) in ImageToTensor and DefaultFormatBundle to speed up the process.
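In essence (a sketch, not the exact mmdet code), the change swaps which library performs the HWC-to-CHW rearrangement:

```python
# Sketch of the change described above: the old path transposes in numpy,
# the new path permutes in torch. Function names are illustrative only.
import numpy as np
import torch

def image_to_tensor_old(img: np.ndarray) -> torch.Tensor:
    # before: HWC -> CHW via numpy.transpose, copying on the numpy side
    return torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))

def image_to_tensor_new(img: np.ndarray) -> torch.Tensor:
    # after: wrap first, then HWC -> CHW via torch.permute
    return torch.from_numpy(img).permute(2, 0, 1).contiguous()

img = np.random.randn(32, 48, 3)
assert torch.equal(image_to_tensor_old(img), image_to_tensor_new(img))
```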

@CLAassistant
Copy link

CLAassistant commented Dec 26, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Min-Sheng
❌ vincentwu1


vincentwu1 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ZwwWayne
Copy link
Collaborator

Hi @Min-Sheng ,
Thanks for your kind PR. Would you like to also update the docstring or code to indicate this issue?

@Min-Sheng
Copy link
Author

Hi @Min-Sheng , Thanks for your kind PR. Would you like to also update the docstring or code to indicate this issue?

I have updated both the docstring and code.
By the way, I found that if the input numpy array is non-contiguous,

img = np.random.randn(1366, 800, 3)
img = img[..., ::-1]

use

input = torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))

Output: 7.58 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

is faster than

input = torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).contiguous()

Output: 14.5 ms ± 669 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, I use numpy's C_CONTIGUOUS flag to check array contiguity and switch the order of the transpose and to_tensor operations accordingly.
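The contiguity-aware switch described above can be sketched as follows (to_tensor_fast is a hypothetical helper name, not the actual mmdet function): contiguous arrays take the torch permute path, while non-contiguous ones, such as BGR-to-RGB channel-reversed views, take the numpy transpose path.

```python
# Sketch of the contiguity check: choose the faster conversion path
# depending on whether the input array is C-contiguous.
import numpy as np
import torch

def to_tensor_fast(img: np.ndarray) -> torch.Tensor:
    if img.flags.c_contiguous:
        # contiguous input: wrap first, then permute in torch
        return torch.from_numpy(img).permute(2, 0, 1).contiguous()
    # non-contiguous input (e.g. a channel-reversed view): copy once
    # in numpy, since torch.from_numpy cannot wrap negative strides
    return torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))

contig = np.random.randn(32, 48, 3)
flipped = contig[..., ::-1]  # non-contiguous channel-reversed view
assert torch.equal(to_tensor_fast(contig),
                   torch.from_numpy(contig).permute(2, 0, 1))
```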

@ZwwWayne
Copy link
Collaborator

Hi @Min-Sheng

Thanks for your kind PR. It seems that the CLA is not signed. Could you sign the CLA so that we can merge this PR after review? You can check the contents and follow the instructions in the communication box shown below.

@Min-Sheng
Copy link
Author

Hi @Min-Sheng

Thanks for your kind PR. It seems that the CLA is not signed. Could you sign the CLA so that we can merge this PR after review? You can check the contents and follow the instructions in the communication box shown below.

Everything is ready for merging.

@ZwwWayne ZwwWayne requested a review from RangiLyu January 3, 2023 03:04
@ZwwWayne ZwwWayne changed the base branch from master to dev January 3, 2023 03:30
@ZwwWayne ZwwWayne added this to the 2.28.0 milestone Jan 3, 2023
@codecov
Copy link

codecov bot commented Jan 3, 2023

Codecov Report

Base: 64.15% // Head: 64.14% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (679284e) compared to base (31c8495).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 679284e differs from pull request most recent head 25c6efa. Consider uploading reports for the commit 25c6efa to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #9533      +/-   ##
==========================================
- Coverage   64.15%   64.14%   -0.02%     
==========================================
  Files         361      361              
  Lines       29583    29586       +3     
  Branches     5033     5034       +1     
==========================================
- Hits        18980    18978       -2     
- Misses       9599     9601       +2     
- Partials     1004     1007       +3     
Flag Coverage Δ
unittests 64.12% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mmdet/datasets/pipelines/formatting.py 68.54% <100.00%> (+0.77%) ⬆️
mmdet/core/bbox/samplers/random_sampler.py 75.00% <0.00%> (-5.56%) ⬇️
mmdet/models/roi_heads/mask_heads/maskiou_head.py 87.35% <0.00%> (-2.30%) ⬇️
mmdet/utils/misc.py 62.22% <0.00%> (-2.23%) ⬇️

☔ View full report at Codecov.

@ZwwWayne ZwwWayne merged commit cf43a1b into open-mmlab:dev Jan 4, 2023
MeowZheng added a commit to open-mmlab/mmsegmentation that referenced this pull request Feb 15, 2023
… speed-up (#2604)

## Motivation

Original motivation was after [MMDetection PR
#9533](open-mmlab/mmdetection#9533)

With several experiments I found that if an ndarray is contiguous,
numpy.transpose + torch.contiguous performs better, while if not,
numpy.ascontiguousarray + numpy.transpose is faster.

## Modification

Replace numpy.ascontiguousarray with torch.contiguous in
[PackSegInputs](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/datasets/transforms/formatting.py)

Co-authored-by: MeowZheng <meowzheng@outlook.com>
@OpenMMLab-Assistant001
Copy link

Hi @Min-Sheng! First of all, we want to express our gratitude for your significant PR in the MMDet project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project during your personal time. We believe that many developers will benefit from your PR.

We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas and build connections with like-minded peers. To join the SIG channel, simply message the moderator OpenMMLab on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA

If you have a WeChat account, welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + GitHub ID" as a remark when adding friends :)
Thank you again for your contribution!❤

thmegy pushed a commit to thmegy/mmdetection that referenced this pull request May 5, 2023