
Conversation

oToToT
Contributor

@oToToT oToToT commented Sep 10, 2021

Fixes #62690

Summary

Following the approach used in unique_consecutive_cpu_template, this PR reimplements _unique_dim_cpu_impl to get better performance.
Also, because the overhead of unique_dim_consecutive_cpu is quite large, unique_consecutive_cpu_template is now called directly when the given input is known to be a 1-d array.
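As a quick illustration of why the 1-d fast path is safe (this snippet is just a sanity check, not part of the PR): on a 1-d tensor, unique_consecutive with dim=0 and without dim must produce the same outputs, so the scalar code path can be reused directly.

import torch

# Sanity check: on a 1-d tensor, dim=0 and the flattened call agree,
# which is what allows dispatching straight to unique_consecutive_cpu_template.
t = torch.tensor([1, 1, 2, 2, 3, 1, 1])
u0, inv0, cnt0 = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
u1, inv1, cnt1 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
assert torch.equal(u0, u1) and torch.equal(inv0, inv1) and torch.equal(cnt0, cnt1)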

Benchmark

Script

import torch
import time

torch.manual_seed(0)
t = torch.randint(500, (10000000, ))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques2, inverse2, counts2 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive() time:", end - start)


t = torch.randint(500, (10000000, 2))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=1, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=1) time:", end - start)

Before

torch.unique_consecutive(dim=0) time: 78.64345622062683
torch.unique_consecutive() time: 0.029544353485107422
torch.unique_consecutive(dim=0) time: 91.49796152114868
torch.unique_consecutive(dim=1) time: 0.30872368812561035

After

torch.unique_consecutive(dim=0) time: 0.08256125450134277
torch.unique_consecutive() time: 0.08162403106689453
torch.unique_consecutive(dim=0) time: 35.58408498764038
torch.unique_consecutive(dim=1) time: 1.6258199214935303

System Information

Collecting environment information...
PyTorch version: 1.10.0a0+git7f1932e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.0a0+gitbe09195
[conda] Could not collect

@facebook-github-bot
Contributor

facebook-github-bot commented Sep 10, 2021


💊 CI failures summary and remediations

As of commit 89e6648 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (1/3)

Step: "Run test scripts" (full log | diagnosis details | 🔁 rerun)

2021-09-12T21:58:32.7031718Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T21:58:32.6003280Z + PYTORCH_FINAL_PACKAGE_DIR=/c/1227301952/build-results/
2021-09-12T21:58:32.6060433Z ++ cygpath -w /c/1227301952/build-results/
2021-09-12T21:58:32.6153009Z + PYTORCH_FINAL_PACKAGE_DIR_WIN='C:\1227301952\build-results\'
2021-09-12T21:58:32.6153516Z + export PYTORCH_FINAL_PACKAGE_DIR_WIN
2021-09-12T21:58:32.6153925Z + export PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T21:58:32.6154281Z + PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T21:58:32.6154836Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/build/torch
2021-09-12T21:58:32.6543413Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts
2021-09-12T21:58:32.6544742Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts
2021-09-12T21:58:32.6718763Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts/*'
2021-09-12T21:58:32.7031718Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T21:58:32.7034450Z + '[' -n '' ']'
2021-09-12T21:58:32.7035120Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/.jenkins/pytorch/win-test-helpers
2021-09-12T21:58:32.7036057Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/.jenkins/pytorch/win-test-helpers
2021-09-12T21:58:32.7036654Z + IN_PULL_REQUEST=
2021-09-12T21:58:32.7036913Z + '[' -n '' ']'
2021-09-12T21:58:32.7037237Z + [[ win-vs2019-cpu-py3 == *cuda11* ]]
2021-09-12T21:58:32.7037560Z + run_tests
2021-09-12T21:58:32.7038115Z + for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe
2021-09-12T21:58:32.7038827Z + [[ -x /c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe ]]
2021-09-12T21:58:32.7039415Z + '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe'

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (2/3)

Step: "Run test scripts" (full log | diagnosis details | 🔁 rerun)

2021-09-12T22:02:39.1576869Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T22:02:39.1060383Z + PYTORCH_FINAL_PACKAGE_DIR=/c/1227301952/build-results/
2021-09-12T22:02:39.1116809Z ++ cygpath -w /c/1227301952/build-results/
2021-09-12T22:02:39.1207262Z + PYTORCH_FINAL_PACKAGE_DIR_WIN='C:\1227301952\build-results\'
2021-09-12T22:02:39.1207773Z + export PYTORCH_FINAL_PACKAGE_DIR_WIN
2021-09-12T22:02:39.1208165Z + export PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T22:02:39.1208538Z + PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T22:02:39.1209076Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/build/torch
2021-09-12T22:02:39.1341820Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts
2021-09-12T22:02:39.1343075Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts
2021-09-12T22:02:39.1514914Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts/*'
2021-09-12T22:02:39.1576869Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/build/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T22:02:39.1579031Z + '[' -n '' ']'
2021-09-12T22:02:39.1579714Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/.jenkins/pytorch/win-test-helpers
2021-09-12T22:02:39.1580651Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301952/.jenkins/pytorch/win-test-helpers
2021-09-12T22:02:39.1581241Z + IN_PULL_REQUEST=
2021-09-12T22:02:39.1581514Z + '[' -n '' ']'
2021-09-12T22:02:39.1581840Z + [[ win-vs2019-cpu-py3 == *cuda11* ]]
2021-09-12T22:02:39.1582166Z + run_tests
2021-09-12T22:02:39.1582718Z + for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe
2021-09-12T22:02:39.1583419Z + [[ -x /c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe ]]
2021-09-12T22:02:39.1584009Z + '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe'

See GitHub Actions build win-vs2019-cuda10.2-py3 / test (default, 1, 1, windows.8xlarge.nvidia.gpu) (3/3)

Step: "Run test scripts" (full log | diagnosis details | 🔁 rerun)

2021-09-12T22:27:33.6724146Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T22:27:33.5802828Z + PYTORCH_FINAL_PACKAGE_DIR=/c/1227301955/build-results/
2021-09-12T22:27:33.5897462Z ++ cygpath -w /c/1227301955/build-results/
2021-09-12T22:27:33.6071361Z + PYTORCH_FINAL_PACKAGE_DIR_WIN='C:\1227301955\build-results\'
2021-09-12T22:27:33.6072580Z + export PYTORCH_FINAL_PACKAGE_DIR_WIN
2021-09-12T22:27:33.6073559Z + export PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T22:27:33.6074503Z + PYTORCH_TEST_SKIP_NOARCH=1
2021-09-12T22:27:33.6075286Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/build/win_tmp/build/torch
2021-09-12T22:27:33.6315325Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/build/win_tmp/ci_scripts
2021-09-12T22:27:33.6317476Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/build/win_tmp/ci_scripts
2021-09-12T22:27:33.6614795Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/build/win_tmp/ci_scripts/*'
2021-09-12T22:27:33.6724146Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/build/win_tmp/ci_scripts/*': No such file or directory
2021-09-12T22:27:33.6727466Z + '[' -n '' ']'
2021-09-12T22:27:33.6729642Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/.jenkins/pytorch/win-test-helpers
2021-09-12T22:27:33.6732313Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/pytorch-1227301955/.jenkins/pytorch/win-test-helpers
2021-09-12T22:27:33.6733887Z + IN_PULL_REQUEST=
2021-09-12T22:27:33.6734871Z + '[' -n '' ']'
2021-09-12T22:27:33.6735475Z + [[ win-vs2019-cuda10.2-py3 == *cuda11* ]]
2021-09-12T22:27:33.6736139Z + run_tests
2021-09-12T22:27:33.6737063Z + for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe
2021-09-12T22:27:33.6738116Z + [[ -x /c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe ]]
2021-09-12T22:27:33.6739000Z + '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe'

This comment was automatically generated by Dr. CI.

@codecov

codecov bot commented Sep 10, 2021

Codecov Report

Merging #64835 (2d204dc) into master (be09195) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head 2d204dc differs from pull request most recent head 89e6648. Consider uploading reports for the commit 89e6648 to get more accurate results.

@@            Coverage Diff             @@
##           master   #64835      +/-   ##
==========================================
+ Coverage   66.65%   66.68%   +0.03%     
==========================================
  Files         714      714              
  Lines       92546    92598      +52     
==========================================
+ Hits        61685    61752      +67     
+ Misses      30861    30846      -15     

@ngimel ngimel self-requested a review September 11, 2021 00:55
@ngimel ngimel added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Sep 11, 2021
std::tuple<Tensor, Tensor, Tensor>
unique_consecutive_cpu(const Tensor& self, const bool return_inverse, const bool return_counts, c10::optional<int64_t> dim) {
-  if (!dim.has_value()) {
+  if (!dim.has_value() || (dim.value() == 0 && self.sizes().size() == 1)) {
Contributor Author


Without this check, the result would be

torch.unique_consecutive(dim=0) time: 16.88133478164673
torch.unique_consecutive() time: 0.024564027786254883
torch.unique_consecutive(dim=0) time: 28.210450410842896
torch.unique_consecutive(dim=1) time: 0.30884695053100586

As you can see, this check slightly slows down torch.unique_consecutive(dim=1), but it speeds up torch.unique_consecutive(dim=0) a lot.
Hence, I still left it in.

Collaborator


How can this check affect the dim=1 case? With or without this check, the dim=1 case should go to unique_dim_consecutive_cpu at line 287. A 5x slowdown (from 0.3 to 1.6) seems significant; why is your algorithm slower than the previous one for dim=1?

Contributor Author

@oToToT oToToT Sep 12, 2021


Yes, I believe it should always go to unique_dim_consecutive_cpu in this case.
I am also curious about this situation; I didn't expect such a large difference at first.

Maybe it is some CPU branching issue?

If I loop the last case 5 times, the results become

torch.unique_consecutive(dim=0) time: 0.06868529319763184
torch.unique_consecutive() time: 0.06931805610656738
torch.unique_consecutive(dim=0) time: 26.346323251724243
torch.unique_consecutive(dim=1) time: 1.2163631916046143
torch.unique_consecutive(dim=1) time: 0.2443101406097412
torch.unique_consecutive(dim=1) time: 0.24067211151123047
torch.unique_consecutive(dim=1) time: 0.25301051139831543
torch.unique_consecutive(dim=1) time: 0.24738121032714844

I will try torch.utils.benchmark or something else to get a more thorough analysis if possible.
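
A rough sketch of how that measurement could look with torch.utils.benchmark (Timer runs a warm-up and repeats the statement, so a one-off cold start would not dominate the numbers; a smaller tensor than the 10M-row one above is used so the repeated runs finish quickly):

import torch
from torch.utils import benchmark

torch.manual_seed(0)
t = torch.sort(torch.randint(500, (1000000, 2)))[0]

results = []
for d in (0, 1):
    timer = benchmark.Timer(
        stmt="torch.unique_consecutive(t, dim=d, return_inverse=True, return_counts=True)",
        globals={"torch": torch, "t": t, "d": d},
        label="unique_consecutive",
        sub_label=f"dim={d}",
    )
    # blocked_autorange picks the iteration count automatically and
    # reports statistics over the measured blocks, excluding warm-up.
    results.append(timer.blocked_autorange(min_run_time=1))

benchmark.Compare(results).print()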

Collaborator


Ah ok, it seems like a warm-up issue then, but please post torch.utils.benchmark timings here, so that we have a record.

Contributor Author

@oToToT oToToT Sep 13, 2021


Summary

For the 1-d case, dim=0 is >100x faster with the (dim.value() == 0 && self.sizes().size() == 1) check.
For the 2-d case, there is no significant impact with or without the (dim.value() == 0 && self.sizes().size() == 1) check.
In general, the new implementation (89e6648) is about 3x faster than the original implementation (be09195).

Result

The scripts I tested with: test_1d.py, test_2d.py.

Collaborator

@ngimel ngimel left a comment


Thanks for the improvement!

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in ed30afd.

Labels
cla signed, Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

torch.unique_consecutive() is very slow when dim is specified even with 1-d tensors
4 participants