Speed up torch.unique_consecutive() #64835
Conversation
CI failures summary (as of commit 89e6648, via Dr. CI): 3 new failures recognized by patterns; they do not appear to be due to upstream breakages.
Codecov Report

```
@@            Coverage Diff             @@
##           master   #64835      +/-   ##
==========================================
+ Coverage   66.65%   66.68%   +0.03%
==========================================
  Files         714      714
  Lines       92546    92598      +52
==========================================
+ Hits        61685    61752      +67
+ Misses      30861    30846      -15
```
aten/src/ATen/native/Unique.cpp (outdated)

```diff
 std::tuple<Tensor, Tensor, Tensor>
 unique_consecutive_cpu(const Tensor& self, const bool return_inverse, const bool return_counts, c10::optional<int64_t> dim) {
-  if (!dim.has_value()) {
+  if (!dim.has_value() || (dim.value() == 0 && self.sizes().size() == 1)) {
```
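The added condition works because, for a 1-D tensor, `dim=0` is semantically identical to passing no `dim` at all: each "slice" along dimension 0 is a single scalar, so comparing slices degenerates to comparing elements. A minimal pure-Python sketch of that semantics (this is only an illustration of what the fast path computes, not the ATen implementation):

```python
from itertools import groupby

def unique_consecutive_1d(values):
    """Collapse runs of consecutive duplicates, returning (uniques, counts).

    A pure-Python sketch of what torch.unique_consecutive computes for a
    1-D input; the real kernel lives in aten/src/ATen/native/Unique.cpp.
    """
    uniques, counts = [], []
    for key, group in groupby(values):
        uniques.append(key)
        counts.append(sum(1 for _ in group))
    return uniques, counts

# For a 1-D input, dim=0 and dim=None describe the same operation,
# which is why the patch can forward (dim == 0, ndim == 1) to the
# faster no-dim template.
print(unique_consecutive_1d([1, 1, 2, 2, 2, 3, 1]))
# → ([1, 2, 3, 1], [2, 3, 1, 1])
```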
Without this check, the result would be:

```
torch.unique_consecutive(dim=0) time: 16.88133478164673
torch.unique_consecutive() time: 0.024564027786254883
torch.unique_consecutive(dim=0) time: 28.210450410842896
torch.unique_consecutive(dim=1) time: 0.30884695053100586
```

As you can see, this check slightly slows down torch.unique_consecutive(dim=1), but it speeds up torch.unique_consecutive(dim=0) considerably. Hence, I still leave it here.
How can this check affect the dim=1 case? With or without this check, the dim=1 case should go to unique_dim_consecutive_cpu in line 287. A 5x slowdown (from 0.3 to 1.6) seems significant; why is your algorithm slower than the previous one for dim=1?
Yes, I believe it should always go to unique_dim_consecutive_cpu in this case. Therefore, I am also curious about this situation; I didn't expect such a huge difference at first. Maybe it is some CPU branching issue?

If I loop the last case 5 times, the results become:

```
torch.unique_consecutive(dim=0) time: 0.06868529319763184
torch.unique_consecutive() time: 0.06931805610656738
torch.unique_consecutive(dim=0) time: 26.346323251724243
torch.unique_consecutive(dim=1) time: 1.2163631916046143
torch.unique_consecutive(dim=1) time: 0.2443101406097412
torch.unique_consecutive(dim=1) time: 0.24067211151123047
torch.unique_consecutive(dim=1) time: 0.25301051139831543
torch.unique_consecutive(dim=1) time: 0.24738121032714844
```

I will try torch.utils.benchmark or something else to get a more thorough analysis if possible.
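The pattern in those numbers (only the first dim=1 run is slow) suggests controlling for warm-up. A minimal stdlib-only sketch of that methodology, using timeit instead of torch.utils.benchmark (which performs warm-up for you); the function name bench is hypothetical:

```python
import timeit

def bench(fn, warmup=3, repeats=5):
    """Time fn, discarding warm-up runs.

    The first call to a function can pay one-time costs (allocator
    growth, caches and code paths being populated), which skews a
    single timing — the likely cause of the outlier above.
    """
    for _ in range(warmup):
        fn()  # warm-up runs, not timed
    # Report the minimum over several repeats, as the timeit docs suggest.
    return min(timeit.repeat(fn, number=1, repeat=repeats))

elapsed = bench(lambda: sum(range(10_000)))
```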
Ah ok, it seems like a warm-up issue then, but please post torch.utils.benchmark timings here, so that we have a record.
Summary

For the 1-D case, the (dim.value() == 0 && self.sizes().size() == 1) check makes the dim=0 case more than 100x faster.
For the 2-D case, there is no significant impact with or without (dim.value() == 0 && self.sizes().size() == 1).
In general, the new implementation (89e6648) is 3x faster than the original implementation (be09195).

Result

The scripts I used for testing: test_1d.py, test_2d.py.
Thanks for the improvement!
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fixes #62690

Summary

Like the way unique_consecutive_cpu_template is implemented, this PR reimplements _unique_dim_cpu_impl to get better performance. Also, because the overhead of unique_dim_consecutive_cpu is quite large, directly call unique_consecutive_cpu_template when we know the given input is a 1-D array.

Benchmark
Script
Before
After
System Information
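As a recap, the dispatch the PR summary describes can be sketched in Python; the function and the returned kernel names are only labels mirroring the C++ functions discussed above, not a real API:

```python
def dispatch_unique_consecutive(ndim, dim):
    """Hypothetical sketch of the dispatch logic in unique_consecutive_cpu.

    When dim is None, or dim == 0 on a 1-D input, take the fast
    element-wise path; otherwise fall back to the slower per-slice
    implementation that compares whole slices along `dim`.
    """
    if dim is None or (dim == 0 and ndim == 1):
        return "unique_consecutive_cpu_template"   # fast path
    return "unique_dim_consecutive_cpu"            # general dim path

# A 1-D tensor with dim=0 now avoids the heavy dim-based kernel:
print(dispatch_unique_consecutive(1, 0))
# → unique_consecutive_cpu_template
```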