
Running torch.sort can corrupt memory #111189

Closed
malfet opened this issue Oct 13, 2023 · 1 comment
Assignees
Labels
high priority module: correctness (silent) issue that returns an incorrect result silently module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: regression It used to work, and now it doesn't module: sorting and selection triage review
Milestone

Comments

@malfet
Contributor

malfet commented Oct 13, 2023

🐛 Describe the bug

Running the following code results in a crash/memory corruption (and is the cause of S369412):

```
import torch

torch.set_num_threads(1)  # the reproducer only triggers with a single thread
x = torch.full((32768,), -1, dtype=torch.int32)
x[:100] = torch.iinfo(x.dtype).max  # first 100 elements become INT32_MAX
uv = x.sort().values.unique()
print(uv)
# The array holds exactly two distinct values, so unique() must return both
assert uv.size(0) == 2
```

This is a regression introduced by #100081, which switched to fbgemm::radix_sort_parallel; that in turn results in out-of-bounds writes for arrays containing many elements whose leading byte is 0x80 or 0x7f
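For context, here is a minimal pure-Python sketch (illustrative only, not fbgemm's actual implementation) of a byte-wise LSD radix sort for signed 32-bit integers. It makes explicit the invariant the bug violated: every scatter offset produced by the prefix sum must stay strictly below the element count. Inputs dominated by values whose top byte is 0x80 or 0x7f (e.g. -1 and INT32_MAX, as in the reproducer above) stress the sign-byte pass, which is where the out-of-bounds write occurred.

```python
def radix_sort_i32(keys):
    """Byte-wise LSD radix sort for signed 32-bit integers (sketch)."""
    n = len(keys)
    for shift in (0, 8, 16, 24):
        # Count how many keys fall into each of the 256 byte buckets.
        hist = [0] * 256
        for k in keys:
            hist[(k >> shift) & 0xFF] += 1
        # On the most-significant-byte pass, buckets 0x80..0xFF hold
        # negative numbers and must come first for signed ordering.
        order = list(range(256)) if shift < 24 else \
            list(range(128, 256)) + list(range(128))
        # Exclusive prefix sum: each bucket's start offset in the output.
        offsets = [0] * 256
        pos = 0
        for b in order:
            offsets[b] = pos
            pos += hist[b]
        # The invariant the fbgemm bug violated: every non-empty bucket's
        # start offset is strictly less than the element count.
        assert all(offsets[b] < n for b in range(256) if hist[b])
        # Stable scatter into the output buffer.
        out = [0] * n
        for k in keys:
            b = (k >> shift) & 0xFF
            out[offsets[b]] = k
            offsets[b] += 1
        keys = out
    return keys
```

With a correct prefix sum the scatter never writes past `out`; storing `element_count` into an offset slot of a non-empty bucket is exactly what turns this loop into a heap overflow.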

Versions

2.1.0/Nightly

cc @ezyang @gchanan @zou3519 @kadeng

@malfet malfet added high priority module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: regression It used to work, and now it doesn't module: correctness (silent) issue that returns an incorrect result silently module: sorting and selection labels Oct 13, 2023
@malfet malfet added this to the 2.1.1 milestone Oct 13, 2023
malfet added a commit to pytorch/FBGEMM that referenced this issue Oct 13, 2023
Setting `histogram_ps[RDX_HIST_SIZE * (nthreads - 1) + 127] = offset;` in `combine_prefix_sum_for_msb` is guaranteed to result in `heap-buffer-overflow` if the bucket is not empty during the scatter stage (as all values of `histogram_ps` should be strictly less than `element_count`)

Will fix pytorch/pytorch#111189 once FBGEMM is updated to the correct version
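The layout the commit message refers to can be illustrated with a hypothetical sketch (the names mirror the commit message, but the code below is not fbgemm's): each thread owns `RDX_HIST_SIZE` bucket counters, and a combined exclusive prefix sum assigns each (thread, bucket) pair its scatter start offset. After the loop the running `offset` equals `element_count`, so writing it back into the final slot hands a non-empty bucket a start position one past the end of the output buffer.

```python
RDX_HIST_SIZE = 256  # one bucket per byte value

def combine_prefix_sum(histogram, nthreads, element_count):
    """Exclusive prefix sum over per-thread histograms, bucket-major.

    histogram is laid out as histogram[RDX_HIST_SIZE * tid + bucket];
    the result gives each (thread, bucket) pair its scatter start offset.
    """
    histogram_ps = [0] * (RDX_HIST_SIZE * nthreads)
    offset = 0
    for bucket in range(RDX_HIST_SIZE):
        for tid in range(nthreads):
            histogram_ps[RDX_HIST_SIZE * tid + bucket] = offset
            offset += histogram[RDX_HIST_SIZE * tid + bucket]
    # After visiting every bucket the running offset equals the total
    # element count; storing it back into any slot of histogram_ps (as
    # the buggy code did) would point a scatter past the buffer's end.
    assert offset == element_count
    return histogram_ps
```

Every value left in `histogram_ps` for a non-empty bucket is a valid write position; the extra assignment the fix removed overwrote one of them with `element_count` itself.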
@malfet malfet self-assigned this Oct 13, 2023
@malfet
Contributor Author

malfet commented Oct 13, 2023

Another interesting aspect is that ASAN in clang-12 fails to catch it; perhaps it's time to migrate to clang-15

malfet added a commit that referenced this issue Oct 13, 2023
Hopefully it will align with the internal system and they will detect the heap-overflow access reported in #111189
Also, do not build Triton, protobuf, or DB dependencies (as they are not needed for ASAN builds/tests)
pytorchmergebot pushed a commit that referenced this issue Oct 14, 2023
Hopefully it will align with the internal system and they will detect the heap-overflow access reported in #111189. Also, do not build Triton, protobuf, or DB dependencies (as they are not needed for ASAN builds/tests)

Pull Request resolved: #111218
Approved by: https://github.com/Skylion007
pytorchmergebot pushed a commit that referenced this issue Oct 14, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes a regression in sanitizer coverage introduced by #93147, which narrowed the sanitizer's scope from the entire project to just the torch libraries, and finally allows one to reliably catch the regression reported in #111189

Pull Request resolved: #111266
Approved by: https://github.com/huydhn
yeounoh pushed a commit to yeounoh/pytorch that referenced this issue Oct 16, 2023
Hopefully it will align with the internal system and they will detect the heap-overflow access reported in pytorch#111189. Also, do not build Triton, protobuf, or DB dependencies (as they are not needed for ASAN builds/tests)

Pull Request resolved: pytorch#111218
Approved by: https://github.com/Skylion007
yeounoh pushed a commit to yeounoh/pytorch that referenced this issue Oct 16, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes a regression in sanitizer coverage introduced by pytorch#93147, which narrowed the sanitizer's scope from the entire project to just the torch libraries, and finally allows one to reliably catch the regression reported in pytorch#111189

Pull Request resolved: pytorch#111266
Approved by: https://github.com/huydhn
malfet added a commit to pytorch/FBGEMM that referenced this issue Oct 16, 2023
Setting `histogram_ps[RDX_HIST_SIZE * (nthreads - 1) + 127] = offset;` in `combine_prefix_sum_for_msb` is guaranteed to result in `heap-buffer-overflow` if the bucket is not empty during the scatter stage (as all values of `histogram_ps` should be strictly less than `element_count`)

Will fix pytorch/pytorch#111189 once FBGEMM is updated to the correct version
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this issue Oct 16, 2023
Summary:
Setting `histogram_ps[RDX_HIST_SIZE * (nthreads - 1) + 127] = offset;` in `combine_prefix_sum_for_msb` is guaranteed to result in `heap-buffer-overflow` if the bucket is not empty during the scatter stage (as all values of `histogram_ps` should be strictly less than `element_count`)

Factor out common code from `RadixSortTest.cc` into a test template and add a regression test for the buffer overflow which, before the fix, fails as follows:
```
[ RUN      ] cpuKernelTest.raidx_sort_heap_overflow
/home/nshulga/git/pytorch/FBGEMM/test/RadixSortTest.cc:36: Failure
Expected equality of these values:
  expected_keys
    Which is: { 2, 3, 5, -1, -1, 2147483647, 2147483647, 2147483647 }
  keys
    Which is: { -1, -1, -1, -1, -1, -1, -1, -1 }
/home/nshulga/git/pytorch/FBGEMM/test/RadixSortTest.cc:37: Failure
Expected equality of these values:
  expected_values
    Which is: { 1, 4, 6, 7, 8, 2, 3, 5 }
  values
    Which is: { 2147483647, 4, 6, 7, 8, 6, 7, 8 }
[  FAILED  ] cpuKernelTest.raidx_sort_heap_overflow (0 ms)
```

Will fix pytorch/pytorch#111189 once FBGEMM is updated to the correct version


Reviewed By: kit1980, jianyuh

Differential Revision: D50256504

Pulled By: malfet
@malfet malfet reopened this Oct 19, 2023
malfet added a commit that referenced this issue Oct 20, 2023
By updating fbgemm

Add regression test for it

Fixes #111189
andreigh pushed a commit to andreigh/pytorch that referenced this issue Oct 26, 2023
By updating the fbgemm submodule.
Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Also, update call-sites of `fbgemm::GenerateEmbeddingSpMDM` to pass `isbf16` twice, to match API changes introduced in pytorch/FBGEMM#1851

Fixes pytorch#111189 and pytorch#111710

Pull Request resolved: pytorch#111672
Approved by: https://github.com/Skylion007
malfet added a commit to pytorch/FBGEMM that referenced this issue Nov 2, 2023
Summary:
Setting `histogram_ps[RDX_HIST_SIZE * (nthreads - 1) + 127] = offset;` in `combine_prefix_sum_for_msb` is guaranteed to result in `heap-buffer-overflow` if the bucket is not empty during the scatter stage (as all values of `histogram_ps` should be strictly less than `element_count`)

Factor out common code from `RadixSortTest.cc` into a test template and add a regression test for the buffer overflow which, before the fix, fails as follows:
```
[ RUN      ] cpuKernelTest.raidx_sort_heap_overflow
/home/nshulga/git/pytorch/FBGEMM/test/RadixSortTest.cc:36: Failure
Expected equality of these values:
  expected_keys
    Which is: { 2, 3, 5, -1, -1, 2147483647, 2147483647, 2147483647 }
  keys
    Which is: { -1, -1, -1, -1, -1, -1, -1, -1 }
/home/nshulga/git/pytorch/FBGEMM/test/RadixSortTest.cc:37: Failure
Expected equality of these values:
  expected_values
    Which is: { 1, 4, 6, 7, 8, 2, 3, 5 }
  values
    Which is: { 2147483647, 4, 6, 7, 8, 6, 7, 8 }
[  FAILED  ] cpuKernelTest.raidx_sort_heap_overflow (0 ms)
```

Will fix pytorch/pytorch#111189 once FBGEMM is updated to the correct version

Pull Request resolved: #2075

Reviewed By: kit1980, jianyuh

Differential Revision: D50256504

Pulled By: malfet

fbshipit-source-id: f805607595e324999cea07dcacdee8317a008221
(cherry picked from commit 70c6e83)
malfet added a commit that referenced this issue Nov 2, 2023
By updating the fbgemm submodule to the `pytorch/release/2.1` branch, which contains the following two cherry-picks:
pytorch/FBGEMM@30f09a2 (formatting)
and pytorch/FBGEMM@70c6e83 (actual fix for the regression)
Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Fixes #111189

Cherry-pick of #111672 into the release/2.1 branch, but with a more targeted fbgemm update

(cherry picked from commit 03da069)
malfet added a commit that referenced this issue Nov 3, 2023
By updating the fbgemm submodule to the `pytorch/release/2.1` branch, which contains the following two cherry-picks:
- pytorch/FBGEMM@30f09a2 (formatting)
- pytorch/FBGEMM@70c6e83 (actual fix for the regression)

Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Fixes #111189

Cherry-pick of #111672 into the release/2.1 branch, but with a more targeted fbgemm update

(cherry picked from commit 03da069)
xuhancn pushed a commit to xuhancn/pytorch that referenced this issue Nov 7, 2023
By updating the fbgemm submodule.
Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Also, update call-sites of `fbgemm::GenerateEmbeddingSpMDM` to pass `isbf16` twice, to match API changes introduced in pytorch/FBGEMM#1851

Fixes pytorch#111189 and pytorch#111710

Pull Request resolved: pytorch#111672
Approved by: https://github.com/Skylion007
Skylion007 pushed a commit to Skylion007/pytorch that referenced this issue Nov 14, 2023
By updating the fbgemm submodule.
Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Also, update call-sites of `fbgemm::GenerateEmbeddingSpMDM` to pass `isbf16` twice, to match API changes introduced in pytorch/FBGEMM#1851

Fixes pytorch#111189 and pytorch#111710

Pull Request resolved: pytorch#111672
Approved by: https://github.com/Skylion007
Halmoni100 pushed a commit to Halmoni100/pytorch that referenced this issue Nov 25, 2023
By updating the fbgemm submodule to the `pytorch/release/2.1` branch, which contains the following two cherry-picks:
- pytorch/FBGEMM@30f09a2 (formatting)
- pytorch/FBGEMM@70c6e83 (actual fix for the regression)

Add a regression test for it (though it can probably be limited to CPU only, as the reproducer only works if num_threads is 1)

Fixes pytorch#111189

Cherry-pick of pytorch#111672 into the release/2.1 branch, but with a more targeted fbgemm update

(cherry picked from commit 03da069)