
extend nonzero to int64 #125850

Open · wants to merge 16 commits into main
Conversation

@bhack (Contributor) commented May 9, 2024

Fixes #51871

@pytorch-bot bot added the release notes: cuda label (release notes category) May 9, 2024
pytorch-bot bot commented May 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125850

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 3 Unrelated Failures

As of commit a46722a with merge base 8f30f36:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@bhack (Contributor, Author) commented May 9, 2024

/cc @ezyang @eqy This is an exploratory black-box PR, as I don't have free CUDA resources right now and we don't have a quick way to set up the env to contribute sparse C++/CUDA PRs (see #125297).

But I made this editable on your side in case you have the env ready and a quick fix is enough.

@bhack bhack marked this pull request as ready for review May 9, 2024 16:52
@bhack bhack requested a review from eqy as a code owner May 9, 2024 16:52
linux-foundation-easycla bot commented May 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@eqy (Collaborator) left a comment

Should we add a (presumably large tensor) test for this?

@bhack (Contributor, Author) commented May 10, 2024

Should we add a (presumably large tensor) test for this?

Do we have an INT_MAX test already somewhere that we could expand?

@eqy (Collaborator) commented May 10, 2024

Unfortunately these are not really unified at the moment, but this should surface some examples: https://github.com/search?q=repo%3Apytorch%2Fpytorch+64bit+language%3APython+path%3A%2F%5Etest%5C%2F%2F&type=code

@bhack (Contributor, Author) commented May 10, 2024

As we don't have a specific CUDA test, do we want to find a workaround from Python?

Can you suggest one from grep -R torch.nonzero test/?
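For reference, a minimal sketch of what such a large-tensor test could look like, assuming a GPU with enough free memory and loosely following the style of the existing 64-bit tests under test/; the test name and structure are illustrative, not part of this PR:

import unittest

import torch

class TestNonzero64Bit(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "CUDA required")
    def test_nonzero_past_int_max(self):
        # More than INT_MAX (2**31 - 1) elements; a bool tensor of this
        # size needs a bit over 2 GB of GPU memory.
        n = 2**31 + 1
        x = torch.zeros(n, dtype=torch.bool, device="cuda")
        x[-1] = True
        out = torch.nonzero(x)
        self.assertEqual(out.numel(), 1)
        self.assertEqual(out.item(), n - 1)

if __name__ == "__main__":
    unittest.main()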

@bhack (Contributor, Author) commented May 10, 2024

I think I am going to close this, as cub::DispatchSelectIf will probably be slower than the cub::DeviceSelect::Flagged we are currently using.

We probably need to wait for upstream NVIDIA/cccl#1422.

What do you think?
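For context, this is roughly the two-pass cub::DeviceSelect::Flagged call pattern in question; note that num_items is a 32-bit int in the CUB releases shipped so far, which is the root of the INT_MAX limit. The helper name and element/flag types below are illustrative, not PR code:

#include <cub/cub.cuh>

cudaError_t select_flagged(const float* d_in, const char* d_flags,
                           float* d_out, int* d_num_selected,
                           int num_items,  // 32-bit: capped at INT_MAX
                           cudaStream_t stream) {
  void* d_temp = nullptr;
  size_t temp_bytes = 0;
  // First call only computes the required temporary storage size.
  cub::DeviceSelect::Flagged(d_temp, temp_bytes, d_in, d_flags, d_out,
                             d_num_selected, num_items, stream);
  cudaMallocAsync(&d_temp, temp_bytes, stream);
  // Second call performs the actual selection.
  cudaError_t err = cub::DeviceSelect::Flagged(
      d_temp, temp_bytes, d_in, d_flags, d_out,
      d_num_selected, num_items, stream);
  cudaFreeAsync(d_temp, stream);
  return err;
}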

@bhack (Contributor, Author) commented May 11, 2024

@ezyang Do you think we can open a new ticket to lower this with Triton where and sum?
https://github.com/pytorch/pytorch/blob/a174c536f8f32b41da9efa647e364196423468a5/torch/_inductor/lowering.py#L2187C20-L2187C35

Edit:
The ticket is at #126003
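To make the where/sum idea concrete, here is a hedged sketch of 1-D nonzero as a mask/cumsum/scatter stream compaction built from such primitives; this illustrates the decomposition only, it is not the Inductor lowering itself, and nonzero_1d is a hypothetical name:

import torch

def nonzero_1d(x: torch.Tensor) -> torch.Tensor:
    mask = x != 0
    # Compacted output position of each selected element.
    pos = torch.cumsum(mask.to(torch.int64), dim=0) - 1
    n = int(mask.sum())  # data-dependent output size (forces a host sync)
    idx = torch.arange(x.numel(), device=x.device)
    # Route unselected lanes to a scratch slot at the end, then drop it.
    target = torch.where(mask, pos, torch.full_like(pos, n))
    out = torch.zeros(n + 1, dtype=torch.int64, device=x.device)
    out.scatter_(0, target, idx)
    return out[:n].unsqueeze(1)  # match torch.nonzero's (n, 1) shape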

using flag_iterator_t = cub::NullType*;
using equality_op_t = cub::NullType;

return cub::DispatchSelectIf<
@bhack (Contributor, Author) commented on the diff May 11, 2024

Does this require cub/cccl 2.4.0?

@ezyang (Contributor) commented May 11, 2024

Yes, we need a big tensor test. @eqy's link is good for examples.

@bhack (Contributor, Author) commented May 11, 2024

Ok thanks, so I am going to close it, as I don't currently have the env or spare GPU compute to write a brand new test and recompile. At least unless we identify another Python test that already indirectly uses nonzero and that we could modify to take a big input.

@bhack (Contributor, Author) commented May 11, 2024

Just to check whether it could compile at least with the current CUB version: do you know what this CI failure is?

/usr/local/cuda/include/cub/agent/agent_select_if.cuh(264): error: function "at::native::<unnamed>::NonZeroOp<T>::operator() [with T=c10::complex<c10::Half>]" cannot be called with the given argument list
            argument types are: (int64_t)
            object type is: at::native::<unnamed>::NonZeroOp<c10::complex<c10::Half>>
                  selection_flags[ITEM] = select_op(items[ITEM]);
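For reference, the functor in aten/src/ATen/native/cuda/Nonzero.cu has roughly the following shape; the error above indicates that the CUB agent is handing the select op an int64_t item (the counting-iterator offset) rather than an element of type T:

template <typename T>
struct NonZeroOp {
  __host__ __device__ __forceinline__ bool operator()(const T& a) const {
    return a != T(0);  // callable with a T, not with an int64_t index
  }
};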

@bhack (Contributor, Author) commented May 13, 2024

I think we need the cub/cub/agent/agent_select_if.cuh changes introduced in NVIDIA/cccl#1379.

So this means that we need to wait for the next CUDA 12.4 update and also make the new path conditional.

@ezyang (Contributor) commented May 14, 2024

This PR seems fine. I agree you may need to preprocessor your way to victory. CI will say.
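A minimal sketch of that kind of guard, using CUB's standard CUB_VERSION macro (MAJOR*100000 + MINOR*100 + PATCH); the threshold and the two branches are assumptions about how the PR could be gated, not its actual code:

#include <cub/version.cuh>

#if CUB_VERSION >= 200400
// CUB >= 2.4.0 (first bundled with CUDA 12.5): take the 64-bit
// num_items path via cub::DispatchSelectIf.
#else
// Older CUB: keep the existing cub::DeviceSelect::Flagged path,
// which caps num_items at INT_MAX.
#endif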

@ezyang added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 14, 2024
@ezyang ezyang self-requested a review May 14, 2024 18:02
@bhack (Contributor, Author) commented May 23, 2024

@ezyang the new [CUDA 12.5](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) delivers CUB 2.4.0, so it could be enough for this workaround.

Labels: open source · release notes: cuda · triaged
Development

Successfully merging this pull request may close these issues.

Tensor.nonzero fails on GPU for tensors containing more than INT_MAX elements
4 participants