
Add 64bit indexing support for softmax #52713

Closed
wants to merge 10 commits into from

Conversation

@zasdfgbnm (Collaborator) commented Feb 24, 2021

Fixes #52715 and #52716.

The computation is split across the batch dimension.
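
For intuition, here is a minimal Python-level sketch of the batch-splitting idea. The actual change lives in the CUDA dispatch in aten/src/ATen/native/cuda/SoftMax.cu; the function name, the chunk-size logic, and the assumption that each chunk must stay under 2**31 elements are illustrative, not taken from this PR.

import torch

INT32_MAX = 2**31 - 1

def softmax_batch_chunked(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Illustrative only: split the batch (first) dimension into chunks small
    # enough that each softmax launch sees fewer than 2**31 elements.
    if x.dim() < 2 or x.numel() <= INT32_MAX:
        return torch.softmax(x, dim)
    assert dim % x.dim() != 0, "splitting assumes softmax is not over the batch dim"
    inner = x.numel() // x.shape[0]              # elements per batch row
    rows_per_chunk = max(1, INT32_MAX // inner)  # rows that fit the 32-bit range
    pieces = [torch.softmax(c, dim) for c in x.split(rows_per_chunk, dim=0)]
    return torch.cat(pieces, dim=0)

In the PR itself the split happens inside the kernel launch logic rather than in Python, so callers keep using torch.softmax as usual.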

@facebook-github-bot (Contributor) commented Feb 24, 2021

💊 CI failures summary and remediations

As of commit 48f0eba (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Feb 24 19:00:20 sccache: error: couldn't connect to server
Feb 24 19:00:20 +++ eval 'extract_trap_cmd '
Feb 24 19:00:20 ++++ extract_trap_cmd
Feb 24 19:00:20 ++++ printf '%s\n' ''
Feb 24 19:00:20 +++ printf '%s\n' cleanup
Feb 24 19:00:20 ++ trap -- '
Feb 24 19:00:20 cleanup' EXIT
Feb 24 19:00:20 ++ [[ pytorch-xla-linux-bionic-py3.6-clang9-test != *pytorch-win-* ]]
Feb 24 19:00:20 ++ which sccache
Feb 24 19:00:20 ++ sccache --stop-server
Feb 24 19:00:20 Stopping sccache server...
Feb 24 19:00:20 sccache: error: couldn't connect to server
Feb 24 19:00:20 sccache: caused by: Connection refused (os error 111)
Feb 24 19:00:20 ++ true
Feb 24 19:00:20 ++ rm /var/lib/jenkins/sccache_error.log
Feb 24 19:00:20 ++ [[ pytorch-xla-linux-bionic-py3.6-clang9-test == *rocm* ]]
Feb 24 19:00:20 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Feb 24 19:00:20 ++ SCCACHE_IDLE_TIMEOUT=1200
Feb 24 19:00:20 ++ RUST_LOG=sccache::server=error
Feb 24 19:00:20 ++ sccache --start-server
Feb 24 19:00:20 sccache: Starting the server...
Feb 24 19:00:20 ++ sccache --zero-stats

1 job timed out:

  • pytorch_xla_linux_bionic_py3_6_clang9_test

This comment was automatically generated by Dr. CI.

@zasdfgbnm (Collaborator, Author)

cc: @ptrblck

@zasdfgbnm added the "module: cuda" label (Related to torch.cuda, and CUDA support in general) on Feb 24, 2021
@zasdfgbnm linked an issue on Feb 24, 2021 that may be closed by this pull request

@ngimel (Collaborator) left a comment

Can you please add tests?

aten/src/ATen/native/cuda/SoftMax.cu (outdated; review comment resolved)
@zasdfgbnm (Collaborator, Author)

@ngimel I have fixed the bug you caught and added a test. The test passes on my 3090.

@@ -11975,6 +11975,25 @@ def test_softmax_results(self, device, dtype):
self.assertEqual(grad_input, ref_grad_input)
self.assertEqual(input.grad, ref_input.grad)

@onlyCUDA
@dtypesIfCUDA(torch.float, torch.half)
@largeTensorTest("20GB")
@zasdfgbnm (Collaborator, Author):

On my 3090, the half test takes ~18 GB of memory and the float test takes ~19.8 GB.
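
The diff hunk above shows only the decorators. Purely for illustration, a large-tensor test could look roughly like the sketch below; the test name, tensor sizes, and tolerance are assumptions, not the body actually added by this PR.

@onlyCUDA
@dtypesIfCUDA(torch.float, torch.half)
@largeTensorTest("20GB")
def test_softmax_64bit_indexing(self, device, dtype):
    # Hypothetical sketch: more than 2**31 total elements forces the
    # batch-split / 64-bit indexing path this PR adds.
    batch, features = 1100000000, 2   # ~2.2e9 elements total
    x = torch.randn(batch, features, device=device, dtype=dtype, requires_grad=True)
    y = torch.softmax(x, dim=-1)
    y.backward(y)                     # also exercise the gradient kernels
    # Spot-check one row of the huge launch against a small reference launch.
    ref = torch.softmax(x[0].detach().double(), dim=-1)
    self.assertEqual(y[0].double(), ref, atol=1e-3, rtol=0)

With the sizes assumed here, each half-precision buffer (input, output, gradient) is about 4.4 GB, which is why a memory budget on the order of the largeTensorTest value above is needed.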

Collaborator:

Will these tests run in your CI?

@zasdfgbnm (Collaborator, Author):

Our CI has A100 and 3090, so yes!

@facebook-github-bot (Contributor) left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet added this to the 1.8.1 milestone on Feb 24, 2021
@facebook-github-bot (Contributor)

@ngimel merged this pull request in a6b7da7.

@zasdfgbnm deleted the ima-softmax branch on February 25, 2021 at 14:04
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
Summary:
fixes pytorch#52715 pytorch#52716

split across batch dimension

Pull Request resolved: pytorch#52713

Reviewed By: ailzhang

Differential Revision: D26640033

Pulled By: ngimel

fbshipit-source-id: f169cb0d6abc1cfbddf658d9775759a7d56f5c12
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Labels: cla signed, Merged, module: cuda (Related to torch.cuda, and CUDA support in general), open source
Development

Successfully merging this pull request may close these issues.

  • CUDA error: invalid configuration argument for softmax
  • CUDA Illegal memory access for softmax
5 participants