
Conversation

lixin-sxty (Contributor)

The implementation of the `softmax_backward_data` operator for the CPU backend produces incorrect results when the `output` argument is non-contiguous.

Here is a test case that demonstrates this issue:

```python
import torch

torch.manual_seed(0)
op = torch.ops.aten._softmax_backward_data
grad_output = torch.ones(3, 3, 3)
temp = torch.randn(3, 10, 3)
out = temp[:, :3, :]       # non-contiguous view of temp
out = out.contiguous()     # comment this line out to reproduce the bug
print(out.is_contiguous())
grad_input = op(grad_output, out, 1, torch.float32)
print(grad_input)
```

In this test case, `grad_input` yields incorrect results if the line `out = out.contiguous()` is commented out. With this fix, `grad_input` produces the same result regardless of whether `output` is contiguous.
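For reference, softmax's backward pass can be written directly in terms of the saved output, which gives an independent check of the kernel. Below is a minimal sketch of such a check; `reference_softmax_backward` is an illustrative helper, not a PyTorch API:

```python
import torch

def reference_softmax_backward(grad_output, output, dim):
    # For y = softmax(x): dL/dx = y * (dL/dy - sum_j(dL/dy_j * y_j))
    return output * (grad_output - (grad_output * output).sum(dim, keepdim=True))

torch.manual_seed(0)
grad_output = torch.ones(3, 3, 3)
output = torch.randn(3, 10, 3)[:, :3, :]   # non-contiguous view
assert not output.is_contiguous()

expected = reference_softmax_backward(grad_output, output, dim=1)
actual = torch.ops.aten._softmax_backward_data(grad_output, output, 1, torch.float32)
print(torch.allclose(actual, expected))    # True once the layout bug is fixed
```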

pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139740

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 404a078 with merge base 12d225d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Nov 5, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

lixin-sxty (Contributor Author)

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label on Nov 5, 2024
lixin-sxty (Contributor Author)

@pytorchbot rebase

pytorch-bot bot commented Nov 5, 2024

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

lixin-sxty (Contributor Author)

@pytorchbot drci

lixin-sxty (Contributor Author)

Cc @scw @svenstaro @JackDanger: this pull request is hitting a check-labels failure, and the error message is unclear. Could you assist me? Thank you.

svenstaro (Contributor)

@lixin-sxty why ping me? I have nothing to do with this project.

lixin-sxty (Contributor Author)

> @lixin-sxty why ping me? I have nothing to do with this project.

Sorry for the confusion; when I typed @, your ID was auto-completed immediately, which made me think you were a contributor to this project.

lixin-sxty changed the title from "Fix softmax_backward op error when offset is uncontinguous" to "Fix softmax_backward_data cpu implementation error when offset is uncontinguous" on Nov 6, 2024

Commit "…ontinguous" (Signed-off-by: xin.li <xin.li@enflame-tech.com>)
bdhirsh (Contributor)

bdhirsh commented Nov 8, 2024

@lixin-sxty can you add a test, maybe in test/test_torch.py? Also - I see some noncontiguous tests for softmax_backward_data are skipped here, can you try removing that skip() and see if your fix causes those tests to pass? https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py#L14538

bdhirsh self-requested a review on November 8, 2024
bdhirsh added the triaged label on Nov 8, 2024
lixin-sxty changed the title from "Fix softmax_backward_data cpu implementation error when offset is uncontinguous" to "Fix softmax_backward_data cpu implementation error when argument output is uncontinguous" on Nov 9, 2024
lixin-sxty (Contributor Author)

> @lixin-sxty can you add a test, maybe in test/test_torch.py? Also - I see some noncontiguous tests for softmax_backward_data are skipped here, can you try removing that skip() and see if your fix causes those tests to pass? https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py#L14538

I will check the tests for softmax_backward_data in common_methods_invocations.py and add a test case if necessary.
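A regression test along those lines could compare the operator's result on a non-contiguous `output` against the result on its contiguous copy. The following is a hypothetical sketch in the style of a test/test_torch.py method; the test name and placement are illustrative:

```python
def test_softmax_backward_data_noncontiguous(self):
    op = torch.ops.aten._softmax_backward_data
    grad_output = torch.ones(3, 3, 3)
    output = torch.randn(3, 10, 3)[:, :3, :]  # non-contiguous view
    self.assertFalse(output.is_contiguous())
    # The result must not depend on the memory layout of `output`.
    expected = op(grad_output, output.contiguous(), 1, torch.float32)
    actual = op(grad_output, output, 1, torch.float32)
    self.assertEqual(actual, expected)
```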

lixin-sxty requested a review from mruberry as a code owner on November 11, 2024
lixin-sxty changed the title from "Fix softmax_backward_data cpu implementation error when argument output is uncontinguous" to "Fix softmax_backward_data cpu implementation error when argument output is noncontinguous" on Nov 11, 2024
lixin-sxty (Contributor Author)

According to https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py#L14538, test_noncontiguous_samples was failing on CPU. That case now passes with the fix, so I have removed the skip. @bdhirsh
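For context, such skips are expressed as `DecorateInfo` entries inside the op's `OpInfo` in common_methods_invocations.py. The removed entry would have looked roughly like the following; this is an illustrative sketch, not the verbatim entry:

```python
import unittest

from torch.testing._internal.common_methods_invocations import DecorateInfo

# Roughly the shape of the skip entry removed by this PR (illustrative):
# it disabled test_noncontiguous_samples for this op on CPU while the
# kernel still read `output` as if it were contiguous.
skips = (
    DecorateInfo(unittest.skip("Skipped!"), 'TestCommon',
                 'test_noncontiguous_samples', device_type='cpu'),
)
```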

lixin-sxty (Contributor Author)

@bdhirsh @mruberry Could you please review this change? Thank you!

zou3519 added the ciflow/trunk label on Nov 15, 2024
bdhirsh (Contributor)

bdhirsh commented Nov 15, 2024

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request on Dec 5, 2024:
…ut is noncontinguous (pytorch#139740)

Pull Request resolved: pytorch#139740
Approved by: https://github.com/zou3519