Skip to content

Conversation

pragupta
Copy link
Contributor

@pragupta pragupta commented Oct 28, 2024

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. This impacts index_add_ operation on a large tensor, we end up picking the large values instead of max supported block size (128). This leads to GPU accessing memory out of bounds.

While we wait for ::min to be fixed, we can use < operator to compare instead of relying on ::min.

Example Code w/ failure:

D=6144
hidden_states = torch.zeros([16384, 6144],           device="cuda:0", dtype=torch.bfloat16)
index         = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output        = torch.empty([1, 32, 16384, 6144],    device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch release notes: cuda release notes category labels Oct 28, 2024
Copy link

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139087

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit bd6b78d with merge base 6bdbc86 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jithunnair-amd jithunnair-amd added rocm This tag is for PRs from ROCm team ciflow/rocm Trigger "default" config CI on ROCm labels Oct 29, 2024
@pruthvistony pruthvistony marked this pull request as ready for review October 30, 2024 01:34
@pragupta
Copy link
Contributor Author

pragupta commented Nov 4, 2024

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased pg-msft-fix onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout pg-msft-fix && git pull --rebase)

Copy link
Collaborator

@eqy eqy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a reproducer snippet, add a test case please

@pragupta
Copy link
Contributor Author

pragupta commented Nov 4, 2024

If there is a reproducer snippet, add a test case please

@eqy -- I've added the reproducer code and the error generated with it in the PR description.

Example Code w/ failure:

D=6144
hidden_states = torch.zeros([16384, 6144],           device="cuda:0", dtype=torch.bfloat16)
index         = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output        = torch.empty([1, 32, 16384, 6144],    device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument

@pragupta
Copy link
Contributor Author

pragupta commented Nov 5, 2024

If there is a reproducer snippet, add a test case please

@eqy -- I misunderstood earlier. Added UT for this edge case. Thanks

pragupta and others added 3 commits November 6, 2024 17:10
On ROCm, hipification converts std::min to ::min, but ::min is not
returning the right result. In the meantime, use < operator to comapre.
@pruthvistony pruthvistony added topic: not user facing topic category and removed release notes: cuda release notes category labels Nov 6, 2024
@pragupta
Copy link
Contributor Author

pragupta commented Nov 8, 2024

@eq -- if the UT looks good now, can you please provide your review on this one?

@pruthvistony
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 19, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. This impacts index_add_ operation on a large tensor, we end up picking the large values instead of max supported block size (128). This leads to GPU accessing memory out of bounds.

While we wait for ::min to be fixed, we can use < operator to compare instead of relying on ::min.

Example Code w/ failure:
```
D=6144
hidden_states = torch.zeros([16384, 6144],           device="cuda:0", dtype=torch.bfloat16)
index         = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output        = torch.empty([1, 32, 16384, 6144],    device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
```

```
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
```

Pull Request resolved: pytorch#139087
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/rocm Trigger "default" config CI on ROCm ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch open source rocm This tag is for PRs from ROCm team topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants