[ROCm] Fix largeIndexBlockSize #139087

pragupta · 2024-10-28T17:42:38Z

On ROCm, hipification converts std::min to ::min, but ::min does not return the correct result here. This affects the index_add_ operation on large tensors: instead of the maximum supported block size (128), the larger value is picked as the block size, which leads to the GPU accessing memory out of bounds.

Until ::min is fixed, we can use the < operator for the comparison instead of relying on ::min.
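
The clamping described above can be sketched in plain Python. This is only an illustration, not the kernel code from this PR: the names `clamp_block_size`, `to_int32`, and `MAX_BLOCK_SIZE` are hypothetical, and the 32-bit narrowing shown is an assumed failure mode, since the description does not say why ::min returns the wrong value. The only facts taken from the PR are the cap of 128 and the tensor shapes of the reproducer.

```python
MAX_BLOCK_SIZE = 128  # max supported block size, as stated in the PR description


def clamp_block_size(total_elements: int, max_block: int = MAX_BLOCK_SIZE) -> int:
    """Pick the block size with a plain `<` comparison instead of a min() call."""
    return total_elements if total_elements < max_block else max_block


def to_int32(value: int) -> int:
    """ASSUMPTION for illustration only: wrap a value into a signed 32-bit int,
    as would happen if a 64-bit size were narrowed before the comparison."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# Element count of the source tensor in the reproducer: 32 * 16384 rows of 6144.
source_elements = 32 * 16384 * 6144

# The count does not fit in a signed 32-bit integer.
print(source_elements > 2**31 - 1)

# The explicit comparison yields the cap.
print(clamp_block_size(source_elements))

# Under the assumed narrowing, a min() over the wrapped value is no longer 128.
print(min(to_int32(source_elements), MAX_BLOCK_SIZE))
```

Whatever the actual cause inside ::min, the comparison form depends only on the operands keeping their full width, which is why it serves as a workaround.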

Example Code w/ failure:

import torch

D = 6144
hidden_states = torch.zeros([16384, 6144],           device="cuda:0", dtype=torch.bfloat16)
index         = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output        = torch.empty([1, 32, 16384, 6144],    device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))

Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
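
For readers unfamiliar with index_add_, the call in the reproducer accumulates each row of the source into the row of the target selected by the matching index entry. The following is a pure-Python sketch of that behavior along dim 0 at toy sizes. It is not the PyTorch kernel, and the helper name `index_add_dim0` is hypothetical.

```python
from collections import Counter


def index_add_dim0(target, index, source):
    """In place: target[index[i]] += source[i] for every row i of source."""
    if len(index) != len(source):
        raise ValueError("index and source must have the same number of rows")
    for i, row in enumerate(index):
        for col, value in enumerate(source[i]):
            target[row][col] += value
    return target


rows, cols = 4, 3
index = [0, 2, 2, 3, 0, 2]                  # target row for each source row
target = [[0] * cols for _ in range(rows)]  # plays the role of hidden_states
source = [[1] * cols for _ in index]        # all ones, one row per index entry

index_add_dim0(target, index, source)

# With an all-ones source, each target row holds the number of times
# its row number appears in index.
counts = Counter(index)
expected = [[counts[r]] * cols for r in range(rows)]
print(target == expected)
```

Because repeated index entries accumulate into the same row, an all-ones source makes the expected result easy to state independently of the order in which rows are added.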

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot · 2024-10-28T17:42:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139087

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

✅ No Failures

As of commit bd6b78d with merge base 6bdbc86:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pragupta · 2024-11-04T17:46:12Z

@pytorchbot rebase

pytorchmergebot · 2024-11-04T17:47:40Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-04T17:47:46Z

Successfully rebased pg-msft-fix onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout pg-msft-fix && git pull --rebase)

eqy

If there is a reproducer snippet, add a test case please

pragupta · 2024-11-04T19:46:55Z

> If there is a reproducer snippet, add a test case please

@eqy -- I've added the reproducer code and the error generated with it in the PR description.

pragupta · 2024-11-05T03:53:36Z

> If there is a reproducer snippet, add a test case please

@eqy -- I misunderstood earlier. I have added a UT for this edge case. Thanks.

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. In the meantime, use the < operator to compare.

pragupta · 2024-11-08T16:27:48Z

@eqy -- if the UT looks good now, can you please provide your review on this one?

pruthvistony · 2024-11-19T04:10:13Z

@pytorchbot merge

pytorchmergebot · 2024-11-19T04:12:02Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. This impacts index_add_ operation on a large tensor, we end up picking the large values instead of max supported block size (128). This leads to GPU accessing memory out of bounds.

While we wait for ::min to be fixed, we can use < operator to compare instead of relying on ::min.

Example Code w/ failure:
```
D=6144
hidden_states = torch.zeros([16384, 6144], device="cuda:0", dtype=torch.bfloat16)
index = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output = torch.empty([1, 32, 16384, 6144], device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
```
```
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
```

Pull Request resolved: pytorch#139087
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony

pytorch-bot bot added the "module: rocm" and "release notes: cuda" labels Oct 28, 2024

pytorchbot added the open source label Oct 28, 2024

jeffdaily approved these changes Oct 28, 2024

jithunnair-amd added the "rocm" and "ciflow/rocm" labels Oct 29, 2024

pruthvistony approved these changes Oct 30, 2024

pruthvistony marked this pull request as ready for review October 30, 2024 01:34

pruthvistony requested review from eqy and syed-ahmed as code owners October 30, 2024 01:34

pytorchmergebot force-pushed the pg-msft-fix branch from c0266db to f44faf5 Compare November 4, 2024 17:47

eqy reviewed Nov 4, 2024

pragupta force-pushed the pg-msft-fix branch from f44faf5 to 2f0b828 Compare November 5, 2024 03:52

pragupta and others added 3 commits November 6, 2024 17:10

[ROCm] Fix largeIndexBlockSize

573690a

On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. In the meantime, use the < operator to compare.

Add UT for large input sizes to index_add_

0159a6e

Update UT to use torch.ones for deterministic results

f67d674

pragupta force-pushed the pg-msft-fix branch from 8ee6c06 to f67d674 Compare November 6, 2024 17:14

pruthvistony added the "topic: not user facing" label and removed the "release notes: cuda" label Nov 6, 2024

skip UT for cuda

bd6b78d

pragupta force-pushed the pg-msft-fix branch from b7eb581 to bd6b78d Compare November 7, 2024 20:50

pytorch-bot bot added the "ciflow/trunk" label Nov 19, 2024

pytorchmergebot added the merging label Nov 19, 2024

pytorchmergebot added the Merged label Nov 19, 2024

pytorchmergebot closed this in 7156d08 Nov 19, 2024

pytorchmergebot removed the merging label Nov 19, 2024
