add batch impl. for inplace index_add operation #112276

Closed
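For orientation, here is a small, illustrative libtorch example (the shapes, values, and explicit loop are assumptions for illustration, not code from this PR) of the semantics the new batch rule has to provide: an in-place index_add applied independently to every slice along a batch dimension, which is what vmapping index_add_ means.

#include <torch/torch.h>
#include <iostream>

int main() {
  auto self = torch::zeros({4, 5});                  // batch of 4 rows
  auto index = torch::tensor({0, 2}, torch::kLong);  // positions to update
  auto source = torch::ones({4, 2});                 // one update per row

  // Per-sample semantics a vmap batch rule for index_add_ must reproduce:
  // each batch slice of `self` is updated in place, independently.
  for (int64_t b = 0; b < self.size(0); ++b) {
    self.select(0, b).index_add_(/*dim=*/0, index, source.select(0, b));
  }
  std::cout << self << "\n";  // every row now has 1s at columns 0 and 2
  return 0;
}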


@guilhermeleobas (Collaborator) commented Oct 27, 2023

@pytorch-bot bot commented Oct 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112276

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit e9bd676 with merge base c120e56:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guilhermeleobas added a commit that referenced this pull request Oct 27, 2023
ghstack-source-id: a845ffe9edc81ea23ffc7ac99fc1dd2527b6e7ec
Pull Request resolved: #112276
@guilhermeleobas (Collaborator, Author):

ref: #105539

guilhermeleobas added a commit that referenced this pull request Oct 27, 2023
ghstack-source-id: 5c48a2db610752992519c4f1bd97d571bf0dffb1
Pull Request resolved: #112276
@guilhermeleobas guilhermeleobas self-assigned this Oct 28, 2023
@guilhermeleobas guilhermeleobas marked this pull request as ready for review October 28, 2023 18:06
@guilhermeleobas guilhermeleobas added module: functorch Pertaining to torch.func or pytorch/functorch release notes: functorch release notes category; Pertaining to torch.func or pytorch/functorch labels Oct 30, 2023
@kshitij12345 (Collaborator) left a comment


Looks good, but I have a couple of questions. Thank you!

@@ -956,6 +955,7 @@ def vjp_of_vjp(*args_and_cotangents):
{torch.float32: tol(atol=5e-04, rtol=1e-04)}, device_type="cuda"),
))
@skipOps('TestOperators', 'test_vmapvjp', vmapvjp_fail.union({
xfail('as_strided'),
Collaborator:

We are just moving the above xfail here? Is that correct?

Collaborator (Author):

Yes, that's correct. Before, the xfail was in vmapvjp_fail, which is used in other places in this file. With the addition of index_add, there is a batch rule for as_strided, but it fails in some specific cases.

@@ -977,11 +991,40 @@ std::tuple<Tensor,optional<int64_t>> index_add_batch_rule(
other.select(*other_bdim, i) : other;
const auto& index_slice = index_bdim.has_value() ?
index.select(*index_bdim, i) : index;
results.push_back(at::index_add(self_slice, dim, index_slice, other_slice, alpha));

if (inplace) {
Collaborator:

Is there a simple way to not initialize results in the inplace case (as it is not used then)?

Or maybe we should guard the call to reserve with if (!inplace) { results.reserve(batch_size); }

Wdyt?

Collaborator (Author):

Done! I've guarded the initialization as you suggested.
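For context, a minimal sketch of the pattern being discussed (illustrative only: the function name is hypothetical, it skips the dim/bdim bookkeeping the real functorch rule performs, and it assumes the index tensor is not batched). results is only needed by the out-of-place path, so its reserve() is guarded by !inplace, while the in-place path calls index_add_ directly on each slice of self.

#include <ATen/ATen.h>
#include <c10/util/Optional.h>
#include <vector>

// Hypothetical sketch of the guarded-reserve pattern; not the actual batch rule.
at::Tensor index_add_batched_sketch(
    at::Tensor self, int64_t self_bdim, int64_t dim,
    const at::Tensor& index,
    const at::Tensor& other, c10::optional<int64_t> other_bdim,
    const at::Scalar& alpha, bool inplace) {
  const auto batch_size = self.size(self_bdim);
  std::vector<at::Tensor> results;
  if (!inplace) {
    results.reserve(batch_size);  // only allocate when the results are actually used
  }
  for (int64_t i = 0; i < batch_size; i++) {
    auto self_slice = self.select(self_bdim, i);
    const auto& other_slice = other_bdim.has_value()
        ? other.select(*other_bdim, i) : other;
    if (inplace) {
      self_slice.index_add_(dim, index, other_slice, alpha);  // writes through the view into self
    } else {
      results.push_back(at::index_add(self_slice, dim, index, other_slice, alpha));
    }
  }
  return inplace ? self : at::stack(results);
}

Judging from the diff above, the in-place and out-of-place variants appear to share one loop, which is why guarding the single reserve() call was the simplest fix.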

cc zou3519 Chillee samdow kshitij12345 janeyx99

[ghstack-poisoned]
guilhermeleobas added a commit that referenced this pull request Oct 30, 2023
ghstack-source-id: 426b5d4e54220922af1d693c77eefa2d3a450336
Pull Request resolved: #112276
@kshitij12345 (Collaborator) left a comment


LGTM, thanks @guilhermeleobas

@guilhermeleobas (Collaborator, Author):

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 30, 2023
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@PaliC (Contributor) commented Oct 31, 2023

@pytorchbot revert -m "breaking linux binary builds" -c "nosignal"

You can find the breakage here: https://hud.pytorch.org/pytorch/pytorch/commit/e3c8c63deaf594699d827e84869a3ecd7e2ab494

@pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator):

@guilhermeleobas your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 31, 2023
This reverts commit e3c8c63.

Reverted #112276 on behalf of https://github.com/PaliC due to breaking linux binary builds ([comment](#112276 (comment)))
@guilhermeleobas (Collaborator, Author):

Hi @PaliC, the failure doesn't seem to be related to the changes I made in this PR:

...
2023-10-31T04:53:40.4122964Z ++ echo 'Copied /usr/lib64/libgomp.so.1 to torch/lib/libgomp-a34b3233.so.1'
2023-10-31T04:53:40.4123779Z Copied /usr/lib64/libgomp.so.1 to torch/lib/libgomp-a34b3233.so.1
2023-10-31T04:53:40.4124389Z ++ for filepath in '"${DEPS_LIST[@]}"'
2023-10-31T04:53:40.4124860Z +++ basename /usr/local/cuda/lib64/libcusparseLt.so.0
2023-10-31T04:53:40.4133181Z ++ filename=libcusparseLt.so.0
2023-10-31T04:53:40.4133594Z ++ destpath=torch/lib/libcusparseLt.so.0
2023-10-31T04:53:40.4134414Z ++ [[ /usr/local/cuda/lib64/libcusparseLt.so.0 != \t\o\r\c\h\/\l\i\b\/\l\i\b\c\u\s\p\a\r\s\e\L\t\.\s\o\.\0 ]]
2023-10-31T04:53:40.4135402Z ++ cp /usr/local/cuda/lib64/libcusparseLt.so.0 torch/lib/libcusparseLt.so.0
2023-10-31T04:53:40.4143847Z cp: cannot stat ‘/usr/local/cuda/lib64/libcusparseLt.so.0’: No such file or directory

Can I just rebase and merge it again?

@malfet (Contributor) commented Oct 31, 2023

@pytorchbot merge -f "Revert was in error, sorry about that"

@malfet (Contributor) commented Oct 31, 2023

> @pytorchbot revert -m "breaking linux binary builds" -c "nosignal"
>
> You can find the breakage here: https://hud.pytorch.org/pytorch/pytorch/commit/e3c8c63deaf594699d827e84869a3ecd7e2ab494

I think the failure is unrelated, as the revert clearly did not help; the cause of the regression is pytorch/builder@7790132, which modified docker images for main even though it should have targeted only 2.1.

Also, nosignal is the wrong classification here, as those builds were triggered at the time of merge; see https://github.com/pytorch/pytorch/actions/runs/6698655783/job/18201217605

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@kit1980 kit1980 removed the Reverted label Nov 1, 2023
@facebook-github-bot facebook-github-bot deleted the gh/guilhermeleobas/9/head branch November 3, 2023 14:27
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
…2276)"

This reverts commit e3c8c63.

Reverted pytorch#112276 on behalf of https://github.com/PaliC due to breaking linux binary builds ([comment](pytorch#112276 (comment)))
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…2276)"

This reverts commit e3c8c63.

Reverted pytorch#112276 on behalf of https://github.com/PaliC due to breaking linux binary builds ([comment](pytorch#112276 (comment)))
andreigh pushed a commit to andreigh/pytorch that referenced this pull request Nov 19, 2023
andreigh pushed a commit to andreigh/pytorch that referenced this pull request Nov 19, 2023
…2276)"

This reverts commit e3c8c63.

Reverted pytorch#112276 on behalf of https://github.com/PaliC due to breaking linux binary builds ([comment](pytorch#112276 (comment)))
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: functorch (Pertaining to torch.func or pytorch/functorch), open source, release notes: functorch (release notes category; Pertaining to torch.func or pytorch/functorch)
8 participants