Add vmap support for torch.index_fill #91364
Conversation
❌ 2 failures as of commit c05b326.
Force-pushed from c871aa8 to b0cbf0f.
Thanks for the Pull Request, @qqaatw. torch.index_fill is a tricky one to handle.
Your code looks correct to me. For some of the cases, vmap is supposed to "eliminate" the for-loop, but it looks like we're still doing a for-loop here. It should be possible to avoid the for-loop by modifying the index and then doing a single call to index_fill (see my inline comments); please let me know your thoughts.
```cpp
std::tuple<Tensor,optional<int64_t>> index_fill__int_scalar_batch_rule(
    Tensor & self, optional<int64_t> self_bdim,
    int dim,
    const Tensor & index, optional<int64_t> index_bdim,
    const Scalar & value) {
  return index_fill_int_scalar_batch_rule_impl(self, self_bdim, dim, index, index_bdim, value, true);
}

std::tuple<Tensor,optional<int64_t>> index_fill__int_tensor_batch_rule(
    Tensor & self, optional<int64_t> self_bdim,
    int dim,
    const Tensor & index, optional<int64_t> index_bdim,
    const Tensor & value, optional<int64_t> value_bdim) {
  return index_fill_int_tensor_batch_rule_impl(self, self_bdim, dim, index, index_bdim, value, value_bdim, true);
}
```
The signature for inplace batching rules is actually

```cpp
void index_fill__int_scalar_batch_rule(...) {
  ...
}
```

Since the operation is in-place, we just return `self` and `self_bdim`. This happens somewhere in the codegen for vmap (I can link it if you're interested). The codegen ends up ignoring the return value of this function. To make it clearer, we should change the signature to return void.
This makes sense. I'm indeed interested in the codegen part, can you please point out the link?
I should probably toss this into a guide somewhere, but:
- here is the codegen
- it generates a `{operator}_generated_plumbing` function for each PyTorch ATen operator. You can see the output in the local build/aten/src/ATen/VmapGeneratedPlumbing.h file after you build PyTorch (example)
- the VMAP_SUPPORT(operator, batch_rule) macro is just `{operator}_generated_plumbing<decltype(batch_rule), batch_rule>`
- you'll notice that the plumbing doesn't actually use the return value of `batch_rule`, so it can be whatever. To avoid confusion, we should return nothing from index_fill__int_tensor_batch_rule
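To illustrate that last point, here is a hypothetical Python sketch (not the actual codegen output; the real plumbing is C++ emitted by the vmap codegen, and all names here are made up): the plumbing calls the batch rule purely for its side effects on `self` and discards whatever the batch rule returns.

```python
# Hypothetical sketch of the generated in-place plumbing. The batch rule
# mutates self_ in place; its return value, if any, is discarded, and the
# plumbing hands back the original tensor and batch dim unchanged.
def inplace_plumbing(batch_rule, self_, self_bdim, *args):
    batch_rule(self_, self_bdim, *args)  # return value is ignored
    return self_, self_bdim

# A toy in-place "batch rule" that mutates its input and returns a value
# the plumbing will never use.
def toy_batch_rule(self_, self_bdim, value):
    self_.append(value)
    return "this return value is never used"

out, out_bdim = inplace_plumbing(toy_batch_rule, [1, 2], 0, 3)
```

This is why the batch rule's declared return type can be `void`: nothing downstream observes it.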
Thanks for the info! Based on what I found, since the in-place plumbing doesn't use any return value from the batch_rule, it seems the batch dims are expected to be unchanged after the batch_rule runs. As a result, should we move the batch dims back before returning from the batch_rule?
From the implementation of `_index_put_impl__batch_rule`, for example, it seems the batch dims are not moved back.
When we move the batch dims to the front, usually we use at::movedim (or equivalent). This produces a new Tensor that is a view of the original one. In this case, there's no need to move the batch dims back -- the original tensor still has the batch dim in the correct position.
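A NumPy analogy of the point above (an assumption: `np.moveaxis` plays the role of at::movedim here): the moved array is a view, so an in-place write through it lands in the original array, which still has its batch dim in the original position.

```python
import numpy as np

# x has its "batch dim" at position 1; move it to the front for the batch rule.
x = np.zeros((4, 3))
moved = np.moveaxis(x, 1, 0)   # shape (3, 4); a view, not a copy
moved[0, :] = 7.0              # in-place write through the view

# No need to move the batch dim back: x still has the batch dim at position 1,
# and the write is visible through it (x[:, 0] is now all 7).
```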
```cpp
for (const auto i : c10::irange(0, batch_size)) {
  const auto& self_slice = self_.select(0, i);
  const auto& index_slice = index_.select(0, i);
  const auto& value_slice = value_.select(0, i);
  self_slice.index_fill_(
      dim,
      index_slice,
      value_slice
  );
}
```
I haven't thought about this case as much, but can we do something similar (the arange + single index_fill_) in the out-of-place case here?
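For reference, the arange idea can be sketched in NumPy (shapes and values are hypothetical; the real batch rule operates on ATen tensors): offset each batch's indices into a flattened view, then do a single fill instead of a per-batch loop.

```python
import numpy as np

B, N = 3, 5
self_ = np.zeros((B, N))
index_ = np.array([[0, 2], [1, 3], [4, 0]])  # batched indices, batch dim 0
value = 7.0

# Offset batch i's indices by i * N so they address the flattened (B*N,) view.
flat_index = (index_ + np.arange(B)[:, None] * N).ravel()
self_.reshape(-1)[flat_index] = value        # one fill, no for-loop
```

Because `self_` is contiguous, `reshape(-1)` is a view, so the single fancy-indexed assignment updates the original batched array in place.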
I think we can. But since `value` can only be a 1-element tensor when fed into `index_fill_`, this path is used only when `value` is not batched.
Another thing is that the test framework doesn't include a test sample with a tensor `value`, i.e. they're all scalar `value`s, so currently we don't test this batch rule. I'll open a PR to add one later. (#91534)
pytorch/torch/testing/_internal/common_methods_invocations.py, lines 4195 to 4197 in 5030929:

```python
elif fill:
    # A weird number to catch errors
    args.append(make_arg((1,)).item())
```
Thank you for checking. My suggestion would be to get this PR merged first (with this for-loop, after index_fill_int_scalar_batch_rule_impl is in a good state), and then we can work out how to improve the for-loop in index_fill_int_tensor_batch_rule_impl in a follow-up (if you're interested).
ok, sounds reasonable.
Happy New Year @qqaatw. My apologies for the delayed reply, I was out for the past couple of days. I left some more comments in the PR, it's looking pretty good so far! index_fill is one of the more complicated batching rules; thank you for taking this on.
```cpp
// If self.dim() is 0 or 1, the batch dim is certainly 0, and we must apply batched indices to each row.
index_ = reshape_dim_into(0, 0, index_);
self_.unsqueeze_(-1).index_fill_(dim + 1, index_, value).squeeze_(-1);
```
Is this case just to improve performance? If so, I'd prefer to remove it so that all of our code goes through the above case: it makes the code a bit simpler to reason about if there's just a single case for out-of-place.
No, it's not just a performance improvement (or maybe I'm missing something). In the case of self.dim() == 0 or 1, the only dim is the batch dim, so we need to add another dim at the end in order to apply the batched index to it.
For example: if `self` is `[1, 2]` and a batched `index` (where the batch dim is 0) is `[[0], [0]]`, we should fill `value` into self[0] and self[1] instead of only self[0].
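A NumPy sketch of this example (an analogy; the actual code calls unsqueeze_/index_fill_/squeeze_ on ATen tensors): because each logical element is 0-d, a trailing dim is added so the flattened batched index can be applied to every row.

```python
import numpy as np

self_ = np.array([1.0, 2.0])               # batch dim 0; each logical tensor is 0-d
index_ = np.array([[0], [0]]).reshape(-1)  # batched index flattened, like reshape_dim_into
value = 9.0

expanded = self_[:, None]                  # unsqueeze(-1): shape (2, 1), a view
expanded[:, index_] = value                # fill along the new last dim for every row
# After the implicit squeeze, self_ holds the filled values for both batch elements.
```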
Hello, Happy New Year @zou3519. No worries about the delay, and thank you for the comments!
Force-pushed from 3e7a655 to 37c2bd4.
Hello @zou3519, the current state should be OK, thanks for your suggestions. I'm happy to open follow-up PRs.
Thank you! I'll take a look. I might not be very responsive this week, so my apologies in advance.
I read through the code, and I'm not completely sure that the out-of-place path where index is batched is correct in the edge cases. In particular, the OpInfo testing appears to be missing SampleInputs for those edge cases.
I've suggested some edge cases that we should manually test via vmap_opinfo_test. If your code passes the tests, then I am happy; if not, then my suggestion might be relevant.
```cpp
std::tuple<Tensor,optional<int64_t>> index_fill_int_scalar_batch_rule_impl(
    Tensor & self, optional<int64_t> self_bdim,
    int64_t dim,
    const Tensor & index, optional<int64_t> index_bdim,
    const Scalar & value,
    const bool inplace) {
  const auto self_logical_rank = rankWithoutBatchDim(self, self_bdim);
  const auto index_logical_rank = rankWithoutBatchDim(index, index_bdim);
  Tensor self_ = moveBatchDimToFront(self, self_bdim);
  Tensor index_ = moveBatchDimToFront(index, index_bdim);
  dim = maybe_wrap_dim(dim, self_logical_rank);

  if (inplace && !self_bdim.has_value()) {
    vmapIncompatibleInplaceError("index_fill_");
  }
```
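As a side note, the two helpers used at the top of this function can be mirrored in Python roughly as follows (a hypothetical sketch; the real implementations live in ATen/functorch, and the exact error behavior here is an assumption):

```python
def rank_without_batch_dim(physical_rank, bdim):
    # The logical rank excludes the batch dim when one is present.
    return physical_rank - 1 if bdim is not None else physical_rank

def maybe_wrap_dim(dim, logical_rank):
    # Map a possibly-negative dim into [0, rank); 0-d tensors wrap as if rank 1.
    rank = max(logical_rank, 1)
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} out of range for rank {rank}")
    return dim % rank
```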
Some style guide nits: the indents should be at different levels, otherwise it gets a bit difficult to read. Ditto for the other functions you added:

```cpp
std::tuple<Tensor,optional<int64_t>> index_fill_int_scalar_batch_rule_impl(
    Tensor & self, optional<int64_t> self_bdim,
    int64_t dim,
    const Tensor & index, optional<int64_t> index_bdim,
    const Scalar & value,
    const bool inplace) {
  const auto self_logical_rank = rankWithoutBatchDim(self, self_bdim);
  const auto index_logical_rank = rankWithoutBatchDim(index, index_bdim);
  Tensor self_ = moveBatchDimToFront(self, self_bdim);
  ...
}
```
@zou3519 Done. Thank you for all your help on this PR!
@zou3519 @Chillee I have opened another PR, #91534, that adds a test sample for index_fill to the OpInfo. But there are too many failures, and it's hard to xfail all of them. The reason for those failures, I guess, is that most implementations are not correctly taking the tensor `value` into account.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: GraphQL query error. Details for Dev Infra team: raised by workflow job.
The failing tests seem to be unrelated to the changes this PR brings.
@zou3519 @Chillee @kshitij12345 Hello, can you help merge this? The test failures seem to be unrelated!
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Rebase failed due to Command …
Raised by https://github.com/pytorch/pytorch/actions/runs/4036185466
@qqaatw can you please rebase on latest? Thank you, and sorry it slipped through the notifications :)
@kshitij12345 merged. I didn't rebase as there are conflicts on multiple commits. Thanks!
@qqaatw will take a look at the comment on Monday :)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: linux-binary-libtorch-pre-cxx11 / libtorch-cpu-shared-with-deps-pre-cxx11-build / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "Unrelated failures in libtorch CI"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…fill batch rule" A follow-up PR for #91364 (comment) [ghstack-poisoned]
…rule (#99229) A follow-up PR for #91364 (comment) Pull Request resolved: #99229 Approved by: https://github.com/kshitij12345
Fixes #91177