[mta] Backward of unary foreach functions #89591
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89591
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending as of commit 01d6f9e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Regarding in-place ops, I locally confirmed that the version counter seems to be bumped appropriately.
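For reference, a minimal sketch of what such a local check might look like (assuming the private `_version` counter and the in-place foreach API):

```python
import torch

# Check that an in-place unary foreach op bumps each tensor's version counter.
ts = [torch.rand(4) for _ in range(3)]   # CPU tensors, so this takes the slow path
before = [t._version for t in ts]
torch._foreach_exp_(ts)                  # in-place unary foreach op
assert all(t._version == v + 1 for t, v in zip(ts, before))
```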
So the backward of foreach will always fall back to the slow version, and codegen will make sure that we properly error out if versions don't match? That looks fine, but I'll let @albanD take a look.
I think this is only because you're on CPU: the slow code calls the single-Tensor op, which properly bumps the version counter.
This is a lot of code duplication.
Should we just have the codegen take care of it, the same way we re-use the `foo` formula for `foo_` when no explicit formula for `foo_` is provided? We could codegen a slow `_foreach_foo` formula when it is not provided. WDYT?
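A toy sketch of that lookup fallback (illustrative only, not the actual torchgen logic):

```python
def lookup_derivative_formula(name: str, formulas: dict) -> str:
    """Return the derivative formula for `name`, falling back to the base
    single-Tensor op for `_foreach_foo` the way `foo_` falls back to `foo`."""
    if name in formulas:
        return formulas[name]
    if name.startswith("_foreach_"):
        base = name[len("_foreach_"):]
        if base in formulas:
            # Reuse the single-Tensor formula, applied per element of the
            # TensorList (the "slow" backward path).
            return formulas[base]
    raise KeyError(f"no derivative formula for {name}")

# e.g. with only the single-Tensor entry present:
formulas = {"exp": "grad * result"}  # the derivatives.yaml entry for exp
print(lookup_derivative_formula("_foreach_exp", formulas))  # -> "grad * result"
```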
torch/_torch_docs.py (Outdated)
```
Only CPU and CUDA are supported. Forward-mode AD is not supported.

Args:
    self (list of Tensors): Input list of Tensors. Each Tensor can have an arbitrary shape, dtype, device, and strides.
```
It works for different dtypes and devices? Why do we collect grads to zero out by device/dtype here, then:

pytorch/torch/optim/optimizer.py, line 394 in 274d3b2:

```python
per_device_and_dtype_grads = defaultdict(lambda: defaultdict(list))
```
To reduce the number of CUDA kernel launches as much as possible, I guess.
I'm not sure I follow. How do the two have a different number of kernel launches?
If we pass a list of tensors of different dtypes and/or devices to a foreach function, it'll call the corresponding ATen native function once per tensor, i.e. `len(self)` CUDA kernel launches. With the grouping, it could reduce the number to `len(per_device_and_dtype_grads)`, assuming all the tensors are CUDA tensors.
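A minimal sketch of that grouping idea (not the optimizer's actual code; `zero_grads_grouped` is an illustrative name):

```python
from collections import defaultdict

import torch

def zero_grads_grouped(params):
    # Bucket gradients by (device, dtype) so each bucket is homogeneous and
    # eligible for a single fused foreach kernel launch, instead of one
    # launch per tensor on the slow path.
    per_device_and_dtype = defaultdict(list)
    for p in params:
        if p.grad is not None:
            per_device_and_dtype[(p.grad.device, p.grad.dtype)].append(p.grad)
    for grads in per_device_and_dtype.values():
        torch._foreach_zero_(grads)  # one fused launch per homogeneous bucket
```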
@albanD `_foreach` ops support args on different devices and of different dtypes, but they'll fall back to the slow implementation. If, instead of just calling the foreach op, we pre-sort the args, we can call it with a subset of tensors that will go to the fast path. Really, this sorting should be done in the foreach op itself, though, and not in the optimizer.
OK!
Should we really advertise this, then? It will not do what you would expect these functions to do.
Force-pushed from 63ee0dc to 4c132ba (compare)
```
~~~~~~~~~~~~~~~~~~

.. warning::
    This API is in beta and subject to future changes.
```
nit: beta or prototype?
Force-pushed from 246cad6 to a0df3ef (compare)
Added a codegen and a version bump.
Thanks for the update!
The codegen part all looks good! My only questions are about the InplaceOrView key side.
```diff
@@ -73,6 +73,7 @@ template <typename scalar_t, template<class> class Op> void foreach_unary_op_(Te
               /* r_args_depth */ 1,
               /* res_arg_index */ 0>(),
           Op<opmath_t>());
+  maybe_increment_version(tensors);
```
Do I misremember our discussion where you showed me an example where the version was bumped properly? Why does this need to be added?
That happened when the input tensors are CPU tensors, not when the fast path is chosen.
Even the functions registered to native_functions.yaml with the CUDA key can go into the slow path, which just calls the ATen native function per tensor. Therefore I decided to put this manual version bump here.
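For illustration, one way the slow path can be hit even on CUDA (a sketch; assumes a CUDA build and that mixed-dtype lists are accepted, per the discussion above):

```python
import torch

# Mixed dtypes across the list disqualify the fused fast path, so this
# dispatches to the slow path: one aten::abs_ call (and kernel) per tensor.
xs = [torch.randn(3, device="cuda"),
      torch.randn(3, device="cuda", dtype=torch.float64)]
torch._foreach_abs_(xs)
```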
Oh, that's right!
I'm still not sure why the InplaceOrView kernel doesn't get generated automatically to do that already, but we can look into that later.
I thought, with

Lines 1388 to 1405 in b33d9e2:

```python
if self.name.name.inplace:
    self_a = self.arguments.self_arg
    assert (
        self_a
        and self_a.argument.annotation
        and self_a.argument.annotation.is_write
    )
    if self_a.argument.type == BaseType(BaseTy.Tensor):
        # All inplace ops with an ordinary `Tensor self` argument should return self,
        # to allow for method chaining.
        assert (
            len(self.returns) == 1
            and self.returns[0].annotation == self_a.argument.annotation
        )
    else:
        # You can't method chain on non-tensor self arguments though (like a List[Tensor])
        # so in all other cases we expect the return type to be none.
        assert len(self.returns) == 0
```

and

pytorch/tools/autograd/gen_inplace_or_view_type.py, lines 481 to 482 in 9cf8434:

```python
for r in cpp.return_names(f):
    inplace_view_body.append(f"increment_version({r});")
```

that since in-place foreach functions return nothing, no `increment_version` call gets emitted for them.
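The distinction those asserts encode is visible from Python (a small illustration):

```python
import torch

x = torch.randn(3)
assert x.abs_() is x  # Tensor-self in-place ops return self for method chaining

xs = [torch.randn(3) for _ in range(2)]
assert torch._foreach_abs_(xs) is None  # List[Tensor] in-place ops return nothing
```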
following pytorch#89591 (comment) Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Looks good!
Sorry for all the back and forth, this is definitely a challenging one...
@pytorchbot merge (knowing this won’t work this time but want to trigger more jobs)
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 2 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
Force-pushed from 2f20cef to 0aac180 (compare)
following pytorch#89591 (comment) Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
following pytorch#89591 (comment) Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Force-pushed from b0c89f2 to 42ee873 (compare)
following pytorch#89591 (comment) Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
with a new CodeTemplate of DERIVATIVE_SINGLE_FOREACH Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Pushing the heavy lifting to torchgen Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
following pytorch#89591 (comment) Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Force-pushed from 42ee873 to 01d6f9e (compare)
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours).
…possible) (#93901)

## summary
- increment tensor versions in inplace foreach functions
- add a logic to take care of `ArrayRef<Scalar>`

rel: #58833, #89591

Pull Request resolved: #93901
Approved by: https://github.com/albanD
As per the title, this PR defines the backward of unary foreach functions.
This doesn't implement forward-mode automatic differentiation, as the current codegen doesn't seem to handle `ArrayRef<Tensor>`.

Rel:

cc @mcarilli @ngimel
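A minimal usage sketch of what this enables (assuming a build that contains this PR):

```python
import torch

xs = [torch.randn(3, requires_grad=True) for _ in range(2)]
ys = torch._foreach_exp(xs)  # out-of-place unary foreach op

# Backward through the foreach op populates each input's .grad;
# for exp, the gradient of the sum is exp(x) itself.
torch.stack([y.sum() for y in ys]).sum().backward()
assert all(torch.allclose(x.grad, x.detach().exp()) for x in xs)
```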