Give linear an explicit autograd formula always #162411
base: gh/ezyang/3144/base
Conversation
Signed-off-by: Edward Z. Yang <ezyang@meta.com> [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162411
Note: Links to docs will display an error until the docs builds have been completed.
❌ 16 New Failures as of commit 25a4078 with merge base 8171d60. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
There are still test failures though.
Attention! native_functions.yaml was changed. If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info. Caused by:
// dL/dW = sum_batch ( (dL/dy)ᵀ @ x )
// Use einsum to contract over all leading dims without reshaping:
if (output_mask[1]) {
  grad_weight = at::einsum("...o,...i->oi", {grad_output, self}); // [out, in]
wouldn't this go into decomposition, since linear_backward is CompositeImplicitAutograd now?
Yes. So another PR we have to do is make einsum not decompose, but THAT is likely to be a lot more controversial. Another reason why making views work is "better" (if you can swing it).
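As a sanity check on the einsum formulation quoted above, here is a minimal Python sketch (not part of the PR; shapes are arbitrary) confirming that contracting over the ellipsis dims gives the same grad_weight as the usual flatten-plus-matmul approach for an input with leading batch dims:

```python
# Sketch only: the einsum grad_weight vs. the flatten-and-matmul equivalent.
import torch

x = torch.randn(4, 3, 5)          # (*, in_features)
grad_out = torch.randn(4, 3, 7)   # (*, out_features)

# einsum sums over all leading (ellipsis) dims without reshaping
gw_einsum = torch.einsum("...o,...i->oi", grad_out, x)   # (out, in)

# equivalent: flatten the leading dims, then a single matmul
gw_matmul = grad_out.reshape(-1, 7).t() @ x.reshape(-1, 5)

torch.testing.assert_close(gw_einsum, gw_matmul)
```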
- name: linear(Tensor input, Tensor weight, Tensor? bias=None) -> Tensor
  input, weight, bias: "grad.defined() ? linear_backward(input, grad, weight, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
  result: auto_linear
Curious, what does this line do?
It's for forward-mode AD; it says that this function is linear, and thus its forward AD formula is trivial (apply the same function).
Sounds OK as long as there is no perf hit; this is a pretty hot path.
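To make the forward-AD point above concrete, here is a minimal sketch (not part of the PR; torch.func.jvp and the shapes are just illustrative) of what the JVP of F.linear computes:

```python
# Sketch only: forward-mode AD of F.linear.
# linear(x, w, b) = x @ w.T + b, so with tangents (tx, tw, tb) the JVP is
#   tx @ w.T + x @ tw.T + tb
# i.e. the same linear structure applied to the tangents.
import torch
import torch.nn.functional as F

x, w, b = torch.randn(2, 5), torch.randn(7, 5), torch.randn(7)
tx, tw, tb = torch.randn(2, 5), torch.randn(7, 5), torch.randn(7)

_, jvp_out = torch.func.jvp(F.linear, (x, w, b), (tx, tw, tb))
torch.testing.assert_close(jvp_out, tx @ w.T + x @ tw.T + tb)
```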
auto self_ = moveBatchDimToFront(self, self_bdim);
auto weight_ = moveBatchDimToFront(weight, weight_bdim);
auto bias_ = bias.has_value() ? std::make_optional<Tensor>(moveBatchDimToFront(*bias, bias_bdim)) : std::nullopt;
return std::make_tuple( at::linear(self_, weight_, bias_), 0 );
Oh, linear supports arbitrary batch dimensions on the weights?
Your backward formula seems to say no? :D
whoops :)
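For context, a minimal sketch (not from the PR; shapes are arbitrary) of the batched-weight case this functorch batching rule has to handle, i.e. vmap over F.linear with per-sample weights and biases:

```python
# Sketch only: vmap-ing F.linear over a batch of weights/biases.
import torch
import torch.nn.functional as F

x = torch.randn(8, 5)      # shared input, (batch, in_features)
w = torch.randn(4, 7, 5)   # 4 per-sample weight matrices
b = torch.randn(4, 7)      # 4 per-sample biases

out = torch.func.vmap(F.linear, in_dims=(None, 0, 0))(x, w, b)  # (4, 8, 7)

# reference: apply each weight/bias pair separately
ref = torch.stack([F.linear(x, w[i], b[i]) for i in range(4)])
torch.testing.assert_close(out, ref)
```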
// dL/dW = sum_batch ( (dL/dy)ᵀ @ x )
// Use einsum to contract over all leading dims without reshaping:
if (output_mask[1]) {
  grad_weight = at::einsum("...o,...i->oi", {grad_output, self}); // [out, in]
What is the perf hit of this for a regular nn.Linear() layer?
This is potentially actually pretty bad lol. And it doesn't even do what I want, because I want the einsum to also show up as its own operator LOL.
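One way to get a first read on that perf question (a sketch only, not from the PR; shapes are arbitrary and CPU-only) is to time the einsum grad_weight against the plain matmul it replaces for a regular 2D nn.Linear-shaped backward:

```python
# Sketch only: timing the einsum grad_weight vs. the classic matmul form.
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(256, 1024)         # (batch, in_features)
grad_out = torch.randn(256, 4096)  # (batch, out_features)

t_matmul = benchmark.Timer(
    stmt="grad_out.t() @ x",
    globals={"x": x, "grad_out": grad_out},
)
t_einsum = benchmark.Timer(
    stmt='torch.einsum("...o,...i->oi", grad_out, x)',
    globals={"torch": torch, "x": x, "grad_out": grad_out},
)
print(t_matmul.timeit(100))
print(t_einsum.timeit(100))
```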
Also, I expect a lot more changes in the PT2 compilation stack to handle the new op and remove special casing for the linear decomp.
Stack from ghstack (oldest at bottom):
Signed-off-by: Edward Z. Yang ezyang@meta.com