
Conversation

@alexsamardzic (Collaborator) commented Jan 11, 2023

@pytorch-bot (bot) commented Jan 11, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92022

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e5505c0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

alexsamardzic added a commit that referenced this pull request Jan 11, 2023
@alexsamardzic added the topic: not user facing label Jan 11, 2023
@alexsamardzic (Collaborator, Author)

This fixes an autograd issue for the case when the target layout is the same as the original layout. Namely, the corresponding conversion methods were doing return self, which would result in RuntimeError: leaf variable has been moved into the graph interior during the backward pass. Here is a snippet to reproduce the issue:

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr.grad)
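
For comparison, here is a dense analogue of the same computation (an illustrative cross-check, not part of the original report). Since d/dA of sum(A @ x) is the outer product of a column of ones with x, the dense gradient is all ones:

import torch

# Dense analogue of the CSR snippet above (illustrative only).
dense = torch.tensor([[1., 0.], [0., 1.]], requires_grad=True)
x = torch.ones((2, 1), dtype=torch.float32)
z = torch.sum(torch.matmul(dense, x))
z.backward()
print(dense.grad)  # tensor([[1., 1.], [1., 1.]])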

@alexsamardzic requested a review from @amjames, January 12, 2023 10:13
@alexsamardzic added the module: sparse label Jan 12, 2023
alexsamardzic added a commit that referenced this pull request Jan 12, 2023
alexsamardzic added a commit that referenced this pull request Jan 12, 2023
@cpuhrsch (Contributor)

The additional clone will introduce a memory copy. Maybe using alias or something similar could help. @albanD, do you have time to take a look?
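
For reference, a small Python-level check (not part of the thread) of the memory behavior under discussion: clone allocates new storage, while the ATen alias op returns a new Tensor object that shares the original storage without copying:

import torch

a = torch.arange(4.)
aliased = torch.ops.aten.alias(a)  # new Tensor object, same storage, no copy
cloned = a.clone()                 # new storage, full memory copy

print(aliased.data_ptr() == a.data_ptr())  # True: no data was copied
print(cloned.data_ptr() == a.data_ptr())   # False: clone allocated new memory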

@pearu (Collaborator) commented Jan 12, 2023

I have always wondered why sparse compressed conversions return a copy while sparse or strided conversions return self when layouts do not change. For instance:

>>> a=torch.tensor([[1, 2], [3, 4]])
>>> a.to_dense() is a
True

This makes me assume that csr->csr, csc->csc, etc. conversions should also be able to return self (provided that all attributes such as dense_dim, layout, dtype, and device are unchanged).

Is there a reason for this discrepancy (returning self or self.clone()) between strided and sparse layouts (when tensor attributes are unchanged)?

@nikitaved (Collaborator)

I think it used to break autograd, but that might have changed. cc @albanD.

@albanD (Collaborator) commented Jan 12, 2023

At a high level, the idea here is that autograd elementary ops must respect the aliasing from their schema. An op is an autograd elementary op iff its key is NOT CompositeImplicitAutograd.

Btw, this is why ops like reshape must remain CompositeImplicitAutograd: they sometimes return a view and sometimes not.

I have always wondered why sparse compressed conversions return a copy while sparse or strided conversions return self when layouts do not change.

t.to_dense() is CompositeImplicitAutograd, so you can do either. Returning self here allows you to be faster, so we do it.

Any op that is CompositeExplicitAutograd or has special CPU/Sparse dispatch (and thus an autograd formula) cannot do that and must either always return a view or never do so.

The special note here is that sparse layouts break that rule today for view ops, where they return non-views. So when autograd is involved you can indeed get arbitrarily wrong results!
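
To illustrate the reshape remark above, a quick check (not from the thread) of how reshape sometimes shares storage and sometimes copies, which is why it cannot commit to a fixed aliasing schema:

import torch

a = torch.arange(6)
v = a.reshape(2, 3)                  # contiguous input: reshape returns a view
print(v.data_ptr() == a.data_ptr())  # True: storage is shared

b = torch.arange(6).reshape(2, 3).t()  # transposed, non-contiguous
c = b.reshape(6)                       # cannot be expressed as a view, so a copy is made
print(c.data_ptr() == b.data_ptr())    # False: new storage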

@pearu (Collaborator) commented Jan 13, 2023

Notice that the following works:

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr).detach().requires_grad_(True)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr2.grad)   # UPDATED

What is the autograd interpretation of using .detach().requires_grad_(True) instead of .clone() in .to_sparse()? Would this be a meaningful fix to the autograd issue that this PR targets?

@nikitaved (Collaborator)

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr).detach().requires_grad_(True)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr.grad)

It does not seem to work for me, prints None.
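
As a side note, a minimal check (not from the thread) of why the detach variant behaves this way: detach() cuts the result out of the autograd graph, so the detached tensor becomes its own leaf and gradients no longer reach the original:

import torch

a = torch.ones(3, requires_grad=True)
b = (2 * a).detach().requires_grad_(True)  # shares data with 2*a, but has no grad history
b.sum().backward()
print(b.grad)  # tensor([1., 1., 1.]): b is its own leaf
print(a.grad)  # None: the graph was cut at detach(), nothing flows back to a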

@pearu (Collaborator) commented Jan 13, 2023

It does not seem to work for me, prints None.

Yes, returning None from csr.grad makes sense. I have updated the example, the print statement should read print(csr2.grad).

This actually answers my question in a way: replacing clone with detach() is not meaningful for autograd.

To keep to_sparse efficient in a non-autograd context, an option is to introduce copy=True to the to_sparse method. Does this make sense?
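
A rough sketch of the semantics such a copy argument could have (a hypothetical helper for illustration only; to_sparse has no such argument in this PR):

import torch

def to_sparse_maybe_copy(t, layout, copy=True):
    # Hypothetical wrapper: copy=True always materializes a new tensor;
    # copy=False may return the input unchanged when no conversion is needed.
    if t.layout == layout:
        return t.clone() if copy else t
    return t.to_sparse(layout=layout)

csr = torch.eye(2).to_sparse_csr()
same = to_sparse_maybe_copy(csr, torch.sparse_csr, copy=False)
print(same is csr)  # True: the non-autograd fast path avoids any allocation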

@alexsamardzic (Collaborator, Author)

So this PR basically replaces return self; with return self.clone(); in a couple of places in order to make the gradient calculations correct. How about the following instead?

return self.requires_grad() ? self.clone() : self;

@nikitaved added the ciflow/trunk label Jan 13, 2023
@amjames (Collaborator) left a comment

Tests are green and this looks good from my end. Pending @pearu indicating that his concerns have been addressed, we can land.

Also, regarding the issue of the sparse scalar concept that @pearu pointed out: I would say it is out of scope for this work and can be discussed elsewhere once we have a motivation for such a feature.

@pearu (Collaborator) left a comment

I have a number of nits and questions. Overall, looks good to me. Thanks, @alexsamardzic!

auto layout_to = layout.value_or(kSparse);
if (self.layout() == layout_to) {
  _to_sparse_check_arguments("to_sparse", self, layout, blocksize, dense_dim_opt);
  return self;
Review comment (Collaborator):

Btw, here and below, this will need return self.alias(), but not in this PR, as it is handled by #103810.

Reply from @alexsamardzic (Author):

Great! I should remember to make the change once that PR of yours lands.

Comment on lines 305 to 310
("aten::to_sparse.out", datetime.date(2023, 6, 30)),
("aten::to_sparse.sparse_dim_out", datetime.date(2023, 6, 30)),
("aten::to_sparse_bsc.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_bsr.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_csc.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_csr.out", datetime.date(2023, 6, 30)),
Review comment (Collaborator):

Should these dates be updated, say, incremented by half a year?

Reply from @alexsamardzic (Author):

Changed.

Follow-up from @alexsamardzic (Author):

One more comment here: I've modeled this PR on the to_dense()/_to_dense() separation. In the native_functions.yaml file, to_dense is written so that to_dense.out is not generated, and this is the reason I removed the .out versions of the to_sparse*() methods. But this could be a backward-incompatible change in case someone used them. Please advise if it would be preferable to keep these.

alexsamardzic added a commit that referenced this pull request Jun 20, 2023
@pearu (Collaborator) commented Jun 20, 2023

@alexsamardzic any idea why CI tests fail? It looks like to_sparse aten implementation is never called...

alexsamardzic added a commit that referenced this pull request Jun 20, 2023
@alexsamardzic (Collaborator, Author)

@alexsamardzic any idea why CI tests fail? It looks like to_sparse aten implementation is never called...

Typo 😳

@pearu (Collaborator) left a comment

LGTM. Thanks, @alexsamardzic!

@alexsamardzic (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team



Labels: ciflow/inductor, ciflow/trunk, Merged, module: inductor, module: sparse, open source, release notes: sparse, topic: bug fixes
