
Conversation

@alexsamardzic (Collaborator) commented Jan 11, 2023

@pytorch-bot (bot) commented Jan 11, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92022

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e5505c0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

alexsamardzic added a commit that referenced this pull request Jan 11, 2023
@alexsamardzic added the topic: not user facing label Jan 11, 2023
@alexsamardzic (Collaborator, Author)

This fixes an autograd issue for the case when the target layout is the same as the original layout. Namely, the corresponding conversion methods were doing return self, which would result in RuntimeError: leaf variable has been moved into the graph interior during the backward pass. Here is a snippet to reproduce the issue:

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr.grad)
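
For comparison, here is a dense analogue of the same computation (an illustrative cross-check, not part of the original report). Since d/dA of sum(A @ x) is the outer product of a column of ones with x, the dense gradient is all ones:

import torch

# Dense analogue of the CSR snippet above (illustrative only).
dense = torch.tensor([[1., 0.], [0., 1.]], requires_grad=True)
x = torch.ones((2, 1), dtype=torch.float32)
z = torch.sum(torch.matmul(dense, x))
z.backward()
print(dense.grad)  # tensor([[1., 1.], [1., 1.]])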

@alexsamardzic requested a review from @amjames, January 12, 2023 10:13
@alexsamardzic added the module: sparse label Jan 12, 2023
alexsamardzic added a commit that referenced this pull request Jan 12, 2023
alexsamardzic added a commit that referenced this pull request Jan 12, 2023
@cpuhrsch (Contributor)

The additional clone will introduce a memory copy. Maybe using alias or something similar could help. @albanD, do you have time to take a look?
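
For reference, a small Python-level check (not part of the thread) of the memory behavior under discussion: clone allocates new storage, while the ATen alias op returns a new Tensor object that shares the original storage without copying:

import torch

a = torch.arange(4.)
aliased = torch.ops.aten.alias(a)  # new Tensor object, same storage, no copy
cloned = a.clone()                 # new storage, full memory copy

print(aliased.data_ptr() == a.data_ptr())  # True: no data was copied
print(cloned.data_ptr() == a.data_ptr())   # False: clone allocated new memory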

@pearu (Collaborator) commented Jan 12, 2023

I have always wondered why sparse compressed conversions return a copy while sparse or strided conversions return self when layouts do not change. For instance:

>>> a=torch.tensor([[1, 2], [3, 4]])
>>> a.to_dense() is a
True

This makes me assume that csr->csr, csc->csc, etc. conversions should also be able to return self (provided that all attributes such as dense_dim, layout, dtype, and device are unchanged).

Is there a reason for this discrepancy (returning self or self.clone()) between strided and sparse layouts (when tensor attributes are unchanged)?

@nikitaved (Collaborator)

I think it used to break autograd, but that might have changed. cc @albanD.

@albanD (Collaborator) commented Jan 12, 2023

At a high level, the idea here is that autograd elementary ops must respect the aliasing from their schema. An op is an autograd elementary op iff its key is NOT CompositeImplicitAutograd.

Btw, this is why ops like reshape must remain CompositeImplicitAutograd: they sometimes return a view and sometimes not.

I have always wondered why sparse compressed conversions return a copy while sparse or strided conversions return self when layouts do not change.

t.to_dense() is CompositeImplicitAutograd, so you can do either. Returning self here allows you to be faster, so we do it.

Any op that is CompositeExplicitAutograd or has special CPU/Sparse dispatch (and thus an autograd formula) cannot do that and must either always return a view or never do so.

The special note here is that sparse layouts break that rule today for view ops, where they return non-views. So when autograd is involved you can indeed get arbitrarily wrong results!
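
To illustrate the reshape remark above, a quick check (not from the thread) of how reshape sometimes shares storage and sometimes copies, which is why it cannot commit to a fixed aliasing schema:

import torch

a = torch.arange(6)
v = a.reshape(2, 3)                  # contiguous input: reshape returns a view
print(v.data_ptr() == a.data_ptr())  # True: storage is shared

b = torch.arange(6).reshape(2, 3).t()  # transposed, non-contiguous
c = b.reshape(6)                       # cannot be expressed as a view, so a copy is made
print(c.data_ptr() == b.data_ptr())    # False: new storage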

@pearu (Collaborator) commented Jan 13, 2023

Notice that the following works:

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr).detach().requires_grad_(True)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr2.grad)   # UPDATED

What is the autograd interpretation of using .detach().requires_grad_(True) instead of .clone() in .to_sparse()? Would this be a meaningful fix to the autograd issue that this PR targets?

@nikitaved (Collaborator)

csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr).detach().requires_grad_(True)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr.grad)

It does not seem to work for me, prints None.
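
As a side note, a minimal check (not from the thread) of why the detach variant behaves this way: detach() cuts the result out of the autograd graph, so the detached tensor becomes its own leaf and gradients no longer reach the original:

import torch

a = torch.ones(3, requires_grad=True)
b = (2 * a).detach().requires_grad_(True)  # shares data with 2*a, but has no grad history
b.sum().backward()
print(b.grad)  # tensor([1., 1., 1.]): b is its own leaf
print(a.grad)  # None: the graph was cut at detach(), nothing flows back to a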

@pearu (Collaborator) commented Jan 13, 2023

It does not seem to work for me, prints None.

Yes, returning None from csr.grad makes sense. I have updated the example, the print statement should read print(csr2.grad).

This actually answers my question in a way: replacing clone with detach() is not meaningful for autograd.

To keep to_sparse efficient in a non-autograd context, an option is to introduce copy=True to the to_sparse method. Does this make sense?
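
A rough sketch of the semantics such a copy argument could have (a hypothetical helper for illustration only; to_sparse has no such argument in this PR):

import torch

def to_sparse_maybe_copy(t, layout, copy=True):
    # Hypothetical wrapper: copy=True always materializes a new tensor;
    # copy=False may return the input unchanged when no conversion is needed.
    if t.layout == layout:
        return t.clone() if copy else t
    return t.to_sparse(layout=layout)

csr = torch.eye(2).to_sparse_csr()
same = to_sparse_maybe_copy(csr, torch.sparse_csr, copy=False)
print(same is csr)  # True: the non-autograd fast path avoids any allocation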

@alexsamardzic (Collaborator, Author)

So this PR basically replaces return self; with return self.clone(); in a couple of places in order to make the gradient calculations correct. How about the following instead?

return self.requires_grad() ? self.clone() : self;

@nikitaved added the ciflow/trunk label Jan 13, 2023
@amjames (Collaborator) left a comment

Tests are green and this looks good from my end. Pending @pearu indicating that his concerns have been addressed, we can land.

Also, regarding the issue of the sparse scalar concept that @pearu pointed out: I would say it is out of scope for this work and can be discussed elsewhere once we have a motivation for such a feature.

@pearu (Collaborator) left a comment

I have a number of nits and questions. Overall, looks good to me. Thanks, @alexsamardzic!

auto layout_to = layout.value_or(kSparse);
if (self.layout() == layout_to) {
  _to_sparse_check_arguments("to_sparse", self, layout, blocksize, dense_dim_opt);
  return self;
Review comment (Collaborator):

Btw, here and below, this will need return self.alias(), but not in this PR, as it is handled by #103810.

Reply from @alexsamardzic (Author):

Great! I should remember to make the change once that PR of yours lands.

Comment on lines 305 to 310
("aten::to_sparse.out", datetime.date(2023, 6, 30)),
("aten::to_sparse.sparse_dim_out", datetime.date(2023, 6, 30)),
("aten::to_sparse_bsc.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_bsr.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_csc.out", datetime.date(2023, 6, 30)),
("aten::to_sparse_csr.out", datetime.date(2023, 6, 30)),
Review comment (Collaborator):

Should these dates be updated, say, incremented by half a year?

Reply from @alexsamardzic (Author):

Changed.

Follow-up from @alexsamardzic (Author):

One more comment here: I've modeled this PR on the to_dense()/_to_dense() separation. In the native_functions.yaml file, to_dense is written so that to_dense.out is not generated, and this is the reason I removed the .out versions of the to_sparse*() methods. But this could be a backward-incompatible change in case someone used them. Please advise if it would be preferable to keep these.

alexsamardzic added a commit that referenced this pull request Jun 20, 2023
@pearu (Collaborator) commented Jun 20, 2023

@alexsamardzic any idea why CI tests fail? It looks like to_sparse aten implementation is never called...

alexsamardzic added a commit that referenced this pull request Jun 20, 2023
@alexsamardzic (Collaborator, Author)

@alexsamardzic any idea why CI tests fail? It looks like to_sparse aten implementation is never called...

Typo 😳

@pearu (Collaborator) left a comment

LGTM. Thanks, @alexsamardzic!

@alexsamardzic (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team



Labels: ciflow/inductor, ciflow/trunk, Merged, module: inductor, module: sparse, open source, release notes: sparse, topic: bug fixes
