Fix autograd issue with identity conversions #92022
Conversation
This fixes an issue with autograd for the case when the target layout is the same as the original layout. Namely, the corresponding conversion methods were simply returning `self`, which breaks autograd in a case like the following:

```python
csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr.grad)
```
The additional clone will introduce a memory copy; maybe using an alias or something similar could help. @albanD, do you have time to take a look?
I have always wondered why sparse compressed conversions return a copy while sparse or strided conversions return `self`:

```python
>>> a = torch.tensor([[1, 2], [3, 4]])
>>> a.to_dense() is a
True
```

That makes me assume that csr->csr, csc->csc, ... conversions should also be able to return `self`. Is there a reason for this discrepancy (returning `self` vs. returning a copy)?
I think it used to break autograd, but that might have changed, cc @albanD.
At a high level, the idea here is that autograd elementary ops must respect the aliasing declared in their schema. Btw, this is why ops like reshape must remain CompositeImplicitAutograd: they sometimes return a view and sometimes not.
Any op that is CompositeExplicitAutograd or has special CPU/Sparse dispatch (and thus an autograd formula) cannot do that; it must either always return a view or never. The special note here is that sparse layouts break that rule today for view ops where they return non-views. So when autograd is involved you can indeed get arbitrarily wrong results!
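As a small dense illustration (not from the thread) of the point about reshape: the same call sometimes aliases its input and sometimes copies, depending on whether the requested shape is expressible as a view of the input's strides:

```python
import torch

a = torch.randn(2, 3)      # contiguous tensor
v = a.reshape(6)           # contiguous input: reshape can return a view
print(v._base is a)        # True, v aliases a's storage

t = a.t()                  # transposed, non-contiguous view of a
c = t.reshape(6)           # no valid strides exist for this shape: reshape copies
print(c._base is a)        # False, c owns its own storage
```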
Notice that the following works:

```python
csr = torch.sparse_csr_tensor((0, 1, 2), (0, 1), (1, 1), dtype=torch.float32, requires_grad=True)
csr2 = csr.to_sparse(layout=torch.sparse_csr).detach().requires_grad_(True)
x = torch.ones((2, 1), dtype=torch.float32)
y = torch.matmul(csr2, x)
z = torch.sum(y)
z.backward()
print(csr2.grad)  # UPDATED
```

What is the autograd interpretation of using `.detach().requires_grad_(True)` here?
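For reference, a minimal dense sketch (not from the thread) of what `.detach().requires_grad_(True)` means to autograd: the detached tensor becomes a fresh leaf, so gradients accumulate on it but never reach the original tensor.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach().requires_grad_(True)  # new leaf, disconnected from a's graph
(b * 2).sum().backward()
print(b.grad)   # tensor([2., 2., 2.])
print(a.grad)   # None: backward never reaches a through the detached b
```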
It does not seem to work for me.
Yes, returning `self` is what breaks autograd here. This actually answers my question in a way: replacing the identity return with a copy is needed whenever autograd is involved. To keep the conversion cheap otherwise, the copy can be made conditional on `requires_grad()`.
So this PR is basically replacing `return self;` with `return self.requires_grad() ? self.clone() : self;`.
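In Python terms, the conditional clone amounts to roughly the sketch below (the helper name is hypothetical; the actual change lives in the C++ conversion kernels):

```python
import torch

def identity_layout_conversion(self: torch.Tensor) -> torch.Tensor:
    # Hypothetical mirror of `self.requires_grad() ? self.clone() : self`:
    # clone when autograd is tracking the tensor, so the conversion is a proper
    # non-aliasing autograd node; otherwise keep the free identity return.
    return self.clone() if self.requires_grad else self
```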
Tests are green and this looks good from my end. Pending @pearu indicating that his concerns have been addressed, we can land.
Also, regarding the sparse scalar concept that @pearu pointed out: I would say it is out of scope for this work and can be discussed elsewhere, at a time when we have a motivation for such a feature.
I have a number of nits and questions. Overall, looks good to me. Thanks, @alexsamardzic!
```cpp
auto layout_to = layout.value_or(kSparse);
if (self.layout() == layout_to) {
  _to_sparse_check_arguments("to_sparse", self, layout, blocksize, dense_dim_opt);
  return self;
```
Btw, here and below, this will eventually need to be `return self.alias()`, but not in this PR, as that is handled by #103810.
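For context, an `alias()`-based return would give back a distinct tensor object that shares the input's storage, so the identity conversion would stay copy-free. A rough illustration, assuming the `torch.ops.aten.alias` binding is available:

```python
import torch

t = torch.randn(3)
a = torch.ops.aten.alias(t)          # distinct Tensor object, same storage
print(a is t)                        # False
print(a.data_ptr() == t.data_ptr())  # True: no data is copied
```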
Great! I should remember to make the change when that PR of yours lands.
("aten::to_sparse.out", datetime.date(2023, 6, 30)), | ||
("aten::to_sparse.sparse_dim_out", datetime.date(2023, 6, 30)), | ||
("aten::to_sparse_bsc.out", datetime.date(2023, 6, 30)), | ||
("aten::to_sparse_bsr.out", datetime.date(2023, 6, 30)), | ||
("aten::to_sparse_csc.out", datetime.date(2023, 6, 30)), | ||
("aten::to_sparse_csr.out", datetime.date(2023, 6, 30)), |
Should these dates be updated, say, incremented by half a year?
Changed.
One more comment here: I've modeled this PR on the `to_dense()`/`_to_dense()` separation. In the `native_functions.yaml` file, `to_dense` is written so that `to_dense.out` is not generated, and this is the reason I've removed the `.out` variants of the `to_sparse*()` methods. But it could be a backward-incompatible change in case someone used them. Please advise if it would be preferable to keep these.
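To make the separation concrete, here is a rough sketch of how the public composite wrapper relates to the primitive op; the exact `_to_dense` overload signature is an assumption and may differ between versions:

```python
import torch

s = torch.eye(3).to_sparse()
d_public = s.to_dense()                  # public, composite entry point
d_private = torch.ops.aten._to_dense(s)  # primitive op it dispatches to (signature assumed)
print(torch.equal(d_public, d_private))  # True
```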
@alexsamardzic any idea why the CI tests fail?
Typo 😳
LGTM. Thanks, @alexsamardzic!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.