Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make detach return an alias even under inference mode #59633

Closed
wants to merge 3 commits into from

Conversation

ezyang
Copy link
Contributor

@ezyang ezyang commented Jun 8, 2021

Stack from ghstack:

Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D28962183

Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jun 8, 2021

💊 CI failures summary and remediations

As of commit 67e5ad9 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@ezyang ezyang requested review from albanD and cpuhrsch June 8, 2021 15:04
@ezyang
Copy link
Contributor Author

ezyang commented Jun 8, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan
Copy link
Contributor

gchanan commented Jun 8, 2021

this is BC-breaking, no?

@gchanan
Copy link
Contributor

gchanan commented Jun 8, 2021

I guess only if AutoNonVariableType was on, which is rare.

@gchanan
Copy link
Contributor

gchanan commented Jun 8, 2021

This fix isn't 100% correct but it appears to stem the bleeding.

why isn't it correct? Btw does this work with sparse? I think it will, I just recall we didn't bind alias in python previously and x[:] didn't work.

A better fix would be understand how to detect when function implementations don't uphold required invariants, leading to refcount disaster.

is it actually not legal to return non-aliases?

@ezyang
Copy link
Contributor Author

ezyang commented Jun 8, 2021

this is BC-breaking, no? I guess only if AutoNonVariableType was on, which is rare.

Yeah, this shouldn't actually be BC breaking, because prior to inference_mode it wasn't actually possible to trigger this codepath from Python, and I think in C++ we don't really exercise this. One possibility is this may cause some minor regressions on inference in production if people were calling detach.

why isn't it correct?

I think that this specific change only improves correctness, but what I am currently missing is a good explanation of how all of these "autograd only" operations should operate when AutoNonVariableType is on. For example, we also make detach_ a no op (not modified in this PR) and I think this is wrong, but I have to think a lot more carefully about what a good resolution in that case is. I'm also unsure about whether or not you should ever be actually exercising this code

Btw does this work with sparse? I think it will, I just recall we didn't bind alias in python previously and x[:] didn't work.

Nope it doesn't.

>>> with torch.inference_mode(): torch.sparse_coo_tensor([[1]], [2]).detach()
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: sparse tensors do not have strides

is it actually not legal to return non-aliases?

The correctness conditions here are a little vague (e.g., because contiguous is OK with returning the same Tensor), but in general anywhere in PyTorch where we take a returned output and mutate it, the result MUST be an alias because otherwise you will mutate an input (not intended). One concrete example of this (but not applicable to the bug this PR fixes) is in autograd, where we mutate output tensors to set their grad_fn.

cc @robieta this is the same thing we are arguing about in #59389

@ezyang
Copy link
Contributor Author

ezyang commented Jun 8, 2021

blegh mobile is throwing a hissy fit. Gonna do this slightly differently

Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D28962183](https://our.internmc.facebook.com/intern/diff/D28962183)

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Jun 8, 2021
Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 0a09b5a91ccc2900d23530ecc5b5f1a210d3ff36
Pull Request resolved: #59633
@ezyang
Copy link
Contributor Author

ezyang commented Jun 8, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang ezyang mentioned this pull request Jun 8, 2021
@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 6ca141f.

@ezyang ezyang mentioned this pull request Jun 9, 2021
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
Pull Request resolved: pytorch#59633

Fixes pytorch#59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28962183

Pulled By: ezyang

fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
ezyang added a commit to ezyang/pytorch that referenced this pull request Jun 9, 2021
Summary:
Pull Request resolved: pytorch#59633

Fixes pytorch#59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28962183

Pulled By: ezyang

fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
malfet pushed a commit that referenced this pull request Jun 11, 2021
Summary:
Pull Request resolved: #59633

Fixes #59614

This fix isn't 100% correct but it appears to stem the bleeding.
A better fix would be understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28962183

Pulled By: ezyang

fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1032/head branch June 12, 2021 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants