
print statement causes inplace error #99968

Closed
ptrblck opened this issue Apr 25, 2023 · 5 comments
Assignees
Labels
actionable, high priority, module: autograd (Related to torch.autograd, and the autograd engine in general), needs design, triage review, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

Collaborator

ptrblck commented Apr 25, 2023

🐛 Describe the bug

Reported in: https://discuss.pytorch.org/t/error-with-view-no-grad-and-inplace-modify/173082
but I couldn't find the created GitHub issue and the author didn't follow up.

Code to reproduce the issue:

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
)

with torch.no_grad():
    for param in net.parameters():
        for j in param.flatten():
            # print("current j", j)
            j += 1

Uncomment the print statement and the code will fail with:

RuntimeError: A view was created in no_grad mode and its base or another view of its base has been modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).

I would assume the in-place operation is allowed, as it runs inside a no_grad block and no computation graph was ever created.
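For comparison, a minimal sketch (same setup; the assumption is that the goal is simply to shift every parameter) that performs the in-place update on the parameter tensors directly, avoiding the per-element views that trigger the error:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))

with torch.no_grad():
    for param in net.parameters():
        # In-place add on the parameter itself: no intermediate view is
        # created, so inspecting the parameters afterwards works fine.
        param += 1

print(net[0].weight[0, :3])
```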

Also, maybe related to: https://discuss.pytorch.org/t/old-problem-but-strange-things-trying-to-backward-through-the-graph-a-second-time/178369

but no executable code snippet was posted yet.

Versions

Reproduced in a nightly build: 2.1.0.dev20230407+cu118.

CC @albanD as we talked about this issue before.

cc @ezyang @gchanan @zou3519 @kadeng @albanD @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7

@albanD albanD added module: autograd Related to torch.autograd, and the autograd engine in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Apr 25, 2023
Collaborator

albanD commented Apr 25, 2023

I'm not sure off the top of my head.
Most likely we're creating a view during printing, the in-place op then invalidates that view, and for some reason we later try to recover it.
We would need to investigate this further.

Contributor

soulitzer commented Jun 23, 2023

Smaller repro:

import torch

a = torch.rand(1, requires_grad=True)

with torch.no_grad():
    b = a[:]
    b += 1

# Doing any of the following produces an error:
b.sin()    # (1)
b.grad_fn  # (2)
print(b)   # (3) fails because printing accesses t.grad_fn

The easy fix for (3) is to special-case views created in no-grad in the printing code, but maybe there is a more general fix.
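As a user-side workaround for (3) in the meantime (a sketch, not the eventual fix): print a detached copy, which skips the grad_fn lookup that the printing code performs:

```python
import torch

a = torch.rand(1, requires_grad=True)

with torch.no_grad():
    b = a[:]
    b += 1

# print(b) would raise; detach() creates an alias without autograd
# metadata, so no grad_fn access happens during printing.
print(b.detach())
```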

See also #11390

Contributor

hjmshi commented Nov 14, 2023

Hi, we are running into a similar issue while implementing an updated version of Distributed Shampoo and applying torch.compile to it with PyTorch 2. In the new version of the code, we create many views of the parameters in the optimizer and apply in-place foreach operators to those parameter views.

When we print the list of parameter views or apply torch.compile to this list, we observe the same error, even though everything runs under torch.no_grad. Before the first in-place add, printing the list of views shows requires_grad = True, consistent with #11390.

Any suggestions on how to proceed? Thanks in advance!

Interestingly, we found that logging the tensor does not trigger this issue, but printing does.

cc: @tsunghsienlee @shintaro-iwasaki @minddrummer @csmiler @mlazos @bdhirsh @yuchenhao

Collaborator

albanD commented Nov 14, 2023

Note that here we can easily fix printing (try/except around the grad_fn access, and print an invalid grad_fn). The other errors are expected: this is undefined behavior, so we'd rather raise an error.
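A rough sketch of that suggestion (the helper name is hypothetical; the real fix lives in PyTorch's internal tensor-printing code, not in a public API):

```python
import torch

def grad_fn_repr(t: torch.Tensor) -> str:
    # Hypothetical stand-in for the printing code's grad_fn access:
    # guard it with try/except and report an invalid grad_fn instead
    # of propagating the RuntimeError.
    try:
        fn = t.grad_fn
    except RuntimeError:
        return "grad_fn=<invalid>"
    return f"grad_fn={fn}" if fn is not None else "grad_fn=None"

a = torch.rand(1, requires_grad=True)
with torch.no_grad():
    b = a[:]
    b += 1

print(grad_fn_repr(b))
```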

Collaborator

albanD commented Nov 14, 2023

Actionable to fix print.
High pri for user activity

@soulitzer soulitzer self-assigned this Nov 14, 2023
soulitzer added a commit that referenced this issue Nov 15, 2023
…d in-place in no-grad"


Fixes #99968


[ghstack-poisoned]