
Debugging feature for "modified by an inplace operation" errors #15803

Closed

gwgundersen opened this issue Jan 7, 2019 · 7 comments
Feature

Better error logging for inplace operations that throw errors in automatic differentiation.

Motivation

In complex computational graphs, locating the operation causing the error is nontrivial. A feature for isolating the offending operation would save a lot of developer time.

Pitch

See here for a more concrete suggestion. To quote:

I wonder whether it might be worth adding a "debug" mode that records the stack of the op in the forward pass and spits it out on error in the backward. That way, it would point to the right line of code directly.

Alternatives

Just being able to log all inplace operations would be useful.

ailzhang added the todo label Jan 8, 2019
fmassa (Member) commented Jan 14, 2019

Question: can't you use the anomaly_detection for that?

Here is an example where it points to the exact line where the error occurs:

import torch


with torch.autograd.set_detect_anomaly(True):
    a = torch.rand(1, requires_grad=True)
    c = torch.rand(1, requires_grad=True)

    b = a ** 2 * c ** 2
    b += 1
    b *= c + a

    d = b.exp_()  # anomaly detection's forward trace points to this line
    d *= 5

    b.backward()  # raises RuntimeError during the backward pass

And the stack trace:

sys:1: RuntimeWarning: Traceback of forward call that caused the error:
  File "tst.py", line 13, in <module>
    d = b.exp_()

Traceback (most recent call last):
  File "tst.py", line 16, in <module>
    b.backward()
  File "/Users/fmassa/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/Users/fmassa/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

which points directly to b.exp_(), and indeed, if you change that line to be b.exp(), it all works fine.
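For completeness, a minimal sketch of the fix described above (not part of the original comment): with the out-of-place exp, d is a separate tensor and b's graph no longer goes through the exp node, so per fmassa's report the backward pass completes without the error.

import torch


with torch.autograd.set_detect_anomaly(True):
    a = torch.rand(1, requires_grad=True)
    c = torch.rand(1, requires_grad=True)

    b = a ** 2 * c ** 2
    b += 1
    b *= c + a

    d = b.exp()   # out-of-place: d is a new tensor, b keeps its own grad_fn
    d *= 5        # no longer modifies a tensor saved for b's backward

    b.backward()  # per the comment above, runs without the RuntimeError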

fmassa added triage review and removed the triage review and todo labels Jan 14, 2019
fmassa (Member) commented Jan 14, 2019

Closing as this can be obtained with anomaly_detection.

fmassa closed this as completed Jan 14, 2019

gwgundersen (Author) commented

Great, thanks. This might be a good thing to add to the section on in-place operations in the autograd mechanics tutorial. I'm happy to add a sentence or two, but the contributing doc doesn't mention how to modify tutorials or notes.

fmassa (Member) commented Jan 14, 2019

Maybe check in https://github.com/pytorch/tutorials ?

f0k (Contributor) commented Mar 27, 2019

which points directly to b.exp_(), and indeed, if you change that line to be b.exp(), it all works fine.

To clarify for other readers, the anomaly detection will not necessarily point you at the inplace operation that caused the failure. Instead, it will point you at the operation that could not compute its gradient in the backward pass. The inplace operation to blame may occur anywhere after that, modifying one of the tensors that participated in the line found by the anomaly detection.

Example:

import numpy as np
import torch

x = torch.rand(10, 20, requires_grad=True)
y = torch.rand(10)
z = (x / y[:, np.newaxis])  # anomaly detection will point here
c = y.abs_()  # but the problem is here
z.sum().backward()

The last line will cause a RuntimeError. With anomaly detection enabled, it will point at the line performing the division, but the inplace operation came later.
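Not from the thread, but as an illustrative sketch: making the later operation out-of-place (or cloning y first) leaves the tensor saved by the division untouched, and the backward pass completes.

import numpy as np
import torch

x = torch.rand(10, 20, requires_grad=True)
y = torch.rand(10)
z = (x / y[:, np.newaxis])  # the division saves (a view of) y for its backward
c = y.abs()                 # out-of-place: the saved tensor is never modified
z.sum().backward()          # completes without a RuntimeError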

redwrasse (Contributor) commented

Hi,
As @f0k pointed out in the last comment, the anomaly detection feature does not isolate the line containing the in-place operation. I think the original poster's (@gwgundersen) concerns still stand. Is there any interest in, or are there any new developments toward, debugging this sort of error? Thanks!

wilsonjwcsu commented

Agreed with @redwrasse. Anomaly detection doesn't isolate the problematic in-place operation, which can still be a real pain to track down. Definitely interested in better tools for debugging this problem.
