
Conversation

@mcarilli
Collaborator

@FDecaYed wrote some nice logic into accumulate_grad.cpp to elide unnecessary cloning or accumulation of incoming gradients. Under certain conditions, the incoming gradient can be stolen and used directly.

One of these conditions is that the incoming gradient's refcount must be 1: nothing else can be referencing the data. That can fail to hold, for instance, for c = b + a, where the + operator's backward can simply pass c's incoming gradient tensor directly to both a's and b's AccumulateGrad functions.
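As a rough illustration of why that refcount guard matters (my sketch, not code from the PR), the c = b + a case can be observed from Python: add's backward hands the same gradient tensor toward both AccumulateGrad nodes, and the guard is what keeps a.grad and b.grad from silently aliasing the same storage.

import torch

a = torch.rand(5, requires_grad=True)
b = torch.rand(5, requires_grad=True)

# add's backward passes the same incoming gradient tensor toward both
# a's and b's AccumulateGrad nodes, so neither node is the sole owner
# and neither may steal it.
c = b + a
c.sum().backward()

# If both nodes had stolen the shared tensor, a.grad and b.grad would
# alias the same storage, and a later backward accumulating into a.grad
# would silently corrupt b.grad. The refcount check prevents that.
print(a.grad.data_ptr() == b.grad.data_ptr())  # expected: False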

However, if a given param's AccumulateGrad function has post hooks (for example, the allreduce hooks employed by both apex and c10d DDP), the autograd engine stashes a copy of the incoming gradients, which increments the refcount. AccumulateGrad's hard check that the refcount is 1 then fails, triggering an (in this case) unnecessary deep copy.
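To make that effect concrete, here is a minimal, hedged probe (my sketch, not code from the PR): it records the data_ptr of the gradient entering a leaf's AccumulateGrad via a tensor hook and compares it with .grad afterwards, once without and once with a no-op post hook on the AccumulateGrad node (reached through the same expand_as trick used in the example further down). Whether the gradient gets stolen in the hooked case depends on whether this PR's change is applied; on master the post hook forces the deep copy described above.

import torch

def gradient_was_stolen(with_post_hook):
    leaf = torch.rand(1, 10, requires_grad=True)

    if with_post_hook:
        # expand_as is just a convenient way to reach the leaf's
        # AccumulateGrad node from Python; keep `tmp` alive so the
        # node (and the hook on it) survives until backward runs.
        tmp = leaf.expand_as(leaf)
        tmp.grad_fn.next_functions[0][0].register_hook(lambda *args: None)

    recorded = {}
    def record(grad):
        # Tensor hooks on a leaf see the incoming gradient before
        # AccumulateGrad stashes it; only the raw pointer is kept, so
        # no extra reference to the tensor outlives the hook.
        recorded["ptr"] = grad.data_ptr()
    leaf.register_hook(record)

    (leaf * 2).sum().backward()
    return recorded["ptr"] == leaf.grad.data_ptr()

print("no post hook, gradient stolen:", gradient_was_stolen(False))
print("   post hook, gradient stolen:", gradient_was_stolen(True))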

This PR updates the logic to account for the possible presence of post hooks on the AccumulateGrad function. I discussed it with @apaszke and we both agree it introduces a worrisome silent dependency between engine.cpp and accumulate_grad.cpp. I've commented the relevant pieces, but I'm open to suggestions as to how this check could be made more agnostic to the implementation of engine.cpp.

@albanD
Collaborator

albanD commented Nov 2, 2018

Actually, to be fair, the code sample below also fails on master if the DDP-like hook is removed, so this may not be a blocker for this PR.
Still, the following code used to work and no longer works with this PR.

Code sample:

import torch

inp = torch.rand(1, 10, requires_grad=True)
dummy = torch.rand(1, 10) # Make sure we get contiguous gradients

# DDP-like hook
def ddp_hook(*args):
    pass
tmp = inp.expand_as(inp)
tmp.grad_fn.next_functions[0][0].register_hook(ddp_hook)

# View your leaf tensor differently
b = inp.view(10)

# Hook on the result of this op
my_bad_ref = None
def other_hook(grad):
    global my_bad_ref
    my_bad_ref = grad
b.register_hook(other_hook)

# With this PR, this first backward makes all the refs share the same data
b.mul(dummy).sum().backward()
print(".grad data: ", inp.grad.data_ptr())
print("my_bad_ref data: ", my_bad_ref.data_ptr())

# A subsequent backward modifies my stashed tensor
print(my_bad_ref)
inp.sum().backward()
print(my_bad_ref) # It changed here

With current master, the data_ptr() values are not the same, and the two prints at the end show the same values.
With this change, all the data_ptr() values are the same, and the two prints at the end show different values.

@mcarilli
Collaborator Author

mcarilli commented Nov 12, 2018

b is not a leaf, so other_hook gets converted to a function pre hook on b's AccumulateGrad function, as you told me in pytorch slack. In current master (without my PR) if you remove ddp_hook on inp's AccumulateGrad function, the copy elision optimization takes effect as normal. The incoming gradient is simply handed down through other_hook, then presumably through view()'s backward, and eventually stashed (without any deep copies along the way) by inp's AccumulateGrad as inp.grad, and so you observe that the data_ptr()s are the same.

In your example, the only reason that restoring ddp_hook "fixes" your issue is that, in current master, ddp_hook disables copy elision, which forces a deep copy, so the pointers become different. My PR restores the copy elision, which re-exposes your issue. In this way, by restoring the copy-eliding behavior, my PR is not "breaking" anything: it simply re-exposes the default behavior that was already present (and which, given your issue, may have been "broken" all along). This behavior should have been spotted and discussed when grad copy elision was originally merged.

If you guys are comfortable with the way the grad copy elision currently works in master, then my PR should be acceptable, since it simply restores the existing behavior for cases where a ddp-style hook is registered. If not, then we can take a harder look at grad copy elision, keeping in mind the big picture that ddp-style hooks in particular are NOT the fundamental issue here.

@mcarilli closed this Nov 30, 2018
@mcarilli mentioned this pull request Nov 30, 2018
facebook-github-bot pushed a commit that referenced this pull request Nov 30, 2018
Summary:
Rebased version of #13337.

I don't think the lint errors in the original PR had to do with files I touched, so hopefully the rebase fixes them.
Pull Request resolved: #14587

Differential Revision: D13277428

Pulled By: soumith

fbshipit-source-id: f04c186b1dd4889b4250597eef87f9e9bf7b2426
