
Pytorch 1.5.0 requires_grad being automatically set to false in C++ registered operators #37306

Closed
inspiros opened this issue Apr 25, 2020 · 13 comments
Labels
high priority module: autograd Related to torch.autograd, and the autograd engine in general module: custom-operators module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments


inspiros commented Apr 25, 2020

🐛 Bug

Autograd no longer works on registered operators in PyTorch version 1.5.0. I'm not sure whether this is a bug or a new feature. If it is an API change, how is this supposed to work from now on? My project relies heavily on it.

To Reproduce

Steps to reproduce the behavior:

  1. Register an operator:
// Simple lambda operator that adds 2 tensors
static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        })
        ;
  2. Load it on the Python side:
# copied from torchvision.extension.py
lib_dir = os.path.dirname(__file__)
loader_details = (
    importlib.machinery.ExtensionFileLoader,
    importlib.machinery.EXTENSION_SUFFIXES
)

extfinder = importlib.machinery.FileFinder(lib_dir, loader_details)
torch.ops.load_library(extfinder.find_spec("_C").origin)
  3. Call it on tensors that require grad:
a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out)

The returned tensor has no grad_fn and backward cannot be called on it:

TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false)
tensor([[1.1142, 0.7253, 0.7918],
        [0.6234, 0.5826, 1.6480],
        [1.5144, 1.4252, 0.7863]])

After some testing, the dtype and device tensor options seem to be passed through correctly, but requires_grad is always set to false.
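A rough end-to-end check of the failure (an untested sketch, reusing a, b, and the op from the steps above):

# Since requires_grad is dropped, backward on the result fails.
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)  # False on 1.5.0
try:
    out.sum().backward()
except RuntimeError as e:
    # Exact wording may vary; roughly "element 0 of tensors does not require
    # grad and does not have a grad_fn"
    print(e)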

Expected behavior

In PyTorch 1.4.0, the same code (with the exact same seed) yields:

TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=true)
tensor([[1.1142, 0.7253, 0.7918],
        [0.6234, 0.5826, 1.6480],
        [1.5144, 1.4252, 0.7863]], grad_fn=<AddBackward0>)

Environment

  • PyTorch Version (e.g., 1.0): 1.5.0
  • OS (e.g., Linux): Windows 10
  • How you installed PyTorch (conda, pip, source): pip install torch-1.5.0-cp37-cp37m-win_amd64.whl
  • Python version: 3.7.5
  • CUDA/cuDNN version: 10.2

Additional context

I also opened a forum topic about this on discuss.pytorch.org, but it doesn't seem to have received any response from the community, so I turned to the developers.

cc @ezyang @gchanan @zou3519 @ssnl @albanD @gqchen

@inspiros inspiros changed the title Pytorch 1.5.0 autograd does not work on registered operators Pytorch 1.5.0 requires_grad being automatically set to false in C++ registered operators Apr 25, 2020
@izdeby izdeby added high priority module: autograd Related to torch.autograd, and the autograd engine in general labels Apr 26, 2020

ezyang commented Apr 27, 2020

cc @smessmer


ezyang commented Apr 27, 2020

It looks like these lines are the culprit:

// Boxed fallback that runs for any operator with no explicit Variable kernel.
void variable_fallback_kernel(const OperatorHandle& op, Stack* stack) {
    // Autograd handling is disabled for the duration of the call, so tensors
    // created inside the redispatched custom kernel come back with
    // requires_grad=false and no grad_fn.
    at::AutoNonVariableTypeMode _var_guard(true);
    Dispatcher::singleton().callBoxed(op, stack);
}

// Registered as the backend fallback for the Variable dispatch key.
static auto registry = Dispatcher::singleton().registerBackendFallbackKernel(
    DispatchKey::VariableTensorId,
    KernelFunction::makeFromBoxedFunction<&variable_fallback_kernel>()
);


ezyang commented Apr 27, 2020

Introduced in #30650


ezyang commented Apr 27, 2020

Relevant comment #29934 (comment)

Looks like I didn't realize that fallthroughs were supported by this behavior previously


ezyang commented Apr 27, 2020

@inspiros as a workaround, can you try doing this as your registration?

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", torch::RegisterOperators::options()
           .kernel(DispatchKey::VariableTensorId, [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        }))
        ;

@ezyang ezyang added module: custom-operators module: internals Related to internal abstractions in c10 and ATen and removed triage review labels Apr 27, 2020

ezyang commented Apr 27, 2020

Standalone reproducer:

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <iostream>

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        })
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)


ezyang commented Apr 27, 2020

If the derivative is given manually, everything seems to work (which is why torchvision isn't broken):

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <torch/csrc/autograd/custom_function.h>
#include <iostream>

using torch::autograd::Variable;
using torch::autograd::variable_list;
using torch::autograd::AutogradContext;

class AddFunction : public torch::autograd::Function<AddFunction> {
public:
    static variable_list forward(AutogradContext* ctx, Variable a, Variable b) {
        auto result = a + b;
        return {result};
    }
    static variable_list backward(AutogradContext* ctx, variable_list grad_output) {
        return {grad_output[0], grad_output[0]};
    }
};

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            return AddFunction::apply(a, b)[0];
        })
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)

This is probably an independent bug
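A quick gradient sanity check on top of this version (an untested sketch; a, b, and out as above):

# AddFunction::backward returns grad_output for both inputs, so summing the
# output and calling backward should leave all-ones gradients on a and b.
out.sum().backward()
print(a.grad)  # expected: 3x3 tensor of ones
print(b.grad)  # expected: 3x3 tensor of ones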


ezyang commented Apr 27, 2020

Demo that the workaround works:

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <iostream>

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", torch::RegisterOperators::options()
           .kernel(DispatchKey::VariableTensorId, [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        }))
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)
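And a similar backward check for the workaround (again an untested sketch, assuming the registration above has been loaded):

# With the kernel registered on the Variable dispatch key, the output keeps
# requires_grad and a grad_fn, so backward propagates.
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)  # expected: True
out.sum().backward()
print(a.grad)             # expected: 3x3 tensor of ones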

smessmer added a commit that referenced this issue Apr 27, 2020

Potentially fixes #37306

Differential Revision: [D21261946](https://our.internmc.facebook.com/intern/diff/D21261946/)

Pull Request resolved: #37355
@inspiros
Author

@ezyang Perfect, that should work, for now at least. I saw the new-style API static auto register = torch::import("my_ops").def("add", &add), but I was not able to load it as an operator, so I suspect the old API changed; I'm also not sure whether that is a feature or a bug.
Thank you, sir!

@colesbury colesbury added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 28, 2020
smessmer added a commit that referenced this issue Apr 28, 2020

Pull Request resolved: #37355

Potentially fixes #37306

Differential Revision: [D21261946](https://our.internmc.facebook.com/intern/diff/D21261946/)

smessmer commented Apr 29, 2020

Can you verify that #37355 fixes this? It was just merged into master.


ezyang commented Apr 29, 2020

Keeping the issue open while we assess whether it's necessary for the 1.5 point release.

@ezyang ezyang reopened this Apr 29, 2020

ezyang commented Apr 29, 2020

@inspiros OK. Note that the workaround syntax is probably going to stop working when the next release rolls around, though I guess we will try harder not to break it now that we know at least one person is using it.

This is definitely a bug in the old API, and the new API inherits the problem too (they use the same underlying implementation).

smessmer added a commit that referenced this issue Apr 30, 2020

Summary:
Pull Request resolved: #37355

Potentially fixes #37306

Test Plan: waitforsandcastle

Differential Revision: D21261946
@gchanan gchanan added this to the 1.5.1 milestone May 5, 2020

gchanan commented May 28, 2020

Closing, as this has been merged into 1.5.1.

@gchanan gchanan closed this as completed May 28, 2020