
Pytorch 1.5.0 requires_grad being automatically set to false in C++ registered operators #37306

Closed
inspiros opened this issue Apr 25, 2020 · 13 comments
Labels
high priority module: autograd Related to torch.autograd, and the autograd engine in general module: custom-operators module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments


inspiros commented Apr 25, 2020

🐛 Bug

Autograd no longer works on registered operators in PyTorch version 1.5.0. I'm not sure whether this is a bug or a new feature. If it is an API change, how is this supposed to work from now on? My project relies heavily on it.

To Reproduce

Steps to reproduce the behavior:

  1. Register an operator:
// Simple lambda operator that adds 2 tensors
static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        })
        ;
  2. Load it on the Python side:
# copied from torchvision.extension.py
lib_dir = os.path.dirname(__file__)
loader_details = (
    importlib.machinery.ExtensionFileLoader,
    importlib.machinery.EXTENSION_SUFFIXES
)

extfinder = importlib.machinery.FileFinder(lib_dir, loader_details)
torch.ops.load_library(extfinder.find_spec("_C").origin)
  3. Call it on tensors that require grad:
a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out)

The returned tensor has no grad_fn and backward cannot be called on it:

TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false)
tensor([[1.1142, 0.7253, 0.7918],
        [0.6234, 0.5826, 1.6480],
        [1.5144, 1.4252, 0.7863]])

After some testing, the dtype and device tensor options seem to be passed through correctly, but requires_grad is always set to false.
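A rough end-to-end check of the failure (an untested sketch, reusing a, b, and the op from the steps above):

# Since requires_grad is dropped, backward on the result fails.
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)  # False on 1.5.0
try:
    out.sum().backward()
except RuntimeError as e:
    # Exact wording may vary; roughly "element 0 of tensors does not require
    # grad and does not have a grad_fn"
    print(e)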

Expected behavior

In PyTorch 1.4.0, the same code (with the exact same seed) yields:

TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=true)
tensor([[1.1142, 0.7253, 0.7918],
        [0.6234, 0.5826, 1.6480],
        [1.5144, 1.4252, 0.7863]], grad_fn=<AddBackward0>)

Environment

  • PyTorch Version (e.g., 1.0): 1.5.0
  • OS (e.g., Linux): Windows 10
  • How you installed PyTorch (conda, pip, source): pip install torch-1.5.0-cp37-cp37m-win_amd64.whl
  • Python version: 3.7.5
  • CUDA/cuDNN version: 10.2

Additional context

I also opened a forum topic about this on discuss.pytorch.org, but it doesn't seem to have received any response from the community, so I turned to the developers.

cc @ezyang @gchanan @zou3519 @ssnl @albanD @gqchen

@inspiros inspiros changed the title Pytorch 1.5.0 autograd does not work on registered operators Pytorch 1.5.0 requires_grad being automatically set to false in C++ registered operators Apr 25, 2020
@izdeby izdeby added high priority module: autograd Related to torch.autograd, and the autograd engine in general labels Apr 26, 2020

ezyang commented Apr 27, 2020

cc @smessmer


ezyang commented Apr 27, 2020

It looks like these lines are the culprit:

// Boxed fallback that runs for any operator with no explicit Variable kernel.
void variable_fallback_kernel(const OperatorHandle& op, Stack* stack) {
    // Autograd handling is disabled for the duration of the call, so tensors
    // created inside the redispatched custom kernel come back with
    // requires_grad=false and no grad_fn.
    at::AutoNonVariableTypeMode _var_guard(true);
    Dispatcher::singleton().callBoxed(op, stack);
}

// Registered as the backend fallback for the Variable dispatch key.
static auto registry = Dispatcher::singleton().registerBackendFallbackKernel(
    DispatchKey::VariableTensorId,
    KernelFunction::makeFromBoxedFunction<&variable_fallback_kernel>()
);


ezyang commented Apr 27, 2020

Introduced in #30650


ezyang commented Apr 27, 2020

Relevant comment #29934 (comment)

Looks like I didn't realize that fallthroughs were supported by this behavior previously


ezyang commented Apr 27, 2020

@inspiros as a workaround, can you try doing this as your registration?

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", torch::RegisterOperators::options()
           .kernel(DispatchKey::VariableTensorId, [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        }))
        ;

@ezyang ezyang added module: custom-operators module: internals Related to internal abstractions in c10 and ATen and removed triage review labels Apr 27, 2020

ezyang commented Apr 27, 2020

Standalone reproducer:

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <iostream>

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        })
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)


ezyang commented Apr 27, 2020

If the derivative is given manually, everything seems to work (which is why torchvision isn't broken):

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <torch/csrc/autograd/custom_function.h>
#include <iostream>

using torch::autograd::Variable;
using torch::autograd::variable_list;
using torch::autograd::AutogradContext;

class AddFunction : public torch::autograd::Function<AddFunction> {
public:
    static variable_list forward(AutogradContext* ctx, Variable a, Variable b) {
        auto result = a + b;
        return {result};
    }
    static variable_list backward(AutogradContext* ctx, variable_list grad_output) {
        return {grad_output[0], grad_output[0]};
    }
};

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", [] (torch::Tensor a, torch::Tensor b) {
            return AddFunction::apply(a, b)[0];
        })
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)

This is probably an independent bug
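A quick gradient sanity check on top of this version (an untested sketch; a, b, and out as above):

# AddFunction::backward returns grad_output for both inputs, so summing the
# output and calling backward should leave all-ones gradients on a and b.
out.sum().backward()
print(a.grad)  # expected: 3x3 tensor of ones
print(b.grad)  # expected: 3x3 tensor of ones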


ezyang commented Apr 27, 2020

Demo that the workaround works:

import torch
import torch.utils.cpp_extension
import torchvision

cpp_source = """
#include <ATen/core/op_registration/op_registration.h>
#include <iostream>

static auto registry =
    torch::RegisterOperators()
        .op("my_ops::add", torch::RegisterOperators::options()
           .kernel(DispatchKey::VariableTensorId, [] (torch::Tensor a, torch::Tensor b) {
            std::cout << a.options() << std::endl;  // debug
            return a + b;
        }))
        ;
"""

module = torch.utils.cpp_extension.load_inline(
    name="inline_jit_extension",
    cpp_sources=cpp_source,
    functions=[],
    verbose=True,
)

a = torch.rand(3, 3, requires_grad=True)
b = torch.rand(3, 3, requires_grad=True)
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)
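And a similar backward check for the workaround (again an untested sketch, assuming the registration above has been loaded):

# With the kernel registered on the Variable dispatch key, the output keeps
# requires_grad and a grad_fn, so backward propagates.
out = torch.ops.my_ops.add(a, b)
print(out.requires_grad)  # expected: True
out.sum().backward()
print(a.grad)             # expected: 3x3 tensor of ones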

smessmer added a commit that referenced this issue Apr 27, 2020

Potentially fixes #37306

Differential Revision: [D21261946](https://our.internmc.facebook.com/intern/diff/D21261946/)

Pull Request resolved: #37355
@inspiros
Author

@ezyang Perfect, that should work, for now at least. I saw the new-style API static auto register = torch::import("my_ops").def("add", &add), but I was not able to load it as an operator, so I suspect the old API changed; I'm also not sure whether that is a feature or a bug.
Thank you, sir!

@colesbury colesbury added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 28, 2020
smessmer added a commit that referenced this issue Apr 28, 2020

Pull Request resolved: #37355

Potentially fixes #37306

Differential Revision: [D21261946](https://our.internmc.facebook.com/intern/diff/D21261946/)

smessmer commented Apr 29, 2020

Can you verify that #37355 fixes this? It was just merged into master.


ezyang commented Apr 29, 2020

Keeping the issue open while we assess whether it's necessary for the 1.5 point release.

@ezyang ezyang reopened this Apr 29, 2020

ezyang commented Apr 29, 2020

@inspiros OK. Note that the workaround syntax is probably going to stop working when the next release rolls around, though I guess we will try harder not to break it now that we know at least one person is using it.

This is definitely a bug in the old API, and the new API inherits the problem too (they use the same underlying implementation).

smessmer added a commit that referenced this issue Apr 30, 2020

Summary:
Pull Request resolved: #37355

Potentially fixes #37306

Test Plan: waitforsandcastle

Differential Revision: D21261946
@gchanan gchanan added this to the 1.5.1 milestone May 5, 2020

gchanan commented May 28, 2020

Closing, as this has been merged into 1.5.1.

@gchanan gchanan closed this as completed May 28, 2020