
jacobian interface issue: oneflow's _autograd_grad() lacks functionality compared with pytorch's _autograd_grad() #10397

Closed
lihuizhao opened this issue Jan 4, 2024 · 4 comments · Fixed by #10399
@lihuizhao
Contributor

lihuizhao commented Jan 4, 2024

Description

Summary

I am trying to add a jacobian interface to oneflow. Following pytorch's implementation, I added a jacobian function to /oneflow/python/oneflow/autograd/functional.py and tested it. Branch with the code: #10393

When torch.autograd.functional.jacobian(func, inputs, vectorize=True) is called in pytorch (i.e. with vectorize=True), it calls _autograd_grad() to compute the gradients, which in turn delegates to torch.autograd.grad(). Printing the arguments new_outputs/new_grad_outputs just before the torch.autograd.grad() call shows that their sizes differ, yet pytorch still produces a correct result.

Calling jacobian(func, inputs, vectorize=True) in oneflow likewise goes through _autograd_grad(), which delegates to flow.autograd.grad(). Printing the new_outputs/new_grad_outputs arguments of flow.autograd.grad() shows the same format as in pytorch, but flow.autograd.grad() additionally validates the sizes of the two arguments against each other and raises a size-mismatch error.

Could flow.autograd.grad() in oneflow be made to work the same way as pytorch's here?

Code to reproduce

import torch as torch_original
import oneflow as flow

def exp_reducer(x):
    return x.exp().sum(dim=1)

inputs = flow.rand(2, 2)
torch_inputs = torch_original.tensor(inputs.numpy())

jacobian_res = torch_original.autograd.functional.jacobian(exp_reducer, torch_inputs, vectorize=True)
print("torch jacobian result: ", jacobian_res)

jacobian_res = flow.autograd.jacobian(exp_reducer, inputs, vectorize=True)
print("flow jacobian result: ", jacobian_res)

System Information

  • What is your OneFlow installation (pip, source, dockerhub): source
  • OS: Ubuntu 18.04.6 LTS
  • OneFlow version : clone from github (2023.12.28)
  • Python version: Python 3.8.10
  • CUDA driver version: 535.104.05
  • GPU models: NVIDIA GeForce RTX 3090
  • Other info:

Run result

torch_outputs:  (tensor([2.4533, 4.1342], grad_fn=<ViewBackward0>),)
torch_grad_outputs:  (tensor([[1., 0.],
        [0., 1.]]),)
torch_jacobian_result:  tensor([[[1.3991, 1.0542],
         [0.0000, 0.0000]],

        [[0.0000, 0.0000],
         [1.8361, 2.2982]]])
flow_outputs:  (tensor([2.4533, 4.1342], dtype=oneflow.float32, grad_fn=<view::reshape_backward>),)
flow_grad_outputs:  (tensor([[1., 0.],
        [0., 1.]], dtype=oneflow.float32),)
Traceback (most recent call last):
  File "../../test/test.py", line 44, in <module>
    jacobian_res = flow.autograd.jacobian(exp_reducer, inputs, vectorize=True) # doctest: +ELLIPSIS
  File "/workspace/software/oneflow/python/oneflow/autograd/functional.py", line 759, in jacobian
    jacobians_of_flat_output = vjp(grad_outputs)
  File "/workspace/software/oneflow/python/oneflow/autograd/functional.py", line 744, in vjp
    _autograd_grad(
  File "/workspace/software/oneflow/python/oneflow/autograd/functional.py", line 205, in _autograd_grad
    return flow.autograd.grad(
  File "/workspace/software/oneflow/python/oneflow/autograd/autograd.py", line 63, in grad
    in_grads = grad_api(
oneflow._oneflow_internal.exception.Exception: out_grad's shape must be same as output's ((2,) vs (2,2))
  File "oneflow/api/python/autograd/autograd.cpp", line 113, in Grad
    CheckAndInitOutGrads(outputs, out_grads)
  File "oneflow/api/python/autograd/autograd.cpp", line 73, in CheckAndInitOutGrads
    CHECK_OR_RETURN(*(outputs.at(i)->shape()) == *(out_grads.at(i)->shape()))
Error Type: oneflow.ErrorProto.check_failed_error
@wyg1997
Contributor

wyg1997 commented Jan 5, 2024

Could you construct a minimal example that calls the autograd.grad interface directly? Let's align that interface first, and then verify the jacobian interface.

@lihuizhao
Contributor Author

Summary

Comparison of the torch.autograd.grad() interface with the oneflow.autograd.grad() interface

Code to reproduce

import torch as torch_original
import oneflow as flow

def exp_reducer(x):
    return x.exp().sum(dim=1)

print("torch result: ")
inputs = torch_original.rand(2, 2, requires_grad=True)
outputs = exp_reducer(inputs)


torch_inputs = (inputs,)
torch_outputs = (outputs,)
torch_grad_outputs = (torch_original.eye(2),)
torch_result = torch_original.autograd.grad(
                    torch_outputs,
                    torch_inputs,
                    torch_grad_outputs,
                    allow_unused=True,
                    create_graph=False,
                    retain_graph=None,
                    is_grads_batched=True,
                )
print(torch_result)

print("oneflow result: ")
flow_inputs = (flow.tensor(inputs.detach().numpy(), requires_grad=True),)
flow_outputs = (exp_reducer(flow_inputs[0]),)
flow_grad_outputs = (flow.eye(2),)
flow_result = flow.autograd.grad(
        flow_outputs,
        flow_inputs,
        flow_grad_outputs,
        allow_unused=True,
        create_graph=False,
        retain_graph=None,
    )
print(flow_result)

Run result

torch result: 
(tensor([[[1.0448, 2.2668],
         [0.0000, 0.0000]],

        [[0.0000, 0.0000],
         [2.5238, 1.5629]]]),)
oneflow result: 
Traceback (most recent call last):
  File "../../test/test.py", line 107, in <module>
    flow_result = flow.autograd.grad(
  File "/workspace/software/oneflow/python/oneflow/autograd/autograd.py", line 63, in grad
    in_grads = grad_api(
oneflow._oneflow_internal.exception.Exception: out_grad's shape must be same as output's ((2,) vs (2,2))
  File "oneflow/api/python/autograd/autograd.cpp", line 113, in Grad
    CheckAndInitOutGrads(outputs, out_grads)
  File "oneflow/api/python/autograd/autograd.cpp", line 73, in CheckAndInitOutGrads
    CHECK_OR_RETURN(*(outputs.at(i)->shape()) == *(out_grads.at(i)->shape()))
Error Type: oneflow.ErrorProto.check_failed_error

@wyg1997
Contributor

wyg1997 commented Jan 5, 2024

Got it: this functionality comes from the is_grads_batched parameter. Its main purpose is to pack the grads so that multiple backward computations can be completed in a single pass through the AutogradEngine. I can add support for it later.

If you need the feature urgently, here is a workaround: since the point is just to pack the grads, you can define a batched_autograd_grad function that computes each grad separately, one batch at a time (note that the first n-1 calls must pass retain_graph=True), and finally stack the results.
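The workaround above can be sketched as follows. This is a minimal sketch written against the torch API (so the per-row loop can be checked against the is_grads_batched reference result); the same loop structure should carry over to flow.autograd.grad. The function shape and variable names here are assumptions based on the comment above, not code from either repository.

```python
import torch

def exp_reducer(x):
    return x.exp().sum(dim=1)

inputs = torch.rand(2, 2, requires_grad=True)
outputs = exp_reducer(inputs)   # shape (2,)
grad_outputs = torch.eye(2)     # one vjp vector per row, shape (2, 2)

# Reference: the batched path that is_grads_batched=True provides
# (a single pass through the autograd engine).
(batched,) = torch.autograd.grad(
    (outputs,), (inputs,), (grad_outputs,),
    retain_graph=True, is_grads_batched=True,
)

# Workaround: one backward pass per row of grad_outputs; keep the graph
# alive for all but the last pass, then stack the per-row gradients.
rows = []
n = grad_outputs.shape[0]
for i in range(n):
    (g,) = torch.autograd.grad(
        (outputs,), (inputs,), (grad_outputs[i],),
        retain_graph=(i < n - 1),
    )
    rows.append(g)
stacked = torch.stack(rows)     # shape (2, 2, 2), matches the batched result

assert torch.allclose(batched, stacked)
```

The key detail is retain_graph=(i < n - 1): every pass except the last must keep the graph alive, otherwise the second torch.autograd.grad call fails because the graph buffers were already freed.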

@wyg1997 wyg1997 self-assigned this Jan 5, 2024
wyg1997 added a commit that referenced this issue Jan 10, 2024
close #10397 

It is enough to handle is_grads_batched during argument checking; there is no need to intrude into the AutogradEngine itself.
During the actual backward computation, broadcasting happens automatically; if a result comes out wrong, that means an operator's broadcast support is incomplete.

---------

Co-authored-by: wyg1997 <wangyinggnag@foxmail.com>
@wyg1997
Contributor

wyg1997 commented Jan 10, 2024

The interface is now supported; please give it a try. @lihuizhao
