
[feature request] Efficient Jacobian calculation #8304

Closed · phizaz opened this issue Jun 9, 2018 · 9 comments

Comments
phizaz (Contributor) commented Jun 9, 2018

Issue description

A function f(x): R^n -> R^m has a Jacobian with respect to x given by [df_1(x)/dx, df_2(x)/dx, ..., df_m(x)/dx], where each df_i(x)/dx is an R^n vector.

As far as I know, PyTorch's autograd library doesn't provide a "one-shot" solution for this calculation. The current solution is to call torch.autograd.grad multiple times on different parts of the output. This can be slow, since it (presumably) doesn't take advantage of GPU parallelism.

Code example

The current solution I know of is:

J = []
F = f(x)
for i in range(len(F)):  # one backward pass per output component
    J.append(torch.autograd.grad(F[i], x, retain_graph=True)[0])
ssnl (Collaborator) commented Jun 9, 2018

This would be useful, but probably won't be implemented in the near future. As with most autodiff systems, PyTorch writes the derivative formula for each op as an efficient mapping gy -> gx, where gx_j = \sum_i gy_i * (dy_i / dx_j), rather than explicitly computing the Jacobian, which would be less efficient in most cases.
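
For concreteness, here is a minimal sketch (my own example, not part of the thread) of the mapping described above: autograd propagates gy through the op's backward formula and returns gx without ever forming the Jacobian.

```python
import torch

# Reverse mode propagates the vector-Jacobian product gy -> gx = gy @ J
# and never materializes J itself.
x = torch.randn(3, requires_grad=True)
W = torch.randn(2, 3)
y = W @ x                      # here J = dy/dx = W
gy = torch.tensor([1., -1.])   # an arbitrary upstream gradient
gx, = torch.autograd.grad(y, x, grad_outputs=gy)
print(torch.allclose(gx, gy @ W))  # True: gx_j = sum_i gy_i * dy_i/dx_j
```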

phizaz (Contributor, Author) commented Jun 10, 2018

@ssnl I really want to learn more about automatic differentiation. Do you have a suggested reading?

phizaz (Contributor, Author) commented Jun 10, 2018

@ssnl I have read more about AD. From what I read, reverse-mode AD (which I presume is what PyTorch uses) calculates the gradient of the objective function w.r.t. the previous layer (and so on). It sounds to me that if the function is y = Wx, it should obtain W during the calculation? That is more or less what I want.

ssnl (Collaborator) commented Jun 10, 2018

@phizaz In your example, reverse-mode AD will calculate gx = (dy/dx)^T gy = W^T gy, and yes, it does obtain W in this example. However, as I mentioned above, it generally doesn't obtain the Jacobian, and it doesn't work by multiplying the per-step Jacobians together to get the full Jacobian. In other words, for this example, the value fed into the next step is gx rather than W. So from the computation graph you eventually get g_Input = \sum_i g_Output_i * (d Output_i / d Input), where Input is a leaf of the graph and Output is the final result, rather than the Jacobian.
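
A small follow-up sketch (again my own example) of the chaining described above: for y = V(Wx), reverse mode applies two vector-Jacobian products in sequence, and the full Jacobian V @ W is never formed.

```python
import torch

# Reverse mode chains vector-Jacobian products: gz = gy @ V, then gx = gz @ W.
torch.manual_seed(0)
W = torch.randn(4, 3)
V = torch.randn(2, 4)
x = torch.randn(3, requires_grad=True)
y = V @ (W @ x)
gy = torch.tensor([1., -1.])
gx, = torch.autograd.grad(y, x, grad_outputs=gy)
print(torch.allclose(gx, (gy @ V) @ W))  # True: two vJps, no explicit Jacobian
```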

phizaz (Contributor, Author) commented Jun 12, 2018

I think I understand your point, which is the vector-Jacobian product, right? I'm just saying that there should be a way to get the Jacobian in those specific, trivial cases. For example, with y = sigmoid(Wx), the Jacobian dy/dx is simply diag(y * (1 - y)) W.

phizaz closed this as completed Jun 12, 2018
ssnl (Collaborator) commented Jun 12, 2018

@phizaz Yeah, unfortunately autograd doesn't do that automatically. You can easily write them out using explicit formulas, though! And if you want, you can verify your results with a numerical gradient comparison.
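
As an illustration of that suggestion, here is a brief sketch (my own, with hypothetical values, not from the thread) that writes out the explicit Jacobian of y = sigmoid(Wx) from the earlier comment and checks it against a central finite-difference estimate.

```python
import torch

# Explicit Jacobian of y = sigmoid(W x): J = diag(y * (1 - y)) @ W,
# verified against a numerical (central-difference) Jacobian.
torch.manual_seed(0)
W = torch.randn(2, 3, dtype=torch.double)
x = torch.randn(3, dtype=torch.double)

f = lambda x: torch.sigmoid(W @ x)
y = f(x)
J_explicit = torch.diag(y * (1 - y)) @ W

# Numerical Jacobian: column j is (f(x + eps e_j) - f(x - eps e_j)) / (2 eps)
eps = 1e-6
cols = []
for j in range(x.numel()):
    e = torch.zeros_like(x)
    e[j] = eps
    cols.append((f(x + e) - f(x - e)) / (2 * eps))
J_numeric = torch.stack(cols, dim=1)

print(torch.allclose(J_explicit, J_numeric, atol=1e-6))  # True
```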

phizaz (Contributor, Author) commented Jun 12, 2018

If I want to generalize this, I'd need to build the graph myself, which would be quite a bit of work! By the way, thanks for your replies!

sbarratt commented May 9, 2019

The following code will do the trick with a single call to backward, by taking advantage of the case where the function accepts batched inputs.
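
The code itself isn't reproduced here; the following is only a rough sketch of the batching trick described in the comment, assuming f is written to broadcast over a leading batch dimension (the name batched_jacobian is mine).

```python
import torch

def batched_jacobian(f, x, m):
    """Sketch of the batching trick: compute the (m, n) Jacobian of f at x
    with a single backward call. Assumes f maps a batch of shape (m, n) to
    outputs of shape (m, m) row-wise (row i of the output is f of row i)."""
    xr = x.detach().repeat(m, 1).requires_grad_(True)  # (m, n): m copies of x
    y = f(xr)                                          # (m, m): each row equals f(x)
    y.backward(torch.eye(m))  # row i of xr.grad receives d f_i(x) / d x
    return xr.grad

# Usage: a linear map R^3 -> R^2 whose Jacobian is W itself.
W = torch.tensor([[1., 2., 3.], [0., 1., -1.]])
f = lambda x: x @ W.t()
x = torch.tensor([1., 2., 3.])
print(batched_jacobian(f, x, m=2))  # equals W
```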

Tikquuss commented

Here are some functions that can help you.

import torch

def gradient(y, x, grad_outputs=None):
    """Compute the vector-Jacobian product grad_outputs @ (dy/dx)."""
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad

def jacobian(y, x):
    """Compute the full Jacobian dy/dx row by row, taking grad_outputs to be
    each standard basis vector [1, 0, ..., 0], [0, 1, 0, ..., 0], ..., [0, ..., 0, 1] in turn."""
    jac = torch.zeros(y.shape[0], x.shape[0])
    for i in range(y.shape[0]):
        grad_outputs = torch.zeros_like(y)
        grad_outputs[i] = 1
        jac[i] = gradient(y, x, grad_outputs=grad_outputs)
    return jac

def divergence(y, x):
    """Compute div y = sum_i dy_i/dx_i."""
    div = 0.
    for i in range(y.shape[-1]):
        div += torch.autograd.grad(y[..., i], x, torch.ones_like(y[..., i]), create_graph=True)[0][..., i:i+1]
    return div

def laplace(y, x):
    """Compute the Laplacian of y as the divergence of its gradient."""
    grad = gradient(y, x)
    return divergence(grad, x)

Illustration

x = torch.tensor([1., 2., 3.], requires_grad=True)
w = torch.tensor([[1., 2., 3.], [0., 1., -1.]])
b = torch.tensor([1., 2.])
y = torch.matmul(x, w.t()) + b # y = x @ wT + b => y1 = x1 + 2*x2 + 3*x3 + 1 = 15, y2 = x2 - x3 + 2 = 1
dydx = gradient(y, x)   # => [1., 1.] @ jacobian(y, x), i.e. the sum of the Jacobian's rows
jac = jacobian(y, x)    # => equals w
div = divergence(y, x)  # => dy1/dx1 + dy2/dx2 = 1 + 1 = 2
