[feature request] Efficient Jacobian calculation #8304
Comments
This will be useful, but probably won't be implemented in the near future. As with most autodiff systems, PyTorch has the derivative formulas for ops written as efficient mappings that compute vector-Jacobian products rather than materializing full Jacobian matrices.
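(For concreteness, a minimal sketch of what those mappings compute; the function and values here are illustrative, not from the original comment:)

```python
import torch

# Reverse-mode autograd computes a vector-Jacobian product v^T J,
# where v is supplied via grad_outputs; it never builds J itself.
x = torch.randn(3, requires_grad=True)
y = x ** 2                        # f: R^3 -> R^3, J = diag(2x)
v = torch.ones_like(y)            # the "vector" in the product
vjp, = torch.autograd.grad(y, x, grad_outputs=v)
print(torch.allclose(vjp, 2 * x))  # v^T J = 2x here, since J is diagonal
```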
@ssnl I really want to learn more about automatic differentiation. Do you have any suggested reading?
@ssnl I have read more about AD. From what I read, reverse-mode AD (which I presume is what PyTorch uses) calculates the gradient of the objective function w.r.t. the previous layer (and so on). It sounds to me that if the function is element-wise, its Jacobian is just a diagonal matrix, and should be obtainable in a single backward pass.
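(A hedged sketch of that special case: for an element-wise function, each output y_i depends only on x_i, so a single backward pass with a vector of ones recovers the Jacobian's diagonal:)

```python
import torch

# For an element-wise f, the Jacobian is diagonal, so the row-sum
# v^T J with v = ones is exactly the diagonal.
x = torch.randn(4, requires_grad=True)
y = torch.sin(x)                             # element-wise: J = diag(cos(x))
y.backward(torch.ones_like(y))
print(torch.allclose(x.grad, torch.cos(x)))  # x.grad holds the diagonal
```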
@phizaz In your example, reverse-mode AD will calculate a vector-Jacobian product, i.e. the Jacobian's rows summed against the supplied grad_outputs vector, not the individual rows themselves.
I think I understand your point, which is the Jacobian-vector product, right? I'm just saying that there should be a way to get a Jacobian in those specific trivial cases, like when the function is applied element-wise.
@phizaz Yeah, unfortunately autograd doesn't do that automatically. You can write them out easily using explicit formulas, though! And if you want to, you can use numerical gradient comparison to verify your calculated results.
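(A sketch of that verification step, assuming a hand-written Jacobian for an illustrative f; torch.autograd.gradcheck compares autograd's gradients against finite differences:)

```python
import torch

def f(x):
    return x ** 3  # illustrative element-wise function

def jacobian_explicit(x):
    # Hand-written formula: J = diag(3 x^2) for the element-wise cube.
    return torch.diag(3 * x ** 2)

# gradcheck needs double precision to keep numerical error small.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(f, (x,))

# Cross-check the explicit Jacobian row by row against autograd.
for i in range(5):
    g, = torch.autograd.grad(f(x)[i], x)
    assert torch.allclose(g, jacobian_explicit(x)[i])
```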
If I want to generalize, I'd need to build the graph myself, and that would be a fair amount of work! By the way, thanks for your replies!
The following code will do the trick with a single call to backward, by taking advantage of the case where the function accepts batched inputs.
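(The original snippet didn't survive extraction; here is a minimal sketch of the trick being described, assuming f maps a batch of inputs of shape (m, n) to outputs of shape (m, m) row-wise; the helper name jacobian_batched is hypothetical:)

```python
import torch

def jacobian_batched(f, x, m):
    # Repeat x along a new batch dimension, one copy per output
    # component, so a single backward call recovers every row of J.
    xr = x.detach().repeat(m, 1).requires_grad_(True)   # shape (m, n)
    y = f(xr)                                           # shape (m, m)
    # Row i of eye(m) selects output i, so row i of xr.grad is df_i/dx.
    y.backward(torch.eye(m, dtype=y.dtype))
    return xr.grad                                      # shape (m, n)

# Usage: a linear map R^3 -> R^2 whose Jacobian is its weight matrix.
W = torch.randn(2, 3)
f = lambda x: x @ W.t()
x = torch.randn(3)
print(torch.allclose(jacobian_batched(f, x, m=2), W))
```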
Here are some functions that can help you.
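(The helper functions in the original comment didn't survive extraction. As one possible stand-in, newer PyTorch releases ship a built-in that performs this computation for you:)

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return torch.stack([x[0] * x[1], x[0] + x[1]])  # R^2 -> R^2

x = torch.randn(2)
J = jacobian(f, x)   # shape (2, 2): row i is df_i/dx
print(J)
```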
Illustration
Issue description
A function `f(x): R^n -> R^m` will have a Jacobian w.r.t. `x` of `[df_1(x)/dx, df_2(x)/dx, ..., df_m(x)/dx]`, where each `df_i(x)/dx` is an `R^n` vector. As far as I know, PyTorch's autograd library doesn't provide a "one-shot" solution for this calculation. The current solution is to call `torch.autograd.grad` multiple times on different parts of the output. This could be slow, since it (presumably) doesn't make use of GPU parallelization.

Code example
The current solution I know is:
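(The snippet itself was lost in extraction; below is a minimal sketch of the loop-based approach described above, computing one row of the Jacobian per `torch.autograd.grad` call:)

```python
import torch

def jacobian_loop(f, x):
    # One backward pass per output component: row i of J is the
    # gradient of y[i] w.r.t. x.
    y = f(x)
    rows = []
    for i in range(y.numel()):
        g, = torch.autograd.grad(y[i], x, retain_graph=True)
        rows.append(g)
    return torch.stack(rows)                 # shape (m, n)

x = torch.randn(3, requires_grad=True)
J = jacobian_loop(lambda x: x ** 2, x)       # expect diag(2x)
print(torch.allclose(J, torch.diag(2 * x)))
```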