
Hessian of a vector function? #61

Closed
GravityAssisted opened this issue Sep 29, 2015 · 13 comments

@GravityAssisted

Hi,

Is it possible to calculate the Hessian of a vector-valued function?

E.g., let's say I have a function:

f(x::Vector) = x./(sqrt(x[1]^2+x[2]^2+x[3]^3))

I can easily calculate its Jacobian with:

ForwardDiff.jacobian(f, rand(3))

but I get a convert error when I try to calculate the Hessian. I assume this is because hessian expects a scalar function. Is there a trick to get around this?

The Hessian for the above example would be a 3-D tensor, with each matrix slice corresponding to the Hessian of one output component of the function.
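One workaround, sketched here using only ForwardDiff's existing scalar hessian (the stacking into a 3-D array is an illustration, not part of the library):

```julia
# Sketch: compute the scalar Hessian of each output component separately,
# then stack the slices into a 3-D array.
using ForwardDiff

f(x) = x ./ sqrt(x[1]^2 + x[2]^2 + x[3]^3)  # the example above

x0 = [1.0, 2.0, 3.0]
n = length(x0)

# One n×n Hessian per output component; H[:, :, i] is the Hessian of f(x)[i].
H = cat([ForwardDiff.hessian(x -> f(x)[i], x0) for i in 1:n]...; dims=3)
```

This takes n separate Hessian passes, so it is simpler but less efficient than differentiating all components at once.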

@mlubin
Contributor

mlubin commented Sep 29, 2015

No reason this shouldn't be possible. I guess you're the first one to request it.

@mlubin
Contributor

mlubin commented Sep 29, 2015

I'm not too familiar with the terminology in this case. Is it still typically called the Hessian?

@GravityAssisted
Author

It's commonly called a vector Hessian, I think. In our field we just call it the Hessian of a vector function. I'm not sure there is an official name for it. Wikipedia's article on the Hessian matrix has a small section on this called "Vector-valued functions":

https://en.wikipedia.org/wiki/Hessian_matrix

@jrevels
Member

jrevels commented Sep 29, 2015

This isn't currently implemented, but like @mlubin said, it's technically possible. If we do decide to provide a built-in function for this, though, we'd have to do some thinking about how it'd fit in the API. @johnmyleswhite any API opinions on this?

My initial reaction would be to provide a vechessian function, or something like that, rather than putting more burden on hessian.

@mlubin
Contributor

mlubin commented Sep 29, 2015

Agreed on a separate name for this.

@GravityAssisted
Author

This type of Hessian is common in our field of astrodynamics, since gravity is a vector-valued function.

Thanks for looking into this.

@johnmyleswhite
Contributor

+1 to a separate name. No idea what to call these things. For extensibility maybe we should go with an API that passes in types as arguments so we can reuse names more often?

@jrevels
Member

jrevels commented Sep 29, 2015

You mean something like:

hessian(f, x, ::Type{Number}) # expect f to be scalar valued
hessian(f, x, ::Type{Vector}) # expect f to be vector valued

...or did you mean passing in types so that we can better distinguish method signatures with name conflicts (a la ForwardDiff.gradient vs. Calculus.gradient)?

Somewhat off-topic: Funny story, my initial API redesign for ForwardDiff passed all the config info via abstract types (chunk_size ---> ChunkSize{n} etc). Once a couple of options had been added, I tried to switch to keyword args so that users wouldn't have to remember complicated default arg ordering. That's when I found out that keyword arg type inference is wonky and settled on the current API + assertions in problem areas to help out inferencing.

@johnmyleswhite
Contributor

Yeah, that is what I had in mind. Stefan was also talking about the idea of passing the type of differentiation as a type to make things even more generic, but that does start to be a whole lot of types.

Keyword args do need another pass at their design from what I can see.

@cortner

cortner commented Dec 1, 2015

In our case, we would like to take 1st, 2nd, and 3rd derivatives of a Hamiltonian matrix, i.e.,

f(x::Vector) -> Matrix

More generally, all of the differentiation implemented so far covers special cases of

f : d-tensor -> m-tensor

where each derivative increases the dimension of the output tensor. Bearing in mind that I have zero experience with AD, I wonder whether a general abstract setting like this could be implemented, with gradient, hessian, jacobian, etc. just becoming special cases?

Christoph
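One hedged sketch of this general pattern (not a proposed API; tensor_jacobian is a name invented here): flatten the tensor-valued function to a vector-valued one with vec, take an ordinary Jacobian, and reshape, so each derivative appends the input dimensions to the output tensor:

```julia
using ForwardDiff

g(X) = X * X'  # matrix -> matrix, a stand-in for a Hamiltonian

# Flatten g to a vector-valued function of a vector, differentiate,
# then restore the tensor shape: output dims first, then input dims.
function tensor_jacobian(g, X)
    gvec = x -> vec(g(reshape(x, size(X))))
    J = ForwardDiff.jacobian(gvec, vec(X))
    return reshape(J, size(g(X))..., size(X)...)
end

D = tensor_jacobian(g, [1.0 2.0; 3.0 4.0])  # a 2×2×2×2 tensor
```

Iterating the same trick on tensor_jacobian itself would give the higher derivatives, each one adding the input dimensions again.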

@shabbychef

One way to implement the multivariate case is to represent the derivative simply as a matrix, with rows equal to the total number of elements in the dependent variable and columns equal to the number of elements in the independent variable (so-called "numerator layout"). The user (or intermediary code) has to interpret this when needed in terms of the original dimensionality of the independent and dependent variables, but it is not too much of a pain. This does not, I think, get you much closer to computing higher-order derivatives. In this scheme the second derivative would be represented as a 3-dimensional array, without taking advantage of the symmetry in the second and third dimensions.

Full disclosure: I've implemented a multivariate AD system in R (and am lurking here looking for good ideas to steal), and I considered supporting higher-order derivatives. I concluded this would require computing arbitrary-order derivatives for every method I supported on my objects, which was unpleasant at best. For element-wise operations I could rely on elementary calculus, but computing, say, the second derivative of the matrix inverse operation was too hairy for my tastes. I abandoned higher-order derivatives in favor of supporting multivariate operations. I am hoping the ForwardDiff devs find a clever solution to this problem.
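The numerator-layout convention described above can be sketched with plain reshapes (the numbers here are placeholders, not real derivatives):

```julia
# A 2×3 dependent variable differentiated w.r.t. a 4-vector is stored flat
# as a (2*3) × 4 matrix; the caller reshapes to recover tensor indexing.
J_flat = reshape(collect(1.0:24.0), 6, 4)
J_tensor = reshape(J_flat, 2, 3, 4)

# Element (1,2) of the output, input coordinate 4: the column-major
# linear index of (1,2) in a 2×3 array is 3, so these entries agree.
J_tensor[1, 2, 4] == J_flat[3, 4]  # true
```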

@jrevels
Member

jrevels commented Jun 27, 2016

So I revisited this with the latest version of ForwardDiff and, lo and behold:

using ForwardDiff

f(x) = x./(sqrt(x[1]^2+x[2]^2+x[3]^3))

function vector_hessian(f, x)
    n = length(x)
    out = ForwardDiff.jacobian(x -> ForwardDiff.jacobian(f, x), x)
    return reshape(out, n, n, n)
end

julia> vector_hessian(f, [1,2,3])
3×3×3 Array{Float64,3}:
[:, :, 1] =
 -0.0160549  -0.0100127   -0.067586
 -0.0100127  -0.00345267   0.0139833
 -0.0150191   0.0031074    0.0154507

[:, :, 2] =
 -0.0100127   -0.00345267   0.0139833
 -0.00345267  -0.0290024   -0.046611
  0.0031074   -0.010358     0.0309014

[:, :, 3] =
 -0.067586    0.0139833   0.0446689
  0.0139833  -0.046611    0.0893378
  0.0154507   0.0309014  -0.0151486

@GravityAssisted Is this the kind of output you expected here? If this delivers the correct result, you can tweak vector_hessian to your liking using ForwardDiff.jacobian/ForwardDiff.jacobian! (e.g., to support f!(y, x) as well as f(x)).

@timholy
Contributor

timholy commented Apr 17, 2024

Note that a better way to write that reshape of the Hessian of a vector-valued function is

reshape(out, (eachindex(fx), eachindex(x), eachindex(x)))

where fx = f(x). This makes it clearer that the first index is for the output of f, and the last two are for the coordinates of x.
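A minimal illustration (plain Base Julia with placeholder data, not the real derivative output) that reshape accepts index ranges as dimensions, which is what the axes-based reshape above relies on:

```julia
fx = zeros(3)            # stands in for f(x)
x = zeros(3)
out = collect(1.0:27.0)  # stands in for the flat jacobian-of-jacobian output

# eachindex of a Vector is Base.OneTo(n), which reshape accepts as a dimension.
H = reshape(out, (eachindex(fx), eachindex(x), eachindex(x)))
size(H)  # (3, 3, 3): first index for the output of f, last two for x
```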
