Preparation for second order #86

Open
gdalle opened this issue Mar 21, 2024 · 15 comments
Labels
backend: Related to one or more autodiff backends · core: Related to the core utilities of the package

Comments

gdalle (Owner) commented Mar 21, 2024

Constructing the right extras becomes very tricky when different inner/outer backends must be called in various ways on closure functions.

gdalle added the backend and core labels on Mar 28, 2024
gdalle (Owner, Author) commented Apr 2, 2024

So here's where I'm at. The typical structure of a second-order operator is:

function second_order_operator(f, backend::SecondOrder, x)
    # inner differentiation, called on values z produced by the outer operator
    function inner_operator_closure(z)
        inner_extras = prepare_inner_operator(f, inner(backend), z)
        return inner_operator(f, inner(backend), z, inner_extras)
    end
    # outer differentiation through the inner closure
    outer_extras = prepare_outer_operator(inner_operator_closure, outer(backend), x)
    return outer_operator(inner_operator_closure, outer(backend), x, outer_extras)
end

It's hard to prepare the extras because

  • the inner operator is called on a variable z generated during the outer operator, so it may not have the same type as x. Typically, it might be a vector of Duals instead of a vector of numbers (see the sketch after this list).
  • the outer operator's extras depend on what is called, in this case a prepared inner operator closure (if the closure is not prepared, it might take a different code path, which means the outer operator's tape would be wrong).
  • in one case (reverse-over-forward HVP), the inner operator closure closes over the vector v in addition to the rest, so the preparation signature may need to look different.
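
A quick illustration of the first point, sketched with ForwardDiff as a hypothetical outer backend (the helper peek is made up for the example):

using ForwardDiff

function peek(z)
    @show eltype(z)  # a ForwardDiff.Dual type, not Float64
    return sum(abs2, z)
end

# differentiating through peek hands it a vector of Duals
ForwardDiff.derivative(t -> peek([t, 2t]), 1.0)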

My current suggested workflow for preparation (disregarding the v thing for now):

  1. Define a function wrapper InputCopier which deepcopies and stores the first thing on which it is called, so that we can see what z looks like inside the outer operator (sketched after this list)
  2. Define the inner operator closure
  3. Wrap it in an InputCopier
  4. Call the outer operator on this, now we have the type of z
  5. Prepare the inner operator closure with z
  6. Prepare the outer operator on the prepared inner operator closure
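
A minimal sketch of what step 1's InputCopier could look like (hypothetical code, not an actual implementation from the package):

# Hypothetical InputCopier: deepcopies and stores the first input it sees,
# then delegates to the wrapped function.
mutable struct InputCopier{F}
    const f::F
    stored_input::Any  # filled on the first call
end

InputCopier(f) = InputCopier(f, nothing)

function (ic::InputCopier)(z)
    if isnothing(ic.stored_input)
        ic.stored_input = deepcopy(z)
    end
    return ic.f(z)
end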

gdalle (Owner, Author) commented Apr 3, 2024

Actually this will not work because

  • some backends don't even call the underlying function as-is
  • some only call it during their own preparation step

gdalle (Owner, Author) commented Apr 3, 2024

Partially solved by #135, where the outer differentiation is prepared but not the inner one. I think it is close to optimal.
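
In sketch form, reusing the pseudocode names from the block above (a restatement of that sentence, not a quote from #135):

function second_order_operator(f, backend::SecondOrder, x)
    # the inner closure stays unprepared: preparing it would require knowing z's type
    inner_operator_closure(z) = inner_operator(f, inner(backend), z)
    # only the outer differentiation is prepared
    outer_extras = prepare_outer_operator(inner_operator_closure, outer(backend), x)
    return outer_operator(inner_operator_closure, outer(backend), x, outer_extras)
end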

adrhill (Collaborator) commented Apr 3, 2024

I see how it is difficult for us to provide default fallbacks for the inner preparation.

How about allowing people to manually deal with the inner preparation by adding an inner_XYZ_extras field to the HVPExtras and defaulting to NoXYZExtras?
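
Something like the following, with all names illustrative:

# Illustrative sketch only: an HVP extras type carrying an optional,
# user-supplied inner gradient preparation.
struct NoGradientExtras end  # placeholder default

struct HVPExtras{OE,IE}
    outer_extras::OE            # the outer preparation we already build
    inner_gradient_extras::IE   # manual inner preparation, NoGradientExtras() by default
end

HVPExtras(outer_extras) = HVPExtras(outer_extras, NoGradientExtras())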

gdalle (Owner, Author) commented Apr 5, 2024

Possibly, but that would be a very advanced use, and my take is that plenty of things will fail when people first try out the HVP, so optimizing performance that way is not high-priority for me.

Besides, for reverse-mode backends which do not require preparation and work out of place (Zygote, Tracker), this is already optimal.

gdalle (Owner, Author) commented May 30, 2024

In the end I think the easiest approach is to have a mutable extras prepared on the first run, like so: https://discourse.julialang.org/t/second-order-autodiff-which-combinations-should-work/114892/12

adrhill (Collaborator) commented May 30, 2024

This sounds reasonable to me. To play the devil's advocate: on which backends are mutable extras doable and more performant than allocating new extras?

gdalle (Owner, Author) commented May 30, 2024

I really can't think of any scenario where modifying a field of a mutable struct is more costly than essentially re-creating that field from scratch.

adrhill (Collaborator) commented May 30, 2024

Sure, but for which backends is it possible?

(And while it might not be more costly, it should be strictly less costly to warrant the increase in code complexity.)

gdalle (Owner, Author) commented May 30, 2024

It is doable on all backends. It's not the extras itself that you mutate, just a field of a wrapper. Here's an example:

mutable struct InnerGradientWrapper{F,B}
    const f::F
    const backend::B
    extras::Union{Nothing,GradientExtras}  # type-unstable; nothing until the first call
end

function (igw::InnerGradientWrapper)(x::AbstractVector)
    # prepare lazily on the first call, then reuse the stored extras
    if isnothing(igw.extras)
        igw.extras = prepare_gradient(igw.f, igw.backend, x)
    end
    return gradient(igw.f, igw.backend, x, igw.extras)
end
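
For instance (a hedged usage sketch, assuming AutoForwardDiff from ADTypes as an example inner backend):

# assumed imports; InnerGradientWrapper is the struct defined above
using DifferentiationInterface, ADTypes, ForwardDiff

f(x) = sum(abs2, x)
igw = InnerGradientWrapper(f, AutoForwardDiff(), nothing)

igw([1.0, 2.0, 3.0])  # first call: prepares and stores the extras
igw([4.0, 5.0, 6.0])  # later calls reuse them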

gdalle (Owner, Author) commented May 30, 2024

I'm just wondering how much the type instability will hurt us here

gdalle linked a pull request on May 31, 2024 that will close this issue
gdalle (Owner, Author) commented May 31, 2024

Tried it in #291, but the problem is that changing this extras object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid.

adrhill (Collaborator) commented May 31, 2024

> I'm just wondering how much the type instability will hurt us here

How about the following?

mutable struct InnerGradientWrapper{F,B,E<:Union{Nothing,GradientExtras}}
    const f::F
    const backend::B
    extras::E  # E is fixed per instance, so field access is type-stable
end

adrhill (Collaborator) commented May 31, 2024

> Tried it in #291 but the problem is that changing this extras object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid

Could you give an example? This is not clear to me from reading the diff in #291, and the PR contains no further comments.

gdalle (Owner, Author) commented May 31, 2024

It's the same discussion we have had for SparseConnectivityTracer and in #252. The InnerGradientWrapper is a closure that changes its state between calls, so reusing preparation is invalid for the outer backend, which differentiates through it.
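
To illustrate the hazard with a toy example (hypothetical, not actual package code): a taping outer backend records the code path of the first call, but a stateful closure takes a different path afterwards.

# A closure whose control flow differs between the first call (the
# "preparation" branch) and every later call. A tape recorded on call 1
# no longer matches the path taken on call 2, even though the values agree.
mutable struct Stateful
    prepared::Bool
end

function (s::Stateful)(x)
    if !s.prepared
        s.prepared = true
        return sum(abs2, x)  # path taken on the first (recorded) call
    end
    return x' * x            # same value, different code path
end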
