
Closures capture storage in extras #252

Closed
timholy opened this issue May 10, 2024 · 18 comments
Labels: core (Related to the core utilities of the package), wontfix (This will not be worked on)

Comments

timholy commented May 10, 2024

If you prepare a project with both DifferentiationInterface and ForwardDiff:

using DifferentiationInterface
using ForwardDiff

struct FC{F,C,LT}
    f::F
    c::C
    Lx::LT

    function FC(f, c)
        # Define the Lagrange function.
        # Compared to defining it locally in downstream "consumer" functions,
        # this ensures it's the same function-type each time. Aside from
        # reducing useless re-compilation, the main impact is that we can use
        # DifferentiationInterface's `prepare_*` functions to get better autodiff
        # performance.
        function Lx(λ)
            x -> f(x) - λ'*c(x)
        end
        new{typeof(f),typeof(c),typeof(Lx)}(f, c, Lx)
    end
end

# Powell's Maratos-effect test problem
f(x) = 2 * (x[1]^2 + x[2]^2 - 1) - x[1]
c(x) = [x[1]^2 + x[2]^2 - 1]
fc = FC(f, c)

θ = π/12
x = [cos(θ), sin(θ)]
λ = [3/2]
λprep = [0.2]

backend = SecondOrder(AutoForwardDiff(), AutoForwardDiff())
extrasL = prepare_hessian(fc.Lx(λprep), backend, x)
H1 = hessian(fc.Lx(λ), backend, x)
H2 = hessian(fc.Lx(λ), backend, x, extrasL)
copyto!(λprep, λ)
H3 = hessian(fc.Lx(λ), backend, x, extrasL)
@assert H1 ≈ H2
@assert H1 ≈ H3

yields


julia> H1 = hessian(fc.Lx(λ), backend, x)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> H2 = hessian(fc.Lx(λ), backend, x, extrasL)
2×2 Matrix{Float64}:
 3.6  0.0
 0.0  3.6

julia> copyto!(λprep, λ)
1-element Vector{Float64}:
 1.5

julia> H3 = hessian(fc.Lx(λ), backend, x, extrasL)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> @assert H1 ≈ H2
ERROR: AssertionError: H1 ≈ H2
Stacktrace:
 [1] top-level scope
   @ REPL[55]:1

julia> @assert H1 ≈ H3

The issue seems to be that the specific storage in λprep somehow gets captured in extrasL, corrupting the answer.
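
A minimal check of that, reusing the definitions above (Julia closures store their captured variables as fields, so the field access below is purely illustrative):

g = fc.Lx(λprep)
g.λ === λprep        # true: the closure holds a reference to λprep, not a copy
copyto!(λprep, λ)
g(x)                 # now evaluates f(x) - λ'*c(x), even though g was built from λprep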

I've also tried the following:

    # backend = SecondOrder(AutoEnzyme(Enzyme.Forward), AutoEnzyme(Enzyme.Reverse))
    # backend = SecondOrder(AutoForwardDiff(), AutoEnzyme(Enzyme.Reverse))
    # backend = SecondOrder(AutoForwardDiff(), AutoZygote())

Enzyme fails outright (ERROR: Attempting to call an indirect active function whose runtime value is inactive...); Zygote gives the same wrong numerical result.

gdalle (Owner) commented May 10, 2024

I think that happens because the HVP preparation (used in the extras) itself defines a closure using f:

function prepare_hvp_aux(f::F, backend::SecondOrder, x, v, ::ForwardOverForward) where {F}
    # pushforward of many pushforwards in theory, but pushforward of gradient in practice
    inner_gradient_closure(z) = gradient(f, inner(backend), z)
    outer_pushforward_extras = prepare_pushforward(
        inner_gradient_closure, outer(backend), x, v
    )
    return ForwardOverForwardHVPExtras(inner_gradient_closure, outer_pushforward_extras)
end

I'm not sure there is anything I can do about it. The extras are specific to a given pair (f, backend), so if you modify the function (even through a sneaky closure over λprep), I feel like you deserve your fate?

As for the other backends, not every pair is compatible with SecondOrder. We already knew that for a few pairs, but DI allows more systematic testing, and thus more systematic disappointment.

timholy (Author) commented May 10, 2024

Presumably it should be documented in prepare_hessian and any similarly-affected functions? "I feel like you deserve your fate" applies only to people who understand enough about the internals to realize this is a predictable problem.

timholy (Author) commented May 10, 2024

It's perhaps also worth pointing out that this interacts with #206. If you can't efficiently differentiate L(x), then a good alternative is to compute the "hessian" of both f and c (note that c is vector-valued, because usually you have many constraints), and from that you can contract with λ.
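
Concretely, something like this sketch (reusing f, c, λ, x, and backend from above; the per-component closures are just for illustration and wastefully recompute c):

cs = [x -> c(x)[i] for i in eachindex(λ)]            # scalar components of c
prep_f = prepare_hessian(f, backend, x)
prep_cs = [prepare_hessian(cs[i], backend, x) for i in eachindex(λ)]

function hessian_of_lagrangian(λ, x)
    H = hessian(f, backend, x, prep_f)               # none of these preparations depend on λ
    for i in eachindex(λ)
        H -= λ[i] * hessian(cs[i], backend, x, prep_cs[i])
    end
    return H
end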

gdalle (Owner) commented May 10, 2024

I might have been a bit unfair to you, but I had also never imagined someone would modify the function between runs and expect the result to stay the same.
How would you phrase it more precisely than the current doc?

The preparation prepare_operator(f, backend, x) will create an object called extras containing the necessary information to speed up operator and its variants. This information is specific to backend and f, as well as the type and size of the input x and the control flow within the function, but it should work with different values of x.
The extras object is nearly always mutated when given to an operator, even when said operator does not have a bang ! in its name.

timholy (Author) commented May 10, 2024

julia> using DifferentiationInterface

help?> prepare_hessian
search: prepare_hessian

  prepare_hessian(f, backend, x) -> extras

  Create an extras object subtyping HessianExtras that can be given to Hessian operators.

No hint of danger there.

help?> DifferentiationInterface.HessianExtras
  HessianExtras

  Abstract type for additional information needed by Hessian operators.

Nor there.

gdalle (Owner) commented May 10, 2024

Basically all of DI treats the function f as a fixed object. For instance we don't differentiate with respect to its internal fields, only with respect to x.

gdalle (Owner) commented May 10, 2024

The preparation prepare_operator(f, backend, x) will create an object called extras containing the necessary information to speed up operator and its variants. This information is specific to backend and f, as well as the type and size of the input x and the control flow within the function, but it should work with different values of x.

Should I copy this warning into each preparation function? Do you have a better wording?

gdalle (Owner) commented May 10, 2024

How about

"If the function f changes in any way, the result of preparation will be invalidated, and you will need to run it again"

timholy (Author) commented May 10, 2024

julia> hc = ForwardDiff.HessianConfig(fc.Lx(λprep), x);

julia> ForwardDiff.hessian(fc.Lx(λ), x, hc)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> ForwardDiff.hessian(fc.Lx(λ), x)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> ForwardDiff.hessian(fc.Lx(λprep), x)
2×2 Matrix{Float64}:
 3.6  0.0
 0.0  3.6

Works just fine.

timholy (Author) commented May 10, 2024

Also relevant:

julia> typeof(fc.Lx(λ))
var"#3#5"{Vector{Float64}, typeof(f), typeof(c)}

julia> typeof(fc.Lx(λprep))
var"#3#5"{Vector{Float64}, typeof(f), typeof(c)}

There is literally nothing different about the type.

gdalle (Owner) commented May 10, 2024

It's not about the type: preparation is specific to the actual function object. And when it's a closure or functor, the contract is that if you change it, you re-prepare the operator.
I thought that was natural, I was wrong, so I'll add it to the docstrings.
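
E.g., with the example from the top of this issue, the supported pattern would be (sketch, same names as above):

Lλ = fc.Lx(λ)                              # build a fresh closure for the current λ
extrasL = prepare_hessian(Lλ, backend, x)  # ...and re-prepare for that exact object
H = hessian(Lλ, backend, x, extrasL)       # consistent, because the extras match Lλ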

timholy (Author) commented May 10, 2024

It is a little unfortunate: if you're doing constrained optimization, λ is likely to change on every iteration. Presumably there is no point in preparation?

adrhill (Collaborator) commented May 10, 2024

While this is about forward-mode, changing memory used by functions is more generally an issue in reverse-mode AD, not just with the preparation of extras:

julia> using DifferentiationInterface

julia> import Zygote

julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> f(x) = sum(A * x);

julia> x = [1.0, 1.0];

julia> gradient(f, AutoZygote(), x)
2-element Vector{Float64}:
 4.0
 6.0

julia> fill!(A, 1.0)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

julia> gradient(f, AutoZygote(), x)
2-element Vector{Float64}:
 2.0
 2.0

or in pure Zygote:

julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> y, pb = Zygote.pullback(f, x)
(10.0, Zygote.var"#75#76"{Zygote.Pullback{Tuple{typeof(f), Vector{Float64}}, Tuple{Zygote.ZBack{ChainRules.var"#times_pullback#1476"{Matrix{Float64}, Vector{Float64}}}, Zygote.var"#2989#back#768"{Zygote.var"#762#766"{Vector{Float64}}}, Zygote.var"#1986#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, Matrix{Float64}}}}}}((f)))

julia> pb(1.0) # evaluate pullback function to compute gradient
([4.0, 6.0],)

julia> fill!(A, 1.0)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

julia> pb(1.0)
([2.0, 2.0],)

timholy (Author) commented May 10, 2024

Yep. There's an obvious hack to get around this (copy λ into a constant buffer), but of course that introduces new concerns about thread-safety.
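
For the record, that hack looks roughly like this (a sketch; λbuf is shared mutable state, hence the thread-safety concern, and while it works with the ForwardDiff setup above, which is what the H3 check shows, nothing in the contract being discussed guarantees it):

const λbuf = zeros(length(λ))              # one shared buffer, captured once
Lbuf = x -> f(x) - λbuf' * c(x)
extras_buf = prepare_hessian(Lbuf, backend, x)

copyto!(λbuf, λ)                           # update the multipliers in place
H = hessian(Lbuf, backend, x, extras_buf)  # the prepared closure sees the updated λbuf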

adrhill (Collaborator) commented May 10, 2024

Unfortunately, there is nothing DI can do about this when reverse-mode AD backends use mutable Wengert lists.

timholy (Author) commented May 10, 2024

TBH this is the motivating issue for #206. I suspect that constrained optimization is far and away the most common application of second-order expansions of vector-valued functions, so a good alternative would be to address that issue.

gdalle (Owner) commented May 10, 2024

If λ changes at each iteration, the only solution I see is to make it part of the variable x. Allowing kwargs would not be supported by every backend, far from it.
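
A sketch of that, reusing the names from above (n = length(x); wasteful because the λ-λ block of the Hessian is identically zero):

n = length(x)
Lfull(xλ) = f(xλ[1:n]) - xλ[n+1:end]' * c(xλ[1:n])

xλ = vcat(x, λ)
extras_full = prepare_hessian(Lfull, backend, xλ)
Hfull = hessian(Lfull, backend, xλ, extras_full)
Hxx = Hfull[1:n, 1:n]                      # the ∇²ₓₓL block

Since λ now lives in the differentiated input, extras_full stays valid as λ changes.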

timholy (Author) commented May 10, 2024

That might be workable. What you genuinely need is $\nabla^2_{x,\lambda} L$ (see https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf#page=551), so you don't actually need keywords. But it's very wasteful to compute the second derivatives in $\lambda$, because $L$ is linear in $\lambda$.
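
For reference, with $L(x,\lambda) = f(x) - \lambda^\top c(x)$ the full Hessian has the block structure

$$\nabla^2_{(x,\lambda)} L = \begin{bmatrix} \nabla^2 f(x) - \sum_i \lambda_i \nabla^2 c_i(x) & -J_c(x)^\top \\ -J_c(x) & 0 \end{bmatrix},$$

so the only $\lambda$-dependence is linear (upper-left block), and the $\lambda\lambda$ block is identically zero.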

#206 really is more fundamental.
