Closures capture storage in extras (#252)
I think that happens because the HVP preparation (used in the extras) itself defines a closure, see `DifferentiationInterface.jl/DifferentiationInterface/src/second_order/hvp.jl`, lines 67 to 74 at commit 16d93ef.
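Roughly, the pattern looks like the following. This is a hypothetical paraphrase, not the actual source: the function name `sketch_prepare_hvp` is invented, and only the closure-capturing structure is the point.

```julia
using DifferentiationInterface

# Hypothetical paraphrase, NOT DI's actual code: forward-over-reverse HVP
# preparation builds an inner-gradient closure. That closure captures `f`
# as it exists at preparation time, so any storage `f` has itself captured
# (e.g. a multiplier vector) gets frozen into the prepared object.
function sketch_prepare_hvp(f, inner_backend, outer_backend, x, v)
    inner_gradient(z) = gradient(f, inner_backend, z)  # closure over `f`
    return prepare_pushforward(inner_gradient, outer_backend, x, v)
end
```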
I'm not sure there is anything I can do about it. As for the other backends, not every pair is compatible with each other.
Presumably it should be documented in the docstrings of the preparation functions.
It's perhaps also worth pointing out that this interacts with #206. If you can't efficiently differentiate a closure whose captured data changes, you need some other way to pass that data in.
I might have been a bit unfair to you, but I had also never imagined someone would modify the function between runs and expect the result to stay the same.
```julia
julia> using DifferentiationInterface

help?> prepare_hessian
search: prepare_hessian

  prepare_hessian(f, backend, x) -> extras

  Create an extras object subtyping HessianExtras that can be given to Hessian operators.
```

No hint of danger there. Nor there.
Basically, all of DI treats the function `f` as a fixed object. For instance, we don't differentiate with respect to its internal fields, only with respect to `x`.
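As a quick illustration (the functor `Scaled` below is invented for this example), the internal field is treated as a constant:

```julia
# Minimal sketch: `Scaled` is a made-up functor with an internal field θ.
# DI differentiates only with respect to x, never with respect to θ.
using DifferentiationInterface
import ForwardDiff

struct Scaled{T}
    θ::T
end
(s::Scaled)(x) = s.θ * sum(abs2, x)

s = Scaled(2.0)
gradient(s, AutoForwardDiff(), [1.0, 2.0])  # [4.0, 8.0]; θ is held fixed
```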
Should I copy this warning into each preparation function? Do you have a better wording?
How about "If the function `f` is modified after preparation, the `extras` object is no longer valid"?
```julia
julia> hc = ForwardDiff.HessianConfig(fc.Lx(λprep), x);

julia> ForwardDiff.hessian(fc.Lx(λ), x, hc)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> ForwardDiff.hessian(fc.Lx(λ), x)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> ForwardDiff.hessian(fc.Lx(λprep), x)
2×2 Matrix{Float64}:
 3.6  0.0
 0.0  3.6
```

Works just fine.
Also relevant:

```julia
julia> typeof(fc.Lx(λ))
var"#3#5"{Vector{Float64}, typeof(f), typeof(c)}

julia> typeof(fc.Lx(λprep))
var"#3#5"{Vector{Float64}, typeof(f), typeof(c)}
```

There is literally nothing different about the type.
It's not about the type: preparation is specific to the actual function object. And when it's a closure or functor, the contract is that if you change it, you re-prepare the operator.
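A minimal sketch of that contract, assuming the extras-last operator signatures from this era of DI (`prepare_hessian(f, backend, x)` and `hessian(f, backend, x, extras)`) and a made-up closure factory `make_L`:

```julia
# Sketch of the re-preparation contract. `make_L` is a hypothetical
# closure factory capturing a multiplier vector λ.
using DifferentiationInterface
import ForwardDiff

make_L(λ) = x -> sum(abs2, x) + sum(λ .* x)

x = [1.0, 2.0]
f = make_L([0.5, 0.5])
extras = prepare_hessian(f, AutoForwardDiff(), x)
hessian(f, AutoForwardDiff(), x, extras)             # valid: f matches extras

f2 = make_L([1.0, 1.0])                              # λ changed => new function object,
extras2 = prepare_hessian(f2, AutoForwardDiff(), x)  # so re-prepare
hessian(f2, AutoForwardDiff(), x, extras2)
```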
It is a little unfortunate: if you're doing constrained optimization, λ changes at every iteration, so you would have to re-prepare every time.
While this is about forward mode, changing memory used by functions is more generally an issue in reverse-mode AD, not just with the preparation of extras:

```julia
julia> using DifferentiationInterface

julia> import Zygote

julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> f(x) = sum(A * x);

julia> x = [1.0, 1.0];

julia> gradient(f, AutoZygote(), x)
2-element Vector{Float64}:
 4.0
 6.0

julia> fill!(A, 1.0)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

julia> gradient(f, AutoZygote(), x)
2-element Vector{Float64}:
 2.0
 2.0
```

or in pure Zygote:

```julia
julia> A = [1.0 2.0; 3.0 4.0]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia> y, pb = Zygote.pullback(f, x)
(10.0, Zygote.var"#75#76"{Zygote.Pullback{Tuple{typeof(f), Vector{Float64}}, Tuple{Zygote.ZBack{ChainRules.var"#times_pullback#1476"{Matrix{Float64}, Vector{Float64}}}, Zygote.var"#2989#back#768"{Zygote.var"#762#766"{Vector{Float64}}}, Zygote.var"#1986#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, Matrix{Float64}}}}}}(∂(f)))

julia> pb(1.0) # evaluate pullback function to compute gradient
([4.0, 6.0],)

julia> fill!(A, 1.0)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

julia> pb(1.0)
([2.0, 2.0],)
```
Yep. There's an obvious hack to get around this (copy the captured array before differentiating).
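A sketch of that hack, assuming "copy" means snapshotting the captured array when the closure is built, so later mutation cannot reach into an existing pullback:

```julia
# Sketch of the copy workaround: the closure owns a snapshot of A,
# so mutating the original A afterwards cannot corrupt the pullback.
import Zygote

A = [1.0 2.0; 3.0 4.0]
f_snapshot = let B = copy(A)
    x -> sum(B * x)
end

y, pb = Zygote.pullback(f_snapshot, [1.0, 1.0])
fill!(A, 1.0)   # does not touch the captured copy B
pb(1.0)         # still ([4.0, 6.0],)
```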
Unfortunately, there is nothing DI can do about this when reverse-mode AD backends use mutable Wengert lists.
TBH this is the motivating issue for #206. I suspect that constrained optimization is far and away the most common application of second-order expansions of vector-valued functions, so a good alternative would be to address that issue.
If `λ` changes at each iteration, the only solution I see is to make it part of the variable `x`. Allowing kwargs would not be supported by every backend, far from it.
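A minimal sketch of that suggestion (the Lagrangian `L` and the split sizes are invented for illustration, and the extras-last operator signature is assumed as above): concatenate `λ` onto `x`, so the prepared operator sees one fixed function and one changing input vector.

```julia
# Sketch: fold λ into the differentiation variable so preparation stays
# valid while λ changes. `L` is a made-up Lagrangian; n is the number of
# primal variables in this example.
using DifferentiationInterface
import ForwardDiff

n = 2
L(xλ) = sum(abs2, @view xλ[1:n]) + sum(@view(xλ[n+1:end]) .* @view(xλ[1:n]))

xλ = [1.0, 2.0, 0.5, 0.5]                  # [x; λ]
extras = prepare_hessian(L, AutoForwardDiff(), xλ)
H = hessian(L, AutoForwardDiff(), xλ, extras)
Hxx = H[1:n, 1:n]                          # the x-x block is what you need
```

At the next iteration you overwrite `xλ[n+1:end]` with the new multipliers and reuse the same `extras`, since `λ` is now part of the input rather than part of the function.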
That might be workable. But what you genuinely need is #206, which really is more fundamental.
If you prepare a project with both DifferentiationInterface and ForwardDiff:

[...]

yields

[...]

The issue seems to be that the specific storage in `λprep` somehow gets captured in `extras`, corrupting the answer. I've also tried the following:

Enzyme fails outright (`ERROR: Attempting to call an indirect active function whose runtime value is inactive...`). Zygote gives the same numeric error.