Description
Following an initially-unrelated discussion with @andrewrosemberg, I think there might be room for (substantial 🤔🤞) performance improvements in the NonLinearProgram code.
Note: what I write below might apply to other classes of problems; I have only checked the nonlinear code so far.
TLDR: I think we can replace {a lot of linear system solves} with {a single linear system solve} during forward/reverse diff
Medium-long explanation:
Forward/reverse diff compute Jacobian-vector products of the form J*v or J'*v, where the Jacobian J has the form K \ N for matrices K, N. Currently, to obtain w = J*v, we solve K*J = N (a matrix right-hand side, i.e. one solve per column of N), then compute J*v. I believe we can instead compute w = J*v directly by solving K*w = (N*v), which requires a single linear system solve (with a vector right-hand side).
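A minimal sketch of the identity on a toy problem (the names `K`, `N`, `v` are placeholders standing in for the real matrices, not DiffOpt internals): solving for the full Jacobian and then multiplying gives the same vector as a single solve against `N*v`:

```julia
using LinearAlgebra

# Toy dimensions standing in for the real problem (K is n×n, N is n×p).
n, p = 100, 20
K = randn(n, n) + n * I   # shifted to keep the toy system well-conditioned
N = randn(n, p)
v = randn(p)

# Current approach: solve K*J = N for the full Jacobian (p vector solves),
# then multiply by v.
J = K \ N
w_full = J * v

# Proposed approach: one vector solve against the combined RHS N*v.
w_single = K \ (N * v)

@assert w_full ≈ w_single
```

The two paths agree up to floating-point error; the second never materializes J.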
Why it matters: for the ACOPF example I work with, a 300-bus case with loads as parameters, K has size 13k x 13k and N is 13k x 200. Replacing K \ N with K \ (N*v) potentially yields a 200x reduction in memory and time.
Complete thought process and stack traversal:
I'm focusing on forward diff for the sake of example; the same remarks apply to reverse diff.
- Forward diff returns primal/dual sensitivities in a `ForwCache` structure here:
DiffOpt.jl/src/NonLinearProgram/NonLinearProgram.jl
Lines 534 to 537 in df4a7dc
```julia
model.forw_grad_cache = ForwCache(;
    primal_Δs = Dict(model.cache.primal_vars .=> primal_Δs),
    dual_Δs = dual_Δs,
)
```
- These are obtained from a left multiplication with the Jacobian here:
DiffOpt.jl/src/NonLinearProgram/NonLinearProgram.jl
Lines 530 to 532 in df4a7dc
```julia
# Extract primal and dual sensitivities
primal_Δs = Δs[1:length(model.cache.primal_vars), :] * Δp  # Exclude slacks
dual_Δs = Δs[cache.index_duals, :] * Δp  # Includes constraints and bounds
```
- Said Jacobian `Δs` is computed by the `_compute_sensitivity` function. Stepping into that call, the Jacobian is obtained from `_compute_derivatives_no_relax`:
```julia
∂s, K, N = _compute_derivatives_no_relax(
```
- which then points us to a linear system solve:
DiffOpt.jl/src/NonLinearProgram/nlp_utilities.jl
Lines 425 to 427 in df4a7dc
```julia
∂s = zeros(size(M, 1), size(N, 2))
# ∂s = - (K \ N) # Sensitivity
ldiv!(∂s, K, N)
```
The alternative implementation I propose would cache the factorization of K (which already seems to be the case) and lazily compute Jacobian-vector products w = J*v by solving K*w = N*v. The math would be similar (I think) for Jacobian-transpose-vector products: J'*v = N' * (K' \ v), reusing the same factorization.
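A hedged sketch of what the lazy version could look like. The struct and the names `lazy_jvp`/`lazy_vjp` are my own illustration, not DiffOpt API; the minus sign mirrors the `∂s = -(K \ N)` convention in the comment above, and `lu` is just one choice of cached factorization:

```julia
using LinearAlgebra

# Factor K once and reuse it for every Jacobian-vector product.
struct LazySensitivity{F,M}
    Kfact::F   # cached factorization of K
    N::M
end

LazySensitivity(K::AbstractMatrix, N::AbstractMatrix) = LazySensitivity(lu(K), N)

# w = J*v with J = -(K \ N): one vector solve instead of a matrix solve.
lazy_jvp(S::LazySensitivity, v) = -(S.Kfact \ (S.N * v))

# u = J'*v = -(N' * (K' \ v)): the transpose solve reuses the same factorization.
lazy_vjp(S::LazySensitivity, v) = -(S.N' * (S.Kfact' \ v))

# Check against the dense Jacobian on a toy problem.
n, p = 50, 8
K = randn(n, n) + n * I   # well-conditioned toy system matrix
N = randn(n, p)
S = LazySensitivity(K, N)
J = -(K \ N)

v = randn(p)
u = randn(n)
@assert lazy_jvp(S, v) ≈ J * v
@assert lazy_vjp(S, u) ≈ J' * u
```

With this layout, forward diff would call `lazy_jvp` once per parameter perturbation and reverse diff would call `lazy_vjp` once per output seed, never forming the full `∂s` matrix.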