### Is your feature request related to a problem? Please describe.

Using `optimize.minimize(..., method='l-bfgs-b').hess_inv.todense()` (code) I noticed that its implementation performs two dense matrix-matrix products per stored correction pair, which is slow for large problems. I benchmarked three alternatives (a short sketch of where this operator comes from follows the list):

- modifying `LbfgsInvHessProduct.todense` to avoid matrix-by-matrix multiplications;
- implementing `LbfgsInvHessProduct._matmat` and doing `obj.matmat(eye(*obj.shape))`;
- using the stock implementation of `LbfgsInvHessProduct._matmat`.
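For context, a minimal sketch of how such an operator arises in practice (the Rosenbrock objective here is purely illustrative, not part of the benchmark below):

```python
from scipy import optimize
import numpy as np

# Illustrative only: any objective minimized with L-BFGS-B works the same way.
res = optimize.minimize(optimize.rosen, np.zeros(5),
                        jac=optimize.rosen_der, method='l-bfgs-b')
H_inv = res.hess_inv        # a LbfgsInvHessProduct (a LinearOperator)
dense = H_inv.todense()     # the call whose cost is at issue
```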
Code and benchmark:
```python
from scipy import optimize
import numpy as np

sk, yk = np.random.randn(2, 10, 1000)
invh = optimize.LbfgsInvHessProduct(sk, yk)

def todense(self):
    """Rewrite of LbfgsInvHessProduct.todense avoiding matrix @ matrix products."""
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Hk = np.eye(*self.shape, dtype=self.dtype)
    for i in range(n_corrs):
        # H <-- (I - rho sy') H (I - rho ys') + rho ss'
        #     = H - rho Hys' - rho sy'H + rho^2 sy'Hys' + rho ss'
        #     = H - rho Hys' - rho sy'H + rho(1 + rho y'Hy) ss'
        rhoHys = -rho[i] * np.outer(Hk @ y[i], s[i])
        Hk += rho[i] * (1 + rho[i] * (y[i] @ Hk @ y[i])) * np.outer(s[i], s[i])
        Hk += rhoHys
        Hk += rhoHys.T  # valid because Hk stays symmetric
    return Hk
```
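For readability, the update expanded in the loop comments, written out; the `rhoHys.T` trick in the last line relies on the symmetry of $H$, since $(H y)^{\top} = y^{\top} H$:

$$
H \leftarrow (I - \rho\, s y^{\top})\, H\, (I - \rho\, y s^{\top}) + \rho\, s s^{\top}
= H - \rho\, H y s^{\top} - \rho\, s y^{\top} H + \rho\,\bigl(1 + \rho\, y^{\top} H y\bigr)\, s s^{\top}
$$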
```python
def _matmat(self, X):
    """Rewrite of LbfgsInvHessProduct._matvec broadcasting over the columns of X."""
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Q = np.array(X, dtype=self.dtype, copy=True)
    assert Q.ndim == 2
    alpha = np.empty((n_corrs, Q.shape[1]))
    # Standard two-loop recursion, applied to all columns of X at once.
    for i in range(n_corrs - 1, -1, -1):
        alpha[i] = rho[i] * np.dot(s[i], Q)
        Q = Q - alpha[i] * y[i][:, np.newaxis]
    R = Q
    for i in range(n_corrs):
        beta = rho[i] * np.dot(y[i], R)
        R = R + s[i][:, np.newaxis] * (alpha[i] - beta)
    return R
```
```python
inv1 = invh.todense()
inv2 = todense(invh)
inv3 = _matmat(invh, np.eye(*invh.shape))
np.testing.assert_allclose(inv1, inv2, atol=0, rtol=1e-5)
np.testing.assert_allclose(inv1, inv3, atol=0, rtol=1e-5)

# IPython magics -- run these in an IPython/Jupyter session:
%timeit -r1 invh.todense()
%timeit -r1 todense(invh)
%timeit -r1 _matmat(invh, np.eye(*invh.shape))
%timeit -r1 invh.matmat(np.eye(*invh.shape))
```
Benchmark results (`%timeit -r1`, mean of one run; rows in the order of the `%timeit` calls above):

| implementation | n = 10 | n = 1000 | n = 10000 |
| --- | ---: | ---: | ---: |
| `invh.todense()` (stock) | 76.5 µs | 583 ms | 3 min 10 s |
| `todense(invh)` (rewrite) | 77.2 µs | 109 ms | 8.08 s |
| `_matmat(invh, np.eye(n))` (rewrite) | 56.1 µs | 48.4 ms | 3.99 s |
| `invh.matmat(np.eye(n))` (stock) | 316 µs | 52.5 ms | 3.82 s |
Conclusions: the direct rewrite of `todense` is faster than the stock implementation but still slower than building the dense matrix through the matvec-style two-loop recursion, and the existing `matmat` is slower than the reimplementation for small sizes because it falls back to a Python loop over the columns (code); a sketch of that fallback is below.
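For reference, the generic `LinearOperator` fallback that `matmat` uses when `_matmat` is not overridden boils down to applying `matvec` column by column; a minimal sketch of that pattern (not the verbatim SciPy source):

```python
import numpy as np

def fallback_matmat(op, X):
    # Apply matvec to each column of X in a Python-level loop;
    # the per-column interpreter overhead dominates for small n.
    return np.hstack([op.matvec(col.reshape(-1, 1)) for col in X.T])
```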
I can open a PR for this if there is interest.
### Describe the solution you'd like.

No response

### Describe alternatives you've considered.

No response

### Additional context (e.g. screenshots, GIFs)

No response