ENH: optimize LbfgsInvHessProduct.todense(), 10x speed improvement for size 1000 #18105

@Gattocrucco

Description

Is your feature request related to a problem? Please describe.

Using optimize.minimize(..., method='l-bfgs-b').hess_inv.todense() (code) I noticed that its implementation does $O(n^3)$ operations when $O(n^2)$ would suffice (see the update identity after the list below). I made a few attempts at optimizing it. The methods I considered are:

  1. modifying LbfgsInvHessProduct.todense to avoid matrix-matrix multiplications
  2. implementing LbfgsInvHessProduct._matmat and calling obj.matmat(eye(*obj.shape))
  3. using the stock _matmat that LbfgsInvHessProduct inherits from LinearOperator
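
For context, the identity each correction applies (also spelled out in the code comments below) is the BFGS inverse Hessian update

$$H_{k+1} = \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top, \qquad \rho_k = \frac{1}{y_k^\top s_k}.$$

Expanded, this is $H_k$ plus rank-1 terms built from $s_k$ and $H_k y_k$, so each correction needs only matrix-vector products and outer products, $O(n^2)$, instead of the two dense matrix-matrix products, $O(n^3)$, of the current implementation.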

Code and benchmark:

from scipy import optimize
import numpy as np

sk, yk = np.random.randn(2, 10, 1000)
invh = optimize.LbfgsInvHessProduct(sk, yk)

def todense(self):
    """ Rewrite LbfgsInvHessProduct.todense avoiding matrix @ matrix """
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Hk = np.eye(*self.shape, dtype=self.dtype)

    for i in range(n_corrs):
        # H  <--  (I - rho sy') H (I - rho ys') + rho ss' =
        #       = H - rho Hys' - rho sy'H + rho^2 sy'Hys' + rho ss' =
        #       = H - rho Hys' - rho sy'H + rho(1 + rho y'Hy) ss'
        rhoHys = -rho[i] * np.outer(Hk @ y[i], s[i])
        Hk += rho[i] * (1 + rho[i] * (y[i] @ Hk @ y[i])) * np.outer(s[i], s[i])
        Hk += rhoHys
        Hk += rhoHys.T

    return Hk

def _matmat(self, X):
    """ Rewrite LbfgsInvHessProduct._matvec broadcasting on columns of X """
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Q = np.array(X, dtype=self.dtype, copy=True)
    assert Q.ndim == 2

    alpha = np.empty((n_corrs, Q.shape[1]))

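    # backward pass of the L-BFGS two-loop recursion, vectorized over the columns of X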
    for i in range(n_corrs-1, -1, -1):
        alpha[i] = rho[i] * np.dot(s[i], Q)
        Q = Q - alpha[i]*y[i][:, np.newaxis]

    R = Q
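    # forward pass: rebuild H @ X using the stored alpha coefficients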
    for i in range(n_corrs):
        beta = rho[i] * np.dot(y[i], R)
        R = R + s[i][:, np.newaxis] * (alpha[i] - beta)

    return R

inv1 = invh.todense()
inv2 = todense(invh)
inv3 = _matmat(invh, np.eye(*invh.shape))
np.testing.assert_allclose(inv1, inv2, atol=0, rtol=1e-5)
np.testing.assert_allclose(inv1, inv3, atol=0, rtol=1e-5)

%timeit -r1 invh.todense()
%timeit -r1 todense(invh)
%timeit -r1 _matmat(invh, np.eye(*invh.shape))
%timeit -r1 invh.matmat(np.eye(*invh.shape))

Benchmark result with n = 1000 (timings in the same order as the %timeit calls above):

583 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
109 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
48.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
52.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)

With n = 10:

76.5 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
77.2 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
56.1 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
316 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1,000 loops each)

With n = 10000:

3min 10s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
8.08 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.99 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.82 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Conclusions: the direct rewrite of todense is slower than going through the matmat operation, and the stock matmat is slower than the reimplementation for small sizes because it loops over the columns in Python, one _matvec call each (code).
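
For reference, with the _matmat above in place, option 2 boils down to a one-line todense. A minimal sketch, reusing the session above (todense_via_matmat is an illustrative name, not existing scipy API; inside the class it would be a method calling self._matmat):

def todense_via_matmat(self):
    """ Sketch of option 2: densify the operator by applying it to the identity """
    # identity in the operator's own dtype; _matmat then does O(n_corrs * n^2) work
    return _matmat(self, np.eye(*self.shape, dtype=self.dtype))

inv4 = todense_via_matmat(invh)
np.testing.assert_allclose(inv1, inv4, atol=0, rtol=1e-5)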

I can open a PR for this if there is interest.

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response
