ENH: optimize LbfgsInvHessProduct.todense(), 10x speed improvement for size 1000 #18105

@Gattocrucco

Description

Is your feature request related to a problem? Please describe.

Using optimize.minimize(..., method='l-bfgs-b').hess_inv.todense() (code) I noticed that its implementation does $O(n^3)$ operations when $O(n^2)$ would suffice (see the update identity after the list below). I made a few attempts at optimizing it. The methods I considered are:

  1. modifying LbfgsInvHessProduct.todense to avoid matrix-matrix multiplications
  2. implementing LbfgsInvHessProduct._matmat and calling obj.matmat(eye(*obj.shape))
  3. using the stock _matmat that LbfgsInvHessProduct inherits from LinearOperator
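
For context, the identity each correction applies (also spelled out in the code comments below) is the BFGS inverse Hessian update

$$H_{k+1} = \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top, \qquad \rho_k = \frac{1}{y_k^\top s_k}.$$

Expanded, this is $H_k$ plus rank-1 terms built from $s_k$ and $H_k y_k$, so each correction needs only matrix-vector products and outer products, $O(n^2)$, instead of the two dense matrix-matrix products, $O(n^3)$, of the current implementation.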

Code and benchmark:

from scipy import optimize
import numpy as np

sk, yk = np.random.randn(2, 10, 1000)
invh = optimize.LbfgsInvHessProduct(sk, yk)

def todense(self):
    """ Rewrite LbfgsInvHessProduct.todense avoiding matrix @ matrix """
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Hk = np.eye(*self.shape, dtype=self.dtype)

    for i in range(n_corrs):
        # H  <--  (I - rho sy') H (I - rho ys') + rho ss' =
        #       = H - rho Hys' - rho sy'H + rho^2 sy'Hys' + rho ss' =
        #       = H - rho Hys' - rho sy'H + rho(1 + rho y'Hy) ss'
        rhoHys = -rho[i] * np.outer(Hk @ y[i], s[i])
        Hk += rho[i] * (1 + rho[i] * (y[i] @ Hk @ y[i])) * np.outer(s[i], s[i])
        Hk += rhoHys
        Hk += rhoHys.T

    return Hk

def _matmat(self, X):
    """ Rewrite LbfgsInvHessProduct._matvec broadcasting on columns of X """
    s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
    Q = np.array(X, dtype=self.dtype, copy=True)
    assert Q.ndim == 2

    alpha = np.empty((n_corrs, Q.shape[1]))

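    # backward pass of the L-BFGS two-loop recursion, vectorized over the columns of X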
    for i in range(n_corrs-1, -1, -1):
        alpha[i] = rho[i] * np.dot(s[i], Q)
        Q = Q - alpha[i]*y[i][:, np.newaxis]

    R = Q
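    # forward pass: rebuild H @ X using the stored alpha coefficients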
    for i in range(n_corrs):
        beta = rho[i] * np.dot(y[i], R)
        R = R + s[i][:, np.newaxis] * (alpha[i] - beta)

    return R

inv1 = invh.todense()
inv2 = todense(invh)
inv3 = _matmat(invh, np.eye(*invh.shape))
np.testing.assert_allclose(inv1, inv2, atol=0, rtol=1e-5)
np.testing.assert_allclose(inv1, inv3, atol=0, rtol=1e-5)

%timeit -r1 invh.todense()
%timeit -r1 todense(invh)
%timeit -r1 _matmat(invh, np.eye(*invh.shape))
%timeit -r1 invh.matmat(np.eye(*invh.shape))

Benchmark result with n = 1000 (timings in the same order as the %timeit calls above):

583 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
109 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
48.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
52.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)

With n = 10:

76.5 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
77.2 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
56.1 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10,000 loops each)
316 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1,000 loops each)

With n = 10000:

3min 10s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
8.08 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.99 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.82 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Conclusions: the direct rewrite of todense is slower than going through the matmat operation, and the stock matmat is slower than the reimplementation for small sizes because it loops over the columns in Python, one _matvec call each (code).
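
For reference, with the _matmat above in place, option 2 boils down to a one-line todense. A minimal sketch, reusing the session above (todense_via_matmat is an illustrative name, not existing scipy API; inside the class it would be a method calling self._matmat):

def todense_via_matmat(self):
    """ Sketch of option 2: densify the operator by applying it to the identity """
    # identity in the operator's own dtype; _matmat then does O(n_corrs * n^2) work
    return _matmat(self, np.eye(*self.shape, dtype=self.dtype))

inv4 = todense_via_matmat(invh)
np.testing.assert_allclose(inv1, inv4, atol=0, rtol=1e-5)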

I can open a PR for this if there is interest.

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response
