
Conversation

@ilayn (Member) commented Jun 22, 2024

Recently, in #18570, we swapped the old F77 nnls solver for a Python version of the so-called Fast NNLS (FNNLS) from the Bro & de Jong paper. As a few bug reports have shown, the paper's description of FNNLS leaves out a lot of details, and there is no obvious way to fix it without switching into "research" mode.

I went back to the original F77 code and converted it to Cython. The current tests now pass without any issues. I can see how to generalize this to the Bro & de Jong results, but frankly I will have no appetite for that for a while. The F77 code is unbearable, and so far I can't find any examples that make it fail (in English: no edge-case problems found yet). Also, the published text version has essentially nothing in common with the actual implementation, as the Lawson & Hanson description attached to this PR shows (hence the absolute madness).

[image: Lawson & Hanson algorithm description]

Reference issue

Closes #20813
Closes #20984
Closes #21140

This reverts the changes we made in #20961 (@h-vetinari, it would be great if you could test it if you have any capacity).

For @jonasgitabap's case in #20984, here are the results:

THIS PR : 0.9998772286451303  (1.15.0.dev0+960.95f8d01)  8640 ms
RELEASE : <did not finish after 10 minutes...>

Unfortunately, this old code does not use any tolerances, but I have introduced the atol keyword for FNNLS and that still needs agreement. Apologies for introducing it prematurely in this fashion; I should not have relied on the paper this much.

@ilayn added the enhancement, scipy.optimize, maintenance, and Cython labels on Jun 22, 2024
@ilayn requested a review from andyfaff as a code owner on June 22, 2024 00:23
The github-actions bot added the Meson label on Jun 22, 2024
@ilayn (Member, Author) commented Jun 22, 2024

I also dare to ping (and possibly annoy) @jvendrow and @LucasBoTang in case they are interested in improving this solver using the Bro & de Jong algorithm.

In the current release of SciPy, we (meaning I) translated FNNLS (possibly in erroneous form) as follows; imports added here for context:

import warnings

import numpy as np
from scipy.linalg import solve, LinAlgWarning


def _nnls(A, b, maxiter=None, tol=None):
    """
    This is a single RHS algorithm from ref [2] above. For multiple RHS
    support, the algorithm is given in :doi:`10.1002/cem.889`
    """
    m, n = A.shape

    AtA = A.T @ A
    Atb = b @ A  # Result is 1D - let NumPy figure it out

    if not maxiter:
        maxiter = 3*n
    if tol is None:
        tol = 10 * max(m, n) * np.spacing(1.)

    # Initialize vars
    x = np.zeros(n, dtype=np.float64)
    s = np.zeros(n, dtype=np.float64)
    # Inactive constraint switches
    P = np.zeros(n, dtype=bool)

    # Projected residual
    w = Atb.copy().astype(np.float64)  # x=0. Skip (-AtA @ x) term

    # Overall iteration counter
    # Outer loop is not counted, inner iter is counted across outer spins
    iter = 0

    while (not P.all()) and (w[~P] > tol).any():  # B
        # Get the "most" active coeff index and move to inactive set
        k = np.argmax(w * (~P))  # B.2
        P[k] = True  # B.3

        # Iteration solution
        s[:] = 0.
        # B.4
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
                                    category=LinAlgWarning)
            s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
                         check_finite=False)

        # Inner loop
        while (iter < maxiter) and (s[P].min() < 0):  # C.1
            iter += 1
            inds = P * (s < 0)
            alpha = (x[inds] / (x[inds] - s[inds])).min()  # C.2
            x *= (1 - alpha)
            x += alpha*s
            P[x <= tol] = False
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
                                        category=LinAlgWarning)
                s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
                             check_finite=False)
            s[~P] = 0  # C.6

        x[:] = s[:]
        w[:] = Atb - AtA @ x

        if iter == maxiter:
            # Typically following line should return
            # return x, np.linalg.norm(A@x - b), -1
            # however at the top level, -1 raises an exception wasting norm
            # Instead return dummy number 0.
            return x, 0., -1

    return x, np.linalg.norm(A@x - b), 1

This caused some accuracy issues, and in certain cases it spins too many times, signaling chattering behavior in the active/passive set swaps (#20168). I know there are issues with it, but I don't have time to dig deep into the literature. Hence this PR: a Cythonized version of the original NNLS. It is quite fast and keeps up until about n=100, after which performance quickly deteriorates, as expected.
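For reference, here is a minimal sketch, in plain NumPy, of the textbook Lawson & Hanson active-set iteration that the original F77 code implements. The function name, the lstsq subproblem solver, and the tolerance choice are my own illustrative picks; the actual F77/Cython implementation instead updates a Householder-transformed working copy of the problem and is not equivalent line by line:

import numpy as np

def nnls_lh_sketch(A, b, maxiter=None):
    # Minimal textbook Lawson & Hanson active-set NNLS. P marks the
    # passive (free) variables; everything else is clamped at zero.
    m, n = A.shape
    maxiter = 3 * n if maxiter is None else maxiter
    tol = 10 * max(m, n) * np.spacing(1.0)
    x = np.zeros(n)
    P = np.zeros(n, dtype=bool)
    w = A.T @ (b - A @ x)  # dual vector; positive entries promise descent
    it = 0
    while not P.all() and w[~P].max() > tol:
        # Promote the most promising clamped variable into the passive set
        P[np.argmax(np.where(~P, w, -np.inf))] = True
        while True:
            # Unconstrained least squares on the passive columns only
            s = np.zeros(n)
            s[P] = np.linalg.lstsq(A[:, P], b, rcond=None)[0]
            if s[P].min() > 0:  # candidate stays feasible: accept it
                break
            it += 1
            if it > maxiter:
                raise RuntimeError("sketch: iteration limit reached")
            # Step from x toward s until the first variable hits zero,
            # then demote variables that reached the boundary
            mask = P & (s <= 0)
            alpha = (x[mask] / (x[mask] - s[mask])).min()
            x = x + alpha * (s - x)
            P[x <= tol] = False
        x = s
        w = A.T @ (b - A @ x)
    return x, np.linalg.norm(A @ x - b)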

This is the FNNLS performance result for the current PR

[image: benchmark of this PR's Cython nnls]

and for the SciPy implementation that is currently released (using the aforementioned translation)

[image: benchmark of the released FNNLS-based nnls]

The original fnnls.m is here, but I'm not sure it is fit for production code: https://web.archive.org/web/20010708150016/http://www.models.kvl.dk/source/nnls/fnnls.m

If you have a robust implementation of FNNLS, it would be great if you could help us with it. I can happily help you convert it to Cython and with other details. Please let us know.

@mdhaber (Contributor) commented Jun 23, 2024

Just trying to help by testing. To be sure I understand:

but I have introduced the atol keyword for the FNNLS...

So in this PR, atol is not used, right? (That is what I observed.)

Some observations:

On my machine, the assertion fails:

import numpy as np
from scipy import optimize

A = [[0.004734199143798789, -0.09661916455815653, -0.04308779048103441, 0.4039475561867938,
      -0.27742598780954364, -0.20816924034369574, -0.17264070902176, 0.05251808558963846],
     [-0.030263548855047975, -0.30356483926431466, 0.18080406600591398, -0.06892233941254086,
      -0.41837298885432317, 0.30245352819647003, -0.19008975278116397, -0.00990809825429995],
     [-0.2561747595787612, -0.04376282125249583, 0.4422181991706678, -0.13720906318924858,
      -0.0069523811763796475, -0.059238287107464795, 0.028663214369642594, 0.5415531284893763],
     [0.2949336072968401, 0.33997647534935094, 0.38441519339815755, -0.306001783010386,
      0.18120773805949028, -0.36669767490747895, -0.021539960590992304, -0.2784251712424615],
     [0.5009075736232653, -0.20161970347571165, 0.08404512586550646, 0.2520496489348788,
      0.14812015101612894, -0.25823455803981266, -0.1596872058396596, 0.5960141613922691]]
b = [18.036779281222124, -18.126530733870887, 13.535652034584029, -2.6654275476795966,
     9.166315328199575]

A = np.asarray(A)
b = np.asarray(b)

x, res = optimize.nnls(A, b)
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)

# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 8.49127064
# Max relative difference: inf
# x: array(8.491271)
#  y: array(0.)

It was a bit puzzling at first, but I realized that the function is mutating A.

Even after passing copies into the function, I'm finding that the assertion still fails sometimes, and meaningfully so, i.e. with O(1) relative error. I'd suggest comparing the residuals returned by the function against a directly computed np.linalg.norm(A@x - b) for the problems you're generating. Do they always agree for you, or am I generating pathological problems?

Ah, here's an example:

import numpy as np
from scipy import optimize

A = [[0.2508259992635229, -0.24031300195203256],
     [0.510647748500133, 0.2872936081767836],
     [0.8196387904102849, -0.03520620107046682],
     [0.030739759120097084, -0.07768656359879388]]
b = [24.456141951303913, 28.047143273432333, 41.10526799545987, -1.2078282698324068]
A = np.asarray(A)
b = np.asarray(b)
x, res = optimize.nnls(A.copy(), b.copy())
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)

# Not equal to tolerance rtol=1e-15, atol=0
# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 9.78234068
# Max relative difference: 1.04348544
#  x: array(19.157019)
#  y: array(9.374679)

In case you're interested in silly edge cases:

optimize.nnls(np.empty((1, 0)), np.empty(1))
# (array([], dtype=float64), 29.732883325343384)
# (with zero columns, x is empty and the residual is just |b[0]|; since
#  np.empty(1) is uninitialized, the exact value above is arbitrary)

Good news is that I'm finding that this PR and main get essentially the same residual (computed directly - not returned by the function) on all problems, and this PR tends to be much faster for slow problems.

In case you're interested in another random problem generator:

import numpy as np
from time import perf_counter
from scipy import optimize
rng = np.random.default_rng(2549824598234528)

from sklearn.datasets import make_regression

residuals = []
times = []

for i in range(100):
    n_samples = rng.integers(1, 1000)
    n_features = rng.integers(1, 1000)
    n_informative = max(1, int(rng.random() * n_features))
    effective_rank = max(1, int(rng.random() * min(n_features, n_samples)))
    tail_strength = rng.random()
    noise = rng.random() * 10
    random_state = rng.integers(1, 100000)

    A, b = make_regression(n_samples=n_samples, n_features=n_features,
                           n_informative=n_informative, effective_rank=effective_rank,
                           tail_strength=tail_strength, noise=noise,
                           random_state=random_state)
    b = np.atleast_1d(b)
    try:
        tic = perf_counter()
        x, res = optimize.nnls(A.copy(), b.copy(), maxiter=100*n_features)
        toc = perf_counter()
        res = np.linalg.norm(A@x - b)  # overwrite: use the directly computed residual
        # np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15, atol=1e-12)
        times.append(toc-tic)
        residuals.append(res)
        print(i, toc-tic)
    except RuntimeError:
        times.append(np.nan)
        residuals.append(np.nan)

filename = 'nnls_main.npz'
np.savez(filename,
         times=np.asarray(times),
         residuals=np.asarray(residuals))
d = np.load(filename)
times_main = d['times']
residuals_main = d['residuals']

@ilayn (Member, Author) commented Jun 23, 2024

Thank you @mdhaber, much appreciated. You are right about the mutation. I thought asarray copies, but apparently not; I'll fix it. Good catch.
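(For reference, a quick demonstration of the copy semantics in question:)

import numpy as np

a = np.arange(3.0)
print(np.asarray(a) is a)                    # True: no copy when dtype already matches
print(np.array(a) is a)                      # False: np.array copies by default
print(np.asarray(a, dtype=np.float32) is a)  # False: a dtype change forces a copy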

The residual is strange indeed; I'll check.

@ilayn (Member, Author) commented Jun 27, 2024

but I realized that the function is mutating A.

Now explicit copies are made at the outset on the Cython side, regardless of the input, to make sure.

Even after passing copies into the function, I'm finding that the assertion still fails (meaningfully - i.e. with O(1) relative error) sometimes. I'd suggest comparing the residuals returned by the function against A@x - b for the problems you're generating.

Thanks for catching this @mdhaber. The residual norm is actually tracked by the algorithm as it goes, via the remaining entries of the working copy of b. But I wasn't slicing the right number of "remaining" values. Now corrected.
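(To illustrate the identity being tracked: after an orthogonal transformation, the least-squares residual norm is exactly the norm of the trailing entries of the transformed b. A minimal self-contained sketch on a plain unconstrained problem, not the PR's Cython code:)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Full QR: A = Q R. The residual norm equals the norm of the trailing
# m - n entries of Q.T @ b, so it can be read off without ever forming
# A @ x - b explicitly.
Q, R = np.linalg.qr(A, mode='complete')
qtb = Q.T @ b
x = np.linalg.solve(R[:3, :3], qtb[:3])
assert np.isclose(np.linalg.norm(A @ x - b), np.linalg.norm(qtb[3:]))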

I've added your examples, plus an extra problem whose optimal solution has an interesting zero pattern, to the existing tests, and the residuals are compared.

@ilayn added this to the 1.15.0 milestone on Jun 27, 2024
@ilayn (Member, Author) commented Jul 6, 2024

@mdhaber I know you are quite busy with other PRs. Do you think you can have a final look?

@h-vetinari (Member) commented

(@h-vetinari it would be great if you can test it if you have any capacity)

I backported this PR to 1.14 and threw it at the BLAS flavour testing gauntlet in conda-forge/scipy-feedstock#280.

@h-vetinari (Member) commented

No failures of test_nnls_inner_loop_case1 reappeared 👍

@ilayn (Member, Author) commented Jul 7, 2024

Ah great! Then I'll wait for a few days and click the merge button in case someone else doesn't. Thank you @h-vetinari!

@ilayn (Member, Author) commented Jul 9, 2024

As an external verification, #21140 seems to hit all parts of the inner loop, and I think this is correct for the cases we tried out. I'll click it in and brace for impact :) Thank you all for the help.

@ilayn merged commit 81c53d4 into scipy:main on Jul 9, 2024
@ilayn deleted the nnls_rewrite branch on July 9, 2024
@divenex (Contributor) commented Jul 11, 2024

and the scipy implementation that is currently released (using the aforementioned implementation)

[image: benchmark of the released FNNLS-based nnls, quoted from above]

How does the current Cython version compare with the following nnls built on SciPy's BVLS solver (accessible through lsq_linear)?

import numpy as np
from scipy.optimize import lsq_linear

def nnls(A, b):
    return lsq_linear(A, b, bounds=(0, np.inf), method='bvls').x
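(For what it's worth, a quick sketch of how one might compare the two on a random dense problem; both optimize.nnls and optimize.lsq_linear are public SciPy APIs, though the problem sizes here are my own arbitrary choice:)

import numpy as np
from scipy import optimize

rng = np.random.default_rng(42)
A = rng.standard_normal((200, 100))
b = rng.standard_normal(200)

x_nnls, _ = optimize.nnls(A, b)
x_bvls = optimize.lsq_linear(A, b, bounds=(0, np.inf), method='bvls').x

# On well-posed problems, the two residual norms should agree closely
print(np.linalg.norm(A @ x_nnls - b), np.linalg.norm(A @ x_bvls - b))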

@jd-edp commented Nov 5, 2024

Hey @ilayn, is there a release date for this version? Can't wait to see it in action. For now I have to stick with version 1.11.4, which doesn't let me upgrade to py3.13.

Thank you in advance

@ilayn (Member, Author) commented Nov 5, 2024

Hi @jd-edp. It is going to show up in version 1.15, which will be out in about a month.

@h-vetinari (Member) commented

For now I have to stick with version 1.11.4, which doesn't let me upgrade to py3.13

What keeps you from moving to 1.14 (which supports py3.13)?

@jd-edp commented Nov 6, 2024

Hey @h-vetinari,

I need the performance of the old version. After the rewrite that landed after 1.11.4, my training is completely broken and slow. You can see my issue here: #20984
