
Conversation

@ilayn (Member) commented Jun 22, 2024

Recently, in #18570, we swapped the old F77 nnls solver for a Python version of the so-called Fast NNLS (FNNLS) from the Bro & de Jong paper. As a few bug reports have shown, the paper's description of FNNLS leaves out a lot of details, and there is no obvious way to fix it without switching into "research" mode.

I went back to the original F77 code and converted it to Cython. The current tests now pass without any issues. I can see how to generalize this to the Bro & de Jong results, but frankly I will have no appetite for that for a while. The F77 code is unbearable, and so far I can't find any examples that make it fail (in English: no edge-case problems found yet). Also, the published text version has essentially nothing in common with the actual implementation, as the Lawson & Hanson description attached to this PR shows (hence the absolute madness).

[image: Lawson & Hanson algorithm description]

Reference issue

Closes #20813
Closes #20984
Closes #21140

This reverts the changes we made in #20961 (@h-vetinari, it would be great if you could test it if you have any capacity).

For @jonasgitabap's case in #20984, here are the results:

THIS PR : 0.9998772286451303  (1.15.0.dev0+960.95f8d01)  8640 ms
RELEASE : <did not finish after 10 minutes...>

Unfortunately, this old code does not use any tolerances, but I have introduced the atol keyword for FNNLS and that still needs agreement. Apologies for introducing it prematurely in this fashion; I should not have relied on the paper this much.

@ilayn added the enhancement, scipy.optimize, maintenance, and Cython labels on Jun 22, 2024
@ilayn requested a review from andyfaff as a code owner on June 22, 2024 00:23
The github-actions bot added the Meson label on Jun 22, 2024
@ilayn (Member, Author) commented Jun 22, 2024

I also dare to ping (and possibly annoy) @jvendrow and @LucasBoTang in case they are interested in improving this solver using the Bro & de Jong algorithm.

In the current release of SciPy, we (meaning I) translated FNNLS (possibly in erroneous form) as follows; imports added here for context:

import warnings

import numpy as np
from scipy.linalg import solve, LinAlgWarning


def _nnls(A, b, maxiter=None, tol=None):
    """
    This is a single RHS algorithm from ref [2] above. For multiple RHS
    support, the algorithm is given in :doi:`10.1002/cem.889`
    """
    m, n = A.shape

    AtA = A.T @ A
    Atb = b @ A  # Result is 1D - let NumPy figure it out

    if not maxiter:
        maxiter = 3*n
    if tol is None:
        tol = 10 * max(m, n) * np.spacing(1.)

    # Initialize vars
    x = np.zeros(n, dtype=np.float64)
    s = np.zeros(n, dtype=np.float64)
    # Inactive constraint switches
    P = np.zeros(n, dtype=bool)

    # Projected residual
    w = Atb.copy().astype(np.float64)  # x=0. Skip (-AtA @ x) term

    # Overall iteration counter
    # Outer loop is not counted, inner iter is counted across outer spins
    iter = 0

    while (not P.all()) and (w[~P] > tol).any():  # B
        # Get the "most" active coeff index and move to inactive set
        k = np.argmax(w * (~P))  # B.2
        P[k] = True  # B.3

        # Iteration solution
        s[:] = 0.
        # B.4
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
                                    category=LinAlgWarning)
            s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
                         check_finite=False)

        # Inner loop
        while (iter < maxiter) and (s[P].min() < 0):  # C.1
            iter += 1
            inds = P * (s < 0)
            alpha = (x[inds] / (x[inds] - s[inds])).min()  # C.2
            x *= (1 - alpha)
            x += alpha*s
            P[x <= tol] = False
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
                                        category=LinAlgWarning)
                s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
                             check_finite=False)
            s[~P] = 0  # C.6

        x[:] = s[:]
        w[:] = Atb - AtA @ x

        if iter == maxiter:
            # Typically following line should return
            # return x, np.linalg.norm(A@x - b), -1
            # however at the top level, -1 raises an exception wasting norm
            # Instead return dummy number 0.
            return x, 0., -1

    return x, np.linalg.norm(A@x - b), 1

This caused some accuracy issues, and in certain cases it spins too many times, signaling chattering behavior in the active/passive set swaps (#20168). I know there are issues with it, but I don't have time to dig deep into the literature. Hence this PR: a Cythonized version of the original NNLS. It is quite fast and keeps up until about n=100, after which performance quickly deteriorates, as expected.
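For reference, here is a minimal sketch, in plain NumPy, of the textbook Lawson & Hanson active-set iteration that the original F77 code implements. The function name, the lstsq subproblem solver, and the tolerance choice are my own illustrative picks; the actual F77/Cython implementation instead updates a Householder-transformed working copy of the problem and is not equivalent line by line:

import numpy as np

def nnls_lh_sketch(A, b, maxiter=None):
    # Minimal textbook Lawson & Hanson active-set NNLS. P marks the
    # passive (free) variables; everything else is clamped at zero.
    m, n = A.shape
    maxiter = 3 * n if maxiter is None else maxiter
    tol = 10 * max(m, n) * np.spacing(1.0)
    x = np.zeros(n)
    P = np.zeros(n, dtype=bool)
    w = A.T @ (b - A @ x)  # dual vector; positive entries promise descent
    it = 0
    while not P.all() and w[~P].max() > tol:
        # Promote the most promising clamped variable into the passive set
        P[np.argmax(np.where(~P, w, -np.inf))] = True
        while True:
            # Unconstrained least squares on the passive columns only
            s = np.zeros(n)
            s[P] = np.linalg.lstsq(A[:, P], b, rcond=None)[0]
            if s[P].min() > 0:  # candidate stays feasible: accept it
                break
            it += 1
            if it > maxiter:
                raise RuntimeError("sketch: iteration limit reached")
            # Step from x toward s until the first variable hits zero,
            # then demote variables that reached the boundary
            mask = P & (s <= 0)
            alpha = (x[mask] / (x[mask] - s[mask])).min()
            x = x + alpha * (s - x)
            P[x <= tol] = False
        x = s
        w = A.T @ (b - A @ x)
    return x, np.linalg.norm(A @ x - b)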

This is the FNNLS performance result for the current PR

[image: benchmark of this PR's Cython nnls]

and for the SciPy implementation that is currently released (using the aforementioned translation)

[image: benchmark of the released FNNLS-based nnls]

The original fnnls.m is here, but I'm not sure it is fit for production code: https://web.archive.org/web/20010708150016/http://www.models.kvl.dk/source/nnls/fnnls.m

If you have a robust implementation of FNNLS, it would be great if you could help us with it. I can happily help you convert it to Cython and with other details. Please let us know.

@mdhaber (Contributor) commented Jun 23, 2024

Just trying to help by testing. To be sure I understand:

but I have introduced the atol keyword for the FNNLS...

So in this PR, atol is not used, right? (That is what I observed.)

Some observations:

On my machine, the assertion fails:

import numpy as np
from scipy import optimize

A = [[0.004734199143798789, -0.09661916455815653, -0.04308779048103441, 0.4039475561867938,
      -0.27742598780954364, -0.20816924034369574, -0.17264070902176, 0.05251808558963846],
     [-0.030263548855047975, -0.30356483926431466, 0.18080406600591398, -0.06892233941254086,
      -0.41837298885432317, 0.30245352819647003, -0.19008975278116397, -0.00990809825429995],
     [-0.2561747595787612, -0.04376282125249583, 0.4422181991706678, -0.13720906318924858,
      -0.0069523811763796475, -0.059238287107464795, 0.028663214369642594, 0.5415531284893763],
     [0.2949336072968401, 0.33997647534935094, 0.38441519339815755, -0.306001783010386,
      0.18120773805949028, -0.36669767490747895, -0.021539960590992304, -0.2784251712424615],
     [0.5009075736232653, -0.20161970347571165, 0.08404512586550646, 0.2520496489348788,
      0.14812015101612894, -0.25823455803981266, -0.1596872058396596, 0.5960141613922691]]
b = [18.036779281222124, -18.126530733870887, 13.535652034584029, -2.6654275476795966,
     9.166315328199575]

A = np.asarray(A)
b = np.asarray(b)

x, res = optimize.nnls(A, b)
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)

# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 8.49127064
# Max relative difference: inf
# x: array(8.491271)
#  y: array(0.)

It was a bit puzzling at first, but I realized that the function is mutating A.

Even after passing copies into the function, I'm finding that the assertion still fails sometimes, and meaningfully so, i.e. with O(1) relative error. I'd suggest comparing the residuals returned by the function against a directly computed np.linalg.norm(A@x - b) for the problems you're generating. Do they always agree for you, or am I generating pathological problems?

Ah, here's an example:

import numpy as np
from scipy import optimize

A = [[0.2508259992635229, -0.24031300195203256],
     [0.510647748500133, 0.2872936081767836],
     [0.8196387904102849, -0.03520620107046682],
     [0.030739759120097084, -0.07768656359879388]]
b = [24.456141951303913, 28.047143273432333, 41.10526799545987, -1.2078282698324068]
A = np.asarray(A)
b = np.asarray(b)
x, res = optimize.nnls(A.copy(), b.copy())
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)

# Not equal to tolerance rtol=1e-15, atol=0
# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 9.78234068
# Max relative difference: 1.04348544
#  x: array(19.157019)
#  y: array(9.374679)

In case you're interested in silly edge cases:

optimize.nnls(np.empty((1, 0)), np.empty(1))
# (array([], dtype=float64), 29.732883325343384)
# (with zero columns, x is empty and the residual is just |b[0]|; since
#  np.empty(1) is uninitialized, the exact value above is arbitrary)

Good news is that I'm finding that this PR and main get essentially the same residual (computed directly - not returned by the function) on all problems, and this PR tends to be much faster for slow problems.

In case you're interested in another random problem generator:

import numpy as np
from time import perf_counter
from scipy import optimize
rng = np.random.default_rng(2549824598234528)

from sklearn.datasets import make_regression

residuals = []
times = []

for i in range(100):
    n_samples = rng.integers(1, 1000)
    n_features = rng.integers(1, 1000)
    n_informative = max(1, int(rng.random() * n_features))
    effective_rank = max(1, int(rng.random() * min(n_features, n_samples)))
    tail_strength = rng.random()
    noise = rng.random() * 10
    random_state = rng.integers(1, 100000)

    A, b = make_regression(n_samples=n_samples, n_features=n_features,
                           n_informative=n_informative, effective_rank=effective_rank,
                           tail_strength=tail_strength, noise=noise,
                           random_state=random_state)
    b = np.atleast_1d(b)
    try:
        tic = perf_counter()
        x, res = optimize.nnls(A.copy(), b.copy(), maxiter=100*n_features)
        toc = perf_counter()
        res = np.linalg.norm(A@x - b)  # overwrite: use the directly computed residual
        # np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15, atol=1e-12)
        times.append(toc-tic)
        residuals.append(res)
        print(i, toc-tic)
    except RuntimeError:
        times.append(np.nan)
        residuals.append(np.nan)

filename = 'nnls_main.npz'
np.savez(filename,
         times=np.asarray(times),
         residuals=np.asarray(residuals))
d = np.load(filename)
times_main = d['times']
residuals_main = d['residuals']

@ilayn (Member, Author) commented Jun 23, 2024

Thank you @mdhaber, much appreciated. You are right about the mutation. I thought asarray copies, but apparently not; I'll fix it. Good catch.
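(For reference, a quick demonstration of the copy semantics in question:)

import numpy as np

a = np.arange(3.0)
print(np.asarray(a) is a)                    # True: no copy when dtype already matches
print(np.array(a) is a)                      # False: np.array copies by default
print(np.asarray(a, dtype=np.float32) is a)  # False: a dtype change forces a copy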

The residual is strange indeed; I'll check.

@ilayn (Member, Author) commented Jun 27, 2024

but I realized that the function is mutating A.

Now explicit copies are made at the outset on the Cython side, regardless of the input, to make sure.

Even after passing copies into the function, I'm finding that the assertion still fails (meaningfully - i.e. with O(1) relative error) sometimes. I'd suggest comparing the residuals returned by the function against A@x - b for the problems you're generating.

Thanks for catching this @mdhaber. The residual norm is actually tracked by the algorithm as it goes, via the remaining entries of the working copy of b. But I wasn't slicing the right number of "remaining" values. Now corrected.
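(To illustrate the identity being tracked: after an orthogonal transformation, the least-squares residual norm is exactly the norm of the trailing entries of the transformed b. A minimal self-contained sketch on a plain unconstrained problem, not the PR's Cython code:)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Full QR: A = Q R. The residual norm equals the norm of the trailing
# m - n entries of Q.T @ b, so it can be read off without ever forming
# A @ x - b explicitly.
Q, R = np.linalg.qr(A, mode='complete')
qtb = Q.T @ b
x = np.linalg.solve(R[:3, :3], qtb[:3])
assert np.isclose(np.linalg.norm(A @ x - b), np.linalg.norm(qtb[3:]))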

I've added your examples, plus an extra problem whose optimal solution has an interesting zero pattern, to the existing tests, and the residuals are compared.

@ilayn added this to the 1.15.0 milestone on Jun 27, 2024
@ilayn (Member, Author) commented Jul 6, 2024

@mdhaber I know you are quite busy with other PRs. Do you think you can have a final look?

@h-vetinari (Member) commented

(@h-vetinari it would be great if you can test it if you have any capacity)

I backported this PR to 1.14 and threw it at the BLAS flavour testing gauntlet in conda-forge/scipy-feedstock#280.

@h-vetinari (Member) commented

No failures of test_nnls_inner_loop_case1 reappeared 👍

@ilayn (Member, Author) commented Jul 7, 2024

Ah great! Then I'll wait for a few days and click the merge button in case someone else doesn't. Thank you @h-vetinari!

@ilayn (Member, Author) commented Jul 9, 2024

As an external verification, #21140 seems to hit all parts of the inner loop, and I think this is correct for the cases we tried out. I'll click it in and brace for impact :) Thank you all for the help.

@ilayn merged commit 81c53d4 into scipy:main on Jul 9, 2024
@ilayn deleted the nnls_rewrite branch on July 9, 2024
@divenex (Contributor) commented Jul 11, 2024

and the scipy implementation that is currently released (using the aforementioned implementation)

[image: benchmark of the released FNNLS-based nnls, quoted from above]

How does the current Cython version compare with the following nnls built on SciPy's BVLS solver (accessible through lsq_linear)?

import numpy as np
from scipy.optimize import lsq_linear

def nnls(A, b):
    return lsq_linear(A, b, bounds=(0, np.inf), method='bvls').x
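(For what it's worth, a quick sketch of how one might compare the two on a random dense problem; both optimize.nnls and optimize.lsq_linear are public SciPy APIs, though the problem sizes here are my own arbitrary choice:)

import numpy as np
from scipy import optimize

rng = np.random.default_rng(42)
A = rng.standard_normal((200, 100))
b = rng.standard_normal(200)

x_nnls, _ = optimize.nnls(A, b)
x_bvls = optimize.lsq_linear(A, b, bounds=(0, np.inf), method='bvls').x

# On well-posed problems, the two residual norms should agree closely
print(np.linalg.norm(A @ x_nnls - b), np.linalg.norm(A @ x_bvls - b))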

@jd-edp commented Nov 5, 2024

Hey @ilayn, is there a release date for this version? Can't wait to see it in action. For now I have to stick with version 1.11.4, which doesn't let me upgrade to py3.13.

Thank you in advance

@ilayn (Member, Author) commented Nov 5, 2024

Hi @jd-edp. It is going to show up in version 1.15, which will be out in about a month.

@h-vetinari (Member) commented

For now I have to stick with version 1.11.4, which doesn't let me upgrade to py3.13

What keeps you from moving to 1.14 (which supports py3.13)?

@jd-edp commented Nov 6, 2024

Hey @h-vetinari,

I need the performance of the old version. After the rewrite that landed after 1.11.4, my training is completely broken and slow. You can see my issue here: #20984
