ENH:MAINT:optimize: Re-rewrite nnls in Cython #21021
Conversation
I also dare to ping (and possibly annoy) @jvendrow and @LucasBoTang in case they are interested in improving this solver using the Bro & De Jong algorithm. In the current release of scipy we (meaning I) have translated the FNNLS (possibly in erroneous form); see lines 98 to 164 at commit 628a6b3.
This caused some accuracy issues, and in certain cases it spins too many times, signaling chattering behavior in the active/passive set swaps (#20168). I know there are issues with it, but I don't have any time to dig deep into the literature. Hence this PR: a Cythonized version of the original NNLS. It is quite fast up until n=100 and keeps up; then performance quickly deteriorates, as expected.

[performance plot: FNNLS results for this PR vs. the currently released scipy implementation]

[performance plot: the original implementation]

If you have a robust implementation of FNNLS, it would be great if you can help us with it. I can happily help you with converting it to Cython and with other details. Please let us know.
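For anyone wanting to reproduce the rough shape of these timings, here is a minimal sketch of my own (not the harness used for the plots above), assuming square random problems of growing size:

import numpy as np
from time import perf_counter
from scipy.optimize import nnls

rng = np.random.default_rng(42)
for n in (10, 50, 100, 200, 400):
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    tic = perf_counter()
    nnls(A, b)
    print(f"n={n}: {perf_counter() - tic:.4f} s")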
Just trying to help by testing. Just to be sure I understand:
So in this PR, the original F77 NNLS code has been translated to Cython.

Some observations: On my machine, this assertion fails:

import numpy as np
from scipy import optimize
A = [[0.004734199143798789, -0.09661916455815653, -0.04308779048103441, 0.4039475561867938,
-0.27742598780954364, -0.20816924034369574, -0.17264070902176, 0.05251808558963846],
[-0.030263548855047975, -0.30356483926431466, 0.18080406600591398, -0.06892233941254086,
-0.41837298885432317, 0.30245352819647003, -0.19008975278116397, -0.00990809825429995],
[-0.2561747595787612, -0.04376282125249583, 0.4422181991706678, -0.13720906318924858,
-0.0069523811763796475, -0.059238287107464795, 0.028663214369642594, 0.5415531284893763],
[0.2949336072968401, 0.33997647534935094, 0.38441519339815755, -0.306001783010386,
0.18120773805949028, -0.36669767490747895, -0.021539960590992304, -0.2784251712424615],
[0.5009075736232653, -0.20161970347571165, 0.08404512586550646, 0.2520496489348788,
0.14812015101612894, -0.25823455803981266, -0.1596872058396596, 0.5960141613922691]]
b = [18.036779281222124, -18.126530733870887, 13.535652034584029, -2.6654275476795966,
9.166315328199575]
A = np.asarray(A)
b = np.asarray(b)
x, res = optimize.nnls(A, b)
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)
# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 8.49127064
# Max relative difference: inf
# x: array(8.491271)
# y: array(0.)

It was a bit puzzling at first, but I realized that the function is mutating its inputs. Even after passing copies into the function, I'm finding that the assertion still fails (meaningfully, i.e. with O(1) relative error) sometimes. I'd suggest comparing the residuals returned by the function against np.linalg.norm(A@x - b). Ah, here's an example:

import numpy as np
from scipy import optimize
A = [[0.2508259992635229, -0.24031300195203256],
[0.510647748500133, 0.2872936081767836],
[0.8196387904102849, -0.03520620107046682],
[0.030739759120097084, -0.07768656359879388]]
b = [24.456141951303913, 28.047143273432333, 41.10526799545987, -1.2078282698324068]
A = np.asarray(A)
b = np.asarray(b)
x, res = optimize.nnls(A.copy(), b.copy())
np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15)
# Not equal to tolerance rtol=1e-15, atol=0
# Mismatched elements: 1 / 1 (100%)
# Max absolute difference: 9.78234068
# Max relative difference: 1.04348544
# x: array(19.157019)
# y: array(9.374679)

In case you're interested in silly edge cases:

optimize.nnls(np.empty((1, 0)), np.empty(1))
# (array([], dtype=float64), 29.732883325343384)

Good news is that I'm finding that this PR and main produce comparable results. In case you're interested in another random problem generator:

import numpy as np
from time import perf_counter
from scipy import optimize
rng = np.random.default_rng(2549824598234528)
from sklearn.datasets import make_regression
residuals = []
times = []
for i in range(100):
    n_samples = rng.integers(1, 1000)
    n_features = rng.integers(1, 1000)
    n_informative = max(1, int(rng.random() * n_features))
    effective_rank = max(1, int(rng.random() * min(n_features, n_samples)))
    tail_strength = rng.random()
    noise = rng.random() * 10
    random_state = rng.integers(1, 100000)
    A, b = make_regression(n_samples=n_samples, n_features=n_features,
                           n_informative=n_informative, effective_rank=effective_rank,
                           tail_strength=tail_strength, noise=noise,
                           random_state=random_state)
    b = np.atleast_1d(b)
    try:
        tic = perf_counter()
        x, res = optimize.nnls(A.copy(), b.copy(), maxiter=100*n_features)
        toc = perf_counter()
        res = np.linalg.norm(A@x - b)
        # np.testing.assert_allclose(np.linalg.norm(A@x - b), res, rtol=1e-15, atol=1e-12)
        times.append(toc-tic)
        residuals.append(res)
        print(i, toc-tic)
    except RuntimeError:
        times.append(np.nan)
        residuals.append(np.nan)
filename = 'nnls_main.npz'
np.savez(filename,
         times=np.asarray(times),
         residuals=np.asarray(residuals))
d = np.load(filename)
times_main = d['times']
residuals_main = d['residuals']
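A hypothetical follow-up for comparing the two runs (the filename 'nnls_pr.npz' is my assumption; it would come from running the same loop on this PR's branch):

import numpy as np
d_main = np.load('nnls_main.npz')
d_pr = np.load('nnls_pr.npz')  # assumed name for the PR-branch run
# compare only the problems that finished on both branches
ok = ~np.isnan(d_main['times']) & ~np.isnan(d_pr['times'])
print('median time ratio (main/PR):',
      np.median(d_main['times'][ok] / d_pr['times'][ok]))
print('worst residual ratio (PR/main):',
      np.max(d_pr['residuals'][ok] / d_main['residuals'][ok]))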
Thank you @mdhaber, much appreciated. You are right about the mutation. I thought asarray copies, but apparently not; I'll fix. Good catch. The residual is strange indeed, I'll check.
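A quick illustration of the pitfall (standard NumPy behavior, not code from this PR): np.asarray returns the input object itself when it is already an ndarray of the right dtype, so in-place modifications leak back to the caller:

import numpy as np
a = np.arange(3.0)
print(np.asarray(a) is a)           # True: same object, no copy made
np.asarray(a)[0] = 99.0
print(a[0])                         # 99.0: the caller's array was mutated
print(np.array(a, copy=True) is a)  # False: np.array copies by default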
MAINT:optimize:nnls:Address the review comments
Now explicit copies are made at the outset on the Cython side, regardless of the input, to make sure.
Thanks for catching this @mdhaber. The residual norm is actually tracked by the algorithm as it moves along, via the remaining entries of the working copy of b. I've added your examples, plus an extra problem whose optimal solution has an interesting zero pattern, to the existing tests, and the residuals are compared.
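To see why tracking the working copy of b suffices, here is a toy unconstrained analogue of my own (assuming a full-rank A with more rows than columns; the actual code uses incremental Householder updates rather than an explicit QR): after orthogonal transformations, the least-squares residual norm is exactly the norm of the trailing entries of the transformed right-hand side.

import numpy as np
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = np.linalg.lstsq(A, b, rcond=None)[0]
Q, R = np.linalg.qr(A, mode='complete')
# the trailing m - n entries of Q.T @ b carry the residual norm
print(np.allclose(np.linalg.norm(A @ x - b),
                  np.linalg.norm((Q.T @ b)[3:])))  # True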
@mdhaber I know you are quite busy with other PRs. Do you think you could have a final look?
I backported this PR to 1.14 and threw it at the BLAS flavour testing gauntlet in conda-forge/scipy-feedstock#280. |
No failures of the nnls tests.
Ah great! Then I'll wait for a few days and click the merge button in case someone else does not. Thank you @h-vetinari!
As an external verification, #21140 seems to hit all parts of the inner loop, and I think this is correct for the cases we tried out. I'll click it in and brace for impact :) Thank you all for the help.
How does the current Cython version compare with the following?
Hey @ilayn, is there a release date for this version? Can't wait to see it in action. For now I have to stick with version 1.11.4, which doesn't let me upgrade to py3.13. Thank you in advance.
Hi @jd-edp. It is going to show up in version 1.15, which will be out in about a month.
What keeps you from moving to 1.14 (which supports py3.13)?
Hey @h-vetinari, I need the performance of the old version. After the rewrite that came after 1.11.4, my training is completely broken and slow. You can see my issue here: #20984
Recently, in #18570, we swapped the old F77 nnls solver for a Python version of the so-called Fast NNLS (FNNLS), based on the Bro & De Jong paper. As we have learned from a few reports, the paper's description of FNNLS leaves a lot of details out, and there is no obvious way to fix it without switching into "research" mode.
I went back to the original F77 code and converted it to Cython. The current tests now pass without any issues. I can see how to generalize this to the Bro & De Jong results, but frankly I will have no appetite for that for a while. The F77 code is unbearable, yet I can't find any examples that make it fail so far (in English: a lack of edge problems). Also, the actual implementation has almost nothing in common with the textual Lawson & Hanson description (hence the absolute madness).
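For readers who want the gist of the algorithm without reading F77 or Cython, here is a rough NumPy sketch of the Lawson & Hanson active-set loop; it is my own simplification, solving each subproblem with lstsq instead of incremental Householder updates and skipping the degenerate-tie safeguards the original code has:

import numpy as np

def nnls_sketch(A, b, maxiter=None, tol=1e-12):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    maxiter = 3 * n if maxiter is None else maxiter
    P = np.zeros(n, dtype=bool)   # passive (free) set
    x = np.zeros(n)
    for _ in range(maxiter):
        w = A.T @ (b - A @ x)     # negative gradient; the dual variables
        if P.all() or w[~P].max() <= tol:
            break                 # KKT conditions hold: done
        # move the most promising bound variable into the passive set
        P[np.flatnonzero(~P)[np.argmax(w[~P])]] = True
        while True:
            z = np.zeros(n)
            z[P] = np.linalg.lstsq(A[:, P], b, rcond=None)[0]
            if z[P].min() > 0:
                x = z             # unconstrained subproblem solution is feasible
                break
            # step from x toward z, stopping at the first bound
            mask = P & (z <= 0)
            alpha = np.min(x[mask] / (x[mask] - z[mask]))
            x = x + alpha * (z - x)
            P[x <= tol] = False   # variables that hit zero leave the passive set
            x[~P] = 0.0
    return x, np.linalg.norm(A @ x - b)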
Reference issue
Closes #20813
Closes #20984
Closes #21140
Reverted the changes we did in #20961 (@h-vetinari, it would be great if you could test it if you have any capacity).
For @jonasgitabap's case in #20984, here are the results: [benchmark plot: this PR vs. the released FNNLS on the #20984 problem]
Unfortunately, this old code does not use any tolerances, but I had introduced the atol keyword for the FNNLS, and that needs an agreement. Apologies for introducing it prematurely in this fashion. I should not have relied on the paper this much.
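For context, a hedged illustration of that keyword (this works on the FNNLS-based releases, roughly scipy 1.12 through 1.14; whether atol survives this PR is exactly the open question above):

import numpy as np
from scipy.optimize import nnls
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
b = np.array([2.0, 1.0])
# atol: termination tolerance introduced with the FNNLS rewrite
x, rnorm = nnls(A, b, atol=1e-10)
print(x, rnorm)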