Divide by zero in Nyström approximation. #1860

Closed
breuderink opened this Issue Apr 15, 2013 · 12 comments


Sometimes I get what seems to be a divide-by-zero in the Nyström kernel approximation:

/usr/local/lib/python2.7/site-packages/sklearn/kernel_approximation.py:445: RuntimeWarning: divide by zero encountered in divide
  self.normalization_ = np.dot(U * 1. / np.sqrt(S), V)
/usr/local/lib/python2.7/site-packages/sklearn/kernel_approximation.py:445: RuntimeWarning: invalid value encountered in divide
  self.normalization_ = np.dot(U * 1. / np.sqrt(S), V)

I am trying to create a minimal example that reproduces the problem, but so far I have not been able to do so. I expect that some eigenvalues of the kernel matrix are rounded to zero, which could be caused by a rank-deficient kernel matrix. This is with sklearn version 0.13.1.

If that is the case, this problem can be fixed by adding a tiny ridge to the kernel matrix, or by adding a tiny value to the eigenvalues S. Alternatively, one could use only positive eigenvalues in the computation.
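
For illustration, a rough sketch of what I mean (the 1e-12 values are arbitrary choices of mine, and the normalization line just mirrors the one in the warning; this is not the sklearn code):

import numpy as np
from scipy.linalg import svd

# A rank-deficient "kernel" matrix: 20 samples in a 5-dimensional linear feature space.
A = np.random.rand(20, 5)
K = np.dot(A, A.T)

# Without a fix, np.dot(U / np.sqrt(S), V) can blow up or emit the divide-by-zero
# warning when entries of S underflow to zero.

# Option 1: a tiny ridge on the kernel matrix before the decomposition.
U, S, V = svd(K + 1e-12 * np.eye(K.shape[0]))
normalization = np.dot(U / np.sqrt(S), V)

# Option 2: a tiny floor on the eigenvalues S themselves.
U, S, V = svd(K)
normalization = np.dot(U / np.sqrt(np.maximum(S, 1e-12)), V)

assert np.all(np.isfinite(normalization))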

Owner

amueller commented Apr 15, 2013

Thanks for the report.
Adding a tiny value doesn't seem like a good idea as you add meaningless features.
I'd probably raise a ValueError.

You should be able to reproduce it using a linear kernel in a space with fewer dimensions than there are samples.

For your application: just use fewer components. If the kernel is rank deficient, you don't actually need the other components (from a kernel approximation point of view, you are lucky ;)
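
A quick sketch of that reproduction idea, assuming Nystroem accepts kernel='linear' (the sizes are arbitrary):

import numpy as np
from sklearn.kernel_approximation import Nystroem

# More samples than input dimensions, so the exact linear kernel matrix is rank deficient.
X = np.random.rand(100, 5)
N = Nystroem(kernel='linear', n_components=100)  # as many components as samples
N.fit(X)  # tiny eigenvalues show up in the SVD of the sampled kernel matrix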

Thank you for the quick response! Creating a low-rank kernel matrix is indeed easily done with a linear kernel. But numerically the eigenvalues *approach* zero while tending to stick around 1e-14, so this doesn't directly reproduce the problem I had. Further, I experienced this problem using the included 'digits' dataset with an RBF kernel.

It might be caused by examples that are very similar:

import numpy as np
from sklearn import kernel_approximation, metrics

X = np.random.rand(10, 20)
X = np.vstack([X] * 2) # duplicate samples

gamma = 1000.
N = kernel_approximation.Nystroem(gamma=gamma, n_components=X.shape[0]).fit(X)
Y = N.transform(X)

K = metrics.pairwise.rbf_kernel(X, gamma=gamma)
print 'Approximation error: ', np.linalg.norm(K - np.dot(Y, Y.T))
assert np.all(np.isfinite(Y))  # This fails!

Regarding the solution... the computation of the inverse square root of the eigenvalues is numerically problematic when the values approach zero. And even when the output stays finite, this can cause enormous inaccuracies. To see this, run the example above with gamma=100. Now the program does not fail, but the norm of the difference between the kernel and the Nyström approximation is around 5e91.

I see a strong parallel with the whitening transform, which has a similar problem when the eigenvalues of the covariance matrix get too small. Adding a small ridge is helpful with whitening, and corresponds to using a better covariance estimator.
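
Roughly what I mean, sketched for whitening (the ridge of 1e-6 is just an arbitrary illustration value):

import numpy as np

X = np.random.rand(50, 10)
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=0))

# Inverse square root of the eigenvalues, kept well behaved by a small ridge.
W = evecs / np.sqrt(evals + 1e-6)
X_white = np.dot(Xc, W)  # approximately identity covariance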

This is not to say that adding a tiny value is the only solution; I intended it not as a quick fix to make the problem go away, but as something that I think makes sense for this (numerical) problem.

But please keep up all the good work. I am really loving sklearn and its strong community.

Owner

amueller commented Apr 15, 2013

Enough of the flattery ;)

Yes, you are right that this is quite unstable. It is also a bottleneck. I was wondering if we could do something better.
My intuition was that small eigenvalues tell you that you shouldn't use these dimensions anyhow. But I am not sure this intuition is correct. Adding a small value might help.
Did you do any experiments to see if the error of approximation actually gets lower?
That would be interesting to know.

Btw, as you are playing with this: using KMeans instead of randomly sampled points has been looked at in the literature and seems to do better. I have just been too busy to implement it yet (and it is not that high up my priority list).
If you have any insights there, that would also be sweet :)
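
In case it is useful, a rough sketch of the KMeans variant as I understand it (cluster centres as landmarks instead of randomly sampled points; this is not the sklearn implementation, and the eigenvalue floor of 1e-12 is my own addition):

import numpy as np
from scipy.linalg import svd
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.rand(200, 20)
gamma, n_components = 1., 20

# Landmarks are cluster centres rather than randomly sampled training points.
centres = KMeans(n_clusters=n_components).fit(X).cluster_centers_

W = rbf_kernel(centres, gamma=gamma)  # kernel between landmarks
U, S, V = svd(W)
normalization = np.dot(U / np.sqrt(np.maximum(S, 1e-12)), V)

# Approximate feature map: K(X, landmarks) times W^(-1/2).
Y = np.dot(rbf_kernel(X, centres, gamma=gamma), normalization.T)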

I haven't tested the effect of a ridge in this specific setting, but I have done so in the past with my own Nyström approximator and with some other eigenvalue problems, and I am fairly certain that it will improve the approximation quite a bit in these (pathological) circumstances.

Using KMeans sounds very interesting, but it adds a layer of intelligence, and I like my classifiers dumb :). Performance-wise, though, I think it is the better option.

Right now I have some trouble getting a development version of sklearn running, so I can't offer a fix right now. Maybe if I find some time.

Owner

amueller commented Apr 15, 2013

About K-Means: well, it is not that intelligent, and it is no less deterministic than random assignment ;)

If you have trouble getting anything to run (I guess BLAS issues?), let us know via an issue or the mailing list; we really try to make it easy.
Thanks for offering to pitch in. I know how it is with finding time ;)

Ok, I have a fun work-around. This numerical imprecision can be circumvented by adding extra numerical imprecision :). I have updated my minimal non-working example to add a bit of Gaussian noise with a standard deviation of 1e-6 to X. Now the approximation error is down to 1e-7.
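
For reference, the updated example with the jitter (only the noise line is new; the 1e-6 level is the one mentioned above):

import numpy as np
from sklearn import kernel_approximation, metrics

X = np.random.rand(10, 20)
X = np.vstack([X] * 2)  # duplicate samples
X = X + 1e-6 * np.random.randn(*X.shape)  # tiny Gaussian jitter breaks the exact duplicates

gamma = 1000.
N = kernel_approximation.Nystroem(gamma=gamma, n_components=X.shape[0]).fit(X)
Y = N.transform(X)

K = metrics.pairwise.rbf_kernel(X, gamma=gamma)
print 'Approximation error: ', np.linalg.norm(K - np.dot(Y, Y.T))
assert np.all(np.isfinite(Y))  # passes now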

This indirectly shows that the ridge is helpful: now there are no duplicate instances any more, and thus the kernel matrix is no longer rank deficient.

For now, this solves my problem — but it is an ugly botch.

My compiling problem is related to murmurhash. I'll file a separate issue.

Owner

amueller commented May 4, 2013

Actually, I can't reproduce the error with your snippet above. For all values of gamma I get very small approximation errors and no infinities...

Owner

amueller commented May 4, 2013

Using digits I get an approximation error of zero... hmm... not sure what to do now.

Oh, that is odd (and interesting). When I paste it into a Python interpreter, I get the output below. Note the divide-by-zero warnings.

Python 2.7.4 (default, Apr 16 2013, 16:36:54) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.24)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from sklearn import kernel_approximation, metrics
>>> 
>>> X = np.random.rand(10, 20)
>>> X = np.vstack([X] * 2) # duplicate samples
>>> 
>>> gamma = 1000.
>>> N = kernel_approximation.Nystroem(gamma=gamma, n_components=X.shape[0]).fit(X)
/usr/local/lib/python2.7/site-packages/sklearn/kernel_approximation.py:462: RuntimeWarning: divide by zero encountered in divide
  self.normalization_ = np.dot(U * 1. / np.sqrt(S), V)
/usr/local/lib/python2.7/site-packages/sklearn/kernel_approximation.py:462: RuntimeWarning: invalid value encountered in divide
  self.normalization_ = np.dot(U * 1. / np.sqrt(S), V)
>>> Y = N.transform(X)
>>> 
>>> K = metrics.pairwise.rbf_kernel(X, gamma=gamma)
>>> print 'Approximation error: ', np.linalg.norm(K - np.dot(Y, Y.T))
Approximation error:  nan
>>> assert np.all(np.isfinite(Y))  # This fails!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

My Python packages:

$ pip freeze
Fabric==1.4.3
PyYAML==3.10
SQLAlchemy==0.8.0
coverage==3.5.2
distribute==0.6.36
dotfiles==0.5.4
git-remote-helpers==0.1.0
joblib==0.7.0b
matplotlib==1.1.1
mock==1.0.1
numpy==1.6.2
pymongo==2.4.2
redis==2.7.2
requests==1.1.0
requests-oauthlib==0.3.0
scikit-learn==0.13.1
scipy==0.12.0.dev-9666ab5
ssh==1.7.14
vboxapi==1.0
virtualenv==1.9.1
wsgiref==0.1.2

Do you have different versions? I see that the SVD is from scipy, so the version of scipy might be important.

Owner

amueller commented May 4, 2013

In [8]: scipy.version.version
Out[8]: '0.10.1'

In [10]: np.version.version
Out[10]: '1.6.2'

I'm on scikit-learn master, though there hasn't been any change there afaik.
Your version of scipy is way newer ;) It seems mine is more robust lol.

Owner

amueller commented May 4, 2013

Yeah I just tried to get a newer scipy but I don't think that's gonna happen today, sorry ^^

@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014

@larsmans larsmans closed this in #4179 Feb 6, 2015
