Locality Preserving Projection module for manifold learning #1628


@lucidfrontier45
Owner

This pull request adds a new module, "Locality Preserving Projection" (LPP), to the manifold learning package. LPP can be seen as a linear approximation to the Laplacian Eigenmap. Unlike other manifold learning algorithms, LPP is a linear transformation and can be used like PCA.

Details of LPP can be found in the following paper:
X. He and P. Niyogi, "Locality Preserving Projections", Advances in Neural Information Processing Systems 16 (NIPS 2003), Vancouver, Canada, 2003.

I have finished the main LPP module and added examples. The remaining tasks are to write tests and documentation. Can anyone point me to a guideline for writing unit tests? I'm not sure about the conventions.

Thank you.
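For readers skimming the thread, here is a minimal sketch of the intended PCA-like workflow. It assumes this PR's branch is installed and uses the estimator name and parameters adopted later in the discussion, so treat it as illustrative only:

from sklearn.datasets import load_digits
from sklearn.manifold import LocalityPreservingProjection  # only exists on this PR's branch

X, _ = load_digits(return_X_y=True)

lpp = LocalityPreservingProjection(n_components=2, n_neighbors=10)
lpp.fit(X)                       # learn a linear projection, just like PCA
X_2d = lpp.transform(X)          # embed the training data
X_new_2d = lpp.transform(X[:5])  # new samples map through the same linear transform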

Owner

I had never heard of LPP, but 1500 citations say it is pretty well known.
Still, I'd rather call the estimator LocalityPreservingProjection.

Owner

When running the tests, I get this warning:

LPP(eigen_solver=None, kernel_func=None, kernel_param=None, max_iter=None,
  mode=None, n_components=None, n_neighbors=None, pca_preprocess=None,
  random_state=0, tol=None, use_ext=None)
The eigenvalue range specified is not valid.
Valid range is [0,0]
Owner

Several of the common tests are failing; please run the test suite. LPP doesn't check the input for nan or inf (use the utils.validation functions), and it changes some of its parameters in __init__.
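For concreteness, here is a small sketch of the kind of check being asked for. check_array is the current name of the helper; the exact validation helpers available in 2013 differed, so treat this as illustrative:

import numpy as np
from sklearn.utils.validation import check_array

X_bad = np.array([[1.0, np.nan], [2.0, 3.0]])
try:
    check_array(X_bad)           # raises ValueError on NaN/inf input
except ValueError as exc:
    print(exc)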

Owner

The current examples don't really convince me. Maybe I have to read the paper. As it is a linear method, it can't do well on the sphere. For the S-curve we get a projection that basically flattens the S, which might make some sense.
I tried the digits, too, and didn't see anything there...

Owner

Hi @amueller

I renamed LPP to LocalityPreservingProjection.

Can you tell me how I can reproduce the error you reported?

LPP(eigen_solver=None, kernel_func=None, kernel_param=None, max_iter=None,
  mode=None, n_components=None, n_neighbors=None, pca_preprocess=None,
  random_state=0, tol=None, use_ext=None)
The eigenvalue range specified is not valid.
Valid range is [0,0]
@lucidfrontier45 lucidfrontier45 - change LPP to LocalityPreservingProjection
- modified PCA preprocess of LPP for removing singularity
5f0b73c
Owner

I think I'd prefer a more explicit name than 'LPP'. We try to avoid
TLAs in scikit-learn (TLA = three-letter acronym :} )

Owner

I renamed LPP to LocalityPreservingProjection.

Thanks. Coming late to the party :)

Owner

Thanks for the rename. I got the error when running the tests but can't reproduce it now.
Before going into the details, I think it would be great if some other devs who might be familiar with the method could say whether it is appropriate for sklearn.
Also: do you have any specific real-world application where it works particularly well, or some example that demonstrates its benefits?

@agramfort agramfort and 1 other commented on an outdated diff Jan 29, 2013
sklearn/manifold/lpp_util.pyx
+# xx = nzx[i]
+# yy = nzy[i]
+# if G[yy, xx] == 0:
+# G[yy, xx] = G[xx, yy]
+# return G
+
+@cython.boundscheck(False)
+def toSymmetrixMat(G, np.ndarray[int, ndim=1] nzx,
+                   np.ndarray[int, ndim=1] nzy):
+    cdef int i, xx, yy, n_nz
+    n_nz = len(nzx)
+    for i in xrange(n_nz):
+        xx = nzx[i]
+        yy = nzy[i]
+        if G[yy, xx] == 0:
+            G[yy, xx] = G[xx, yy]
agramfort
agramfort Jan 29, 2013 Owner

is this really necessary? since G is not typed I guess it's not really a performance improvement.

lucidfrontier45
lucidfrontier45 Feb 9, 2013 Owner

Sorry for the late response; actually, it made it 10 times faster.

@agramfort agramfort and 1 other commented on an outdated diff Jan 29, 2013
sklearn/manifold/lpp.py
+
+ Returns
+ -------
+ adjacency matrix : LIL sparse matrix, shape: (n_samples, n_samples)
+ (i, j) component of it is defined in the section 2.2 of [1]
+ """
+ X = np.asanyarray(X)
+ G = kneighbors_graph(X, n_neighbors, mode)
+ G = G.tolil()
+ nzx, nzy = G.nonzero()
+ if use_ext:
+ G = toSymmetrixMat(G, nzx, nzy)
+ else:
+ for xx, yy in zip(nzx, nzy):
+ if G[yy, xx] == 0:
+ G[yy, xx] = G[xx, yy]
agramfort
agramfort Jan 29, 2013 Owner

how about :

G = (G + G.T) / 2.

?

lucidfrontier45
lucidfrontier45 Jan 30, 2013 Owner

I think it's definitely needed, since kneighbors_graph does not give a symmetric matrix.
I know that (G + G.T) / 2. is used in the spectral_embedding module, but I don't think that is the correct way to symmetrize here.

Suppose the jth node is among the k nearest neighbors of the ith node, but not the other way around, i.e. G[i, j] = dist(i, j) and G[j, i] = 0. Also suppose G[m, l] = G[l, m] = dist(m, l).
Now if we use (G + G.T)/2, we get G[i, j] = G[j, i] = dist(i, j)/2 while G[m, l] = G[l, m] = dist(m, l).
That is obviously not what we want.
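A toy illustration of the point above (my own example, not from the PR): averaging halves a one-sided distance, while mirroring the nonzero entry preserves it.

import numpy as np

# node 1 is among node 0's neighbours, but not the other way around
G = np.array([[0.0, 3.0],
              [0.0, 0.0]])

averaged = (G + G.T) / 2.0       # averaged[0, 1] == averaged[1, 0] == 1.5 (distance halved)

mirrored = G.copy()
nzx, nzy = np.nonzero(G)
for xx, yy in zip(nzx, nzy):     # same loop as in the PR's adjacency_matrix
    if mirrored[yy, xx] == 0:
        mirrored[yy, xx] = mirrored[xx, yy]
# mirrored[0, 1] == mirrored[1, 0] == 3.0, i.e. the distance is preserved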

agramfort
agramfort Jan 30, 2013 Owner

OK, that makes sense, but since both methods should take valid symmetric matrices as input, something should either be fixed in spectral embedding or changed here.

my 2c

lucidfrontier45
lucidfrontier45 Jan 30, 2013 Owner

OK, I will try to somehow merge lpp into spectral_embedding.

@agramfort agramfort commented on the diff Jan 29, 2013
sklearn/manifold/lpp.py
+ ----------
+ afn_mat : CSR sparse matrix, shape(n_samples, n_samples)
+ the affinity matrix
+
+ Returns
+ -------
+ Laplacian matrix : CSR sparse matrix, shape(n_samples, n_samples)
+ (i, j) component L(i,j) is defined in [1] section 2.2 3
+
+ col_sum : ndarray, shape(n_samples)
+ diagonal matrix whose entries are column sums of the affinity matrix
+ """
+    col_sum = np.asarray(afn_mat.sum(0)).flatten()
+    lap_mat = (-afn_mat).tolil()
+    lap_mat.setdiag(col_sum)
+    return lap_mat.tocsr(), col_sum
agramfort
agramfort Jan 29, 2013 Owner

there is a need for the 3 functions above also for spectral embedding / laplacian eigenmaps so I guess the code should already exist and be factorized.

@agramfort agramfort commented on the diff Jan 29, 2013
sklearn/manifold/lpp.py
+ Returns
+ -------
+ Laplacian matrix : CSR sparse matrix, shape(n_samples, n_samples)
+ (i, j) component L(i,j) is defined in [1] section 2.2 3
+
+ col_sum : ndarray, shape(n_samples)
+ diagonal matrix whose entries are column sums of the affinity matrix
+ """
+    col_sum = np.asarray(afn_mat.sum(0)).flatten()
+    lap_mat = (-afn_mat).tolil()
+    lap_mat.setdiag(col_sum)
+    return lap_mat.tocsr(), col_sum
+
+
+def auto_dsygv(M, N, k, k_skip=0, eigen_solver='auto', tol=1E-6,
+               max_iter=100, random_state=None):
agramfort
agramfort Jan 29, 2013 Owner

if such a function does not exist elsewhere it should be in utils as it's a common problem. Can you check that a similar solver is not already implemented for laplacian eigenmaps for example?

lucidfrontier45
lucidfrontier45 Feb 9, 2013 Owner

In the spectral_embedding module, that eigenvalue problem is handled inside the main spectral_embedding function. I want to keep the eigensolver part separate from the main function.

@agramfort agramfort commented on the diff Jan 29, 2013
sklearn/manifold/lpp.py
+ or matrix type. This method should be avoided for
+ large problems.
+
+ tol : float, optional
+ Tolerance for 'arpack' method.
+ Not used if eigen_solver=='dense'.
+
+ max_iter : maximum number of iterations for 'arpack' method
+ not used if eigen_solver=='dense'
+
+ random_state: numpy.RandomState or int, optional
+ The generator or seed used to determine the starting vector for arpack
+ iterations. Defaults to numpy.random.
+
+ use_ext : bool
+ whether use cython extension or not
agramfort
agramfort Jan 29, 2013 Owner

such an option should not exist. Either we go for cython or we don't.

lucidfrontier45
lucidfrontier45 Jan 30, 2013 Owner

I will remove use_ext and always use the cython module.

@agramfort agramfort commented on an outdated diff Jan 29, 2013
sklearn/manifold/lpp.py
+        self._n_components = n_components
+        self._mode = mode
+        self._kernel_func = kernel_func
+        self._kernel_param = kernel_param
+        self._eigen_solver = eigen_solver
+        self._tol = tol
+        self._max_iter = max_iter
+        self._random_state = random_state
+        self._use_ext = use_ext
+        self._components = None
+        self._pca_preprocess = pca_preprocess
+
+    def fit(self, X, y=None):
+        if self._pca_preprocess:
+            # PCA preprocess for removing singularity
+            target_dim = int(min(X.shape) * 0.9)
agramfort
agramfort Jan 29, 2013 Owner

I don't much like the hard-coded 0.9.

Owner

From what I understand, one benefit is that the embedding is defined everywhere, so the transform for new samples is just a matrix multiplication. The question for me now is: does it really work better than what we already have? Maybe the examples, which I did not take the time to run, are convincing...
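A minimal sketch of what that out-of-sample property means in practice; the components array below is illustrative, not necessarily the PR's attribute name:

import numpy as np

rng = np.random.RandomState(0)
components = rng.rand(2, 64)       # learned linear projection, (n_components, n_features)
X_new = rng.rand(10, 64)           # samples not seen during fit

X_new_embedded = np.dot(X_new, components.T)   # one matrix product, no re-fitting needed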

Owner

@agramfort

In the paper "Face Recognition Using Laplacianfaces", LPP performed much better than PCA.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.120.4055&rep=rep1&type=pdf

When I was doing some research, I found that the above result can be reproduced on the olivetti_faces dataset. However, I can't reproduce it now due to some changes or some bug... I'll first try to show you a benchmark and then proceed to refactoring.

@mblondel mblondel commented on an outdated diff Jan 30, 2013
sklearn/manifold/lpp_util.pyx
+#def adjacency_matrix(X, n_neighbors, mode='distance'):
+# cdef int i, xx, yy
+# X = np.asanyarray(X)
+# G = kneighbors_graph(X, n_neighbors, mode)
+# G = G.tolil()
+# nzx, nzy = G.nonzero()
+## for xx, yy in zip(nzx, nzy):
+# for i in xrange(nzx):
+# xx = nzx[i]
+# yy = nzy[i]
+# if G[yy, xx] == 0:
+# G[yy, xx] = G[xx, yy]
+# return G
+
+@cython.boundscheck(False)
+def toSymmetrixMat(G, np.ndarray[int, ndim = 1] nzx,
mblondel
mblondel Jan 30, 2013 Owner

-> to_symmetrix_matrix

@mblondel mblondel commented on an outdated diff Jan 30, 2013
sklearn/manifold/lpp.py
+ """
+ X = np.asanyarray(X)
+ G = kneighbors_graph(X, n_neighbors, mode)
+ G = G.tolil()
+ nzx, nzy = G.nonzero()
+ if use_ext:
+ G = toSymmetrixMat(G, nzx, nzy)
+ else:
+ for xx, yy in zip(nzx, nzy):
+ if G[yy, xx] == 0:
+ G[yy, xx] = G[xx, yy]
+
+ return G
+
+
+def affinity_matrix(adj_mat, kernel_func="heat", kernel_param=1.0):
mblondel
mblondel Jan 30, 2013 Owner

This function arises in many estimators too, so it could be factorized.

Owner

It would indeed be nice to have an example on which LPP works better than PCA.

Owner

In the paper "Face Recognition Using Laplacianfaces", LPP performed much better than PCA.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.120.4055&rep=rep1&type=pdf

When I was doing some research, I found that the above result can be reproduced on the olivetti_faces dataset.

If you can reproduce this on the olivetti_faces dataset, it would be a convincing argument for inclusion.

I must say that, like the other devs, I would like to understand a bit more about the practical benefits compared to what we already have.

Shiqiao Du - LPP code refactored
- added a comparison code of LPP and PCA (face recognition)
7c3208b
Owner

Hi, I added a benchmark of PCA vs LPP.
It is a simple face recognition task using PCA/LPP followed by a one-nearest-neighbor classifier.

For olivetti_faces, the average precision (score method) was
EigenFace: 88.7%, LaplacianFace: 89.1%

For lfw_people:
EigenFace: 50.2%, LaplacianFace: 63.4%

While LPP and PCA were even on olivetti, LPP outperformed PCA on the larger lfw_people dataset.

LPP has two additional parameters (number of neighbors and heat kernel parameter).
I used totally different kernel parameters for olivetti_faces (1e2) and lfw_people (1e7).
I will look into more sophisticated automatic parameter selection from now on.

Owner

For lfw_people:
EigenFace: 50.2%, LaplacianFace: 63.4%

While LPP and PCA were even on olivetti, LPP outperformed PCA on the larger lfw_people dataset.

OK, that's very interesting. Given that, LPP does sound interesting. Do you have a hand-waving explanation of why LPP outperforms PCA?

LPP has two additional parameters (number of neighbors and heat kernel parameter).
I used totally different kernel parameters for olivetti_faces (1e2) and lfw_people (1e7).
I will look into more sophisticated automatic parameter selection from now on.

Cool! Maybe a heuristic to set the heat kernel parameter, as in the estimate_bandwidth function used in mean_shift, would be useful. This heuristic often works well and removes a parameter to tune.
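A sketch of the suggestion, reusing the real estimate_bandwidth helper from the mean-shift code (a nearest-neighbour distance quantile heuristic); how the bandwidth maps onto LPP's kernel_param, here squared to match the heat kernel, is my assumption:

import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
bandwidth = estimate_bandwidth(X, quantile=0.3)
kernel_param = bandwidth ** 2       # heat kernel: exp(-||xi - xj||^2 / t)
print(bandwidth, kernel_param)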

Owner

OK, that's very interesting. Given that, LPP does sound interesting. Do you have a hand-waving explanation of why LPP outperforms PCA?

According to the paper I listed above, LPP is a linear approximation to the Laplacian Eigenmap (spectral embedding). It tries to find a low-dimensional representation that preserves the local structure as much as possible, while PCA tries to preserve the global structure. This can be seen from the objective functions they minimize or maximize.

The authors argue that in this way LPP can extract the intrinsic low-dimensional manifold from high-dimensional, complex data such as face images. Of course, it is merely a hypothesis that face data lie on such a manifold.
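For reference, the objective from He & Niyogi that makes this precise. In the paper's notation the samples are the columns of X, W is the k-NN affinity matrix, D is the diagonal degree matrix, and L = D - W is the graph Laplacian:

\min_{a}\ \sum_{i,j}\left(a^{T}x_{i} - a^{T}x_{j}\right)^{2} W_{ij}
\quad\Longrightarrow\quad
X L X^{T} a \;=\; \lambda\, X D X^{T} a,
\qquad D_{ii} = \sum_{j} W_{ij}.

The projection directions are the generalized eigenvectors with the smallest eigenvalues, whereas PCA keeps the directions of largest projected variance.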

Owner

According to the paper I listed above, LPP is a linear approximation to the Laplacian Eigenmap (spectral embedding). It tries to find a low-dimensional representation that preserves the local structure as much as possible, while PCA tries to preserve the global structure. This can be seen from the objective functions they minimize or maximize.

Right, but can it be seen from figures? For instance, can LPP extract parameters that are meaningful in the setting of faces? It would be fantastic to have that in the example, and have pretty pictures. For instance, in the 'Manifold Tangent Classifier' paper, the authors sold their paper fantastically well with figures of the tangent of the manifold, which show that it captures relevant information:
http://books.nips.cc/papers/files/nips24/NIPS2011_1240.pdf

Owner

I don't know whether it is convincing, but in the paper I listed, the authors show the so-called Laplacianfaces, the LPP counterpart of Eigenfaces. At first sight they are not very intuitive, but the result was very similar to that of the supervised Fisherfaces, suggesting that LPP is better suited to classification tasks than PCA. I can prepare the same kind of plot for the face dataset in sklearn.

Owner

I don't know whether it is convincing, but in the paper I listed, the authors show the so-called Laplacianfaces, the LPP counterpart of Eigenfaces.

Yes, this is great, if you manage to reproduce such a convincing figure. Can you also add LPP to the various manifold learning comparison examples?

@lucidfrontier45 lucidfrontier45 - added `components_` to the LDA so that it is more compatibe to
    other `transformer` classes
- added LPP adn LDA to plot_faces_decomposition.py
ba342c8
Owner

I added LDA and LPP to examples/decomposition/plot_faces_decomposition.py. Unlike the paper's result, LPP doesn't look that much like LDA. Nevertheless, we can see that LPP and PCA return completely different projections.

Owner
amueller commented Feb 1, 2013

Could you post the plot for the lazy reviewers please ;)

Owner

Here it is.

[Figures: Laplacianfaces, Eigenfaces, Fisherfaces]

Owner
amueller commented Feb 1, 2013

Thanks :)

lucidfrontier45 and others added some commits Nov 29, 2012
@lucidfrontier45 lucidfrontier45 lpp modules fe5c160
@lucidfrontier45 lucidfrontier45 - bugfix
- default eigen_solver -> auto
- default kernel_param -> 10.0
- use_ext member
70b0756
@lucidfrontier45 lucidfrontier45 PCA preprocess d89e00b
@lucidfrontier45 lucidfrontier45 writing documents for LPP 9cb967a
@lucidfrontier45 lucidfrontier45 Comments for LPP ea2d66b
@lucidfrontier45 lucidfrontier45 added example of LPP 5d596d8
@lucidfrontier45 lucidfrontier45 modified manifold examples for LPP c008c05
@lucidfrontier45 lucidfrontier45 - change LPP to LocalityPreservingProjection
- modified PCA preprocess of LPP for removing singularity
41c6c08
@lucidfrontier45 Shiqiao Du - LPP code refactored
- added a comparison code of LPP and PCA (face recognition)
e98c57c
@lucidfrontier45 lucidfrontier45 - added `components_` to the LDA so that it is more compatibe to
    other `transformer` classes
- added LPP adn LDA to plot_faces_decomposition.py
e6587c5
@lucidfrontier45 lucidfrontier45 Merge branch 'lpp' of github.com:lucidfrontier45/scikit-learn into lpp
Conflicts:
	examples/manifold/plot_compare_methods.py
bd3208d
Owner

Thanks Shiqiao!

The figures clearly demonstrate that there is a difference between PCA and LPP, so they are really useful. In terms of the 'story' to tell, it seems that LPP sits between LDA and PCA. That's a nice story.

The pictures aren't terribly interpretable by themselves. I wonder if that's due to the small size of the Olivetti faces dataset. How does it look on LFW? We cannot really use that as a selling point, as we cannot use the LFW dataset in an automatically plotted example. How do the same figures look on the digits dataset?

Also, while you are pasting pictures of LPP (which we all really appreciate), could you please add it to plot_lle_digits.py and paste the picture here.

Thanks heaps

Owner

Digits plots for PCA, Spectral Embedding, and LPP.
It looks like PCA already does well enough on this example.
LPP's result seems equivalent to PCA's; only their 2nd component vectors have opposite signs.

[Figures: PCA, LEM, and LPP embeddings of the digits]

Owner

Yeah, LPP is not terribly different from PCA on digits. The good thing is that it does quite well.

What do the 'LaplacianDigits' look like?

Owner

Do we already have an EigenDigits or FisherDigits plot example?

Owner
amueller commented Feb 1, 2013

Actually I think the LPP on the digits looks quite nice. Not terribly different from PCA, but if you look at the details, it is indeed "better".

Owner

Do we already have an EigenDigits or FisherDigits plot example?

Not that I know of. I am just trying to find ideas on how to 'sell' LPP.
It's always good to have a compelling example.

Owner

Actually I think the LPP on the digits looks quite nice. Not terribly different from PCA, but if you look at the details, it is indeed "better".

That's my feeling too.

Owner

Actually I think the LPP on the digits looks quite nice. Not terribly different from PCA, but if you look at the details, it is indeed "better".

One drawback of LPP is that it has two additional parameters. While PCA only needs n_components, LPP also needs n_neighbors and kernel_param. Only when these are well tuned can LPP be better than PCA.
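A sketch of how the two extra parameters could be tuned jointly by cross-validation. The estimator and parameter names (n_neighbors, kernel_param) are taken from this PR, so this only runs against the PR branch and assumes the estimator follows the usual get_params/set_params contract; the model_selection import is the modern location of GridSearchCV:

from sklearn.datasets import load_digits
from sklearn.manifold import LocalityPreservingProjection  # proposed in this PR
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
pipe = Pipeline([("lpp", LocalityPreservingProjection(n_components=30)),
                 ("knn", KNeighborsClassifier(n_neighbors=1))])
param_grid = {"lpp__n_neighbors": [5, 10, 20],
              "lpp__kernel_param": [1e1, 1e2, 1e3]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_, search.best_score_)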

Owner

One drawback of LPP is that it has two additional parameters. While PCA only needs n_components, LPP also needs n_neighbors and kernel_param. Only when these are well tuned can LPP be better than PCA.

Is the estimate_bandwidth trick from mean-shift that I suggested earlier of any use?

Owner

Is the estimate_bandwidth trick from mean-shift that I suggested earlier of any use?

I haven't looked at it yet. Since I've never heard of it, it might take some time.

Owner

I haven't looked at it yet. Since I've never heard of it, it might take some time.

It is also a standard trick for setting the kernel parameter in SVM-RBF.

Owner

I hacked plot_faces_decomposition and made a digits version.
However, the plots of the digits components are beyond my intuitive understanding.

[Figures: center_digits, eigen_digits, laplacian_digits]

Owner

Thanks for the effort. I am indeed not sure that this is useful. Sorry for suggesting a dead end.

Owner

@GaelVaroquaux I looked into the bandwidth estimation algorithm in the mean-shift module, but I don't understand why it works.
Can you give me a reference for that algorithm?

Owner

I took this heuristic from a blog post by Alex Smola a while ago. I cannot find it now, sorry.

Owner
vene commented Feb 2, 2013

Plots on sklearn.datasets.digits are usually hard to interpret; maybe, out of curiosity, we should look at a similar comparison on MNIST? Just in this thread, not in the docs, of course.

Owner

I found a good estimator for kernel_param. I just used a simple analogy with the variance parameter of a Gaussian.
It worked beautifully on the previous face recognition example (examples/manifold/compare_lpp_pca.py).

if kernel_func == "heat":
    W.data **= 2

    # Estimate kernel_param by the heuristics that it must correspond to
    # the variance of Gaussian distribution.
    if kernel_param == "auto":
        kernel_param = np.mean(W.data)

    np.exp(-W.data / kernel_param, W.data)
Owner

Usually, people take a percentile rather than the mean, to be robust to skewed distributions. You could replace the mean by the median, for instance.
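A small sketch of the robust variant being suggested; the long-tailed sample below is synthetic, standing in for W.data ** 2:

import numpy as np

rng = np.random.RandomState(0)
sq_dists = rng.gamma(shape=1.0, scale=5.0, size=1000)   # stand-in for W.data ** 2

kernel_param = np.median(sq_dists)          # or np.percentile(sq_dists, q) for another quantile
weights = np.exp(-sq_dists / kernel_param)  # heat kernel with the robust estimate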

Owner

I plotted PCA, spectral embedding, and LPP on the MNIST digit data.
This time LPP shows a clearly different shape from PCA.
In fact, LPP's shape is very similar to spectral embedding's, which is consistent with the fact that LPP is a linear approximation of the Laplacian Eigenmap (spectral embedding).

[Figures: PCA, LEM, and LPP embeddings of MNIST]

Owner

Thanks a lot for all these examples, Shiqiao. You have quite clearly convinced me that LPP is very interesting. Sorry for expressing doubt originally.

Owner

I tried to merge the overlapping functions of spectral_embedding and LPP, but it was hard.

First of all, I wanted to replace the laplacian_matrix function with utils.graph.graph_laplacian. However, the two functions give different results for sparse matrices, while for dense matrices the results are the same.
I think this code in _graph_laplacian_sparse does something bad:

diag_mask = (lap.row == lap.col)
if not diag_mask.sum() == n_nodes:
    # The sparsity pattern of the matrix has holes on the diagonal,
    # we need to fix that
    diag_idx = lap.row[diag_mask]

    lap = lap.tolil()

    diagonal_holes = list(set(range(n_nodes)).difference(diag_idx))
    lap[diagonal_holes, diagonal_holes] = 1
    lap = lap.tocoo()
    diag_mask = (lap.row == lap.col)
lap.data[diag_mask] = 0
w = -np.asarray(lap.sum(axis=1)).squeeze()

What are the 'holes on the diagonal' that need to be fixed?

Second, LPP and spectral_embedding construct the affinity matrix in different ways: for the nearest-neighbor affinity matrix, LPP also has to apply its heat kernel. The two methods also solve slightly different eigenvalue problems: spectral_embedding solves an ordinary eigenvalue problem, while LPP solves a generalized eigenvalue problem.

Considering this, I think the code would be less readable if we merged the two modules. I'd like to keep lpp and spectral embedding separate.
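To make the last point concrete, a toy, dense, scipy-only sketch of the two eigenproblems (my own example; with samples as rows, the paper's X L X^T becomes X^T L X):

import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)
X = rng.rand(30, 5)                                   # (n_samples, n_features)

# toy symmetric affinity matrix, degree matrix and graph Laplacian
W = np.exp(-np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1))
D = np.diag(W.sum(axis=0))
L = D - W

# spectral embedding: ordinary eigenproblem   L v = lambda v
evals_se, evecs_se = eigh(L)

# LPP: generalized eigenproblem   X^T L X a = lambda X^T D X a
evals_lpp, evecs_lpp = eigh(X.T @ L @ X, X.T @ D @ X)

# both keep the eigenvectors belonging to the smallest eigenvalues
# (skipping the trivial constant vector for spectral embedding)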

Owner

I'll have a look at the commonalities between LPP and spectral embedding, to see if the code can be gracefully merged.

In the meantime, the branch does not pass the common tests, as can be seen from the Travis log (look on the pull request page; it tells you that the tests are failing):
https://travis-ci.org/scikit-learn/scikit-learn/builds/4596058
The reason is that you don't use standard input preprocessors to deal with X and y, and as a result the estimator does not give good error messages on things like sparse input values or NaNs.

Owner

Another remark: you should add a section on LPP to the manifold part of the documentation. Follow the sections for the other methods: a paragraph with a hand-waving description of the algorithm, its strengths and weaknesses, a picture (of the embedded digits), and a link to references and an example.

Thanks!

Owner

Regarding the difference between your version of graph_laplacian and the one in utils, it would help me understand the problem if you could give me a small example of a matrix on which they disagree.

The 'holes' on the diagonal are the diagonal elements that are zero, which shouldn't be the case for a Laplacian matrix.
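A tiny example (mine, not from the gist) of such holes: a sparse affinity matrix whose sparsity pattern stores no (i, i) entries at all, so code that writes the degrees into existing diagonal entries has nothing to write into unless it inserts them first:

import numpy as np
from scipy import sparse

# off-diagonal weights only; no (i, i) entry is stored
row = np.array([0, 1, 1, 2])
col = np.array([1, 0, 2, 1])
data = np.array([1.0, 1.0, 2.0, 2.0])
A = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

print((A.row == A.col).sum())   # 0 -> every diagonal position is a "hole" in the pattern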

Owner

Also, could you please rename lpp_utils.pyx to _lpp.pyx. Thanks!

Owner

Hey @lucidfrontier45: this PR looks like it has moved forward a lot, and I'd like to merge it before I leave on vacation. I am ready to put in some work to get it merged, but I will need some help. What's your schedule like in the next week? Will you be able to invest some time, and if so, when? Sorry for asking, but I must do some planning for my own time management.

Owner

@GaelVaroquaux Hi. I will try to make some time, but my office work is at an important stage. Could you give me a list of tasks I should do to get this PR merged?

The reason is that you don't use standard input preprocessors to deal with X and y, and as a result the estimator does not give good error messages on things like sparse input values or NaNs.

Can you give me an example of "standard input preprocessors to deal with X and y"?

About the documentation, I will check it today (maybe first copy the spectral embedding section and modify it).

Owner

@GaelVaroquaux Hi. I will try to make some time, but my office work is at an important stage.

I can absolutely understand.

Could you give me a list of tasks I should do to get this PR merged?

Not completely :). I need to have a further look. I'll try to do what I can to finish it off, but there are some things that I cannot do. Could you dig up an example for the Laplacian matrix for me, please?

Owner

I took a look at the test error messages.
I fixed some, but I cannot understand this one.
Can anyone tell me what exactly this test checks and how to fix it?

FAIL: sklearn.tests.test_common.test_estimators_nan_inf

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/du/workspace/scikit-learn/sklearn/tests/test_common.py", line 378, in test_estimators_nan_inf
raise AssertionError(error_string_transform, Est)
AssertionError: ("Estimator doesn't check for NaN and inf in transform.", <class 'sklearn.manifold.lpp.LocalityPreservingProjection'>)

raise AssertionError("Estimator doesn't check for NaN and inf in transform.", Est)


Owner

I fixed the code so that it passes the nose tests.

Regarding the rst document: when I run make doc and build the HTML documentation, the LaTeX code is not rendered. In addition to Sphinx and a LaTeX distribution, what do I need to embed math in the HTML?
I installed Sphinx and LaTeX from the 64-bit Linux Mint 12 apt repository.

@GaelVaroquaux
Try this code to reproduce the issue: my lpp.laplacian_matrix and utils.graph.graph_laplacian give different results for sparse input but the same result for dense input.
https://gist.github.com/lucidfrontier45/4987187

Owner
mblondel commented Jan 8, 2014

@lucidfrontier45 Since this PR has stalled, I think it would be a good idea to extract the source into a github repo and add it to https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets
