Memory Error : Locally Linear Embedding #10493

Closed
ncherel opened this issue Jan 17, 2018 · 10 comments · Fixed by #17997
Labels
Bug, Easy, help wanted, Moderate

Comments

@ncherel

ncherel commented Jan 17, 2018

Locally Linear Embedding (LLE) runs into a memory error when given a large matrix (10000x10000) and a large number of neighbors (>500).

The memory error occurs with the standard option, after the nearest neighbors have been computed in barycenter_kneighbors_graph():
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/locally_linear.py#L99

    knn = NearestNeighbors(n_neighbors + 1, n_jobs=n_jobs).fit(X)
    X = knn._fit_X
    n_samples = X.shape[0]
    ind = knn.kneighbors(X, return_distance=False)[:, 1:]
    data = barycenter_weights(X, X[ind], reg=reg)

X[ind] materializes an array of shape (n_samples, n_neighbors, n_dim) all at once, but barycenter_weights() then only uses the neighbors sequentially, one sample at a time: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/locally_linear.py#L20
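
For scale, here is a rough estimate of the size of that intermediate array. The shape comes from the numbers above; the float64 dtype and the helper name are assumptions made for illustration.

```python
import numpy as np


def gathered_neighbors_nbytes(n_samples, n_neighbors, n_dim, dtype=np.float64):
    """Bytes needed to hold X[ind] with shape (n_samples, n_neighbors, n_dim).

    Hypothetical helper, not part of scikit-learn.
    """
    return n_samples * n_neighbors * n_dim * np.dtype(dtype).itemsize


# Numbers from this report: a 10000x10000 matrix with 500 neighbors.
nbytes = gathered_neighbors_nbytes(10_000, 500, 10_000)
print(f"X[ind] would need about {nbytes / 1e9:.0f} GB")  # about 400 GB
```

The input matrix itself is only about 0.8 GB under the same assumption, so the gathered copy is n_neighbors times larger than the data.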

barycenter_weights() could be modified to take ind as input and gather each sample's neighbors inside its loop, which would avoid allocating X[ind] up front.
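
A minimal sketch of that idea, not the actual scikit-learn code: the function name `barycenter_weights_from_indices` is hypothetical, and it uses `np.linalg.solve` and a brute-force neighbor search purely to stay self-contained. The regularization follows the usual LLE recipe of adding `reg * trace(G)` to the diagonal of the local Gram matrix, which is an assumption about what the library does.

```python
import numpy as np


def barycenter_weights_from_indices(X, ind, reg=1e-3):
    """Barycenter weights of each row of X from its neighbors X[ind[i]].

    Hypothetical variant: neighbors are gathered one sample at a time,
    so the largest temporary is (n_neighbors, n_dim) instead of
    (n_samples, n_neighbors, n_dim).
    """
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_neighbors = ind.shape
    B = np.empty((n_samples, n_neighbors), dtype=X.dtype)
    v = np.ones(n_neighbors, dtype=X.dtype)
    for i in range(n_samples):
        C = X[ind[i]] - X[i]             # neighbors centered on sample i
        G = C @ C.T                      # local Gram matrix
        trace = np.trace(G)
        R = reg * trace if trace > 0 else reg
        G.flat[:: n_neighbors + 1] += R  # regularize the diagonal
        w = np.linalg.solve(G, v)
        B[i] = w / w.sum()               # weights sum to 1 for each sample
    return B


# Small usage example with brute-force neighbors (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.random((20, 5))
sq_dists = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)
ind_demo = np.argsort(sq_dists, axis=1)[:, 1:4]  # drop self, keep 3 neighbors
B_demo = barycenter_weights_from_indices(X_demo, ind_demo)
```

The trade-off is one fancy-indexing call per sample instead of a single large one, in exchange for peak memory that no longer grows with n_samples * n_neighbors * n_dim.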

@jnothman
Member

jnothman commented Jan 17, 2018 via email

@ncherel
Author

ncherel commented Jan 19, 2018

B is larger than it needs to be.

jnothman added the Easy and Moderate labels Jan 22, 2018
@jnothman
Member

jnothman commented Jan 22, 2018 via email

@pinakinathc
Contributor

Hi, can I take this up?

@jnothman
Member

jnothman commented Jan 23, 2018 via email

@maykulkarni
Contributor

I would like to take it up. I have an idea to fix this.

@wxnudt

wxnudt commented Aug 23, 2019

I think I am running into the same problem. Have you fixed it? @maykulkarni

@marimeireles
Contributor

@jnothman I'm gonna give this one a try. I'll post here if I get stuck!
I'm taking this bug as part of the #WiMLDS_Berlin sprint =)

@marimeireles
Contributor

@sammsc is also working on this with me :)

@marimeireles
Contributor

Okay... So I tried to reproduce the error on my computer, but when I request 500 neighbors for each row of the matrix, my computer just can't handle it.
Any idea how I can reproduce it?
I tried:

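    import numpy as np
    from sklearn.manifold.locally_linear import barycenter_kneighbors_graph  # module path as linked above

    X = np.random.rand(12000, 12000)
    print("Finished creating the matrix")
    A = barycenter_kneighbors_graph(X, 500)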

And my program gets stuck on the last line.
Thanks!
