Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MemoryError exception when calling barycenter_weights with large inputs #17997

Merged

Conversation

bmaisonn
Copy link
Contributor

Reference Issues/PRs

Fixes #10493

What does this implement/fix? Explain your changes.

barycenter_weiths was expecting a (n_samples, n_neighbors, n_dim) array as a second argument. With large number of samples, dimensions and neighbors this provoked MemoryError exceptions.
See a call example here

data = barycenter_weights(X, X[ind], reg=reg)

As noticed in the issue this array is only used to iterate over the first axis so instead of building this huge array we can simply pass the list of indices to the method, then iterate over the indices instead.
The code below is simply changed by building A for each indice.

for i, A in enumerate(Z.transpose(0, 2, 1)):

Any other comments?

This method is private in the module so we can afford changing its signature.
No test was added for this specific use case because doing this computation for large arrays is very long.
In another PR i'll try to improve the time it takes to perform this calculation by leverage n_jobs parameter when it's available.

@bmaisonn
Copy link
Contributor Author

Hello @ogrisel,
Thx for the review the tests are passing now.
Bertrand

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @bmaisonn. Can you please just add a new entry to the change log under doc/whats_new/0.24.rst?

@ogrisel
Copy link
Member

ogrisel commented Jul 27, 2020

Also can you please re-render the plots of the examples that use LLE and report the comments of this PR to compare those with the LLE plots of master to make sure that we did not introduce a regression not caught by the existing tests?

@jnothman
Copy link
Member

Adding a commit with [doc build] in the commit name will re-render the examples in the cloud, accessible using the ci/circleci: doc artifact link.

@bmaisonn
Copy link
Contributor Author

@ogrisel
Indeed i can see few differences in the documentation (Dev is on the left)
https://113900-843222-gh.circle-artifacts.com/0/doc/_changed.html

image
image
image
image

@bmaisonn
Copy link
Contributor Author

Actually when i run those examples on my local env on master they never look the same :)

@bmaisonn
Copy link
Contributor Author

Hello @ogrisel @jnothman, Is this what you expected?

@ogrisel
Copy link
Member

ogrisel commented Jul 29, 2020

The quality of the viz still look good. The non-determinism must come from something else. LGTM.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -235,6 +235,10 @@ Changelog
will be removed in 0.28. :pr:`17662` by
:user:`Joshua Newton <joshuacwnewton>`.

- |Fix| Fix :issue:`10493`. MemoryError exception was raised when
calling barycenter_weights with large inputs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change log entries should refer to the affected public API items. barycenter_weights is private. Could you try give some sense of how this change affects the user?

Z : array-like, shape (n_samples, n_neighbors, n_dim)
Y : array-like, shape (n_samples, n_dim)

indices : array-like, shape (n_samples, n_dim)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please provide a brief description of what indices is for.

@bmaisonn
Copy link
Contributor Author

bmaisonn commented Aug 5, 2020

I updated the documentation @jnothman, let me know if you see something else

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Thanks, @bmaisonn

@@ -235,6 +235,10 @@ Changelog
will be removed in 0.28. :pr:`17662` by
:user:`Joshua Newton <joshuacwnewton>`.

- |Fix| Fix :issue:`10493`. Fix Local Linear Embedding (LLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be Efficiency since there is no faulty logic as such, rather we have reduced the space complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can make sense, i don't know all the categories :)
I pushed a new version

@jnothman jnothman merged commit 0fa3263 into scikit-learn:master Aug 5, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory Error : Locally Linear Embedding
3 participants