New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MemoryError exception when calling barycenter_weights with large inputs #17997
Fix MemoryError exception when calling barycenter_weights with large inputs #17997
Conversation
Hello @ogrisel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @bmaisonn. Can you please just add a new entry to the change log under doc/whats_new/0.24.rst
?
Also can you please re-render the plots of the examples that use LLE and report the comments of this PR to compare those with the LLE plots of master to make sure that we did not introduce a regression not caught by the existing tests? |
Adding a commit with |
@ogrisel |
Actually when i run those examples on my local env on master they never look the same :) |
The quality of the viz still look good. The non-determinism must come from something else. LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
doc/whats_new/v0.24.rst
Outdated
@@ -235,6 +235,10 @@ Changelog | |||
will be removed in 0.28. :pr:`17662` by | |||
:user:`Joshua Newton <joshuacwnewton>`. | |||
|
|||
- |Fix| Fix :issue:`10493`. MemoryError exception was raised when | |||
calling barycenter_weights with large inputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change log entries should refer to the affected public API items. barycenter_weights
is private. Could you try give some sense of how this change affects the user?
Z : array-like, shape (n_samples, n_neighbors, n_dim) | ||
Y : array-like, shape (n_samples, n_dim) | ||
|
||
indices : array-like, shape (n_samples, n_dim) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please provide a brief description of what indices is for.
I updated the documentation @jnothman, let me know if you see something else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! Thanks, @bmaisonn
doc/whats_new/v0.24.rst
Outdated
@@ -235,6 +235,10 @@ Changelog | |||
will be removed in 0.28. :pr:`17662` by | |||
:user:`Joshua Newton <joshuacwnewton>`. | |||
|
|||
- |Fix| Fix :issue:`10493`. Fix Local Linear Embedding (LLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be Efficiency since there is no faulty logic as such, rather we have reduced the space complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can make sense, i don't know all the categories :)
I pushed a new version
Reference Issues/PRs
Fixes #10493
What does this implement/fix? Explain your changes.
barycenter_weiths
was expecting a(n_samples, n_neighbors, n_dim)
array as a second argument. With large number of samples, dimensions and neighbors this provokedMemoryError
exceptions.See a call example here
scikit-learn/sklearn/manifold/_locally_linear.py
Line 105 in 6ddaaed
As noticed in the issue this array is only used to iterate over the first axis so instead of building this huge array we can simply pass the list of indices to the method, then iterate over the indices instead.
The code below is simply changed by building A for each indice.
scikit-learn/sklearn/manifold/_locally_linear.py
Line 54 in 6ddaaed
Any other comments?
This method is private in the module so we can afford changing its signature.
No test was added for this specific use case because doing this computation for large arrays is very long.
In another PR i'll try to improve the time it takes to perform this calculation by leverage n_jobs parameter when it's available.