PERF Slightly speedup MinCovDet. #29835
Merged
jeremiedbb (Member) approved these changes on Sep 13, 2024:
Thanks for the PR @anntzer. Even if it's not a major speed improvement, the changes are minimal, so it's a net improvement to me. LGTM.
ogrisel (Member) reviewed on Sep 13, 2024:
The change LGTM. I measured a similar improvement. I expected the improvement to grow with `n_samples`, but that does not seem to be the case. Anyway, I agree with @jeremiedbb's comment above.
Please add a changelog entry in `doc/whats_new/v1.6.rst`.
`support` doesn't need to be repeatedly converted from an array of indices (from `argsort`) to a boolean mask (just do it once at the end); furthermore, the distances don't need to be fully sorted (in O(n log n)); rather, only the indices of the `n_support` smallest distances need to be selected (in O(n)).
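As a rough illustration of the two changes described above, here is a simplified, hypothetical C-step loop in NumPy. It is not scikit-learn's code: the name `c_step_sketch`, the random initialisation and the determinant-based stopping rule are assumptions made for the sketch. The support is kept as integer indices, selected with `np.argpartition` (linear-time selection; the `n_support` smallest entries come back in no particular order), and the boolean mask is built only once, after the loop.

```python
import numpy as np


def c_step_sketch(X, n_support, max_iter=30, seed=0):
    """Simplified, hypothetical C-step loop (not scikit-learn's `_c_step`).

    Keeps the support as integer indices while iterating and converts it
    to a boolean mask only once, at the end.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Assumed initialisation: a random subset of n_support samples.
    support_indices = rng.permutation(n_samples)[:n_support]
    best_indices, best_logdet = support_indices, np.inf
    for _ in range(max_iter):
        # Row order does not matter for the mean and covariance, so the
        # unordered output of argpartition can be used directly.
        X_support = X[support_indices]
        location = X_support.mean(axis=0)
        covariance = np.atleast_2d(np.cov(X_support, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(covariance)
        if logdet >= best_logdet:
            break  # the covariance determinant stopped decreasing
        best_indices, best_logdet = support_indices, logdet
        # Squared Mahalanobis distance of every sample to the current fit.
        X_centered = X - location
        precision = np.linalg.pinv(covariance)
        dist = np.einsum("ij,jk,ik->i", X_centered, precision, X_centered)
        # Linear-time selection of the n_support smallest distances,
        # instead of a full O(n log n) argsort.
        support_indices = np.argpartition(dist, n_support - 1)[:n_support]
    # Build the boolean mask once, after the loop.
    support = np.zeros(n_samples, dtype=bool)
    support[best_indices] = True
    return support
```

When the distances have no ties, `np.argpartition(dist, n_support - 1)[:n_support]` selects the same set of indices as `np.argsort(dist)[:n_support]`; only the order differs, which the mean and covariance estimates do not depend on.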
Locally, this patch speeds up the following simple benchmark (run in IPython, since it uses `%timeit`) by ~15%:

```python
import numpy as np
import sklearn.covariance

np.random.seed(1)
# unit Gaussian plus 10% outliers
t = np.concatenate([np.random.randn(1000, 2), [2, 4] * np.random.randn(100, 2)])
%timeit sklearn.covariance.MinCovDet().fit(t).covariance_
```
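The benchmark above times a whole fit, in which the selection step is only one part of each iteration (the distances and covariance estimates are also recomputed). To look at the selection step alone as `n_samples` grows, which bears on the reviewer's observation that the improvement did not increase with `n_samples`, one could time `np.argsort` against `np.argpartition` in isolation. This is a hypothetical micro-benchmark, not part of the PR: `time_selection` and `k_fraction` are made-up names, and `k_fraction=0.5` is an assumption that roughly matches MinCovDet's default support size of about `(n_samples + n_features + 1) / 2`. No timings are claimed here.

```python
import timeit

import numpy as np


def time_selection(n_samples, k_fraction=0.5, number=20, seed=0):
    """Return (argsort_seconds, argpartition_seconds) for one size.

    Times only the "pick the k smallest distances" step, not a MinCovDet fit.
    """
    dist = np.random.default_rng(seed).random(n_samples)
    k = max(1, int(k_fraction * n_samples))
    t_sort = timeit.timeit(lambda: np.argsort(dist)[:k], number=number)
    t_part = timeit.timeit(lambda: np.argpartition(dist, k - 1)[:k], number=number)
    return t_sort, t_part


for n in (1_000, 10_000, 100_000):
    t_sort, t_part = time_selection(n)
    print(f"n={n:>7}: argsort {t_sort:.4f}s, argpartition {t_part:.4f}s")
```

Since the ratio between O(n log n) and O(n) grows only like log n, and the other per-iteration work scales at least linearly with `n_samples`, the relative gain of the full fit may change only slowly with `n_samples`.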
anntzer (Contributor, Author):
Sure, I renamed `support` to `support_indices` and added a changelog entry.
ogrisel (Member) approved these changes on Sep 16, 2024:
LGTM. Thanks for the follow-up.