Skip to content

FIX Restore support for n_samples == n_features in MinCovDet.#30483

Merged
jeremiedbb merged 1 commit into
scikit-learn:mainfrom
anntzer:mcdnsnf
Jan 20, 2025
Merged

FIX Restore support for n_samples == n_features in MinCovDet.#30483
jeremiedbb merged 1 commit into
scikit-learn:mainfrom
anntzer:mcdnsnf

Conversation

@anntzer
Copy link
Copy Markdown
Contributor

@anntzer anntzer commented Dec 14, 2024

Fixes #30625

#29835 broke support for the (degenerate) n_samples == n_features (and support_fraction unset) case in MinCovDet because this led to n_support = n_samples + 1, which was implicitly clipped to n_support = n_samples previously (at np.argsort(dist)[:n_support]) but not anymore now (and crashes np.argpartition(dist, n_support - 1).

To fix this, explicitly clip n_support.

Noticed at pyRiemann/pyRiemann#335.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 14, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4aa5665. Link to the linter CI: here

Copy link
Copy Markdown
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if n_samples == n_feautres (and assume support_fraction=None), then

$$\begin{align*} n_{\mathrm{support}} &= \left\lceil \frac{n_{\mathrm{samples}} + n_{\mathrm{features}} + 1}{2} \right\rceil \\ &= \left\lceil \frac{2 n_{\mathrm{samples}} + 1}{2} \right\rceil \\ &= \left\lceil n_{\mathrm{samples}} + \frac{1}{2} \right\rceil \\ &= n_{\mathrm{samples}} + 1 \end{align*}$$

This causes an out-of-range index because we must always have n_support <= n_samples.

@anntzer's solution---to choose the smaller value between n_samples and the originally implemented n_support---makes sense to me.

launch_mcd_on_dataset(500, 1, 100, 0.02, 0.02, 350, global_random_seed)

# n_samples == n_features
launch_mcd_on_dataset(20, 20, 0, 0.1, 0.1, 50, global_random_seed)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the CI test is failing because tol_support = 50 in the launch_mcd_on_dataset function. We should always have tol_support <= n_samples here.

Copy link
Copy Markdown
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a changelog is needed here @anntzer. You can refer to the following link for more details:

https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md

fc772c7 broke support for the (degenerate) `n_samples == n_features`
(and support_fraction unset) case in MinCovDet because this led to
`n_support = n_samples + 1`, which was implicitly clipped to
`n_support = n_samples` previously (at `np.argsort(dist)[:n_support]`)
but not anymore now (and crashes `np.argpartition(dist, n_support - 1)`.

To fix this, explicitly clip `n_support`.
@anntzer
Copy link
Copy Markdown
Contributor Author

anntzer commented Dec 20, 2024

done

Copy link
Copy Markdown
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @anntzer!

@agramfort, @ogrisel, just a friendly ping—would you like to take a look?

Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @anntzer

@jeremiedbb jeremiedbb added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Jan 20, 2025
@jeremiedbb jeremiedbb enabled auto-merge (squash) January 20, 2025 13:21
@jeremiedbb jeremiedbb merged commit e36f66a into scikit-learn:main Jan 20, 2025
@anntzer anntzer deleted the mcdnsnf branch January 20, 2025 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:covariance To backport PR merged in master that need a backport to a release branch defined based on the milestone.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

scikit-learn 1.6: Elliptic Envelope Fails with More Features than Samples

3 participants