Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix n_nonzero_coefs_ in OrthogonalMatchingPursuit always None when ignored #28557

Merged
merged 6 commits into from Mar 6, 2024

Conversation

lucyleeow
Copy link
Member

Reference Issues/PRs

closes #28469

What does this implement/fix? Explain your changes.

Ensures that n_nonzero_coefs_ in OrthogonalMatchingPursuit is always None when ignored. Clarified n_nonzero_coefs and n_nonzero_coefs_ docstrings

Any other comments?

cc @StefanieSenger you may be interested in reviewing?

Copy link

github-actions bot commented Mar 1, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ddb458b. Link to the linter CI: here

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given it another look and have suggested some little changes in the phrasing of the docstring.

Thinking about it, I was wondering why n_nonzero_coefs_ is actually what it is and not _n_nonzero_coefs instead. It‘s not reflecting a lot of information about the fitted model, because it only holds a value under certain conditions (this was already so before this PR) and thus doesn't reflect "the number of non-zero coefficients in the solution".

And there is a codecov warning for the newly added line. So I think you need to add this in one of the tests.

I've approved this PR (seen this button for the first time), but I'm not a maintainer, so you need two more reviewers.

Comment on lines 632 to 634
Desired number of non-zero entries in the solution. If `None` and `tol` is
also `None` this value is either set to 10% of `n_features` or 1, whichever is
greater. Ignored if `tol` is not `None`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think since the "10% or 1" are a calculation that refer to n_nonzero_coefs_, not n_nonzero_coefs, it's okay to leave it away here. That reads a bit cleaner:

Suggested change
Desired number of non-zero entries in the solution. If `None` and `tol` is
also `None` this value is either set to 10% of `n_features` or 1, whichever is
greater. Ignored if `tol` is not `None`.
Desired number of non-zero entries in the solution. Ignored if `tol` is set.

It's also fine if it stays as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is useful to say what happens when it is left as None (which it does do in main, just not completely correct). But I agree it does not read as clean now, though I couldn't think of a way to improve.

Comment on lines 662 to 665
The number of non-zero coefficients in the solution. If
`n_nonzero_coefs` is None and `tol` is None this value is either set
to 10% of `n_features` or 1, whichever is greater.
When `None`, `n_nonzero_coefs` has been ignored because `tol` has been set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then maybe explain the internal logic of calculating n_nonzero_coefs_ here:

Suggested change
The number of non-zero coefficients in the solution. If
`n_nonzero_coefs` is None and `tol` is None this value is either set
to 10% of `n_features` or 1, whichever is greater.
When `None`, `n_nonzero_coefs` has been ignored because `tol` has been set.
The number of non-zero coefficients in the solution. `None`, if `tol` is set,
otherwise is `n_nonzero_coefs` if this param is set. If both
`n_nonzero_coefs` and `tol` are `None`, this value is set to either 10%
of `n_features` or 1, whichever is greater.

I tried to rephrase so it's more intuitive. You can also ignore that, it it doesn't feel more intuitive to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, I've re-word.

@lucyleeow
Copy link
Member Author

lucyleeow commented Mar 1, 2024

Good point about the test, I've added a separate test, to avoid testing too many things in one of the existing tests.

Thinking about it, I was wondering why n_nonzero_coefs_ is actually what it is and not _n_nonzero_coefs instead.

I guess it is mostly for the case when n_nonzero_coefs and tol are both None and n_nonzero_coefs needs to be calculated? Just to tell the user what n_nonzero_coefs value was used in that case? But yeah, otherwise it's not really providing much info.

I've approved this PR (seen this button for the first time), but I'm not a maintainer, so you need two more reviewers.

All good. Non-maintainer reviews are also valuable! :)

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @lucyleeow and @StefanieSenger

@jeremiedbb
Copy link
Member

jeremiedbb commented Mar 6, 2024

Actually, I would make it a |Fix| since the old behavior was wrong or at least misleading: when tol is not None, the effective number of non-zero coefs doesn't match n_nonzero_coefs_.

@lucyleeow
Copy link
Member Author

Done, thanks for review @jeremiedbb !

@jeremiedbb jeremiedbb enabled auto-merge (squash) March 6, 2024 10:14
@jeremiedbb jeremiedbb merged commit 630961c into scikit-learn:main Mar 6, 2024
28 checks passed
@lucyleeow lucyleeow deleted the doc_omp branch March 6, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggesting updates on the doc of sklearn.linear_model.OrthogonalMatchingPursuit
3 participants