ENH Add replace_undefined_by param to class_likelihood_ratios #29288

Merged
adrinjalali merged 47 commits into scikit-learn:main from StefanieSenger:class_likelihood_ratios
Jan 17, 2025
Conversation

@StefanieSenger
Member

@StefanieSenger StefanieSenger commented Jun 18, 2024

Reference Issues/PRs

towards #29048

What does this implement/fix? Explain your changes.

This PR adds a zero_division param to class_likelihood_ratios like we're doing in the above issue. Since this function returns two scores, the input to the zero_division param also needs to encompass two values.

There is an existing raise_warning param that served a similar purpose; I deprecated it here in a way that carries its functionality (it exclusively raises warnings; the return values are not affected) over to the new param.

Question:
With zero_division="warn" (the default), the output in case of a zero division is set to np.nan, as in the current function (this is also what the warning says). The idea when we talked about it was to keep backwards compatibility.

I think we don't need to do this and can return the lowest scores for each metric respectively (1 for LR+ and 0 for LR-) in case of zero_division="warn" right away, because the return values don't have anything to do with the deprecated param.
Does that make sense?

Edit: this question was answered: yes, we keep the np.nan default return value for backwards compatibility until version 1.8.

Any other comments?

The warning that was previously raised if support_pos == 0 has nothing to do with dividing by zero and thus I decided to decouple it from the new param (and the old one). Since this doesn't change the functionality and only adds an additional warning in a certain case, that is probably alright, isn't it?
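For context, the two ratios and their undefined cases can be sketched in plain Python (this is an illustration of the math, not scikit-learn's implementation):

```python
def likelihood_ratios(tp, fp, tn, fn):
    """Return (LR+, LR-), using float('nan') where a ratio is undefined."""
    pos = tp + fn  # support of the positive class in y_true
    neg = tn + fp  # support of the negative class in y_true
    sensitivity = tp / pos if pos else float("nan")
    specificity = tn / neg if neg else float("nan")
    # LR+ = sensitivity / (1 - specificity): undefined when fp == 0
    denom_pos = 1.0 - specificity
    lr_pos = sensitivity / denom_pos if denom_pos else float("nan")
    # LR- = (1 - sensitivity) / specificity: undefined when tn == 0
    lr_neg = (1.0 - sensitivity) / specificity if specificity else float("nan")
    return lr_pos, lr_neg
```

This makes visible why the function needs two replacement values: LR+ and LR- become undefined under different conditions (fp == 0 and tn == 0 respectively).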

@github-actions

github-actions bot commented Jun 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 49ca5d8. Link to the linter CI: here

@ArturoAmorQ ArturoAmorQ self-requested a review June 18, 2024 15:22
@glemaitre glemaitre self-requested a review June 18, 2024 21:09
Member

@glemaitre glemaitre left a comment

I think that we should add more tests to check all the possibilities for zero_division != "warn". We should also check that we raise the proper deprecation warning for raise_warning.

Comment thread sklearn/metrics/_classification.py Outdated
"zero_division": [
Hidden(StrOptions({"default"})),
StrOptions({"warn"}),
dict, # this needs to be further defined, but Options only takes unmutable
Member

I think dict is enough. Further validation will be handled in the function itself. The parameter is not doing advance checks.

Member Author

I think there is currently no validation in place in the function itself, because I was using open else statements for any other input. An option would be to use elifs with well-defined conditions for the zero_division args and raise an input error on else if the user passes something invalid.
But my impression had been that the param validation was meant to handle this rather than the function code, and that the goal of having it was to unburden the actual function code from validations like these. Which way should we proceed?

Member Author

@StefanieSenger StefanieSenger Nov 4, 2024

The parameter is not doing advance checks.

Oh, did you mean: parameter_validation is not used for advance checks? And advance checks, does this mean checks that go further than checking types and maybe immutable arguments? Then I think I understand now and the input validation needs to happen in the function itself.

Edit: Done

Comment thread sklearn/metrics/_classification.py Outdated
To define the return values in case of a division by zero, use
`zero_division` instead.

zero_division : str or dict, default="warn"
Member

Since we know exactly the string, we can state it directly.

I'm also thinking to add np.nan as a parameter that would be a shorthand for {"LR+": np.nan, "LR-": np.nan}. It is a bit lighter.

Suggested change
zero_division : str or dict, default="warn"
zero_division : "warn", np.nan or dict, default="warn"

Member Author

@StefanieSenger StefanieSenger Jun 24, 2024

Okay, I think we keep the {"LR+": np.nan, "LR-": np.nan} option and additionally add the np.nan option. I slightly deviated and made it a str "nan", because a) this is how it is used in the other zero_division param cases and b) np.nan turned out to be difficult to work with (in the validation and in other places).

Edit: this is done

Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment on lines +2042 to +2044
# zero_division="warn" does not need to be "np.nan" anymore and should be the lowest
# score for each metric respectively (1 for LR+ and 0 for LR-) to match the other
# functions that take a zero_division param. Return values and warning messages need
Member

Note that this change of behaviour should also be done with a deprecation from 1.8 to 1.10.

Member Author

Sorry, I don't understand. Are you saying there should be two deprecation cycles?

Member Author

@StefanieSenger StefanieSenger Nov 4, 2024

After looking at this anew, I see that you meant that the change of the default value for zero_division from warn to the lowest possible scores needs to be a new deprecation cycle.

Can we also consider doing two steps in one? The goal is that the functionality of zero_division matches the other functions that take a zero_division param, and the longer this is not the case, the more confusing it gets.

Edit: I have added a FutureWarning for the changing default behaviour and commented on this in the # TODO section.
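For illustration, a changing default is typically deprecated with a hidden sentinel value plus a FutureWarning; a minimal sketch with hypothetical names (not the PR's actual code):

```python
import warnings

def resolve_replace_undefined_by(value="default"):
    """Hypothetical helper: resolve the hidden 'default' sentinel and warn
    that the default return value will change in a future release."""
    if value == "default":
        warnings.warn(
            "The default return value in case of a division by zero will "
            "change to the worst scores in a future version. Set "
            "`replace_undefined_by` explicitly to silence this warning.",
            FutureWarning,
        )
        return float("nan")  # current behaviour, kept until the switch
    return value
```

Users who set the parameter explicitly never see the warning, which is what allows the default to change without silently altering results.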

Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
negative_likelihood_ratio = neg_num / neg_denom
elif isinstance(
zero_division.get("LR-", None), (int, float)
) and zero_division.get("LR-", None) in [0, 1]:
Member

Here, I don't think that we will raise an error if we get anything other than 0/1. I'm thinking that we could, whenever possible, always create a dictionary and check that the values in the dict are the expected ones (0/1/np.inf). This would be safer at this stage.
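The normalize-then-validate idea could look roughly like this (names and the allowed-value set are illustrative, not the PR's final code):

```python
import math

# illustrative set of accepted replacement values (see discussion: 0/1/np.inf)
ALLOWED = {0.0, 1.0, math.inf}

def normalize_zero_division(zero_division):
    """Expand shorthands into a {'LR+': ..., 'LR-': ...} dict, then validate."""
    if not isinstance(zero_division, dict):
        # a single scalar is shorthand for "use it for both ratios"
        zero_division = {"LR+": zero_division, "LR-": zero_division}
    for key in ("LR+", "LR-"):
        value = zero_division.get(key)
        if not (
            isinstance(value, (int, float))
            and (math.isnan(value) or value in ALLOWED)
        ):
            raise ValueError(f"Invalid replacement for {key}: {value!r}")
    return zero_division
```

Normalizing to a dict first means the validation and the control flow downstream only ever deal with one input shape.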

Member Author

This is very interesting. Before I jump into making this, let me reformulate to clarify if we are talking about the same idea:

Should we accept any value that the metrics could output as a valid input in the zero_division dict?
Should we allow it for class_likelihood_ratios only or also for the other classification metrics?
Would it be useful to someone?

I feel I could not judge if it was useful and would rather stick with how zero_division works on the other metrics by implementing it in a similar way here. (Which means restricting possible values to the min and max scores.) If you or someone goes "yes, yes yes" for the above questions, I'd be happy to learn more about this idea and to make it work this way.

Member Author

After we had talked about this, I have now implemented a lenient check of the input values for the zero_division param at the beginning of the function that allows all the values the ratios could take.

Member

@ArturoAmorQ ArturoAmorQ left a comment

Thanks for the PR @StefanieSenger, I would also briefly document the zero_division behavior in the dedicated mathematical divergences section of the user guide.

Member Author

@StefanieSenger StefanieSenger left a comment

Thanks for your review @glemaitre, I have done most of the things you suggested and commented on a few that were not that clear, where I would like to exchange some thoughts or concerns.


@StefanieSenger
Member Author

StefanieSenger commented Jun 24, 2024

Thanks for the PR @StefanieSenger, I would also briefly document the zero_division behavior in the dedicated mathematical divergences section of the user guide.

Thank you @ArturoAmorQ, I have mentioned it in this section. I also couldn't hold myself back from trying to define the conditions for zero divisions a bit more clearly there. Please let me know if I succeeded with my attempt.

Member

@ArturoAmorQ ArturoAmorQ left a comment

Thanks for the PR @StefanieSenger, here are just a couple of comments regarding the documentation aspects of this PR. I'll let the others review for the actual code :)

Comment thread doc/modules/model_evaluation.rst Outdated
Comment on lines +1921 to +1922
interpreted as the classifier never wrongly identifying negative cases as positives.
This happens, for instance, when using a `DummyClassifier` that always predicts the
Member

@ArturoAmorQ ArturoAmorQ Jul 8, 2024

The wording "never wrongly identifying negative cases as positives" does not add value, as it is equivalent to fp=0.

The original wording "perfectly identifying positive cases" was intended to remind the user that LR+ is a "higher is better" kind of metric (indeed, if fp=0 any sample classified as positive has to be a real positive and then LR+=inf). The paragraph is meant to highlight that the interpretation "higher is better ergo inf is perfect" is lost when both fp=tp=0, as could be the case with a DummyClassifier (the absence of positive predictions leads to fp=tp=0).

Member

Maybe my point would be clearer if we give some examples:

  • Case 1: fp=0, tp≠0, LR_+ diverges
    • the classifier is perfect
    • in coherence with being a "higher is better" metric e.g.:
    • y_true = np.array([0, 0, 0, 1, 1, 1])
    • y_pred = np.array([0, 0, 0, 1, 1, 1])
  • Case 2: fp=0, tp=0, LR_+ diverges
    • y_pred only includes the majority class
    • can be misleading as it's the case of a DummyClassifier with imbalanced data e.g.:
    • y_true = np.array([0, 0, 0, 0, 0, 1])
    • y_pred = np.array([0, 0, 0, 0, 0, 0])
  • Case 3: fn=0, tp=0, LR_+ and LR_- diverge
    • the test set only includes the majority class, but the classifier still makes errors
    • can be misleading as it's a common case when cross-validating with imbalanced data e.g.:
    • y_true = np.array([0, 0, 0, 0, 0, 0])
    • y_pred = np.array([0, 0, 0, 0, 0, 1])
  • Case 4: tn=0, LR_- diverges
    • positive and negative classes are inverted
    • can be indicative of divergences in LR_+ if classes are reversed e.g.:
    • y_true = np.array([1, 1, 1, 0, 0, 0])
    • y_pred = np.array([0, 0, 0, 1, 1, 1])

Having this in the shape of table would be nice.

Member

@ArturoAmorQ ArturoAmorQ Nov 4, 2024

Something similar to:

| Zeros in the Confusion Matrix | Diverges      | Reason                                                         | Interpretation                                                                                    | Example                                      |
|-------------------------------|---------------|----------------------------------------------------------------|---------------------------------------------------------------------------------------------------|----------------------------------------------|
| fp=0, tp≠0                    | LR_+          | The classifier is perfect                                      | In coherence with being a "higher is better" metric                                               | `y_true = [0, 0, 0, 1, 1, 1]`<br>`y_pred = [0, 0, 0, 1, 1, 1]` |
| fp=0, tp=0                    | LR_+          | `y_pred` only includes the majority class                      | Can be misleading, as it’s the case of a `DummyClassifier` with imbalanced data                   | `y_true = [0, 0, 0, 0, 0, 1]`<br>`y_pred = [0, 0, 0, 0, 0, 0]` |
| fn=0, tp=0                    | LR_+ and LR_- | The test set only includes the majority class, but classifier still makes errors | Can be misleading, common when cross-validating with imbalanced data | `y_true = [0, 0, 0, 0, 0, 0]`<br>`y_pred = [0, 0, 0, 0, 0, 1]` |
| tn=0                          | LR_-          | Positive and negative classes are inverted                     | Can indicate divergences in LR_+ if classes are reversed                                          | `y_true = [1, 1, 1, 0, 0, 0]`<br>`y_pred = [0, 0, 0, 1, 1, 1]` |
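The confusion counts behind the table's rows can be checked with a small plain-Python sketch (positive class assumed to be 1; not scikit-learn code):

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels with positive class 1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

# Row 2 of the table: fp == 0 and tp == 0, so LR+ is undefined.
counts_dummy = confusion_counts([0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0])
# Row 4 of the table: tn == 0, so LR- is undefined.
counts_inverted = confusion_counts([1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1])
```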

Member Author

Thank you for clarifying this, @ArturoAmorQ. I think I understand for the LR+ case (not so sure about the LR- case though, so I didn't touch it).

I have tried to point out the two cases that fp==0 might indicate, while staying within the scope of a side note/dropdown on mathematical divergences. I feel that this could be material for an extensive example, but that is outside the scope of this PR.

Please let me know what you think about my attempt to rephrase and give advice on how to interpret the positive likelihood ratio in case of a divergence / warning shown to the user.

Comment thread sklearn/metrics/_classification.py Outdated
Comment on lines +2021 to +2028
Conditions under which this warning is raised in relation to the confusion matrix
deriving from the `y_true` and `y_pred` arguments:
When the number of false positives is 0, the positive likelihood ratio is undefined.
When the number of true negatives is 0, the negative likelihood ratio is undefined.
`UndefinedMetricWarning` is also (and regardles of the `zero_division` and
`raise_warning` arguments) raised when no samples of the positive class are present
in `y_true` (when the sum of true positives and false negatives is 0). Then, both
the positive and the negative likelihood ratios are undefined.
Member

Suggested change
Conditions under which this warning is raised in relation to the confusion matrix
deriving from the `y_true` and `y_pred` arguments:
When the number of false positives is 0, the positive likelihood ratio is undefined.
When the number of true negatives is 0, the negative likelihood ratio is undefined.
`UndefinedMetricWarning` is also (and regardles of the `zero_division` and
`raise_warning` arguments) raised when no samples of the positive class are present
in `y_true` (when the sum of true positives and false negatives is 0). Then, both
the positive and the negative likelihood ratios are undefined.
A warning is raised given the following conditions in `y_true` and `y_pred`:
- False positives are 0: positive likelihood ratio is undefined.
- True negatives are 0: negative likelihood ratio is undefined.
- No positive class samples seen in `y_true`: both likelihood ratios are
undefined.
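The three conditions in the suggested wording amount to simple checks on the confusion counts; a sketch using UserWarning as a stand-in for sklearn's UndefinedMetricWarning:

```python
import warnings

def warn_on_undefined_ratios(tp, fp, tn, fn):
    """Hypothetical helper: warn for each condition that leaves a ratio undefined."""
    if fp == 0:
        warnings.warn("positive likelihood ratio is undefined (fp == 0)", UserWarning)
    if tn == 0:
        warnings.warn("negative likelihood ratio is undefined (tn == 0)", UserWarning)
    if tp + fn == 0:  # no positive-class samples present in y_true
        warnings.warn("both likelihood ratios are undefined (no positive samples)", UserWarning)
```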

Member Author

Thank you for your suggestion, @ArturoAmorQ. I agree that the previous text was a bit confusing and I used your suggestion as a basis to work on it in my new commit. I have also mentioned the input arguments for zero_division and raise_warning that also influence whether a warning is raised.

Comment thread sklearn/metrics/_classification.py Outdated

@glemaitre glemaitre self-requested a review November 25, 2024 18:54
Comment thread doc/whats_new/upcoming_changes/sklearn.metrics/29288.fix.rst Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/tests/test_classification.py Outdated
Comment thread sklearn/metrics/tests/test_classification.py Outdated
Comment thread sklearn/metrics/tests/test_classification.py Outdated
Comment thread sklearn/metrics/tests/test_classification.py Outdated
Comment thread doc/modules/model_evaluation.rst Outdated
default an appropriate warning message and returns `nan` to avoid pollution when
averaging over cross-validation folds.
The positive likelihood ratio (`LR+`) is undefined when :math:`fp = 0`, meaning the
classifier does not misclassify any negatives as positives. This condition can either
Member

I would suggest saying "negative samples" (or observations) or "positive samples" instead of "negatives" and "positives".

Member Author

I have rephrased this into "does not misclassify any negative labels as positives."
I think that is even clearer.

Comment thread doc/modules/model_evaluation.rst Outdated
Member

@glemaitre glemaitre left a comment

These are only minor changes. The code is pretty great and well documented, I should say (it is quite easy to get lost in this zero_division mess otherwise).

Really good job @StefanieSenger.

@adrinjalali you might want to be the one merging after the comments are addressed.

Member Author

@StefanieSenger StefanieSenger left a comment

Thanks for your review @glemaitre.

I have made all the required changes.


Comment thread sklearn/metrics/tests/test_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
Comment thread sklearn/metrics/_classification.py Outdated
"raise_warning": ["boolean", Hidden(StrOptions({"deprecated"}))],
"replace_undefined_by": [
Hidden(StrOptions({"default"})),
(StrOptions({"worst"})),
Member Author

I have changed it back from "worst" to 1.0 here, in the tests, in the control flow checks, in the warning messages and in the docstring.

Comment on lines +143 to +145
if (isinstance(constraint, str) and constraint == "nan") or (
isinstance(constraint, float) and np.isnan(constraint)
):
Member Author

I have added the test case.

@adrinjalali adrinjalali merged commit 3060384 into scikit-learn:main Jan 17, 2025
@lesteve
Member

lesteve commented Jan 17, 2025

Not of extreme importance on a Friday afternoon, but FYI this broke main; it looks like a warning turned into an error.

Relevant part from build log

Extension error:
Here is a summary of the problems encountered when running the examples:

Unexpected failing examples (1):

    ../examples/model_selection/plot_likelihood_ratios.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/model_selection/plot_likelihood_ratios.py", line 64, in <module>
        pos_LR, neg_LR = class_likelihood_ratios(y_test, y_pred)
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 218, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/metrics/_classification.py", line 2100, in class_likelihood_ratios
        warnings.warn(mgs_changed_default, FutureWarning)
    FutureWarning: The default return value of `class_likelihood_ratios` in case of a division by zero has been deprecated in 1.7 and will be changed to the worst scores (`(1.0, 1.0)`) in version 1.9. Set `replace_undefined_by=1.0` to use the newdefault and to silence this Warning.

@StefanieSenger
Member Author

So, didn't we build the examples properly before in the CI? I don't understand why this didn't show up before.

@adrinjalali
Member

The whole doc is not built in the CI unless you add [doc build] to a commit message, but they're built on main.

@StefanieSenger
Member Author

Otherwise only the files I touched are built?

@adrinjalali
Member

Yes, exactly. I should have remembered to ask you to submit that commit with the message, happens every now and then. All good, a follow up PR to fix is okay.
