sklearn.utils._param_validation.InvalidParameterError: The 'zero_division' parameter of precision_score must be a float among {0.0, 1.0, nan} or a str among {'warn'}. Got nan instead #27563
Comments
Your manual cross-validation does not use […]:

```python
In [1]: from sklearn.metrics import precision_score, make_scorer

In [2]: from sklearn.ensemble import RandomForestClassifier

In [3]: import numpy as np

In [4]: from sklearn.datasets import make_classification

In [5]: X, y = make_classification(random_state=0)

In [6]: classifier = RandomForestClassifier(random_state=0).fit(X, y)

In [7]: scoring = make_scorer(precision_score, zero_division=np.nan)

In [8]: scoring(classifier, X, y)
Out[8]: 1.0

In [9]: scoring = make_scorer(precision_score, zero_division="nan")

In [9]: scoring(classifier, X, y)
---------------------------------------------------------------------------
InvalidParameterError                     Traceback (most recent call last)
Cell In[12], line 1
----> 1 scoring(classifier, X, y)

File ~/Documents/packages/scikit-learn/sklearn/metrics/_scorer.py:265, in _BaseScorer.__call__(self, estimator, X, y_true, sample_weight, **kwargs)
    262 if sample_weight is not None:
    263     _kwargs["sample_weight"] = sample_weight
--> 265 return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)

File ~/Documents/packages/scikit-learn/sklearn/metrics/_scorer.py:361, in _PredictScorer._score(self, method_caller, estimator, X, y_true, **kwargs)
    359 y_pred = method_caller(estimator, "predict", X)
    360 scoring_kwargs = {**self._kwargs, **kwargs}
--> 361 return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)

File ~/Documents/packages/scikit-learn/sklearn/utils/_param_validation.py:201, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    198     to_ignore += ["self", "cls"]
    199     params = {k: v for k, v in params.arguments.items() if k not in to_ignore}
--> 201 validate_parameter_constraints(
    202     parameter_constraints, params, caller_name=func.__qualname__
    203 )
    205 try:
    206     with config_context(
    207         skip_parameter_validation=(
    208             prefer_skip_nested_validation or global_skip_validation
    209         )
    210     ):

File ~/Documents/packages/scikit-learn/sklearn/utils/_param_validation.py:95, in validate_parameter_constraints(parameter_constraints, params, caller_name)
     89 else:
     90     constraints_str = (
     91         f"{', '.join([str(c) for c in constraints[:-1]])} or"
     92         f" {constraints[-1]}"
     93     )
---> 95 raise InvalidParameterError(
     96     f"The {param_name!r} parameter of {caller_name} must be"
     97     f" {constraints_str}. Got {param_val!r} instead."
     98 )

InvalidParameterError: The 'zero_division' parameter of precision_score must be a float among {0.0, 1.0, nan} or a str among {'warn'}. Got 'nan' instead.
```

So the problematic line is:
Could you give some information regarding what the […] is?
@jeremiedbb I was checking how we validate `Options(Real, {0.0, 1.0, np.nan})`. I assume that it should be fine because we should have something like […]
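As a side note on why NaN needs special-casing in a constraint like `Options(Real, {0.0, 1.0, np.nan})`: NaN compares unequal to everything, including itself, and Python's container membership tests check identity before equality. A small illustration (plain Python, not scikit-learn code):

```python
import numpy as np

# IEEE 754 semantics: NaN is not equal to anything, not even itself.
print(np.nan == np.nan)  # False

# Membership tests check `is` before `==`, so the very same NaN object matches...
print(np.nan in {0.0, 1.0, np.nan})  # True

# ...but a *different* NaN object matches neither by identity nor by equality.
print(float("nan") in {0.0, 1.0, np.nan})  # False
```

So a membership check against a set containing `np.nan` only succeeds when the value being validated is literally the same object as `np.nan`.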
There's some strange behavior going on.
I actually used […]. This one is a little more explicit:

```python
import pickle

# Imports below were implied by the snippet (np, make_scorer, cross_val_score are used)
import numpy as np

from sklearn.metrics import precision_score, make_scorer
from sklearn.model_selection import cross_val_score

# Load in data
with open("sklearn_data.pkl", "rb") as f:
    objects = pickle.load(f)
# > objects.keys()
# dict_keys(['estimator', 'X', 'y', 'scoring', 'cv', 'n_jobs'])

estimator = objects["estimator"]
X = objects["X"]
y = objects["y"]
scoring = objects["scoring"]
cv = objects["cv"]
n_jobs = objects["n_jobs"]

# > scoring
scoring = make_scorer(precision_score, pos_label="Case_0", zero_division=np.nan)

# > y.unique()
# ['Control', 'Case_0']
# Categories (2, object): ['Case_0', 'Control']

# First I checked to make sure that both classes are present in all of the
# training and validation pairs
pos_label = "Case_0"
control_label = "Control"
for index_training, index_validation in cv:
    assert y.iloc[index_training].nunique() == 2
    assert y.iloc[index_validation].nunique() == 2
    assert pos_label in y.values
    assert control_label in y.values

# If I run manually using the precision_score function
scores = list()
for index_training, index_validation in cv:
    estimator.fit(X.iloc[index_training], y.iloc[index_training])
    y_hat = estimator.predict(X.iloc[index_validation])
    score = precision_score(y_true=y.iloc[index_validation], y_pred=y_hat, pos_label=pos_label)
    scores.append(score)
# > print(np.mean(scores))
# 0.501156937317928

# Now using the scorer
scores = list()
for index_training, index_validation in cv:
    estimator.fit(X.iloc[index_training], y.iloc[index_training])
    y_hat = estimator.predict(X.iloc[index_validation])
    # score = precision_score(y_true=y.iloc[index_validation], y_pred=y_hat, pos_label=pos_label)
    score = scoring(estimator, X=X.iloc[index_validation], y_true=y.iloc[index_validation])
    scores.append(score)
print(np.mean(scores))
# 0.501156937317928

# If I use cross_val_score:
cross_val_score(estimator=estimator, X=X, y=y, cv=cv, scoring=scoring, n_jobs=n_jobs)
# /Users/jespinoz/anaconda3/envs/soothsayer_py3.9_env2/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:839: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
# Traceback (most recent call last):
#   File "/Users/jespinoz/anaconda3/envs/soothsayer_py3.9_env2/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 136, in __call__
#     score = scorer._score(
#   File "/Users/jespinoz/anaconda3/envs/soothsayer_py3.9_env2/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 355, in _score
#     return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)
#   File "/Users/jespinoz/anaconda3/envs/soothsayer_py3.9_env2/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 201, in wrapper
#     validate_parameter_constraints(
#   File "/Users/jespinoz/anaconda3/envs/soothsayer_py3.9_env2/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
#     raise InvalidParameterError(
# sklearn.utils._param_validation.InvalidParameterError: The 'zero_division' parameter of precision_score must be a float among {0.0, 1.0, nan} or a str among {'warn'}. Got nan instead.
# ...
```

[A bunch of these]
OK, so here is a minimal reproducer:

```python
# %%
import numpy as np
import pandas as pd

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, make_scorer

X, y = make_classification(weights=[0.3, 0.7], random_state=0)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
y = pd.Series(y, name='target').apply(lambda x: 'class_1' if x == 1 else 'class_0')

classifier = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
scoring = make_scorer(precision_score, pos_label='class_0', zero_division=np.nan)
print(scoring(classifier, X, y))

# %%
from sklearn.model_selection import cross_val_score

print(cross_val_score(classifier, X, y, scoring=scoring, n_jobs=2))
```

The culprit is […]. We need to make the […]
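That the failure only shows up with `n_jobs=2` points at process-based parallelism: joblib has to pickle the scorer to send it to worker processes, and a NaN that goes through a pickle round-trip comes back as a different float object, which is neither identical nor equal to `np.nan`. A minimal sketch of that effect (plain `pickle`, independent of scikit-learn):

```python
import pickle

import numpy as np

# Round-trip np.nan through pickle, as happens when a scorer is
# shipped to a joblib worker process.
roundtripped = pickle.loads(pickle.dumps(np.nan))

print(roundtripped is np.nan)  # False: a brand-new float object
print(roundtripped == np.nan)  # False: NaN never compares equal
print(np.isnan(roundtripped))  # True: it is still a NaN
```

So any validation that recognizes NaN via `value is np.nan` (or via equality) rejects the round-tripped value, which would explain why the scorer works in-process but fails inside `cross_val_score` with `n_jobs=2`.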
Describe the bug

I'm trying to use `precision_score` with `np.nan` for the `zero_division` parameter. It's not working with `cross_val_score`, but it works when I do manual cross-validation with the same pairs.

Steps/Code to Reproduce

Here are the data files to reproduce:
sklearn_data.pkl.zip

Expected Results

0.501156937317928

Actual Results

Versions