Describe the bug
I just tried to upgrade scikit-learn from version 0.24.2 to the latest release. After the upgrade, my integration tests started to fail with an error claiming that there are not enough samples for at least one class. This only occurs when I use string-based targets instead of integers.
As far as I can tell, no related API change is documented in the changelog. From some testing, version 1.0 seems to have introduced the breaking change.
Expected Results
Both pipelines (one with integer targets, one with string targets) can be trained without issues.
Actual Results
Traceback (most recent call last):
  File "/home/stefan/aaa/run.py", line 25, in <module>
    pipeline.fit(
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/pipeline.py", line 475, in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/calibration.py", line 394, in fit
    raise ValueError(
ValueError: Requesting 3-fold cross-validation but provided less than 3 examples for at least one class.
The class-count check in sklearn/calibration.py that guards this ValueError appears to be the culprit, as its array filtering no longer seems to work with string-based lists. Since git blame does not show a change for this line, scikit-learn presumably used to convert string targets beforehand, and that conversion has since been dropped.
A workaround is to convert my lists to NumPy arrays with numpy.asarray() beforehand, but I still think this is a breaking change or an unintended side effect, as integer-based lists continue to work as expected.
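The workaround described above can be sketched as follows. This is a minimal illustration, not the original reproduction script: the toy data, the label names, and the choice of LogisticRegression as the inner estimator are assumptions made only for this example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: three samples per class, so 3-fold CV is feasible.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = ["neg", "neg", "neg", "pos", "pos", "pos"]  # string targets as a plain list

clf = CalibratedClassifierCV(LogisticRegression(), cv=3)

# Workaround: hand over an ndarray instead of the list of strings.
clf.fit(X, np.asarray(y))

print(clf.classes_)
print(clf.predict(X))
```

Converting the targets once, right before calling fit(), keeps the rest of the pipeline unchanged and behaves the same on versions that already accept string lists.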
Thanks for the report @stefan6419846. We did indeed use to convert y into an ndarray, but this conversion is now delegated to the inner estimator.
This method for computing the number of occurrences of each class works if both operands are NumPy arrays, but fails when one of them is a list and the elements are strings. I opened #28843 to fix it.
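To illustrate why the comparison depends on the container type, here is a small sketch using only NumPy. It is not the code from sklearn/calibration.py; the variable names and labels are made up for the example, and the list case is shown with a plain Python string to keep the behaviour unambiguous.

```python
import numpy as np

y_list = ["spam", "ham", "spam", "ham", "spam", "ham"]  # hypothetical targets
y_arr = np.asarray(y_list)

# ndarray == label broadcasts elementwise, so summing counts the matches.
counts = {str(c): int(np.sum(y_arr == c)) for c in np.unique(y_arr)}
print(counts)

# list == label is an ordinary Python comparison: it yields a single bool,
# so summing it does not count occurrences at all.
naive = y_list == "spam"
naive_count = int(np.sum(naive))
print(naive, naive_count)
```

With the list, every class looks as if it had zero samples, which is consistent with the "less than 3 examples for at least one class" error in the report.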