[BUG] Stacking classifier cannot use Thresholder function - no .predict_proba #501
Comments
Interesting. We're using a custom class to check whether an estimator is probabilistic (i.e. whether it has …). Just in case, could you share your machine info: operating system, Python version, and package versions? You can use watermark for this. |
I think I have an idea on what's causing this. The … Our thresholder checks in … |
@MBrouns I'm not sure I fully understand, is the classifier not already fitted in line 14? Prefitting the final estimator doesn't seem to help either. Here's my basic machine info. Looks like I'm a version behind on both modules, so I'll try updating.
Edit: no change after updating to 1.0.2 and 0.6.10. |
There is a refit parameter that retrains the pipeline passed into the threshold but it's turned off by default. I'm reminded that we should really add a note on this behavior to the main docs page. @L-Marriott could you share the full stack trace as well? |
Sure, full trace below. I've also cleaned up the initial post as I'd confused some of my example code with my actual use case.
Enabling the refit parameter for Thresholder doesn't seem to resolve it though. Same trace given. |
Ah, @L-Marriott, I see! I think there is indeed a bug in our refit behaviour. We check if … It seems quite fixable though: we should only clone … Just to check, @L-Marriott: do you want to make a PR for this? |
It'd be good to also add a unit test that confirms it works for the stacking classifier. There's a nice unit test in this issue. |
I'll be honest, I'm a very amateur programmer. I'm out of my depth writing the tests for it. |
No worries. But you may appreciate calmcode.io |
…not clone the model
The unit test shows the same error. I guess I need to slim it down a bit and add some asserts. I also need to learn what the stacking classifier actually does. |
I do not really get the refit tests. Does something like this make sense for retest?

    def test_no_refit_does_not_fit_underlying():
        X = np.array([1, 2, 3, 4]).reshape(-1, 1)
        y_ones = np.array([0, 1, 1, 1]).reshape(-1,)
        y_zeros = np.array([0, 0, 0, 1]).reshape(-1,)
        clf = DummyClassifier(strategy="most_frequent")
        clf.fit(X, y_ones)
        a = Thresholder(clf, threshold=0.2, refit=False)
        a.fit(X, y_zeros)
        assert a.predict(np.array([[1]])) == 1

and the corresponding

    def test_refit_fits_underlying():
        X = np.array([1, 2, 3, 4]).reshape(-1, 1)
        y_ones = np.array([0, 1, 1, 1]).reshape(-1,)
        y_zeros = np.array([0, 0, 0, 1]).reshape(-1,)
        clf = DummyClassifier(strategy="most_frequent")
        clf.fit(X, y_ones)
        a = Thresholder(clf, threshold=0.2, refit=True)
        a.fit(X, y_zeros)
        assert a.predict(np.array([[1]])) == 0

Why do we have to clone in the refit case? |
Do we also need to clone in _handle_refit in case we have refit = False and the underlying model was not fitted before (NotFittedError)? |
So there are 3 cases when calling fit or?
|
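The three cases raised above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual `_handle_refit` code: `fit_underlying` is a hypothetical helper name, and it relies only on scikit-learn's `clone`, `check_is_fitted` and `NotFittedError`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.dummy import DummyClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def fit_underlying(model, X, y, refit):
    """Hypothetical helper: return the estimator a wrapper would use after fit."""
    if refit:
        # Case 1: refit requested, so train a fresh copy and leave the
        # caller's (possibly prefitted) model untouched.
        return clone(model).fit(X, y)
    try:
        # Case 2: refit is off and the model is already fitted, so reuse it.
        check_is_fitted(model)
        return model
    except NotFittedError:
        # Case 3: refit is off but the model was never fitted, so fit a copy.
        return clone(model).fit(X, y)


X = np.array([1, 2, 3, 4]).reshape(-1, 1)
y_ones = np.array([0, 1, 1, 1])
y_zeros = np.array([0, 0, 0, 1])

prefit = DummyClassifier(strategy="most_frequent").fit(X, y_ones)
kept = fit_underlying(prefit, X, y_zeros, refit=False)    # case 2
retrained = fit_underlying(prefit, X, y_zeros, refit=True)  # case 1
fresh = fit_underlying(
    DummyClassifier(strategy="most_frequent"), X, y_zeros, refit=False
)  # case 3
```

Cloning in the refit case keeps the estimator that was passed in unchanged, which is one plausible answer to the cloning question above; whether the library should also clone in the third case is a design choice this sketch does not settle.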
I took a look at whether test_passes_sample_weight is the right test for it. But this parameterized test always has an underlying model that is unfitted, so at least the case of a fitted underlying model with refit = True should maybe be tested additionally. |
Sorry for the delayed response; I've been dealing with a very gnarly flu this season. I see you've made the relevant changes to the PR, which looked good to me! |
The PR is now merged into the main branch. I'd like to give it another week of waiting to see if other PRs come in. If so, we may be able to batch together a few fixes into new releases on PyPI. @MarkusDegen, thanks for the PR! |
Description:
I'm able to use the thresholder on sklearn's voting classifier, but not on the stacking classifier. It throws the error below, which I believe is raised in error: StackingClassifier does have predict_proba. Maybe I'm misunderstanding the use case, but this seems to fit.
ValueError: The Thresholder meta model only works on classifcation models with .predict_proba.
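As a side check (this is not the reporter's reproduction code), the sketch below fits a `StackingClassifier` on synthetic binary data, confirms that the fitted model exposes `predict_proba`, and applies a threshold by hand. The dataset, the base estimators and the 0.2 threshold are assumptions chosen for illustration; the manual thresholding only mimics the general idea of a thresholding meta model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumed, for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("lr", LogisticRegression()),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X, y)

# After fitting, the stacking classifier delegates predict_proba to its
# fitted final estimator.
has_proba = hasattr(stack, "predict_proba")
proba = stack.predict_proba(X)

# Manual thresholding on the positive-class probability.
threshold = 0.2
manual_pred = np.where(
    proba[:, 1] > threshold, stack.classes_[1], stack.classes_[0]
)
```

Note that the check is done on the fitted model here; whether `predict_proba` is visible on an unfitted `StackingClassifier` can depend on the scikit-learn version, so a check performed before fitting may behave differently.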
Code for reproduction (using the sklearn sample data for StackingClassifier):
Full trace: