ConstantPredictor in OVR #3386

Closed
vene opened this Issue · 5 comments

2 participants

@vene
scikit-learn member

I just noticed that the one vs. rest implementation does an optimization whereby a class that is present in all training examples will not get a fitted estimator, but just a dummy _ConstantPredictor that always returns it.

This assumes that any sensible estimator will always classify instances as belonging to that class. This is not true for DummyClassifier(strategy='uniform'), which is often useful. I think this is a bug.
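To illustrate, here is a minimal sketch of the behaviour described above (assuming scikit-learn; the multilabel data is made up, and the exact warning text and the internal _ConstantPredictor name may vary by version):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.multiclass import OneVsRestClassifier

X = np.random.RandomState(0).rand(6, 3)
# Multilabel indicator matrix: the first label is present in *every* sample.
Y = np.array([[1, 0],
              [1, 1],
              [1, 0],
              [1, 1],
              [1, 0],
              [1, 1]])

ovr = OneVsRestClassifier(DummyClassifier(strategy="uniform"))
ovr.fit(X, Y)  # warns that the first label is present in all training examples

# Because the first column of Y is constant, OvR skips fitting the dummy for
# that label and uses an internal constant predictor instead, so the first
# output column is always 1 -- the ~50/50 behaviour one might expect from a
# uniform dummy never happens for that label.
print(ovr.predict(X)[:, 0])
```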

@ogrisel suggests having a parameter to explicitly turn off (or on) this optimization.

Ping @amueller (blamed by git for the _ConstantPredictor), and @arjoly and @hamsal (who are actively working on OvR).

@jnothman
scikit-learn member

Surely even the uniform dummy, when fit with data with only one label, will predict only that label (uniformly!).

@vene
scikit-learn member

This might be the case, but should it? For OvR each class should get a binary classifier. In a very imbalanced setting you might not be able to form a CV fold with positive instances of a given class, but you might still want to check how a ~0.5 random-sampling classifier would do, right?
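(A quick check of the point above, assuming scikit-learn's DummyClassifier: fit on data that contains only one label, the 'uniform' strategy can indeed only ever return that label.)

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((5, 2))
y = np.ones(5)                                    # only the positive class is present
clf = DummyClassifier(strategy="uniform").fit(X, y)

print(clf.classes_)     # [1.] -- a single class was seen at fit time
print(clf.predict(X))   # [1. 1. 1. 1. 1.] -- "uniform" over one class
```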

@jnothman
scikit-learn member
@vene
scikit-learn member

Good point. I guess if this happens in practice there will be at least two warnings (one from the cross-validation and one from the OvR). We know that users ignore warnings, though.

Closing this after checking with @ogrisel too.

@vene closed this
@vene
scikit-learn member

A dummy uniform binary classifier would still be useful, but it seems to be an edge case.
