
Make sure meta-estimators are lenient towards missing data #15319

Open · adrinjalali opened this issue Oct 21, 2019 · 6 comments

@adrinjalali (Member) commented Oct 21, 2019

This item is on our roadmap, and I don't totally understand it. I'm creating this issue to track the progress of those roadmap items. Not sure who wrote it. @amueller, maybe you could elaborate? (Feel free to edit the description of the issue.)

@glemaitre (Contributor) commented Oct 22, 2019

I think we developed some meta-estimators which call check_X_y and therefore don't allow missing values. However, this check can be done by the underlying estimator instead.

Let's give an example: take a BaggingClassifier with a HistGradientBoostingClassifier as its base estimator. The gradient boosting estimator natively handles missing values, so if the BaggingClassifier already raises an error at fit, it would be wrong.

So the idea is to delay this validation to the underlying estimator.
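(A minimal sketch of the scenario described above, not part of the original comment. Whether the last line raises depends on the scikit-learn version; at the time of this issue the meta-estimator validated X itself and rejected NaNs before the base estimator ever saw them. On scikit-learn < 1.0 the experimental import noted in the comments was also required.)

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, HistGradientBoostingClassifier
# On scikit-learn < 1.0 this import was also needed:
# from sklearn.experimental import enable_hist_gradient_boosting  # noqa

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[::10, 0] = np.nan              # sprinkle some missing values into X
y = rng.randint(0, 2, size=100)

# The base estimator handles missing values natively: this fit succeeds.
HistGradientBoostingClassifier().fit(X, y)

# The meta-estimator wraps the same base estimator; if it validates X itself
# (forcing all-finite input), it rejects the NaNs at fit time even though the
# base estimator could have handled them.
BaggingClassifier(HistGradientBoostingClassifier(), n_estimators=5).fit(X, y)
```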

@thomasjpfan thomasjpfan added the API label Oct 26, 2019
@jnothman jnothman added the Easy label Dec 4, 2019
@jnothman (Member) commented Dec 4, 2019

This is also part of #9854

@jnothman (Member) commented Dec 4, 2019

Also related: #12072

@jnothman jnothman added the help wanted label Dec 4, 2019
@jnothman (Member) commented Dec 4, 2019

I think looking through our implementations to check which meta-estimators and ensembles enforce finiteness is pretty straightforward, so I've tagged this as such.
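(For illustration, not from the thread: one rough way to run such a survey is to fit a handful of meta-estimators on NaN-containing data, wrapping a base estimator that accepts missing values, and record which ones raise. The list of meta-estimators below is illustrative, not exhaustive, and the outcome depends on the scikit-learn version.)

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import BaggingClassifier, HistGradientBoostingClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[::10, 0] = np.nan              # inject some missing values
y = rng.randint(0, 2, size=100)

# Base estimator that handles missing values natively.
base = HistGradientBoostingClassifier()

# Illustrative, non-exhaustive list of meta-estimators to probe.
for meta in [
    BaggingClassifier(base, n_estimators=5),
    CalibratedClassifierCV(base, cv=2),
    OneVsRestClassifier(base),
]:
    name = type(meta).__name__
    try:
        meta.fit(X, y)
        print(f"{name}: accepts missing values")
    except Exception as exc:
        print(f"{name}: raises on missing values ({exc!r})")
```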

@NicolasHug (Contributor) commented Dec 4, 2019

IMHO, this is not an easy task from the perspective of new or inexperienced contributors (who are attracted by the help wanted tag).

Seeing something tagged as easy when it doesn't feel easy can be quite discouraging, and since our barrier to entry is already crazy high, I think we need to be careful when tagging issues as such.

@jnothman jnothman added Moderate and removed Easy labels Dec 5, 2019
@jnothman (Member) commented Dec 5, 2019

I usually consider "easy" to require a higher degree of familiarity than "good first issue".
