Incorporate a use_missing argument #142

omsuchak · 2020-06-30T20:54:41Z

First, this is a lovely framework!

One suggestion: it would be very useful to expand the framework to accept sparse data and missing values. LightGBM has incorporated this in its use_missing argument.

alejandroschuler · 2020-06-30T21:13:20Z

Hey @omsuchak, thanks for the suggestion. There is no one "natural" or good way to generically handle missing data. If ngboost were to do this for you, we would be making a number of choices behind the scenes that would be obscured from the user.

If we limited ourselves to use cases where the base learner is a regression tree (like we do with the feature importances), there are some reasonable default choices for what to do with missing data. Implementing those strategies here is probably not crazy hard to do, but it's also not a trivial task. Either way, I'd want the user to have a transparent choice about what is going on. I'd be open to reviewing pull requests on that front as long as they satisfy that requirement, but it's not something I plan on working on myself in the foreseeable future. I'll close for now, but if anyone wants to try to add this, please feel free to comment.

alejandroschuler · 2020-06-30T21:17:31Z

As a practical note that might help you: for prediction problems it's typically hard to beat some sort of imputation (e.g. column mean) plus a missingness indicator feature per column. sklearn makes it easy. I'd recommend handling missing data in your feature matrix upfront as a pre-processing step using those tools before passing the data into ngboost.

As long as you apply the same ("trained") imputation strategy to your test set or future observations, you're not incurring any bias from doing this.
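For anyone landing here later, a minimal sketch of that pre-processing step, assuming scikit-learn's `SimpleImputer` with `add_indicator=True` (which appends one 0/1 missingness column for each feature that had missing values during `fit`). The toy arrays and variable names are made up for illustration, and the final ngboost call is left as a comment because the target `y_train` is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrices with missing entries (np.nan); values are illustrative only.
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan],
                    [5.0, 30.0]])
X_test = np.array([[np.nan, 40.0],
                   [2.0, np.nan]])

# Column-mean imputation plus a missingness indicator per column.
imputer = SimpleImputer(strategy="mean", add_indicator=True)

# Learn the column means on the training set only...
X_train_imp = imputer.fit_transform(X_train)
# ...then apply the same "trained" imputation to the test set.
X_test_imp = imputer.transform(X_test)

# The imputed matrices can then be passed to ngboost as usual, e.g.:
#   from ngboost import NGBRegressor
#   NGBRegressor().fit(X_train_imp, y_train)   # y_train is hypothetical here
```

Note that the test rows are filled with the training means (3.0 and 20.0 here), not means recomputed on the test set, which is the point made above about reusing the fitted strategy.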

alejandroschuler closed this as completed Jun 30, 2020
