Incorporate a use_missing argument #142
Comments
Hey @omsuchak, thanks for the suggestion. There is no one "natural" or good way to handle missing data generically. If ngboost were to do this for you, we would be making a number of choices behind the scenes that would be hidden from the user. If we limited ourselves to use cases where the base learner is a regression tree (like we do with the feature importances), there are some reasonable default choices for what to do with missing data. Implementing those strategies here is probably not crazy hard, but it's also not a trivial task. Either way, I'd want the user to have a transparent choice about what is going on. I'd be open to reviewing pull requests on that front as long as they satisfy that requirement, but it's not something I plan on working on myself in the foreseeable future. I'll close for now, but if anyone wants to try to add this, please feel free to comment.
As a practical note that might help you: for prediction problems it's typically hard to beat some sort of imputation (e.g. column mean) plus a missingness indicator feature per column. As long as you apply the same ("trained") imputation strategy to your test set or future observations, you're not incurring any bias from doing this.
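For anyone landing here, the imputation-plus-indicator idea above can be sketched roughly as follows. This is a minimal illustration using only NumPy, not anything ngboost provides; the helper names `fit_mean_imputer` and `apply_mean_imputer` and the toy arrays are made up for the example. The point is that the column means are learned once on the training data and then reused unchanged on the test data.

```python
import numpy as np

def fit_mean_imputer(X_train):
    """Learn per-column means from the training data, ignoring NaNs."""
    return np.nanmean(np.asarray(X_train, dtype=float), axis=0)

def apply_mean_imputer(X, col_means):
    """Fill NaNs with the stored training means and append one
    0/1 missingness indicator column per original column."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    X_filled = np.where(missing, col_means, X)  # col_means broadcasts across rows
    return np.hstack([X_filled, missing.astype(float)])

# Toy data (hypothetical), with NaN marking missing entries.
X_train = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [np.nan, 8.0]])
X_test = np.array([[np.nan, 2.0],
                   [5.0, np.nan]])

# "Train" the imputation on the training set only...
col_means = fit_mean_imputer(X_train)

# ...then apply the same learned means to both train and test.
X_train_aug = apply_mean_imputer(X_train, col_means)
X_test_aug = apply_mean_imputer(X_test, col_means)
```

The augmented arrays contain no NaNs and have twice as many columns as the input, so they can be handed to the model's usual fit and predict calls. scikit-learn's `SimpleImputer(strategy="mean", add_indicator=True)` offers similar behaviour if you would rather not hand-roll it.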
First, this is a lovely framework!
One suggestion: it would be very useful to expand the framework to accept sparse data and missing values. LightGBM supports this through its use_missing argument.