Implement support for missing values with XGBoost #481

Open
styrmis opened this issue Jan 22, 2024 · 0 comments

Contributor

styrmis commented Jan 22, 2024

Inference for XGBoost models is implemented using NaiveAdditiveDecisionTree. Because it is a DenseLtrRanker, it fills in missing feature values with 0. If the data that the XGBoost model was trained on contained missing values, the scores produced in training may not match those produced in production, unless the missing values in the training data are likewise filled in with 0.

This is related to #135 and #353, and has been partly implemented in #452 (which was ultimately merged via #480). With that change we now visit the designated missing node when a feature score is missing, but in practice we never hit this branch, because missing scores are filled in with 0 at inference time.
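To illustrate the mismatch, here is a minimal Python sketch of a single XGBoost-style split with a designated missing branch. It is not the plugin's Java code: the `Split` class and the `score` and `densify` helpers are hypothetical names, NaN is assumed to mark a missing feature, and the leaf values are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Split:
    feature: int             # index of the feature tested at this node
    threshold: float         # go left when value < threshold
    left: float              # leaf value of the left child
    right: float             # leaf value of the right child
    missing_goes_left: bool  # default direction when the feature is missing


def score(split, features):
    """Evaluate one split, honouring the designated missing branch."""
    value = features[split.feature]
    if math.isnan(value):
        return split.left if split.missing_goes_left else split.right
    return split.left if value < split.threshold else split.right


def densify(sparse, size, fill=0.0):
    """Build a dense feature vector; absent features take `fill`."""
    return [sparse.get(i, fill) for i in range(size)]


split = Split(feature=0, threshold=-1.0, left=0.2, right=-0.5,
              missing_goes_left=True)
sparse = {}  # feature 0 was never computed for this document

# Dense ranker behaviour: the gap becomes 0.0, which is not < -1.0, so go right.
zero_filled = score(split, densify(sparse, 1))
# Behaviour the model was trained with: the gap stays missing, so go left.
nan_filled = score(split, densify(sparse, 1, fill=math.nan))
```

With these made-up values the zero-filled vector lands on the right leaf while the missing-aware path lands on the left leaf, which is the kind of training/production discrepancy described above.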

This issue proposes that we alter NaiveAdditiveDecisionTree so that it does not fill in missing values, given that the implementation now correctly follows the model specification when missing values are encountered.

In the meantime we have found that we can achieve parity in scoring between training and inference by filling in missing values with 0 in the training data.
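A minimal sketch of that workaround, assuming the training features are held as plain lists of floats in which a missing value appears as `None` or NaN. The helper name `fill_missing_with_zero` is hypothetical and not part of XGBoost or the plugin.

```python
import math


def fill_missing_with_zero(rows):
    """Return a copy of `rows` with every None or NaN replaced by 0.0."""
    return [
        [0.0 if (v is None or math.isnan(v)) else v for v in row]
        for row in rows
    ]


training_rows = [[1.5, math.nan], [None, 2.0]]
filled_rows = fill_missing_with_zero(training_rows)
```

Training on `filled_rows` means the model never sees a missing value, so its learned default directions are never exercised and the zero-filling done at inference time produces the same traversal.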
