Implement support for missing values with XGBoost #481

styrmis · 2024-01-22T11:10:31Z

Inference for XGBoost models uses the NaiveAdditiveDecisionTree ranker. Because it is a DenseLtrRanker, it fills in missing feature values with 0. If the XGBoost model was trained on data that contained missing values, the scores produced during training may not match those produced in production unless the training data also has its missing values filled with 0.
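
To make the mismatch concrete, here is a minimal Java sketch of a single XGBoost-style split on feature 0 at 0.5 whose missing branch points to the right child. The class name, split value and leaf values are hypothetical and chosen for illustration; this is not the plugin's code. Scoring `NaN` the way XGBoost does during training takes the missing branch, while a dense scorer that zero-fills the absent feature takes the `yes` branch instead.

```java
// Hypothetical single-split stump mirroring one XGBoost dump entry:
// split on feature 0 at 0.5; yes = left leaf, no = right leaf, missing = right leaf.
class DenseFillMismatch {
    static final float SPLIT_CONDITION = 0.5f;
    static final float LEFT_LEAF = -1.0f;  // taken when value < SPLIT_CONDITION ("yes")
    static final float RIGHT_LEAF = 2.0f;  // taken when value >= SPLIT_CONDITION ("no")
    // XGBoost also records which child a missing value goes to; here it is the right child.
    static final float MISSING_LEAF = RIGHT_LEAF;

    /** Scores one value as XGBoost does in training: NaN follows the missing branch. */
    static float scoreLikeXgboost(float value) {
        if (Float.isNaN(value)) {
            return MISSING_LEAF;
        }
        return value < SPLIT_CONDITION ? LEFT_LEAF : RIGHT_LEAF;
    }

    /** Hypothetical dense scorer: an absent feature (null) is left at the array's default of 0. */
    static float scoreDense(Float valueOrNull) {
        float[] features = new float[1]; // Java zero-initialises float arrays
        if (valueOrNull != null) {
            features[0] = valueOrNull;
        }
        return scoreLikeXgboost(features[0]);
    }

    public static void main(String[] args) {
        System.out.println("training (NaN):  " + scoreLikeXgboost(Float.NaN)); // 2.0
        System.out.println("inference (0.0): " + scoreDense(null));            // -1.0
    }
}
```

Filling the training data with 0 as well means XGBoost never sees `NaN`, so both paths evaluate the same comparison and agree.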

This is related to #135 and #353, and was partly implemented in #452 (ultimately merged via #480). With that change, the tree now visits the designated missing node when a feature score is missing, but this branch is never reached in practice because missing scores are filled in with 0 at inference time.
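
The traversal described above can be sketched as follows, assuming XGBoost's usual split semantics: go to `yes` when the value is less than `split_condition`, to `no` otherwise, and to the node's `missing` child when the value is absent (represented here as `Float.NaN`). `MissingAwareTree` and `Node` are hypothetical names that only echo the fields of XGBoost's JSON dump; they are not the plugin's NaiveAdditiveDecisionTree.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical XGBoost-style tree evaluator with a per-node "missing" child.
class MissingAwareTree {
    static final class Node {
        final boolean isLeaf;
        final int feature;          // index into the feature vector (splits only)
        final float splitCondition; // go to "yes" when value < splitCondition
        final int yes, no, missing; // child node ids (splits only)
        final float leafValue;      // output value (leaves only)

        private Node(boolean isLeaf, int feature, float splitCondition,
                     int yes, int no, int missing, float leafValue) {
            this.isLeaf = isLeaf;
            this.feature = feature;
            this.splitCondition = splitCondition;
            this.yes = yes;
            this.no = no;
            this.missing = missing;
            this.leafValue = leafValue;
        }

        static Node split(int feature, float splitCondition, int yes, int no, int missing) {
            return new Node(false, feature, splitCondition, yes, no, missing, 0f);
        }

        static Node leaf(float value) {
            return new Node(true, -1, 0f, -1, -1, -1, value);
        }
    }

    private final Map<Integer, Node> nodes = new HashMap<>();

    MissingAwareTree add(int nodeId, Node node) {
        nodes.put(nodeId, node);
        return this;
    }

    /** Walks from the root (node id 0) to a leaf and returns its value. */
    float score(float[] features) {
        Node node = nodes.get(0);
        while (!node.isLeaf) {
            float value = features[node.feature];
            int next;
            if (Float.isNaN(value)) {
                next = node.missing; // designated missing branch
            } else if (value < node.splitCondition) {
                next = node.yes;
            } else {
                next = node.no;
            }
            node = nodes.get(next);
        }
        return node.leafValue;
    }

    /** Small made-up tree: node 0 splits on feature 0, node 1 splits on feature 1. */
    static MissingAwareTree example() {
        return new MissingAwareTree()
                .add(0, Node.split(0, 0.5f, 1, 2, 2))
                .add(1, Node.split(1, 10f, 3, 4, 3))
                .add(2, Node.leaf(2.0f))
                .add(3, Node.leaf(-1.0f))
                .add(4, Node.leaf(0.5f));
    }

    public static void main(String[] args) {
        MissingAwareTree tree = example();
        System.out.println(tree.score(new float[] {Float.NaN, 3f})); // 2.0: feature 0 missing -> node 2
        System.out.println(tree.score(new float[] {0f, 3f}));        // -1.0: same row, 0 filled in
        System.out.println(tree.score(new float[] {0f, Float.NaN})); // -1.0: feature 1 missing -> node 3
    }
}
```

The first two calls differ only in whether feature 0 is `NaN` or 0, which is the gap the issue describes: the missing branch exists, but zero-filling means `NaN` never reaches it.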

This issue proposes changing NaiveAdditiveDecisionTree so that it no longer fills in missing values, since the tree traversal now correctly follows the model specification when it encounters a missing value.
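
One way the proposal could look, sketched with hypothetical helper names: build the dense feature array with `Float.NaN` as the default instead of Java's implicit 0, so absent features reach the tree as `NaN` and can take each node's missing branch. The sparse `Map<Integer, Float>` input (feature index to score) is an assumption made for illustration, not the plugin's actual data structure.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical comparison of zero-filled vs NaN-filled dense feature vectors.
class NaNFilledFeatureVector {
    /** Current behaviour as described in the issue: absent features default to 0. */
    static float[] zeroFilled(int size, Map<Integer, Float> present) {
        float[] v = new float[size]; // zero-initialised
        present.forEach((i, x) -> v[i] = x);
        return v;
    }

    /** Proposed behaviour (sketch): absent features stay NaN so the tree can route them. */
    static float[] nanFilled(int size, Map<Integer, Float> present) {
        float[] v = new float[size];
        Arrays.fill(v, Float.NaN);
        present.forEach((i, x) -> v[i] = x);
        return v;
    }

    public static void main(String[] args) {
        Map<Integer, Float> present = Map.of(1, 3.0f); // feature 0 is absent
        System.out.println(Arrays.toString(zeroFilled(2, present))); // [0.0, 3.0]
        System.out.println(Arrays.toString(nanFilled(2, present)));  // [NaN, 3.0]
    }
}
```

A consequence worth weighing: a model trained on zero-filled data would, after such a change, route absent features down each node's missing branch rather than treating them as 0, so its scores could change.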

In the meantime, we have found that we can achieve parity between training and inference scores by filling in missing values with 0 in the training data.


styrmis mentioned this issue Jan 22, 2024 · Update - xgboost to handle missing values #480 · Merged

patrick-le-shopify mentioned this issue Jan 24, 2024 · Issue 481 - implement support for missing values with XGBoost #482 · Merged
