[python] Reapply the trained categorical columns when predicting #5246
base: master
Conversation
This appears not to work when loading a saved model. @jameslamb pointed out that the params issue is likely #2613 (#4802).
Hi @johnpaulett. We've merged a PR that loads the parameters from the model file, so now you can access `bst.params['categorical_feature']` after loading with `bst = lgb.Booster(model_file='model.txt')`. Please let us know if you want to continue with this.
@jmoralez Wonderful -- let me look at rebasing and testing. I do think this would be valuable, as I currently maintain a fork of KServe's `lgbserver` Docker image that side-loads these features in.
Thanks! Please use merge commits instead of rebasing, though, for the reasons described in #5252 (comment).
Hi! I was wondering what the progress is on this PR and whether it's still on the roadmap? As I'm running into the exact problem @johnpaulett described in the first post. And I'm not sure what a different workaround would look like if I want to keep the category dtypes and not do some category-integer mapping. |
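In the meantime, one workaround without an integer mapping is to keep the category levels seen at training time and reapply them before calling `predict()`. A minimal sketch, assuming you maintain the `trained_categories` mapping yourself (it is not a LightGBM API, and the column names are illustrative):

```python
import pandas as pd

# User-maintained record of the category levels each column had at
# training time (hypothetical example data).
trained_categories = {"color": ["blue", "green", "red"]}

def reapply_categories(df, trained_categories):
    """Cast each listed column back to a categorical dtype with the
    same category levels (and order) it had during training."""
    df = df.copy()
    for col, cats in trained_categories.items():
        df[col] = pd.Categorical(df[col], categories=cats)
    return df

# e.g. a frame deserialized from JSON, where "color" arrives as plain str:
raw = pd.DataFrame({"color": ["red", "blue"], "size": [1.0, 2.0]})
ready = reapply_categories(raw, trained_categories)
print(ready["color"].dtype)  # category
```

Values not present in the trained levels become `NaN` under this cast, which is usually what you want for unseen categories at prediction time.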
Fixes #5244. During prediction, force any columns that were categorical during training back to dtype `category`. Useful when hosted via KServe and the user is sending an HTTP JSON POST that will not natively be translated to a categorical column in the DataFrame.

Initially tried coding this change in `_data_from_pandas`, but elected to pull it into a separate method that is only called by `predict()`. I'm open to any feedback or suggestions on how to better implement this change.
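The shape of that separate method can be sketched like this. The names below are illustrative, not the actual LightGBM internals; the only source-backed assumption is that the model retains the per-column category lists from training (LightGBM stores these as `pandas_categorical` in the model file):

```python
import pandas as pd

def _apply_trained_categoricals(df, pandas_categorical, cat_cols):
    """Force columns that were categorical during training back to dtype
    'category', using the stored per-column category lists. Called only
    on the prediction path, so training-time handling is untouched."""
    df = df.copy()
    for col, cats in zip(cat_cols, pandas_categorical):
        if df[col].dtype.name != "category":
            df[col] = pd.Categorical(df[col], categories=cats)
    return df

# e.g. a JSON payload deserialized into plain object/str columns:
payload = pd.DataFrame({"f0": [0.5], "city": ["paris"]})
fixed = _apply_trained_categoricals(payload, [["london", "paris"]], ["city"])
print(fixed["city"].dtype)  # category
```

Reusing the stored category lists (rather than inferring levels from the incoming frame) keeps the category-to-code mapping identical to training, which is what makes the predictions correct.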