[python] Reapply the trained categorical columns when predicting #5246

Draft: johnpaulett wants to merge 1 commit into master

johnpaulett

Fixes #5244. During prediction, force any columns that were categorical during training back to dtype category. This is useful when the model is hosted via kserve and the user sends an HTTP JSON POST, which is not natively translated into categorical columns in the DataFrame.

Initially tried coding this change in _data_from_pandas, but elected to pull it into a separate method that is only called by predict(). I'm open to any feedback or suggestion on how to better implement this change.
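To make the idea concrete, here is a minimal pandas-only sketch of the re-casting step. It is not the code in this PR: the helper name `reapply_categorical` and the column list `trained_categorical` are hypothetical and exist only for illustration.

```python
import pandas as pd


def reapply_categorical(df, categorical_columns):
    """Return a copy of df with the given columns cast to dtype 'category'.

    Hypothetical helper for illustration; not the code in this PR.
    """
    out = df.copy()
    for col in categorical_columns:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out


# A JSON request body decoded into records: string values arrive with a
# non-categorical dtype.
records = [{"color": "red", "size": 1.0}, {"color": "blue", "size": 2.5}]
request_df = pd.DataFrame.from_records(records)

# Assumed to be the columns that were categorical during training.
trained_categorical = ["color"]

fixed_df = reapply_categorical(request_df, trained_categorical)
print(fixed_df.dtypes)
```

Note that `astype("category")` infers the levels from the request itself. This sketch does not address aligning them with the levels seen during training.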

@ghost

ghost commented May 27, 2022

CLA assistant check
All CLA requirements met.

@johnpaulett johnpaulett marked this pull request as draft May 27, 2022 21:50
@johnpaulett
Author

johnpaulett commented May 29, 2022

This does not appear to work when loading a saved model via model_file, because the params are not read in. I'm looking at options.

@jameslamb pointed out that the params issue is likely #2613 (#4802).

@jmoralez
Collaborator

Hi @johnpaulett. We've merged a PR that loads the parameters from the model file, so you can now access categorical_feature from params, for example:

import lightgbm as lgb
bst = lgb.Booster(model_file='model.txt')
bst.params['categorical_feature']  # value loaded from the model file's parameters

Please let us know if you want to continue with this.

@johnpaulett
Author

@jmoralez Wonderful, let me look at rebasing and testing. I do think this would be valuable, as I currently maintain a fork of kserve's lgbserver docker image that side-loads these features.

@jameslamb
Collaborator

Thanks! Please use merge commits instead of rebasing, though, for the reasons described in #5252 (comment).

@spatiebalk

spatiebalk commented Feb 3, 2023

Hi!

I was wondering what the progress is on this PR and whether it's still on the roadmap. I'm running into the exact problem @johnpaulett described in the first post, and I'm not sure what a workaround would look like if I want to keep the category dtypes rather than do some category-to-integer mapping.
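One possible workaround outside LightGBM, sketched here under the assumption that the training DataFrame is still available when the model is saved, is to store the category levels alongside the model and re-apply them to each request. The helper names `dump_category_levels` and `apply_category_levels` are hypothetical and are not part of LightGBM or kserve.

```python
import json

import pandas as pd


def dump_category_levels(train_df):
    """Serialize the levels of every categorical column to a JSON string."""
    levels = {
        col: train_df[col].cat.categories.tolist()
        for col in train_df.columns
        if isinstance(train_df[col].dtype, pd.CategoricalDtype)
    }
    return json.dumps(levels)


def apply_category_levels(df, levels_json):
    """Cast columns back to categoricals using the saved training levels."""
    out = df.copy()
    for col, cats in json.loads(levels_json).items():
        if col in out.columns:
            out[col] = out[col].astype(pd.CategoricalDtype(categories=cats))
    return out


# Training side: remember the levels of each categorical column.
train_df = pd.DataFrame(
    {"color": pd.Categorical(["red", "blue", "green"]), "size": [1.0, 2.0, 3.0]}
)
levels_json = dump_category_levels(train_df)

# Serving side: plain strings, including a value never seen in training.
request_df = pd.DataFrame({"color": ["green", "purple"], "size": [0.5, 4.0]})
serving_df = apply_category_levels(request_df, levels_json)
print(serving_df["color"])
```

This keeps the category dtype and avoids a hand-written category-to-integer mapping. Values that were not among the saved levels become missing after the cast, so a caller may want to decide explicitly how to handle them.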

Successfully merging this pull request may close these issues.

predict() requires DataFrame to have category dtype, but should be able to infer which fields are categorical