[python] Reapply the trained categorical columns when predicting #5246

johnpaulett · 2022-05-27T18:39:51Z

Fixes #5244. During prediction, force any columns that were categorical during training back to dtype category. This is useful when the model is hosted via kserve and the user sends an HTTP JSON POST, which will not natively be translated to a categorical column in the DataFrame.

I initially tried coding this change in _data_from_pandas, but elected to pull it into a separate method that is called only by predict(). I'm open to any feedback or suggestions on how to better implement this change.
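For readers following along, the idea can be sketched with plain pandas, outside of LightGBM. This is only an illustration under stated assumptions, not the code in this PR: `reapply_categories` is a hypothetical helper, and the column names and category levels are made up. It shows a JSON-style payload arriving as plain strings and being cast back to the training-time levels.

```python
import pandas as pd


def reapply_categories(df, trained_categories):
    """Return a copy of df whose listed columns use the training-time levels.

    trained_categories maps a column name to the ordered list of levels seen
    during training. Hypothetical helper for illustration; not part of LightGBM.
    """
    out = df.copy()
    for col, levels in trained_categories.items():
        # Reusing the same levels in the same order keeps the integer codes
        # consistent with training. Values not in `levels` become missing.
        out[col] = out[col].astype(pd.CategoricalDtype(categories=levels))
    return out


# A JSON-style payload: "color" arrives as plain strings, not as a category.
payload = [{"color": "red", "size": 1.0}, {"color": "blue", "size": 2.5}]
raw = pd.DataFrame(payload)

# Levels recorded at training time (assumed values for this example).
trained = {"color": ["blue", "green", "red"]}
fixed = reapply_categories(raw, trained)
```

The order of the levels matters because the categorical codes, not the labels, are what the model ultimately sees.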

ghost · 2022-05-27T18:40:03Z

All CLA requirements met.

johnpaulett · 2022-05-29T11:33:26Z

This appears not to work when loading a saved model via model_file, as the params are not read in. I'm looking at options.

@jameslamb pointed out the params issue is likely #2613 (#4802)

jmoralez · 2022-10-11T19:35:36Z

Hi @johnpaulett. We've merged a PR that loads the parameters from the model file, so now you can access categorical_feature from params, i.e.

import lightgbm as lgb

# Load a previously saved model; its parameters are now read from the file.
bst = lgb.Booster(model_file='model.txt')
bst.params['categorical_feature']

Please let us know if you want to continue with this.

johnpaulett · 2022-10-12T21:25:00Z

@jmoralez Wonderful -- let me look at rebasing and testing. I do think this would be valuable, as I currently maintain a fork of kserve's lgbserver docker image that side loads these features in.

jameslamb · 2022-10-12T21:29:59Z

Thanks! Please use merge commits instead of rebasing, though, for the reasons described in #5252 (comment).

spatiebalk · 2023-02-03T09:34:14Z

Hi!

I was wondering what the progress is on this PR and whether it's still on the roadmap. I'm running into the exact problem @johnpaulett described in the first post, and I'm not sure what another workaround would look like if I want to keep the category dtypes rather than do some category-to-integer mapping.
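One possible interim workaround, sketched here under assumptions and not taken from this PR, is to record the category levels at training time, store them next to the model, and re-cast incoming frames before calling predict(). The helper name `extract_levels`, the JSON sidecar, and the sample data are all hypothetical.

```python
import json

import pandas as pd


def extract_levels(train_df):
    """Collect the ordered levels of every category-dtype column.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    return {
        col: list(train_df[col].cat.categories)
        for col in train_df.columns
        if isinstance(train_df[col].dtype, pd.CategoricalDtype)
    }


# Training frame with one categorical column (made-up data).
train = pd.DataFrame(
    {"color": pd.Categorical(["red", "blue", "green"]), "size": [1.0, 2.0, 3.0]}
)
levels = extract_levels(train)

# Serialize the levels so they can be stored alongside the model file.
sidecar = json.dumps(levels)

# At serving time: load the levels and re-cast the incoming request frame.
restored = json.loads(sidecar)
request = pd.DataFrame([{"color": "green", "size": 4.0}])
for col, cats in restored.items():
    request[col] = pd.Categorical(request[col], categories=cats)
```

This keeps the category dtype end to end, at the cost of maintaining the sidecar file in step with the model.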

[python] Reapply the trained categorical columns when predicting

cfa11cf

johnpaulett requested review from StrikerRUS, shiyu1994, jameslamb and jmoralez as code owners May 27, 2022 18:39

johnpaulett marked this pull request as draft May 27, 2022 21:50

johnpaulett mentioned this pull request May 30, 2022

predict() requires DataFrame to have category dtype, but should be able to infer which fields are categorical #5244

Open

johnpaulett added a commit to johnpaulett/kserve that referenced this pull request May 31, 2022

Multistage build to include patch from microsoft/LightGBM#5246

2e89317

jmoralez mentioned this pull request Aug 16, 2022

[python-package][R-package] load parameters from model file (fixes #2613) #5424

Merged

jameslamb added the "in progress" and "feature" labels Oct 12, 2022

jmoralez mentioned this pull request Sep 7, 2023

[Docs] 4.0.0 parameters missing from model.txt #6010

Closed
