ENH Specify categorical features with feature names in HGBDT #24889

ogrisel · 2022-11-10T20:00:32Z

Similar to #24855, but for the categorical_features parameter, as discussed in #24852 (comment).

Note that this works well with the .set_output(transform="pandas") API introduced in this release.

However, it requires disabling the verbose feature names of the ColumnTransformer (verbose_feature_names_out=False) so that the output columns keep their original names :)

Note: in the future, we might directly inspect the dataframe column dtypes in HGBDT and add an "auto" mode that triggers native categorical support for columns explicitly encoded with a categorical dtype, but this will be the topic of a later PR.

doc/whats_new/v1.2.rst

sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

examples/ensemble/plot_gradient_boosting_categorical.py

examples/applications/plot_cyclical_feature_engineering.py

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

glemaitre

LGTM

lorentzenchr

LGTM, only some questions.
This will improve user friendliness soooo much!!!

examples/applications/plot_cyclical_feature_engineering.py

examples/ensemble/plot_gradient_boosting_categorical.py

lorentzenchr · 2022-11-12T10:20:50Z

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

+                    "on data without feature names."
+                )
+            is_categorical = np.zeros(n_features, dtype=bool)
+            feature_names = self.feature_names_in_.tolist()


Is this conversion to a list necessary?

NumPy arrays do not have an index method. I am not sure how to implement this while staying in NumPy and still making it easy to raise the error message at the right time.

Also, the list of feature names should never be too long in practice (at most a few hundred entries) for HGBDT models, because those models tend to perform poorly when n_features >> n_samples.
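The approach discussed in this thread can be sketched as a standalone helper: convert the fitted feature names to a Python list and use list.index, so that an unknown name raises immediately with a clear message. The function name categorical_mask_from_names and the error wording are hypothetical. This is not the code merged in this PR.

```python
import numpy as np


def categorical_mask_from_names(categorical_names, feature_names_in):
    """Hypothetical helper: map categorical feature names to a boolean mask.

    feature_names_in plays the role of the fitted feature_names_in_ array,
    or None when the model was fitted on data without feature names.
    """
    if feature_names_in is None:
        raise ValueError(
            "categorical_features can only be given as feature names when "
            "the model is fitted on data with feature names."
        )
    # NumPy arrays have no .index method, so convert to a plain Python list.
    feature_names = np.asarray(feature_names_in).tolist()
    is_categorical = np.zeros(len(feature_names), dtype=bool)
    # Iterate in the user-given order so the first unknown name is reported.
    for name in categorical_names:
        try:
            is_categorical[feature_names.index(name)] = True
        except ValueError as exc:
            raise ValueError(
                f"categorical_features contains {name!r}, which is not a "
                f"feature name of the training data: {feature_names}"
            ) from exc
    return is_categorical


mask = categorical_mask_from_names(
    ["month", "weekday"],
    np.array(["temp", "month", "humidity", "weekday"], dtype=object),
)
print(mask)
```

A pure NumPy alternative such as np.isin(feature_names_in, categorical_names) builds the same mask in one call. However, it silently ignores names that are not feature names, so a separate check would be needed to report them.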

Thanks for the explanation.

lorentzenchr

Final adjustments.

examples/applications/plot_cyclical_feature_engineering.py

examples/ensemble/plot_gradient_boosting_categorical.py

ogrisel · 2022-11-13T14:04:01Z

Thanks for the final fixes @lorentzenchr!

ogrisel added 2 commits November 10, 2022 20:54

Allow categorical_features to be passed as feature names

2b2d894

Update relevant examples

fc0d51a

ogrisel added the Quick Review For PRs that are quick to review label Nov 10, 2022

ogrisel added this to the 1.2 milestone Nov 10, 2022

github-actions bot added the module:ensemble label Nov 10, 2022

Changelog

28e3397

glemaitre reviewed Nov 10, 2022

doc/whats_new/v1.2.rst

glemaitre reviewed Nov 10, 2022

sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py

glemaitre reviewed Nov 10, 2022

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

glemaitre changed the title ~~Specify categorical feaature with feature names in HGBDT~~ ENH Specify categorical feaature with feature names in HGBDT Nov 10, 2022

thomasjpfan reviewed Nov 10, 2022

examples/ensemble/plot_gradient_boosting_categorical.py

examples/applications/plot_cyclical_feature_engineering.py

thomasjpfan mentioned this pull request Nov 10, 2022

FIX Make sure that set_output is keyword only everywhere #24890

Merged

ogrisel and others added 5 commits November 11, 2022 06:34

Apply suggestions from code review

1f5b46f

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Apply suggestions from code review

3bdcbb6

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Call .set_output('pandas') on the pipelines

f9c3401

.set_output(transform='pandas') instead of .set_output('pandas')

a4f9a3f

Make error message deterministic

ec7337a

lorentzenchr changed the title ~~ENH Specify categorical feaature with feature names in HGBDT~~ ENH Specify categorical feature with feature names in HGBDT Nov 11, 2022

lorentzenchr changed the title ~~ENH Specify categorical feature with feature names in HGBDT~~ ENH Specify categorical features with feature names in HGBDT Nov 11, 2022

glemaitre approved these changes Nov 12, 2022

lorentzenchr approved these changes Nov 12, 2022

Insert comment to explain verbose_feature_names_out

01d390d

lorentzenchr approved these changes Nov 13, 2022

examples/applications/plot_cyclical_feature_engineering.py

examples/ensemble/plot_gradient_boosting_categorical.py

lorentzenchr added 2 commits November 13, 2022 13:01

Update examples/applications/plot_cyclical_feature_engineering.py

c21a310

Update examples/ensemble/plot_gradient_boosting_categorical.py

6241c47

lorentzenchr merged commit 84a7a7a into scikit-learn:main Nov 13, 2022

ogrisel deleted the categorical_feaature-feature_names_in_ branch November 13, 2022 14:03

This was referenced Nov 13, 2022

ENH Automatic handling of categorical columns in Hist Gradient Boosting models #24907

Closed

Make it possible to specify interaction_cst and monotonic_cst with feature names. #24852

Closed

ArturoAmorQ mentioned this pull request Aug 1, 2025

DOC Improve wording in Categorical Feature support example #31864

Merged
