Improve AutoML Target Encoding integration (auto mode) #7847

exalate-issue-sync · 2023-05-11T18:19:11Z

Ideas for improving the general performance of the basic Target Encoding (TE) integration in AutoML. TE is currently turned off by default in AutoML; it is activated by passing `preprocessing=["target_encoding"]` to the AutoML function.

- Configure TE on a per-algorithm basis (XGBoost and non-XGBoost models to start, then tune each model separately). One suggestion was to apply TE only to categorical columns with cardinality >= 10 for XGBoost and >= 25 for the H2O tree algorithms.
- Consider not applying TE to DNN models at all. DNNs can discover feature interactions more easily than tree models, and TE usually hurts neural networks: they are likely to overfit to the values TE provides instead of learning those relationships through backpropagation.
- Try a different minimum cardinality threshold (when a column has > N categories, turn TE on; otherwise leave it off).
- Try a different upper cardinality threshold (when a column has > N categories, drop the original categorical column; otherwise keep the original column in the training frame).
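The per-algorithm ideas above can be sketched as a small policy table. This is a minimal, hypothetical sketch, not AutoML's actual implementation: `TEPolicy`, `POLICIES`, and `plan_column` are illustrative names, the algorithm keys are assumed, the 10 and 25 minimum thresholds come from the suggestion above, DNNs are assumed to have TE disabled, and no upper (drop) threshold is set by default because the issue does not give a value for N.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TEPolicy:
    """Hypothetical per-algorithm TE settings."""
    min_cardinality: Optional[int]    # apply TE when cardinality >= this; None = never apply TE
    drop_above: Optional[int] = None  # drop original column when cardinality > this; None = always keep it


# Hypothetical policy table: thresholds 10 (XGBoost) and 25 (H2O tree algos)
# follow the suggestion in the issue; deep learning gets no TE at all.
POLICIES = {
    "xgboost": TEPolicy(min_cardinality=10),
    "gbm": TEPolicy(min_cardinality=25),
    "drf": TEPolicy(min_cardinality=25),
    "deeplearning": TEPolicy(min_cardinality=None),
}


def plan_column(policy: TEPolicy, cardinality: int) -> str:
    """Decide what to do with one categorical column under a given policy."""
    if policy.min_cardinality is None or cardinality < policy.min_cardinality:
        return "keep"  # leave the column as a plain categorical
    if policy.drop_above is not None and cardinality > policy.drop_above:
        return "encode_and_drop"  # add TE column, remove the original
    return "encode_and_keep"  # add TE column, keep the original too
```

For example, a 12-level column would be encoded for XGBoost but left alone for GBM and for deep learning; supplying a `drop_above` value is how the "upper cardinality threshold" idea would plug in.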

Our current approach: a column is encoded if its cardinality is >= 10 (hard limit) and nrows / cardinality >= 10 (the blending inflection point).
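The current rule can be written as a single predicate. This is a hedged sketch of the rule as stated in the issue, not code taken from H2O; the function name `should_target_encode` and its parameter names are hypothetical. The ratio `nrows / cardinality` is the average number of rows per category level, which is what the rule compares against the blending inflection point.

```python
def should_target_encode(cardinality: int, nrows: int,
                         min_cardinality: int = 10,
                         min_rows_per_level: float = 10.0) -> bool:
    """Return True if a categorical column would be encoded under the current rule."""
    if cardinality < min_cardinality:
        return False  # below the hard cardinality limit
    # Average rows per level must reach the blending inflection point.
    return nrows / cardinality >= min_rows_per_level
```

With 1,000 rows, a 50-level column averages 20 rows per level and is encoded; with only 400 rows it averages 8 rows per level and is not.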

We also want to improve the user experience by offering customizable TE encodings (for now, it is just an on/off switch for an automatic TE strategy). The ticket for that is here: https://0xdata.atlassian.net/browse/PUBDEV-7803


h2o-ops · 2023-05-14T20:55:55Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-7795
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A

exalate-issue-sync bot added the long_issue label May 11, 2023

h2o-ops added the fixVersion/Backlog label May 14, 2023
