PUBDEV-6422 Add Python API support for integration of target encoding into AutoML #3760

Closed
wants to merge 56 commits into from

@deil87 deil87 commented Jul 30, 2019

This PR is postponed until we are back on track for target encoding (TE) integration with AutoML, and we are not there yet. Full MOJO support comes first.

wendycwong and others added 30 commits March 22, 2019 10:03
PUBDEV-6319: Added Andrey Spiridonov's groupby test.
PUBDEV-6319: Removed printOutFrameAsTable to reduce runtime.
…ge because we are still losing the original order of rows.
…ng in h2o-automl module. Added test for BroadcastJoin (distributed case)
…ass to reduce complexity of AutoML class and to reduce coupling by introducing clear interface/dependencies.
…dded FixedTEParamsStrategy to specify single target encoding params for whole automl process.
…strategy and fixed target encoding parameters strategy
…nto AutoML. Fix creation of the zero vecs with incompatible layouts in TE.
…r evaluation during TE hyperparameters grid search.
…erboard frame. It is closer to our attempt to use sampling for hyperparameter search, but the parameters found do not seem to perform well during final scoring in AutoML.
… hp search. HP grid search evaluations are performed on the original data. The Holdout.None strategy quite often hurts, as we reduce the amount of training data.
… sequence of models. Added a benchmark that compares performance of the first models with and without TE.
…equential Model-Based Optimization with DRF. Benchmarks for comparing results.
…or SMBO. Added theBiggerTheBetter parameter to support different metrics.
…ne per tree predictions from DRF in order to get variances for predictions. It should go to a separate PR if we decide to stay with DRF as a surrogate model for SMBO."

This reverts commit 67cbd03
… keys leakages in RGS TE params selection strategy.
…gy. All related files were moved into a separate branch for future investigation.
…rch over categorical columns to apply target encoding to.
…. Fixed ignored tests in TargetEncodingHyperparamsEvaluatorTest.
…tionStrategy instead of multiple parameters.
… are no columns to encode. Added the ability to easily switch off TE in AutoML.
@deil87 deil87 added the WIP label Jul 30, 2019
@deil87 deil87 requested a review from michalkurka July 30, 2019 10:27
@michalkurka
Contributor

@deil87 please rebase & resolve conflicts

@mmalohlava
Member

Test comment - can I reopen this PR?

5 participants