PUBDEV-4940 implement UpliftRandomForest #5224

Merged
merged 11 commits into from
Nov 30, 2021

Conversation

maurever
Contributor

@maurever maurever commented Jan 12, 2021

JIRA:
https://h2oai.atlassian.net/jira/software/c/projects/PUBDEV/issues/PUBDEV-4940

Sources:

Existing implementation:

TODO:

  • implement uplift calculation to find the best split
  • implement AUUC metric calculation
  • test basic functionality
  • plot AUUC
  • implement gainsUplift table
  • check early stopping
  • improve AUUC calculation accuracy due to binning
  • test against UpliftRF and CausalML (predictions, AUUC metric calculation)
  • run performance test against DRF
  • run and check GBM benchmarks against this branch (done: GBM and DRF are slower for higher values of the nbins parameter)
  • add smoke tests
  • separate uplift tree code from tree algos due to slowdown
  • documentation (all doc commits: 62c62ee)
  • python smoke test
  • improve AUUC plot - add the random curve, change naming
  • Add auuc_type to performance to get AUUC for unseen test data
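
To illustrate the first TODO item (finding the best split by uplift), here is a minimal sketch of a divergence-based split gain. It assumes a KL-divergence criterion of the kind used in the uplift-tree literature; the PR description does not say which criterion the branch uses, and the names `kl_divergence` and `split_gain` are hypothetical, not part of H2O.

```python
import math


def kl_divergence(p_treatment, p_control, eps=1e-6):
    """KL divergence between the Bernoulli response rates of the two groups."""
    # Clip rates away from 0 and 1 so the logarithms stay finite.
    p_t = min(max(p_treatment, eps), 1 - eps)
    p_c = min(max(p_control, eps), 1 - eps)
    return p_t * math.log(p_t / p_c) + (1 - p_t) * math.log((1 - p_t) / (1 - p_c))


def node_divergence(n_t, y_t, n_c, y_c):
    """Divergence of one node from its treatment/control counts and responders."""
    if n_t == 0 or n_c == 0:
        # Assumption: a node must contain both groups to be scored.
        raise ValueError("node needs at least one treatment and one control row")
    return kl_divergence(y_t / n_t, y_c / n_c)


def split_gain(left, right):
    """Gain of a binary split.

    Each child is a tuple (n_treatment, responders_treatment,
    n_control, responders_control). The gain is the size-weighted
    divergence of the children minus the divergence of the parent.
    """
    parent = tuple(l + r for l, r in zip(left, right))
    n_left = left[0] + left[2]
    n_right = right[0] + right[2]
    n_total = n_left + n_right
    children = (n_left / n_total) * node_divergence(*left) \
        + (n_right / n_total) * node_divergence(*right)
    return children - node_divergence(*parent)


# A split that separates rows helped by treatment from rows harmed by it.
gain = split_gain((10, 8, 10, 2), (10, 2, 10, 8))
```

A split whose children have the same treatment and control response rates as the parent scores a gain of approximately zero, so candidate splits can be ranked by this value.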

In this PR:

  • disable cross-validation
  • disable weights
  • disable early stopping
  • grid search not implemented

Improvements for future PR:

  • add Q coefficient calculation
  • add new parameters: minimal sample size in control and treatment group
  • add cross-validation
  • add early stopping
  • add MOJO
  • optimize AUUC calculation
  • calculate AUUC exactly for small datasets?
  • calculate variable importances
  • implement grid search

Implementation problems:

  • treatment and control leaves are saved into two trees instead of one, because a new tree structure would have to be implemented for this special kind of tree; this is a task for a future improvement of the algorithm, for example when MOJO support is implemented

Uplift Random Forest API:

  • Uplift trees implementation is currently supported only for binomial classification.

  • API for Python:

    • H2OUpliftRandomForestEstimator : train a model
    • performance.auuc : get the default AUUC from the performance object
    • model.auuc : get the default AUUC from the model
    • performance.auuc_table : get all types of AUUCs from the performance object
    • model.auuc_table : get all types of AUUCs from the model
    • plot_uplift : plot the uplift curve or return the x- and y-axis values of the plot; you can set a metric type
    • make_metrics : make H2OBinomialUpliftMetrics metrics
  • API for R:

    • h2o.upliftRandomForest : train a model
    • h2o.performance : returns H2OBinomialUpliftMetrics
      • h2o.auuc : get the default AUUC
    • h2o.auuc_table : get all types of AUUCs
    • plot.H2OBinomialUpliftMetrics : plot the uplift curve or return the x- and y-axis values of the plot; you can set a metric type
    • h2o.make_metrics : make H2OBinomialUpliftMetrics metrics
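
The API above exposes several AUUC types and lets `plot_uplift` take a metric type. As a rough, library-independent sketch of what the qini, lift and gain curves measure, the code below ranks rows by predicted uplift and accumulates treatment and control counts. The function names `uplift_curve` and `auuc` are hypothetical, the formulas follow the common textbook definitions, and the trapezoidal area is an assumption; H2O computes its values on binned thresholds and may differ.

```python
from typing import List, Sequence

METRICS = ("qini", "lift", "gain")


def uplift_curve(uplift: Sequence[float], treatment: Sequence[int],
                 outcome: Sequence[int], metric: str = "qini") -> List[float]:
    """Cumulative uplift after each of the top-k rows, ranked by predicted uplift."""
    if metric not in METRICS:
        raise ValueError(f"metric must be one of {METRICS}")
    # Highest predicted uplift first; sorted() is stable, so ties keep input order.
    order = sorted(range(len(uplift)), key=lambda i: uplift[i], reverse=True)
    n_t = n_c = y_t = y_c = 0
    curve = []
    for i in order:
        if treatment[i] == 1:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        if n_t == 0 or n_c == 0:
            # Assumption: report 0 until both groups have been seen.
            curve.append(0.0)
        elif metric == "qini":
            curve.append(y_t - y_c * n_t / n_c)
        elif metric == "lift":
            curve.append(y_t / n_t - y_c / n_c)
        else:  # gain
            curve.append((y_t / n_t - y_c / n_c) * (n_t + n_c))
    return curve


def auuc(curve: Sequence[float]) -> float:
    """Trapezoidal area under the curve, starting at (0, 0) with unit spacing."""
    area, previous = 0.0, 0.0
    for value in curve:
        area += (previous + value) / 2
        previous = value
    return area


# Toy data: predicted uplift, treatment flag (1 = treated), binary outcome.
scores = [0.9, 0.8, 0.2, 0.1]
treated = [1, 0, 1, 0]
converted = [1, 0, 0, 1]
qini_curve = uplift_curve(scores, treated, converted, metric="qini")
qini_auuc = auuc(qini_curve)
```

Switching `metric` to `"lift"` or `"gain"` gives the other two curve types from the same counts, which is why a single table can report all AUUC variants at once.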

Python Uplift curve:

uplift_curve_gain_python

uplift_curve_lift_python

uplift_curve_qini_python

R Uplift curve:

uplift_curve_gain_R

uplift_curve_lift_R

uplift_curve_qini_R

@maurever maurever self-assigned this Jan 12, 2021
@maurever maurever marked this pull request as draft January 12, 2021 14:25
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 5a27bf8 to 249c61a Compare February 16, 2021 13:02
@maurever maurever changed the title PUBDEV-4904 implement uplift into h2o structure PUBDEV-4940 implement uplift into h2o structure Mar 19, 2021
@valenad1 valenad1 self-requested a review May 19, 2021 14:39
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch 2 times, most recently from f02cf61 to df747d9 Compare June 17, 2021 15:48
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 6df75fd to 3ec1532 Compare June 21, 2021 09:01
@maurever maurever changed the title PUBDEV-4940 implement uplift into h2o structure PUBDEV-4940 implement uplift trees into h2o structure Jun 23, 2021
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 1efa255 to cf8e9c5 Compare June 28, 2021 14:14
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 2d03b1b to f003ea0 Compare August 10, 2021 14:20
@valenad1 valenad1 force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 7065013 to 98a8667 Compare August 31, 2021 15:24
@valenad1 valenad1 force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 98a8667 to 1affd8a Compare September 7, 2021 21:32
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch 13 times, most recently from 3f4a9e1 to 29292b0 Compare September 24, 2021 12:53
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch 4 times, most recently from b747f99 to 8df0e90 Compare October 5, 2021 10:40
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from d83ee93 to c43940b Compare November 19, 2021 15:18
Contributor

@michalkurka michalkurka left a comment

Great work @maurever! Looks good to me - great start - further refactoring will be needed but this is good enough to go into master

@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 3d2bd01 to 3ae93d3 Compare November 26, 2021 15:58
maurever and others added 10 commits November 29, 2021 15:35
- added documentation pages
- added params/algo to toctree; minor syntax updates, still need to add image to Uplift DRF; rst files only
- added image; toctree algo shift
- updated api comments
- add comma
- fix python example, it was failing on assertion error and uplift_model does not have plot_auuc() method
- fix test in uplift_metric, it failing on assert
- fix all available in
@maurever maurever force-pushed the maurever_PUBDEV-4940_uplift_trees_poc branch from 52186b7 to 5fd92a8 Compare November 29, 2021 14:35
Contributor

@koniecsveta koniecsveta left a comment

LGTM 👌 Thanks @maurever !

Collaborator

@valenad1 valenad1 left a comment

Good job! 💯

@maurever maurever merged commit ade07e6 into master Nov 30, 2021
@maurever maurever deleted the maurever_PUBDEV-4940_uplift_trees_poc branch November 30, 2021 08:49
4 participants