Feature Request: uplift trees #11818
Comments
Neema Mashayekhi commented: Related support ticket requesting uplift: https://support.h2o.ai/a/tickets/90043 Reference: R's uplift RF implementation: https://www.rdocumentation.org/packages/uplift/versions/0.3.5/topics/upliftRF (it implements Random Forests with split criteria designed for binary uplift modeling tasks).
Neema Mashayekhi commented: Recent support ticket requesting an uplift split criterion for decision tree-based models: https://support.h2o.ai/a/tickets/97307 Attached are reference articles: "Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf" and "Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf". Uber's implementation: https://github.com/uber/causalml
Grigorios Fousas commented: I am very keen to help with this if you need help! I am very interested in uplift modelling. I have run several projects with uplift modelling and did my MSc dissertation on it. Also, N Radcliffe is a mentor and friend of mine (we have a beer every now and then in Edinburgh) and P Surry is an old colleague. I have placed some more info here: https://github.com/h2oai/dai-domain-solution-recipes/tree/master/uplift and I was planning to work on it more when I have time. Essentially, uplift modelling needs two things that make it different from the traditional classification/regression modelling cases:
1. The capability to consume a control flag, which indicates whether someone is in the control group or the treated group.
2. A different split criterion.
In addition to what is mentioned above, I would suggest Qini as a split criterion, which is the equivalent of Gini for uplift modelling cases. This is described in the above "Real-World Uplift Modelling with Significance-Based Uplift Trees" paper.
Neema Mashayekhi commented: Greg, this is a bit complex to solve on the H2O side. We plan to start it in Q4 and will reach out to you once we start. Thanks for the help!
Grigorios Fousas commented: Thanks for the update, [~accountid:5dc4f5bbb6e6b50c58af0624]! I am uploading part of my dissertation, with some theory on uplift modelling and a description of how Portrait Miner, a piece of software that is now almost dead, approached the uplift modelling task. Portrait Miner is a child of Radcliffe and Surry, and I can maybe dig it up from my old files and show you how it works if you are interested. Attachment: "Dissertation chapters 2-3.pdf"
Grigorios Fousas commented: Another great resource! https://github.com/uber/causalml The most complete I have seen so far out there. Thanks to [~accountid:5b8d534896cb052b5f659f47], who found it.
Juan Telleria commented: Note that Booking.com has already released an uplift modeling Python package, based on H2O-3 for performance: https://github.com/bookingcom/upliftml
Juan Telleria commented: The Python H2O-3 code can be found here: https://github.com/bookingcom/upliftml/blob/main/upliftml/models/h2o.py
Veronika Maurerová commented: Uplift trees are now implemented via the DRF algorithm, currently only for binomial classification with one treatment group. Among metrics, AUUC is available now. More features will be implemented soon (more metrics, grid search, early stopping, etc.).
JIRA Issue Migration Info
Jira Issue: PUBDEV-4940
Linked PRs from JIRA: #5546
Attachments From Jira
Attachment Name: Dissertation chapters 2-3.pdf
Attachment Name: Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf
Attachment Name: Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf
Attachment Name: Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf
Request for a new split criterion for trees, and therefore for Distributed Random Forest and the Gradient Boosting Machine.
The split criterion (uplift trees) was originally proposed by Guelman et al. (2013) and Rzepakowski & Jaroszewicz (2012), and is also concisely described by Gutierrez & Gerardy (2017, http://proceedings.mlr.press/v67/gutierrez17a/gutierrez17a.pdf) at Equations 12 and 14. It is currently available in R as upliftRF in the uplift package (a package for a similar algorithm by Athey & Imbens is called causalTree).
In a few words, instead of learning splits based on Gini / information gain on the outcome P(Y) as in traditional decision trees, an uplift tree learns splits based on the information gain on the difference in outcomes between two groups of users (test T and control C). Therefore, in addition to features X and outcome Y, an uplift tree also takes as input a 'treatment' flag W in {T, C}, which is used when learning the splits.
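To make the idea concrete, here is a minimal sketch of a divergence-based uplift split gain in the spirit of Rzepakowski & Jaroszewicz (2012). It is an illustration under stated assumptions, not H2O's or upliftRF's implementation: the function names are hypothetical, the outcome is binary, W is encoded as 1 for test and 0 for control, the divergence is squared Euclidean distance, and the normalization term used in the paper is left out.

```python
from typing import List, Sequence


def response_rate(y: Sequence[int], w: Sequence[int], group: int) -> float:
    """Mean binary outcome among rows whose treatment flag equals `group`."""
    outcomes = [yi for yi, wi in zip(y, w) if wi == group]
    return sum(outcomes) / len(outcomes)


def euclidean_divergence(y: Sequence[int], w: Sequence[int]) -> float:
    """Squared Euclidean distance between the outcome distributions of
    the test group (w == 1) and the control group (w == 0)."""
    # Simplifying assumption: a node missing one of the two groups
    # carries no measurable uplift signal, so its divergence is 0.
    if 1 not in w or 0 not in w:
        return 0.0
    p_t = response_rate(y, w, 1)
    p_c = response_rate(y, w, 0)
    return (p_t - p_c) ** 2 + ((1 - p_t) - (1 - p_c)) ** 2


def uplift_gain(x: Sequence[float], y: Sequence[int], w: Sequence[int],
                threshold: float) -> float:
    """Gain of splitting on `x <= threshold`: size-weighted divergence of
    the two children minus the divergence of the parent node."""
    def node_divergence(rows: List[int]) -> float:
        return euclidean_divergence([y[i] for i in rows], [w[i] for i in rows])

    left = [i for i, xi in enumerate(x) if xi <= threshold]
    right = [i for i, xi in enumerate(x) if xi > threshold]
    n = len(x)
    children = sum(len(rows) / n * node_divergence(rows)
                   for rows in (left, right) if rows)
    return children - euclidean_divergence(y, w)


# Toy data: the treatment only helps users with x == 1.
x = [1, 1, 1, 1, 2, 2, 2, 2]
w = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 1, 0, 0, 0, 0, 0, 0]
gain = uplift_gain(x, y, w, threshold=1)
print(gain)  # 0.5 for this toy data
```

A split that separates users who respond to the treatment from those who do not gets a positive gain, while a split that leaves the test/control difference unchanged in both children gets a gain of zero. A tree builder would evaluate this gain for every candidate threshold and keep the largest one.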