Feature Request: uplift trees #11818

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 10 comments

Comments

@exalate-issue-sync

Request for a new split criterion for trees, and therefore for Distributed Random Forest and the Gradient Boosting Machine.

The split criterion (uplift trees) was originally proposed by Guelman et al. (2013) and Rzepakowski & Jaroszewicz (2012), and is also concisely described by Gutierrez & Gerardy (2017, http://proceedings.mlr.press/v67/gutierrez17a/gutierrez17a.pdf) in Equations 12 and 14. It is currently available in R as upliftRF in the uplift package (a package for a similar algorithm by Athey & Imbens is called causalTree).

In a few words, instead of learning splits based on Gini / information gain on the outcome P(Y) as in traditional decision trees, an uplift tree learns splits based on information gain on the difference in outcomes between two groups of users (test T and control C). Therefore, in addition to features X and outcome Y, an uplift tree also takes as input a 'treatment' indicator W=[T,C], which is used when learning the splits.
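To make the idea concrete, here is a minimal sketch of one divergence-based split gain in the spirit of Rzepakowski & Jaroszewicz: it scores a candidate split by how much the KL divergence between the treatment and control response rates grows from the parent node to its children. The function names, the `(y, w)` row layout, and the clipping constant are assumptions made for illustration; this is not the criterion as implemented in any particular library.

```python
import math

def kl_binary(p, q, eps=1e-6):
    """KL divergence between two Bernoulli distributions with rates p and q."""
    # Clip away from 0 and 1 so the logarithms stay finite (assumed convention).
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def response_rates(rows):
    """Return (treatment rate, control rate) for rows of (y, w), w=1 meaning treated."""
    treated = [y for y, w in rows if w == 1]
    control = [y for y, w in rows if w == 0]
    # Fall back to 0.5 when a group is empty (an assumption, not a standard rule).
    p_t = sum(treated) / len(treated) if treated else 0.5
    p_c = sum(control) / len(control) if control else 0.5
    return p_t, p_c

def uplift_split_gain(rows, left_mask):
    """Size-weighted divergence in the children minus divergence in the parent."""
    left = [r for r, m in zip(rows, left_mask) if m]
    right = [r for r, m in zip(rows, left_mask) if not m]
    n = len(rows)
    d_parent = kl_binary(*response_rates(rows))
    d_children = sum(
        len(child) / n * kl_binary(*response_rates(child))
        for child in (left, right)
        if child
    )
    return d_children - d_parent

# Toy data: the first four users respond only when treated,
# the last four respond only when left in the control group.
rows = [(1, 1), (1, 1), (0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0)]
good_gain = uplift_split_gain(rows, [True] * 4 + [False] * 4)
bad_gain = uplift_split_gain(rows, [True, False] * 4)
```

A split that separates users by how they react to the treatment gets a positive gain, while a split that leaves the treatment/control difference unchanged in both children gets a gain of zero, even though the overall response rate is the same in every node.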

@exalate-issue-sync
Author

Neema Mashayekhi commented: Related support ticket requesting uplift: https://support.h2o.ai/a/tickets/90043

Reference: R’s uplift RF implementation: https://www.rdocumentation.org/packages/uplift/versions/0.3.5/topics/upliftRF (it implements Random Forests with split criteria designed for binary uplift modeling tasks)

@exalate-issue-sync
Author

Neema Mashayekhi commented: Recent support ticket requesting an uplift split criterion for decision tree-based models: https://support.h2o.ai/a/tickets/97307

Attached are reference articles:

[^Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf]
[^Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf]

[^Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf]

Uber's implementation: https://github.com/uber/causalml

@exalate-issue-sync
Author

Grigorios Fousas commented: I am very keen to help with this if you need help! I am very interested in Uplift modelling.

I have run several projects with uplift modelling and I have done my MSc dissertation on uplift modelling.

Also, N Radcliffe is a mentor and friend of mine (having a beer every now and then in Edinburgh) and P Surry an old colleague.

I have placed some more info here: https://github.com/h2oai/dai-domain-solution-recipes/tree/master/uplift, and I was planning to work on it more when I have time.

Essentially, uplift modelling needs two things that make it different from the traditional classification/regression modelling cases:

The capability to consume a control flag, which indicates whether someone is in the control or the treated group.

A different split criterion. In addition to what is mentioned above, I would suggest Qini as a split criterion, which is the equivalent of Gini for uplift modelling cases. This is described in the Real-World Uplift Modelling with Significance-Based Uplift Trees paper attached above.

@exalate-issue-sync
Author

Neema Mashayekhi commented: Greg, this is a bit complex to solve on the H2O side. We plan to start it in Q4 and will reach out to you once we start. Thanks for the help!

@exalate-issue-sync
Author

Grigorios Fousas commented: Thanks for the update [~accountid:5dc4f5bbb6e6b50c58af0624] !

I am uploading part of my dissertation with some theory on uplift modelling and a description of how Portrait Miner, a piece of software that is now almost dead, approached the uplift modelling task. Portrait Miner is a child of Radcliffe and Surry, and I can maybe dig it up from my old files and show you how it works if you are interested.

[^Dissertation chapters 2-3.pdf]

@exalate-issue-sync
Author

Grigorios Fousas commented: Another great resource!

https://github.com/uber/causalml The most complete I have seen so far out there.

Thanks to [~accountid:5b8d534896cb052b5f659f47] who found it.

@exalate-issue-sync
Author

Juan Telleria commented: Note that Booking.com has already released an uplift modeling Python package, based on H2O-3 for performance:

https://github.com/bookingcom/upliftml

@exalate-issue-sync
Author

Juan Telleria commented: The Python H2O-3 code can be found here: https://github.com/bookingcom/upliftml/blob/main/upliftml/models/h2o.py

@exalate-issue-sync
Author

Veronika Maurerová commented: Uplift trees are implemented via the DRF algorithm, currently only for binomial classification and a single treatment group. As for metrics, AUUC is available now.

More features will be implemented soon (more metrics, grid search, early stopping, etc.).
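For readers unfamiliar with AUUC (area under the uplift curve), the sketch below shows one common Qini-style way to compute it: rank users by predicted uplift, accumulate treated responders minus control responders rescaled to the treated group size, and average the resulting curve. The function names, the handling of prefixes with no control users, and the plain averaging are assumptions for illustration; the exact definition used by the DRF implementation may differ.

```python
def qini_curve(y, w, score):
    """Cumulative Qini-style uplift after each user, ranked by descending score.

    y: binary outcomes, w: 1 for treated and 0 for control, score: predicted uplift.
    """
    order = sorted(range(len(y)), key=lambda i: -score[i])
    n_t = n_c = y_t = y_c = 0
    curve = []
    for i in order:
        if w[i] == 1:
            n_t += 1
            y_t += y[i]
        else:
            n_c += 1
            y_c += y[i]
        # Rescale control responders to the treated group size.
        # With no control users seen yet, nothing is subtracted (assumed convention).
        scaled_control = y_c * n_t / n_c if n_c else 0.0
        curve.append(y_t - scaled_control)
    return curve

def auuc(y, w, score):
    """Average height of the Qini-style curve (a simple, unnormalized AUUC)."""
    curve = qini_curve(y, w, score)
    return sum(curve) / len(curve)

# Toy data: the top-ranked users respond only when treated,
# the bottom-ranked users respond only when not treated.
y = [1, 0, 1, 0, 0, 1, 0, 1]
w = [1, 0, 1, 0, 1, 0, 1, 0]
good_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
bad_score = [-s for s in good_score]  # the reverse ranking

good_auuc = auuc(y, w, good_score)
bad_auuc = auuc(y, w, bad_score)
```

A ranking that puts the users who benefit from treatment first yields a positive value, the reversed ranking yields a negative one, and both curves end at the same overall uplift once every user is included.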

@hasithjp
Member

JIRA Issue Migration Info

Jira Issue: PUBDEV-4940
Assignee: Veronika Maurerová
Reporter: Nidhi Mehta
State: Resolved
Fix Version: 3.36.0.1
Attachments: Available (Count: 4)
Development PRs: Available

Linked PRs from JIRA

#5546
#5547
#5565
#5576
#5170
#5224
#5918
#5919
#5927
#5968
#5620
#5624
#5681

Attachments From Jira

Attachment Name: Dissertation chapters 2-3.pdf
Attached By: Grigorios Fousas
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Dissertation chapters 2-3.pdf

Attachment Name: Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf

Attachment Name: Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf

Attachment Name: Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf
