Why is perfect uplift calculated differently for uplift curve and qini curve? #93

Open
steprandelli opened this issue May 1, 2021 · 2 comments

@steprandelli

💡 Feature request

Hi! A perfect uplift score is needed to compute both the perfect uplift curve and the perfect Qini curve. Why do the two curves use different formulas to generate it? Would it make sense to unify the perfect uplift formula?

perfect uplift curve

cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
summand = y_true if cr_num > tn_num else treatment
perfect_uplift = 2 * (y_true == treatment) + summand

perfect qini curve
perfect_uplift = y_true * treatment - y_true * (1 - treatment)
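
To make the difference concrete, here is a minimal sketch that applies both quoted formulas to one observation from each of the four (y_true, treatment) combinations. The function names `uplift_curve_scores` and `qini_curve_scores` are hypothetical wrappers introduced only for this illustration; the bodies are the two snippets quoted above.

```python
import numpy as np


def uplift_curve_scores(y_true, treatment):
    # Formula quoted above for the perfect uplift curve.
    y_true, treatment = np.asarray(y_true), np.asarray(treatment)
    cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
    tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
    summand = y_true if cr_num > tn_num else treatment
    return 2 * (y_true == treatment) + summand


def qini_curve_scores(y_true, treatment):
    # Formula quoted above for the perfect Qini curve.
    y_true, treatment = np.asarray(y_true), np.asarray(treatment)
    return y_true * treatment - y_true * (1 - treatment)


# One observation per class, as (y_true, treatment): (1, 1), (0, 0), (0, 1), (1, 0)
y_true = np.array([1, 0, 0, 1])
treatment = np.array([1, 0, 1, 0])

print(uplift_curve_scores(y_true, treatment).tolist())  # [3, 2, 1, 0]
print(qini_curve_scores(y_true, treatment).tolist())    # [1, 0, 0, -1]
```

On this toy input the first formula gives every class its own score, while the second gives (0, 0) and (0, 1) the same score. The first formula also depends on the class counts: when control responders outnumber treated non-responders, the scores of (0, 1) and (1, 0) swap.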

@Irek21

Irek21 commented Aug 2, 2022

Same question. I also don't understand the idea behind how perfect uplift is computed in perfect_uplift_curve; I couldn't find a description anywhere.

@maks-sh
Owner

maks-sh commented Aug 9, 2022

@steprandelli @Irek21 Thanks for your question!

Recall that in the classical uplift problem we deal with two binary vectors: target, the value of the target variable, and treatment, the indicator of the intervention (communication in marketing, treatment in medicine, etc.).

Thus, there are only 4 distinct classes of pairs that need to be ordered correctly: (1, 1), (0, 0), (0, 1), (1, 0).

To understand what an ideal curve should look like, you need to determine the order in which these 4 classes (pairs) should be arranged. Reordering observations within a single class does not change the value of the curve.

Let's define the ideal curve as the curve with the maximum area under it. The task is then to rank the 4 classes so that the area under the curve is maximal.
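
As a rough way to explore this, the sketch below brute-forces all 24 orderings of the four classes and keeps the one with the largest area. It assumes a cumulative uplift curve of the form (treated response rate minus control response rate) times the number of observations seen so far, with a rate taken as 0 while its group is still empty; that definition, along with the helper names `uplift_curve_area` and `best_class_order`, is an assumption made for illustration and may not match the library's implementation.

```python
from itertools import permutations

import numpy as np

# The four classes, written here as (target, treatment).
CLASSES = [(1, 1), (0, 0), (0, 1), (1, 0)]


def uplift_curve_area(y, t):
    """Area under the assumed uplift curve for data already sorted best-first."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    n_t, n_c = np.cumsum(t), np.cumsum(1 - t)          # group sizes so far
    y_t, y_c = np.cumsum(y * t), np.cumsum(y * (1 - t))  # responders so far
    # Response rates, defined as 0 while a group is still empty.
    rate_t = np.divide(y_t, n_t, out=np.zeros_like(y_t), where=n_t > 0)
    rate_c = np.divide(y_c, n_c, out=np.zeros_like(y_c), where=n_c > 0)
    curve = np.concatenate([[0.0], (rate_t - rate_c) * (n_t + n_c)])
    # Trapezoidal rule with unit spacing between observations.
    return float(np.sum((curve[1:] + curve[:-1]) / 2))


def best_class_order(counts):
    """Try every ordering of the four classes; return (max area, ordering)."""
    best = None
    for order in permutations(CLASSES):
        y = np.concatenate([np.full(counts[c], c[0]) for c in order])
        t = np.concatenate([np.full(counts[c], c[1]) for c in order])
        area = uplift_curve_area(y, t)
        if best is None or area > best[0]:
            best = (area, order)
    return best


# Hypothetical class sizes, chosen only for illustration.
counts = {(1, 1): 30, (0, 0): 40, (0, 1): 20, (1, 0): 10}
area, order = best_class_order(counts)
print(order, area)
```

Rerunning this with different class counts (for example, swapping the sizes of (0, 1) and (1, 0)) is a quick way to see whether the best ordering depends on the counts under this assumed curve definition.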

In the code, you can find the implementation of how these classes are sorted. I hope we will eventually add a section on metrics that covers ideal curves.

If you write up a more detailed proof of this class ordering, we will be happy to add it to the user guide.

Many thanks to @kirrlix1994 for consultations on the metrics issues.
