# Calculating Lift

### Introduction

In the previous lesson, we learned about the ROC AUC metric. One of the nice things about this metric is that it is a ranking metric: the better our observations are ranked by score, the higher the ROC AUC. However, suppose we want to call the 1000 people who are most likely to churn, or to sign up for a product. In that case, what matters is how well we rank the top portion of our observations. For this, we can use the lift metric.

### Working with Lift

Let's say that we have a classifier that helps us by ranking job candidates for a sales position.

```python
import pandas as pd

df = pd.read_csv('./mixed.csv', index_col=0)
df
```

|   | score | target |
|---|------:|-------:|
| 0 | 0.8   | 1      |
| 1 | 0.75  | 1      |
| 2 | 0.6   | 0      |
| 3 | 0.5   | 1      |
| 4 | 0.3   | 0      |
| 5 | 0.05  | 0      |
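If `mixed.csv` isn't on hand, the same six rows can be typed in directly. This is a minimal sketch that copies the values from the table above; it assumes the resulting DataFrame matches what `read_csv` returns, apart from the index name.

```python
import pandas as pd

# Values copied from the table above (hypothetical stand-in for mixed.csv).
df = pd.DataFrame({
    "score": [0.8, 0.75, 0.6, 0.5, 0.3, 0.05],
    "target": [1, 1, 0, 1, 0, 0],
})
print(df)
```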


The target represents whether the sales candidate ultimately received a job offer. Ideally, the highest-scoring rows contain as many positive observations as possible. In this scenario, let's say that we care about the top four ranked observations.

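To calculate lift, we begin by calculating the share of 1s among the top k entries, here the top four. Since the rows are already sorted by score in descending order, these are simply the first four rows. We denote this share as:

$\hat{y}_{top_k}$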

```python
df.target[:4].values.mean()
```

0.75
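If the rows were not already in score order, slicing the first four would give the wrong answer. A slightly more defensive sketch sorts explicitly before taking the top k; the helper name `top_k_share` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "score": [0.8, 0.75, 0.6, 0.5, 0.3, 0.05],
    "target": [1, 1, 0, 1, 0, 0],
})

def top_k_share(frame: pd.DataFrame, k: int) -> float:
    """Share of positives (1s) among the k highest-scoring rows."""
    # Sort by score first so the result doesn't depend on row order.
    top = frame.sort_values("score", ascending=False).head(k)
    return float(top["target"].mean())

# Shuffling the rows doesn't change the answer, because we sort by score.
shuffled = df.sample(frac=1, random_state=0)
print(top_k_share(shuffled, 4))  # 0.75
```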

Then we divide this value by the share of positive values across the entire dataset, $\hat{y}$.

```python
df.target.values.mean()
```

0.5

```python
0.75 / 0.5
```

1.5

$Lift_k = \frac{\hat{y}_{top_k}}{\hat{y}}$
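The formula can be wrapped in a small reusable function. This is a minimal sketch using NumPy; the name `lift_at_k` and its signature are illustrative, not part of any library. It sorts by score itself, so it doesn't rely on the input being pre-sorted.

```python
import numpy as np

def lift_at_k(scores, targets, k):
    """Lift_k = (share of positives in the top k by score) / (overall share)."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of observations")
    base_rate = targets.mean()                # \hat{y}
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    # Indices by descending score; 'stable' keeps the original order among ties.
    order = np.argsort(-scores, kind="stable")
    top_k_rate = targets[order[:k]].mean()    # \hat{y}_{top_k}
    return top_k_rate / base_rate

scores = [0.8, 0.75, 0.6, 0.5, 0.3, 0.05]
targets = [1, 1, 0, 1, 0, 0]
print(lift_at_k(scores, targets, 4))  # 1.5
```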

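So this tells us that by using our model to pick the top four candidates, we'd find 1.5 times as many positive observations as we would by selecting four candidates at random: 3 positives in the model's top four, versus an expected 4 × 0.5 = 2 in a random sample. That's it.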

$Lift_4 = 1.5$
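The random-sample baseline can be checked by brute force. This sketch enumerates every equally likely group of four candidates with `itertools.combinations` and averages the number of positives in each, then compares that to the model's top four (it assumes the targets are listed in descending score order, as in the table).

```python
from itertools import combinations

targets = [1, 1, 0, 1, 0, 0]  # already in descending score order
k = 4

# Number of positives in every possible random sample of k candidates.
positives_per_sample = [sum(sample) for sample in combinations(targets, k)]
expected_random = sum(positives_per_sample) / len(positives_per_sample)

# Positives among the model's top k.
model_positives = sum(targets[:k])

print(expected_random, model_positives, model_positives / expected_random)
# 2.0 3 1.5
```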

### Benefits of Lift

One benefit of the lift metric is that it is interpretable to non-technical stakeholders. Lift compares the share of positive events among the observations the model ranks highest with the share of positive events across the entire dataset. A lift above 1 means the model finds positives at a higher rate than random selection would.

Also, notice that sometimes what we want to optimize is not how well the classifier ranks the entire dataset, but how well it ranks the observations at the top. This is the case when a stakeholder can act on only a limited number of observations, such as the 1000 people we can afford to call.
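To see how lift depends on where we draw the cutoff, we can compute it for every k from 1 to the size of the dataset; plotting these values against k gives what is often called a lift curve. This is a minimal pandas sketch on the six-row example.

```python
import pandas as pd

df = pd.DataFrame({
    "score": [0.8, 0.75, 0.6, 0.5, 0.3, 0.05],
    "target": [1, 1, 0, 1, 0, 0],
})

ranked = df.sort_values("score", ascending=False).reset_index(drop=True)
base_rate = ranked["target"].mean()

# Cumulative share of positives in the top k rows, for k = 1..n.
k = pd.Series(range(1, len(ranked) + 1))
top_k_rate = ranked["target"].cumsum() / k
lift = top_k_rate / base_rate

print(pd.DataFrame({"k": k, "lift": lift.round(3)}))
```

Lift is highest at the very top of the ranking and falls to exactly 1 when k covers the whole dataset, since the top-k share then equals the overall share.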

### Summary

In this lesson, we learned about the lift metric.  Lift is defined as the following:

$Lift_k = \frac{\hat{y}_{top_k}}{\hat{y}}$

Lift compares the share of positive events among the model's top k observations with the share of positive events across the entire dataset. Lift is useful for communicating the value of the model to stakeholders. It can also be a valuable metric to optimize when only a limited number of observations can be acted on.

### Resources

[Calculating a Lift Curve](https://towardsdatascience.com/the-lift-curve-unveiled-998851147871)