## Presentations

In [5]:
import pandas as pd
import numpy as np
from pycaret.utils.generic import check_metric
from pycaret.classification import *

### The prediction_label and prediction_score columns are added to the dataset, where:

prediction_label is the prediction (1 = churn, 0 = not churn)

prediction_score is the probability of the predicted class

In [6]:
data_predictions = pd.read_csv('data_predictions.csv')
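# Note: the output below shows the columns as prediction_label and prediction_score,
# so this rename of 'Label' has no effect on this file
data_predictions.rename(columns = {'Label': 'Predicted_Churn'}, inplace=True)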
data_predictions.head()

| | channel_sales | cons_12m | cons_gas_12m | cons_last_month | forecast_cons_12m | forecast_cons_year | forecast_discount_energy | forecast_meter_rent_12m | forecast_price_energy_off_peak | forecast_price_energy_peak | ... | pow_max | price_off_peak_var | price_peak_var | price_mid_peak_var | price_off_peak_fix | price_peak_fix | price_mid_peak_fix | churn | prediction_label | prediction_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | foosdfpfkusacimwkcsosbicdxkicaua | 8760 | 0 | 145 | 741.89 | 0 | 0.0 | 16.81 | 0.120372 | 0.0 | ... | 13.2 | 0.131032 | 0.0 | 0.0 | 41.06397 | 0.0 | 0.0 | 0 | 0 | 1.0 |
| 1 | foosdfpfkusacimwkcsosbicdxkicaua | 16358 | 0 | 1029 | 1249.89 | 464 | 0.0 | 19.61 | 0.144038 | 0.08638 | ... | 13.856 | 0.1476 | 0.085725 | 0.0 | 44.26693 | 0.0 | 0.0 | 0 | 0 | 1.0 |
| 2 | foosdfpfkusacimwkcsosbicdxkicaua | 10423 | 0 | 365 | 858.23 | 0 | 0.0 | 17.67 | 0.141434 | 0.0 | ... | 13.2 | 0.144065 | 0.0 | 0.0 | 44.26693 | 0.0 | 0.0 | 0 | 0 | 1.0 |
| 3 | usilxuppasemubllopkaafesmlibmsdf | 904954 | 75074 | 82136 | 6125.98 | 5968 | 0.0 | 145.72 | 0.166178 | 0.10175 | ... | 41.5 | 0.17059 | 0.107163 | 0.076311 | 44.44471 | 24.43733 | 16.291555 | 1 | 1 | 1.0 |
| 4 | usilxuppasemubllopkaafesmlibmsdf | 334821 | 22485 | 31128 | 4855.8 | 4464 | 0.0 | 143.88 | 0.164637 | 0.100572 | ... | 31.5 | 0.168185 | 0.105842 | 0.075096 | 44.44471 | 24.43733 | 16.291555 | 1 | 1 | 1.0 |


In [18]:
data_predictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149753 entries, 0 to 149752
Data columns (total 29 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   channel_sales                   149753 non-null  object 
 1   cons_12m                        149753 non-null  int64  
 2   cons_gas_12m                    149753 non-null  int64  
 3   cons_last_month                 149753 non-null  int64  
 4   forecast_cons_12m               149753 non-null  float64
 5   forecast_cons_year              149753 non-null  int64  
 6   forecast_discount_energy        149753 non-null  float64
 7   forecast_meter_rent_12m         149753 non-null  float64
 8   forecast_price_energy_off_peak  149753 non-null  float64
 9   forecast_price_energy_peak      149753 non-null  float64
 10  forecast_price_pow_off_peak     149753 non-null  float64
 11  has_gas                         149753 non-null  object 
 12  imp_cons        

## Evaluate the performance of the model on the data

Below is an overview of the model's performance on the full data. The model is nearly perfect: it misclassified only 6 retained customers as churned and missed no churned customers, so its predictions are correct for more than 99.9% of customers.

In [10]:
pd.crosstab(data_predictions['churn'], data_predictions['prediction_label'])

| churn (rows) / prediction_label (columns) | 0 | 1 |
|---|---|---|
| 0 | 133630 | 6 |
| 1 | 0 | 16117 |


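### Accuracy

The share of all predictions that are correct: Accuracy = correct predictions / total observations. The model's accuracy is above 99.9% (149,747 of 149,753 customers classified correctly).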

In [12]:
check_metric(data_predictions['churn'], data_predictions['prediction_label'], metric = 'Accuracy')

1.0
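
As a cross-check, accuracy can be recomputed by hand from the four counts in the crosstab. This is a minimal sketch; the names `tn`, `fp`, `fn`, and `tp` are labels introduced here for illustration and do not appear in the notebook. It also suggests why the reported value is exactly 1.0: the metric output appears to be rounded to four decimals.

```python
# Counts copied from the crosstab of churn vs. prediction_label
tn, fp = 133630, 6      # actual 0: predicted 0, predicted 1
fn, tp = 0, 16117       # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

print(total)               # 149753, the row count reported by info()
print(accuracy < 1)        # True: 6 customers are still misclassified
print(round(accuracy, 4))  # 1.0
```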

### Precision
The ratio of correctly predicted positive observations to the total predicted positive observations. The model has a precision of 99.96%.

In [14]:
check_metric(data_predictions['churn'], data_predictions['prediction_label'], metric = 'Precision')

0.9996

### Recall (a.k.a Sensitivity)

The ratio of correctly predicted positive observations to all observations in the actual positive class. The model has a recall of 100%: no churned customer was missed.

In [16]:
check_metric(data_predictions['churn'], data_predictions['prediction_label'], metric = 'Recall')

1.0
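
Precision and recall can be checked against the crosstab in the same way. The sketch below is illustrative only; the variable names are introduced here, and the counts are copied from the crosstab output.

```python
# Counts copied from the crosstab of churn vs. prediction_label
tn, fp = 133630, 6      # actual 0: predicted 0, predicted 1
fn, tp = 0, 16117       # actual 1: predicted 0, predicted 1

# Of the customers flagged as churn, the share that actually churned
precision = tp / (tp + fp)
# Of the customers that actually churned, the share that was flagged
recall = tp / (tp + fn)

print(round(precision, 4))  # 0.9996
print(round(recall, 4))     # 1.0
```

The 6 false positives are the only errors, which is why precision falls slightly below 1 while recall does not.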

### Business Impact

The SME division head proposed that we give a 20% discount to high propensity-to-churn customers.

However, we need a cut-off to implement this. For this study, I used 75%: we will offer the discount to customers predicted to churn with a probability of 75% or higher.
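
To illustrate how the choice of cut-off changes who receives the offer, here is a minimal sketch on made-up scores. The records and the helper `count_targeted` are hypothetical and exist only for this illustration. The rule assumed is that a customer is targeted when the predicted label is 1 and the score is at or above the cut-off.

```python
# Hypothetical (label, score) pairs; not taken from data_predictions.
# The score is the probability of the predicted label, so (0, 0.90)
# means "90% confident the customer will NOT churn".
customers = [(1, 0.95), (1, 0.80), (1, 0.75), (1, 0.60), (0, 0.90), (0, 0.55)]

def count_targeted(records, cutoff):
    """Count customers predicted to churn with score >= cutoff."""
    return sum(1 for label, score in records if label == 1 and score >= cutoff)

for cutoff in (0.5, 0.75, 0.9):
    print(cutoff, count_targeted(customers, cutoff))
# 0.5 4
# 0.75 3
# 0.9 1
```

A lower cut-off reaches more customers at a higher discount cost; a higher one does the opposite.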

In [19]:
data_predictions['revenue'] = data_predictions['forecast_cons_12m'] * data_predictions['forecast_price_energy_off_peak'] 
data_new = data_predictions[['churn','prediction_label', 'prediction_score', 'revenue']]
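
As a worked example, the first row shown by `head()` has `forecast_cons_12m = 741.89` and `forecast_price_energy_off_peak = 0.120372`. The sketch below applies the same product to that single row. Note that this revenue figure uses only the off-peak energy price, so it should be read as a proxy, not a full bill.

```python
# Values copied from row 0 of the head() output
forecast_cons_12m = 741.89
forecast_price_energy_off_peak = 0.120372

# Same formula as the 'revenue' column: forecast consumption x off-peak price
revenue = forecast_cons_12m * forecast_price_energy_off_peak
print(round(revenue, 2))  # 89.3
```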

In [25]:
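def churn_cutoff(df, cutoff=0.75, churn='prediction_label', score='prediction_score'):
    """Return customers predicted to churn with a score at or above the cutoff."""
    # Keep only rows predicted as churn (label 1)
    df = df[df[churn] == 1]
    # Keep only predictions made with probability >= cutoff
    df = df[df[score] >= cutoff]
    # Highest-scoring customers first, using the score column passed in
    df = df.sort_values(by=score, ascending=False)
    df = df.reset_index(drop=True)
    return df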

In [26]:
data_churn = churn_cutoff(data_new)
data_churn.shape

(16114, 4)

In [30]:
data_churn.head(10)

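| | churn | prediction_label | prediction_score | revenue |
|---|---|---|---|---|
| 0 | 1 | 1 | 1.0 | 1018.003104 |
| 1 | 1 | 1 | 1.0 | 2556.426362 |
| 2 | 1 | 1 | 1.0 | 1954.888421 |
| 3 | 1 | 1 | 1.0 | 1108.583864 |
| 4 | 1 | 1 | 1.0 | 2206.867234 |
| 5 | 1 | 1 | 1.0 | 855.42359 |
| 6 | 1 | 1 | 1.0 | 1668.362531 |
| 7 | 1 | 1 | 1.0 | 962.135741 |
| 8 | 1 | 1 | 1.0 | 776.875905 |
| 9 | 1 | 1 | 1.0 | 1058.606577 |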


In [31]:
print("Percentage of revenue spent on discount strategy for churning customers: ",
      round(data_churn['revenue'].sum()*0.2/data_predictions['revenue'].sum(),2)*100)

Percentage of revenue spent on discount strategy for churning customers:  9.0
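
The calculation above amounts to: cost share = discount × (revenue of targeted customers) / (total revenue). The sketch below restates it as a small function. The name `discount_cost_share` and the totals passed to it are hypothetical, chosen only to illustrate the formula. Like the notebook cell, it assumes every targeted customer receives the full discount.

```python
def discount_cost_share(targeted_revenue, total_revenue, discount=0.20):
    """Share of total revenue given up by discounting the targeted customers."""
    return discount * targeted_revenue / total_revenue

# Hypothetical totals, chosen only to illustrate the formula
example_share = discount_cost_share(450.0, 1000.0)
print(round(example_share, 2))           # 0.09

# Working backwards from the reported 9.0% (a rounded figure, so approximate):
# the targeted customers would account for roughly this share of revenue
implied_targeted_share = 0.09 / 0.20
print(round(implied_targeted_share, 2))  # 0.45
```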


#### The company would spend about 9% of its forecast revenue on offering a 20% discount to high-propensity-to-churn customers. This is not the best strategy, as the cost is too high.