# Predicting customer churn

Churn prediction--predicting whether a customer will stay or leave a company--is one of the more popular applications of machine learning for business, especially among consulting companies trying to sell their services.

Typically the performance of a churn classifier (0 for customer stays, 1 for customer leaves, i.e. churns) is evaluated by a standard metric such as accuracy, precision, recall or ROC-AUC. In real life, these metrics can be misleading, as they do not reflect the costs and benefits of the different outcomes that a given metric summarizes.

In the case of churn, these costs and benefits can be made very explicit in terms of the classifier's [confusion matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html),

\begin{equation*}
C =  
\begin{pmatrix}
\mathrm{true \, negatives} &  \mathrm{false \, positives}  \\
\mathrm{false \, negatives} &  \mathrm{true \, positives}
\end{pmatrix},
\end{equation*}

or, more generally, for a classifier with $n$ classes, the entries of the confusion matrix $C = (C_{ij})$ are the counts of observations known to be in class $i$ and predicted to be in class $j$.

To calculate a business-relevant metric, we need to know the cost for trying to retain a customer and the benefit of retaining a customer.
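
As a minimal sketch of such a metric, each cell of the confusion matrix can be weighted by a monetary value and the results summed. The counts, `contact_cost` and `retained_revenue` below are hypothetical, and the sketch assumes (unrealistically) that every contacted churner is retained. Rows are actual classes and columns are predicted classes, ordered (0 = stays, 1 = churns), following the definition of $C_{ij}$ and scikit-learn's class ordering.

```python
# Hypothetical counts: rows = actual class, columns = predicted class,
# with classes ordered (0 = stays, 1 = churns).
confusion = [
    [800, 50],   # actual stays:  true negatives, false positives
    [60, 90],    # actual churns: false negatives, true positives
]

contact_cost = 5.0        # assumed cost of one retention attempt
retained_revenue = 100.0  # assumed revenue kept when a churner is retained

# Value of each outcome relative to doing nothing, in the same layout
# as the confusion matrix.
# Simplifying assumption: every contacted churner is retained.
value = [
    [0.0, -contact_cost],                    # stays: left alone / wasted contact
    [0.0, retained_revenue - contact_cost],  # churns: missed / retained
]


def net_value(confusion, value):
    """Sum of count times per-outcome value over all cells."""
    return sum(
        confusion[i][j] * value[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )


total = net_value(confusion, value)
print(total)
```

Two classifiers with the same accuracy can have very different `total` values, since false positives and false negatives are priced differently.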

## Churn reward: simplest case

The first case we consider is a single action: sending customers an email. The reward is the revenue from the customer over the next year minus expenses per customer. Let's make the assumptions more explicit, and flag each one as a reasonable or unreasonable approximation of reality.

* action $a \in \{0, 1\} \leftrightarrow \{\mathrm{no\,email\,sent}, \mathrm{email\,sent}\}$ has a fixed cost for all customers (reasonable),
* there are no costs except the marketing action above (unreasonable)
* revenue $\mathrm{rev}$ is the same for all customers (unreasonable):

\begin{equation*}
\mathrm{rev} = \begin{cases}
0,  & \text{if customer churns} \\
\mathrm{rev}_1, & \text{if customer stays}
\end{cases}
\end{equation*}
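
Under these assumptions, the reward for a single customer can be sketched as revenue minus the cost of the action. The function name `reward` and the default values for `rev_1` and `action_cost` are illustrative choices, not values taken from the data.

```python
def reward(churned, action, rev_1=1000.0, action_cost=1.0):
    """Reward for one customer over the next year.

    churned: 1 if the customer churns, 0 if the customer stays
    action:  1 if an email was sent, 0 otherwise
    rev_1 and action_cost are illustrative values, not taken from the data.
    """
    revenue = 0.0 if churned else rev_1
    return revenue - action_cost * action


# Enumerate the four possible (churn, action) outcomes
for churned in (0, 1):
    for action in (0, 1):
        print(f"churned={churned}, action={action}: reward={reward(churned, action)}")
```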

## References

* Sutton and Barto, Reinforcement Learning: An Introduction, Chapter 2
* ??? A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes, https://arxiv.org/pdf/1707.05714.pdf



In [None]:
import os
from pathlib import Path
import pickle

from pprint import pprint

import pandas as pd
import numpy as np

from sklearn.preprocessing import LabelBinarizer, StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.preprocessing import FunctionTransformer


from sklearn_pandas import DataFrameMapper

import fake_data_for_learning
from fake_data_for_learning import SampleValue

from risk_learning.config import filenames

%matplotlib inline

## Churn data

In [None]:
df = pd.read_csv(filenames.fake_churn)
cols = ['year', 'gender', 'age', 'profession', 'action', 'churn']
df[cols].head()

In [None]:
def summarize_counts(data):
    '''
    Parameters
    ----------
    data : pandas.Series
        Data whose counts are to be summarized
    '''
    counts = data.value_counts()
    print(counts)
    counts.sort_index().plot(kind='bar')

In [None]:
summarize_counts(df['year'])

In [None]:
summarize_counts(df['gender'])

In [None]:
summarize_counts(df['age'])

In [None]:
summarize_counts(df['profession'])

In [None]:
summarize_counts(df['action'])

In [None]:
summarize_counts(df['churn'])

## Revenue per customer

In [None]:
r_gender = np.array([0.6, 0.4]).reshape(2,1,1)
r_age = np.array([0.5, 1, 0.75]).reshape(1,3,1)
r_profession = np.array([1.5, 1, 0.5, 0.7]).reshape(1,1,4)
revenue = 1000 * r_gender * r_age * r_profession
revenue

## Preprocess

To look up a customer's entry in the revenue array, we use a label encoder to turn the customer's gender, age and profession into integer indices.
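
The lookup can be sketched without scikit-learn. `LabelEncoder` numbers labels in sorted order, which the hypothetical helper `sorted_codes` imitates here. The label values ("female", "old", "baker", and so on) are made up for illustration and may not match the data.

```python
import numpy as np

# Same construction as the revenue array: one factor per feature, broadcast
# to shape (gender, age, profession) = (2, 3, 4).
r_gender = np.array([0.6, 0.4]).reshape(2, 1, 1)
r_age = np.array([0.5, 1, 0.75]).reshape(1, 3, 1)
r_profession = np.array([1.5, 1, 0.5, 0.7]).reshape(1, 1, 4)
revenue = 1000 * r_gender * r_age * r_profession


def sorted_codes(labels):
    """Map each label to its position in sorted order (as LabelEncoder does)."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Hypothetical label values
gender_codes = sorted_codes(["male", "female"])
age_codes = sorted_codes(["young", "middle", "old"])
profession_codes = sorted_codes(["nurse", "baker", "clerk", "pilot"])

customer = {"gender": "female", "age": "old", "profession": "baker"}
value = revenue[
    gender_codes[customer["gender"]],
    age_codes[customer["age"]],
    profession_codes[customer["profession"]],
]
print(revenue.shape, age_codes, value)
```

Because the codes follow sorted label order, the factors in `r_gender`, `r_age` and `r_profession` must be listed in that same order for the lookup to return the intended revenue.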

In [None]:
mapper = DataFrameMapper([
    ('gender', LabelEncoder()),
    ('age', LabelEncoder()),
    ('profession', LabelEncoder()),
    ('action', LabelEncoder())
])

data4revenue = pd.DataFrame(
    mapper.fit_transform(df),
    columns=mapper.transformed_names_
).astype(int)
data4revenue.head()

In [None]:
with open(filenames.churn_bn, 'rb') as f:
    churn_bn = pickle.load(f)
churn_rv = churn_bn.get_node('churn')

In [None]:
print(f"Have {df.shape[0]} customers. First action is to pick one for contact according to some policy.")
print("Take policy of contacting first customer in list.")

In [None]:
idx=4
record = df.to_dict(orient="records")[idx]
print('Current customer:')
customer = {key: record[key] for key in record.keys() if key in ['age', 'gender', 'profession']}
pprint(customer)

print('\nWith hidden nodes:')
#pprint(record)

parent_values = {key: SampleValue(record[key], churn_bn.get_node(key).label_encoder) for key in record.keys() if key in ['patience', 'thriftiness']}
# Assign action value to 1
parent_values['action'] = SampleValue(1)
#parent_values['action'] = SampleValue(0)
res = churn_rv.rvs(parent_values=parent_values, size=1)
#print(f'\nSimulated churn: {res}')
print(f'Mean of simulated churn: {np.mean(res)}')

In [None]:
# Calculate revenue for no-churn (retention)
r_idx = revenue[data4revenue.loc[idx, 'gender'], data4revenue.loc[idx, 'age'], data4revenue.loc[idx, 'profession']]
r_idx

In [None]:
reward_idx = (1-res[0]) * r_idx
reward_idx

## Churn as a multi-armed bandit

Customer churn can be modeled as one of the classic problems of reinforcement learning. The *multi-armed bandit* problem is a thought-experiment generalization of a casino slot machine ('one-armed bandit') to a slot machine with $n$ different arms. Arm $i$ gives a reward $r_i$ with probability $p_i$, $i = 1, \ldots, n$. The task is to determine which of the $n$ arms maximizes your expected reward.
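
The generic bandit described above can be simulated in a few lines. The rewards and probabilities are hypothetical. Arm $i$ has expected reward $r_i p_i$, and the simulated averages approach those values as the number of pulls grows.

```python
import random


def pull(arm, rewards, probs, rng):
    """Return rewards[arm] with probability probs[arm], otherwise 0."""
    return rewards[arm] if rng.random() < probs[arm] else 0.0


# Hypothetical three-armed bandit
rewards = [100.0, 300.0, 50.0]
probs = [0.9, 0.2, 0.5]

# True expected reward of each arm: r_i * p_i
expected = [r * p for r, p in zip(rewards, probs)]
best_arm = max(range(len(expected)), key=expected.__getitem__)

# Estimate each arm's expected reward by pulling it many times
rng = random.Random(0)
n_pulls = 10_000
estimates = [
    sum(pull(arm, rewards, probs, rng) for _ in range(n_pulls)) / n_pulls
    for arm in range(len(rewards))
]
print(expected, best_arm)
print(estimates)
```

Note that the arm with the largest payout (300) is not the best arm here, because it pays out rarely.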

In the context of customer churn, the analogues of the slot machine's arms are the different combinations of customer features, of which there are

$n = |\textrm{age}| \times |\textrm{gender}| \times |\textrm{profession}| = 3 \times 2 \times 4 = 24$

The task is to find which of the 24 customer types maximizes the reward.

### Greedy methods

Denote

* $a$: action, contact one of the $n$ customer types
* $q(a)$: true (i.e. mean) reward for action $a$

The first task is to estimate $q(a)$ given observed rewards $R_1(a), R_2(a), ...$ from action $a$. Set

* $R_1(a), \ldots, R_{N_t(a)}(a)$ to be the rewards received from action $a$ up through time $t$, where $N_t(a)$ is the number of times action $a$ has been chosen by then

so an estimate for $q(a)$ is the average

\begin{equation*}
Q_t(a) = \frac{R_1(a) + \cdots + R_{N_t(a)}(a)}{N_t(a)}
\end{equation*}
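
A minimal sketch of this estimate, followed by the greedy choice (pick the action with the largest current estimate, as in Sutton and Barto's treatment of bandits). The action names and rewards in `history` are hypothetical.

```python
from collections import defaultdict

# Hypothetical log of (action, reward) pairs
history = [("a0", 900.0), ("a1", 0.0), ("a0", 300.0), ("a2", 450.0), ("a1", 500.0)]

totals = defaultdict(float)  # sum of rewards per action
counts = defaultdict(int)    # N_t(a): number of times each action was chosen
for action, r in history:
    totals[action] += r
    counts[action] += 1

# Sample-average estimate Q_t(a) for every action tried so far
Q = {action: totals[action] / counts[action] for action in counts}

# Greedy choice: the action with the highest estimated reward
greedy_action = max(Q, key=Q.get)
print(Q, greedy_action)
```

A purely greedy choice never revisits actions whose early rewards happened to be low, which is why exploration is usually added on top.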

Note that, unlike the standard multi-armed bandit, the reward for churn must be calculated over all customers, whether contacted or not, as some customers will remain even if not contacted.

**Exercise** Show that the above equation for $Q_t(a)$ can be reformulated as follows, where $Q_k = Q_k(a)$ is the average of the first $k$ rewards $R_1, \ldots, R_k$ for choosing $a$,

\begin{equation*}
Q_{k+1} = Q_k + \frac{1}{k+1} \left[ R_{k+1} - Q_k \right]
\end{equation*}
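
The incremental form can be checked numerically: each new reward moves the running average by (reward minus current average) divided by the number of rewards seen including the new one. The helper name `update_mean` and the rewards in `observed` are hypothetical.

```python
def update_mean(q, count, reward):
    """Fold one more reward into a running average.

    q is the average of the `count` rewards seen so far; the return value
    is the average of all count + 1 rewards.
    """
    return q + (reward - q) / (count + 1)


observed = [900.0, 0.0, 450.0, 300.0]  # hypothetical rewards for one action

q = 0.0
for count, r in enumerate(observed):
    q = update_mean(q, count, r)

# The running average agrees with the average computed in one pass
batch_mean = sum(observed) / len(observed)
print(q, batch_mean)
```

The incremental form needs only the current estimate and a count, so the full reward history does not have to be stored.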

In [None]:
# Case of contacting 1 customer per year / underwriting cycle

#customer_universe

from itertools import product
age_rv = churn_bn.get_node('age')
gender_rv = churn_bn.get_node('gender')
profession_rv = churn_bn.get_node('profession')
customer_types = product(*[age_rv.values, gender_rv.values, profession_rv.values])
for age, gender, profession in customer_types:
    print(age, gender, profession)
  

In [None]:
year = 2010
# Customers from the given year who have not churned
sample = df.loc[np.logical_and(df['year'] == year, df['churn'] == 0), :].reset_index(drop=True)
# Keep one representative customer per customer type
sample = sample.groupby(['age', 'gender', 'profession']).first()
sample.reset_index(inplace=True)

# Encode the sample with the fitted mapper so that its rows index the revenue array
sample4revenue = pd.DataFrame(
    mapper.transform(sample),
    columns=mapper.transformed_names_
).astype(int)

# Position in `sample` of the single customer to contact
action_idx = 0

rewards = []
records = sample.to_dict(orient="records")

for idx, record in enumerate(records):
    print('Current customer:')
    customer = {key: record[key] for key in record.keys() if key in ['age', 'gender', 'profession']}
    pprint(customer)

    parent_values = {key: SampleValue(record[key], churn_bn.get_node(key).label_encoder) for key in record.keys() if key in ['patience', 'thriftiness']}
    # Contact (action = 1) only the customer at action_idx
    if idx == action_idx:
        parent_values['action'] = SampleValue(1)
    else:
        parent_values['action'] = SampleValue(0)
    res = churn_rv.rvs(parent_values=parent_values, size=1)

    # Revenue is earned only if the customer does not churn
    r_idx = revenue[sample4revenue.loc[idx, 'gender'], sample4revenue.loc[idx, 'age'], sample4revenue.loc[idx, 'profession']]
    reward_idx = (1-res[0]) * r_idx
    rewards.append(reward_idx)
print(rewards)

**Question** Consider the customers from year 2012. What could go wrong in this year?