In [None]:
import pandas as pd
telco=pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv',skipinitialspace=True)
telco.head()

 - #### Examples of Y (target variable) in supervised learning for marketing:
 - Which customers will churn? [Classification]
 - Which customers will buy again? [Classification]
 - How much will customers spend in the next 30 days? [Regression]
 - #### Examples of X (feature variables) in supervised learning for marketing:
 - Purchase patterns prior to churning
 - Number of missed loan payments prior to defaulting on a loan

The data format for supervised learning is
- An M by N+1 matrix:
    * M observations (rows such as customers, vendors or products)
    * N+1 columns (N features + 1 target variable)
- Note: unsupervised learning techniques do not use the dependent (target) variable.
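As a minimal illustration of this layout, the toy table below (hypothetical column names and values) has 4 customers as rows, N = 2 feature columns and 1 target column. It is then split into the feature matrix X and the target vector y, the same way the notebook does later with the Telco data.

```python
import pandas as pd

# Hypothetical toy data: one row per customer (4 observations),
# N = 2 feature columns plus 1 target column
data = pd.DataFrame({
    'tenure':         [1, 34, 2, 45],
    'MonthlyCharges': [29.85, 56.95, 53.85, 42.30],
    'Churn':          [1, 0, 1, 0],   # target variable
})

X = data.drop(columns=['Churn'])  # 4 x N feature matrix
y = data['Churn']                 # target vector with 4 entries

print(data.shape, X.shape, y.shape)  # (4, 3) (4, 2) (4,)
```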

## Preparation for Modeling

In [None]:
telco.dtypes

### A) Categorical and Numerical Columns Splitting

In [None]:
#Identifier and target variable
customer_id=['customerID']
target=['Churn']
#Split categorical and numerical column NAMES as lists
#Columns with fewer than 10 unique values are treated as categorical
categories=telco.nunique()[telco.nunique()<10].keys().tolist()
categories.remove(target[0])

In [None]:
numerical=[col for col in telco.columns if col not in customer_id+target+categories]

In [None]:
print(numerical)
print(categories)

### B) One-Hot Encoding

In [None]:
telco_raw=pd.get_dummies(data=telco,columns=categories,drop_first=True)
telco_raw.iloc[:5,5:]
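With `drop_first=True`, a categorical column with k levels becomes k - 1 indicator columns; the dropped level (the first one in sorted order) is the baseline, encoded as all zeros, which avoids perfectly collinear dummies. A toy sketch with a hypothetical three-level `Contract` column:

```python
import pandas as pd

# Hypothetical toy column with three categories
df = pd.DataFrame({
    'tenure':   [1, 34, 2],
    'Contract': ['Month-to-month', 'One year', 'Two year'],
})

# drop_first=True drops the first sorted level, 'Month-to-month',
# which becomes the baseline encoded as all zeros
encoded = pd.get_dummies(df, columns=['Contract'], drop_first=True)
print(list(encoded.columns))
# ['tenure', 'Contract_One year', 'Contract_Two year']
```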

### C) Scaling Numerical Features

In [None]:
# Before scaling, check the dtypes: StandardScaler fails on object (string) columns
telco[numerical].dtypes

In [None]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_numerical=scaler.fit_transform(telco[numerical])

In [None]:
scaled_numerical=pd.DataFrame(scaled_numerical,columns=numerical)
print(telco[numerical])
print(scaled_numerical)
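To see what `StandardScaler` does, the sketch below (hypothetical toy values) computes the per-column z-score z = (x - mean) / std by hand and checks it against the scaler. scikit-learn uses the population standard deviation (ddof=0), so the manual version does too.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy numeric features (rows = customers)
values = np.array([[1.0, 29.85],
                   [34.0, 56.95],
                   [2.0, 53.85],
                   [45.0, 42.30]])

# z = (x - column mean) / column std, with the population std (ddof=0)
manual = (values - values.mean(axis=0)) / values.std(axis=0)
scaled = StandardScaler().fit_transform(values)

print(np.allclose(manual, scaled))  # True
```

After scaling, every column has mean 0 and standard deviation 1, so features measured in different units (months, currency) become comparable.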

### Final: Bringing It All Together

In [None]:
# Drop the unscaled numerical columns from the one-hot encoded frame
telco_raw=telco_raw.drop(columns=numerical)
# Left-join the scaled numerical columns back on the row index
telco=telco_raw.merge(right=scaled_numerical,how='left',left_index=True,right_index=True)
telco.head()

In [None]:
telco=telco.dropna()
telco.Churn=telco.Churn.replace({'No':0,'Yes':1})
X = telco.drop(columns=['customerID','Churn'])
Y = telco['Churn']

# MACHINE LEARNING PART

### A) Supervised Learning-Decision Tree

In [None]:
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
train_X,test_X,train_Y,test_Y=train_test_split(X,Y,test_size=0.25)
# shape[0] is the number of rows, so these print the train and test fractions
print(train_X.shape[0]/X.shape[0])
print(test_X.shape[0]/X.shape[0])

In [None]:
mytree = tree.DecisionTreeClassifier(max_depth=5)
treemodel = mytree.fit(train_X, train_Y)
pred_Y = treemodel.predict(test_X)
pred_Y_ = treemodel.predict(train_X)
import numpy as np
print('Training accuracy:',np.round(accuracy_score(train_Y, pred_Y_),3))
print('Test accuracy:',np.round(accuracy_score(test_Y, pred_Y),3))

#### Gini Index as impurity criterion

In [None]:
mytree = tree.DecisionTreeClassifier(max_depth=7,criterion='gini',splitter='best')
treemodel = mytree.fit(train_X, train_Y)
pred_Y = treemodel.predict(test_X)
pred_Y_ = treemodel.predict(train_X)
print('Gini Training accuracy:',np.round(accuracy_score(train_Y, pred_Y_),3))
print('Gini Test accuracy:',np.round(accuracy_score(test_Y, pred_Y),3))
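The Gini impurity of a node with class proportions p_k is 1 - Σ p_k²: it is 0 for a pure node and 0.5 for a 50/50 binary node. With `criterion='gini'`, the tree picks the split that most reduces the size-weighted impurity of the child nodes. A minimal sketch of both quantities (the helper names `gini` and `split_gini` are hypothetical):

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of the class labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = np.array([0, 0, 0, 1, 1, 1])
print(gini(parent))                                     # 0.5
print(split_gini(parent[:3], parent[3:]))               # 0.0 (both children pure)
print(split_gini(parent[[0, 1, 3]], parent[[2, 4, 5]]))  # mixed children, about 0.444
```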

### B) Unsupervised Learning

In [None]:
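telco=telco.drop(columns='customerID')
from sklearn.cluster import KMeans
# Cluster on the features only: unsupervised learning does not use the target (Churn)
kmeans=KMeans(n_clusters=3,n_init=10,random_state=42)
kmeans.fit(telco.drop(columns='Churn'))
telco=telco.assign(Cluster=kmeans.labels_)
telco.groupby('Cluster').mean()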

In [None]:
telco.groupby('Cluster').size()
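`KMeans` is based on Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. The bare-bones NumPy sketch below illustrates that loop; it is not scikit-learn's implementation (it skips k-means++ initialization and multiple restarts), and the function name `lloyd_kmeans` is hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: random initial centroids, then assign/update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated synthetic blobs
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])
labels, centroids = lloyd_kmeans(points, k=2)
print(np.bincount(labels))  # cluster sizes
```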

# What is churn?
Churn happens when a customer stops buying from or engaging with a company. There are two main types of churn:
- Contractual
   - Customers decide to cancel subscriptions or services.
- Non-contractual
   - Happens in settings like grocery or online shopping: customers simply stop buying or using products, without any contract. It is harder to track.

"In this part, we'll explore a telecommunications company where a single master agreement defines whether a customer is still active or has churned, which means they have terminated their contract."

In [None]:
set(telco['Churn'])


In [None]:
telco.groupby(['Churn']).size()/telco.shape[0]*100

In this case, the minority class (churners) is well above 5% of the data, so this is not a severe case of class imbalance. Otherwise, we could increase the minority class or decrease the majority class with oversampling or undersampling techniques.
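Random oversampling, the simplest of the techniques mentioned, duplicates randomly drawn minority-class rows (sampling with replacement) until the classes are the same size; random undersampling does the opposite by discarding majority-class rows. A pandas sketch on hypothetical toy data (the helper name `random_oversample` is hypothetical). In practice, resample only the training split so duplicated rows never leak into the test set.

```python
import pandas as pd

def random_oversample(df, target, seed=0):
    """Upsample every minority class to the majority-class count."""
    counts = df[target].value_counts()
    n_max = counts.max()
    parts = []
    for label, n in counts.items():
        rows = df[df[target] == label]
        if n < n_max:
            # Draw extra rows with replacement to reach n_max
            extra = rows.sample(n=n_max - n, replace=True, random_state=seed)
            rows = pd.concat([rows, extra])
        parts.append(rows)
    # Shuffle so the duplicated rows are not grouped at the end
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

toy = pd.DataFrame({'tenure': range(10), 'Churn': [0] * 8 + [1] * 2})
balanced = random_oversample(toy, 'Churn')
print(balanced['Churn'].value_counts())  # 8 rows of each class
```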

### A) Predict Churn with LogReg
Logistic regression (LogReg) is
- A statistical classification model for binary outcomes.
- It computes the probability of the target.
- Odds = P(A)/(1 - P(A)), i.e. the probability of the event divided by the probability of its complement.
   - If P(A) = 75%, the odds are 0.75/0.25 = 3 and the log-odds are ln(3) ≈ 1.1.
- LogReg finds a linear decision boundary between the 2 classes: the log-odds of the target are modeled as a linear function of the features.
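The odds arithmetic can be checked in a few lines: odds = p / (1 - p), log-odds = ln(odds), and the logistic (sigmoid) function 1 / (1 + e^(-z)) maps log-odds z back to a probability, which is how logistic regression turns its linear score into P(churn). For P(A) = 0.75 the odds are 3 and the log-odds are ln 3 ≈ 1.10. A small sketch:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def sigmoid(z):
    """Logistic function: converts log-odds z back into a probability."""
    return 1 / (1 + math.exp(-z))

p = 0.75
log_odds = math.log(odds(p))
print(odds(p))              # 3.0
print(round(log_odds, 4))   # 1.0986
print(sigmoid(log_odds))    # back to (approximately) 0.75
```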

Key Metrics:
- #### Accuracy
  % of correctly predicted labels (both 1 and 0): (TP + TN) / (TP + TN + FP + FN)
- #### Precision
  % of positive-class predictions (here, 1) that are correctly classified: TP / (TP + FP)
- #### Recall
  % of actual positive-class samples (all 1s) that were correctly classified: TP / (TP + FN)
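The three formulas can be verified against scikit-learn on a handful of hypothetical labels. For binary 0/1 labels, `confusion_matrix(...).ravel()` returns the counts in the order TN, FP, FN, TP.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Hypothetical labels: 1 = churn (positive class), 0 = no churn
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])

# Binary confusion matrix flattened as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```

Here TP = 3, FN = 1, FP = 2 and TN = 2, so precision suffers from the two false alarms while recall only misses one churner.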

In [None]:
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
logreg.fit(train_X,train_Y)
pred_train_Y=logreg.predict(train_X)
pred_test_Y=logreg.predict(test_X)
train_accuracy=accuracy_score(train_Y,pred_train_Y)
test_accuracy=accuracy_score(test_Y,pred_test_Y)
print('Training accuracy:', round(train_accuracy,4))
print('Test accuracy:', round(test_accuracy,4))

In [None]:
from sklearn.metrics import precision_score,recall_score
train_precision=round(precision_score(train_Y,pred_train_Y),4)
test_precision=round(precision_score(test_Y,pred_test_Y),4)
train_recall=round(recall_score(train_Y,pred_train_Y,average="binary"),4)
test_recall=round(recall_score(test_Y,pred_test_Y,average="binary"),4)

In [None]:
print('Train precision is:',train_precision)
print('Test precision is:',test_precision)
print('Train recall is:',train_recall)
print('Test recall is:',test_recall)

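### Regularization & Parameter Tuning
Some regularization techniques keep all the features but shrink the magnitude of the parameters to avoid overfitting. scikit-learn's LogisticRegression is of this type: it applies L2 regularization by default. You can request L1 (LASSO) regularization explicitly with `penalty='l1'` and a solver that supports it, such as `'liblinear'`. L1 performs feature selection by shrinking some of the beta coefficients exactly to zero. In scikit-learn, `C` is the inverse of the regularization strength, so smaller values of `C` mean stronger regularization.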

In [None]:
C=[1,.5,.25,.1,.05,.025,.01,.005,.0025]
l1_metrics=np.zeros((len(C),5))
l1_metrics[:,0]=C

In [None]:
l1_metrics

In [None]:
# C tuning: smaller C means stronger L1 regularization
for index in range(len(C)):
    logreg=LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X,train_Y)
    pred_test_Y=logreg.predict(test_X)
    l1_metrics[index,1]=np.count_nonzero(logreg.coef_)
    l1_metrics[index,2]=accuracy_score(test_Y,pred_test_Y)
    # zero_division=0 avoids a warning if a strong penalty leaves no positive predictions
    l1_metrics[index,3]=precision_score(test_Y,pred_test_Y,zero_division=0)
    l1_metrics[index,4]=recall_score(test_Y,pred_test_Y)
col_names=['C','Non-ZeroCoeffs','Accuracy','Precision','Recall']
print(pd.DataFrame(l1_metrics, columns=col_names))

In [None]:
logreg=LogisticRegression(penalty='l1', C=.025, solver='liblinear')
logreg.fit(train_X,train_Y)
pred_test_Y=logreg.predict(test_X)

### Tree depth parameter tuning

In [None]:
import numpy as np
depth_list=list(range(2,15))
depth_tuning=np.zeros((len(depth_list),4))
# Storing depth candidates into the first column
depth_tuning[:,0]=depth_list

In [None]:
depth_tuning

In [None]:
from sklearn.tree import DecisionTreeClassifier
mytree =DecisionTreeClassifier()
treemodel=mytree.fit(train_X,train_Y)

In [None]:
for index in range(len(depth_list)):
    mytree=DecisionTreeClassifier(max_depth=depth_list[index])
    mytree.fit(train_X,train_Y)
    pred_test_Y=mytree.predict(test_X)
    depth_tuning[index,1]=accuracy_score(test_Y,pred_test_Y)
    depth_tuning[index,2]=precision_score(test_Y,pred_test_Y)
    depth_tuning[index,3]=recall_score(test_Y,pred_test_Y)
col_names=['Max_Depth','Accuracy','Precision','Recall']
print(pd.DataFrame(depth_tuning,columns=col_names))

Test accuracy first increases with depth and then starts to decline. Precision declines as the tree gets deeper, while recall first increases and then starts to fall.
At max_depth=4, the tree produces good scores and a fairly high recall before recall starts declining. These values depend on the random train/test split, since no random_state was set.

### The Interpretation of Churn Drivers

In [None]:
mytree =DecisionTreeClassifier(max_depth=4)
treemodel=mytree.fit(train_X,train_Y)
cols=train_X.columns

####  Displaying Decision Tree

In [None]:
import graphviz
exported=tree.export_graphviz(
decision_tree=mytree,
out_file=None,
feature_names=cols,
precision=1,
class_names=['Not churn','churn'],
filled=True)
graph=graphviz.Source(exported)
display(graph)
graph.format = 'png'
# Save the tree as dtree_render.png (view=True would also try to open a desktop viewer)
graph.render('dtree_render')

As you can see, customer tenure is the most important variable for predicting churn. Note that mytree was refit with max_depth=4, the depth selected in the tuning step above.
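Besides reading the plot, a fitted `DecisionTreeClassifier` exposes impurity-based importances in its `feature_importances_` attribute. The helper below (hypothetical name `ranked_importances`) ranks them; it is demonstrated on synthetic data where only `x0` drives the label, so all of the importance should land on `x0`. With this notebook's objects, the same call would be `ranked_importances(mytree, cols).head()`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def ranked_importances(model, feature_names):
    """Return the model's impurity-based feature importances, largest first."""
    return (pd.Series(model.feature_importances_, index=feature_names)
              .sort_values(ascending=False))

# Synthetic data: the label depends only on x0; x1 is pure noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

demo_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_demo, y_demo)
ranked = ranked_importances(demo_tree, ['x0', 'x1'])
print(ranked)
```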

### Logreg coefficients

In [None]:
logreg.coef_

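The beta coefficients are hard to interpret because they are on the log-odds scale. The solution is to exponentiate them: exp(beta) is the factor by which the odds of churn are multiplied for a one-unit increase in the feature.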

In [None]:
# Pair each feature name with its (L1-regularized) coefficient
coeffs=pd.DataFrame({'Feature':train_X.columns,
                     'Coefficient':logreg.coef_[0]})
# exp(coefficient) is the multiplicative change in the odds of churn per unit increase
coeffs['Exp_Coefficient']=np.exp(coeffs['Coefficient'])
# Keep only the features that L1 did not shrink to zero
coeffs=coeffs[coeffs['Coefficient']!=0]
coeffs.sort_values(by=['Coefficient'])

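This is consistent with our findings from the decision tree. In addition, we can interpret the effect on the odds as follows:
because tenure was standardized, a one-unit increase means one standard deviation more tenure. That increase multiplies the odds of churn by 0.429, i.e. decreases them by 1 - 0.429 = 0.571, which is roughly a 57% decrease in the churn odds.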
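The conversion from a coefficient to a percentage change in the odds is (exp(beta) - 1) × 100. A small sketch (hypothetical helper name) that reproduces the 0.429 multiplier quoted above; since tenure was standardized, "one unit" is one standard deviation of tenure.

```python
import math

def pct_change_in_odds(beta):
    """Percent change in the odds for a one-unit increase in a feature with coefficient beta."""
    return (math.exp(beta) - 1) * 100

# An odds multiplier of exp(beta) = 0.429 corresponds to beta = ln(0.429)
beta = math.log(0.429)
print(round(beta, 3))                      # -0.846
print(round(pct_change_in_odds(beta), 1))  # -57.1
```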

# Customer Lifetime Value (CLV)
CLV measures how much a company expects to earn from an average customer over that customer's lifetime.


### a) Historical CLV
Historical CLV does not account for:
- Customer tenure
- Retention
- Churn Rates

### b) Traditional CLV
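Churn rate = 1 - retention rate.
The retention-to-churn ratio (retention rate / churn rate) gives a multiplier that represents the expected length of the customer lifespan (loyalty): traditional CLV = average monthly revenue × retention rate / churn rate.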
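The two revenue-based formulas computed later in the notebook can be written as small functions (names hypothetical) and checked on made-up numbers. With 80% monthly retention the churn rate is 20%, so the retention-to-churn multiplier is 0.8 / 0.2 = 4.

```python
def basic_clv(monthly_revenue, lifespan_months):
    """Basic CLV: average monthly revenue times an assumed lifespan in months."""
    return monthly_revenue * lifespan_months

def traditional_clv(monthly_revenue, retention_rate):
    """Traditional CLV: monthly revenue times the retention-to-churn multiplier."""
    churn_rate = 1 - retention_rate
    return monthly_revenue * retention_rate / churn_rate

# Hypothetical inputs: 100 per month, 80% monthly retention, 36-month lifespan
print(basic_clv(100, 36))                   # 3600
print(round(traditional_clv(100, 0.8), 2))  # 400.0
```

The traditional formula replaces the fixed lifespan assumption with one derived from retention, which is why it drops sharply when retention is low.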

#### Standard Transactional Dataset - Online Retail

In [None]:
online_retail=pd.read_csv('../input/online-retail-ii-uci/online_retail_II.csv',encoding = 'unicode_escape')
online_retail['InvoiceDate'] = pd.to_datetime(online_retail['InvoiceDate']).dt.date
online_retail.head()

#### Time Based Cohort Dataset - Derived from Online Retail
This dataset is created by assigning each customer to a monthly cohort, based on the month in which they made their first purchase. We'll use these cohorts to calculate retention rates.

In [None]:
import datetime as dt
def get_month(x): return dt.datetime(x.year,x.month,1)
online_retail['InvoiceMonth']=online_retail['InvoiceDate'].apply(get_month)
group=online_retail.groupby('Customer ID')['InvoiceMonth']
online_retail['CohortMonth']=group.transform('min')

We have assigned the acquisition month cohort to each customer.

#### Extracting year, month and day integer values

In [None]:
def get_date_int(df, column):
    year=df[column].dt.year
    month=df[column].dt.month
    day=df[column].dt.day
    return year, month, day

In [None]:
invoice_year,invoice_month,_=get_date_int(online_retail,'InvoiceMonth')
cohort_year,cohort_month,_=get_date_int(online_retail,'CohortMonth')
years_diff=invoice_year-cohort_year
months_diff=invoice_month-cohort_month
online_retail['CohortIndex']=years_diff*12+months_diff+1
online_retail.head()
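`CohortIndex` counts months since acquisition, with the acquisition month itself as 1. The year difference contributes 12 months per year, and the month difference can be negative across a year boundary, which the formula absorbs. A scalar sketch with a hypothetical helper name:

```python
import pandas as pd

def cohort_index(invoice_month, cohort_month):
    """Months since acquisition, counting the acquisition month as 1."""
    years_diff = invoice_month.year - cohort_month.year
    months_diff = invoice_month.month - cohort_month.month
    return years_diff * 12 + months_diff + 1

# Hypothetical customer acquired in December 2010
print(cohort_index(pd.Timestamp('2010-12-01'), pd.Timestamp('2010-12-01')))  # 1
print(cohort_index(pd.Timestamp('2011-02-01'), pd.Timestamp('2010-12-01')))  # 3
```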

#### Count monthly active customers from each cohort

In [None]:
group=online_retail.groupby(['CohortMonth','CohortIndex'])
#Count the number of customers in each group by applying pandas nunique().
cohort_data=group['Customer ID'].apply(pd.Series.nunique)
cohort_data=cohort_data.reset_index()
cohort_counts=cohort_data.pivot(index='CohortMonth',columns='CohortIndex',values='Customer ID')
cohort_counts

#### Customer Retention Rate

In [None]:
cohort_sizes=cohort_counts.iloc[:,0]
# Divide each row (cohort) by its cohort size; axis=0 aligns the sizes with the row index
retention=cohort_counts.divide(cohort_sizes,axis=0)
retention.round(3)*100
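The retention step on a hypothetical two-cohort table: the first column holds each cohort's size, and `divide(..., axis=0)` divides every row by its own cohort size, so the first column becomes 1.0 (100%) by construction. Cohorts acquired later have fewer observed months, which shows up as NaN.

```python
import pandas as pd

# Hypothetical active-customer counts: rows = cohorts, columns = CohortIndex
cohort_counts_demo = pd.DataFrame(
    {1: [100, 80], 2: [40, 24], 3: [30, None]},
    index=pd.to_datetime(['2010-12-01', '2011-01-01']),
)

# Divide every row by its cohort size (first column); axis=0 aligns on the row index
cohort_sizes_demo = cohort_counts_demo.iloc[:, 0]
retention_demo = cohort_counts_demo.divide(cohort_sizes_demo, axis=0)
print(retention_demo)
```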

#### Other Metrics

In [None]:
grouping=online_retail.groupby(['CohortMonth','CohortIndex'])
cohort_data=grouping['Quantity'].mean()
cohort_data=cohort_data.reset_index()
average_quantity=cohort_data.pivot(index='CohortMonth',
                                  columns='CohortIndex',
                                  values='Quantity')

In [None]:
average_quantity.round(1)

#### Visualizing cohort analysis

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
plt.title('Retention Rates')
sns.heatmap(data=retention,
           annot=True,
           fmt='.0%',
           vmin=0.0,
           vmax=0.5,
           cmap='BuGn')
plt.show()

First-month retention is 100% because this is the month in which the customers first started buying. By definition, all customers in a cohort are active in their first month.

####  Churn analysis from cohort & visualization 

In [None]:
churn=1-retention
churn

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
plt.title('Churn')
sns.heatmap(data=churn,
           annot=True,
           fmt='.0%',
           vmin=0.0,
           vmax=0.5,
           cmap='YlGn')
plt.show()

In [None]:
retention_rate=retention.iloc[:,1:].mean().mean()
churn_rate=churn.iloc[:,1:].mean().mean()
print('Retention rate: {:.2f}; Churn rate: {:.2f}'.format(retention_rate,churn_rate))

## Calculating Revenue Based (Traditional) CLV


In [None]:
online_retail['TotalSum']=online_retail['Price']*online_retail['Quantity']
monthly_revenue=online_retail.groupby(['Customer ID','InvoiceMonth'])['TotalSum'].sum()
monthly_revenue

In [None]:
monthly_revenue=np.mean(monthly_revenue)
monthly_revenue

In [None]:
lifespan_months=36
clv=monthly_revenue*lifespan_months
print('Average basic CLV is {:.1f} GBP'.format(clv))

In [None]:
# Calculate average revenue per invoice
revenue_per_purchase = online_retail.groupby(['Invoice'])['TotalSum'].mean().mean()

# Calculate average number of unique invoices per customer per month
frequency_per_month = online_retail.groupby(['Customer ID','InvoiceMonth'])['Invoice'].nunique().mean()

# Define lifespan to 36 months
lifespan_months = 36

# Calculate granular CLV
clv_granular = revenue_per_purchase * frequency_per_month * lifespan_months

# Print granular CLV value
print('Average granular CLV is {:.1f} GBP'.format(clv_granular))

In [None]:
# Calculate monthly spend per customer
monthly_revenue = online_retail.groupby(['Customer ID','InvoiceMonth'])['TotalSum'].sum().mean()

# Calculate average monthly retention rate
retention_rate = retention.iloc[:,1:].mean().mean()

# Calculate average monthly churn rate
churn_rate = 1 - retention_rate

# Calculate traditional CLV 
clv_traditional = monthly_revenue * (retention_rate / churn_rate)

# Print traditional CLV and the retention rate values
print('Average traditional CLV is {:.1f} GBP at {:.1f} % retention_rate'.format(clv_traditional, retention_rate*100))

The traditional CLV formula yields a much lower estimate because it accounts for monthly retention, which is quite low for this company.

# Purchase Prediction

In [None]:
online_retail.groupby(['InvoiceMonth']).size()

In [None]:
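online_retail['InvoiceDate'] = pd.to_datetime(online_retail['InvoiceDate'])
# Snapshot date: features use only transactions before November 2011,
# so the target month (and anything after it) cannot leak into the features
Now=dt.datetime(2011,11,1)
online_X=online_retail[online_retail['InvoiceMonth']<Now]
# Keep only customers with pre-November history so features and target cover the same customers
online_retail=online_retail[online_retail['Customer ID'].isin(online_X['Customer ID'].dropna().unique())]
features=online_X.groupby('Customer ID').agg({
    'InvoiceDate':lambda x: (Now-x.max()).days,
    'Invoice':'nunique',
    'TotalSum':'sum',
    'Quantity':['mean','sum'],
    }).reset_index()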

- We first calculate recency as the difference in days between the snapshot date defined above (`Now`) and the customer's latest invoice date.
- Then we calculate frequency by counting the customer's unique invoices.
- We sum the revenue spent by the customer to get the monetary value.
- In the final step, we calculate both the average and the total quantity purchased by the customer.
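The same recency, frequency and monetary (RFM) aggregation on a hypothetical toy table, written with pandas named aggregation so the output columns get their final names directly instead of being renamed afterwards:

```python
import pandas as pd

# Hypothetical transactions (one row per invoice line)
tx = pd.DataFrame({
    'Customer ID': [1, 1, 2, 2, 2],
    'Invoice':     ['A1', 'A2', 'B1', 'B1', 'B2'],
    'InvoiceDate': pd.to_datetime(['2011-10-01', '2011-10-25',
                                   '2011-09-01', '2011-09-01', '2011-10-30']),
    'TotalSum':    [10.0, 20.0, 5.0, 5.0, 15.0],
    'Quantity':    [1, 2, 1, 1, 3],
})
snapshot = pd.Timestamp('2011-11-01')

rfm = tx.groupby('Customer ID').agg(
    Recency=('InvoiceDate', lambda x: (snapshot - x.max()).days),
    Frequency=('Invoice', 'nunique'),
    Monetary=('TotalSum', 'sum'),
    Quantity_Avg=('Quantity', 'mean'),
    Quantity_Sum=('Quantity', 'sum'),
)
print(rfm)
```

Customer 1 last bought on 2011-10-25, 7 days before the snapshot; customer 2 has three invoice lines but only two unique invoices, so its frequency is 2.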

In [None]:
features.columns=['CustomerID','Recency','Frequency','Monetary','Quantity_Avg','Quantity_Sum']

In [None]:
features.head()

In [None]:
features.shape

### Calculate Target Variable

In [None]:
# Build pivot table with monthly transactions per customer
cust_month_tx=pd.pivot_table(data=online_retail,index=['Customer ID'],
                            values='Invoice',
                            columns=['InvoiceMonth'],
                            aggfunc=pd.Series.nunique,fill_value=0)

In [None]:
cust_month_tx.head()

In [None]:
cust_month_tx.shape

In [None]:
custid=['Customer ID']
target=['2011-11-01']
Y=cust_month_tx['2011-11-01']
cols=[col for col in features.columns if col not in custid]

In [None]:
X=features[cols]
X.head()

In [None]:
X.shape

In [None]:
Y.head()

In [None]:
Y.shape

In [None]:
from sklearn.model_selection import train_test_split
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.25, random_state=99)

### Regression Performance Metrics
- ###### Root Mean Squared Error (RMSE)
  Square root of the average squared difference between predictions and actuals.
- ###### Mean Absolute Error (MAE)
  Average of the absolute differences between predictions and actuals.
- ###### Mean Absolute Percentage Error (MAPE)
  Average of the absolute percentage differences between predictions and actuals. Actuals must be nonzero.
- ###### R-squared (R^2)
  Proportion of the variance in the target that is explained by the model. Only applicable to regression, not classification. The higher the R^2 value, the better the model explains the variance.
- ###### Coefficient p-values
  Probability of observing a coefficient at least this far from zero purely by chance if the true coefficient were zero. Lower is better. Typical thresholds are 5% and 10%.
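The first four metrics follow directly from their definitions. A NumPy sketch (the function name `regression_metrics` is hypothetical) on made-up numbers; note that RMSE is always at least as large as MAE, because squaring gives large errors more weight.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """RMSE, MAE, MAPE (%) and R^2 computed from their definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    mape = np.mean(np.abs(errors / actual)) * 100  # undefined if any actual is 0
    r2 = 1 - np.sum(errors ** 2) / np.sum((actual - actual.mean()) ** 2)
    return rmse, mae, mape, r2

rmse, mae, mape, r2 = regression_metrics([2, 4, 6, 8], [3, 4, 5, 8])
print(rmse, mae, mape, r2)
```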

In [None]:
from sklearn.linear_model import LinearRegression
linreg=LinearRegression()
linreg.fit(train_X,train_Y)

In [None]:
train_pred_Y=linreg.predict(train_X)
test_pred_Y=linreg.predict(test_X)

### Measuring model performance

In [None]:
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
rmse_train=np.sqrt(mean_squared_error(train_Y,train_pred_Y))
# MAE is already in the target's units, so no square root is taken
mae_train=mean_absolute_error(train_Y,train_pred_Y)
rmse_test=np.sqrt(mean_squared_error(test_Y,test_pred_Y))
mae_test=mean_absolute_error(test_Y,test_pred_Y)
print('RMSE train: {:.3f}; RMSE test: {:.3f}\nMAE train: {:.3f}, MAE test: {:.3f}'.format(rmse_train,rmse_test,mae_train,mae_test))

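MAE is smaller than RMSE because RMSE squares the errors before averaging, which gives large errors (outliers) more weight; MAE is less sensitive to outliers. MAE is measured in transactions: it tells us by how many November transactions, on average, the model's prediction for a customer differs from that customer's actual November transactions.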

### Interpret coefficients

In [None]:
import statsmodels.api as sm
train_Y=np.array(train_Y)
olsreg=sm.OLS(train_Y,train_X)
olsreg=olsreg.fit()
print(olsreg.summary())

An R-squared of 0.609 means the model explains about 61% of the variance (because no constant was added to the OLS model, statsmodels reports the uncentered R-squared). The coefficient of frequency is 0.12: a customer with one more invoice in the pre-November period is expected to have 0.12 more invoices in November on average, holding the other features constant. At a 95% confidence level (a 5% significance level), we interpret only the coefficients with p-values less than or equal to 5%. Four coefficients meet this threshold: recency, frequency, monetary and total quantity.

REFERENCES: Datacamp, Market Analytics with Python