# Telecom Company Customer Churn<a id='top'></a>
## Association Rule Modeling

#### October 9, 2020

## Contents
1. <a href=#object>Objective</a>
1. <a href=#data>Data exploration</a>
1. <a href=#model> Unsupervised Learning Modeling</a>
1. <a href=#bs> Business Strategy</a>
1. <a href=#conclusion>Conclusion</a>


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale
import seaborn as sns
from mlxtend.frequent_patterns import apriori,association_rules

<a id='object'></a>
## 1. Objective
<a href=#top>(back to top)</a>

The objective of this Python code is to apply unsupervised learning to customer churn, using association rules to find the underlying relationships between online services (streaming TV etc.) and to develop new strategies for addressing customer churn.

<a id='data'></a>
## 2. Data exploration
<a href=#top>(back to top)</a>

The dataset for this project was downloaded from Kaggle.com, a machine learning and data analytics competition platform. It contains 21 columns and 7043 samples.

Attribute Information:
1. Customer ID
2. gender: Whether the customer is a male or a female
3. SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
4. Partner: Whether the customer has a partner or not (Yes, No)
5. Dependents: Whether the customer has dependents or not (Yes, No)
6. tenure: Number of months the customer has stayed with the company
7. **PhoneService**: Whether the customer has a phone service or not (Yes, No)
8. MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
9. InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
10. **OnlineSecurity**: Whether the customer has online security or not (Yes, No, No internet service)
11. **OnlineBackup**: Whether the customer has online backup or not (Yes, No, No internet service)
12. **DeviceProtection**: Whether the customer has device protection or not (Yes, No, No internet service)
13. **TechSupport**: Whether the customer has tech support or not (Yes, No, No internet service)
14. **StreamingTV**: Whether the customer has streaming TV or not (Yes, No, No internet service)
15. **StreamingMovies**: Whether the customer has streaming movies or not (Yes, No, No internet service)
16. Contract: The contract term of the customer (Month-to-month, One year, Two year)
17. PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
18. PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
19. MonthlyCharges: The amount charged to the customer monthly
20. TotalCharges: The total amount charged to the customer
21. Churn: Whether the customer churned or not (Yes or No)

***Notice:***

Services in **bold** are the ones we focus on, for two reasons. First, some service statuses depend on others: the status of PhoneService affects the status of MultipleLines, and the status of InternetService determines the status of the remaining online services such as StreamingTV. Second, only features with a binary status (True or False) can be used in **Association Rule** mining.

### 2.1 Loading data

In [None]:
# loading the data 
Telco_customer_churn = pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')
Telco_customer_churn.head()

### 2.2 Preprocessing data

#### check for missing value

In [None]:
Telco_customer_churn.info()
# take a look of the dataframe information

In [None]:
Telco_customer_churn.replace(to_replace=r'^\s*$', value=np.nan, regex=True, inplace=True)
# replace any cell that is empty or contains only whitespace with NaN
Telco_customer_churn.isnull().any()
# check whether each column contains a missing value

In [None]:
Telco_customer_churn.isnull().sum()
# check number of missing value

In [None]:
# find all the rows that contain at least one NaN value
Telco_customer_churn[Telco_customer_churn.isnull().any(axis=1)]

Closer inspection reveals that ***those blank cells are caused by tenure = 0,*** which means the customers were probably new arrivals at that time. Therefore, we fill the blank cells with 0.

In [None]:
Telco_customer_churn=Telco_customer_churn.fillna(0)
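
The claim that blank cells coincide with tenure = 0 can be checked directly rather than by eye. The sketch below does so on a small made-up table (the `toy` frame and its values are hypothetical, not rows from the Telco dataset); the same two checks could be run on the real frame before calling `fillna(0)`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Telco table: blank TotalCharges only for tenure 0.
toy = pd.DataFrame({
    "tenure": [0, 12, 0, 5],
    "TotalCharges": [" ", "350.5", " ", "99.9"],
})

# Same whitespace-to-NaN conversion as in the preprocessing cell above.
toy = toy.replace(r"^\s*$", np.nan, regex=True)

blank_rows = toy["TotalCharges"].isnull()

# Every blank TotalCharges row has tenure 0 ...
all_blank_are_new = bool((toy.loc[blank_rows, "tenure"] == 0).all())
# ... and every tenure-0 row has a blank TotalCharges.
all_new_are_blank = bool(blank_rows[toy["tenure"] == 0].all())

print(all_blank_are_new, all_new_are_blank)
```

If either check came back `False` on the real data, filling with 0 would need a second look.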

#### Convert TotalCharges column into float type

We convert this `object` column to a numeric type because the `onehot encoding` method we use later converts every object column into one-hot codes. Each distinct value in the TotalCharges column would then be represented by a large number of 0s and 1s, which is definitely not what we want.

In [None]:
Telco_customer_churn['TotalCharges'] = pd.to_numeric(Telco_customer_churn['TotalCharges']) 
#convert datatype in TotalCharges from 'object' to numeric, preventing TotalCharges from onehot encoding

### 2.3 Data visualization

In [None]:
fig, ax=plt.subplots(5,4,figsize=(20,28))
sns.countplot(x="gender", data=Telco_customer_churn,ax=ax[0,0])
sns.countplot(x="SeniorCitizen", data=Telco_customer_churn,ax=ax[0,1])
sns.countplot(x="Partner", data=Telco_customer_churn,ax=ax[0,2])
sns.countplot(x="Dependents", data=Telco_customer_churn,ax=ax[0,3])
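# histplot replaces the deprecated distplot; kde=True and stat='density' keep a similar look
sns.histplot(Telco_customer_churn['tenure'],bins=10,kde=True,stat='density',ax=ax[1,0]).set_xlabel('tenure distribution')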
sns.countplot(x="PhoneService", data=Telco_customer_churn,ax=ax[1,1])
sns.countplot(x="MultipleLines", data=Telco_customer_churn,ax=ax[1,2])
sns.countplot(x="InternetService", data=Telco_customer_churn,ax=ax[1,3])
sns.countplot(x="OnlineSecurity", data=Telco_customer_churn,ax=ax[2,0])
sns.countplot(x="OnlineBackup", data=Telco_customer_churn,ax=ax[2,1])
sns.countplot(x="DeviceProtection", data=Telco_customer_churn,ax=ax[2,2])
sns.countplot(x="TechSupport", data=Telco_customer_churn,ax=ax[2,3])
sns.countplot(x="StreamingTV", data=Telco_customer_churn,ax=ax[3,0])
sns.countplot(x="StreamingMovies", data=Telco_customer_churn,ax=ax[3,1])
sns.countplot(x="Contract", data=Telco_customer_churn,ax=ax[3,2])
sns.countplot(x="PaperlessBilling", data=Telco_customer_churn,ax=ax[3,3])
sns.countplot(x="PaymentMethod", data=Telco_customer_churn,ax=ax[4,0])
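# histplot replaces the deprecated distplot; kde=True and stat='density' keep a similar look
sns.histplot(Telco_customer_churn['MonthlyCharges'],bins=10,kde=True,stat='density',ax=ax[4,1]).set_xlabel('MonthlyCharges distribution')
sns.histplot(Telco_customer_churn['TotalCharges'],bins=10,kde=True,stat='density',ax=ax[4,2]).set_xlabel('TotalCharges distribution')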
sns.countplot(x="Churn", data=Telco_customer_churn,ax=ax[4,3])

<a id='model'></a>
## 3. Unsupervised Learning (Association Rule)
<a href=#top>(back to top)</a>

In addition to building supervised learning models that deal with a clear target, we also apply unsupervised learning to discover underlying relationships between features.

### Association Analysis

While fewer people use unsupervised learning to explore this dataset, some researchers still apply `Association Rule` mining to find relationships between churn and other features. Specifically, given the consequent set churn = 0, they look for condition sets drawn from the remaining features that have high support, confidence and lift.
In our project the interest is more specific: our goal is to find the underlying relationships among the service statuses. In other words, we want robust association rules among services, so that when customers already have certain services, we know which other services to recommend with the best chance of success.
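
To make the three metrics concrete, here is a minimal sketch that computes support, confidence and lift by hand for one rule. The `customers` list is made-up toy data and the helper names `support` and `rule_metrics` are illustrative, not part of `mlxtend`.

```python
# Toy service table (hypothetical data, not taken from the Telco dataset).
# Each set lists the services one customer has.
customers = [
    {"StreamingTV", "StreamingMovies"},
    {"StreamingTV", "StreamingMovies", "DeviceProtection"},
    {"StreamingTV"},
    {"DeviceProtection"},
    {"StreamingMovies", "DeviceProtection"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def rule_metrics(antecedent, consequent, rows):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), rows)
    s_ante = support(antecedent, rows)
    s_cons = support(consequent, rows)
    confidence = s_both / s_ante   # P(consequent | antecedent)
    lift = confidence / s_cons     # how much the antecedent raises that probability
    return s_both, confidence, lift


sup, conf, lift = rule_metrics({"StreamingTV"}, {"StreamingMovies"}, customers)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the antecedent makes the consequent more likely than its baseline share; a lift of exactly 1 would mean the two are independent.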

### Preprocessing Data

Because the MultipleLines status depends on PhoneService, and the InternetService status determines the status of the remaining online services, we restrict the analysis to customers who already have internet service. We focus on PhoneService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV and StreamingMovies.

In [None]:
#select rows that all have internet service.
Telco_customer_churn_with_InternetService = Telco_customer_churn[Telco_customer_churn.InternetService!='No']

#extract features we want.
association_rule_features = Telco_customer_churn_with_InternetService[['PhoneService', 'OnlineSecurity', 'OnlineBackup','DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies']]

#convert Yes/No to booleans; the comparison returns a new boolean DataFrame
#instead of modifying a slice in place (avoids SettingWithCopyWarning)
association_rule_features = association_rule_features == 'Yes'
association_rule_features

### Modeling

In [None]:
# find all itemsets whose support is at least 0.2, with at most 4 items per itemset
ap_out=apriori(association_rule_features,min_support=0.2,use_colnames=True,max_len=4,verbose=True)
ap_out.head()
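
To show what the `min_support` and `max_len` arguments mean, the sketch below enumerates frequent itemsets by brute force on a made-up table. It is a plain enumeration with `itertools.combinations`, not the Apriori algorithm itself (Apriori prunes candidates instead of testing every combination), and the `rows` data and `frequent_itemsets` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical customers, each described by the set of services they have.
rows = [
    {"PhoneService", "StreamingTV", "StreamingMovies"},
    {"PhoneService", "StreamingTV"},
    {"PhoneService", "StreamingMovies", "StreamingTV"},
    {"PhoneService"},
    {"StreamingMovies"},
]


def frequent_itemsets(rows, min_support, max_len):
    """Map each itemset of size <= max_len with support >= min_support to its support."""
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            share = sum(set(combo) <= row for row in rows) / len(rows)
            if share >= min_support:
                result[frozenset(combo)] = share
    return result


found = frequent_itemsets(rows, min_support=0.5, max_len=2)
for itemset, share in found.items():
    print(sorted(itemset), share)
```

Raising `min_support` shrinks the result to the most common bundles; raising `max_len` allows larger bundles to appear.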

In [None]:
# keep the association rules whose confidence is at least 0.7
ar = association_rules(ap_out, metric='confidence',min_threshold=0.7).round(2)
ar.head()

In [None]:
# keep rules with lift greater than 1.2 and at most two antecedent items, sorted by lift in descending order
ar[(ar.lift>1.2)&(ar.antecedents.apply(len)<=2)].sort_values(by='lift',ascending=False).round(2)

From the table above we can see the underlying relationship between StreamingTV and StreamingMovies:

1. Customers who have device protection and streaming TV are very likely to also have streaming movies, with support = 0.23, confidence = 0.80, lift = 1.62;

2. Customers who have device protection and streaming TV are also very likely to have phone service and streaming movies together, with support = 0.20, confidence = 0.71, lift = 1.61;

3. Customers who have device protection and streaming movies are very likely to also have streaming TV, with support = 0.23, confidence = 0.79, lift = 1.60;

4. Customers who have phone service and streaming movies are very likely to also have streaming TV, with support = 0.32, confidence = 0.71, lift = 1.46;

5. Customers who have phone service and streaming TV are very likely to also have streaming movies, with support = 0.32, confidence = 0.72, lift = 1.45;

6. Customers who have streaming movies are very likely to also have streaming TV, with support = 0.35, confidence = 0.72, lift = 1.45, and vice versa.
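
Because lift is defined as confidence divided by the support of the consequent, the reported numbers can be sanity-checked: dividing confidence by lift recovers the baseline share of customers who have the consequent service. The sketch below does this for rules 1, 3 and 6 using only the rounded values listed above, so the results are approximate.

```python
# (confidence, lift) pairs as reported in rules 1, 3 and 6 above.
reported = {
    "DeviceProtection + StreamingTV -> StreamingMovies": (0.80, 1.62),
    "DeviceProtection + StreamingMovies -> StreamingTV": (0.79, 1.60),
    "StreamingMovies -> StreamingTV": (0.72, 1.45),
}


def implied_consequent_support(confidence, lift):
    """lift = confidence / support(consequent), so support(consequent) = confidence / lift."""
    return confidence / lift


implied = {
    rule: implied_consequent_support(confidence, lift)
    for rule, (confidence, lift) in reported.items()
}

for rule, share in implied.items():
    print(f"{rule}: implied consequent support ~ {share:.2f}")
```

All three rules imply a consequent support close to one half, which is consistent with each streaming service being held by roughly half of the internet customers in the filtered data.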

<a id='bs'></a>
## 4. Business Strategy
<a href=#top>(back to top)</a>

Since our company's aim is to keep customers with us, and therefore to provide them with as many bundled services as we can, there are a few strategies we could consider:

1. Compared with the other rows, rules whose antecedents include device protection have a lift of 1.60 or higher (index 17 vs. index 6). Convince customers to add the Device Protection service after they buy a streaming service by highlighting the importance of privacy issues. Moving customers into these higher-lift antecedents should substantially raise the probability that they buy the consequent services.

2. Confidence and lift are high among these rules, which indicates that the services are tightly bound together, so the important thing is to raise the support of the rules, that is, to make these combinations occur more often. To do that, we recommend offering attractive packages (TV and movie streaming, phone service, device protection) at an appealing **discount** to increase sales.

#### Here is a theoretical example of finding an optimal discount percentage.
To illustrate how a discount percentage could be determined, we calculate how much discount to give customers who have one streaming service (TV or Movies) to entice them to buy the other. Our goal is to find the discount percentage that brings the maximum profit to the telecom company.

**Analyzed Data**

Selected data pertaining to Streaming Movies and Streaming TV:
1. Average monthly charge for customers with both Movies and TV = 93.24
2. Number of customers with both Streaming Movies and TV = 1940
3. Average monthly charge for customers with only one service = 77.08
4. Number of customers with only one service = 1559


The difference between the two monthly charges, 16.16, is the marginal revenue gained when a customer who has only one service opens the other.

In [None]:
#find MonthlyCharges for customers who have both StreamingTV and StreamingMovies
MC_TV_with_Movies=Telco_customer_churn[(Telco_customer_churn.StreamingMovies=='Yes')&(Telco_customer_churn.StreamingTV=='Yes')][['MonthlyCharges']]
MC_TV_with_Movies.head()

In [None]:
number_two_services=MC_TV_with_Movies.count().MonthlyCharges
mean_TV_with_Movies=MC_TV_with_Movies.mean().MonthlyCharges    
print('There are ',number_two_services,'customers who open two streaming service both with average monthly charge ',mean_TV_with_Movies)

In [None]:
#find MonthlyCharges for customers who have StreamingMovies but no StreamingTV
MC_Movies_no_TV=Telco_customer_churn[(Telco_customer_churn.StreamingMovies=='Yes')&(Telco_customer_churn.StreamingTV=='No')][['MonthlyCharges']]
#find MonthlyCharges for customers who have StreamingTV but no StreamingMovies
MC_TV_no_Movies=Telco_customer_churn[(Telco_customer_churn.StreamingMovies=='No')&(Telco_customer_churn.StreamingTV=='Yes')][['MonthlyCharges']]

In [None]:
MC_Movies_no_TV.head()

In [None]:
#number of customers who have streaming movies but not streaming TV
number_Movies_no_TV=MC_Movies_no_TV.count()
number_Movies_no_TV
#average monthly charge of customers who have only streaming movies
mean_Movies_no_TV=MC_Movies_no_TV.mean()
mean_Movies_no_TV

In [None]:
MC_TV_no_Movies.head()

In [None]:
#number of customers who have streaming TV but not streaming movies
number_TV_no_Movies=MC_TV_no_Movies.count()
number_TV_no_Movies
#average monthly charge of customers who have only streaming TV
mean_TV_no_Movies=MC_TV_no_Movies.mean()
mean_TV_no_Movies

In [None]:
# combine all the customers who have only one streaming service
number_one_service=(number_TV_no_Movies+number_Movies_no_TV).MonthlyCharges
# customer-weighted average of the two single-service group means
mean_one_service=((mean_Movies_no_TV*number_Movies_no_TV+mean_TV_no_Movies*number_TV_no_Movies)/number_one_service).MonthlyCharges
print('There are ',number_one_service, 'customers in total who only open one streaming service with average monthly charge of ',mean_one_service)

#### Assumptions:
1. Assume a linear relationship between the discount and the number of purchasing customers (the bigger the discount, the more customers purchase).
2. Set the maximum allowed discount to 30%.
3. Assume that half (50%) of the customers with only one streaming service will add the other one when given a 30% discount.

#### Formulation:
1. 0 <= ***discount percentage*** <= 30%
2. ***attractive percentage*** = discount percentage / 60%
3. ***customer cost for opening one more service*** = average monthly charge with two services - average monthly charge with one service
4. ***company gain per customer for opening one more service*** = customer cost × (1 - discount percentage)
5. ***number of customers attracted*** = number of customers with only one service × attractive percentage
6. ***total gain*** = gain per customer × number of customers attracted
7. ***total loss*** = customer cost for opening one more service × discount percentage × number of customers with two services
8. ***marginal revenue*** = total gain - total loss

(The attractive percentage is the share of customers with only one streaming service who are interested in opening the other one at a given discount rate. Dividing by 60% makes a 30% discount attract 50% of them, in line with assumption 3.)
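
The formulation above can be collected into a single function. This is a minimal sketch: the default arguments are the rounded figures from the Analyzed Data list, and the names `marginal_revenue` and `full_uptake_discount` (the 60% divisor) are introduced here for illustration only.

```python
def marginal_revenue(discount, mean_two=93.24, mean_one=77.08,
                     n_two=1940, n_one=1559, full_uptake_discount=0.6):
    """Marginal revenue for a given discount, following formulation steps 1 to 8."""
    if not 0 <= discount <= 0.3:
        raise ValueError("discount must lie between 0 and 0.3")
    # step 3: extra monthly charge for opening the second streaming service
    customer_cost = mean_two - mean_one
    # step 2: share of one-service customers attracted by this discount
    attractive_percent = discount / full_uptake_discount
    # steps 4 to 6: revenue from newly attracted customers
    gain = customer_cost * (1 - discount) * n_one * attractive_percent
    # step 7: discount granted to customers who already have both services
    loss = customer_cost * discount * n_two
    # step 8
    return gain - loss


for d in (0.0, 0.09, 0.12, 0.15, 0.3):
    print(d, round(marginal_revenue(d), 2))
```

Under these assumptions the revenue is zero with no discount, rises to a peak near 12%, and turns negative at the 30% cap because the discount given to existing two-service customers outweighs the gain from new ones.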

In [None]:
# set the range of discount percentage.
discount_percent=np.linspace(0,0.3,11)
discount_percent

In [None]:
# loop over the candidate discounts and print the marginal revenue for each
for i in discount_percent:
    attractive_percent=i/0.6
    loss=(mean_TV_with_Movies-mean_one_service)*i*number_two_services
    gain=(mean_TV_with_Movies-mean_one_service)*(1-i)*number_one_service*attractive_percent
    marginal_revenue=gain-loss
    print('the marginal revenue is: ',marginal_revenue, ' when the discount is ',i)

From the output above, the marginal revenue is highest when the discount is 0.12, so to maximize revenue we should consider a 12% discount: customers who already have one streaming service would get 12% off for opening the other one.
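
Since the marginal revenue in the formulation is a downward-opening parabola in the discount d, its maximum can also be found in closed form. Writing N1 for the number of one-service customers and N2 for the number of two-service customers, setting the derivative of N1 × (d / 0.6) × (1 - d) - N2 × d to zero gives d* = (1 - 0.6 × N2 / N1) / 2; the customer cost is a positive constant factor and drops out. The sketch below evaluates this with the rounded counts from the Analyzed Data list; the function name and parameter names are hypothetical.

```python
def optimal_discount(n_one, n_two, full_uptake_discount=0.6, max_discount=0.3):
    """Discount that maximises marginal revenue under the linear-uptake assumption."""
    # stationary point of n_one * (d / k) * (1 - d) - n_two * d, with k = full_uptake_discount
    unconstrained = 0.5 * (1 - full_uptake_discount * n_two / n_one)
    # the objective is concave, so clipping to the allowed range gives the constrained optimum
    return min(max(unconstrained, 0.0), max_discount)


best = optimal_discount(n_one=1559, n_two=1940)
print(f"optimal discount ~ {best:.4f}")
```

The continuous optimum is slightly above 12%; the loop above reports 0.12 because it is the closest point on the 0.03-step grid.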

<a id='conclusion'></a>
## 5. Conclusion
<a href=#top>(back to top)</a>

1. Offer a discount to existing streaming service customers
For customers with an existing streaming service, offer a 5 to 12% discount (12% is the optimum) to entice them to buy an additional service.

2. Recommend Device Protection to customers with a streaming service
Explain to customers with a streaming service the security benefits of Device Protection, and consider offering a discount on it.
