***
# Telecom-loyal Customer

## Data understanding
***
The Telecom-Customer dataset was selected to identify the factors that affect customer churn and to cluster loyal customers.

Content: Each row represents a customer, and each column contains one of the customer’s attributes, as described in the column metadata.

The data set includes information about:
* Customers who left or stayed with Telco – the column is called Churn
* Services each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
* Customer account information – customer tenure, contract length, payment method, paperless billing option, monthly charges, and total charges
* Demographic info about customers – gender, age, and if they have partners and dependents

In [None]:
import pandas as pd
import numpy as np
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
sns.set(color_codes=True)

In [None]:
df=pd.read_csv('../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head()

In [None]:
df.info()

#### Data preparation:
As the code below shows, `TotalCharges` is first converted to a numeric data type, and the dataset is then checked for missing values. There were 11 missing values.

In [None]:
df.TotalCharges = pd.to_numeric(df.TotalCharges, errors='coerce')
df.isnull().sum()
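
To illustrate why the conversion exposes missing values, here is a minimal sketch on a toy series (the values are made up for illustration and are not taken from the dataset). A blank string cannot be parsed as a number, so `errors='coerce'` turns it into `NaN`:

```python
import pandas as pd

# Toy column mimicking TotalCharges: numbers stored as text, one blank entry
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors='coerce' turns anything unparseable into NaN instead of raising
charges = pd.to_numeric(raw, errors="coerce")

print(charges.dtype)
print(charges.isnull().sum())
```

The blank entry is the only one that becomes missing, which is the pattern `df.isnull().sum()` reports for the real column.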

In [None]:
df.dropna(inplace = True)

In [None]:
# Drop the first column (the customer ID); copy so later edits do not touch a view of df
df1 = df.iloc[:,1:].copy()

# Encode the target as binary: Yes -> 1, No -> 0
df1['Churn'] = df1['Churn'].map({'Yes': 1, 'No': 0})

# One-hot encode the remaining categorical columns
df_dummies = pd.get_dummies(df1)
df_dummies.head()
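
As a small sketch of what this encoding step produces, the toy frame below (hypothetical values, with only three columns) goes through the same two operations: the target is mapped to 0/1 and the remaining text column is expanded into one indicator column per category.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "One year"],
    "tenure": [2, 60, 14],
    "Churn": ["Yes", "No", "No"],
})

# Binary target: Yes -> 1, No -> 0
toy["Churn"] = toy["Churn"].map({"Yes": 1, "No": 0})

# Each remaining text column becomes one indicator column per category
toy_dummies = pd.get_dummies(toy)
print(list(toy_dummies.columns))
```

Column names such as `Contract_Two year`, which are used later in the notebook, come from this `<column>_<category>` naming pattern.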


In [None]:
# Statistical summary and variable relationships
df_dummies.describe()

In [None]:
# Correlation with Churn at a glance:
df_corr= df_dummies.iloc[:,0:]

plt.figure(figsize=(20,5))
df_corr.corr()['Churn'].sort_values(ascending = False).plot(kind='bar')

The figure shows:
* The correlations of gender, SeniorCitizen, Partner and Dependents with Churn are low.
* Month-to-month contracts and the absence of online security and tech support seem to be positively correlated with churn, while tenure and two-year contracts seem to be negatively correlated with churn.
* 'OnlineSecurity', 'TechSupport', 'InternetService_DSL', 'OnlineBackup' and 'DeviceProtection' seem to have a weaker correlation with churn.

# Data Exploration

For data exploration, I first look at the distribution of individual variables and then check their relationship with churn. The variables are considered in two groups: demography and customer account information.

### A. Check Demography of Dataset

#### Distribution of demographic variables (Gender, Senior citizen, Partner and Dependent):

In [None]:

figure, axes = plt.subplots(nrows=1, ncols=4,figsize=(16,4))

total = float(len(df))
# One count plot per demographic column; each bar is labelled with its share of all customers
for col, ax in zip(["gender", "SeniorCitizen", "Partner", "Dependents"], axes):
    sns.countplot(x=col, data=df, palette="Blues_d", ax=ax)
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2.,
                height + 3,
                '{:1.2f}'.format(height/total),
                ha="center")
    ax.set_box_aspect(3/len(ax.patches))

plt.show()
 

Demography Results:
* About half of the customers are male and the other half are female.
* 16% of the customers are senior citizens, so most customers are younger.
* About 50% of the customers have a partner.
* 30% of the customers have dependents.
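
The shares printed above the bars can also be checked numerically. The sketch below uses a toy column (hypothetical values) and `value_counts(normalize=True)`, which returns each category's fraction of the total:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({"Partner": ["Yes", "No", "No", "Yes", "No"]})

# normalize=True returns fractions of the total instead of raw counts
shares = toy["Partner"].value_counts(normalize=True).round(2)
print(shares)
```

Applied to the real `df`, the same call on `gender`, `SeniorCitizen`, `Partner` or `Dependents` should reproduce the labels drawn on the plots.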

#### Relationship between demographic Characteristics and Churn

In [None]:
figure, axes = plt.subplots(nrows=2, ncols=2,figsize=(12,10))

total = float(len(df))
# Churn counts split by each demographic column; labels show each bar's share of all customers
for col, ax in zip(['gender', 'SeniorCitizen', 'Partner', 'Dependents'], axes.flat):
    sns.countplot(x="Churn", data=df, hue=col, palette="Greens_d", ax=ax)
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2.,
                height + 3,
                '{:1.2f}'.format(height/total),
                ha="center")

plt.show()

Observations:
* Both genders seem to churn at a similar rate.
* Senior citizens have a significantly higher churn rate.
* Customers with a partner seem less likely to churn.
* While customers without dependents are less likely to churn in general, they make up the bulk of the customers who have churned.
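
The bar labels in the plots are each bar's share of all customers, which is not the same quantity as the churn rate inside a group. A minimal sketch of computing within-group rates, on toy data with hypothetical values, uses a row-normalised cross-tabulation:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "SeniorCitizen": [0, 0, 0, 0, 1, 1],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No"],
})

# Row-normalised table: each row sums to 1, so the "Yes" column is the
# churn rate within that group
rates = pd.crosstab(toy["SeniorCitizen"], toy["Churn"], normalize="index")
print(rates)
```

Running the same call on the real columns would be one way to verify each observation above directly, rather than reading rates off bar heights.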

### B: Customer Account Information 

#### Churn distribution
The two churn classes are not evenly distributed among Telco customers.

In [None]:
f, ax = plt.subplots(figsize=(5,5))
total = float(len(df)) 
ax = sns.countplot(x="Churn", data=df, palette="Oranges_d")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")

#### Tenure Info:
Tenure ranges from 1 to 72 months.

In [None]:
f, ax = plt.subplots(figsize=(10, 5))
# histplot replaces the deprecated distplot(hist=True, kde=False)
ax = sns.histplot(df['tenure'], bins=36, color='blue', ax=ax)



#### Contract,PaymentMethod,Billing info:
* Most customers are on month-to-month contracts, and the numbers of customers on one-year and two-year contracts are almost equal.
* Electronic check is the most common payment method; the other methods have almost equal numbers of customers.
* 58% of customers use paperless billing, so the remaining 42% receive paper bills.

In [None]:
figure, axes = plt.subplots(nrows=1, ncols=3,figsize=(15,4))

total = float(len(df)) 
ax = sns.countplot(x="Contract", data=df, palette="Blues_d", ax=axes[0])
ax.set_xticklabels(ax.get_xticklabels(),rotation=60)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")
ax.set_box_aspect(3/len(ax.patches))

total = float(len(df)) 
ax = sns.countplot(x="PaymentMethod", data=df, palette="Blues_d", ax=axes[1])
ax.set_xticklabels(ax.get_xticklabels(),rotation=60)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")

    
total = float(len(df)) 
ax = sns.countplot(x="PaperlessBilling", data=df, palette="Blues_d",  ax=axes[2])
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")
ax.set_box_aspect(3/len(ax.patches))
        
plt.show()

#### Services info:
Phone service has the highest number of customers; the other services are in almost the same range as one another.

In [None]:
services = ['PhoneService','MultipleLines','InternetService','OnlineSecurity',
           'OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']
total = float(len(df))
fig, axes = plt.subplots(nrows = 3,ncols = 3,figsize = (20,20))
for i, item in enumerate(services):
    # Fill the 3x3 grid column by column: items 0-2 in column 0, 3-5 in column 1, 6-8 in column 2
    ax = df[item].value_counts().plot(kind = 'bar',ax=axes[i % 3, i // 3],rot = 0)
    # Label each bar with its share of all customers
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2.,
                height + 3,
                '{:1.2f}'.format(height/total),
                ha="center")
    ax.set_title(item)

#### Charges:
A large number of customers have low charges, and total charges tend to increase as a customer's monthly bill increases.

In [None]:
f, ax = plt.subplots(figsize=(10, 5))
# histplot replaces the deprecated distplot(hist=True, kde=False)
ax = sns.histplot(df['MonthlyCharges'], bins=60, color='blue', ax=ax)

In [None]:
f, ax = plt.subplots(figsize=(10, 5))
# histplot replaces the deprecated distplot(hist=True, kde=False)
ax = sns.histplot(df['TotalCharges'], bins=60, color='blue', ax=ax)

#### Relationship Between Churn and Customer info:

In [None]:
figure, axes = plt.subplots(nrows=2, ncols=3,figsize=(20,8))

ax= sns.boxplot(x="Churn", y="tenure", data=df, dodge=False, ax=axes[0,0]);


ax= sns.boxplot(x="Churn", y="TotalCharges", data=df, dodge=False,  ax=axes[0,1]);


ax= sns.boxplot(x="Churn", y="MonthlyCharges", data=df, dodge=False,  ax=axes[0,2]);



total = float(len(df)) 
ax = sns.countplot(x="Contract",hue="Churn", data=df, palette="Greens_d",  ax=axes[1,0])
ax.set_xticklabels(ax.get_xticklabels(),rotation=60)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")

total = float(len(df))
ax = sns.countplot(x="PaymentMethod",hue="Churn", data=df, palette="Greens_d",  ax=axes[1,1])
ax.legend(loc='best',prop={'size':7},title = 'Churn')
ax.set_xticklabels(ax.get_xticklabels(),rotation=60)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")

            
total = float(len(df)) 
ax = sns.countplot(x="PaperlessBilling",hue="Churn", data=df, palette="Greens_d",  ax=axes[1,2])
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center")
ax.set_box_aspect(3/len(ax.patches))


plt.show()

Observations:
* Customers on month-to-month contracts are the most likely to leave.
* Tenure ranges from 1 to 72 months; customers with tenures of more than 30 months seem most likely to remain with the company.
* Customers who opt for paper billing have a lower churn rate.
* Automatic payments (credit card, bank transfer) show lower levels of churn.
* A higher percentage of customers churn when monthly charges are high, while churn is also higher when total charges are low.


## Modeling
### Supervised Learning:
Three classification models were used to predict the likelihood of customer churn and to find the important factors affecting churn:
* Decision Tree
* Random Forest
* Logistic Regression


In [None]:
from scipy import stats, integrate
from sklearn import metrics
from sklearn.model_selection import train_test_split

#### Scaling: 
The df_dummies dataset was then separated into the output variable y and the input variables X. X is then rescaled with `MinMaxScaler` so that each variable lies in the range [0, 1], the same range as the binary y variable. Note that the scaler is fitted on the full dataset, before the train/test split.


In [None]:
y = df_dummies['Churn'].values
x = df_dummies.drop(columns = ['Churn'])

from sklearn.preprocessing import MinMaxScaler
features = x.columns.values
scaler = MinMaxScaler(feature_range = (0,1))
scaler.fit(x)
x = pd.DataFrame(scaler.transform(x))
x.columns = features
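
For reference, min-max scaling maps each column with $(x - \min) / (\max - \min)$. The sketch below, on a small made-up array, checks that `MinMaxScaler` with `feature_range=(0, 1)` gives the same result as applying that formula by hand:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix for illustration only (two columns on different scales)
X = np.array([[1.0, 20.0], [12.0, 50.0], [72.0, 110.0]])

scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Same result by hand: (x - min) / (max - min), column by column
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(np.allclose(scaled, manual))
```

After scaling, every column has minimum 0 and maximum 1, so features measured in months and in currency become directly comparable.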

#### Splitting testing and training the data:
A 20/80 testing and training split was applied.

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=0)

#### A: Decision Trees

In [None]:
from sklearn import tree
dt_model= tree.DecisionTreeClassifier()
dt_model.fit(x_train,y_train)

In [None]:
dt_scored=dt_model.score(x_test,y_test)
dt_scored

In [None]:
dt_predicted=dt_model.predict(x_test)
dt_predicted

In [None]:
importances = dt_model.feature_importances_
weights = pd.Series(importances,
                 index=x.columns.values)
weights.sort_values()[-10:].plot(kind ='barh')

In [None]:
importances = dt_model.feature_importances_
weights = pd.Series(importances,
                 index=x.columns.values)
weights.sort_values()[:10].plot(kind = 'barh')

#### B: Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier
rfc_model = RandomForestClassifier(max_depth=20, random_state=0)
rfc_model.fit(x_train,y_train)

In [None]:
rfc_scored=rfc_model.score(x_test,y_test)
rfc_scored

In [None]:
importances = rfc_model.feature_importances_
weights = pd.Series(importances,
                 index=x.columns.values)
weights.sort_values()[-10:].plot(kind = 'barh')

In [None]:
importances = rfc_model.feature_importances_
weights = pd.Series(importances,
                 index=x.columns.values)
weights.sort_values()[:10].plot(kind = 'barh')

#### C: Logistic Regression

In [None]:
from sklearn.model_selection import train_test_split
# Note: this re-splits the data 70/30, whereas the tree models above used an 80/20 split
x_train, x_test,y_train,y_test=train_test_split(x, y, test_size=0.3, random_state=0)
from sklearn.linear_model import LogisticRegression
logr_model= LogisticRegression()
result = logr_model.fit(x_train, y_train)

In [None]:
logr_score=logr_model.score(x_test,y_test)
logr_score

In [None]:
# Ten largest coefficients (features associated with a higher likelihood of churn)
weights = pd.Series(logr_model.coef_[0],
                 index=x.columns.values)
weights.sort_values(ascending = False)[:10].plot(kind='bar')

In [None]:
# Ten smallest coefficients (features associated with a lower likelihood of churn)
weights = pd.Series(logr_model.coef_[0],
                 index=x.columns.values)
weights.sort_values(ascending = False)[-10:].plot(kind='bar')

Results:
* The Random Forest classifier outperformed the decision tree, with an accuracy score of 79.24 percent at a maximum depth of 20. That said, there is a noticeable increase in false negatives.
* Logistic Regression has an accuracy score of 79.62 percent, the highest of the three models. Note that it was evaluated on a 30% test split, while the two tree-based models were evaluated on a 20% test split, so the scores are not strictly comparable.
* Total charges, month-to-month contracts and tenure are the most important features affecting churn. Accordingly, tenure, total charges and a two-year contract (vs. a month-to-month contract) could be used to recognize loyal customers.
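
Accuracy alone does not show false negatives, so a confusion matrix is one way to count them. The following is a minimal sketch on synthetic, imbalanced data from `make_classification` (an assumption for illustration, not the Telco data); the variable names are hypothetical and the numbers it prints say nothing about the models above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not the Telco dataset)
X, y = make_classification(n_samples=500, weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# With labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print("false negatives (positives predicted as negative):", fn)
```

In the churn setting, a false negative is a customer who churned but was predicted to stay, which is usually the costlier mistake.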

### Unsupervised learning:
#### <div style= "color:green">Loyal customer clustering</div>
In this section, loyal customers are clustered using two methods: K-means and hierarchical clustering.


Research shows that acquiring a new customer costs a company about 5 times more than retaining a current one; thus, loyal customers are gold for any company.

To cluster loyal customers, I drew on the RFM model as follows:

- Recency: unfortunately, the dataset does not include the customers' last contract date, so this attribute cannot be calculated from the Telco dataset.
- Frequency: this attribute can be derived from contract length and tenure.
- MonetaryValue: this kind of loyalty can be measured by total charges.


#### A: Clustering using K-means method
The key steps are as follows:
* Data pre-processing
* Choosing a number of clusters
* Running k-means clustering on pre-processed data 
* Analyzing average of each cluster


##### Data pre-processing is done in the steps below:
* Defining three attributes (Frequency, MonetaryValue and tenure) as loyalty characteristics.
* Removing skew and normalizing

In [None]:
df_dummies['Frequency']= (df_dummies['tenure']//24)*df_dummies['Contract_Two year']
print(df_dummies['Frequency'].max(), df_dummies['Frequency'].min())
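
To make the Frequency definition concrete, here is a worked sketch on a few made-up customers. `tenure // 24` counts completed 24-month blocks, and multiplying by the `Contract_Two year` indicator sets the value to zero for anyone not on a two-year contract:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "tenure": [10, 30, 50, 72, 72],
    "Contract_Two year": [1, 1, 1, 1, 0],
})

# Completed 24-month blocks, zeroed for customers not on a two-year contract
toy["Frequency"] = (toy["tenure"] // 24) * toy["Contract_Two year"]
print(toy["Frequency"].tolist())  # [0, 1, 2, 3, 0]
```

With tenure capped at 72 months, the attribute can only take the values 0 to 3, which is what the printed maximum and minimum in the cell above check.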

In [None]:
df_dummies['MonetaryValue']=df_dummies['TotalCharges']

In [None]:
df_lc= df_dummies.loc[:,['tenure','MonetaryValue','Frequency']]
df_lc.head()

In [None]:
df_lc.describe()

In [None]:
# Log-transform the two skewed columns (Frequency is left as is)
df_lc['MonetaryValue']=np.log(df_lc['MonetaryValue'])
df_lc['tenure']=np.log(df_lc['tenure'])
from sklearn.preprocessing import StandardScaler
# Standardize every column to mean 0 and standard deviation 1
scaler = StandardScaler()
scaler.fit(df_lc)
df_lc_normalized = scaler.transform(df_lc)
print('mean: ', df_lc_normalized.mean(axis=0).round(2))
print('std: ', df_lc_normalized.std(axis=0).round(2))
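
The effect of the two pre-processing steps can be seen on synthetic data. The sketch below draws right-skewed values from a log-normal distribution (an assumption chosen only to stand in for a charges-like column), then applies the log transform and a manual standardization equivalent to what `StandardScaler` does:

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for a charges-like column
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=6.0, sigma=1.0, size=2000))

# Step 1: the log transform pulls in the long right tail
logged = np.log(values)

# Step 2: standardize to mean 0 and (population) standard deviation 1
standardized = (logged - logged.mean()) / logged.std(ddof=0)

print("skew before log:", round(values.skew(), 2))
print("skew after log: ", round(logged.skew(), 2))
```

K-means relies on Euclidean distance, so reducing skew and putting every attribute on the same scale keeps one attribute from dominating the clustering.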

##### Choosing a number of clusters:
To avoid overfitting the data with too many clusters, the elbow method is used to determine the optimal number of clusters.

In [None]:
from sklearn.cluster import KMeans
import seaborn as sns
from matplotlib import pyplot as plt

sse = {}
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=1)
    kmeans.fit(df_lc_normalized)
    sse[k] = kmeans.inertia_

plt.title('The Elbow Method')
plt.xlabel('k'); plt.ylabel('SSE')
sns.pointplot(x=list(sse.keys()), y=list(sse.values()))
plt.show()

Based on the elbow graph, the optimum number of clusters is K=3.
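
The elbow can also be read numerically by looking at how much the SSE falls with each extra cluster. This is a minimal sketch on synthetic blobs with three well-separated groups (an assumption for illustration, not the customer data); `n_init=10` is set explicitly here and is not taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (not the customer data)
X, _ = make_blobs(n_samples=300, centers=[(-5, -5), (0, 5), (5, -5)],
                  cluster_std=0.5, random_state=1)

sse = {k: KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
       for k in range(1, 7)}

# Drop in SSE gained by going from k-1 to k clusters
drops = {k: sse[k - 1] - sse[k] for k in range(2, 7)}
for k, d in drops.items():
    print(k, round(d, 1))
```

The elbow is the last k whose drop is still large; beyond it, extra clusters only split groups that already fit well.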

##### Running k-means clustering:

In [None]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=1)
kmeans.fit(df_lc_normalized)
cluster_attribute_labels = kmeans.labels_

In [None]:
cluster_labels = kmeans.labels_
df_lc_k3=df_lc.assign(Cluster = cluster_labels)

In [None]:
df_lc_k3.groupby(['Cluster']).agg({
    'tenure': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean','count'],
}).round(0)
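
Because `tenure` and `MonetaryValue` in `df_lc` were log-transformed in an earlier cell, the means in this table are on the log scale. One way to read them in the original units, sketched here on made-up values, is to exponentiate the group means, which gives the geometric mean of each cluster:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; MonetaryValue is stored on the log scale, as in df_lc
toy = pd.DataFrame({
    "Cluster": [0, 0, 1, 1],
    "MonetaryValue": np.log([100.0, 400.0, 2000.0, 8000.0]),
})

mean_log = toy.groupby("Cluster")["MonetaryValue"].mean()

# exp(mean of logs) is the geometric mean in the original units
geo_mean = np.exp(mean_log)
print(geo_mean.round(0))
```

The geometric mean is generally lower than the arithmetic mean for skewed data, so it should not be compared directly with averages of the raw charges.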

In [None]:
df_lc_normalized = pd.DataFrame(df_lc_normalized,index= df_lc.index,columns= df_lc.columns)
df_lc_normalized['Cluster'] = df_lc_k3['Cluster']

In [None]:
df_lc_melt = pd.melt(df_lc_normalized.reset_index(),
                     id_vars=['Cluster'],
                     value_vars=['tenure','MonetaryValue','Frequency'],
                     var_name='Attribute',
                     value_name='Value')

In [None]:
plt.title('Snake plot of standardized variables')
sns.lineplot(x="Attribute", y="Value", hue='Cluster', data=df_lc_melt)

It seems that cluster 2 is the desired cluster for loyal customer segmentation.

##### Analyzing average of each clusters:
To analyze the cluster averages, the clustering is repeated with K=4, and the result is shown below:

In [None]:
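kmeans = KMeans(n_clusters=4, random_state=1)
# Exclude the k=3 'Cluster' label column added above, so only the three features are clustered
kmeans.fit(df_lc_normalized.drop(columns=['Cluster']))
cluster_attribute_labels = kmeans.labels_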

In [None]:
cluster_labels = kmeans.labels_
df_lc_k4=df_lc.assign(Cluster = cluster_labels)

In [None]:
df_lc_k4.groupby(['Cluster']).agg({
    'tenure': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean','count'],
}).round(0)

Comparing the mean tables for k=3 and k=4 shows that the optimal cluster does not change.

#### B: Hierarchical clustering:
Clustering is also done with the hierarchical method: first, the optimal number of clusters is read from the dendrogram, and then the clustering is visualised with a scatter plot.

In [None]:
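import scipy.cluster.hierarchy as sch
plt.figure(figsize=(8,8))
# Link on the three standardized features only; exclude the k-means 'Cluster' label column
features_lc = df_lc_normalized[['tenure','MonetaryValue','Frequency']]
dendrogram = sch.dendrogram(sch.linkage(features_lc, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()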

On the dendrogram, three vertical lines run without merging along the Y-axis from about 50 to 140, a span of about 90 units. So the optimal number of clusters for hierarchical clustering is 3.
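
Once a number of clusters has been read off the dendrogram, the same linkage can be cut into flat clusters with SciPy's `fcluster`. The sketch below uses synthetic blobs (an assumption for illustration, not the customer data):

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (not the customer data)
X, _ = make_blobs(n_samples=60, centers=[(-5, -5), (0, 5), (5, -5)],
                  cluster_std=0.5, random_state=0)

Z = sch.linkage(X, method="ward")

# criterion='maxclust' cuts the tree so that at most t flat clusters remain
labels = sch.fcluster(Z, t=3, criterion="maxclust")
print(np.unique(labels, return_counts=True))
```

Note that `fcluster` numbers its clusters from 1, whereas scikit-learn's estimators number them from 0.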

In [None]:
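# Cluster on the three standardized features only; exclude the k-means 'Cluster' label column
x=df_lc_normalized[['tenure','MonetaryValue','Frequency']]
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
hc = AgglomerativeClustering(n_clusters = 3, linkage = 'ward')
y_hc = hc.fit_predict(x)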

In [None]:
x= np.array(x)
plt.figure(figsize=(6,6))
plt.scatter(x[y_hc == 0, 0],x[y_hc == 0, 1], s = 100, c = 'red',label='Cluster0')
plt.scatter(x[y_hc == 1, 0],x[y_hc == 1, 1], s = 100, c = 'blue',label='Cluster1')
plt.scatter(x[y_hc == 2, 0],x[y_hc == 2, 1], s = 100, c = 'green',label='Cluster2')
plt.title('Clusters of customers using Hierarchical Clustering')
plt.legend()
plt.show()

Cluster 1 is the best cluster under hierarchical clustering.
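
Cluster numbers are arbitrary labels, so "cluster 2" from K-means and "cluster 1" from hierarchical clustering may or may not describe the same customers. A minimal sketch of checking the overlap, on synthetic blobs (an assumption for illustration, with hypothetical variable names), uses a cross-tabulation and the adjusted Rand index:

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with three well-separated groups (not the customer data)
X, _ = make_blobs(n_samples=150, centers=[(-5, -5), (0, 5), (5, -5)],
                  cluster_std=0.5, random_state=0)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Rows are k-means clusters, columns are hierarchical clusters
print(pd.crosstab(km_labels, hc_labels,
                  rownames=["kmeans"], colnames=["hierarchical"]))

# 1.0 means identical groupings regardless of how the clusters are numbered
print(adjusted_rand_score(km_labels, hc_labels))
```

On the real data, a crosstab of `cluster_labels` against `y_hc` would show which hierarchical cluster corresponds to the loyal-customer cluster found by K-means.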