**PROBLEM STATEMENT -**

Customer churn is the movement of customers from one service provider to a competitor. It is a major concern in any industry, and particularly in telecom. With the growing number of telecom service providers in South Asia, competition is intense. Customers churn for many reasons, but the main ones are dissatisfaction with the service, costly subscriptions, and better alternatives elsewhere. Because acquiring a new customer has proved much costlier than keeping an existing one, providers concentrate on retention to stay competitive. To reduce churn, telecom companies therefore need to predict which customers are at high risk of leaving.


**About the dataset**

customerID - customer id

gender - client gender (male / female)

SeniorCitizen - is the client retired (1, 0)

Partner - is the client married (Yes, No)

Dependents - whether the client has dependents (Yes, No)

tenure - how many months a person has been a client of the company

PhoneService - is the telephone service connected (Yes, No)

MultipleLines - are multiple phone lines connected (Yes, No, No phone service)

InternetService - client's Internet service provider (DSL, Fiber optic, No)

OnlineSecurity - is the online security service connected (Yes, No, No internet service)

OnlineBackup - is the online backup service activated (Yes, No, No internet service)

DeviceProtection - does the client have equipment insurance (Yes, No, No internet service)

TechSupport - is the technical support service connected (Yes, No, No internet service)

StreamingTV - is the streaming TV service connected (Yes, No, No internet service)

StreamingMovies - is the streaming cinema service activated (Yes, No, No internet service)

Contract - type of customer contract (Month-to-month, One year, Two year)

PaperlessBilling - whether the client uses paperless billing (Yes, No)

PaymentMethod - payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))

MonthlyCharges - current monthly payment

TotalCharges - the total amount that the client paid for the services for the entire time

Churn - whether there was a churn (Yes or No)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split , cross_val_score , RepeatedStratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier, plot_tree

In [None]:
df = pd.read_csv("../input/telecom-users/telecom_users.csv")
df.shape

In [None]:
df.head()

# EDA

In [None]:
df.columns.values

In [None]:
df.drop(columns=['Unnamed: 0' ], inplace=True) #dropping the first column
df.columns

In [None]:
df.info()

- TotalCharges should be converted to a numeric dtype, which also exposes its missing values

- All categorical features should be converted from dtype=object to dtype=category
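
To illustrate why the numeric conversion surfaces missing values, here is a minimal sketch on made-up values (not rows from the dataset). It assumes that an empty `TotalCharges` cell is stored as a whitespace string, which is what would keep the column at `object` dtype.

```python
import pandas as pd

# Hypothetical values: the blank string mimics an empty TotalCharges cell.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"], name="TotalCharges")

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising, so the gaps become visible to isnull().
numeric = pd.to_numeric(raw, errors="coerce")

print(raw.dtype, "->", numeric.dtype)
print("missing after conversion:", numeric.isnull().sum())
```

Before the conversion, `isnull()` would report nothing for this column because a blank string is still a valid string.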

In [None]:
df.Churn.value_counts()

- There is a clear imbalance between the two classes
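
The size of the imbalance is easier to judge as a share than as raw counts. A small sketch with made-up labels (the 3:1 split is only an assumption for illustration):

```python
import pandas as pd

# Made-up labels with a 3:1 split, only to show the calculation.
churn = pd.Series(["No"] * 6 + ["Yes"] * 2, name="Churn")

counts = churn.value_counts()
shares = churn.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares.round(2))
```

On the real data, the same call would be `df.Churn.value_counts(normalize=True)`.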


In [None]:
def get_categorical(X, threshold=10):
    """
        Identifies categorical columns in the dataframe.
        A column counts as categorical if it has at most `threshold` unique values.
        threshold = 10 because we do not want to include continuous variables like tenure,
        monthly and total charges, which have more than 10 unique values
    """
    # dropna=False counts NaN as a value, matching len(X[col].unique())
    return [col for col in X.columns if X[col].nunique(dropna=False) <= threshold]


def convert_to_categorical(columns, X):
    """
       Converts the columns passed in `columns` to categorical datatype
    """
    print(columns)
    for col in columns:
        X[col] = X[col].astype('category')
    return X



In [None]:
df = convert_to_categorical(get_categorical(df), df)
df.TotalCharges = pd.to_numeric(df.TotalCharges, errors='coerce')  # unparseable entries become NaN
# SeniorCitizen has only two unique values, so get_categorical already converted it to category
df.info()

In [None]:
df.isnull().sum()   

- 10 missing values

We'll simply drop these samples since there are very few of them

In [None]:
df.dropna(inplace=True)

In [None]:
df.isnull().sum()

## Visualizations


### - *Exploring demographic categorical features*

In [None]:
%matplotlib inline

In [None]:
sns.catplot(x='gender', hue='Churn', kind='count', palette='bright', data=df)

- About half of the customers in our data set are male while the other half are female and both are equally likely to churn.

In [None]:
sns.catplot(x='SeniorCitizen',hue='Churn', palette='bright',kind='count',data=df)

- Roughly 15-20% of the customers are senior citizens; most customers in the data are younger.

In [None]:
sns.catplot(x='Dependents',hue='Churn',kind='count',data=df)

- About 30-40% of customers have dependents, and they are less likely to churn.
- Most customers have no one who depends on them.

In [None]:
sns.catplot(x='Partner',hue='Churn',kind='count',data=df)

- Around 45-50% of customers have partners, and single customers churn more.
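
Count plots show how many customers fall in each group, but statements such as "churn more" are about rates. A hedged sketch of how the rate per group could be computed, using a toy frame with invented values:

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Partner": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "Churn":   ["No",  "No",  "No",  "Yes", "No", "No", "Yes", "Yes"],
})

# normalize="index" makes every row sum to 1, i.e. the churn rate within each group.
rates = pd.crosstab(toy["Partner"], toy["Churn"], normalize="index")
print(rates)
```

The same call with `df["Partner"]` and `df["Churn"]` (or any other demographic column) gives the within-group churn rate on the real data.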
    
    

### - *Exploring features describing services used by customers*

In [None]:
sns.catplot(x='PhoneService', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
sns.catplot(x='MultipleLines', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["PhoneService"], columns = df["MultipleLines"])

**INFERENCE-**

- If the subscribers have phone service, they may have multiple lines (yes or no).

- But if the subscribers don't have phone service, the subscribers will never have multiple lines.

In [None]:
sns.catplot(x='InternetService', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
sns.catplot(x='OnlineSecurity', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["OnlineSecurity"])

In [None]:
sns.catplot(x='OnlineBackup', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["OnlineBackup"])

In [None]:
sns.catplot(x='DeviceProtection', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["DeviceProtection"])

In [None]:
sns.catplot(x='TechSupport', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["TechSupport"])

In [None]:
sns.catplot(x='StreamingTV', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["StreamingTV"])

In [None]:
sns.catplot(x='StreamingMovies', hue='Churn', kind='count', palette='pastel', data=df)

In [None]:
pd.crosstab(index = df["InternetService"], columns = df["StreamingMovies"])

**INFERENCE-**

- If the subscribers have Internet services (either DSL or Fiber optic), the subscribers may opt to have other services related to Internet (i.e. streaming TV, device protection).

- But if the subscribers don't have the Internet services, this secondary service will not be available for the subscribers.

***conclusion*** : *The "No internet service" and "No phone service" levels are redundant, because they only repeat what InternetService and PhoneService already say. When we preprocess the data, we will replace them with NO and proceed with only two categories for these features*
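
The redundancy can also be checked programmatically rather than read off the crosstabs. A minimal sketch on a toy frame (values invented; the column names follow the dataset description above):

```python
import pandas as pd

toy = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "No", "No", "DSL"],
    "OnlineSecurity": ["Yes", "No", "No internet service", "No internet service", "No"],
})

no_internet = toy["InternetService"] == "No"
flagged = toy["OnlineSecurity"] == "No internet service"

# If the two masks are identical, the extra level is fully determined by InternetService.
redundant = bool((no_internet == flagged).all())

# Collapsing the level leaves a plain Yes/No feature.
collapsed = toy["OnlineSecurity"].replace({"No internet service": "No"})
print(redundant, sorted(collapsed.unique()))
```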

In [None]:
sns.catplot(x='PaperlessBilling', hue='Churn', kind='count', palette='dark', data=df)

- Customers who use paperless billing churn at a noticeably higher rate than those who do not.

In [None]:
sns.catplot(y='Contract', hue='Churn', kind='count', palette='dark', data=df)

- The shorter a customer's contract, the higher the chance that he/she will churn.

In [None]:
sns.catplot(y='PaymentMethod', hue='Churn', kind='count', palette="dark", data=df)

- Most churned customers paid by electronic check.

### - *Exploring continuous features*

In [None]:
df[['tenure', 'MonthlyCharges', 'TotalCharges']].corr()

Tenure-TotalCharges & Tenure-MonthlyCharges are positively correlated

In [None]:
df.describe()

In [None]:
df['Churn']

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
cols = ['TotalCharges', 'MonthlyCharges', 'tenure']
for i in range(len(cols)):
    sns.boxplot(x=df["Churn"], y=cols[i], data=df, ax=axes[i])
fig.tight_layout()

 Churn Customers have:

- tenure of roughly 0 to 25 months, typically around 10 months

- Monthly Charges from 60 to 85

- Total Charges up to 2000
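
The line inside each box of a boxplot is the median, so the ranges read off the plots can be cross-checked with per-class medians. A sketch on invented numbers:

```python
import pandas as pd

# Invented values, chosen only to make the medians easy to verify by eye.
toy = pd.DataFrame({
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "tenure": [1, 10, 25, 30, 50, 70],
    "MonthlyCharges": [70.0, 80.0, 85.0, 20.0, 60.0, 90.0],
})

# One row per class, one column per feature.
medians = toy.groupby("Churn")[["tenure", "MonthlyCharges"]].median()
print(medians)
```

Replacing `.median()` with `.describe()` would also give the quartiles that form the box edges.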


# Data Preprocessing

In [None]:
X = df.iloc[:,1:-1]
y = df.loc[:,['Churn']]

In [None]:
from sklearn.preprocessing import FunctionTransformer

def convert_no_services(X):
    """This function replaces 'No Internet/Phone' Service with 'No' for the apt. variables"""
    X = X.copy()  # work on a copy so the slice taken from df is not modified in place
    mapping = {'No internet service': 'No', 'No phone service': 'No'}
    for col in X.select_dtypes(include='category').columns:
        # only touch the columns that actually contain one of the redundant levels
        if X[col].isin(list(mapping)).any():
            # go through object dtype so the merged levels collapse into a single 'No' category
            X[col] = X[col].astype(object).replace(mapping).astype('category')
    return X


services_ = FunctionTransformer(convert_no_services, validate=False)

In [None]:
X.head(4)

In [None]:
X = services_.fit_transform(X)

In [None]:
X.head(4)

In [None]:
for col in X.columns:
    print(col.upper())
    print(X[col].unique())
    print()

In [None]:
df.info()

In [None]:
categorical_columns = list(X.select_dtypes(include='category').columns)
numeric_columns = list(X.select_dtypes(exclude='category').columns)

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

## Column Transformer
transformers = [('one_hot_encoder', OneHotEncoder(drop='if_binary',dtype='int'),categorical_columns),
               ('Standard_Scaler', StandardScaler(), numeric_columns)]

                
x_trans = ColumnTransformer(transformers, remainder='passthrough')

## Applying Column Transformer
X = pd.DataFrame(x_trans.fit_transform(X))


## Label encoding --> Yes=1 , No=0
y_trans = LabelEncoder()
y = pd.DataFrame(y_trans.fit_transform(y.Churn), columns=["Churn"])



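## Save feature names after one-hot encoding
ohe = x_trans.named_transformers_['one_hot_encoder']
# get_feature_names was replaced by get_feature_names_out in scikit-learn 1.0 and removed in 1.2
get_names = getattr(ohe, 'get_feature_names_out', None) or ohe.get_feature_names
feature_names = list(get_names(categorical_columns)) + numeric_columns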
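
As a side note on the encoder setting used above: with `drop='if_binary'`, a two-level feature is encoded as a single 0/1 column, while features with more levels keep one column per level. A small sketch on a toy frame (values invented) shows the resulting shape:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

toy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],                          # binary
    "Contract": ["Month-to-month", "One year", "Two year"],   # three levels
})

enc = OneHotEncoder(drop="if_binary", dtype=int)
encoded = enc.fit_transform(toy).toarray()  # fit_transform returns a sparse matrix by default

# Partner collapses to one column, Contract keeps all three: 1 + 3 = 4 columns.
print(encoded.shape)
print(encoded)
```

The first level in sorted order ("No") is the one dropped for the binary feature, so the remaining column is 1 for "Yes".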

In [None]:
X.shape, y.shape

In [None]:
X.head()

In [None]:
X.columns=feature_names
X.head().T

In [None]:
X.describe().T

## PCA


In [None]:
from sklearn.decomposition import PCA


In [None]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X) 
# In the transformer, we only standardized the numeric features.
# PCA works better if the entire dataset is standardized.

In [None]:
pca = PCA(n_components=2) # reducing the dimensions of the dataset to 2 for visualization 

In [None]:
X2D = pca.fit_transform(X_scaled)

In [None]:
X2D = pd.DataFrame(X2D,columns=['PC1','PC2'])
X2D

In [None]:
%matplotlib inline
plt.figure(figsize=(12,10))
# x and y must be passed as keywords in recent seaborn releases
sns.scatterplot(x=X2D.iloc[:,0], y=X2D.iloc[:,1], hue=y.iloc[:,0])
plt.show()

**In this two-component projection the churn and non-churn classes overlap each other, which suggests that the data is not linearly separable**
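
How much weight to put on a 2-D projection depends on how much of the total variance the two components retain. The sketch below computes that share with plain NumPy on synthetic data (a stand-in for the standardized feature matrix, not the real one); with scikit-learn, the same quantity is available as `pca.explained_variance_ratio_` after fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the standardized feature matrix (200 rows, 10 features).
X_demo = rng.normal(size=(200, 10))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# PCA via SVD: squared singular values are proportional to the variance per component.
_, s, _ = np.linalg.svd(X_demo, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

print("share of variance in the first two components:", explained_ratio[:2].sum().round(3))
```

If the first two components hold only a small share, overlap in the scatter plot says little about separability in the full feature space.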


# Splitting the data into train & test data 

In [None]:
X.shape, y.shape

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,shuffle=True,
                                                    stratify=y, random_state=0)


X_train.shape, X_test.shape, y_train.shape,  y_test.shape 
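
The `stratify=y` argument is what keeps the churn share the same in both splits, which matters with imbalanced classes. A quick sketch with synthetic labels (25 % positives is an assumption for the demo):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 25 % positives, 1000 rows.
y_demo = np.array([1] * 250 + [0] * 750)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Both means should sit at (or very near) the overall positive share.
print(y_tr.mean(), y_te.mean())
```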


# XGBoost

### Model Training

In [None]:
from sklearn.model_selection import RandomizedSearchCV

In [None]:
XGB = XGBClassifier(random_state=0)

In [None]:
param_grid = {
    'silent': [False],
    'max_depth': [ 4, 5, 6, 7, 8],
    'learning_rate': [0.01,0.05, 0.08, 0.1, 0.1, 0.2, 0.3],
    'subsample': [0.7, 0.8, 0.9, 1.0],
    'colsample_bytree': [0.7, 0.8, 0.9, 1.0],
    'colsample_bylevel': [0.7, 0.8, 0.9, 1.0],
    'min_child_weight': [0.5, 1.0, 3.0],
    'gamma': [0, 0.25, 0.5, 1.0],
    'reg_lambda': [0.1, 1.0, 5.0, 10.0, 50.0, 100.0],
    'n_estimators': [50, 100, 150, 200, 300],
    'scale_pos_weight': [1, 1.5, 2, 2.5, 3],
    'max_delta_step': [1, 2, 3]
}

clf = XGBClassifier(objective = 'binary:logistic', random_state=0)

In [None]:
rs_clf = RandomizedSearchCV(clf, param_grid, n_iter=10,
                            n_jobs=4, verbose=2, cv=3,
                            scoring= 'roc_auc', refit=True, random_state=0)

import time
print("Randomized search..")
search_time_start = time.time()
rs_clf.fit(X_train, y_train)
print("Randomized search time:", time.time() - search_time_start,"seconds.")

In [None]:
rs_clf.best_estimator_

In [None]:
rs_clf.best_params_

In [None]:
rs_clf.best_score_

In [None]:
positive = np.count_nonzero(y_train)
negative = y_train.shape[0] - positive
positive, negative

In [None]:
estimate = negative / positive
estimate # Weight for cost-sensitivity parameter of xgboost. (scale_pos_weight)
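
For reference, the ratio computed above is the value commonly suggested for `scale_pos_weight` on imbalanced data: the number of negative samples divided by the number of positive samples. A worked example with made-up counts:

```python
import numpy as np

# Made-up training labels: 1 = churn, 0 = no churn.
y_demo = np.array([1] * 200 + [0] * 600)

positive = int(np.count_nonzero(y_demo))
negative = int(y_demo.size - positive)

# Each positive sample is weighted as if it appeared negative / positive times.
weight = negative / positive
print(positive, negative, weight)
```

With 200 churners and 600 non-churners the weight is 3, so errors on churners count three times as much during training.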

In [None]:
xgb_clf = XGBClassifier(colsample_bylevel=0.8, colsample_bytree=0.9, gamma=1.0,
              learning_rate=0.08, max_delta_step=1, max_depth=4,
              min_child_weight=1.0, n_estimators=50, reg_lambda=50.0,
              scale_pos_weight=estimate, silent=False, subsample=1.0,objective = 'binary:logistic')
xgb_clf.fit(X_train, y_train)

### Model Evaluation

In [None]:
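from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_curve, roc_auc_score
# The plot_* helpers below were removed in scikit-learn 1.2; on newer versions use the
# ConfusionMatrixDisplay / RocCurveDisplay / PrecisionRecallDisplay classes instead.
from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import classification_report, precision_recall_curve
%matplotlib inline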

In [None]:
y_pred = xgb_clf.predict(X_test)   
y_prob = xgb_clf.predict_proba(X_test)

In [None]:
y_prob_1 = y_prob[:, 1] # probabilities of the positive class, i.e. Churn

In [None]:
print("Accuracy Score = ",accuracy_score(y_test, y_pred))

*Since the data is imbalanced, accuracy is not a good measure to evaluate the model.*

*So we don't mind if the accuracy of the model is low*
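
A small example makes the point concrete. Assuming a hypothetical test set with 27 % churners, a "model" that never predicts churn still scores a high accuracy while catching no churner at all:

```python
import numpy as np

y_true = np.array([0] * 73 + [1] * 27)   # hypothetical test labels, 27 % churners
y_majority = np.zeros_like(y_true)       # "model" that never predicts churn

accuracy = (y_majority == y_true).mean()
recall = (y_majority[y_true == 1] == 1).mean()  # share of churners that were caught
print(accuracy, recall)
```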

In [None]:
print(" Roc_Auc_Score = ",np.round(roc_auc_score(y_test, y_prob_1),3))

In [None]:
plot_roc_curve(xgb_clf, X_test, y_test)
plt.plot([0,1],[0,1], 'k--',label="RandomClassifier(AUC = 0.5)")
plt.legend()

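An area under the curve of 0.85 indicates that the classifier separates the two classes well.

The AUC is the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner, so 0.85 means the model ranks the churner higher in about 85% of such pairs (0.5 would be random guessing).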
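
The area under the ROC curve can be computed directly as a ranking statistic: the fraction of (churner, non-churner) pairs in which the churner gets the higher score. The helper below is a hypothetical illustration written for this note, not part of scikit-learn; on the same inputs it should agree with `roc_auc_score`.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive compared with every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_demo, p_demo))  # 3 of the 4 pairs are ranked correctly -> 0.75
```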

In [None]:
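# pass the axes explicitly; otherwise plot_confusion_matrix opens its own figure and ignores figsize
fig, ax = plt.subplots(figsize=(10,10))
plot_confusion_matrix(xgb_clf, X_test, y_test, ax=ax)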

In [None]:
Precision =  precision_score(y_test, y_pred)
Recall = recall_score(y_test, y_pred)
F1 = f1_score(y_test, y_pred)

print(f'Precision : {Precision}\nRecall : {Recall}\nF1 Score : {F1}')

**It's okay if we sacrifice precision for a good recall here, because:**

- we can afford to misclassify a non-churner as a churn customer and put extra effort into retaining him/her for the company
- but we cannot afford to miss churn customers and lose them.

- So recall is more important than precision in this case.
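
One practical way to trade precision for recall is to lower the probability threshold at which a customer is flagged as a churner. The sketch below uses invented labels and scores; `precision_recall_at` is a hypothetical helper written for this illustration.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted as churn."""
    y_true = np.asarray(y_true)
    pred = np.asarray(scores) >= threshold
    tp = int(np.sum(pred & (y_true == 1)))
    fp = int(np.sum(pred & (y_true == 0)))
    fn = int(np.sum(~pred & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels (1 = churn) and predicted probabilities.
y_demo = [1, 1, 1, 1, 0, 0, 0, 0]
p_demo = [0.9, 0.7, 0.4, 0.32, 0.6, 0.45, 0.35, 0.1]

for t in (0.5, 0.3):
    print(t, precision_recall_at(y_demo, p_demo, t))
```

Moving the threshold from 0.5 to 0.3 catches every churner in this toy example, at the cost of flagging more non-churners.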

In [None]:
plot_precision_recall_curve(xgb_clf, X_test, y_test)
plt.plot(Recall,Precision,marker="o",c='red')
plt.grid()

In [None]:
print(classification_report(y_test, y_pred))

In [None]:
precisions, recalls, thresholds = precision_recall_curve(y_test, y_prob_1)
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precisions and recalls have one more entry than thresholds, hence the [:-1]
    plt.figure(figsize=(10,6))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], 'g-', label="Recall")
    plt.legend()
    plt.grid(True)
    plt.show()

In [None]:
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)

The classifier does not reach a higher F1 score (say, above 95%) because the churn and no-churn classes overlap, so no decision boundary separates them perfectly without sacrificing either precision or recall.

## Important Features

In [None]:
from xgboost import plot_importance
fig, ax = plt.subplots(figsize=(10,8))
plot_importance(xgb_clf, ax=ax)

# Application

In a business setting, we would normally use the predicted score rather than the class prediction. The score can be attached to each subscriber profile and used to adjust campaign targeting.

In [None]:
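churn_probs = xgb_clf.predict_proba(X)[:,1]  # scores for all rows, including those used for training
df["Predicted Churn Prob"] = churn_probs
sns.histplot(df["Predicted Churn Prob"], kde=True)  # distplot is deprecated in recent seaborn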

In [None]:
df[['customerID', 'Churn', 'Predicted Churn Prob']].tail(10)
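
As a sketch of how the scores might feed campaign targeting, the customers can be ranked by predicted probability and the top few selected. The frame below is invented, and `budget` is a hypothetical number of customers the retention team can contact.

```python
import pandas as pd

# Invented scores; the column names mirror the ones used in this notebook.
scored = pd.DataFrame({
    "customerID": ["A", "B", "C", "D", "E"],
    "Predicted Churn Prob": [0.12, 0.81, 0.47, 0.93, 0.30],
})

budget = 2  # hypothetical number of customers the retention team can contact
targets = scored.sort_values("Predicted Churn Prob", ascending=False).head(budget)
print(targets)
```

Ranking by score rather than thresholding lets the campaign size follow the available budget.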
