# Business Problem: 
***Get insights from the No-Churn Telecom dataset to find out why more customers than expected are leaving the company, and what can be done to improve the current situation***

# Objective: 
- In this notebook we use the processed data, transformed from the raw data, to build a machine learning model.
- The processed data set is "No-Churn_Telecom_Europe_churnFlag_only_final01.xlsx".

**Steps in Train_Model**

Step 1 : Import the libraries

Step 2 : Import the Processed data-set

Step 3 : Split the Processed data-set

Step 4 : Try different machine learning models

Step 5 : Select the model, tune its hyperparameters and train it

Step 6 : Export the Trained Model

# Step 1 : Import the libraries

In [1]:
# Import the libraries
import numpy as np  # NumPy is the fundamental package for scientific computing with Python.
import pandas as pd  # pandas is for data manipulation and analysis.
import matplotlib.pyplot as plt  # Matplotlib is a Python 2D plotting library that produces publication-quality figures.
import seaborn as sns  # Seaborn is a Python data visualization library based on Matplotlib.
%matplotlib inline
import joblib  # joblib is used to save the trained model (Step 6).

# Step 2 : Import the Processed data-set

In [2]:
#pd.set_option('display.height', 500)
#pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
df = pd.read_excel("No-Churn_Telecom_Europe_churnFlag_only_final01.xlsx")
print(df.shape)
df.head()

(4617, 7)


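| | International_Plan | International_calls | International_Mins | VMail_Plan | Total_charges | CustServ_Calls | churn_flag |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 10.0 | 1 | 75.56 | 1 | 0 |
| 1 | 0 | 3 | 13.7 | 1 | 59.24 | 1 | 0 |
| 2 | 0 | 5 | 12.2 | 0 | 62.29 | 0 | 0 |
| 3 | 1 | 7 | 6.6 | 0 | 66.8 | 2 | 1 |
| 4 | 1 | 3 | 10.1 | 0 | 52.09 | 3 | 1 |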


# Step 3 : Split the Processed data-set

In [3]:
# Separate the features (X) from the target (y)
target_name = 'churn_flag'
X = df.drop(target_name, axis=1)
y = df[target_name]

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.22, random_state=123, stratify=y)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(3601, 6)
(1016, 6)
(3601,)
(1016,)
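The shapes above follow from `test_size=0.22`: for a fractional test size, scikit-learn rounds the number of test rows up, and `stratify=y` keeps the churn rate almost identical in both parts. A small check, assuming the class counts printed later in this notebook (201 churners in the training set, 258 overall) and a toy label vector for the stratification demo:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Split sizes for 4617 rows with test_size=0.22 (the test size is rounded up)
n_rows = 4617
n_test = math.ceil(0.22 * n_rows)
n_train = n_rows - n_test
print(n_train, n_test)  # 3601 1016, matching the shapes printed above

# Churn rate overall, in train and in test (counts taken from this notebook's outputs)
print(round(258 / 4617, 4), round(201 / 3601, 4), round(57 / 1016, 4))

# Toy demo (not the churn data): 100 labels with 5% positives
y_toy = np.array([0] * 95 + [1] * 5)
X_toy = np.arange(100).reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy)
print(y_tr.sum(), y_te.sum())  # stratification keeps 4 positives in train, 1 in test
```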


# Step 4 : Try different machine learning models

In [5]:
# Metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score

# Models to compare
from sklearn.linear_model import LogisticRegression
from sklearn import svm, tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
import xgboost

# Model selection and hyperparameter search
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

In [6]:
classifiers = []
model1 = xgboost.XGBClassifier()
classifiers.append(model1)
model2 = svm.SVC()
classifiers.append(model2)
model3 = tree.DecisionTreeClassifier()
classifiers.append(model3)
model4 = RandomForestClassifier()
classifiers.append(model4)
model5 = KNeighborsClassifier()
classifiers.append(model5)
model6 =GaussianNB()
classifiers.append(model6)
model7 =MLPClassifier(alpha=1, max_iter=1000)
classifiers.append(model7)
model8 = AdaBoostClassifier()
classifiers.append(model8)

In [7]:
for clf in classifiers:
    clf.fit(X_train, y_train)
    y_pred= clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("Accuracy of %s is %s"%(clf, acc))

Accuracy of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1) is 0.9881889763779528
Accuracy of SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False) is 0.9438976377952756
Accuracy of DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0
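The loop above prints each model's full parameter list and ranks the models by accuracy alone, which is hard to read and, with only about 5.6% churners, barely separates a useful model from one that ignores churners. A minimal sketch of a tidier comparison, assuming synthetic data from `make_classification` as a stand-in for the churn table and a reduced model list: it prints only the class name and reports cross-validated minority-class recall and ROC AUC next to accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data: 6 features, roughly 6% positives
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=6, n_informative=4, weights=[0.94], random_state=123)

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=123),
    RandomForestClassifier(random_state=123),
]

metrics = ["accuracy", "recall", "roc_auc"]
results = {}
for clf in models:
    name = type(clf).__name__  # short name instead of the full parameter repr
    cv = cross_validate(clf, X_demo, y_demo, cv=5, scoring=metrics)
    results[name] = {m: cv[f"test_{m}"].mean() for m in metrics}
    print(f"{name:<24} " + " ".join(f"{m}={results[name][m]:.3f}" for m in metrics))
```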

# Step 5 : Select the model, tune its hyperparameters and train it

In [8]:
# Use 10-fold cross-validation to evaluate the RandomForestClassifier
from sklearn.model_selection import cross_val_score

model4 = RandomForestClassifier()
# ROC AUC measures how well the model ranks churners above non-churners across all
# decision thresholds: 1.0 is a perfect ranking, 0.5 is no better than random.
scoring = 'roc_auc'
scores = cross_val_score(model4, X, y, cv=10, scoring=scoring)
print(scores)
# Mean score across the folds, with +/- 2 standard deviations as a rough spread
print("ROC AUC: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

[0.99797107 0.99805928 0.99947071 0.99982357 0.99620677 0.99558927
 0.99475124 0.99802752 0.99743119 0.99973475]
ROC AUC: 1.00 (+/- 0.00)
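With two decimals every fold rounds to 1.00, which hides the spread between folds. Recomputing the summary from the ten fold scores printed above with more precision (NumPy's `std` uses the population formula, as `scores.std()` does). Mean ± 2 standard deviations across folds is only a rough spread: the folds share training data, so it is not a rigorous 95% confidence interval.

```python
import numpy as np

# The ten RandomForestClassifier ROC AUC fold scores printed above
scores = np.array([0.99797107, 0.99805928, 0.99947071, 0.99982357, 0.99620677,
                   0.99558927, 0.99475124, 0.99802752, 0.99743119, 0.99973475])

mean, spread = scores.mean(), 2 * scores.std()
print("ROC AUC: %0.4f (+/- %0.4f)" % (mean, spread))  # ROC AUC: 0.9977 (+/- 0.0033)
```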


In [9]:
# Random Forest model
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print('Confusion Matrix :')
print(confusion_matrix(y_test, y_pred_rf))
print('Accuracy Score :', accuracy_score(y_test, y_pred_rf))
print('---classification_report---')
print(classification_report(y_test, y_pred_rf))

Confusion Matrix :
[[956   3]
 [  8  49]]
Accuracy Score : 0.9891732283464567
---classification_report---
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       959
           1       0.94      0.86      0.90        57

    accuracy                           0.99      1016
   macro avg       0.97      0.93      0.95      1016
weighted avg       0.99      0.99      0.99      1016
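In scikit-learn's `confusion_matrix`, rows are the true classes and columns the predicted classes (both in the order 0, 1), so `ravel()` returns TN, FP, FN, TP. A short sketch that first confirms the layout on toy labels and then reads the random forest matrix above the same way, reproducing the minority-class precision and recall in the report:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels (not the churn data) to confirm the layout: 1 = churn
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0]
tn_toy, fp_toy, fn_toy, tp_toy = confusion_matrix(y_true, y_pred).ravel()
print(tn_toy, fp_toy, fn_toy, tp_toy)  # 3 1 1 2
print(tp_toy / (tp_toy + fn_toy), recall_score(y_true, y_pred))  # both are 2/3

# The random forest's test-set matrix printed above
tn, fp, fn, tp = np.array([[956, 3], [8, 49]]).ravel()
precision = tp / (tp + fp)   # 49 / 52
recall = tp / (tp + fn)      # 49 / 57
print(round(precision, 2), round(recall, 2))  # 0.94 0.86
```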



In [10]:
# Use 10-fold cross-validation to evaluate the XGBClassifier
from sklearn.model_selection import cross_val_score

model1 = xgboost.XGBClassifier()
# ROC AUC: 1.0 is a perfect ranking of churners above non-churners, 0.5 is random
scoring = 'roc_auc'
scores = cross_val_score(model1, X, y, cv=10, scoring=scoring)
print(scores)
# Mean score across the folds, with +/- 2 standard deviations as a rough spread
print("ROC AUC: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

[0.99858857 0.99894143 0.99920607 0.99823571 0.99673606 0.9969566
 0.99276641 0.99678899 0.99844037 0.99938108]
ROC AUC: 1.00 (+/- 0.00)


In [11]:
# XGBoost model with hand-set hyperparameters
import xgboost
xgb = xgboost.XGBClassifier(
    max_depth=16,         # maximum tree depth; deeper trees can learn relations very specific to particular samples (over-fitting)
    n_estimators=100,     # number of boosted trees
    learning_rate=0.05,   # shrinks the contribution of each tree
    booster='gbtree',
    subsample=1,          # fraction of rows sampled per tree; lower values are more conservative and reduce over-fitting, but values that are too small can under-fit
    colsample_bylevel=1,
    colsample_bynode=1,
    colsample_bytree=1,   # fraction of features sampled per tree; reported to work better than the two options above
    min_child_weight=0,   # minimum sum of instance weights needed in a child; values such as 1, 10 or 100 make the model more conservative
    reg_alpha=0,          # L1 regularization on leaf weights
    reg_lambda=1,         # L2 regularization on leaf weights
    gamma=0,              # minimum loss reduction required to make a further split
    scale_pos_weight=1,   # weight of the positive (churn) class; 1 means no reweighting
    objective='binary:logistic',
    seed=1)
xgb.fit(X_train, y_train)
y_pred_xgb = xgb.predict(X_test)
print('---Confusion Matrix---')
print(confusion_matrix(y_test, y_pred_xgb))
print('\n')
print('Accuracy Score :-->', accuracy_score(y_test, y_pred_xgb))
print('\n')
print('---classification_report---')
print(classification_report(y_test, y_pred_xgb))

---Confusion Matrix---
[[952   7]
 [  5  52]]


Accuracy Score :--> 0.9881889763779528


---classification_report---
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       959
           1       0.88      0.91      0.90        57

    accuracy                           0.99      1016
   macro avg       0.94      0.95      0.95      1016
weighted avg       0.99      0.99      0.99      1016
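The hyperparameters above were set by hand. `GridSearchCV` (imported in In [5] but not used) can search them systematically with cross-validation. A minimal sketch, assuming synthetic `make_classification` data and a `RandomForestClassifier` so that it runs without XGBoost; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be swapped in with a grid over parameters such as `max_depth` and `learning_rate`. Scoring on `recall` targets the churners; the grid values are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the churn data (about 6% positives)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=4, weights=[0.94], random_state=123)

param_grid = {
    "max_depth": [3, None],      # shallow vs fully grown trees
    "n_estimators": [50, 100],   # number of trees
}
search = GridSearchCV(
    RandomForestClassifier(random_state=123),
    param_grid,
    scoring="recall",            # pick the settings that catch the most positives
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=123),
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```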



## Key Observations:
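- While building the machine learning model, we found (using 10-fold cross-validation and the held-out test set) that the **XGBoost classifier (XGBClassifier) is the best choice among the models tried**. In cross-validation it is essentially tied with the random forest (mean ROC AUC of about 0.998 for both), but on the test set the tuned XGBoost model catches more churners (recall 0.91 vs 0.86 for the random forest), which matters more here than its slightly lower accuracy (0.988 vs 0.989).
- We **use k-fold cross-validation because it gives a more accurate estimate of out-of-sample performance** and makes more efficient use of the data (every observation is used for both training and testing) than a single train/test split.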
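The comparison can be checked directly against the two test-set confusion matrices printed above (random forest in In [9], tuned XGBoost in In [11]), computed here in plain Python:

```python
# Test-set confusion matrices copied from the outputs above: [[TN, FP], [FN, TP]]
matrices = {
    "RandomForest": [[956, 3], [8, 49]],
    "XGBoost": [[952, 7], [5, 52]],
}

summary = {}
for name, ((tn, fp), (fn, tp)) in matrices.items():
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    recall = tp / (tp + fn)        # share of actual churners that are caught
    precision = tp / (tp + fp)     # share of predicted churners that really churn
    summary[name] = (accuracy, recall, precision)
    print(f"{name:<12} accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
```

XGBoost finds three more churners (52 vs 49) at the cost of four more false alarms (7 vs 3).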


## Handling Imbalanced Data with SMOTE
- In machine learning and data science we often come across *imbalanced data*: the number of observations in one class is much higher or lower than in the other classes.
- Standard ML techniques such as **Logistic Regression, Decision Tree and XGBoost** tend to be biased towards the majority class and to ignore the minority class; in the extreme case they predict only the majority class.
- In more technical terms, an imbalanced class distribution makes the model prone to a very low **recall** on the minority class.
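To see why accuracy alone is misleading here: with 258 churners among 4617 customers (the counts printed in In [17]), a model that always predicts "no churn" already reaches about 94% accuracy while its recall on churners is zero. The numbers, computed in plain Python:

```python
n_churn, n_no_churn = 258, 4359   # class counts of churn_flag from In [17]
n_total = n_churn + n_no_churn

churn_rate = n_churn / n_total
# A constant "no churn" prediction is right for every non-churner...
baseline_accuracy = n_no_churn / n_total
# ...but finds none of the churners
baseline_recall = 0 / n_churn

print(f"churn rate:        {churn_rate:.1%}")         # 5.6%
print(f"baseline accuracy: {baseline_accuracy:.1%}")  # 94.4%
print(f"baseline recall:   {baseline_recall:.1%}")    # 0.0%
```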

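Since SMOTE uses Euclidean distance in its calculations (to find each minority sample's nearest neighbours), it is better to first scale the features to a common range; here they are scaled to [0, 1] with MinMaxScaler.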
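A small illustration of why scaling matters, using the rows of `df.head()` shown in Step 2: without scaling, almost all of the Euclidean distance between the first two customers comes from `Total_charges`, simply because its values are larger. For the illustration the min-max ranges are fitted on the five displayed rows only, not on the whole data set, and `charges_share` is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The five rows of df.head() (features only), in the column order
# International_Plan, International_calls, International_Mins, VMail_Plan, Total_charges, CustServ_Calls
rows = np.array([
    [0, 3, 10.0, 1, 75.56, 1],
    [0, 3, 13.7, 1, 59.24, 1],
    [0, 5, 12.2, 0, 62.29, 0],
    [1, 7, 6.6, 0, 66.80, 2],
    [1, 3, 10.1, 0, 52.09, 3],
])

def charges_share(a, b):
    """Fraction of the squared Euclidean distance contributed by Total_charges (index 4)."""
    sq = (a - b) ** 2
    return sq[4] / sq.sum()

scaled = MinMaxScaler().fit_transform(rows)  # ranges fitted on these 5 rows only

print(round(charges_share(rows[0], rows[1]), 2))      # 0.95 with raw features
print(round(charges_share(scaled[0], scaled[1]), 2))  # 0.64 after min-max scaling
```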

In [12]:
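from sklearn import preprocessing

# Min-max scaler: rescales every feature to the [0, 1] range
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

# Scaled features. Note: the scaler is fitted on all of X before the train/test
# split, so the test rows also influence each feature's min and max.
X = min_max_scaler.fit_transform(X)

print("\nAfter min max Scaling : \n", X)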


After min max Scaling : 
 [[0.         0.15       0.5        1.         0.71879268 0.11111111]
 [0.         0.15       0.685      1.         0.49590276 0.11111111]
 [0.         0.25       0.61       0.         0.53755804 0.        ]
 ...
 [0.         0.15       0.65       1.         0.49808795 0.11111111]
 [0.         0.15       0.715      1.         0.49617591 0.        ]
 [0.         0.2        0.605      0.         0.47869435 0.22222222]]


In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.22, random_state=123, stratify=y)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(3601, 6)
(1016, 6)
(3601,)
(1016,)


In [14]:
print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1))) 
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0))) 
  
# import SMOTE from the imblearn package
# pip install imbalanced-learn (if imblearn is not installed)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=123)
# fit_resample is the current name of the older fit_sample method
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.to_numpy())
  
print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape)) 
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape)) 
  
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1))) 
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0))) 

Before OverSampling, counts of label '1': 201
Before OverSampling, counts of label '0': 3400 




After OverSampling, the shape of train_X: (6800, 6)
After OverSampling, the shape of train_y: (6800,) 

After OverSampling, counts of label '1': 3400
After OverSampling, counts of label '0': 3400
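SMOTE added 3400 - 201 = 3199 synthetic churners here. Each synthetic sample is placed on the line segment between a minority-class sample `x_i` and one of its k nearest minority-class neighbours `x_nn` (imblearn's default is k = 5): `x_new = x_i + lam * (x_nn - x_i)`, with `lam` drawn uniformly from [0, 1]. The function below is a simplified NumPy sketch of that idea (the name `smote_sketch` is hypothetical, and it omits details of imblearn's implementation), applied to random toy points:

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): create n_new synthetic minority samples."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                          # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]            # k nearest minority neighbours

    base = rng.integers(0, len(X_min), size=n_new)          # random minority samples x_i
    nn = neighbours[base, rng.integers(0, k, size=n_new)]   # one of their k neighbours x_nn
    lam = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[nn] - X_min[base])

# Toy minority class: 20 points with 6 features already scaled to [0, 1]
rng = np.random.default_rng(123)
X_minority = rng.random((20, 6))
X_synthetic = smote_sketch(X_minority, n_new=50)
print(X_synthetic.shape)  # (50, 6)
```

Because each new point lies between two existing minority points, the synthetic samples stay inside the region the minority class already occupies.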


In [15]:
# XGBoost model trained on the SMOTE-resampled training set
import xgboost
xgb_after_smote = xgboost.XGBClassifier(
    max_depth=11,
    n_estimators=600,
    learning_rate=0.1,
    booster='gbtree',
    subsample=1,          # lower values reduce over-fitting, but values that are too small can under-fit
    colsample_bylevel=1,
    colsample_bynode=1,
    colsample_bytree=1,
    min_child_weight=0,   # larger values (e.g. 1, 10, 100) make the model more conservative
    reg_alpha=0,
    reg_lambda=1,
    gamma=0,
    scale_pos_weight=1,
    objective='binary:logistic',
    seed=1)
xgb_after_smote.fit(X_train_res, y_train_res)
# Evaluate on the original, non-resampled test set
y_pred_smote = xgb_after_smote.predict(X_test)
print('---Confusion Matrix---')
print(confusion_matrix(y_test, y_pred_smote))
print('\n')
print('Accuracy Score :-->', accuracy_score(y_test, y_pred_smote))
print('\n')
print('---classification_report---')
print(classification_report(y_test, y_pred_smote))

---Confusion Matrix---
[[955   4]
 [  3  54]]


Accuracy Score :--> 0.9931102362204725


---classification_report---
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       959
           1       0.93      0.95      0.94        57

    accuracy                           0.99      1016
   macro avg       0.96      0.97      0.97      1016
weighted avg       0.99      0.99      0.99      1016



In [16]:
# Use 10-fold cross-validation to evaluate the same XGBClassifier configuration
from sklearn.model_selection import cross_val_score

model1 = xgboost.XGBClassifier(
    max_depth=11,
    n_estimators=600,
    learning_rate=0.1,
    booster='gbtree',
    subsample=1,
    colsample_bylevel=1,
    colsample_bynode=1,
    colsample_bytree=1,
    min_child_weight=0,
    reg_alpha=0,
    reg_lambda=1,
    gamma=0,
    scale_pos_weight=1,
    objective='binary:logistic',
    seed=1)

# For single-label classification, micro-averaged F1 equals accuracy.
# The folds come from the scaled X and y, which are not resampled, so SMOTE is not applied here.
scores = cross_val_score(model1, X, y, cv=10, scoring='f1_micro')
print(scores)
# Mean score across the folds, with +/- 2 standard deviations as a rough spread
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

[0.98268398 0.99134199 0.98917749 0.99350649 0.98701299 0.98701299
 0.98051948 0.98698482 0.99132321 0.99566161]
Accuracy: 0.99 (+/- 0.01)


## Train and fit using the whole dataset
- The final model is trained on all the data (after SMOTE) so that no information in the dataset is lost.

In [17]:
print("Before OverSampling, counts of label '1': {}".format(sum(y == 1))) 
print("Before OverSampling, counts of label '0': {} \n".format(sum(y == 0))) 
  
# import SMOTE from the imblearn package
# pip install imbalanced-learn (if imblearn is not installed)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=123)
# fit_resample is the current name of the older fit_sample method
X_res, y_res = sm.fit_resample(X, y.to_numpy())
  
print('After OverSampling, the shape of train_X: {}'.format(X_res.shape)) 
print('After OverSampling, the shape of train_y: {} \n'.format(y_res.shape)) 
  
print("After OverSampling, counts of label '1': {}".format(sum(y_res == 1))) 
print("After OverSampling, counts of label '0': {}".format(sum(y_res == 0))) 

Before OverSampling, counts of label '1': 258
Before OverSampling, counts of label '0': 4359 

After OverSampling, the shape of train_X: (8718, 6)
After OverSampling, the shape of train_y: (8718,) 

After OverSampling, counts of label '1': 4359
After OverSampling, counts of label '0': 4359


In [18]:
# Final XGBoost model, trained on the whole SMOTE-resampled data set
import xgboost
xgb_after_smote_final = xgboost.XGBClassifier(
    max_depth=11,
    n_estimators=600,
    learning_rate=0.1,
    booster='gbtree',
    subsample=1,          # lower values reduce over-fitting, but values that are too small can under-fit
    colsample_bylevel=1,
    colsample_bynode=1,
    colsample_bytree=1,
    min_child_weight=0,   # larger values (e.g. 1, 10, 100) make the model more conservative
    reg_alpha=0,
    reg_lambda=1,
    gamma=0,
    scale_pos_weight=1,
    objective='binary:logistic',
    seed=1)
xgb_after_smote_final.fit(X_res, y_res)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=11,
              min_child_weight=0, missing=None, n_estimators=600, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=1,
              silent=None, subsample=1, verbosity=1)

In [19]:
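# Predict the churn flag for a single customer. The row must already be min-max
# scaled and follow the column order of X.
xgb_after_smote_final.predict([[1.0, 0.23174211, 0.61196845, 0.0, 0.38022451, 0.51498247]])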

array([1], dtype=int64)

In [20]:
y_res

array([0, 0, 0, ..., 1, 1, 1], dtype=int64)

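### Key Observations:
After using the SMOTE technique and 10-fold cross-validation:
- The 10-fold cross-validated accuracy of the XGBClassifier configuration is 0.99 (+/- 0.01).
- On the held-out test set, after training on SMOTE-resampled data, the recall of the minority class (churn_flag = 1) is 95% (54 of 57 churners found).
- On the same test set, the precision of the minority class (churn_flag = 1) is 93% (54 of the 58 customers flagged as churners actually churned).

Since our focus is the minority class (churn_flag = 1), we are mostly concerned with recall here: it is the share of actual churners that the model catches.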
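Because recall is the priority, the decision threshold is another lever besides resampling. For binary classifiers such as `LogisticRegression`, or `XGBClassifier` with `binary:logistic`, `predict` effectively labels a customer as a churner when the predicted churn probability exceeds 0.5; applying a lower cut-off to the `predict_proba` output trades precision for recall. A minimal sketch on synthetic data; the 0.3 threshold is illustrative, and in practice it should be chosen on validation data rather than on the test set:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (about 6% positives)
X_demo, y_demo = make_classification(
    n_samples=3000, n_features=6, n_informative=4, weights=[0.94], random_state=123)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.22, random_state=123, stratify=y_demo)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
churn_proba = clf.predict_proba(X_te)[:, 1]   # probability of class 1 (churn)

recalls = {}
for threshold in (0.5, 0.3):
    y_hat = (churn_proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, y_hat)
    print(f"threshold={threshold}: recall={recalls[threshold]:.3f} "
          f"precision={precision_score(y_te, y_hat, zero_division=0):.3f}")
```

Lowering the threshold can only add predicted churners, so recall never decreases, while precision usually drops.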

### Why do we use k-fold cross-validation?
#### Comparing cross-validation to train/test split

Advantages of cross-validation:

- More accurate estimate of out-of-sample accuracy
- More "efficient" use of data (every observation is used for both training and testing)

Advantages of train/test split:

- Runs roughly K times faster than K-fold cross-validation (one model is fitted instead of K)
- Simpler to examine the detailed results of the testing process
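The "every observation is used for both training and testing" point can be checked directly: with `StratifiedKFold` (what `cross_val_score` uses for classifiers when `cv` is an integer such as 10), each row lands in exactly one test fold and in the training part of the other K - 1 folds. A small sketch on toy labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (not the churn data): 50 rows, 10% positives
y_toy = np.array([0] * 45 + [1] * 5)
X_toy = np.zeros((len(y_toy), 1))   # feature values do not affect the split

k = 5
test_counts = np.zeros(len(y_toy), dtype=int)
train_counts = np.zeros(len(y_toy), dtype=int)
for train_idx, test_idx in StratifiedKFold(n_splits=k).split(X_toy, y_toy):
    test_counts[test_idx] += 1
    train_counts[train_idx] += 1
    print("positives in test fold:", y_toy[test_idx].sum())  # 1 in every fold

print(test_counts.min(), test_counts.max(), train_counts.min(), train_counts.max())  # 1 1 4 4
```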

# Step 6 : Export the Trained Model

In [21]:
# Save the model as a pickle in a file 
joblib.dump(xgb_after_smote_final, 'Xbgboost_after_smote_Classifier_No-Churn Telecom_predict_Churn_Flag_real.pkl')       
#joblib.dump to serialize an object hierarchy

['Xbgboost_after_smote_Classifier_No-Churn Telecom_predict_Churn_Flag_real.pkl']
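The saved file holds only the classifier, which expects min-max scaled features; the scaler fitted in In [12] is not saved with it. One way to keep the two together is to fit and dump them as a single object, so that raw feature rows can be passed directly after `joblib.load`. A minimal sketch, assuming scikit-learn's `Pipeline`, a `LogisticRegression` stand-in for the XGBoost model, synthetic data and a hypothetical file name; to include the SMOTE step as well, imblearn's `Pipeline` can be used in the same way.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the raw (unscaled) churn features
X_raw, y_raw = make_classification(n_samples=500, n_features=6, random_state=123)

model = Pipeline([
    ("scale", MinMaxScaler()),                  # the fitted scaling is stored with the model
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_raw, y_raw)

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "churn_pipeline_example.pkl")
joblib.dump(model, path)

reloaded = joblib.load(path)
new_customer = X_raw[:1]                        # a raw row; no manual scaling needed
print(reloaded.predict(new_customer), model.predict(new_customer))
```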