## Telcom Churn Prediction 

- The Telcom Churn Prediction project aims to predict customer churn in a telecommunications dataset.
- Customer churn refers to the phenomenon where customers discontinue their services. 
- Predicting churn is crucial for businesses to implement proactive strategies to retain customers.



### Importing Libraries

In [1]:
import os
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Importing Dataset

In [2]:
dataset = pd.read_csv('Telcom Data.csv')
dataset.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


### Basic Data Description

In [3]:
dataset.shape

(7043, 21)

In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [5]:
dataset['Churn'].value_counts()

No     5174
Yes    1869
Name: Churn, dtype: int64

### Data Preprocessing

In [6]:
dataset['Churn'] = dataset['Churn'].replace({'Yes':1, "No":0})

In [7]:
dataset.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [8]:
for i in dataset.columns:
  print("***********************************************",i,
        "*******************************************************")
  print()
  print(set(dataset[i].tolist()))
  print()

*********************************************** customerID *******************************************************

{'7567-ECMCM', '3545-CNWRG', '1820-TQVEV', '0330-BGYZE', '1766-GKNMI', '1932-UEDCX', '5985-TBABQ', '8387-UGUSU', '3301-LSLWQ', '8640-SDGKB', '8901-UPRHR', '2180-DXNEG', '4322-RCYMT', '9801-NOSHQ', '5868-CZJDR', '8212-CRQXP', '1133-KXCGE', '9972-EWRJS', '2606-PKWJB', '1697-LYYYX', '0769-MURVM', '5639-NTUPK', '0378-XSZPU', '1269-FOYWN', '2530-FMFXO', '4957-TREIR', '6729-GDNGC', '0722-SVSFK', '9330-DHBFL', '9625-QSTYE', '4139-SUGLD', '4942-VZZOM', '0930-EHUZA', '4915-BFSXL', '6834-NXDCA', '0106-UGRDO', '5440-FLBQG', '1552-TKMXS', '3580-GICBM', '8444-WRIDW', '2599-CZABP', '5669-SRAIP', '1265-XTECC', '8603-IJWDN', '5876-QMYLD', '2074-GUHPQ', '2325-ZUSFD', '6894-LFHLY', '1480-IVEVR', '4767-HZZHQ', '1154-HYWWO', '9600-UDOPK', '7434-SHXLS', '0623-IIHUG', '9746-YKGXB', '8257-RZAHR', '8593-WHYHV', '0303-WMMRN', '8263-OKETD', '6537-QLGEX', '2346-DJQTB', '9269-CQOOL', '6583-KQJLK', '

### Installing PyCaret Library

In [9]:
!pip install pycaret

Collecting pycaret
  Obtaining dependency information for pycaret from https://files.pythonhosted.org/packages/eb/43/ec8d59a663e0a1a67196b404ec38ccb0051708bad74a48c80d96c61dd0e5/pycaret-3.2.0-py3-none-any.whl.metadata
  Downloading pycaret-3.2.0-py3-none-any.whl.metadata (17 kB)
Collecting category-encoders>=2.4.0 (from pycaret)
  Obtaining dependency information for category-encoders>=2.4.0 from https://files.pythonhosted.org/packages/7f/e5/79a62e5c9c9ddbfa9ff5222240d408c1eeea4e38741a0dc8343edc7ef1ec/category_encoders-2.6.3-py2.py3-none-any.whl.metadata
  Downloading category_encoders-2.6.3-py2.py3-none-any.whl.metadata (8.0 kB)
Collecting deprecation>=2.1.0 (from pycaret)
  Downloading deprecation-2.1.0-py2.py3-none-any.whl (11 kB)
Collecting kaleido>=0.2.1 (from pycaret)
  Downloading kaleido-0.2.1-py2.py3-none-win_amd64.whl (65.9 MB)
     ---------------------------------------- 0.0/65.9 MB ? eta -:--:--
     ---------------------------------------- 0.0/65.9 MB 1.9 MB/s eta 0:00:35

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tables 3.8.0 requires blosc2~=2.0.0, which is not installed.


In [10]:
from pycaret.classification import *

### Data Splitting

In [11]:
train_data = dataset.sample(frac = 0.8, random_state=101).reset_index(drop=True)
test_data = dataset.drop(train_data.index).reset_index(drop=True)

In [12]:
print(train_data.shape, test_data.shape)

(5634, 21) (1409, 21)


### Model Building with PyCaret 

The setup() function configures the dataset, specifies the target variable ('Churn'), and enables Principal Component Analysis (PCA).

In [13]:
exp_clf = setup(data=dataset, target='Churn', pca=True, pca_components=0.95, session_id=123)

Unnamed: 0,Description,Value
0,Session id,123
1,Target,Churn
2,Target type,Binary
3,Original data shape,"(7043, 21)"
4,Transformed data shape,"(7043, 3)"
5,Transformed train set shape,"(4930, 3)"
6,Transformed test set shape,"(2113, 3)"
7,Ordinal features,5
8,Numeric features,3
9,Categorical features,17


In [14]:
# exp_clf1 = setup(data=dataset, target='Churn', fix_imbalance_method='SMOTE', session_id=124)

In [15]:
# Comparing all model
compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
gbc,Gradient Boosting Classifier,0.7913,0.8105,0.406,0.6782,0.5066,0.3852,0.4061,0.552
ridge,Ridge Classifier,0.7909,0.0,0.3769,0.6962,0.4879,0.3712,0.3992,0.238
ada,Ada Boost Classifier,0.7864,0.8031,0.3731,0.678,0.4802,0.3601,0.386,0.376
lda,Linear Discriminant Analysis,0.7858,0.8081,0.4328,0.6451,0.5169,0.3863,0.3996,0.349
lr,Logistic Regression,0.7844,0.8081,0.445,0.6353,0.5222,0.3887,0.3996,2.373
lightgbm,Light Gradient Boosting Machine,0.7822,0.8,0.435,0.6291,0.5136,0.3795,0.3906,0.864
qda,Quadratic Discriminant Analysis,0.7686,0.8051,0.513,0.5721,0.5398,0.3861,0.3878,0.204
nb,Naive Bayes,0.7682,0.8036,0.5069,0.5719,0.5365,0.3829,0.3847,0.309
knn,K Neighbors Classifier,0.7669,0.758,0.4664,0.5771,0.5149,0.364,0.3681,1.337
rf,Random Forest Classifier,0.7554,0.7658,0.448,0.5492,0.4926,0.3338,0.3373,0.459


### Creating a Gradient Boosting Classifier

In [16]:
# create a model
gbc = create_model('gbc')

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.7972,0.8284,0.4462,0.6744,0.537,0.414,0.4285
1,0.7911,0.8083,0.4308,0.6588,0.5209,0.3947,0.4093
2,0.7647,0.7811,0.313,0.6119,0.4141,0.2857,0.3108
3,0.8032,0.8148,0.4198,0.7237,0.5314,0.4178,0.4426
4,0.787,0.8084,0.4351,0.6477,0.5205,0.3904,0.4031
5,0.7931,0.7868,0.4275,0.6747,0.5234,0.3996,0.4166
6,0.7951,0.8325,0.374,0.7206,0.4925,0.3798,0.4119
7,0.8012,0.8012,0.4122,0.72,0.5243,0.4101,0.4356
8,0.8032,0.8274,0.4656,0.6932,0.5571,0.4368,0.4511
9,0.7769,0.8161,0.3359,0.6567,0.4444,0.3226,0.351


### Hyperparameter Tuning

In [17]:
tuned_gbc = tune_model(gbc)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.7972,0.8332,0.4154,0.6923,0.5192,0.4007,0.4217
1,0.7911,0.8012,0.4,0.6753,0.5024,0.381,0.4019
2,0.7708,0.7934,0.2977,0.65,0.4084,0.2898,0.3238
3,0.787,0.8179,0.3206,0.7241,0.4444,0.3362,0.379
4,0.785,0.8057,0.3511,0.6866,0.4646,0.3473,0.3778
5,0.7972,0.7941,0.3893,0.7183,0.505,0.3912,0.4203
6,0.7972,0.8377,0.3511,0.7541,0.4792,0.3734,0.4155
7,0.8032,0.8092,0.374,0.7656,0.5026,0.3975,0.4371
8,0.787,0.8288,0.3817,0.6757,0.4878,0.3662,0.39
9,0.7627,0.8126,0.2443,0.64,0.3536,0.2424,0.2847


Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).


### Model Evaluation

In [18]:
evaluate_model(gbc)

interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Pipeline Plot', 'pipelin…

In [19]:
predict_model(gbc)

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Gradient Boosting Classifier,0.7832,0.8121,0.3476,0.6794,0.4599,0.3416,0.3716


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,prediction_label,prediction_score
2937,5312-IRCFR,Female,0,Yes,Yes,64,Yes,Yes,Fiber optic,No,...,No,Yes,One year,Yes,Electronic check,92.849998,5980.75,0,0,0.9048
3276,4210-QFJMF,Female,0,No,No,4,Yes,No,Fiber optic,No,...,No,Yes,Month-to-month,Yes,Electronic check,79.150002,317.25,1,1,0.6998
4374,2718-YSKCS,Male,0,Yes,Yes,71,Yes,No,No,No internet service,...,No internet service,No internet service,Two year,Yes,Bank transfer (automatic),19.600000,1387.45,0,0,0.9776
4375,9896-UYMIE,Male,0,No,No,66,Yes,Yes,Fiber optic,Yes,...,Yes,Yes,One year,Yes,Bank transfer (automatic),114.300003,7383.7,0,0,0.9390
237,9903-LYSAB,Male,0,Yes,No,18,Yes,Yes,Fiber optic,No,...,No,No,Month-to-month,Yes,Electronic check,73.150002,1305.95,0,0,0.6802
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4496,9489-JMTTN,Female,0,Yes,Yes,72,Yes,Yes,DSL,Yes,...,Yes,Yes,Two year,No,Credit card (automatic),89.750000,6595.9,0,0,0.9048
921,8942-DBMHZ,Male,0,No,No,12,Yes,No,No,No internet service,...,No internet service,No internet service,Month-to-month,No,Mailed check,20.450001,255.35,0,0,0.9074
5904,9402-CXWPL,Female,0,No,No,70,Yes,Yes,Fiber optic,No,...,Yes,Yes,One year,No,Electronic check,98.900002,6838.6,0,0,0.8913
3088,5751-USDBL,Male,0,Yes,Yes,46,Yes,No,DSL,Yes,...,Yes,Yes,Two year,No,Mailed check,81.000000,3846.35,0,0,0.7811


### Model Prediction

In [20]:
unseen_prediction = predict_model(gbc, data=test_data)
unseen_prediction

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Gradient Boosting Classifier,0.802,0.8445,0.4199,0.7339,0.5342,0.4201,0.4465


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,prediction_label,prediction_score
0,2320-JRSDE,Female,0,Yes,Yes,1,Yes,No,No,No internet service,...,No internet service,No internet service,Month-to-month,Yes,Electronic check,19.900000,19.9,1,0,0.5923
1,2087-QAREY,Female,0,Yes,No,22,Yes,No,DSL,No,...,No,No,Month-to-month,Yes,Mailed check,54.700001,1178.75,0,0,0.8096
2,0601-WZHJF,Male,0,Yes,No,14,No,No phone service,DSL,No,...,Yes,Yes,Month-to-month,No,Electronic check,46.349998,667.7,1,0,0.7387
3,4423-JWZJN,Male,0,Yes,Yes,64,Yes,Yes,Fiber optic,No,...,No,Yes,One year,No,Credit card (automatic),90.250000,5629.15,0,0,0.9407
4,5143-WMWOG,Male,0,No,No,1,Yes,No,No,No internet service,...,No internet service,No internet service,Month-to-month,No,Electronic check,19.950001,19.95,1,0,0.6575
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1404,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,One year,Yes,Mailed check,84.800003,1990.5,0,0,0.6241
1405,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,Yes,One year,Yes,Credit card (automatic),103.199997,7362.9,0,0,0.8876
1406,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,Month-to-month,Yes,Electronic check,29.600000,346.45,0,0,0.8415
1407,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,Month-to-month,Yes,Mailed check,74.400002,306.6,1,1,0.5972


### Saving the Model

In [21]:
save_model(gbc, 'gbc_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['SeniorCitizen', 'tenure',
                                              'MonthlyCharges'],
                                     transformer=SimpleImputer(add_indicator=False,
                                                               copy=True,
                                                               fill_value=None,
                                                               keep_empty_features=False,
                                                               missing_values=nan,
                                                               strategy='mean',
                                                               verbose='deprecated'))),
                 ('categorical_imputer',
                  TransformerWrapper(e...
                                             criterion='friedman_mse',

The Telcom Churn Prediction involves loading and exploring the dataset, preprocessing data, and building a predictive model using the PyCaret library. The Gradient Boosting Classifier is identified as a suitable model, and its hyperparameters are tuned for improved performance. The model is evaluated, and predictions are made on both training and unseen test data. The final model is saved for potential deployment in real-world scenarios.
