**Customer Experience Prediction**

* **Understanding the problem statement**

Globally, the number of mobile-cellular subscriptions is approaching the number of people
on the planet, with emerging countries accounting for more than three-quarters of the
total. GPS navigation, voice and text over data, and social media exchanges are just a
few instances of how we are becoming increasingly reliant on our mobile phones. We
expect to be online at all times because our business and personal lives would be
disrupted if we weren't. Telecommunications operators (telcos) are struggling to meet
these high expectations in a market where traditional phone and text plans are being
phased out in favor of data offerings that support a wide range of mobile apps. For telcos,
a clear, up-to-date understanding of customer experience and satisfaction is a
critical competitive advantage. At the same time, telcos must cope with the
massive amounts of data that mobile consumers generate every second.

* **Main Goal of the Project**

To measure and predict the quality of experience (QoE) that a user has while using a telecom network, in real time and based on data.

This will help the company:

1. Fix problems faster (such as slow speeds or dropped calls).

2. Keep customers happy.

3. Reduce customer churn (customers switching to other companies).

* **Data Collection**

Source - Kaggle <br>
Link - https://www.kaggle.com/datasets/blastchar/telco-customer-churn

* **Import Required Packages & Data Frame**

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings 
warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("E:/users/USER/Desktop/DataAnalyticsAndgenAI/ML Project/Telecom/nootbook/WA_Fn-UseC_-Telco-Customer-Churn.csv")

* **Profile Of Data**

In [3]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [4]:
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [5]:
df.shape

(7043, 21)

In [6]:
df.dtypes

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

In [7]:
df.duplicated().sum()

np.int64(0)

In [8]:
df.isnull().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


This data frame has no null values and no duplicate rows.
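One caveat on the null check: `isnull()` only detects real missing values, and `TotalCharges` is stored as `object` rather than a numeric type, so blank or non-numeric strings in that column would go unnoticed. A minimal sketch of how this could be checked is shown below. It uses a small hypothetical frame (`sample`, with made-up values) instead of the real file, and `pd.to_numeric` with `errors="coerce"` to turn anything unparseable into `NaN`.

```python
import pandas as pd

# Hypothetical sample mimicking an object-typed TotalCharges column;
# the values are invented for illustration, not read from the dataset.
sample = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

# isnull() reports nothing here because " " is a valid (non-null) string.
missing_before = sample["TotalCharges"].isnull().sum()

# Convert to numbers; entries that cannot be parsed become NaN.
numeric = pd.to_numeric(sample["TotalCharges"], errors="coerce")
missing_after = numeric.isnull().sum()

print("Missing before conversion:", missing_before)
print("Missing after conversion: ", missing_after)
```

Applied to the real data, the same two lines on `df["TotalCharges"]` would show whether any hidden blanks exist before the column is used in numeric analysis.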

* **Statistical Analysis**

In [10]:
df.describe()

| | SeniorCitizen | tenure | MonthlyCharges |
|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.0 | 0.0 | 18.25 |
| 25% | 0.0 | 9.0 | 35.5 |
| 50% | 0.0 | 29.0 | 70.35 |
| 75% | 0.0 | 55.0 | 89.85 |
| max | 1.0 | 72.0 | 118.75 |


In [11]:
df.corr(numeric_only=True)

| | SeniorCitizen | tenure | MonthlyCharges |
|---|---|---|---|
| SeniorCitizen | 1.0 | 0.016567 | 0.220173 |
| tenure | 0.016567 | 1.0 | 0.2479 |
| MonthlyCharges | 0.220173 | 0.2479 | 1.0 |


Q. What percentage of customers are Senior Citizens?

In [12]:
# 0 => not a senior citizen
# 1 => senior citizen

print("Percentage of customers who are senior citizens:", len(df[df["SeniorCitizen"] == 1]) / len(df) * 100, "%")

Percentage of customers who are senior citizens: 16.21468124378816 %


Q. Which InternetService type (DSL, Fiber optic, No) has the highest churn rate?

In [13]:
# Churned => "Yes"
# Not churned => "No"
# Number of churned customers per InternetService type (counts, not rates)
df[df["Churn"] == "Yes"].groupby("InternetService")["Churn"].count().sort_values(ascending=False)

InternetService
Fiber optic    1297
DSL             459
No              113
Name: Churn, dtype: int64
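The output above counts churned customers per service type. A churn rate also needs the number of customers in each group as the denominator. A minimal sketch of one way to compute it follows, assuming `Churn` still holds the strings "Yes"/"No" at this point. The helper name `churn_rate_by` and the `toy` frame with its values are hypothetical and only illustrate the calculation.

```python
import pandas as pd

def churn_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of churned customers within each value of `column`."""
    churned = frame["Churn"].eq("Yes")  # True where the customer churned
    # The mean of a boolean Series is the share of True values per group.
    rate = churned.groupby(frame[column]).mean() * 100
    return rate.sort_values(ascending=False)

# Hypothetical toy data, not taken from the real dataset.
toy = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber optic", "Fiber optic",
                        "No", "No", "No", "No"],
    "Churn":           ["No", "Yes", "Yes", "Yes",
                        "No", "No", "No", "Yes"],
})

rates = churn_rate_by(toy, "InternetService")
print(rates)
```

On the real data, `churn_rate_by(df, "InternetService")` would give the percentage per service type, which is the quantity the question asks for.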

Q. What is the churn rate for customers with and without TechSupport?

In [14]:
# Number of customers in each TechSupport category (all customers, not only churned ones)
df.groupby("TechSupport")["Churn"].count()

TechSupport
No                     3473
No internet service    1526
Yes                    2044
Name: Churn, dtype: int64

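**Issue found <br> Many customers were not provided with tech support (3,473 with "No" against 2,044 with "Yes"). These counts cover all customers, so confirming the link to churn requires splitting them by churn status.**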
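To compare churn with and without tech support, the counts need to be split by churn status and expressed as shares within each category. One possible approach is `pd.crosstab` with `normalize="index"`, sketched below on a hypothetical `toy` frame (the values are invented for illustration and do not come from the dataset).

```python
import pandas as pd

# Hypothetical toy data, not taken from the real dataset.
toy = pd.DataFrame({
    "TechSupport": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
    "Churn":       ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})

# normalize="index" converts each row to shares that sum to 1,
# so every TechSupport category gets its own churn percentage.
table = pd.crosstab(toy["TechSupport"], toy["Churn"], normalize="index") * 100
print(table.round(1))
```

The "Yes" column of the resulting table is the churn rate for each `TechSupport` category. The same call with `df["TechSupport"]` and `df["Churn"]` would apply to the real data.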

Q. How many Senior Citizens are in the dataset?

In [18]:
# 0 = not a senior citizen
# 1 = senior citizen
print(f"{len(df[df['SeniorCitizen'] == 1])} customers are senior citizens")

1142 customers are senior citizens


Q. What is the correlation between tenure and churn?



In [24]:
#No -> 0
#Yes -> 1
df["Churn"] = df["Churn"].map({"No":0,"Yes":1})

In [27]:
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"Correlation between tenure and churn: {df['Churn'].corr(df['tenure'])}")
print("Customers who have been with the company for a short period are more likely to churn.")

Correlation between tenure and churn: -0.3522286701130777
Customers who have been with the company for a short period are more likely to churn.


Q. How does Contract type (Month-to-month, One year, Two year) affect churn?

In [36]:
df[df['Churn'] == 1].groupby('Contract').size()

Contract
Month-to-month    1655
One year           166
Two year            48
dtype: int64
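The counts above show how many churned customers hold each contract type, but not how large each contract group is. A minimal sketch of reporting counts and rates together is given below, assuming `Churn` has already been mapped to 0/1 as in the earlier cell. The `toy` frame and its values are hypothetical, and the output column names (`churned`, `customers`, `churn_rate`) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical toy data with Churn already encoded as 0/1.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn":    [1, 1, 1, 0, 1, 0, 0, 0],
})

# Named aggregation: the sum of a 0/1 column is the number of churners,
# the count is the group size, and the mean is the churn rate.
summary = toy.groupby("Contract")["Churn"].agg(
    churned="sum",
    customers="count",
    churn_rate="mean",
)
print(summary)
```

Running the same aggregation on `df` would put the churn rate next to the raw counts for each contract type, which makes groups of different sizes comparable.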