In [8]:
# Importing libraries
import os
import io
import itertools
import warnings
warnings.filterwarnings("ignore")

import numpy as np                 # linear algebra
import pandas as pd                # data processing, file I/O (e.g. pd.read_csv, pd.read_excel)
import matplotlib.pyplot as plt    # visualization
import matplotlib.ticker as mtick  # for specifying the axes tick format
import seaborn as sns              # visualization
from PIL import Image
%matplotlib inline

import plotly.offline as py          # visualization
import plotly.graph_objs as go       # visualization
import plotly.tools as tls           # visualization
import plotly.figure_factory as ff   # visualization
py.init_notebook_mode(connected=True)

In [None]:
df = pd.read_excel('customer_retention_dataset.xlsx')


# Data Information
Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers.

Customer churn is a key metric for businesses. Service-based industries such as subscription TV providers, telecommunication companies, internet service providers and insurance companies benefit from analysing their customer data to identify the features most associated with customers leaving the service. The customer-relationship management departments of these companies handle customer retention and defection, because keeping an existing customer or winning back a former one is more cost-efficient than acquiring a new client.

Companies usually define the distinction between voluntary and involuntary churn themselves. Customers who switch to another provider by choice fall under voluntary churn, whereas customers who leave because of circumstances such as relocation fall under involuntary churn. Companies focus on voluntary churn because it is the part they can influence through their customer interactions.

Churn prediction is a form of predictive analytics: it assesses each customer's interactions with the company and estimates their risk of churning. Retention marketing programs then use these estimates to target the customers at highest risk.

Churn matters because retaining existing customers is much cheaper than acquiring new ones. Earning business from a new customer means working a lead all the way through the sales funnel and spending marketing and sales resources throughout the process, whereas existing customers have already been won over and their trust and loyalty earned.



# Dataset

The data set includes information about:

- Customers who left within the last month: the column is called `Churn`
- Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information: how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic information about customers: gender, whether they are a senior citizen, and whether they have partners and dependents

The data set columns are as follows:

- `customerID`: unique ID for each customer
- `gender`: whether the customer is male or female
- `SeniorCitizen`: whether the customer is a senior citizen or not (1, 0)
- `Partner`: whether the customer has a partner or not (Yes, No)
- `Dependents`: whether the customer has dependents or not (Yes, No)
- `tenure`: number of months the customer has stayed with the company
- `PhoneService`: whether the customer has a phone service or not (Yes, No)
- `MultipleLines`: whether the customer has multiple lines or not (Yes, No, No phone service)
- `InternetService`: customer's internet service provider (DSL, Fiber optic, No)
- `OnlineSecurity`: whether the customer has online security or not (Yes, No, No internet service)
- `OnlineBackup`: whether the customer has online backup or not (Yes, No, No internet service)
- `DeviceProtection`: whether the customer has device protection or not (Yes, No, No internet service)
- `TechSupport`: whether the customer has tech support or not (Yes, No, No internet service)
- `StreamingTV`: whether the customer has streaming TV or not (Yes, No, No internet service)
- `StreamingMovies`: whether the customer has streaming movies or not (Yes, No, No internet service)
- `Contract`: the contract term of the customer (Month-to-month, One year, Two year)
- `PaperlessBilling`: whether the customer has paperless billing or not (Yes, No)
- `PaymentMethod`: the customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
- `MonthlyCharges`: the amount charged to the customer monthly
- `TotalCharges`: the total amount charged to the customer
- `Churn`: whether the customer churned or not (Yes, No)
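The column list above can double as a simple data contract. A minimal sketch, assuming you want to fail fast when the categorical levels in the file drift from the documented ones; `EXPECTED_LEVELS` and `unexpected_levels` are hypothetical helpers (not part of this notebook), and only a few columns are listed for brevity.

```python
import pandas as pd

# Documented levels for a few categorical columns, copied from the list above.
EXPECTED_LEVELS = {
    "gender": {"Male", "Female"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    "Contract": {"Month-to-month", "One year", "Two year"},
    "Churn": {"Yes", "No"},
}

def unexpected_levels(frame, expected=EXPECTED_LEVELS):
    """Return {column: values not among the documented levels} for problem columns."""
    problems = {}
    for col, allowed in expected.items():
        if col not in frame.columns:
            problems[col] = {"<missing column>"}
            continue
        extra = set(frame[col].dropna().unique()) - allowed
        if extra:
            problems[col] = extra
    return problems

# Toy frame standing in for the real data; "Monthly" is a deliberate bad value.
toy = pd.DataFrame({
    "gender": ["Male", "Female"],
    "InternetService": ["DSL", "No"],
    "Contract": ["Monthly", "Two year"],
    "Churn": ["Yes", "No"],
})
print(unexpected_levels(toy))  # {'Contract': {'Monthly'}}
```

In the notebook, calling `unexpected_levels(df)` right after `pd.read_excel` and checking that the result is empty would be one way to use it.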

In [None]:
df.head()

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
customerID          7043 non-null object
gender              7043 non-null object
SeniorCitizen       7043 non-null int64
Partner             7043 non-null object
Dependents          7043 non-null object
tenure              7043 non-null int64
PhoneService        7043 non-null object
MultipleLines       7043 non-null object
InternetService     7043 non-null object
OnlineSecurity      7043 non-null object
OnlineBackup        7043 non-null object
DeviceProtection    7043 non-null object
TechSupport         7043 non-null object
StreamingTV         7043 non-null object
StreamingMovies     7043 non-null object
Contract            7043 non-null object
PaperlessBilling    7043 non-null object
PaymentMethod       7043 non-null object
MonthlyCharges      7043 non-null float64
TotalCharges        7043 non-null object
Churn               7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

# Data Preprocessing
Since `TotalCharges` is stored as an object (text) column, we need to convert it to a numeric type.

In [None]:
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")  # unparseable strings become NaN
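Why `errors="coerce"` matters: with the default `errors="raise"`, a single non-numeric string makes the whole conversion fail. The toy series below assumes the offending entries are blank strings (the notebook does not show the raw values); with coercion they become `NaN`, which is what the null check that follows picks up.

```python
import pandas as pd

# Toy stand-in for the raw TotalCharges column; one entry is a blank string.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" converts anything unparseable to NaN instead of raising.
total = pd.to_numeric(raw, errors="coerce")

print(total.dtype)         # float64
print(total.isna().sum())  # 1
```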


Now check for null values introduced by the conversion:

In [None]:
df.isnull().sum()


customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

Since only 11 rows have null values (all in `TotalCharges`), we drop these rows from the dataframe.

In [None]:
# Drop the rows with missing values (only TotalCharges has any)
df = df.dropna()
df.isnull().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
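In the notebook, 11 of 7,043 rows (about 0.16%) are affected, so dropping them costs little. A small sketch of the same step on toy data, checking the share of rows lost before dropping and using `dropna(subset=...)` so that only a missing `TotalCharges` can remove a row; with this data it gives the same result as the plain `dropna()` above, the subset form just makes the intent explicit.

```python
import numpy as np
import pandas as pd

# Toy frame: as in the notebook, only TotalCharges has missing values.
toy = pd.DataFrame({
    "tenure": [1, 34, 45, 2],
    "TotalCharges": [29.85, 1889.5, np.nan, 108.15],
})

# Share of rows that would be lost, worth checking before dropping anything.
missing_share = toy["TotalCharges"].isna().mean()

# Drop rows only when TotalCharges is missing, not for NaNs in other columns.
cleaned = toy.dropna(subset=["TotalCharges"])
print(missing_share, len(cleaned))  # 0.25 3
```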

In [None]:
# Data Manipulation

# Replace 'No internet service' with 'No' for the internet add-on columns
replace_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies']
for col in replace_cols:
    df[col] = df[col].replace({'No internet service': 'No'})

# Encode SeniorCitizen as Yes/No like the other binary columns
df["SeniorCitizen"] = df["SeniorCitizen"].replace({1: "Yes", 0: "No"})

# Tenure to categorical column.
# Note: the 25-36, 37-48 and 49-60 month bins are labelled 24-36, 36-48 and 48-60.
def tenure_lab(row):
    tenure = row["tenure"]
    if tenure <= 6:
        return "Months_1-6"
    elif tenure <= 12:
        return "Months_7-12"
    elif tenure <= 18:
        return "Months_13-18"
    elif tenure <= 24:
        return "Months_19-24"
    elif tenure <= 36:
        return "Months_24-36"
    elif tenure <= 48:
        return "Months_36-48"
    elif tenure <= 60:
        return "Months_48-60"
    else:
        return "Months_60+"

df["tenure_group"] = df.apply(tenure_lab, axis=1)

# Separating churn and non-churn customers
churn     = df[df["Churn"] == "Yes"]
not_churn = df[df["Churn"] == "No"]

# Separating categorical and numerical columns
Id_col     = ['customerID']
target_col = ["Churn"]
cat_cols   = df.nunique()[df.nunique() < 6].keys().tolist()
cat_cols   = [x for x in cat_cols if x not in target_col]
# tenure_group has 8 levels, so exclude it explicitly from the numeric columns
num_cols   = [x for x in df.columns
              if x not in cat_cols + target_col + Id_col + ["tenure_group"]]

df.head()

	customerID	gender	SeniorCitizen	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	...	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn	tenure_group
0	7590-VHVEG	Female	No	Yes	No	1	No	No phone service	DSL	No	...	No	No	No	Month-to-month	Yes	Electronic check	29.85	29.85	No	Months_1-6
1	5575-GNVDE	Male	No	No	No	34	Yes	No	DSL	Yes	...	No	No	No	One year	No	Mailed check	56.95	1889.50	No	Months_24-36
2	3668-QPYBK	Male	No	No	No	2	Yes	No	DSL	Yes	...	No	No	No	Month-to-month	Yes	Mailed check	53.85	108.15	Yes	Months_1-6
3	7795-CFOCW	Male	No	No	No	45	No	No phone service	DSL	Yes	...	Yes	No	No	One year	No	Bank transfer (automatic)	42.30	1840.75	No	Months_36-48
4	9237-HQITU	Female	No	No	No	2	Yes	No	Fiber optic	No	...	No	No	No	Month-to-month	Yes	Electronic check	70.70	151.65	Yes	Months_1-6
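The row-wise `apply` above calls a Python function once per customer. A vectorised sketch with `pd.cut` that reproduces the same bins and the same label strings (including the overlapping-looking names such as `Months_24-36`, which actually covers 25 to 36 months); `tenure_group_cut` is a hypothetical helper name, not something the notebook defines.

```python
import numpy as np
import pandas as pd

# Same boundaries as tenure_lab(); right=True makes each bin include its upper edge,
# so the bins are (-inf, 6], (6, 12], ..., (60, inf).
bins = [-np.inf, 6, 12, 18, 24, 36, 48, 60, np.inf]
labels = ["Months_1-6", "Months_7-12", "Months_13-18", "Months_19-24",
          "Months_24-36", "Months_36-48", "Months_48-60", "Months_60+"]

def tenure_group_cut(tenure):
    """Vectorised tenure binning that returns plain strings like tenure_lab()."""
    return pd.cut(tenure, bins=bins, labels=labels, right=True).astype(str)

tenure = pd.Series([1, 6, 7, 24, 25, 60, 61, 72])
print(tenure_group_cut(tenure).tolist())
```

Usage in the notebook would be `df["tenure_group"] = tenure_group_cut(df["tenure"])`. The `.astype(str)` keeps the column an ordinary object column, so later steps such as `df.nunique()` and `groupby` behave as they do with `apply`; without it, `pd.cut` returns an ordered categorical.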

# Visualization

In [None]:
ax = (df['gender'].value_counts() /70.32).plot(kind='bar',stacked = True,rot = 0,color = ['#0000FF','#FFC0CB'])
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_title('Gender Distribution of Customers')
ax.set_xlabel('Gender of Customers')
ax.set_ylabel('Percentage of Customers')



# create a list to collect the plt.patches data
totals = []

# # find the values and append to list
for i in ax.patches:
    print(i)
    totals.append(i.get_width())

# set individual bar lables using above list
total = sum(totals)

for i in ax.patches:
    # get_width pulls left or right; get_y pushes up or down
    ax.text(i.get_x()+.15, i.get_height()-3.5, \
            str(round((i.get_height()/total), 1))+'%',fontsize=12,color='white',weight = 'bold')

In [None]:
Rectangle(xy=(-0.25, 0), width=0.5, height=50.4693, angle=0)
Rectangle(xy=(0.75, 0), width=0.5, height=49.5307, angle=0)

In [None]:
#labels
lab = df["Churn"].value_counts().keys().tolist()
#values
val = df["Churn"].value_counts().values.tolist()

trace = go.Pie(labels = lab ,
               values = val ,
               marker = dict(colors =  [ 'royalblue' ,'lime'],
                             line = dict(color = "white",
                                         width =  1.3)
                            ),
               rotation = 90,
               hoverinfo = "label+value+text",
               hole = .5
              )
layout = go.Layout(dict(title = "Customer attrition in data",
                        plot_bgcolor  = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                       )
                  )

data = [trace]
fig = go.Figure(data = data,layout = layout)
py.iplot(fig)
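The donut chart shows the churn share visually. As a number, the churn rate for this snapshot is the share of customers labelled `Yes`. A minimal sketch on toy labels (in the notebook the input would be `df["Churn"]`):

```python
import pandas as pd

# Toy churn labels; in the notebook this would be df["Churn"].
churn = pd.Series(["No", "Yes", "No", "No"])

# normalize=True returns proportions instead of counts.
shares = churn.value_counts(normalize=True)
churn_rate = shares.get("Yes", 0.0)
print(f"churn rate: {churn_rate:.1%}")  # churn rate: 25.0%
```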

In [None]:
# customer attrition in tenure groups
tg_ch  =  churn["tenure_group"].value_counts().reset_index()
tg_ch.columns  = ["tenure_group","count"]
tg_nch =  not_churn["tenure_group"].value_counts().reset_index()
tg_nch.columns = ["tenure_group","count"]

#bar - churn
trace1 = go.Bar(x = tg_ch["tenure_group"]  , y = tg_ch["count"],
                name = "Churn Customers",
                marker = dict(line = dict(width = .5,color = "black")),
                opacity = .9)

#bar - not churn
trace2 = go.Bar(x = tg_nch["tenure_group"] , y = tg_nch["count"],
                name = "Non Churn Customers",
                marker = dict(line = dict(width = .5,color = "black")),
                opacity = .9)

layout = go.Layout(dict(title = "Customer attrition in tenure groups",
                        plot_bgcolor  = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "tenure group",
                                     zerolinewidth=1,ticklen=5,gridwidth=2),
                        yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "count",
                                     zerolinewidth=1,ticklen=5,gridwidth=2),
                       )
                  )
data = [trace1,trace2]
fig  = go.Figure(data=data,layout=layout)
py.iplot(fig)
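Side-by-side counts make groups of different sizes hard to compare, because a large group can have many churners and still a low churn rate. A sketch that turns the same data into a per-group churn rate, assuming `Churn` still holds the strings `Yes`/`No` as it does at this point in the notebook; the rows below are toy values, not results from the dataset.

```python
import pandas as pd

# Toy rows standing in for df[["tenure_group", "Churn"]].
toy = pd.DataFrame({
    "tenure_group": ["Months_1-6", "Months_1-6", "Months_1-6",
                     "Months_60+", "Months_60+"],
    "Churn": ["Yes", "Yes", "No", "No", "No"],
})

# The mean of a True/False indicator within each group is that group's churn rate.
rate_by_group = (toy["Churn"].eq("Yes")
                 .groupby(toy["tenure_group"])
                 .mean())
print(rate_by_group)
```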

In [None]:
avg_tgc = df.groupby(["tenure_group","Churn"])[["MonthlyCharges",
                                                    "TotalCharges"]].mean().reset_index()

# Build a bar trace of the mean `column` per tenure group for customers with Churn == aggregate
def mean_charges(column,aggregate) :
    tracer = go.Bar(x = avg_tgc[avg_tgc["Churn"] == aggregate]["tenure_group"],
                    y = avg_tgc[avg_tgc["Churn"] == aggregate][column],
                    name = aggregate,marker = dict(line = dict(width = 1)),
                    text = "Churn"
                   )
    return tracer

#function for layout
def layout_plot(title,xaxis_lab,yaxis_lab) :
    layout = go.Layout(dict(title = title,
                            plot_bgcolor  = "rgb(243,243,243)",
                            paper_bgcolor = "rgb(243,243,243)",
                            xaxis = dict(gridcolor = 'rgb(255, 255, 255)',title = xaxis_lab,
                                         zerolinewidth=1,ticklen=5,gridwidth=2),
                            yaxis = dict(gridcolor = 'rgb(255, 255, 255)',title = yaxis_lab,
                                         zerolinewidth=1,ticklen=5,gridwidth=2),
                           )
                      )
    return layout
    

#plot1 - mean monthly charges by tenure groups
trace1  = mean_charges("MonthlyCharges","Yes")
trace2  = mean_charges("MonthlyCharges","No")
layout1 = layout_plot("Average Monthly Charges by Tenure groups",
                      "Tenure group","Monthly Charges")
data1   = [trace1,trace2]
fig1    = go.Figure(data=data1,layout=layout1)

py.iplot(fig1)


In [None]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

# Copy data (tel_df keeps the tenure_group column)
tel_df = df.copy()
# Drop tenure group column
df = df.drop(columns="tenure_group")

# customer id col
Id_col     = ['customerID']
# Target columns
target_col = ["Churn"]
# categorical columns
cat_cols   = df.nunique()[df.nunique() < 6].keys().tolist()
cat_cols   = [x for x in cat_cols if x not in target_col]
# numerical columns
num_cols   = [x for x in df.columns if x not in cat_cols + target_col + Id_col]
# Binary columns with 2 values (this includes the Churn target)
bin_cols   = df.nunique()[df.nunique() == 2].keys().tolist()
# Columns with more than 2 values
multi_cols = [i for i in cat_cols if i not in bin_cols]

# Label encoding binary columns (LabelEncoder sorts values, e.g. No -> 0, Yes -> 1)
le = LabelEncoder()
for i in bin_cols:
    df[i] = le.fit_transform(df[i])

# One-hot encode the multi-value columns
df = pd.get_dummies(data=df, columns=multi_cols)

# Drop the customerID column
df = df.drop(columns=Id_col)
df.head()
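`LabelEncoder` assigns codes in sorted order of the distinct values, which is why `gender` 0 means Female and `Churn` 1 means churn in the plots further down. A small sketch on toy data showing that ordering and the column names `pd.get_dummies` produces (`<column>_<level>`, appended after the other columns).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "Churn": ["No", "Yes", "No"],
    "Contract": ["Month-to-month", "Two year", "One year"],
})

# Codes follow sorted order: Female -> 0, Male -> 1 and No -> 0, Yes -> 1.
le = LabelEncoder()
for col in ["gender", "Churn"]:
    toy[col] = le.fit_transform(toy[col])

# One indicator column per Contract level, named "Contract_<level>".
toy = pd.get_dummies(toy, columns=["Contract"])
print(toy.columns.tolist())
```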

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7032 entries, 0 to 7042
Data columns (total 29 columns):
gender                                     7032 non-null int32
SeniorCitizen                              7032 non-null int32
Partner                                    7032 non-null int32
Dependents                                 7032 non-null int32
tenure                                     7032 non-null int64
PhoneService                               7032 non-null int32
OnlineSecurity                             7032 non-null int32
OnlineBackup                               7032 non-null int32
DeviceProtection                           7032 non-null int32
TechSupport                                7032 non-null int32
StreamingTV                                7032 non-null int32
StreamingMovies                            7032 non-null int32
PaperlessBilling                           7032 non-null int32
MonthlyCharges                             7032 non-null float64
TotalCharges                               7032 non-null float64
Churn                                      7032 non-null int32
MultipleLines_No                           7032 non-null uint8
MultipleLines_No phone service             7032 non-null uint8
MultipleLines_Yes                          7032 non-null uint8
InternetService_DSL                        7032 non-null uint8
InternetService_Fiber optic                7032 non-null uint8
InternetService_No                         7032 non-null uint8
Contract_Month-to-month                    7032 non-null uint8
Contract_One year                          7032 non-null uint8
Contract_Two year                          7032 non-null uint8
PaymentMethod_Bank transfer (automatic)    7032 non-null uint8
PaymentMethod_Credit card (automatic)      7032 non-null uint8
PaymentMethod_Electronic check             7032 non-null uint8
PaymentMethod_Mailed check                 7032 non-null uint8
dtypes: float64(2), int32(13), int64(1), uint8(13)
memory usage: 666.1 Kb

In [None]:
df.describe()


In [None]:
plt.figure(figsize=(15,8))
df.corr()['Churn'].sort_values(ascending = False).plot(kind='bar')


In [None]:
sns.heatmap(df.corr())
df.corr()

	gender	SeniorCitizen	Partner	Dependents	tenure	PhoneService	OnlineSecurity	OnlineBackup	DeviceProtection	TechSupport	...	InternetService_DSL	InternetService_Fiber optic	InternetService_No	Contract_Month-to-month	Contract_One year	Contract_Two year	PaymentMethod_Bank transfer (automatic)	PaymentMethod_Credit card (automatic)	PaymentMethod_Electronic check	PaymentMethod_Mailed check
gender	1.000000	-0.001819	-0.001379	0.010349	0.005285	-0.007515	-0.016328	-0.013093	-0.000807	-0.008507	...	0.007584	-0.011189	0.004745	-0.003251	0.007755	-0.003603	-0.015973	0.001632	0.000844	0.013199
SeniorCitizen	-0.001819	1.000000	0.016957	-0.210550	0.015683	0.008392	-0.038576	0.066663	0.059514	-0.060577	...	-0.108276	0.254923	-0.182519	0.137752	-0.046491	-0.116205	-0.016235	-0.024359	0.171322	-0.152987
Partner	-0.001379	0.016957	1.000000	0.452269	0.381912	0.018397	0.143346	0.141849	0.153556	0.120206	...	-0.001043	0.001235	-0.000286	-0.280202	0.083067	0.247334	0.111406	0.082327	-0.083207	-0.096948
Dependents	0.010349	-0.210550	0.452269	1.000000	0.163386	-0.001078	0.080786	0.023639	0.013900	0.063053	...	0.051593	-0.164101	0.138383	-0.229715	0.069222	0.201699	0.052369	0.061134	-0.149274	0.056448
tenure	0.005285	0.015683	0.381912	0.163386	1.000000	0.007877	0.328297	0.361138	0.361520	0.325288	...	0.013786	0.017930	-0.037529	-0.649346	0.202338	0.563801	0.243822	0.232800	-0.210197	-0.232181
PhoneService	-0.007515	0.008392	0.018397	-0.001078	0.007877	1.000000	-0.091676	-0.052133	-0.070076	-0.095138	...	-0.452255	0.290183	0.171817	-0.001243	-0.003142	0.004442	0.008271	-0.006916	0.002747	-0.004463
OnlineSecurity	-0.016328	-0.038576	0.143346	0.080786	0.328297	-0.091676	1.000000	0.283285	0.274875	0.354458	...	0.320343	-0.030506	-0.332799	-0.246844	0.100658	0.191698	0.094366	0.115473	-0.112295	-0.079918
OnlineBackup	-0.013093	0.066663	0.141849	0.023639	0.361138	-0.052133	0.283285	1.000000	0.303058	0.293705	...	0.156765	0.165940	-0.380990	-0.164393	0.084113	0.111391	0.086942	0.090455	-0.000364	-0.174075
DeviceProtection	-0.000807	0.059514	0.153556	0.013900	0.361520	-0.070076	0.274875	0.303058	1.000000	0.332850	...	0.145150	0.176356	-0.380151	-0.225988	0.102911	0.165248	0.083047	0.111252	-0.003308	-0.187325
TechSupport	-0.008507	-0.060577	0.120206	0.063053	0.325288	-0.095138	0.354458	0.293705	0.332850	1.000000	...	0.312183	-0.020299	-0.335695	-0.285491	0.096258	0.240924	0.100472	0.117024	-0.114807	-0.084631
StreamingTV	-0.007124	0.105445	0.124483	-0.016499	0.280264	-0.021383	0.175514	0.281601	0.389924	0.277549	...	0.014973	0.329744	-0.414951	-0.112550	0.061930	0.072124	0.046121	0.040010	0.144747	-0.247712
StreamingMovies	-0.010105	0.119842	0.118108	-0.038375	0.285402	-0.033477	0.187426	0.274523	0.402309	0.280155	...	0.025623	0.322457	-0.418450	-0.117867	0.064780	0.075603	0.048755	0.048398	0.137420	-0.250290
PaperlessBilling	-0.011902	0.156258	-0.013957	-0.110131	0.004823	0.016696	-0.004051	0.127056	0.104079	0.037536	...	-0.063390	0.326470	-0.320592	0.168296	-0.052278	-0.146281	-0.017469	-0.013726	0.208427	-0.203981
MonthlyCharges	-0.013779	0.219874	0.097825	-0.112343	0.246862	0.2480
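The full matrix is hard to read, and the bar chart above only needs one column of it. A sketch that ranks features by the magnitude of their correlation with `Churn`, using toy encoded values rather than the dataset. For a 0/1 target this Pearson coefficient is the point-biserial correlation, so it captures linear association only; the sign gives the direction (negative means higher values go with less churn).

```python
import pandas as pd

# Toy encoded frame; in the notebook this would be the encoded df.
toy = pd.DataFrame({
    "Churn":          [1, 1, 0, 0, 0, 1],
    "tenure":         [1, 2, 40, 60, 30, 5],
    "MonthlyCharges": [90.0, 85.0, 20.0, 25.0, 70.0, 80.0],
    "gender":         [1, 0, 1, 0, 0, 1],
})

# Correlation of every column with the target, excluding the target itself,
# then reordered by absolute value (strongest association first).
corr = toy.corr()["Churn"].drop("Churn")
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```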

In [None]:
#separating binary columns
bi_cs = df.nunique()[df.nunique() == 2].keys()
dat_rad = df[bi_cs]

#plotting radar chart for churn and non churn customers(binary variables)
def plot_radar(df,aggregate,title) :
    data_frame = df[df["Churn"] == aggregate] 
    data_frame_x = data_frame[bi_cs].sum().reset_index()
    data_frame_x.columns  = ["feature","yes"]
    data_frame_x["no"]    = data_frame.shape[0]  - data_frame_x["yes"]
    data_frame_x  = data_frame_x[data_frame_x["feature"] != "Churn"]
    
    #count of 1's(yes)
    trace1 = go.Scatterpolar(r = data_frame_x["yes"].values.tolist(),
                             theta = data_frame_x["feature"].tolist(),
                             fill  = "toself",name = "count of 1's",
                             mode = "markers+lines",
                             marker = dict(size = 5)
                            )
    #count of 0's(No)
    trace2 = go.Scatterpolar(r = data_frame_x["no"].values.tolist(),
                             theta = data_frame_x["feature"].tolist(),
                             fill  = "toself",name = "count of 0's",
                             mode = "markers+lines",
                             marker = dict(size = 5)
                            ) 
    layout = go.Layout(dict(polar = dict(radialaxis = dict(visible = True,
                                                           side = "counterclockwise",
                                                           showline = True,
                                                           linewidth = 2,
                                                           tickwidth = 2,
                                                           gridcolor = "white",
                                                           gridwidth = 2),
                                         angularaxis = dict(tickfont = dict(size = 10),
                                                            layer = "below traces"
                                                           ),
                                         bgcolor  = "rgb(243,243,243)",
                                        ),
                            paper_bgcolor = "rgb(243,243,243)",
                            title = title,height = 700))
    
    data = [trace2,trace1]
    fig = go.Figure(data=data,layout=layout)
    py.iplot(fig)

#plot
plot_radar(dat_rad,1,"Churn -  Customers")
plot_radar(dat_rad,0,"Non Churn - Customers")
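`plot_radar` computes the "count of 1's" and "count of 0's" one churn class at a time. A sketch of the same table for both classes in one pass, assuming every feature column is already 0/1 encoded as it is after the label-encoding step; the values are toy data.

```python
import pandas as pd

# Toy binary-encoded frame (0/1 feature columns plus the Churn target).
toy = pd.DataFrame({
    "Churn":            [1, 1, 0, 0, 0],
    "Partner":          [0, 1, 1, 1, 0],
    "PaperlessBilling": [1, 1, 0, 1, 0],
})

# Per churn class: summing 0/1 values counts the 1s; the 0s are class size minus 1s.
features = toy.drop(columns="Churn")
ones = features.groupby(toy["Churn"]).sum()
zeros = features.groupby(toy["Churn"]).count() - ones
print(ones)
print(zeros)
```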

In [None]:
# Save the encoded data; index=False avoids writing the row index as an extra column
df.to_csv("PreprocessedData.csv", sep=',', encoding='utf-8', index=False)

In [None]:
ax = sns.scatterplot(x="tenure", y="MonthlyCharges", hue="Churn" , data=df)
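A scatter plot with thousands of overlapping points can make a pattern look stronger or weaker than it is. A sketch that summarizes `MonthlyCharges` per churn class as a numeric check; the values are toy data, not results from the dataset.

```python
import pandas as pd

# Toy rows standing in for df[["Churn", "MonthlyCharges"]] after encoding.
toy = pd.DataFrame({
    "Churn":          [1, 1, 0, 0, 0],
    "MonthlyCharges": [85.0, 95.0, 20.0, 70.0, 50.0],
})

# Mean and median monthly charge for churned (1) and retained (0) customers.
summary = toy.groupby("Churn")["MonthlyCharges"].agg(["mean", "median"])
print(summary)
```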


The scatter plot suggests that customers who pay more per month are more likely to churn.

In [None]:
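# labels (SeniorCitizen is label-encoded at this point: 0 = No, 1 = Yes)
senior_counts = df["SeniorCitizen"].map({0: "No", 1: "Yes"}).value_counts()
lab = senior_counts.keys().tolist()
# values
val = senior_counts.values.tolist()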

trace = go.Pie(labels = lab ,
               values = val ,
               marker = dict(colors =  [ 'royalblue' ,'lime'],
                             line = dict(color = "white",
                                         width =  1.3)
                            ),
               rotation = 90,
               hoverinfo = "label+value+text",
               hole = .5
              )
layout = go.Layout(dict(title = "Senior Citizens",
                        plot_bgcolor  = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                       )
                  )

data = [trace]
fig = go.Figure(data = data,layout = layout)
py.iplot(fig)

In [None]:
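plt.figure(figsize=(12, 6))

# Left: churn counts per gender (gender is label-encoded: 0 = Female, 1 = Male)
plt.subplot(121)
sns.countplot(x="gender", hue="Churn", data=df)

# Right: gender proportions. Take the labels from value_counts() so they stay
# aligned with the counts (value_counts sorts by frequency, not by code).
plt.subplot(122)
gender_counts = df["gender"].value_counts()
plt.pie(gender_counts.values,
        labels=gender_counts.index.map({0: "Female", 1: "Male"}),
        autopct="%1.0f%%", wedgeprops={"linewidth": 2, "edgecolor": "white"})
my_circ = plt.Circle((0, 0), .7, color="white")
plt.gca().add_artist(my_circ)
plt.subplots_adjust(wspace=.2)
plt.title("Proportion of Gender in dataset")
plt.show()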
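To put a number on the gender comparison in the countplot, a cross-tabulation normalized within each gender gives the churn rate per gender directly. A sketch on toy data, assuming the label encoding used in this notebook (0 = Female, 1 = Male).

```python
import pandas as pd

toy = pd.DataFrame({
    "gender": [0, 0, 0, 0, 1, 1, 1, 1],  # 0 = Female, 1 = Male after label encoding
    "Churn":  [1, 0, 0, 0, 1, 0, 0, 0],
})

# normalize="index" turns each row into within-gender proportions,
# so the Churn == 1 column is the churn rate for that gender.
rates = pd.crosstab(toy["gender"].map({0: "Female", 1: "Male"}), toy["Churn"],
                    normalize="index")
print(rates)
```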

In the countplot, gender 0 stands for Female and 1 for Male, while Churn 0 stands for no churn and 1 for churn. The churn counts look very similar for both genders, which suggests gender alone is unlikely to be a strong predictor of churn.

In [None]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(13,13))
ax  = fig.add_subplot(111,projection = "3d")

# Pass Series (not one-column DataFrames) as the x, y, z coordinates
churned, retained = df[df["Churn"] == 1], df[df["Churn"] == 0]
ax.scatter(churned["tenure"], churned["MonthlyCharges"], churned["TotalCharges"],
           alpha=.5, s=80, linewidth=2, edgecolor="k", color="r", label="Churn")
ax.scatter(retained["tenure"], retained["MonthlyCharges"], retained["TotalCharges"],
           alpha=.5, s=80, linewidth=2, edgecolor="k", color="lime", label="No Churn")

ax.set_xlabel("tenure",fontsize=15)
ax.set_ylabel("MonthlyCharges",fontsize=15)
ax.set_zlabel("TotalCharges",fontsize=15)
plt.legend(loc="best")
fig.set_facecolor("w")
plt.title("3D PLOT FOR Tenure VS Monthly Charges VS Total Charges",fontsize=10)
plt.show()

There is a class imbalance: far fewer customers churned than stayed, so churn is the minority class.
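Quantifying the imbalance is a natural next step before any modelling. A sketch that computes the majority-to-minority ratio and inverse-frequency ("balanced") class weights on toy labels, using the formula scikit-learn documents for `class_weight="balanced"`: n_samples / (n_classes × n_samples_in_class). Whether to reweight, resample, or leave the data as-is is a modelling choice this notebook does not make.

```python
import pandas as pd

# Toy target with a 3:1 imbalance; in the notebook this would be df["Churn"].
y = pd.Series([0, 0, 0, 1, 0, 0, 1, 0])

counts = y.value_counts()
imbalance_ratio = counts.max() / counts.min()

# Inverse-frequency weights: rarer classes get proportionally larger weights.
weights = {cls: len(y) / (len(counts) * n) for cls, n in counts.items()}
print(imbalance_ratio, weights)
```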

