## Naive Bayes Classifier

In [12]:
import pandas as pd

# import data
df = pd.read_csv('../data/churn.txt', delimiter=',')

In [13]:
df.head()

Unnamed: 0,State,Account Length,Area Code,Phone,Int'l Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,...,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,Intl Mins,Intl Calls,Intl Charge,CustServ Calls,Churn?
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False.
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False.
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False.
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False.
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False.


For the sake of this demonstration, we will use only two columns as predictors: Int'l Plan and VMail Plan.

In [17]:
# .copy() gives df2 its own data so later edits do not raise SettingWithCopyWarning
df2 = df[['Int\'l Plan', 'VMail Plan', 'Churn?']].copy()
df2.head()

Unnamed: 0,Int'l Plan,VMail Plan,Churn?
0,no,yes,False.
1,no,yes,False.
2,no,no,False.
3,yes,no,False.
4,yes,no,False.


In [18]:
# rename columns
df2.rename(columns= 
           {'Int\'l Plan': 'International Plan',
            'VMail Plan': 'Voicemail Plan',
            'Churn?': 'Churn'}, inplace=True)


In [19]:
df2.head()

Unnamed: 0,International Plan,Voicemail Plan,Churn
0,no,yes,False.
1,no,yes,False.
2,no,no,False.
3,yes,no,False.
4,yes,no,False.


In [20]:
df2['Churn'] = df2['Churn'].map({'False.': 0,
                                 'True.': 1})



In [21]:
df2

Unnamed: 0,International Plan,Voicemail Plan,Churn
0,no,yes,0
1,no,yes,0
2,no,no,0
3,yes,no,0
4,yes,no,0
...,...,...,...
3328,no,yes,0
3329,no,no,0
3330,no,no,0
3331,yes,no,0


##### Calculate Prior Probability

Since our goal is to classify a customer as churning or not churning, we need the prior probability of churning. This is simply the prevalence of churn in the dataset, which is **0.1449**. We also need its complement, **0.8551**, which is the probability that a customer will not churn before any predictor variables are taken into account.

In [22]:
# churn value counts
df2['Churn'].value_counts()

Churn
0    2850
1     483
Name: count, dtype: int64

In [151]:
churn_total = sum(df2['Churn'] == 1)
churn_total

483

In [152]:
not_churn_total = sum(df2['Churn'] == 0)
not_churn_total

2850

In [146]:
# calculate probability of churning for entire dataset
churn_probability = round(churn_total / (churn_total + not_churn_total), 4)
churn_probability

0.1449

In [148]:
# probability of not churning for entire dataset
not_churn_probability = 1 - churn_probability
not_churn_probability

0.8551

##### Conditional Probabilities for Churn

Next, we need a few conditional probabilities. Our dataset has two predictor variables, **International Plan** and **Voicemail Plan**. The two conditional probabilities Naive Bayes needs for the churn class are the probability that someone has the international plan given that they churned and the probability that someone has the voicemail plan given that they churned.

probability of International Plan given Churn: 
+ p(I | C) = 0.2836

probability of Voicemail Plan given Churn:
+ p(V | C) = 0.1656

In [154]:
# probability of having international plan given churn
# p(I | C)
I_given_C = round((df2[(df2['International Plan'] == 'yes') & (df2['Churn'] == 1)].shape[0]) / churn_total, 4)
I_given_C

0.2836

In [155]:
print(f'The probability of International Plan given Churn is {I_given_C}.')

The probability of International Plan given Churn is 0.2836.


In [156]:
# probability of having voicemail plan given churn
# p(V | C)
V_given_C = round((df2[(df2['Voicemail Plan'] == 'yes') & (df2['Churn'] == 1)].shape[0]) / churn_total, 4)
V_given_C

0.1656

Now, with these two conditional probabilities and the prior probabilities, we can classify whether a new customer who has both plans will churn.
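Naive Bayes treats the two plans as conditionally independent given the class, so each class score is the prior multiplied by the per-feature conditional probabilities. For a customer with both plans, in the notation used above (the label $\operatorname{score}$ is introduced here only for illustration):

$$
\operatorname{score}(C) = p(I \mid C)\, p(V \mid C)\, p(C)
$$

$$
\operatorname{score}(\text{Not } C) = p(I \mid \text{Not } C)\, p(V \mid \text{Not } C)\, p(\text{Not } C)
$$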

In [158]:
# score for both plans churning
churn_score = round((I_given_C * V_given_C * churn_probability), 4)
churn_score

0.0068

To create a score for a new customer with both plans not churning, we use the conditional probability of each plan given that the customer does not churn. The product of these two values is multiplied by the prior probability of not churning to give the final score. The class with the higher final score is the classification for a customer with both plans.

The results tell us that this classifier will predict someone with both plans not to churn.

In [163]:
I_given_not_C = round((df2[(df2['International Plan'] == 'yes') & (df2['Churn'] == 0)].shape[0]) / not_churn_total, 4)
I_given_not_C

0.0653

In [164]:
V_given_not_C = round((df2[(df2['Voicemail Plan'] == 'yes') & (df2['Churn'] == 0)].shape[0]) / not_churn_total, 4)
V_given_not_C

0.2954

In [165]:
not_churn_score = round((I_given_not_C * V_given_not_C * not_churn_probability), 4)
not_churn_score

0.0165

As we will see shortly, scikit-learn reports class probabilities that sum to one, so we normalize the two scores the same way here.

In [202]:
normalization = 0.0068 + 0.0165
normalized_churn = 0.0068 / normalization
normalized_not_churn =  0.0165 / normalization
print(f'Churn  {normalized_churn=:.4f}, Not Churn {normalized_not_churn=:.4f}')

Churn  normalized_churn=0.2918, Not Churn normalized_not_churn=0.7082


In [203]:
normalization

0.0233

##### Scores for Not Having Either Plan

To find the conditional probabilities for not having the plans, we can use their complements.

In [168]:
# probability of no international plan given churn
# p(Not I | C)
not_I_given_C = round(1 - I_given_C, 4) 
not_I_given_C

0.7164

In [169]:
# probability of no voicemail plan given churn
not_V_given_C = round(1 - V_given_C, 4)
not_V_given_C

0.8344

In [171]:
# score for no plans and churning
# p(Not I | C) * p(Not V | C) * p(C)
churn_score = round((not_I_given_C * not_V_given_C * churn_probability), 4)
churn_score

0.0866

In [173]:
# score for no plans and no churn
# p(Not I | Not C) * p(Not V | Not C) * p(Not C)
not_churn_score = round(((1 - I_given_not_C) * (1 - V_given_not_C) * not_churn_probability), 4)
not_churn_score

0.5632

##### No International Plan, Yes Voicemail Plan

This customer is also classified as not churning.

+ p(Not I | C) * p(V | C) * p(C)
+ p(Not I | not C) * p(V | not C) * p(Not C)

In [179]:
not_I_given_not_C = 1 - I_given_not_C
not_I_given_not_C

0.9347

In [180]:
churn_score = round((not_I_given_C * V_given_C * churn_probability), 4)
churn_score

0.0172

In [181]:
# p(Not I | Not C) * p(V | Not C) * p(Not C)
not_churn_score = round((not_I_given_not_C * V_given_not_C * not_churn_probability), 4)
not_churn_score

0.2361

##### Yes International Plan, No Voicemail Plan

For churners:
+ p(I | C) * p(Not V | C) * p(C)

For non-churners:
+ p(I | Not C) * p(Not V | Not C) * p(Not C)

In [182]:
churn_score = round((I_given_C * not_V_given_C * churn_probability), 4)
churn_score

0.0343

In [183]:
# p(I | Not C) * p(Not V | Not C) * p(Not C)
not_churn_score = round((I_given_not_C * (1 - V_given_not_C) * not_churn_probability), 4)
not_churn_score

0.0393
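The four hand calculations above all follow the same pattern, so they can be wrapped in one helper. The following is a minimal sketch, assuming the rounded priors and conditional probabilities reported earlier in this notebook. The function name `naive_bayes_scores` and the dictionary layout are hypothetical and exist only for this illustration.

```python
# Rounded values copied from the cells above (assumed inputs for this sketch).
PRIOR = {"churn": 0.1449, "no_churn": 0.8551}
P_INTL = {"churn": 0.2836, "no_churn": 0.0653}   # p(I | class)
P_VMAIL = {"churn": 0.1656, "no_churn": 0.2954}  # p(V | class)


def naive_bayes_scores(has_intl, has_vmail):
    """Return normalized class probabilities for one customer."""
    scores = {}
    for label in PRIOR:
        # Use the complement when the customer does not have the plan.
        p_i = P_INTL[label] if has_intl else 1 - P_INTL[label]
        p_v = P_VMAIL[label] if has_vmail else 1 - P_VMAIL[label]
        scores[label] = p_i * p_v * PRIOR[label]
    # Normalize so the two class probabilities sum to one.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Score every combination of the two plans.
for intl in (True, False):
    for vmail in (True, False):
        probs = naive_bayes_scores(intl, vmail)
        predicted = max(probs, key=probs.get)
        print(f"intl={intl!s:5} vmail={vmail!s:5} -> {predicted} {probs}")
```

Because the helper works with unrounded intermediate products, its normalized values can differ from the hand-rounded ones in the fourth decimal place.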

In [184]:
from sklearn.naive_bayes import MultinomialNB


In [185]:
df2

Unnamed: 0,International Plan,Voicemail Plan,Churn
0,no,yes,0
1,no,yes,0
2,no,no,0
3,yes,no,0
4,yes,no,0
...,...,...,...
3328,no,yes,0
3329,no,no,0
3330,no,no,0
3331,yes,no,0


In [192]:
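# create X y split
# .copy() makes X an independent DataFrame, so the column assignments
# in the next cell do not raise SettingWithCopyWarning

X = df2[['International Plan', 'Voicemail Plan']].copy()
y = df2['Churn']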



In [193]:
# change yes values to 1 and no values to 0
X['International Plan'] = X['International Plan'].map({'yes': 1, 'no': 0})
X['Voicemail Plan'] = X['Voicemail Plan'].map({'yes': 1, 'no': 0})



In [206]:
# create model and fit to our data with no smoothing
model = MultinomialNB(alpha=0)
model.fit(X, y)

In [207]:
# new customer with both plans 
new_customer = pd.DataFrame({
    'International Plan': 1,
    'Voicemail Plan': 1
}, index = [0])

The model has classified this customer as one that will not churn.

In [209]:
# predict class probabilities, ordered as [P(no churn), P(churn)]
model.predict_proba(new_customer)

array([[0.7897851, 0.2102149]])
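These probabilities differ from the hand-normalized values for the same customer (0.2918 churn, 0.7082 no churn). A likely reason is that `MultinomialNB` treats each column as a count and estimates, for each class, a feature's share of the total feature count, whereas the hand calculation treats each plan as a separate yes/no event. The sketch below is an illustration under that assumption, not scikit-learn's actual code. It uses the rounded values from earlier in this notebook, and the function name `multinomial_nb_proba` is hypothetical.

```python
# Rounded values from the hand calculations earlier in this notebook.
PRIOR = {"no_churn": 0.8551, "churn": 0.1449}
P_INTL = {"no_churn": 0.0653, "churn": 0.2836}   # p(I | class)
P_VMAIL = {"no_churn": 0.2954, "churn": 0.1656}  # p(V | class)


def multinomial_nb_proba(intl, vmail):
    """Mimic unsmoothed multinomial Naive Bayes on two 0/1 features."""
    scores = {}
    for label in PRIOR:
        # Each feature's share of the class's total feature count.
        # The ratio of proportions equals the ratio of raw counts,
        # because both are divided by the same class size.
        total = P_INTL[label] + P_VMAIL[label]
        theta_intl = P_INTL[label] / total
        theta_vmail = P_VMAIL[label] / total
        # A feature value of 0 contributes theta ** 0 == 1, so an
        # absent plan does not change the score at all.
        scores[label] = PRIOR[label] * theta_intl ** intl * theta_vmail ** vmail
    norm = sum(scores.values())
    return [scores["no_churn"] / norm, scores["churn"] / norm]


# Customer with both plans, ordered as [P(no churn), P(churn)].
print(multinomial_nb_proba(1, 1))
```

Under this reading, a customer with neither plan gets exactly the prior probabilities, since neither feature contributes to the score. If the goal is to match the hand calculation, scikit-learn's `BernoulliNB`, which models both the presence and the absence of each binary feature, may be the closer fit; that has not been run here.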

In [212]:
# new customer with neither plan 
new_customer = pd.DataFrame({
    'International Plan': 0,
    'Voicemail Plan': 0
}, index = [0])

The model has also predicted this customer will not churn.

In [213]:
model.predict_proba(new_customer)

array([[0.85508551, 0.14491449]])

The model predicts no churn again.

In [215]:
# new customer with only international plan 
new_customer = pd.DataFrame({
    'International Plan': 1,
    'Voicemail Plan': 0
}, index = [0])

model.predict_proba(new_customer)

array([[0.62839799, 0.37160201]])

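The model predicts no churn for this customer as well, even though this combination has the highest churn probability of those checked so far (0.3716). The final combination below, a customer with only the voicemail plan, is also predicted as no churn, so the model never predicts churn for any combination of these two features.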

In [216]:
# new customer with only voicemail plan
new_customer = pd.DataFrame({
    'International Plan': 0,
    'Voicemail Plan': 1
}, index = [0])

model.predict_proba(new_customer)

array([[0.92912582, 0.07087418]])