
**<font size="6"> CUSTOMER CHURN PREDICTION </font>**


---

<font size="5"> BUSINESS UNDERSTANDING </font>

### Problem statement 

Customer churn refers to the loss of customers over a given period of time. Companies should pay close attention to churn because it directly *impacts their revenue*. When customers leave, the company loses not only their current business but also the future business those customers could have generated. Additionally, acquiring new customers is often more *expensive than retaining existing ones, so reducing churn can lead to significant cost savings*. Monitoring and reducing churn is therefore an important strategy for maintaining and growing the customer base. By understanding the reasons behind churn and taking steps to retain customers, companies can improve customer satisfaction, build loyalty, and ultimately grow their business.

> **An example of poor customer experience that cost a company:**
    <div class="alert alert-block alert-warning">
    **Blockbuster** is an example of a company that failed to address customer churn and ultimately collapsed. It was once a dominant player in the video rental industry, but as technology changed and customers increasingly sought more convenient ways to watch movies, the company failed to adapt. Competitors like **Netflix** offered online streaming, while Blockbuster stuck to its traditional brick-and-mortar model and relied on late fees to drive revenue.
As a result, customers left Blockbuster in droves for more convenient and cost-effective options. Despite numerous attempts to pivot and compete with Netflix, Blockbuster was unable to stem the tide of customer churn and ultimately filed for bankruptcy in 2010. This serves as a cautionary tale for companies that ignore customer churn and fail to adapt to changing market conditions.
    </div>






### Main objective

&#9677; In line with the importance of retaining existing customers, the main objective of this project is to build a predictive model that flags customers who are likely to churn, so that targeted retention actions can be taken before they leave.

### Metric for Success 

&#9677; This project will be considered a success if the prediction accuracy of the model is **above 95%**.
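
To make the success criterion concrete, the snippet below sketches how accuracy could be compared against the 95% target. The labels `y_true` and `y_pred` are invented placeholders, not results from this project, and `accuracy` and `meets_target` are hypothetical helper names.

```python
import numpy as np

TARGET_ACCURACY = 0.95  # success threshold stated above


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())


def meets_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Return True when accuracy is strictly above the target."""
    return accuracy(y_true, y_pred) > target


# Illustrative placeholder labels only (True = churned)
y_true = [False, False, True, False, True, False, False, False, True, False]
y_pred = [False, False, True, False, False, False, False, False, True, False]

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}, meets target: {meets_target(y_true, y_pred)}")
```

With these placeholder labels, nine of ten predictions match, so the 95% target is not met.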

---

<font size="5"> DATA UNDERSTANDING </font>

This dataset contains **3333 instances and 21 columns**. The columns hold customer information: 'state', 'account length', 'area code', 'phone number', 'international plan', 'voice mail plan', 'number vmail messages', 'total day minutes', 'total day calls', 'total day charge', 'total eve minutes', 'total eve calls', 'total eve charge', 'total night minutes', 'total night calls', 'total night charge', 'total intl minutes', 'total intl calls', 'total intl charge', 'customer service calls', and 'churn' (the target).
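
One possible way to confirm that a loaded table matches this description is to compare its columns against the expected names. The sketch below uses an invented one-row frame rather than the real file, and `missing_columns` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Column names as listed in the description above
EXPECTED_COLUMNS = [
    "state", "account length", "area code", "phone number",
    "international plan", "voice mail plan", "number vmail messages",
    "total day minutes", "total day calls", "total day charge",
    "total eve minutes", "total eve calls", "total eve charge",
    "total night minutes", "total night calls", "total night charge",
    "total intl minutes", "total intl calls", "total intl charge",
    "customer service calls", "churn",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the frame."""
    return [c for c in expected if c not in frame.columns]


# Invented one-row frame that deliberately lacks the 'churn' column
demo = pd.DataFrame([[0] * 20], columns=EXPECTED_COLUMNS[:-1])
print(missing_columns(demo))
```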

## Reading in the data 

In [None]:
# importing all the required libraries 

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 

In [10]:
# reading in the dataset and previewing the first five rows 

df = pd.read_csv('bigml_59c28831336c6604c800002a.csv')
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [19]:
# Previewing the last five rows of the dataset 

df.tail()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
3328,AZ,192,415,414-4276,no,yes,36,156.2,77,26.55,...,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3329,WV,68,415,370-3271,no,no,0,231.1,57,39.29,...,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False
3330,RI,28,510,328-8230,no,no,0,180.8,109,30.74,...,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False
3331,CT,184,510,364-6381,yes,no,0,213.8,105,36.35,...,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False
3332,TN,74,415,400-4344,no,yes,25,234.4,113,39.85,...,82,22.6,241.4,77,10.86,13.7,4,3.7,0,False


In [11]:
# checking the dimensions of the dataset 

df.shape

(3333, 21)

## Data cleaning and wrangling 

In this section, emphasis will be given to:

 &#9677; Checking for and dealing with missing values

 &#9677; Checking for and dealing with duplicated values

 &#9677; Checking that each column has the appropriate (expected) data type

 &#9677; Checking for and dealing with outliers
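
The four checks listed above could be sketched roughly as follows. The small `toy` DataFrame is invented for illustration, and the 1.5 × IQR rule for outliers is one common convention assumed here, not necessarily the method used later in this notebook.

```python
import pandas as pd

# Toy data invented for illustration; not taken from the churn dataset
toy = pd.DataFrame({
    "account length": [128, 107, 137, 84, 84, 900],
    "total day minutes": [265.1, 161.6, None, 299.4, 299.4, 180.0],
    "churn": [False, False, False, True, True, False],
})

# 1. Missing values per column
missing = toy.isna().sum()

# 2. Number of fully duplicated rows
n_duplicates = int(toy.duplicated().sum())

# 3. Data type of each column
dtypes = toy.dtypes

# 4. Outliers in one numeric column using the 1.5 * IQR rule (assumed convention)
col = toy["account length"]
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
outlier_mask = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)

print(missing, n_duplicates, dtypes, toy[outlier_mask], sep="\n\n")
```

On the real data, the same calls would be applied to `df_copy` instead of `toy`.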
    

In [20]:
# making a copy of the dataset

df_copy = df.copy()

In [22]:
# checking for any missing values 

df_copy.isna().sum()

state                     0
account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64

The dataset **does not have any missing values** for any of the columns 

In [23]:
# checking if columns have the expected data types

df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   

In [15]:
df['area code'].unique()

array([415, 408, 510], dtype=int64)

In [18]:
df['state'].unique()

array(['KS', 'OH', 'NJ', 'OK', 'AL', 'MA', 'MO', 'LA', 'WV', 'IN', 'RI',
       'IA', 'MT', 'NY', 'ID', 'VT', 'VA', 'TX', 'FL', 'CO', 'AZ', 'SC',
       'NE', 'WY', 'HI', 'IL', 'NH', 'GA', 'AK', 'MD', 'AR', 'WI', 'OR',
       'MI', 'DE', 'UT', 'CA', 'MN', 'SD', 'NC', 'WA', 'NM', 'NV', 'DC',
       'KY', 'ME', 'MS', 'TN', 'PA', 'CT', 'ND'], dtype=object)
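
Since `area code` is stored as `int64` but takes only three distinct values, it may be more natural to treat it as a categorical label than as a quantity. The sketch below shows one way to make that conversion on invented rows; whether this notebook applies it later is not shown here.

```python
import pandas as pd

# Invented rows; the area codes are the three values seen in the output above
sample = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH"],
    "area code": [415, 415, 408, 510],
})

# Cast the numeric code to a categorical label
sample["area code"] = sample["area code"].astype("category")

print(sample["area code"].dtype)
print(sample["area code"].cat.categories.tolist())
print(sample["state"].nunique())
```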