# SyriaTel Customer Churn Project

Name: Henry Kemboi

## 1. Project Overview

SyriaTel, a telecommunications company, wants to reduce the revenue it loses to customer churn. To do so, it needs to identify the key factors behind customer attrition and understand why customers discontinue their service.

### 1.1. Problem Statement

The main task of this project is to identify the factors driving customer churn and to develop actionable strategies to reduce it. This will help the company act in time to retain at-risk customers and protect the revenue they bring in.


### 1.2. Objectives

* Develop a predictive model to identify customers at risk of churning and any characteristics that are indicative of churn.
* Focus retention efforts on the most at-risk segments to maximize return on investment in customer satisfaction programs.
* Explore patterns in customer behavior and use the insights to implement targeted interventions, such as proactive customer support or better plans for high-usage customers.

## 2. Data Understanding

### 2.1. Setup

To begin, we import the libraries needed for the analysis.

* pandas - Lets us load datasets into dataframes and manipulate them.
* NumPy - Provides mathematical functions and array operations.
* seaborn - Lets us visualise the data.
* Matplotlib - Additional plotting library that seaborn builds on.
* scipy.stats - Assists in performing statistical calculations.
* sqlite3 - Python's built-in interface to SQLite databases.
* statsmodels - Provides statistical models and tests.


In [4]:
# importing relevant libraries
import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import sqlite3
import statsmodels.api as sm


%matplotlib inline

### 2.2. Loading Data

We use pandas to read the CSV file into a dataframe named `churn_df`, then check the first 5 rows to get an understanding of the data.

In [14]:
churn_df = pd.read_csv("./data/syriatel.csv", index_col=0)
churn_df.head()

| state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | total eve minutes | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |
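Alongside the first rows, it can help to look at the data type pandas inferred for each column. The following is a minimal sketch, not part of the original notebook: it rebuilds two rows and a handful of columns from the preview in memory (the `sample` dataframe is an illustrative stand-in for `churn_df`) and inspects `dtypes`.

```python
import io
import pandas as pd

# Two rows and a few columns copied from the preview above.
sample_csv = io.StringIO(
    "state,account length,international plan,total day minutes,churn\n"
    "KS,128,no,265.1,False\n"
    "OH,84,yes,299.4,False\n"
)
sample = pd.read_csv(sample_csv, index_col=0)

# dtypes shows how pandas interpreted each column.
print(sample.dtypes)

# Programmatic checks on the inferred types.
is_int = pd.api.types.is_integer_dtype(sample["account length"])
is_float = pd.api.types.is_float_dtype(sample["total day minutes"])
is_bool = pd.api.types.is_bool_dtype(sample["churn"])
```

On the full dataset, the equivalent call would be `churn_df.dtypes` or `churn_df.info()`.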


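As part of this step we also check the dimensions of the dataframe. Because `index_col=0` reads `state` in as the index, it is not counted among the columns.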

In [15]:
# Check the shape of the dataframe
churn_df.shape

(3333, 20)
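The shape reports 20 columns because `state` sits in the index rather than in the columns. If `state` is wanted as an ordinary feature later on, one option is `reset_index()`. The sketch below illustrates the effect on a small in-memory sample built from rows in the preview; the names `indexed` and `flat` are illustrative and do not appear in the original notebook.

```python
import io
import pandas as pd

# Three rows copied from the preview, trimmed to a few columns for brevity.
sample_csv = io.StringIO(
    "state,account length,area code,churn\n"
    "KS,128,415,False\n"
    "OH,107,415,False\n"
    "NJ,137,415,False\n"
)

# index_col=0 turns the first CSV column (state) into the index.
indexed = pd.read_csv(sample_csv, index_col=0)

# reset_index() moves state back into the columns and restores a 0..n-1 index.
flat = indexed.reset_index()

print(indexed.shape)  # state is not counted as a column
print(flat.shape)     # state is counted as a column
```

Applied to the full data, `churn_df.reset_index()` would be expected to give 21 columns instead of 20.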

#### 2.2.1. Checking for any missing values

Finally, check for any missing or null values in the dataset.

In [17]:
churn_df.isnull().sum()

account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64
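Every column reports zero missing values. For datasets where that is not the case, the count alone can be hard to interpret, so a small helper that also reports the percentage per column may be useful. This is a sketch under stated assumptions: `missing_summary` is a hypothetical helper, and `toy` is made-up data used only to show the output format.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})

# Toy data (not from the SyriaTel dataset) with one missing value.
toy = pd.DataFrame(
    {
        "total day minutes": [265.1, np.nan, 243.4, 299.4],
        "churn": [False, False, True, False],
    }
)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(churn_df)` on the dataframe above would be expected to show 0 in both columns for every row, consistent with the `isnull().sum()` output.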