### Introduction
This project extracts relevant key performance indicators (KPIs) and metrics that PhoneNow can use to develop strategies for better customer retention.

### Question(s) for Analysis
* Which factors likely contribute to patients not showing up for appointments?
* Is the interval between the scheduled date and the appointment date likely to affect whether patients show up?
* On which day(s) of the week do patients skip appointments most?

In [2]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Data Wrangling

### Data Gathering

In [3]:
#load the dataset
df = pd.read_excel('02-Churn-Dataset.xlsx')

### Assessing Data

### Visual Assessment

In [4]:
#display the dataset (pandas truncates long output with "...")
df

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | numAdminTickets | numTechTickets | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | 0 | 0 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | No | No | One year | No | Mailed check | 56.95 | 1889.5 | 0 | 0 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | 0 | 0 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | 0 | 3 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | 0 | 0 | Yes |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7038 | 6840-RESVB | Male | 0 | Yes | Yes | 24 | Yes | Yes | DSL | Yes | ... | Yes | Yes | One year | Yes | Mailed check | 84.80 | 1990.5 | 0 | 0 | No |
| 7039 | 2234-XADUH | Female | 0 | Yes | Yes | 72 | Yes | Yes | Fiber optic | No | ... | Yes | Yes | One year | Yes | Credit card (automatic) | 103.20 | 7362.9 | 0 | 5 | No |
| 7040 | 4801-JZAZL | Female | 0 | Yes | Yes | 11 | No | No phone service | DSL | Yes | ... | No | No | Month-to-month | Yes | Electronic check | 29.60 | 346.45 | 0 | 0 | No |
| 7041 | 8361-LTMKD | Male | 1 | Yes | No | 4 | Yes | Yes | Fiber optic | No | ... | No | No | Month-to-month | Yes | Mailed check | 74.40 | 306.6 | 0 | 0 | Yes |


### Programmatic Assessment

In [5]:
#list first 5 rows
df.head()

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | numAdminTickets | numTechTickets | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | 0 | 0 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | No | No | One year | No | Mailed check | 56.95 | 1889.5 | 0 | 0 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | 0 | 0 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | 0 | 3 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | 0 | 0 | Yes |


In [6]:
#check the number of rows, columns, datatypes and missing data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [7]:
#check basic statistics for the numeric columns
df.describe()

|  | SeniorCitizen | tenure | MonthlyCharges | numAdminTickets | numTechTickets |
|---|---|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 | 0.515689 | 0.419566 |
| std | 0.368612 | 24.559481 | 30.090047 | 1.275299 | 1.250117 |
| min | 0.0 | 0.0 | 18.25 | 0.0 | 0.0 |
| 25% | 0.0 | 9.0 | 35.5 | 0.0 | 0.0 |
| 50% | 0.0 | 29.0 | 70.35 | 0.0 | 0.0 |
| 75% | 0.0 | 55.0 | 89.85 | 0.0 | 0.0 |
| max | 1.0 | 72.0 | 118.75 | 5.0 | 9.0 |


In [8]:
#check basic statistics for all columns, including the categorical ones
df.describe(include='all')

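|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | numAdminTickets | numTechTickets | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7043 | 7043 | 7043.0 | 7043 | 7043 | 7043.0 | 7043 | 7043 | 7043 | 7043 | ... | 7043 | 7043 | 7043 | 7043 | 7043 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043 |
| unique | 7043 | 2 |  | 2 | 2 |  | 2 | 3 | 3 | 3 | ... | 3 | 3 | 3 | 2 | 4 |  | 6531.0 |  |  | 2 |
| top | 7590-VHVEG | Male |  | No | No |  | Yes | No | Fiber optic | No | ... | No | No | Month-to-month | Yes | Electronic check |  |  |  |  | No |
| freq | 1 | 3555 |  | 3641 | 4933 |  | 6361 | 3390 | 3096 | 3498 | ... | 2810 | 2785 | 3875 | 4171 | 2365 |  | 11.0 |  |  | 5174 |
| mean |  |  | 0.162147 |  |  | 32.371149 |  |  |  |  | ... |  |  |  |  |  | 64.761692 |  | 0.515689 | 0.419566 |  |
| std |  |  | 0.368612 |  |  | 24.559481 |  |  |  |  | ... |  |  |  |  |  | 30.090047 |  | 1.275299 | 1.250117 |  |
| min |  |  | 0.0 |  |  | 0.0 |  |  |  |  | ... |  |  |  |  |  | 18.25 |  | 0.0 | 0.0 |  |
| 25% |  |  | 0.0 |  |  | 9.0 |  |  |  |  | ... |  |  |  |  |  | 35.5 |  | 0.0 | 0.0 |  |
| 50% |  |  | 0.0 |  |  | 29.0 |  |  |  |  | ... |  |  |  |  |  | 70.35 |  | 0.0 | 0.0 |  |
| 75% |  |  | 0.0 |  |  | 55.0 |  |  |  |  | ... |  |  |  |  |  | 89.85 |  | 0.0 | 0.0 |  |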


In [9]:
#check for null values
df.isna().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
numAdminTickets     0
numTechTickets      0
Churn               0
dtype: int64
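
`isna()` only counts true missing values (`NaN` or `None`), so blank strings in object columns pass unnoticed. The `describe(include='all')` output lists an empty most-frequent value for `TotalCharges` with a frequency of 11, which hints that such entries may exist. Below is a minimal sketch of a check, run on a small made-up frame; `toy` and `count_blank_strings` are illustrative names, not part of this notebook.

```python
import pandas as pd


def count_blank_strings(frame):
    """Count empty or whitespace-only string cells in each column."""
    def is_blank(value):
        return isinstance(value, str) and value.strip() == ""

    # Apply the check column by column, then sum the boolean results
    return frame.apply(lambda col: col.map(is_blank).sum())


# Made-up data for illustration only (not taken from the churn dataset)
toy = pd.DataFrame({
    "tenure": [0, 12, 0],
    "TotalCharges": [" ", 350.5, ""],
    "Churn": ["No", "Yes", "No"],
})

blank_counts = count_blank_strings(toy)
print(blank_counts)
```

Running the same function on `df` would show whether any column holds blanks that `isna()` did not report.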

In [10]:
#check for duplicates
df.duplicated().sum()

0
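
`df.duplicated()` flags a row only when every column matches an earlier row. Since `customerID` looks like the record key, a key-only check may also be worth running; the `describe(include='all')` output reports 7043 unique IDs, which suggests none repeat. Here is a small sketch of the difference on made-up data (`toy` is a hypothetical frame):

```python
import pandas as pd

# Made-up data for illustration only (not taken from the churn dataset)
toy = pd.DataFrame({
    "customerID": ["A-1", "B-2", "A-1"],
    "tenure": [1, 34, 2],
})

# Full-row comparison: no two rows match on every column
full_row_duplicates = toy.duplicated().sum()

# Key-only comparison: the second "A-1" is flagged as a repeat
id_duplicates = toy.duplicated(subset="customerID").sum()

print(full_row_duplicates, id_duplicates)
```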

### Visualization

### Dataset structure
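The dataset that I will be using has 7,043 rows and 23 columns, with no null values or duplicate rows. 18 of the columns have the object (string) datatype, including TotalCharges, and 5 are numeric: SeniorCitizen, tenure, MonthlyCharges, numAdminTickets, and numTechTickets.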
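
`TotalCharges` is missing from the `df.describe()` output and reports `unique` and `freq` in `describe(include='all')`, so pandas appears to have read it as an object column. A minimal sketch of one way to convert it, assuming entries that cannot be parsed should become missing values (shown on made-up data; `toy` is a hypothetical frame):

```python
import pandas as pd

# Made-up data for illustration only (not taken from the churn dataset)
toy = pd.DataFrame({"TotalCharges": [29.85, " ", "1889.5"]})

# errors="coerce" turns values that cannot be parsed into NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

print(toy["TotalCharges"].dtype)
print(toy["TotalCharges"].isna().sum())
```

After a conversion like this, the coerced rows show up in `isna().sum()` and can be dropped or filled, depending on what the analysis needs.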