<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-Packages" data-toc-modified-id="Import-Packages-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import Packages</a></span></li><li><span><a href="#Data-Wrangling" data-toc-modified-id="Data-Wrangling-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data Wrangling</a></span><ul class="toc-item"><li><span><a href="#Import-Data" data-toc-modified-id="Import-Data-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Import Data</a></span></li><li><span><a href="#Initial-Data-Cleaning" data-toc-modified-id="Initial-Data-Cleaning-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Initial Data Cleaning</a></span></li></ul></li><li><span><a href="#Exploratory-Data-Analysis" data-toc-modified-id="Exploratory-Data-Analysis-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Exploratory Data Analysis</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Number-of-Unique-Feature-Values" data-toc-modified-id="Number-of-Unique-Feature-Values-3.0.1"><span class="toc-item-num">3.0.1&nbsp;&nbsp;</span>Number of Unique Feature Values</a></span></li><li><span><a href="#Unique-Non-Numerical-Feature-Values" data-toc-modified-id="Unique-Non-Numerical-Feature-Values-3.0.2"><span class="toc-item-num">3.0.2&nbsp;&nbsp;</span>Unique Non-Numerical Feature Values</a></span></li></ul></li><li><span><a href="#Initial-Visualization" data-toc-modified-id="Initial-Visualization-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Initial Visualization</a></span></li></ul></li></ul></div>

# Import Packages

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Data Wrangling

## Import Data

In [2]:
data = pd.read_csv('Data/Telecom_Customer_Churn.csv')

## Initial Data Cleaning

In [3]:
# Capitalize columns that aren't capitalized
data.rename(columns={'customerID':'CustomerID','gender':'Gender','tenure':'Tenure'}, inplace=True)
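
The rename above lists the three lowercase columns by hand. As a rough sketch of a more general alternative, and assuming the only goal is to uppercase the first character of each column name, a function can be passed to `rename` instead of a dictionary. The helper `capitalize_first` and the toy frame `demo` are hypothetical names used only for illustration.

```python
import pandas as pd

def capitalize_first(name: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return name[:1].upper() + name[1:]

# Toy frame standing in for the churn data (illustrative columns only)
demo = pd.DataFrame(columns=['customerID', 'gender', 'tenure', 'SeniorCitizen'])

# rename accepts a callable that is applied to every column label
demo = demo.rename(columns=capitalize_first)
renamed_columns = list(demo.columns)
print(renamed_columns)
```

Slicing is used rather than `str.capitalize` because the latter lowercases the remaining characters, which would turn `customerID` into `Customerid`.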

# Exploratory Data Analysis

In [4]:
data.head()

| | CustomerID | Gender | SeniorCitizen | Partner | Dependents | Tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   CustomerID        7043 non-null   object 
 1   Gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   Tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [6]:
# Statistically describe numerical features
data.describe()

| | SeniorCitizen | Tenure | MonthlyCharges |
|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.0 | 0.0 | 18.25 |
| 25% | 0.0 | 9.0 | 35.5 |
| 50% | 0.0 | 29.0 | 70.35 |
| 75% | 0.0 | 55.0 | 89.85 |
| max | 1.0 | 72.0 | 118.75 |
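
`describe()` summarizes only three columns. `TotalCharges` is missing from it even though `head()` shows amounts such as 29.85 in that column, which suggests pandas read it as text rather than as numbers. The following is a minimal sketch of one way to convert such a column, not a step taken in this notebook. The toy series and its blank entry are assumptions made for illustration; the actual cause in the data has not been checked here.

```python
import pandas as pd

# Toy stand-in for data['TotalCharges']; the blank entry is an assumed
# example of a value that would force the column to be read as text
total_charges = pd.Series(['29.85', '1889.5', ' ', '108.15'])

# errors='coerce' turns anything unparseable into NaN instead of raising
total_numeric = pd.to_numeric(total_charges, errors='coerce')

# Count how many entries could not be converted
n_missing = int(total_numeric.isna().sum())
print(total_numeric.dtype, n_missing)
```

Counting the resulting `NaN` values before overwriting the original column shows how many rows would be affected by the conversion.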


### Number of Unique Feature Values

In [7]:
data.nunique()

CustomerID          7043
Gender                 2
SeniorCitizen          2
Partner                2
Dependents             2
Tenure                73
PhoneService           2
MultipleLines          3
InternetService        3
OnlineSecurity         3
OnlineBackup           3
DeviceProtection       3
TechSupport            3
StreamingTV            3
StreamingMovies        3
Contract               3
PaperlessBilling       2
PaymentMethod          4
MonthlyCharges      1585
TotalCharges        6531
Churn                  2
dtype: int64

### Unique Non-Numerical Feature Values

In [8]:
from collections import defaultdict

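Find the unique values of every low-cardinality feature (fewer than 10 unique values) and save them to a DataFrame. This captures all non-numerical features except `CustomerID` and `TotalCharges`, plus the binary numerical feature `SeniorCitizen`.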

In [9]:
# Initialize an empty default-dictionary
unique_values = defaultdict(list)

# Find the columns with fewer than 10 unique values
categories = data.columns[data.nunique() < 10]

# Save each of those columns' unique values in the dictionary
max_unique = 0
for cat in categories:
    unique = data[cat].unique().tolist()
    unique_values[cat] = unique
    max_unique = max(max_unique, len(unique))

# Pad shorter lists with '-' so every list is as long as the longest one
for val in unique_values.values():
    val.extend(['-'] * (max_unique - len(val)))

# Convert the unique values dictionary to a DataFrame (one row per feature)
unique_values = pd.DataFrame(unique_values).T

In [10]:
# Show the unique values per feature
unique_values

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Gender | Female | Male | - | - |
| SeniorCitizen | 0 | 1 | - | - |
| Partner | Yes | No | - | - |
| Dependents | No | Yes | - | - |
| PhoneService | No | Yes | - | - |
| MultipleLines | No phone service | No | Yes | - |
| InternetService | DSL | Fiber optic | No | - |
| OnlineSecurity | No | Yes | No internet service | - |
| OnlineBackup | Yes | No | No internet service | - |
| DeviceProtection | No | Yes | No internet service | - |
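
The padding loop used to build this table can also be left to pandas. The sketch below is an alternative, not the approach taken above: it builds one `Series` of unique values per column and relies on index alignment to fill the gaps. The toy frame `demo` and the threshold of 4 are illustrative assumptions.

```python
import pandas as pd

# Toy frame with a few of the columns and values shown above
demo = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female'],
    'Partner': ['Yes', 'No', 'No', 'Yes'],
    'InternetService': ['DSL', 'Fiber optic', 'No', 'DSL'],
    'Tenure': [1, 34, 2, 45],
})

# Keep only low-cardinality columns (the threshold here is illustrative)
low_card = demo.columns[demo.nunique() < 4]

# Series of different lengths are aligned on their index when combined,
# so shorter columns get NaN, which fillna then replaces with '-'
unique_table = pd.DataFrame(
    {col: pd.Series(demo[col].unique()) for col in low_card}
).T.fillna('-')
print(unique_table)
```

One caveat: an integer column such as `SeniorCitizen` would be upcast to float when padded with `NaN`, so its values may display as `0.0` and `1.0` unless the column is cast to `object` first.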


## Initial Visualization