### Classification Module: Telco Assignment 
    date: Monday, July 11th 2022

**Artifact: working jupyter notebook**

----

**Insights from storytelling data analysis:**

1. Month-to-Month customers are more likely to churn than 1 & 2 year contract customers
2. Fiber Optic customers are more likely to churn than DSL & No Internet Customers
3. Customers WITHOUT dependents are more likely to churn than customers WITH dependents
4. Customers who pay by electronic check are more likely to churn than customers who use other payment methods:
    - Mailed Check
    - Bank Transfer (automatic)
    - Credit Card (automatic)
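
Each insight above compares churn rates across the groups of one categorical feature. A minimal sketch of that comparison follows, using a small made-up frame (`toy` and its rows are illustrative assumptions, not records from the Telco data):

```python
import pandas as pd

# Toy data for illustration only; these are not rows from the Telco dataset.
toy = pd.DataFrame({
    "contract_type": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "churn": ["Yes", "No", "No", "No"],
})

# Share of churned customers within each contract type.
churn_rate = (
    toy.assign(churned=toy["churn"].eq("Yes"))
       .groupby("contract_type")["churned"]
       .mean()
)
print(churn_rate)
```

The same pattern should work for `internet_service_type`, `dependents`, or `payment_type` by swapping the grouping column.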

----

**Importing Modules and initial Data:**

In [72]:
# tabular data modules:
import pandas as pd
from skimpy import clean_columns

# math modules:
import numpy as np

# visualization modules:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 5)

import seaborn as sns
sns.set_style("whitegrid")
sns.set_palette('RdBu')


# local module with database credentials and a connection helper:
import env
from env import user, password, host, get_connection

In [73]:
query = ''' 
SELECT *
        FROM customers
        RIGHT JOIN contract_types using (contract_type_id)
        RIGHT JOIN payment_types using (payment_type_id)
        RIGHT JOIN internet_service_types using (internet_service_type_id)
'''

In [74]:
url = get_connection(user, password, host, "telco_churn")
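
The `env` module is not shown in this notebook, so the exact shape of `get_connection` is unknown. One common pattern is a helper that assembles a SQLAlchemy-style connection URL. The sketch below is an assumption along those lines: the function name `get_connection_sketch`, the MySQL/pymysql scheme, and the placeholder credentials are all hypothetical.

```python
def get_connection_sketch(user, password, host, db):
    """Hypothetical stand-in for the helper imported from env.py."""
    # Assumes a MySQL server reached through the pymysql driver.
    return f"mysql+pymysql://{user}:{password}@{host}/{db}"


# Placeholder credentials, for illustration only.
example_url = get_connection_sketch(
    "some_user", "some_password", "db.example.com", "telco_churn"
)
print(example_url)
```

Keeping credentials in a separate, untracked module like this keeps them out of the notebook itself.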

In [75]:
telco_df = pd.read_sql(query, url)

In [76]:
telco_df.head()

Unnamed: 0,internet_service_type_id,internet_service_type,payment_type_id,payment_type,contract_type_id,contract_type,customer_id,gender,senior_citizen,partner,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,1,DSL,2,Mailed check,2,One year,0002-ORFBO,Female,0,Yes,...,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No
1,1,DSL,2,Mailed check,1,Month-to-month,0003-MKNFE,Male,0,No,...,No,No,No,No,No,Yes,No,59.9,542.4,No
2,1,DSL,4,Credit card (automatic),1,Month-to-month,0013-MHZWF,Female,0,No,...,No,No,No,Yes,Yes,Yes,Yes,69.4,571.45,No
3,1,DSL,1,Electronic check,1,Month-to-month,0015-UOCOJ,Female,1,No,...,Yes,No,No,No,No,No,Yes,48.2,340.35,No
4,1,DSL,2,Mailed check,3,Two year,0016-QLJIS,Female,0,Yes,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No


In [77]:
# let's go ahead and save this version for future referencing

telco_df.to_csv("telco.csv")

In [78]:
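# confirming the new file saved correctly:
# note: the extra "Unnamed: 0" column is the dataframe index that "to_csv" wrote to the file;
# saving with "index=False" (or reading with "index_col=0") avoids it

pd.read_csv("telco.csv") # rows and values check out!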

Unnamed: 0.1,Unnamed: 0,internet_service_type_id,internet_service_type,payment_type_id,payment_type,contract_type_id,contract_type,customer_id,gender,senior_citizen,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,0,1,DSL,2,Mailed check,2,One year,0002-ORFBO,Female,0,...,No,Yes,No,Yes,Yes,No,Yes,65.60,593.3,No
1,1,1,DSL,2,Mailed check,1,Month-to-month,0003-MKNFE,Male,0,...,No,No,No,No,No,Yes,No,59.90,542.4,No
2,2,1,DSL,4,Credit card (automatic),1,Month-to-month,0013-MHZWF,Female,0,...,No,No,No,Yes,Yes,Yes,Yes,69.40,571.45,No
3,3,1,DSL,1,Electronic check,1,Month-to-month,0015-UOCOJ,Female,1,...,Yes,No,No,No,No,No,Yes,48.20,340.35,No
4,4,1,DSL,2,Mailed check,3,Two year,0016-QLJIS,Female,0,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,7038,3,,4,Credit card (automatic),1,Month-to-month,9970-QBCDA,Female,0,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,19.70,129.55,No
7039,7039,3,,3,Bank transfer (automatic),3,Two year,9972-EWRJS,Female,0,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Yes,19.25,1372.9,No
7040,7040,3,,4,Credit card (automatic),3,Two year,9975-GPKZU,Male,0,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,19.75,856.5,No
7041,7041,3,,2,Mailed check,1,Month-to-month,9975-SKRNR,Male,0,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,18.90,18.9,No
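
The read-back above carries an extra `Unnamed: 0` column, which is the dataframe index written to the file by `to_csv`. A small sketch of the difference `index=False` makes, using an in-memory buffer and a made-up two-row frame instead of the real file:

```python
import io

import pandas as pd

# Toy frame for illustration only.
toy = pd.DataFrame({"customer_id": ["a", "b"], "monthly_charges": [65.6, 59.9]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
read_with_index = pd.read_csv(with_index)
print(read_with_index.columns.tolist())

# index=False keeps the index out of the file.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
round_trip = pd.read_csv(without_index)
print(round_trip.columns.tolist())
```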


----
**Initial Data Exploration and Preparation:**

In [79]:
telco_df.info()

# notes:
# the "total_charges" column is stored as "object" (string) type
# it will need to be converted to "float" type

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 24 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   internet_service_type_id  7043 non-null   int64  
 1   internet_service_type     7043 non-null   object 
 2   payment_type_id           7043 non-null   int64  
 3   payment_type              7043 non-null   object 
 4   contract_type_id          7043 non-null   int64  
 5   contract_type             7043 non-null   object 
 6   customer_id               7043 non-null   object 
 7   gender                    7043 non-null   object 
 8   senior_citizen            7043 non-null   int64  
 9   partner                   7043 non-null   object 
 10  dependents                7043 non-null   object 
 11  tenure                    7043 non-null   int64  
 12  phone_service             7043 non-null   object 
 13  multiple_lines            7043 non-null   object 
 14  online_s

In [80]:
# the "astype()" method did not work for the "total_charges" column
# therefore, i used "pd.to_numeric()" below to convert this column to float type
# (errors="coerce" turns any value that cannot be parsed into NaN)

telco_df["total_charges"] = pd.to_numeric(telco_df["total_charges"], errors="coerce")
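
A likely reason `astype()` failed is that a few entries cannot be parsed as numbers, for example blank strings; that is an assumption, since the raw values are not shown here. The sketch below uses made-up values to show how `errors="coerce"` handles such entries:

```python
import pandas as pd

# Toy values; the blank string stands in for an unparseable entry (an assumption).
raw = pd.Series(["593.3", " ", "542.4"])

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)

# Number of values that could not be parsed.
n_missing = int(converted.isna().sum())
print(n_missing)
```

Counting the NaN values right after the conversion is a quick way to see how many entries were coerced.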

In [81]:
# confirming conversion:

telco_df.dtypes # checks out!

internet_service_type_id      int64
internet_service_type        object
payment_type_id               int64
payment_type                 object
contract_type_id              int64
contract_type                object
customer_id                  object
gender                       object
senior_citizen                int64
partner                      object
dependents                   object
tenure                        int64
phone_service                object
multiple_lines               object
online_security              object
online_backup                object
device_protection            object
tech_support                 object
streaming_tv                 object
streaming_movies             object
paperless_billing            object
monthly_charges             float64
total_charges               float64
churn                        object
dtype: object

In [82]:
telco_df.info()

# notes:
# after converting "total_charges" col to float type, there appear to be 11 missing values in the column
# let's confirm/inspect further

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 24 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   internet_service_type_id  7043 non-null   int64  
 1   internet_service_type     7043 non-null   object 
 2   payment_type_id           7043 non-null   int64  
 3   payment_type              7043 non-null   object 
 4   contract_type_id          7043 non-null   int64  
 5   contract_type             7043 non-null   object 
 6   customer_id               7043 non-null   object 
 7   gender                    7043 non-null   object 
 8   senior_citizen            7043 non-null   int64  
 9   partner                   7043 non-null   object 
 10  dependents                7043 non-null   object 
 11  tenure                    7043 non-null   int64  
 12  phone_service             7043 non-null   object 
 13  multiple_lines            7043 non-null   object 
 14  online_s

In [83]:
telco_df.isnull().sum()

internet_service_type_id     0
internet_service_type        0
payment_type_id              0
payment_type                 0
contract_type_id             0
contract_type                0
customer_id                  0
gender                       0
senior_citizen               0
partner                      0
dependents                   0
tenure                       0
phone_service                0
multiple_lines               0
online_security              0
online_backup                0
device_protection            0
tech_support                 0
streaming_tv                 0
streaming_movies             0
paperless_billing            0
monthly_charges              0
total_charges               11
churn                        0
dtype: int64

In [84]:
telco_df[telco_df.isnull().any(axis=1)]

Unnamed: 0,internet_service_type_id,internet_service_type,payment_type_id,payment_type,contract_type_id,contract_type,customer_id,gender,senior_citizen,partner,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
318,1,DSL,4,Credit card (automatic),3,Two year,1371-DWPAZ,Female,0,Yes,...,Yes,Yes,Yes,Yes,Yes,No,No,56.05,,No
630,1,DSL,3,Bank transfer (automatic),3,Two year,2775-SEFEE,Male,0,No,...,Yes,Yes,No,Yes,No,No,Yes,61.9,,No
953,1,DSL,2,Mailed check,3,Two year,4075-WKNIU,Female,0,Yes,...,No,Yes,Yes,Yes,Yes,No,No,73.35,,No
1052,1,DSL,3,Bank transfer (automatic),3,Two year,4472-LVYGI,Female,0,Yes,...,Yes,No,Yes,Yes,Yes,No,Yes,52.55,,No
1366,1,DSL,2,Mailed check,3,Two year,5709-LVOEQ,Female,0,Yes,...,Yes,Yes,Yes,No,Yes,Yes,No,80.85,,No
5902,3,,2,Mailed check,3,Two year,2520-SGTTA,Female,0,Yes,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,20.0,,No
5974,3,,2,Mailed check,2,One year,2923-ARZLG,Male,0,Yes,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Yes,19.7,,No
6000,3,,2,Mailed check,3,Two year,3115-CZMZD,Male,0,No,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,20.25,,No
6019,3,,2,Mailed check,3,Two year,3213-VVOLG,Male,0,Yes,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,25.35,,No
6188,3,,2,Mailed check,3,Two year,4367-NUYAO,Male,0,Yes,...,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,No,25.75,,No


**key takeaways for handling the 11 missing values in the "total_charges" column:**

- these customers have not churned and are still with the company
- they have a tenure of "0", implying that they joined the company recently
- given this, i will use their "monthly_charges" as a baseline for their total charges to date
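
The first two takeaways can be checked in code rather than by eye. A minimal sketch on a made-up frame (the rows in `toy` are illustrative, not the actual 11 customers):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "tenure": [0, 9, 0],
    "monthly_charges": [56.05, 65.60, 19.85],
    "total_charges": [np.nan, 593.30, np.nan],
    "churn": ["No", "No", "No"],
})

# Rows where total_charges is missing.
missing = toy[toy["total_charges"].isna()]

# Do all of them have zero tenure, and has none of them churned?
all_new = bool(missing["tenure"].eq(0).all())
none_churned = bool(missing["churn"].eq("No").all())
print(all_new, none_churned)
```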

In [85]:
filling_values = telco_df[telco_df.isnull().any(axis=1)].monthly_charges
filling_values

318     56.05
630     61.90
953     73.35
1052    52.55
1366    80.85
5902    20.00
5974    19.70
6000    20.25
6019    25.35
6188    25.75
6694    19.85
Name: monthly_charges, dtype: float64

In [86]:
# using the "fillna" function, i am expressing: "fill in" missing values in "total_charges" with "monthly_charges"

telco_df["total_charges"] = telco_df["total_charges"].fillna(telco_df["monthly_charges"]) 
telco_df.isnull().sum() # checks out!

internet_service_type_id    0
internet_service_type       0
payment_type_id             0
payment_type                0
contract_type_id            0
contract_type               0
customer_id                 0
gender                      0
senior_citizen              0
partner                     0
dependents                  0
tenure                      0
phone_service               0
multiple_lines              0
online_security             0
online_backup               0
device_protection           0
tech_support                0
streaming_tv                0
streaming_movies            0
paperless_billing           0
monthly_charges             0
total_charges               0
churn                       0
dtype: int64

In [87]:
# quick confirmation that the previously missing values were filled
print(telco_df["total_charges"].iloc[318]) # should print 56.05
print(telco_df["total_charges"].iloc[6694]) # should print 19.85

56.05
19.85


In [88]:
# quick summary stats:

stats = telco_df.describe().T
stats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
internet_service_type_id,7043.0,1.872923,0.737796,1.0,1.0,2.0,2.0,3.0
payment_type_id,7043.0,2.315633,1.148907,1.0,1.0,2.0,3.0,4.0
contract_type_id,7043.0,1.690473,0.833755,1.0,1.0,1.0,2.0,3.0
senior_citizen,7043.0,0.162147,0.368612,0.0,0.0,0.0,0.0,1.0
tenure,7043.0,32.371149,24.559481,0.0,9.0,29.0,55.0,72.0
monthly_charges,7043.0,64.761692,30.090047,18.25,35.5,70.35,89.85,118.75
total_charges,7043.0,2279.798992,2266.73017,18.8,398.55,1394.55,3786.6,8684.8


In [89]:
# let's also create a "range" column for our summary statistics

stats["range"] = stats["max"] - stats["min"]
stats # checks out!

Unnamed: 0,count,mean,std,min,25%,50%,75%,max,range
internet_service_type_id,7043.0,1.872923,0.737796,1.0,1.0,2.0,2.0,3.0,2.0
payment_type_id,7043.0,2.315633,1.148907,1.0,1.0,2.0,3.0,4.0,3.0
contract_type_id,7043.0,1.690473,0.833755,1.0,1.0,1.0,2.0,3.0,2.0
senior_citizen,7043.0,0.162147,0.368612,0.0,0.0,0.0,0.0,1.0,1.0
tenure,7043.0,32.371149,24.559481,0.0,9.0,29.0,55.0,72.0,72.0
monthly_charges,7043.0,64.761692,30.090047,18.25,35.5,70.35,89.85,118.75,100.5
total_charges,7043.0,2279.798992,2266.73017,18.8,398.55,1394.55,3786.6,8684.8,8666.0


---- 
**Data Exploration Continued:**

- removing unnecessary cols/data
- removing duplicate columns
- renaming col values/checking for errors

In [90]:
telco_df.head()

Unnamed: 0,internet_service_type_id,internet_service_type,payment_type_id,payment_type,contract_type_id,contract_type,customer_id,gender,senior_citizen,partner,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,1,DSL,2,Mailed check,2,One year,0002-ORFBO,Female,0,Yes,...,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No
1,1,DSL,2,Mailed check,1,Month-to-month,0003-MKNFE,Male,0,No,...,No,No,No,No,No,Yes,No,59.9,542.4,No
2,1,DSL,4,Credit card (automatic),1,Month-to-month,0013-MHZWF,Female,0,No,...,No,No,No,Yes,Yes,Yes,Yes,69.4,571.45,No
3,1,DSL,1,Electronic check,1,Month-to-month,0015-UOCOJ,Female,1,No,...,Yes,No,No,No,No,No,Yes,48.2,340.35,No
4,1,DSL,2,Mailed check,3,Two year,0016-QLJIS,Female,0,Yes,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No


In [91]:
# from the data, i conclude that the following ID columns can be dropped, since the same information
# is already carried by the matching descriptive columns (internet_service_type, payment_type, contract_type)
# [internet_service_type_id, payment_type_id, contract_type_id]

telco_df = telco_df.drop(columns=['internet_service_type_id', 'payment_type_id', 'contract_type_id'])
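
Before dropping an ID column, it can help to confirm that it maps one-to-one onto its descriptive column. A sketch of such a check on made-up rows (`toy` is illustrative, not the Telco data):

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "contract_type_id": [1, 1, 2, 3],
    "contract_type": ["Month-to-month", "Month-to-month", "One year", "Two year"],
})

# An ID column is redundant if each ID maps to exactly one label and vice versa.
labels_per_id = toy.groupby("contract_type_id")["contract_type"].nunique()
ids_per_label = toy.groupby("contract_type")["contract_type_id"].nunique()
is_redundant = bool(labels_per_id.eq(1).all() and ids_per_label.eq(1).all())
print(is_redundant)
```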

In [92]:
telco_df.head()

Unnamed: 0,internet_service_type,payment_type,contract_type,customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,DSL,Mailed check,One year,0002-ORFBO,Female,0,Yes,Yes,9,Yes,...,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No
1,DSL,Mailed check,Month-to-month,0003-MKNFE,Male,0,No,No,9,Yes,...,No,No,No,No,No,Yes,No,59.9,542.4,No
2,DSL,Credit card (automatic),Month-to-month,0013-MHZWF,Female,0,No,Yes,9,Yes,...,No,No,No,Yes,Yes,Yes,Yes,69.4,571.45,No
3,DSL,Electronic check,Month-to-month,0015-UOCOJ,Female,1,No,No,7,Yes,...,Yes,No,No,No,No,No,Yes,48.2,340.35,No
4,DSL,Mailed check,Two year,0016-QLJIS,Female,0,Yes,Yes,65,Yes,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No


In [93]:
# for consistency, i may want the "senior_citizen" column to reflect either "yes" or "no" values (especially if creating a dummy variable for this categorical feature)

telco_df["senior_citizen"] = telco_df["senior_citizen"].replace({0: "No", 1: "Yes"})
telco_df.head()

Unnamed: 0,internet_service_type,payment_type,contract_type,customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,DSL,Mailed check,One year,0002-ORFBO,Female,No,Yes,Yes,9,Yes,...,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No
1,DSL,Mailed check,Month-to-month,0003-MKNFE,Male,No,No,No,9,Yes,...,No,No,No,No,No,Yes,No,59.9,542.4,No
2,DSL,Credit card (automatic),Month-to-month,0013-MHZWF,Female,No,No,Yes,9,Yes,...,No,No,No,Yes,Yes,Yes,Yes,69.4,571.45,No
3,DSL,Electronic check,Month-to-month,0015-UOCOJ,Female,Yes,No,No,7,Yes,...,Yes,No,No,No,No,No,Yes,48.2,340.35,No
4,DSL,Mailed check,Two year,0016-QLJIS,Female,No,Yes,Yes,65,Yes,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No


----
**Reviewing "object"/string type cols & values in the Telco Data**

- Consider dummy variables for future categorical exploration & analysis
- Pandas "get_dummies()" function 
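
As a quick illustration of what `get_dummies()` produces, the sketch below encodes one made-up column with and without `drop_first=True`. With `drop_first=True` the first category (alphabetically) gets no column of its own and is implied when all the other dummy columns are zero.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({"contract_type": ["Month-to-month", "One year", "Two year"]})

# One column per category.
full = pd.get_dummies(toy, columns=["contract_type"])

# One column fewer: the first category becomes the implicit baseline.
reduced = pd.get_dummies(toy, columns=["contract_type"], drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())
```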

In [94]:
telco_df.head()

Unnamed: 0,internet_service_type,payment_type,contract_type,customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,...,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn
0,DSL,Mailed check,One year,0002-ORFBO,Female,No,Yes,Yes,9,Yes,...,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No
1,DSL,Mailed check,Month-to-month,0003-MKNFE,Male,No,No,No,9,Yes,...,No,No,No,No,No,Yes,No,59.9,542.4,No
2,DSL,Credit card (automatic),Month-to-month,0013-MHZWF,Female,No,No,Yes,9,Yes,...,No,No,No,Yes,Yes,Yes,Yes,69.4,571.45,No
3,DSL,Electronic check,Month-to-month,0015-UOCOJ,Female,Yes,No,No,7,Yes,...,Yes,No,No,No,No,No,Yes,48.2,340.35,No
4,DSL,Mailed check,Two year,0016-QLJIS,Female,No,Yes,Yes,65,Yes,...,Yes,Yes,Yes,Yes,Yes,Yes,Yes,90.45,5957.9,No


In [95]:
# 7043 observations/rows
# 21 features/columns

initial_shape = telco_df.shape
initial_shape

(7043, 21)

In [96]:
pd.Series(telco_df.select_dtypes(include = "object").columns)

0     internet_service_type
1              payment_type
2             contract_type
3               customer_id
4                    gender
5            senior_citizen
6                   partner
7                dependents
8             phone_service
9            multiple_lines
10          online_security
11            online_backup
12        device_protection
13             tech_support
14             streaming_tv
15         streaming_movies
16        paperless_billing
17                    churn
dtype: object

In [97]:
telco_df.select_dtypes(include = "object").columns

Index(['internet_service_type', 'payment_type', 'contract_type', 'customer_id',
       'gender', 'senior_citizen', 'partner', 'dependents', 'phone_service',
       'multiple_lines', 'online_security', 'online_backup',
       'device_protection', 'tech_support', 'streaming_tv', 'streaming_movies',
       'paperless_billing', 'churn'],
      dtype='object')

In [98]:
categorical_lst = ['internet_service_type',
       'payment_type',
       'contract_type',
       'gender',
       'senior_citizen',
       'partner',
       'dependents',
       'phone_service',
       'multiple_lines',
       'online_security',
       'online_backup',
       'device_protection',
       'tech_support',
       'streaming_tv',
       'streaming_movies',
       'paperless_billing']
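
The list above is the set of non-numeric columns minus the identifier (`customer_id`) and the target (`churn`). As an alternative to typing it out, a sketch like the following could derive it; the frame `toy` and the `excluded` set are assumptions for illustration:

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "customer_id": ["0002-ORFBO", "0003-MKNFE"],
    "gender": ["Female", "Male"],
    "tenure": [9, 9],
    "churn": ["No", "No"],
})

# Columns that should not be dummy-encoded: the identifier and the target.
excluded = {"customer_id", "churn"}

# Keep every non-numeric column that is not excluded.
categorical_cols = [
    col for col in toy.select_dtypes(exclude="number").columns
    if col not in excluded
]
print(categorical_cols)
```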

In [99]:
dummy_df = pd.get_dummies(telco_df[categorical_lst], drop_first=True)
dummy_df.head()

Unnamed: 0,internet_service_type_Fiber optic,internet_service_type_None,payment_type_Credit card (automatic),payment_type_Electronic check,payment_type_Mailed check,contract_type_One year,contract_type_Two year,gender_Male,senior_citizen_Yes,partner_Yes,...,online_backup_Yes,device_protection_No internet service,device_protection_Yes,tech_support_No internet service,tech_support_Yes,streaming_tv_No internet service,streaming_tv_Yes,streaming_movies_No internet service,streaming_movies_Yes,paperless_billing_Yes
0,0,0,0,0,1,1,0,0,0,1,...,1,0,0,0,1,0,1,0,0,1
1,0,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,1,0
2,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,1,0,1,0,1,1
3,0,0,0,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,1,0,1,0,0,1,...,1,0,1,0,1,0,1,0,1,1


In [100]:
telco_df = pd.concat([telco_df, dummy_df], axis = 1)

In [101]:
telco_df.head()

Unnamed: 0,internet_service_type,payment_type,contract_type,customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,...,online_backup_Yes,device_protection_No internet service,device_protection_Yes,tech_support_No internet service,tech_support_Yes,streaming_tv_No internet service,streaming_tv_Yes,streaming_movies_No internet service,streaming_movies_Yes,paperless_billing_Yes
0,DSL,Mailed check,One year,0002-ORFBO,Female,No,Yes,Yes,9,Yes,...,1,0,0,0,1,0,1,0,0,1
1,DSL,Mailed check,Month-to-month,0003-MKNFE,Male,No,No,No,9,Yes,...,0,0,0,0,0,0,0,0,1,0
2,DSL,Credit card (automatic),Month-to-month,0013-MHZWF,Female,No,No,Yes,9,Yes,...,0,0,0,0,1,0,1,0,1,1
3,DSL,Electronic check,Month-to-month,0015-UOCOJ,Female,Yes,No,No,7,Yes,...,0,0,0,0,0,0,0,0,0,1
4,DSL,Mailed check,Two year,0016-QLJIS,Female,No,Yes,Yes,65,Yes,...,1,0,1,0,1,0,1,0,1,1


In [102]:
# dataframe shape with dummy variables/cols

dummy_shape = telco_df.shape

print(f'Initial Telco Dataframe shape: {initial_shape}')
print(f'Secondary Telco Dataframe shape: {dummy_shape}')

Initial Telco Dataframe shape: (7043, 21)
Secondary Telco Dataframe shape: (7043, 48)
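
The jump from 21 to 48 columns means the 16 categorical features produced 27 dummy columns. With `drop_first=True`, each feature contributes its number of unique values minus one, so the count can be predicted ahead of time. A sketch of that arithmetic on made-up data:

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "contract_type": ["Month-to-month", "One year", "Two year"],
})

# Each feature adds (number of unique values - 1) columns when drop_first=True.
expected_new_cols = sum(toy[col].nunique() - 1 for col in toy.columns)
actual_new_cols = pd.get_dummies(toy, drop_first=True).shape[1]
print(expected_new_cols, actual_new_cols)
```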


In [103]:
# let's check the data types after conversion:
# here i notice that dummy variables/columns are encoded as "uint8" type -- 
# i will convert these to "bool" type for future referencing 

telco_df.dtypes

internet_service_type                     object
payment_type                              object
contract_type                             object
customer_id                               object
gender                                    object
senior_citizen                            object
partner                                   object
dependents                                object
tenure                                     int64
phone_service                             object
multiple_lines                            object
online_security                           object
online_backup                             object
device_protection                         object
tech_support                              object
streaming_tv                              object
streaming_movies                          object
paperless_billing                         object
monthly_charges                          float64
total_charges                            float64
churn               

In [104]:
# creating a for loop to change the "uint8" columns to "bool" type

for col in telco_df.columns:
    if telco_df[col].dtype == "uint8":
        telco_df[col] = telco_df[col].astype("bool")
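
The same conversion can also be written without an explicit loop. The sketch below is one alternative on a made-up frame, selecting the `uint8` columns by dtype and casting them in a single `astype` call. Note that recent pandas versions may already return `bool` dummies from `get_dummies()`, in which case no `uint8` columns are found and nothing changes.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "tenure": [9, 7],
    "partner_Yes": np.array([1, 0], dtype="uint8"),
    "gender_Male": np.array([0, 1], dtype="uint8"),
})

# Find the uint8 columns and cast them all at once.
dummy_cols = toy.select_dtypes(include="uint8").columns
toy = toy.astype({col: bool for col in dummy_cols})
print(toy.dtypes)
```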

In [105]:
telco_df.head()

Unnamed: 0,internet_service_type,payment_type,contract_type,customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,...,online_backup_Yes,device_protection_No internet service,device_protection_Yes,tech_support_No internet service,tech_support_Yes,streaming_tv_No internet service,streaming_tv_Yes,streaming_movies_No internet service,streaming_movies_Yes,paperless_billing_Yes
0,DSL,Mailed check,One year,0002-ORFBO,Female,No,Yes,Yes,9,Yes,...,True,False,False,False,True,False,True,False,False,True
1,DSL,Mailed check,Month-to-month,0003-MKNFE,Male,No,No,No,9,Yes,...,False,False,False,False,False,False,False,False,True,False
2,DSL,Credit card (automatic),Month-to-month,0013-MHZWF,Female,No,No,Yes,9,Yes,...,False,False,False,False,True,False,True,False,True,True
3,DSL,Electronic check,Month-to-month,0015-UOCOJ,Female,Yes,No,No,7,Yes,...,False,False,False,False,False,False,False,False,False,True
4,DSL,Mailed check,Two year,0016-QLJIS,Female,No,Yes,Yes,65,Yes,...,True,False,True,False,True,False,True,False,True,True


----
### Univariate data exploration:

**key objectives:**

* Understand the trends and patterns of data
* Analyze the frequency and other characteristics of key data features
* Know the distribution of the variables in the data
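
As a starting point for these objectives, a minimal sketch on made-up data: relative frequencies for a categorical feature and summary statistics for a numeric one (`toy` is illustrative, not the Telco data).

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "contract_type": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "monthly_charges": [65.6, 59.9, 69.4, 48.2],
})

# Categorical feature: share of rows in each category.
category_share = toy["contract_type"].value_counts(normalize=True)

# Numeric feature: count, mean, spread, and quartiles.
numeric_summary = toy["monthly_charges"].describe()

print(category_share)
print(numeric_summary)
```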