In [2]:
import pandas as pd

## Read Data

This is the Telco customer churn dataset, retrieved from Kaggle. According to Kaggle, the dataset includes information about:

* Customers who left within the last month – the column is called Churn
* Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
* Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
* Demographic info about customers – gender, age range, and if they have partners and dependents

Dataset url: https://www.kaggle.com/datasets/yeanzc/telco-customer-churn-ibm-dataset

In [7]:
df = pd.read_excel("Telco_customer_churn.xlsx")

In [8]:
df.shape

(7043, 33)

## Data Type Check

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 33 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   CustomerID         7043 non-null   object 
 1   Count              7043 non-null   int64  
 2   Country            7043 non-null   object 
 3   State              7043 non-null   object 
 4   City               7043 non-null   object 
 5   Zip Code           7043 non-null   int64  
 6   Lat Long           7043 non-null   object 
 7   Latitude           7043 non-null   float64
 8   Longitude          7043 non-null   float64
 9   Gender             7043 non-null   object 
 10  Senior Citizen     7043 non-null   object 
 11  Partner            7043 non-null   object 
 12  Dependents         7043 non-null   object 
 13  Tenure Months      7043 non-null   int64  
 14  Phone Service      7043 non-null   object 
 15  Multiple Lines     7043 non-null   object 
 16  Internet Service   7043 

'Total Charges' is stored as object, but it holds numeric amounts, so it should be float.

## Convert Data Type

### 'Total Charges'

In [10]:
# count how often each value occurs in 'Total Charges'
df['Total Charges'].value_counts()

20.2      11
          11
19.75      9
19.65      8
20.05      8
          ..
444.75     1
5459.2     1
295.95     1
394.1      1
6844.5     1
Name: Total Charges, Length: 6531, dtype: int64
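The blank entry in the counts above is what keeps the column from being numeric. As a minimal sketch of how such entries could be located, the snippet below uses a small toy Series (`charges`, a made-up stand-in for `df['Total Charges']`, not the real data) and flags values that are strings containing only whitespace:

```python
import pandas as pd

# Toy stand-in for df['Total Charges'] (hypothetical values):
# floats mixed with blank strings, which forces the object dtype.
charges = pd.Series([20.2, " ", 19.75, " ", 444.75], name="Total Charges")

# Flag entries that are strings and empty once whitespace is stripped.
is_blank = charges.map(lambda v: isinstance(v, str) and v.strip() == "")

n_blank = int(is_blank.sum())                # number of blank entries
blank_index = charges[is_blank].index.tolist()  # row labels of the blanks

print(charges.dtype, n_blank, blank_index)
```

On the real DataFrame the same mask could be used to inspect the affected rows before deciding how to fill them.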

In [11]:
# replace blank strings (" ") with 0
df['Total Charges'] = df['Total Charges'].replace(" ", 0)
df['Total Charges'].value_counts()

20.20      11
0.00       11
19.75       9
19.65       8
20.05       8
           ..
444.75      1
5459.20     1
295.95      1
394.10      1
6844.50     1
Name: Total Charges, Length: 6531, dtype: int64

In [13]:
# convert the column from object to float
df['Total Charges'] = df['Total Charges'].astype(float)
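The replace-then-cast steps above could also be written as a single conversion. The sketch below is one possible alternative, not the method used in this notebook: it runs `pd.to_numeric` with `errors="coerce"` on a toy Series (`charges`, hypothetical values), which turns unparseable entries into NaN, and then fills them with 0 to match the choice made above.

```python
import pandas as pd

# Toy stand-in for df['Total Charges'] (hypothetical values).
charges = pd.Series([20.2, " ", 19.75, "5459.2"], name="Total Charges")

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
as_numeric = pd.to_numeric(charges, errors="coerce")

# Fill the resulting NaN values with 0, mirroring the replace(" ", 0) step.
converted = as_numeric.fillna(0.0)

print(converted.dtype)
print(converted.tolist())
```

Leaving the NaN values in place instead of filling them is also an option when a total of 0 would be misleading for later analysis.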

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 33 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   CustomerID         7043 non-null   object 
 1   Count              7043 non-null   int64  
 2   Country            7043 non-null   object 
 3   State              7043 non-null   object 
 4   City               7043 non-null   object 
 5   Zip Code           7043 non-null   int64  
 6   Lat Long           7043 non-null   object 
 7   Latitude           7043 non-null   float64
 8   Longitude          7043 non-null   float64
 9   Gender             7043 non-null   object 
 10  Senior Citizen     7043 non-null   object 
 11  Partner            7043 non-null   object 
 12  Dependents         7043 non-null   object 
 13  Tenure Months      7043 non-null   int64  
 14  Phone Service      7043 non-null   object 
 15  Multiple Lines     7043 non-null   object 
 16  Internet Service   7043 

'Total Charges' was successfully converted to float.

In [15]:
df.shape

(7043, 33)

In [22]:
df['Contract'].value_counts()

Month-to-month    3875
Two year          1695
One year          1473
Name: Contract, dtype: int64

After the conversion, the dataset still has 7043 observations and 33 columns, and 'Total Charges' is now stored as float. The columns listed in the `df.info()` output above have no null values.

### Save Data

In [16]:
df.to_pickle('/Users/sangjun/Desktop/MSDS_Course/MSDS_601/Final_Project/AfterWrangling.pkl')
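Pickle keeps the column dtypes, so the float conversion should survive a save and reload. The sketch below checks this on a small toy DataFrame (`frame`, hypothetical values) written to a temporary directory; the file name is reused only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the wrangled DataFrame (hypothetical values).
frame = pd.DataFrame(
    {
        "Total Charges": [20.2, 0.0, 19.75],
        "Contract": ["Month-to-month", "Two year", "One year"],
    }
)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "AfterWrangling.pkl"
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

# Values and dtypes should both match after the round trip.
same_values = restored.equals(frame)
same_dtypes = bool((restored.dtypes == frame.dtypes).all())
print(same_values, same_dtypes)
```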