# Churn Dataset Analysis

### Basic Information
- **Dataset Shape**: The dataset contains 15,930 rows and 21 columns.
- **Data Types**: The columns comprise a mix of integer (`int64`), floating-point (`float64`) and text (`object`) data; the `object` columns hold strings (`gender`, `occupation`) and dates (`last_transaction`).
- **Missing Values**: Several columns have missing values, notably `gender`, `dependents`, `occupation`, `city` and `last_transaction`.
- **Unique Values**: The number of distinct values varies widely by column, from 2 (`gender`, `churn`) to 2,806 (`branch_code`).

### Columns Overview
- **`vintage`**: Customer relationship duration with the bank (1,332 unique values).
- **`age`**: Age of the customer (90 unique values).
- **`gender`**: Gender of the customer (2 unique values: `Male` and `Female`).
- **`dependents`**: Number of dependents of the customer (14 unique values).
- **`occupation`**: Occupation of the customer (5 unique categories).
- **`city`**: City codes (1,288 unique values).
- **`customer_nw_category`**: Customer net worth category (3 unique categories).
- **`branch_code`**: Code of the branch (2,806 unique values).
- **Financial columns**: Credit, debit and balance columns such as `current_balance`, `previous_month_end_balance`, `average_monthly_balance_prevQ` and `current_month_credit`, describing the customer's account balances and transactions over recent months and quarters.
- **`churn`**: Customer churn status (2 unique values: 1 = churned, 0 = not churned).
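The per-column figures in the two lists above (missing values, unique values) can be computed directly with pandas. A minimal sketch on a tiny hypothetical frame, not the real `churn_prediction.csv`; on the loaded data the same calls would be `data.isna().sum()` and `data.nunique()`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the churn data (values are made up)
df = pd.DataFrame({
    "gender": ["Male", "Female", None, "Male"],
    "dependents": [0.0, np.nan, 2.0, 0.0],
    "churn": [0.0, 1.0, 1.0, 0.0],
})

print(df.shape)            # (rows, columns)
print(df.dtypes)           # dtype of each column

missing = df.isna().sum()  # number of missing values per column
unique = df.nunique()      # distinct non-missing values per column
print(missing)
print(unique)
```

Note that `nunique()` ignores missing values by default (`dropna=True`), so `gender` counts as 2 unique values even though one entry is empty.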

## Reading Files into Python

In [5]:
# importing libraries
import pandas as pd

In [7]:
pwd  

'C:\\Users\\Prashant\\Downloads\\AV\\Black Belt Workshop\\Student Copy'

In [8]:
# importing data from an explicit folder path (change file_path to match your machine)
import os
file_path = r"C:\Users\Prashant\Downloads\AV\bbplus"  # folder location of your dataset
data = pd.read_csv(os.path.join(file_path, 'churn_prediction.csv'))

In [9]:
data = pd.read_csv('churn_prediction.csv')
# relative path: this works only when the CSV file is in the current working
# directory (printed by `pwd` above), which is usually the notebook's own folder

In [10]:
#first 5 instances using "head()" function
data.head()

|  | customer_id | vintage | age | gender | dependents | occupation | city | customer_nw_category | branch_code | current_balance | ... | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | last_transaction | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2401 | 66 | Male | 0.0 | self_employed | 187.0 | 2 | 755 | 1458.71 | ... | 1458.71 | 1449.07 | 0.2 | 0.2 | 0.2 | 0.2 | 1458.71 | 1458.71 | 2019-05-21 | 0.0 |
| 1 | 2 | 2648 | 35 | Male | 0.0 | self_employed |  | 2 | 3214 | 5390.37 | ... | 7799.26 | 12419.41 | 0.56 | 0.56 | 5486.27 | 100.56 | 6496.78 | 8787.61 | 2019-11-01 | 0.0 |
| 2 | 4 | 2494 | 31 | Male | 0.0 | salaried | 146.0 | 2 | 41 | 3913.16 | ... | 4910.17 | 2815.94 | 0.61 | 0.61 | 6046.73 | 259.23 | 5006.28 | 5070.14 | NaT | 0.0 |
| 3 | 5 | 2629 | 90 |  |  | self_employed | 1020.0 | 2 | 582 | 2291.91 | ... | 2084.54 | 1006.54 | 0.47 | 0.47 | 0.47 | 2143.33 | 2291.91 | 1669.79 | 2019-08-06 | 1.0 |
| 4 | 6 | 1879 | 42 | Male | 2.0 | self_employed | 1494.0 | 3 | 388 | 927.72 | ... | 1643.31 | 1871.12 | 0.33 | 714.61 | 588.62 | 1538.06 | 1157.15 | 1677.16 | 2019-11-03 | 1.0 |


In [12]:
#last 10 instances using "tail()" function
data.tail(10)

|  | customer_id | vintage | age | gender | dependents | occupation | city | customer_nw_category | branch_code | current_balance | ... | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | last_transaction | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15920 | 16988 | 2536 | 54 | Male | 1.0 | self_employed | 318.0 | 2 | 107 | 73079.82 | ... | 67526.73 | 66262.07 | 5514.76 | 1500.47 | 0.47 | 0.47 | 68896.57 | 67489.5 | 2019-12-25 | 0.0 |
| 15921 | 16989 | 2659 | 48 | Female | 0.0 | self_employed | 646.0 | 3 | 397 | 322.65 | ... | 1454.06 | 1737.29 | 0.5 | 3.36 | 1429.07 | 0.5 | 864.52 | 1749.28 | 2019-11-23 | 1.0 |
| 15922 | 16990 | 2288 | 32 | Male | 0.0 | salaried | 751.0 | 2 | 138 | 3803.44 | ... | 3424.69 | 6470.93 | 714.53 | 1985.96 | 1300.24 | 2717.39 | 4077.26 | 3266.66 | 2019-12-18 | 0.0 |
| 15923 | 16991 | 2683 | 45 | Female |  | self_employed | 78.0 | 1 | 255 | 20903.07 | ... | 23593.8 | 18019.46 | 0.06 | 0.06 | 6428.63 | 1119.47 | 24093.37 | 24120.65 | 2019-10-16 | 0.0 |
| 15924 | 16992 | 2175 | 40 | Male | 0.0 | salaried | 698.0 | 2 | 231 | 9996.84 | ... | 5883.0 | 191.49 | 0.1 | 9923.99 | 0.1 | 0.1 | 10009.28 | 7483.44 | 2019-11-10 | 0.0 |
| 15925 | 16993 | 2500 | 64 | Female | 0.0 | self_employed | 409.0 | 3 | 934 | 309.64 | ... | 1738.15 | 300.91 | 0.04 | 46961.44 | 5148.76 | 41642.9 | 2733.69 | 2340.96 | 2019-11-28 | 1.0 |
| 15926 | 16994 | 2470 | 44 | Male | 0.0 | salaried | 1020.0 | 2 | 15 | 1049.67 | ... | 3622.0 | 6016.92 | 0.06 | 966.96 | 1428.63 | 5714.34 | 2100.3 | 1846.93 | 2019-12-01 | 1.0 |
| 15927 | 16995 | 2537 | 34 | Female | 1.0 | self_employed | 557.0 | 2 | 284 | 1106.86 | ... | 2729.22 | 2170.8 | 8732.47 | 1714.39 | 8618.2 | 1142.96 | 5726.47 | 605.49 | 2019-12-07 | 0.0 |
| 15928 | 16996 | 2287 | 56 | Female | 0.0 | self_employed | 1540.0 | 2 | 670 | 2098.29 | ... | 2122.82 | 2131.59 | 362.14 | 0.26 | 143.11 | 314.54 | 2132.49 | 2083.49 | 2019-12-07 | 0.0 |
| 15929 | 16997 | 2656 | 70 | Male | 0.0 | self_employed | 1096.0 | 2 | 134 | 18530.6 | ... | 18199.41 | 17473.74 | 150.27 | 156.55 | 0.19 |  |  |  |  |  |


In [13]:
# finding out the shape of the data using the "shape" attribute: output is (rows, columns)
data.shape

(15930, 21)

In [16]:
#Printing all the columns present in data
data.columns

Index(['customer_id', 'vintage', 'age', 'gender', 'dependents', 'occupation',
       'city', 'customer_nw_category', 'branch_code', 'current_balance',
       'previous_month_end_balance', 'average_monthly_balance_prevQ',
       'average_monthly_balance_prevQ2', 'current_month_credit',
       'previous_month_credit', 'current_month_debit', 'previous_month_debit',
       'current_month_balance', 'previous_month_balance', 'last_transaction',
       'churn'],
      dtype='object')

## Variable Identification and Typecasting

In [20]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15930 entries, 0 to 15929
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   customer_id                     15930 non-null  int64  
 1   vintage                         15930 non-null  int64  
 2   age                             15930 non-null  int64  
 3   gender                          15642 non-null  object 
 4   dependents                      14582 non-null  float64
 5   occupation                      15883 non-null  object 
 6   city                            15470 non-null  float64
 7   customer_nw_category            15930 non-null  int64  
 8   branch_code                     15930 non-null  int64  
 9   current_balance                 15930 non-null  float64
 10  previous_month_end_balance      15930 non-null  float64
 11  average_monthly_balance_prevQ   15930 non-null  float64
 12  average_monthly_balance_prevQ2  

In [19]:
# A closer look at the data types present in the data
data.dtypes

customer_id                         int64
vintage                             int64
age                                 int64
gender                             object
dependents                        float64
occupation                         object
city                              float64
customer_nw_category                int64
branch_code                         int64
current_balance                   float64
previous_month_end_balance        float64
average_monthly_balance_prevQ     float64
average_monthly_balance_prevQ2    float64
current_month_credit              float64
previous_month_credit             float64
current_month_debit               float64
previous_month_debit              float64
current_month_balance             float64
previous_month_balance            float64
last_transaction                   object
churn                             float64
dtype: object

There are a lot of variables visible at once, so let's narrow this down by looking **at one datatype at a time**. We will start with integers.


### Integer Data Type

In [24]:
# Identifying variables with integer datatype
sum(data.dtypes == "int64" )  # we have 5 columns with int64 datatype

5

In [25]:
data.dtypes[data.dtypes == "int64"]

customer_id             int64
vintage                 int64
age                     int64
customer_nw_category    int64
branch_code             int64
dtype: object
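Rather than filtering one dtype at a time, `DataFrame.dtypes.value_counts()` tallies every dtype at once, and `select_dtypes` returns the matching columns directly. A small sketch on a hypothetical frame (column names borrowed from the churn data, values made up):

```python
import pandas as pd

# Hypothetical frame mixing integer, float and string columns
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [66, 35, 31],
    "current_balance": [1458.71, 5390.37, 3913.16],
    "occupation": ["self_employed", "self_employed", "salaried"],
})

# Number of columns per dtype, all in one call
counts = df.dtypes.value_counts()
print(counts)

# Names of the int64 columns only, in their original order
int_cols = df.select_dtypes(include="int64").columns.tolist()
print(int_cols)  # ['customer_id', 'age']
```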

In [27]:
numeric_columns = data.select_dtypes(include=['int64', 'float64'])
numeric_columns  # subset of the original DataFrame containing ONLY the "int64" & "float64" columns

|  | customer_id | vintage | age | dependents | city | customer_nw_category | branch_code | current_balance | previous_month_end_balance | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2401 | 66 | 0.0 | 187.0 | 2 | 755 | 1458.71 | 1458.71 | 1458.71 | 1449.07 | 0.20 | 0.20 | 0.20 | 0.20 | 1458.71 | 1458.71 | 0.0 |
| 1 | 2 | 2648 | 35 | 0.0 |  | 2 | 3214 | 5390.37 | 8704.66 | 7799.26 | 12419.41 | 0.56 | 0.56 | 5486.27 | 100.56 | 6496.78 | 8787.61 | 0.0 |
| 2 | 4 | 2494 | 31 | 0.0 | 146.0 | 2 | 41 | 3913.16 | 5815.29 | 4910.17 | 2815.94 | 0.61 | 0.61 | 6046.73 | 259.23 | 5006.28 | 5070.14 | 0.0 |
| 3 | 5 | 2629 | 90 |  | 1020.0 | 2 | 582 | 2291.91 | 2291.91 | 2084.54 | 1006.54 | 0.47 | 0.47 | 0.47 | 2143.33 | 2291.91 | 1669.79 | 1.0 |
| 4 | 6 | 1879 | 42 | 2.0 | 1494.0 | 3 | 388 | 927.72 | 1401.72 | 1643.31 | 1871.12 | 0.33 | 714.61 | 588.62 | 1538.06 | 1157.15 | 1677.16 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15925 | 16993 | 2500 | 64 | 0.0 | 409.0 | 3 | 934 | 309.64 | 5458.35 | 1738.15 | 300.91 | 0.04 | 46961.44 | 5148.76 | 41642.90 | 2733.69 | 2340.96 | 1.0 |
| 15926 | 16994 | 2470 | 44 | 0.0 | 1020.0 | 2 | 15 | 1049.67 | 2716.22 | 3622.00 | 6016.92 | 0.06 | 966.96 | 1428.63 | 5714.34 | 2100.30 | 1846.93 | 1.0 |
| 15927 | 16995 | 2537 | 34 | 1.0 | 557.0 | 2 | 284 | 1106.86 | 992.59 | 2729.22 | 2170.80 | 8732.47 | 1714.39 | 8618.20 | 1142.96 | 5726.47 | 605.49 | 0.0 |
| 15928 | 16996 | 2287 | 56 | 0.0 | 1540.0 | 2 | 670 | 2098.29 | 1879.26 | 2122.82 | 2131.59 | 362.14 | 0.26 | 143.11 | 314.54 | 2132.49 | 2083.49 | 0.0 |


Summary:

*    **customer_id** is a unique number assigned to each customer. It is **okay to keep it as an integer**, although this variable will not be used in our analysis.

*    **branch_code** likewise identifies a branch rather than measuring a quantity, so it should be **converted to category**.

*    **age** and **vintage** are genuine numeric quantities, so keeping them as integers is fine.

*    **customer_nw_category** is an ordinal category (its levels are ranked), so it **should be converted to category**.

*    **churn**: 1 represents churn and 0 represents no churn. The values are labels rather than quantities, so this **needs to be converted to category datatype**.
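The summary describes `customer_nw_category` as ordinal, but a plain `astype('category')` produces an *unordered* categorical. If the ranking should be kept, an ordered `pd.CategoricalDtype` can encode it. A sketch on made-up values, assuming the natural numeric order 1 < 2 < 3 is the intended ranking (the source does not say which end means higher net worth):

```python
import pandas as pd

# Hypothetical sample of customer_nw_category values
nw = pd.Series([2, 3, 1, 2, 3])

# Ordered categorical: categories listed from lowest to highest rank (assumed order)
nw_dtype = pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
nw_ordered = nw.astype(nw_dtype)

print(nw_ordered.cat.ordered)                # True
print(nw_ordered.min(), nw_ordered.max())    # 1 3
print((nw_ordered > 1).tolist())             # [True, True, False, True, True]

# Plain astype("category") is unordered: only == / != comparisons are allowed,
# and ordering comparisons such as nw_plain > 1 raise TypeError
nw_plain = nw.astype("category")
print(nw_plain.cat.ordered)                  # False
```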


In [28]:
data.describe()

|  | customer_id | vintage | age | dependents | city | customer_nw_category | branch_code | current_balance | previous_month_end_balance | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 15930.0 | 15930.0 | 15930.0 | 14582.0 | 15470.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15930.0 | 15929.0 | 15929.0 | 15929.0 | 15929.0 |
| mean | 8498.085625 | 2392.437728 | 48.174576 | 0.356741 | 795.157207 | 2.221846 | 927.797928 | 7271.421 | 7386.992 | 7433.499 | 7183.789 | 3279.202 | 3419.49 | 3498.199 | 3466.232 | 7362.582 | 7438.765 | 0.18752 |
| std | 4910.076657 | 272.486481 | 17.865815 | 1.086172 | 430.395665 | 0.662057 | 937.402995 | 50511.91 | 49442.42 | 48935.4 | 55285.74 | 31247.26 | 35412.68 | 28257.28 | 27720.81 | 49338.07 | 49736.15 | 0.39034 |
| min | 1.0 | 438.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | -5503.96 | -2998.64 | 1428.69 | -16506.1 | 0.01 | 0.01 | 0.01 | 0.01 | -3374.18 | -3060.13 | 0.0 |
| 25% | 4245.25 | 2260.0 | 35.0 | 0.0 | 409.0 | 2.0 | 176.0 | 1776.387 | 1908.638 | 2168.155 | 1822.055 | 0.31 | 0.33 | 0.41 | 0.41 | 1988.66 | 2077.04 | 0.0 |
| 50% | 8485.5 | 2454.0 | 46.0 | 0.0 | 834.0 | 2.0 | 578.0 | 3276.645 | 3381.18 | 3520.035 | 3373.785 | 0.61 | 0.64 | 94.4 | 112.48 | 3445.41 | 3450.51 | 0.0 |
| 75% | 12752.75 | 2593.0 | 60.0 | 0.0 | 1096.0 | 3.0 | 1437.0 | 6624.458 | 6686.012 | 6676.24 | 6532.695 | 707.5825 | 771.8425 | 1393.23 | 1357.73 | 6658.49 | 6679.81 | 0.0 |
| max | 16997.0 | 2776.0 | 90.0 | 52.0 | 1648.0 | 3.0 | 4753.0 | 5905904.0 | 5740439.0 | 5700290.0 | 5010170.0 | 1764286.0 | 2361808.0 | 1764286.0 | 1414168.0 | 5778185.0 | 5720144.0 | 1.0 |


In [29]:
data["churn"].dtype

dtype('float64')

In [34]:
# converting churn, branch code and customer_nw_category  to "category"
data['churn'] = data['churn'].astype('category')
data['branch_code'] = data['branch_code'].astype('category')
data['customer_nw_category'] = data['customer_nw_category'].astype('category')

In [35]:
data.dtypes

customer_id                          int64
vintage                              int64
age                                  int64
gender                              object
dependents                         float64
occupation                          object
city                               float64
customer_nw_category              category
branch_code                       category
current_balance                    float64
previous_month_end_balance         float64
average_monthly_balance_prevQ      float64
average_monthly_balance_prevQ2     float64
current_month_credit               float64
previous_month_credit              float64
current_month_debit                float64
previous_month_debit               float64
current_month_balance              float64
previous_month_balance             float64
last_transaction                    object
churn                             category
dtype: object

In [36]:
data.dtypes[data.dtypes == 'int64']

customer_id    int64
vintage        int64
age            int64
dtype: object

In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15930 entries, 0 to 15929
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype   
---  ------                          --------------  -----   
 0   customer_id                     15930 non-null  int64   
 1   vintage                         15930 non-null  int64   
 2   age                             15930 non-null  int64   
 3   gender                          15642 non-null  object  
 4   dependents                      14582 non-null  float64 
 5   occupation                      15883 non-null  object  
 6   city                            15470 non-null  float64 
 7   customer_nw_category            15930 non-null  category
 8   branch_code                     15930 non-null  category
 9   current_balance                 15930 non-null  float64 
 10  previous_month_end_balance      15930 non-null  float64 
 11  average_monthly_balance_prevQ   15930 non-null  float64 
 12  average_monthly_ba

### Float Data Type

In [38]:
# Identifying variables with float datatype
data.dtypes[data.dtypes == 'float64']

dependents                        float64
city                              float64
current_balance                   float64
previous_month_end_balance        float64
average_monthly_balance_prevQ     float64
average_monthly_balance_prevQ2    float64
current_month_credit              float64
previous_month_credit             float64
current_month_debit               float64
previous_month_debit              float64
current_month_balance             float64
previous_month_balance            float64
dtype: object

Summary:

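*    **dependents** is a count, so it is expected to be a whole number. **Should be changed to integer type**; because the column has missing values, this needs pandas' nullable `Int64` type (plain `int64` cannot hold NaN).

*    **city** variable is a code identifying a city, stored as an integer-valued number. **Should be converted to category type**.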

*    Rest of the variables like **credit, balance and debit** are best represented by the float variables.
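To see why the conversion cell below uses `"Int64"` (capital I) rather than `"int64"`, compare the two casts on a float column with a gap. A minimal demonstration on made-up values: NumPy-backed `int64` cannot represent NaN, so the cast raises, while the nullable `Int64` dtype stores the gap as `<NA>`.

```python
import numpy as np
import pandas as pd

dependents = pd.Series([0.0, 2.0, np.nan, 1.0])  # hypothetical values with one gap

# NumPy-backed int64 has no missing-value marker, so this cast raises ValueError
try:
    dependents.astype("int64")
except ValueError as err:
    print("int64 cast failed:", err)

# Nullable integer dtype keeps the missing entry as <NA>
dependents_int = dependents.astype("Int64")
print(dependents_int.dtype)         # Int64
print(dependents_int.isna().sum())  # 1
```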

In [50]:
# converting "dependents" and "city" to their respective types
data['dependents'] = data['dependents'].astype("Int64")
data['city'] = data['city'].astype('category')

# checking
data[['dependents','city']].dtypes

dependents       Int64
city          category
dtype: object

### Object Data Type

In [51]:
data.dtypes

customer_id                          int64
vintage                              int64
age                                  int64
gender                              object
dependents                           Int64
occupation                          object
city                              category
customer_nw_category              category
branch_code                       category
current_balance                    float64
previous_month_end_balance         float64
average_monthly_balance_prevQ      float64
average_monthly_balance_prevQ2     float64
current_month_credit               float64
previous_month_credit              float64
current_month_debit                float64
previous_month_debit               float64
current_month_balance              float64
previous_month_balance             float64
last_transaction                    object
churn                             category
dtype: object

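*    **Variables like 'gender', 'occupation' and 'last_transaction' are of type object**. pandas uses `object` for columns of Python strings: `read_csv` does not create categorical columns on its own and only parses dates when asked (for example through its `parse_dates` parameter), so **pandas has not recognised the intended datatype** of these three variables.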

In [52]:
# Manually checking object types
data[['gender','occupation','last_transaction']].head(7)

|  | gender | occupation | last_transaction |
|---|---|---|---|
| 0 | Male | self_employed | 2019-05-21 |
| 1 | Male | self_employed | 2019-11-01 |
| 2 | Male | salaried | NaT |
| 3 |  | self_employed | 2019-08-06 |
| 4 | Male | self_employed | 2019-11-03 |
| 5 | Female | self_employed | 2019-11-01 |
| 6 | Male | retired | 2019-09-24 |


*    **gender** and **occupation** variables **belong to categorical data types**.
*    **last_transaction** should be a **datetime variable**.
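Both conversions can also be requested at load time, so the `object` columns never appear: `read_csv` accepts a `dtype` mapping and a `parse_dates` list. A sketch using an in-memory CSV with three hypothetical rows in place of `churn_prediction.csv`:

```python
import io
import pandas as pd

# Hypothetical CSV snippet reusing column names from the churn data
csv_text = """customer_id,gender,occupation,last_transaction
1,Male,self_employed,2019-05-21
2,Male,salaried,
5,,self_employed,2019-08-06
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"gender": "category", "occupation": "category"},  # categorical on read
    parse_dates=["last_transaction"],                         # datetime on read
)
print(df.dtypes)
print(df)
```

Blank cells become `NaN` in the categorical columns and `NaT` in the date column, just as with the after-the-fact conversions in this notebook.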

In [53]:
# typecasting "gender" and "occupation" to category type
data['gender'] = data['gender'].astype('category')
data['occupation'] = data['occupation'].astype('category')

In [54]:
# checking
data[['gender','occupation']].dtypes

gender        category
occupation    category
dtype: object

In [55]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15930 entries, 0 to 15929
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype   
---  ------                          --------------  -----   
 0   customer_id                     15930 non-null  int64   
 1   vintage                         15930 non-null  int64   
 2   age                             15930 non-null  int64   
 3   gender                          15642 non-null  category
 4   dependents                      14582 non-null  Int64   
 5   occupation                      15883 non-null  category
 6   city                            15470 non-null  category
 7   customer_nw_category            15930 non-null  category
 8   branch_code                     15930 non-null  category
 9   current_balance                 15930 non-null  float64 
 10  previous_month_end_balance      15930 non-null  float64 
 11  average_monthly_balance_prevQ   15930 non-null  float64 
 12  average_monthly_ba

In [56]:
100*(2.6-2.1)/2.6  # percentage reduction going from 2.6 to 2.1

19.23076923076923
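The cell above computes the percentage reduction from 2.6 to 2.1 (about 19%). The notebook does not say what the two numbers are; in a typecasting walkthrough a common use of this formula is comparing memory usage before and after converting text columns to `category`, but the memory lines of the `data.info()` outputs are cut off here, so that reading is an assumption. A sketch of such a measurement with `Series.memory_usage(deep=True)` on synthetic data:

```python
import pandas as pd

# Synthetic column: a few labels repeated many times (hypothetical data),
# forced to object dtype so the "before" measurement is a plain string column
occupation = pd.Series(
    ["self_employed", "salaried", "retired", "student"] * 2500, dtype=object
)

as_object = occupation.memory_usage(deep=True)
as_category = occupation.astype("category").memory_usage(deep=True)

# Percentage reduction, same formula as the cell above: 100 * (old - new) / old
reduction = 100 * (as_object - as_category) / as_object
print(f"object: {as_object} bytes, category: {as_category} bytes, saved {reduction:.1f}%")
```

The saving comes from `category` storing each distinct label once plus a small integer code per row; columns with many distinct values benefit far less.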

### datetime Data Type

In [57]:
# Convert the "last_transaction" column into datetime object.
data['last_transaction'] = pd.to_datetime(data['last_transaction'])
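`pd.to_datetime` maps missing entries (blank cells, `None`, or the string `'NaT'`) to `NaT`. If a column might also contain malformed dates, the default `errors='raise'` would stop with an error, while `errors='coerce'` turns them into `NaT` as well. A small sketch with hypothetical strings:

```python
import pandas as pd

raw = pd.Series(["2019-05-21", "NaT", None, "not a date"])  # hypothetical values

# errors="coerce": anything that cannot be parsed becomes NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)
print(parsed.isna().sum())  # 3
```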

In [58]:
data.dtypes

customer_id                                int64
vintage                                    int64
age                                        int64
gender                                  category
dependents                                 Int64
occupation                              category
city                                    category
customer_nw_category                    category
branch_code                             category
current_balance                          float64
previous_month_end_balance               float64
average_monthly_balance_prevQ            float64
average_monthly_balance_prevQ2           float64
current_month_credit                     float64
previous_month_credit                    float64
current_month_debit                      float64
previous_month_debit                     float64
current_month_balance                    float64
previous_month_balance                   float64
last_transaction                  datetime64[ns]
churn               

In [59]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15930 entries, 0 to 15929
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   customer_id                     15930 non-null  int64         
 1   vintage                         15930 non-null  int64         
 2   age                             15930 non-null  int64         
 3   gender                          15642 non-null  category      
 4   dependents                      14582 non-null  Int64         
 5   occupation                      15883 non-null  category      
 6   city                            15470 non-null  category      
 7   customer_nw_category            15930 non-null  category      
 8   branch_code                     15930 non-null  category      
 9   current_balance                 15930 non-null  float64       
 10  previous_month_end_balance      15930 non-null  float64       
 11  av

In [60]:
# extracting new columns from "last_transaction"  >> Feature Engineering

# day of year when the last transaction was done
data['doy_ls_tran'] = data.last_transaction.dt.dayofyear

# week of year when last transaction was done
data['woy_ls_tran'] = data.last_transaction.dt.isocalendar().week

# month of year when last transaction was done
data['moy_ls_tran'] = data.last_transaction.dt.month

# day of week when last transaction was done
data['dow_ls_tran'] = data.last_transaction.dt.dayofweek

In [61]:
# checking new extracted columns using datetime
data[['last_transaction','doy_ls_tran','woy_ls_tran','moy_ls_tran','dow_ls_tran']].head()

|  | last_transaction | doy_ls_tran | woy_ls_tran | moy_ls_tran | dow_ls_tran |
|---|---|---|---|---|---|
| 0 | 2019-05-21 | 141.0 | 21.0 | 5.0 | 1.0 |
| 1 | 2019-11-01 | 305.0 | 44.0 | 11.0 | 4.0 |
| 2 | NaT |  |  |  |  |
| 3 | 2019-08-06 | 218.0 | 32.0 | 8.0 | 1.0 |
| 4 | 2019-11-03 | 307.0 | 44.0 | 11.0 | 6.0 |


The first column is the complete date of the last transaction made by a given customer.

The next columns represent the day of year, week of year, month of year and day of week of that last transaction.

**Breaking down the date variable** into these more granular features **helps us understand when the last transaction was done from different perspectives**. Now that we have extracted the essentials from the `last_transaction` variable, we will drop it from the dataset.
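Customers without a last transaction (`NaT`) get missing values in every extracted feature: `dayofyear`, `month` and `dayofweek` come back as floats with `NaN`, while `isocalendar().week` uses the nullable `UInt32` type, consistent with the dtypes listed further down. A sketch on two hypothetical dates plus a gap (the short column names are illustrative only):

```python
import pandas as pd

# Hypothetical last-transaction dates, one missing
last_tx = pd.to_datetime(pd.Series(["2019-05-21", None, "2019-11-03"]))

features = pd.DataFrame({
    "doy": last_tx.dt.dayofyear,           # day of year (1-366)
    "woy": last_tx.dt.isocalendar().week,  # ISO week number
    "moy": last_tx.dt.month,               # month (1-12)
    "dow": last_tx.dt.dayofweek,           # Monday=0 ... Sunday=6
})
print(features)
print(features.dtypes)
```

2019-05-21 is a Tuesday in ISO week 21 (day 141 of the year), matching row 0 of the output above.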



In [62]:
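# Drop the last_transaction column (either of these persists the change):
# data = data.drop(columns=['last_transaction'])
# data.drop('last_transaction', axis=1, inplace=True)

# without assignment or inplace=True, drop() only returns a modified copy; `data` itself is unchanged
data.drop('last_transaction', axis=1)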

|  | customer_id | vintage | age | gender | dependents | occupation | city | customer_nw_category | branch_code | current_balance | ... | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | churn | doy_ls_tran | woy_ls_tran | moy_ls_tran | dow_ls_tran |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2401 | 66 | Male | 0 | self_employed | 187.0 | 2 | 755 | 1458.71 | ... | 0.20 | 0.20 | 0.20 | 1458.71 | 1458.71 | 0.0 | 141.0 | 21 | 5.0 | 1.0 |
| 1 | 2 | 2648 | 35 | Male | 0 | self_employed |  | 2 | 3214 | 5390.37 | ... | 0.56 | 5486.27 | 100.56 | 6496.78 | 8787.61 | 0.0 | 305.0 | 44 | 11.0 | 4.0 |
| 2 | 4 | 2494 | 31 | Male | 0 | salaried | 146.0 | 2 | 41 | 3913.16 | ... | 0.61 | 6046.73 | 259.23 | 5006.28 | 5070.14 | 0.0 |  |  |  |  |
| 3 | 5 | 2629 | 90 |  |  | self_employed | 1020.0 | 2 | 582 | 2291.91 | ... | 0.47 | 0.47 | 2143.33 | 2291.91 | 1669.79 | 1.0 | 218.0 | 32 | 8.0 | 1.0 |
| 4 | 6 | 1879 | 42 | Male | 2 | self_employed | 1494.0 | 3 | 388 | 927.72 | ... | 714.61 | 588.62 | 1538.06 | 1157.15 | 1677.16 | 1.0 | 307.0 | 44 | 11.0 | 6.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15925 | 16993 | 2500 | 64 | Female | 0 | self_employed | 409.0 | 3 | 934 | 309.64 | ... | 46961.44 | 5148.76 | 41642.90 | 2733.69 | 2340.96 | 1.0 | 332.0 | 48 | 11.0 | 3.0 |
| 15926 | 16994 | 2470 | 44 | Male | 0 | salaried | 1020.0 | 2 | 15 | 1049.67 | ... | 966.96 | 1428.63 | 5714.34 | 2100.30 | 1846.93 | 1.0 | 335.0 | 48 | 12.0 | 6.0 |
| 15927 | 16995 | 2537 | 34 | Female | 1 | self_employed | 557.0 | 2 | 284 | 1106.86 | ... | 1714.39 | 8618.20 | 1142.96 | 5726.47 | 605.49 | 0.0 | 341.0 | 49 | 12.0 | 5.0 |
| 15928 | 16996 | 2287 | 56 | Female | 0 | self_employed | 1540.0 | 2 | 670 | 2098.29 | ... | 0.26 | 143.11 | 314.54 | 2132.49 | 2083.49 | 0.0 | 341.0 | 49 | 12.0 | 5.0 |


In [63]:
data.columns

Index(['customer_id', 'vintage', 'age', 'gender', 'dependents', 'occupation',
       'city', 'customer_nw_category', 'branch_code', 'current_balance',
       'previous_month_end_balance', 'average_monthly_balance_prevQ',
       'average_monthly_balance_prevQ2', 'current_month_credit',
       'previous_month_credit', 'current_month_debit', 'previous_month_debit',
       'current_month_balance', 'previous_month_balance', 'last_transaction',
       'churn', 'doy_ls_tran', 'woy_ls_tran', 'moy_ls_tran', 'dow_ls_tran'],
      dtype='object')

In [64]:
data.drop('last_transaction', axis=1, inplace=True)
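The difference between the two `drop` calls above (the first only displayed a result, so `last_transaction` was still listed in `data.columns`) can be checked on a tiny hypothetical frame. Reassigning the result has the same lasting effect as `inplace=True`:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"customer_id": [1, 2], "last_transaction": ["2019-05-21", None]})

df.drop("last_transaction", axis=1)         # returns a new frame; df is unchanged
print(df.columns.tolist())                  # ['customer_id', 'last_transaction']

df = df.drop(columns=["last_transaction"])  # reassign to keep the result
print(df.columns.tolist())                  # ['customer_id']
```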

In [65]:
data.dtypes

customer_id                          int64
vintage                              int64
age                                  int64
gender                            category
dependents                           Int64
occupation                        category
city                              category
customer_nw_category              category
branch_code                       category
current_balance                    float64
previous_month_end_balance         float64
average_monthly_balance_prevQ      float64
average_monthly_balance_prevQ2     float64
current_month_credit               float64
previous_month_credit              float64
current_month_debit                float64
previous_month_debit               float64
current_month_balance              float64
previous_month_balance             float64
churn                             category
doy_ls_tran                        float64
woy_ls_tran                         UInt32
moy_ls_tran                        float64
dow_ls_tran

In [67]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15930 entries, 0 to 15929
Data columns (total 24 columns):
 #   Column                          Non-Null Count  Dtype   
---  ------                          --------------  -----   
 0   customer_id                     15930 non-null  int64   
 1   vintage                         15930 non-null  int64   
 2   age                             15930 non-null  int64   
 3   gender                          15642 non-null  category
 4   dependents                      14582 non-null  Int64   
 5   occupation                      15883 non-null  category
 6   city                            15470 non-null  category
 7   customer_nw_category            15930 non-null  category
 8   branch_code                     15930 non-null  category
 9   current_balance                 15930 non-null  float64 
 10  previous_month_end_balance      15930 non-null  float64 
 11  average_monthly_balance_prevQ   15930 non-null  float64 
 12  average_monthly_ba

In [68]:
# Save your dataset for further analysis
data.to_csv("customer_churn_v1.csv", index=False)
data.to_pickle("customer_churn_v1.pkl")
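Saving both formats matters because CSV is plain text: the `category` and `Int64` dtypes set in this notebook are not stored, so reading the CSV back re-infers types, whereas the pickle keeps them. A sketch with a temporary directory and a tiny hypothetical frame (file names are illustrative only):

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame using two of the dtypes set in this notebook
df = pd.DataFrame({
    "churn": pd.Series([0, 1, 0]).astype("category"),
    "dependents": pd.Series([0, None, 2], dtype="Int64"),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    pkl_path = os.path.join(tmp, "sample.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)     # types re-inferred from text
    from_pkl = pd.read_pickle(pkl_path)  # original dtypes restored

print(from_csv.dtypes)  # churn -> int64, dependents -> float64
print(from_pkl.dtypes)  # churn -> category, dependents -> Int64
```

When loading `customer_churn_v1.csv` later, the conversions would have to be repeated (or passed to `read_csv` via `dtype`/`parse_dates`), while `customer_churn_v1.pkl` can be used as-is. Only unpickle files from sources you trust.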