# Bank Customer Churn Prediction

`Author:` [Prashant Sharma](https://github.com/Prashantpq)\
`Date:` 20.December.2024\
`Dataset:` [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shubhammeshram579/bank-customer-churn-prediction)

### About Dataset (Meta data)
#### Context
The bank customer churn dataset is commonly used for predicting customer churn in the banking industry. It contains information on bank customers who either left the bank or remained customers.

### Content
#### Column Descriptions:
* `Customer ID:` A unique identifier for each customer.
* `Surname:` The customer's surname or last name.
* `Credit Score:` A numerical value representing the customer's credit score.
* `Geography:` The country where the customer resides (France, Spain or Germany).
* `Gender:` The customer's gender (Male or Female).
* `Age:`  The customer's age.
* `Tenure:` The number of years the customer has been with the bank.
* `Balance:` The customer's account balance.
* `NumOfProducts:` The number of bank products the customer uses (e.g., savings account, credit card).
* `HasCrCard:` Whether the customer has a credit card (1 = yes, 0 = no).
* `IsActiveMember:` Whether the customer is an active member (1 = yes, 0 = no).
* `EstimatedSalary:`  The estimated salary of the customer.
* `Exited:` Whether the customer has churned (1 = yes, 0 = no).

# `Import Libraries`

In [1]:
# Import libraries

# Data manipulation and analysis
import numpy as np
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning models and utilities
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# `Load the Dataset`

In [2]:
df = pd.read_csv('Data/Customer.csv')
# Show top 10 rows
df.head(10)

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42.0 | 2 | 0.0 | 1 | 1.0 | 1.0 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41.0 | 1 | 83807.86 | 1 | 0.0 | 1.0 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42.0 | 8 | 159660.8 | 3 | 1.0 | 0.0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39.0 | 1 | 0.0 | 2 | 0.0 | 0.0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43.0 | 2 | 125510.82 | 1 | NaN | 1.0 | 79084.1 | 0 |
| 5 | 6 | 15574012 | Chu | 645 | Spain | Male | 44.0 | 8 | 113755.78 | 2 | 1.0 | 0.0 | 149756.71 | 1 |
| 6 | 7 | 15592531 | Bartlett | 822 | NaN | Male | 50.0 | 7 | 0.0 | 2 | 1.0 | 1.0 | 10062.8 | 0 |
| 7 | 8 | 15656148 | Obinna | 376 | Germany | Female | 29.0 | 4 | 115046.74 | 4 | 1.0 | 0.0 | 119346.88 | 1 |
| 8 | 9 | 15792365 | He | 501 | France | Male | 44.0 | 4 | 142051.07 | 2 | 0.0 | NaN | 74940.5 | 0 |
| 9 | 10 | 15592389 | H? | 684 | France | Male | NaN | 2 | 134603.88 | 1 | 1.0 | 1.0 | 71725.73 | 0 |


# `Data Preprocessing`

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10002 entries, 0 to 10001
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10002 non-null  int64  
 1   CustomerId       10002 non-null  int64  
 2   Surname          10002 non-null  object 
 3   CreditScore      10002 non-null  int64  
 4   Geography        10001 non-null  object 
 5   Gender           10002 non-null  object 
 6   Age              10001 non-null  float64
 7   Tenure           10002 non-null  int64  
 8   Balance          10002 non-null  float64
 9   NumOfProducts    10002 non-null  int64  
 10  HasCrCard        10001 non-null  float64
 11  IsActiveMember   10001 non-null  float64
 12  EstimatedSalary  10002 non-null  float64
 13  Exited           10002 non-null  int64  
dtypes: float64(5), int64(6), object(3)
memory usage: 1.1+ MB
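
The `df.info()` output shows 10001 non-null values out of 10002 for `Geography`, `Age`, `HasCrCard` and `IsActiveMember`, so each of these columns has one missing entry. The sketch below shows one possible way to count and handle them. It rebuilds a small sample from rows 0, 4, 6, 8 and 9 of the `df.head(10)` output instead of reading the CSV, and the choice to drop incomplete rows (rather than impute them) is an assumption, not necessarily what this notebook does next. The names `sample`, `missing_counts` and `cleaned` are illustrative.

```python
import numpy as np
import pandas as pd

# Rows 0, 4, 6, 8 and 9 from the df.head(10) output, restricted to the
# columns that contain missing values plus the target column.
sample = pd.DataFrame(
    {
        "Geography": ["France", "Spain", np.nan, "France", "France"],
        "Age": [42.0, 43.0, 50.0, 44.0, np.nan],
        "HasCrCard": [1.0, np.nan, 1.0, 0.0, 1.0],
        "IsActiveMember": [1.0, 1.0, 1.0, np.nan, 1.0],
        "Exited": [1, 0, 0, 0, 0],
    }
)

# Count missing values per column and keep only the columns that have any.
missing_counts = sample.isna().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)

# Assumption: rows with any missing value are simply dropped.
cleaned = sample.dropna().reset_index(drop=True)
print(f"Rows before: {len(sample)}, rows after: {len(cleaned)}")
```

On the full dataset the same two calls would be `df.isna().sum()` and `df.dropna()`. Since only a handful of rows out of 10002 are affected, dropping them loses very little data, though imputing `Age` with the median or `Geography` with the mode would be an equally reasonable alternative.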
