
# Introduction
A business manager of a consumer credit card portfolio is facing customer attrition: more and more customers are closing their credit card accounts. The manager wants to analyze the data to understand why customers leave, and to use those findings to predict which customers are likely to churn. With such predictions, the bank can proactively reach out to at-risk customers, offer them better services, and try to change their decision.

The dataset is sourced from Kaggle: https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers/data

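The dataset covers roughly 10,000 customers (10,127 rows once loaded) and describes each one by age, income category, marital status, credit limit, card category, and so on. The loaded table has 21 columns: a customer identifier, the target `Attrition_Flag`, and 19 candidate features.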

Only 16.07% of the customers have churned, so the two classes are imbalanced. This makes it harder to train a model that reliably identifies churning customers.
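To make the imbalance concrete, here is a minimal sketch of how the churn share could be computed and what it implies for a naive baseline. The helper name `churn_share`, the label `Attrited Customer`, and the toy series are assumptions for illustration; only the 16.07% figure comes from the text above.

```python
import pandas as pd


def churn_share(flags: pd.Series, churn_label: str = "Attrited Customer") -> float:
    """Return the fraction of rows whose flag equals churn_label."""
    return float((flags == churn_label).mean())


# Toy sample (not taken from the dataset): 1 churner out of 5 customers.
toy_flags = pd.Series(["Existing Customer"] * 4 + ["Attrited Customer"])
toy_share = churn_share(toy_flags)

# With the 16.07% churn share quoted above, a model that always predicts
# "customer stays" is correct for every customer who did not churn.
quoted_churn_share = 0.1607
baseline_accuracy = 1 - quoted_churn_share

print(f"toy churn share: {toy_share:.2f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.2%}")
```

A baseline of about 84% accuracy with zero churners detected is why metrics such as recall or precision on the churn class are usually more informative here than plain accuracy.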


### 1.1 Import Libraries

In [2]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



### 1.2 Load Data

In [3]:
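# Load the Kaggle CSV (expected in the working directory)
df = pd.read_csv('BankChurners.csv')
# Keep every column except the last two, which are not used in this analysis
df = df[df.columns[:-2]]
# Preview the first rows
df.head(5)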

| | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | ... | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061 |
| 1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | ... | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105 |
| 2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | ... | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.0 |
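The positional slice `df.columns[:-2]` used above silently depends on the column order of the CSV. As a hedged alternative, the unwanted columns could be dropped by name. The sketch below assumes they share a common prefix; the prefix `Naive_Bayes`, the helper `drop_by_prefix`, and the toy column names are hypothetical and should be checked against the actual file.

```python
import pandas as pd


def drop_by_prefix(frame: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Return a copy of frame without the columns whose name starts with prefix."""
    keep = ~frame.columns.str.startswith(prefix)
    return frame.loc[:, keep].copy()


# Toy frame with hypothetical trailing columns (not the real column names).
toy = pd.DataFrame({
    "CLIENTNUM": [1, 2],
    "Avg_Utilization_Ratio": [0.061, 0.105],
    "Naive_Bayes_demo_1": [0.1, 0.2],
    "Naive_Bayes_demo_2": [0.9, 0.8],
})

trimmed = drop_by_prefix(toy, "Naive_Bayes")
print(list(trimmed.columns))
```

Dropping by name fails loudly (nothing is removed) if the file layout changes, whereas a positional slice would remove whichever two columns happen to come last.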


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CLIENTNUM                 10127 non-null  int64  
 1   Attrition_Flag            10127 non-null  object 
 2   Customer_Age              10127 non-null  int64  
 3   Gender                    10127 non-null  object 
 4   Dependent_count           10127 non-null  int64  
 5   Education_Level           10127 non-null  object 
 6   Marital_Status            10127 non-null  object 
 7   Income_Category           10127 non-null  object 
 8   Card_Category             10127 non-null  object 
 9   Months_on_book            10127 non-null  int64  
 10  Total_Relationship_Count  10127 non-null  int64  
 11  Months_Inactive_12_mon    10127 non-null  int64  
 12  Contacts_Count_12_mon     10127 non-null  int64  
 13  Credit_Limit              10127 non-null  float64
 14  Total_

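Inference: The columns listed above contain no null values; each reports 10127 non-null entries for 10127 rows. Note that categorical columns can still carry an explicit `Unknown` level (for example `Marital_Status`), which `df.info()` does not count as missing.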
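A null count alone can hide placeholder categories. The following sketch counts both per column; the helper `missing_summary` and the toy frame are illustrative assumptions, and the placeholder string `Unknown` is taken from the `Marital_Status` levels listed in the column descriptions.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame, placeholder: str = "Unknown") -> pd.DataFrame:
    """Count nulls and placeholder strings for every column of frame."""
    return pd.DataFrame({
        "n_null": frame.isna().sum(),
        "n_placeholder": frame.eq(placeholder).sum(),
    })


# Toy frame (not taken from the dataset): one null and one "Unknown" entry.
toy = pd.DataFrame({
    "Marital_Status": ["Married", "Unknown", None],
    "Customer_Age": [45, 49, 51],
})

summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(df)` would show which categorical columns need a decision on how to treat `Unknown` (keep as its own level, impute, or drop).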

### Understanding Dataset

- **CLIENTNUM**: Client number. Unique identifier for the customer holding the account  
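- **Attrition_Flag**: Internal event (customer activity) variable and prediction target – stored as text (`Existing Customer` in the preview above; closed accounts are labeled `Attrited Customer` in the Kaggle file), so it must be encoded as 1 if the account is closed and 0 otherwise before modeling  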
- **Customer_Age**: Demographic variable – customer’s age in years  
- **Gender**: Demographic variable – M = Male, F = Female  
- **Dependent_count**: Demographic variable – number of dependents  
- **Education_Level**: Demographic variable – highest educational qualification of the account holder (e.g., high school, graduate, etc.)  
- **Marital_Status**: Demographic variable – Married, Single, Divorced, Unknown  
- **Income_Category**: Demographic variable – annual income bracket, stored as text labels (Less than $40K, $40K - $60K, $60K - $80K, $80K - $120K, $120K +, Unknown)  
- **Card_Category**: Product variable – type of card (Blue, Silver, Gold, Platinum)  
- **Months_on_book**: Relationship variable – number of months the customer has held the account  
- **Total_Relationship_Count**: Relationship variable – total number of products held by the customer  
- **Months_Inactive_12_mon**: Activity variable – number of months inactive in the last 12 months  
- **Contacts_Count_12_mon**: Engagement variable – number of contacts in the last 12 months  
- **Credit_Limit**: Financial variable – credit limit on the credit card  
- **Total_Revolving_Bal**: Financial variable – total revolving balance on the credit card  
- **Avg_Open_To_Buy**: Financial variable – average open-to-buy credit line over the last 12 months  
- **Total_Amt_Chng_Q4_Q1**: Behavioral variable – change in transaction amount (Q4 over Q1)  
- **Total_Trans_Amt**: Behavioral variable – total transaction amount in the last 12 months  
- **Total_Trans_Ct**: Behavioral variable – total transaction count in the last 12 months  
- **Total_Ct_Chng_Q4_Q1**: Behavioral variable – change in transaction count (Q4 over Q1)  
- **Avg_Utilization_Ratio**: Utilization variable – average card utilization ratio (revolving balance ÷ credit limit)
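The relationships described in this list can be spot-checked on the three rows shown in the `df.head()` preview. The sketch below rebuilds those rows by hand (selected columns only), encodes the target as 0/1, and checks that the utilization ratio equals revolving balance ÷ credit limit. It also checks whether open-to-buy equals credit limit minus revolving balance, which holds for these three rows but is an observation, not something the column description states. The label `Attrited Customer` and the column name `churned` are assumptions.

```python
import pandas as pd

# The three rows visible in the preview above (selected columns only).
sample = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 3,
    "Credit_Limit": [12691.0, 8256.0, 3418.0],
    "Total_Revolving_Bal": [777, 864, 0],
    "Avg_Open_To_Buy": [11914.0, 7392.0, 3418.0],
    "Avg_Utilization_Ratio": [0.061, 0.105, 0.0],
})

# Encode the target: 1 = closed account, 0 = existing customer.
# "Attrited Customer" is the assumed label for churned accounts.
sample["churned"] = (sample["Attrition_Flag"] == "Attrited Customer").astype(int)

# Recompute the derived columns from their described definitions.
open_to_buy = sample["Credit_Limit"] - sample["Total_Revolving_Bal"]
utilization = (sample["Total_Revolving_Bal"] / sample["Credit_Limit"]).round(3)

otb_matches = bool((open_to_buy == sample["Avg_Open_To_Buy"]).all())
util_matches = bool(
    (utilization - sample["Avg_Utilization_Ratio"]).abs().le(1e-3).all()
)

print("open-to-buy matches:", otb_matches)
print("utilization matches:", util_matches)
```

If these identities hold across the full dataset, `Avg_Open_To_Buy` and `Avg_Utilization_Ratio` are largely determined by `Credit_Limit` and `Total_Revolving_Bal`, which is worth keeping in mind when checking for correlated features.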
