### 1. Import Dependencies

When Does Binning Help?

1. Non-linear relationship with the target
- Binning lets a linear model capture non-linear patterns it would otherwise miss

2. Skewed distribution
- Binning dampens skew and limits the influence of extreme values

3. Interpretability is key
- Easier for business users to understand "age 18–25" than "age = 23"

4. Model is prone to overfitting
- Binning reduces granularity, which leaves fewer candidate split points and less room to overfit

5. Need to reduce cardinality
- Helps when a numeric column has too many unique values

6. Sparse or noisy data
- Binning groups rare or noisy values together, which can strengthen the signal
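To make points 2 and 5 concrete, here is a minimal sketch that compares equal-width and equal-frequency binning. The data is synthetic and the `balance` series is a hypothetical stand-in, not a column from the churn dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed values (hypothetical; not taken from the churn data)
rng = np.random.default_rng(0)
balance = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="balance")

# Equal-width bins: every bin spans the same range of values,
# so a skewed column piles most rows into the first bin
equal_width = pd.cut(balance, bins=4)

# Equal-frequency bins: every bin holds roughly the same number of rows
equal_freq = pd.qcut(balance, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```

Either way, a column with 1000 distinct values collapses to four categories. Which scheme fits better depends on whether the bin edges or the bin sizes matter more for the problem at hand.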

In [5]:
import os
import numpy as np
import pandas as pd # alias
import seaborn as sns
import matplotlib.pyplot as plt

### 2. Basic Processing

In [6]:
df = pd.read_csv('data/processed/ChurnModelling_Outliers_Handled.csv')
df.head()

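| Unnamed: 0 | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42.0 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41.0 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42.0 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 38.91 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43.0 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |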


Age bins:

18 - 30 => 'Youngsters'

30 - 54 => 'Old'
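The age ranges above could be applied with `pd.cut`. This is only a sketch under assumptions: the toy ages and the `AgeBins` column name are hypothetical, and because the ranges do not say which bin the boundary age of 30 belongs to, it is placed in 'Youngsters' here.

```python
import pandas as pd

# Toy ages for illustration only (not read from the churn file)
toy = pd.DataFrame({"Age": [18, 25, 30, 38.91, 54, 60]})

# Intervals are [18, 30] and (30, 54]; include_lowest keeps age 18 in the first bin.
# Assumption: the boundary age 30 is treated as 'Youngsters'.
toy["AgeBins"] = pd.cut(
    toy["Age"],
    bins=[18, 30, 54],
    labels=["Youngsters", "Old"],
    include_lowest=True,
)

print(toy)
```

Ages outside the 18 to 54 range come back as `NaN`, which makes out-of-range rows easy to spot before they reach a model.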


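Credit score bins (the lower bound of each range is inclusive and the upper bound exclusive, except 850, which counts as 'Excellent'):

350 - 580 => 'Poor'

580 - 670 => 'Fair'

670 - 740 => 'Good'

740 - 800 => 'Very Good'

800 - 850 => 'Excellent'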

In [7]:
def custom_binning_credit_score(score):
    """Map a numeric credit score to a rating band."""
    if score < 580:
        return 'Poor'
    if score < 670:
        return 'Fair'
    if score < 740:
        return 'Good'
    if score < 800:
        return 'Very Good'
    if score <= 850:
        return 'Excellent'
    # Scores above 850 are invalid; fail loudly instead of returning None
    raise ValueError("Credit Score can't go beyond 850")

df['CreditScoreBins'] = df['CreditScore'].apply(custom_binning_credit_score)
del df['CreditScore']

df.head(15)


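| Unnamed: 0 | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | CreditScoreBins |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | France | Female | 42.0 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | Fair |
| 1 | Spain | Female | 41.0 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | Fair |
| 2 | France | Female | 42.0 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | Poor |
| 3 | France | Female | 38.91 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | Good |
| 4 | Spain | Female | 43.0 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | Excellent |
| 5 | Spain | Male | 44.0 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 | Fair |
| 6 | France | Male | 50.0 | 7 | 0.0 | 2 | 1 | 1 | 10062.8 | 0 | Excellent |
| 7 | Germany | Female | 29.0 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 | Poor |
| 8 | France | Male | 44.0 | 4 | 142051.07 | 2 | 0 | 1 | 74940.5 | 0 | Poor |
| 9 | France | Male | 27.0 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 | 0 | Good |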
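The same credit score bands can also be produced without a row-by-row `apply`, using `pd.cut`. The sketch below is an alternative to the function above, not a drop-in replacement. It assumes integer scores and uses toy values. The last edge is set to 851 so that 850 lands in 'Excellent', and scores below 350 become `NaN` instead of 'Poor'.

```python
import pandas as pd

# Toy scores sitting on each band boundary (illustrative values only)
scores = pd.Series([350, 579, 580, 669, 670, 739, 740, 799, 800, 850])

edges = [350, 580, 670, 740, 800, 851]  # 851 so that a score of 850 is included
labels = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']

# right=False makes each interval closed on the left: [350, 580), [580, 670), ...
score_bins = pd.cut(scores, bins=edges, labels=labels, right=False)

print(pd.DataFrame({'CreditScore': scores, 'CreditScoreBins': score_bins}))
```

The result is an ordered categorical, so the bands sort from 'Poor' to 'Excellent' instead of alphabetically.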


In [8]:
df.to_csv('data/processed/ChurnModelling_Binning_Applied.csv', index=False)