# Demo 2 - Customer Retention at EzyBank

## **Scenario: Customer Churn Prediction**

EzyBank, a fast-growing digital bank, is experiencing increasing customer churn. The analytics team is tasked with improving the accuracy of its churn prediction model. Initial experiments with individual models such as Logistic Regression and Decision Tree showed only average performance. The team now aims to build a stacking ensemble that combines Logistic Regression, Random Forest, and K-Nearest Neighbors (KNN) as base models, with a Gradient Boosting classifier as the meta-learner, to improve classification accuracy.

## **Objective:**

* We aim to improve customer churn classification using stacking, combining Logistic Regression, Random Forest, and KNN as base models, and Gradient Boosting as the meta-model.

## Step 1: Import Required Libraries
We import all the models, tools, and metrics needed for preprocessing, training, stacking, and evaluation.

In [1]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

## Step 2: Load Dataset

In [2]:
# Load dataset
df = pd.read_csv("ezybank_churn_dataset.csv")

## Step 3: Separate Features and Target

Drop customer_id since it's an identifier and not useful for prediction.

X contains input features.

y is the target variable we want to predict: churned (0 = stayed, 1 = churned).

In [3]:
# Features and target
X = df.drop(columns=["customer_id", "churned"])
y = df["churned"]

## Step 4: Train-Test Split

In [4]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
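
The split above samples rows at random, so the churn rate in the test set can drift away from the rate in the training set. The following is a minimal sketch, on synthetic data rather than the EzyBank file, of how the stratify argument of train_test_split keeps class proportions aligned. The names X_demo and y_demo and the 30% churn rate are illustrative assumptions, not values from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption): 500 rows, 30% churners.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 4))
y_demo = np.array([1] * 150 + [0] * 350)

# stratify=y_demo asks for the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Churn rate in train: {y_tr.mean():.2f}")
print(f"Churn rate in test:  {y_te.mean():.2f}")
```

Passing stratify=y in Step 4 would be the equivalent change for this demo; the metrics reported later would then likely differ from the ones shown.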

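## Step 5: Scale Features

Standardize features (mean = 0, std = 1). Distance-based and linear models such as KNN and Logistic Regression are sensitive to feature scale, so they usually perform better on standardized inputs.

Always fit the scaler on the training data only, then use it to transform both the training and test sets. This keeps information from the test set out of the preprocessing step.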

In [5]:
# Feature Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
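
To make the fit-on-train rule concrete, here is a small sketch on synthetic numbers (the arrays, their sizes, and the name demo_scaler are assumptions for illustration). The training columns end up with mean 0 and standard deviation 1, while the test columns only come close, because they are scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features (assumption): two columns centered near 50.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 2))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean/std from train only
test_scaled = demo_scaler.transform(test)        # reuses the training mean/std

print("Train means:", train_scaled.mean(axis=0).round(3))
print("Train stds: ", train_scaled.std(axis=0).round(3))
print("Test means: ", test_scaled.mean(axis=0).round(3))
```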

## Step 6: Define Base Learners and Meta Learner

We choose 3 diverse models as base learners:

* LogisticRegression: linear
* RandomForest: ensemble of trees
* KNN: instance-based

The meta-learner (GBM) learns from their predictions.



In [6]:
# Define base learners
base_learners = [
    ('lr', LogisticRegression()),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

# Meta learner
meta_learner = GradientBoostingClassifier(n_estimators=100, random_state=42)

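## Step 7: Create and Train Stacking Model

We build a stacked model from the base models and a final model (the meta-learner).
With cv=5, each base model produces out-of-fold predictions through 5-fold cross-validation on the training data, and the meta-learner is trained on those predictions rather than on the raw features. The base models are then refit on the full training set for use at prediction time.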



In [7]:
# Stacking classifier
stack_model = StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)

# Train stacking model
stack_model.fit(X_train_scaled, y_train)
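
As a rough illustration of what cv=5 contributes, the sketch below builds the meta-learner's training matrix by hand with cross_val_predict. It runs on synthetic data from make_classification and uses only two base models to stay short; X_demo, y_demo, and demo_base are assumed names, and this approximates the idea rather than reproducing StackingClassifier's internals exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (assumption, not the EzyBank file).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=42)

demo_base = [LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)]

# For each base model, predict every row using a model that never saw that row
# (5-fold CV), and keep the probability of the positive class.
oof_columns = [
    cross_val_predict(model, X_demo, y_demo, cv=5, method="predict_proba")[:, 1]
    for model in demo_base
]

# One column per base model: this is what a meta-learner would be trained on.
meta_features = np.column_stack(oof_columns)
print(meta_features.shape)
```

Because every row is predicted by a model that did not train on it, the meta-learner sees predictions of the same quality it will receive on new data.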

## Step 8: Make Predictions and Evaluate

We predict on the test set.

We print metrics like precision, recall, F1-score, and overall accuracy.

In [8]:
# Predict
y_pred = stack_model.predict(X_test_scaled)

In [9]:
# Evaluate
print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

Classification Report:
              precision    recall  f1-score   support

           0       0.72      0.91      0.81        70
           1       0.45      0.17      0.24        30

    accuracy                           0.69       100
   macro avg       0.59      0.54      0.52       100
weighted avg       0.64      0.69      0.64       100

Accuracy: 0.69
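
The support column of the report shows an imbalanced test set: 70 customers who stayed and 30 who churned. A quick sanity check, sketched below using only those two counts, is the accuracy of a trivial model that always predicts "stayed". The array names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Label counts taken from the "support" column of the report above.
y_true_demo = np.array([0] * 70 + [1] * 30)

# A trivial baseline: predict "stayed" (0) for every customer.
y_majority = np.zeros_like(y_true_demo)

baseline_acc = accuracy_score(y_true_demo, y_majority)
print(f"Majority-class baseline accuracy: {baseline_acc:.2f}")
```

This baseline reaches 0.70, slightly above the 0.69 reported for the stacked model, so accuracy alone says little here. The recall of 0.17 on class 1 is the more informative number: the model identifies only a small share of the customers who actually churned.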
