# Bank Customer Churn: Modeling

## Purpose of this notebook
This notebook focuses on training and evaluating machine learning models
for customer churn prediction.

The main goals are:
- Load preprocessed datasets and artifacts
- Train a strong baseline model
- Evaluate model performance using appropriate metrics
- Establish a reference point for more advanced models
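Here is a minimal sketch of the baseline-and-evaluate loop described above. It runs on synthetic data from `make_classification`, which stands in for the real churn artifacts. The roughly 20% positive rate is an illustrative assumption, not the dataset's actual churn rate. The sketch fits a plain `LogisticRegression` baseline and reports ROC-AUC and average precision (PR-AUC). It compares PR-AUC with the positive-class prevalence, which is the average precision a constant, no-skill score would get.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Synthetic stand-in for the churn data: ~20% positives (assumed, illustrative only)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=RANDOM_STATE,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE
)

# Baseline: plain logistic regression
baseline = LogisticRegression(max_iter=1000, random_state=RANDOM_STATE)
baseline.fit(X_train, y_train)

# Ranking metrics use the predicted probability of the positive class (churn = 1)
proba = baseline.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, proba)
pr_auc = average_precision_score(y_test, proba)

# A constant (no-skill) score has average precision equal to the positive rate
prevalence = y_test.mean()

print(f"ROC-AUC: {roc_auc:.3f}")
print(f"PR-AUC:  {pr_auc:.3f} (no-skill level: {prevalence:.3f})")
```

Under class imbalance, PR-AUC is usually more informative than ROC-AUC because its reference point moves with the churn rate.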

All modeling steps rely on artifacts created in `02_preprocessing.ipynb`.


In [7]:
# Core libraries
import numpy as np
import pandas as pd

# System utilities
from pathlib import Path

# Modeling
from sklearn.linear_model import LogisticRegression

# Evaluation
from sklearn.metrics import (
    roc_auc_score,
    average_precision_score,
    classification_report,
    confusion_matrix,
    RocCurveDisplay,
    PrecisionRecallDisplay,
)

# Reproducibility
RANDOM_STATE = 42
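`classification_report` and `confusion_matrix` work on hard labels, so predicted churn probabilities must first be cut at a threshold. The minimal sketch below uses a small hand-made example. The toy arrays and the 0.5 cutoff are illustrative assumptions, not values from this project.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy ground truth and predicted churn probabilities (illustrative values only)
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
proba = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.30, 0.20, 0.90])

THRESHOLD = 0.5  # assumed cutoff; in practice it is often tuned to the cost of errors
y_pred = (proba >= THRESHOLD).astype(int)

# Rows = true class, columns = predicted class: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(classification_report(y_true, y_pred, digits=3))
```

In this toy case, one churner is missed (FN) and one non-churner is flagged (FP). Moving the threshold trades one error type for the other, while ROC-AUC and PR-AUC stay unchanged.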


## Load Preprocessing Artifacts

We load the processed datasets and preprocessing artifacts
produced in `02_preprocessing.ipynb`.

Reusing these artifacts, rather than repeating preprocessing here, ensures that the modeling stage works on exactly the data prepared in the previous notebook.
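Loading the artifacts does not by itself prove they are consistent. A few cheap checks right after loading can catch mismatches early. The minimal sketch below uses a hypothetical helper, `check_artifacts`, which is not part of the project. Small toy DataFrames stand in for the real parquet files, and the target is assumed to be encoded as 0/1.

```python
import pandas as pd


def check_artifacts(X_train: pd.DataFrame, X_test: pd.DataFrame,
                    y_train: pd.Series, y_test: pd.Series) -> None:
    """Hypothetical sanity checks for preprocessed train/test splits."""
    problems = []
    # Both splits must expose the same feature columns in the same order
    if list(X_train.columns) != list(X_test.columns):
        problems.append("train and test feature columns differ")
    # Features and targets must be row-aligned
    if len(X_train) != len(y_train) or len(X_test) != len(y_test):
        problems.append("feature and target row counts differ")
    # Assumed requirement: preprocessing leaves no missing values behind
    if X_train.isna().any().any() or X_test.isna().any().any():
        problems.append("missing values remain after preprocessing")
    # Assumed 0/1 encoding of the churn target
    if not set(pd.concat([y_train, y_test]).unique()) <= {0, 1}:
        problems.append("target is not binary 0/1")
    if problems:
        raise ValueError("; ".join(problems))


# Toy example (illustrative data, not the real artifacts)
X_tr = pd.DataFrame({"age": [0.1, -0.3, 1.2], "balance": [0.5, 0.0, -1.1]})
X_te = pd.DataFrame({"age": [0.4], "balance": [0.2]})
y_tr = pd.Series([0, 1, 0], name="churn")
y_te = pd.Series([1], name="churn")

check_artifacts(X_tr, X_te, y_tr, y_te)
print("artifact checks passed")
```

The helper collects every problem before raising, so a single run reports all mismatches at once.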


In [8]:
# Clone the project repository (needed only in Colab); skip if already cloned
if not Path("bank-churn-prediction").exists():
    !git clone https://github.com/laser54/bank-churn-prediction.git


In [9]:
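# Locate artifacts: ../artifacts when run from the repo's notebooks folder,
# bank-churn-prediction/artifacts after the Colab clone (assumed repo layout)
CANDIDATE_DIRS = [Path("../artifacts"), Path("bank-churn-prediction/artifacts")]
ARTIFACTS_DIR = next(
    (d for d in CANDIDATE_DIRS if (d / "X_train.parquet").exists()), None
)
if ARTIFACTS_DIR is None:
    raise FileNotFoundError(
        f"X_train.parquet not found in {[str(d) for d in CANDIDATE_DIRS]}; "
        "run 02_preprocessing.ipynb first"
    )

# Load datasets (targets are read from the "churn" column)
X_train = pd.read_parquet(ARTIFACTS_DIR / "X_train.parquet")
X_test = pd.read_parquet(ARTIFACTS_DIR / "X_test.parquet")
y_train = pd.read_parquet(ARTIFACTS_DIR / "y_train.parquet")["churn"]
y_test = pd.read_parquet(ARTIFACTS_DIR / "y_test.parquet")["churn"]

X_train.shape, X_test.shape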

In [None]:
# Debug helper: inspect the working directory when artifact paths fail to resolve
print("CWD:", Path.cwd())
print("Files here:", sorted(p.name for p in Path.cwd().iterdir()))
