# Interpretable Machine Learning Models

This notebook applies interpretable machine learning models to the customer churn
prediction task. The goal is not only to achieve reasonable predictive performance,
but also to understand how individual features contribute to churn decisions.

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_auc_score

## Data Loading and Preparation

We load the dataset and prepare features for modeling. Identifier variables
are removed, and categorical variables are encoded using one-hot encoding.

In [2]:
df = pd.read_csv("../data/telecom_churn.csv")


## Feature Selection

The customer identifier `customerID` is removed because it does not carry predictive information.
The target variable `Churn` is mapped from "Yes"/"No" to 1/0.

In [None]:
df = df.drop(columns=["customerID"])

df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
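`Series.map` returns `NaN` for any label that is not an exact dictionary key, such as `" Yes"` or `"yes"`, and does so without raising an error. A defensive variant could look like the sketch below. The helper name `encode_churn` and the whitespace and capitalization cleanup are assumptions, not part of the original notebook.

```python
import pandas as pd


def encode_churn(labels: pd.Series) -> pd.Series:
    """Map "Yes"/"No" labels to 1/0, raising on any label that does not match."""
    # Assumed cleanup: trim whitespace and normalize case ("NO " -> "No")
    cleaned = labels.astype(str).str.strip().str.capitalize()
    encoded = cleaned.map({"Yes": 1, "No": 0})
    # Any label still unmapped after cleanup is reported instead of becoming NaN
    unmapped = labels[encoded.isna()].unique()
    if len(unmapped) > 0:
        raise ValueError(f"Unexpected churn labels: {list(unmapped)}")
    return encoded.astype(int)


# Toy labels for illustration only (not read from telecom_churn.csv)
demo_labels = pd.Series(["Yes", "No", " yes", "NO "])
print(encode_churn(demo_labels).tolist())  # [1, 0, 1, 0]
```

In the notebook, this would replace the `map` call as `df["Churn"] = encode_churn(df["Churn"])`.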

## Feature Encoding Strategy

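Numerical features are passed through unchanged, while categorical features are
one-hot encoded with the first category of each variable dropped. The dropped
category acts as a reference level, so in a linear model each remaining indicator's
coefficient measures the effect of that category relative to the baseline. This
lets linear models capture categorical effects in an interpretable manner.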

In [None]:
X = df.drop(columns=["Churn"])
y = df["Churn"]

num_features = X.select_dtypes(include=["int64", "float64"]).columns
cat_features = X.select_dtypes(include=["object", "string"]).columns

## Train-Test Split

The dataset is split into training (80%) and test (20%) sets to evaluate generalization
performance on unseen data. Stratifying on the target keeps the churn rate
approximately equal in both sets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
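One way to check what `stratify=y` does is to split labels with a known positive rate and compare class proportions. The sketch below uses 100 synthetic labels with a 25% positive rate and random features, not the churn data, and the same `test_size` and `random_state` as the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with a 25% positive rate and random features (illustrative only)
rng = np.random.default_rng(0)
y_demo = np.array([1] * 25 + [0] * 75)
X_demo = rng.normal(size=(100, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, both splits keep roughly the overall positive rate
print(f"overall: {y_demo.mean():.2f}, train: {y_tr.mean():.2f}, test: {y_te.mean():.2f}")
```

Without `stratify`, a small test set can drift noticeably from the overall churn rate, which makes metrics such as recall on the churn class less stable across random seeds.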

## Logistic Regression

Logistic regression is used as a baseline interpretable model. Each coefficient is
the change in the log-odds of churn for a one-unit increase in the corresponding
encoded feature, holding the other features fixed.

In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", "passthrough", num_features),
        ("cat", OneHotEncoder(drop="first"), cat_features)
    ]
)

log_reg = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("model", LogisticRegression(max_iter=1000))
    ]
)

log_reg.fit(X_train, y_train)

In [None]:
y_pred = log_reg.predict(X_test)
y_prob = log_reg.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_prob))

## Interpretation of Logistic Regression Coefficients

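Positive coefficients increase the predicted odds of churn, while negative
coefficients reduce them; exponentiating a coefficient gives the odds ratio for a
one-unit increase in that feature. Comparing absolute coefficient sizes is only
meaningful when features share a scale. The one-hot indicators all lie on a 0/1
scale, but the numerical features are passed through unscaled, so their
coefficients depend on each feature's units and should not be ranked against one
another without standardization.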


## Decision Tree

A shallow decision tree is trained to capture non-linear relationships while
keeping the model interpretable: with `max_depth=4`, the tree has at most 16
leaves, so every prediction follows a path of at most four splits.


In [None]:
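from sklearn.base import clone

# Give the tree pipeline its own unfitted copy of the preprocessor, so fitting it
# does not refit the transformer object already held by log_reg.
tree = Pipeline(
    steps=[
        ("preprocessor", clone(preprocessor)),
        ("model", DecisionTreeClassifier(max_depth=4, random_state=42))
    ]
)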

tree.fit(X_train, y_train)

In [None]:
y_pred_tree = tree.predict(X_test)
y_prob_tree = tree.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred_tree))
print("ROC-AUC:", roc_auc_score(y_test, y_prob_tree))

## Model Comparison and Discussion

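Both logistic regression and the shallow decision tree, evaluated on the same
held-out test set, achieve reasonable predictive performance. Logistic regression
offers global interpretability through its feature coefficients, while the decision
tree captures non-linear effects and feature interactions as a small set of readable rules.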
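To put the two models side by side, their test-set ROC-AUC scores can be collected into one table. The sketch below runs on synthetic data (hypothetical columns and a made-up generating rule), and the helper `compare_models` is hypothetical. Each pipeline gets its own copy of the preprocessor via `sklearn.base.clone`.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier


def compare_models(models: dict, X_test: pd.DataFrame, y_test) -> pd.DataFrame:
    """Score each fitted pipeline on the same held-out set by ROC-AUC."""
    rows = []
    for name, pipeline in models.items():
        prob = pipeline.predict_proba(X_test)[:, 1]  # probability of the positive class
        rows.append({"model": name, "roc_auc": roc_auc_score(y_test, prob)})
    return pd.DataFrame(rows).set_index("model")


# Synthetic data with hypothetical columns (not the churn dataset)
rng = np.random.default_rng(2)
n = 1000
demo = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=n),
    "monthly_charges": rng.uniform(20, 120, size=n),
    "contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
is_monthly = (demo["contract"] == "Month-to-month").to_numpy()
logit = 1.5 - 0.05 * demo["tenure"].to_numpy() + 1.0 * is_monthly
y_demo = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

base_preprocessor = ColumnTransformer(
    transformers=[
        ("num", "passthrough", ["tenure", "monthly_charges"]),
        ("cat", OneHotEncoder(drop="first"), ["contract"]),
    ]
)

models = {
    "logistic_regression": Pipeline([
        ("preprocessor", clone(base_preprocessor)),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
    "decision_tree": Pipeline([
        ("preprocessor", clone(base_preprocessor)),
        ("model", DecisionTreeClassifier(max_depth=4, random_state=42)),
    ]),
}
for pipeline in models.values():
    pipeline.fit(X_tr, y_tr)

scores = compare_models(models, X_te, y_te)
print(scores)
```

With the notebook's objects, the same comparison would be `compare_models({"logistic_regression": log_reg, "decision_tree": tree}, X_test, y_test)`.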

These results demonstrate that interpretable models can effectively model
churn behavior without relying on complex black-box methods.
