# 03 - Model Experiments: Churn Prediction

## Goals:
- Try different classifiers (Random Forest, XGBoost, etc.)
- Add/remove features and observe impact
- Track everything with MLflow

---

## 🚀 MLflow Server Command Explained

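```bash
mlflow server \
  --backend-store-uri sqlite:///backend.db \
  --default-artifact-root ./mlruns \
  --host 127.0.0.1 \
  --port 5000
```

- `--backend-store-uri sqlite:///backend.db`: stores experiment and run metadata (parameters, metrics, tags) in a SQLite file, `backend.db`, in the directory where the command is run.
- `--default-artifact-root ./mlruns`: default location for the artifacts of newly created experiments, such as the logged model pipelines.
- `--host 127.0.0.1` and `--port 5000`: serve the UI and tracking API on localhost only, at `http://127.0.0.1:5000`, which is the tracking URI the notebook passes to `mlflow.set_tracking_uri()`.
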
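Before running the notebook cells, it can help to confirm that the server started by the command above is reachable. This is a minimal sketch that assumes the tracking server's `/health` route (which answers HTTP 200 when the server is up); the helper name `mlflow_server_is_up` is hypothetical and the code uses only the Python standard library.

```python
import urllib.error
import urllib.request


def mlflow_server_is_up(base_url: str = "http://127.0.0.1:5000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP status
        return False


print(mlflow_server_is_up())
```

If this prints `False`, the `mlflow.set_tracking_uri("http://127.0.0.1:5000")` calls later in the notebook will fail when they try to log runs.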
## 1. Load the Processed Data

We load the train and test datasets that were previously cleaned and saved to the `data/processed/` folder.

These files will be used as the base input for our feature engineering and model training experiments.

```python
# 1. Load train and test data
import numpy as np
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# Load processed train and test sets
train_df = pd.read_csv("../data/processed/train.csv")
test_df = pd.read_csv("../data/processed/test.csv")

print(f"✅ Train shape: {train_df.shape}")
print(f"✅ Test shape: {test_df.shape}")

# Optional: show a few rows
train_df.head()
```

```text
✅ Train shape: (7088, 19)
✅ Test shape: (3039, 19)
```

|  | Customer_Age | Dependent_count | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Total_Trans_Amt | Total_Trans_Ct | Total_Amt_Chng_Q4_Q1 | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio | Gender | Education_Level | Marital_Status | Income_Category | Card_Category | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44 | 3 | 36 | 2 | 3 | 3 | 6680.0 | 1839 | 7632 | 95 | 0.617 | 0.532 | 0.275 | F | Uneducated | Married | Less than \$40K | Blue | 0 |
| 1 | 39 | 1 | 34 | 3 | 1 | 1 | 2884.0 | 2517 | 4809 | 87 | 0.693 | 0.74 | 0.873 | F | Graduate | Single | Unknown | Blue | 0 |
| 2 | 52 | 1 | 36 | 4 | 2 | 2 | 14858.0 | 1594 | 4286 | 72 | 0.51 | 0.636 | 0.107 | M | Unknown | Married | \$80K - \$120K | Blue | 0 |
| 3 | 34 | 0 | 17 | 4 | 1 | 4 | 2638.0 | 2092 | 1868 | 43 | 0.591 | 0.344 | 0.793 | M | Graduate | Married | \$40K - \$60K | Blue | 0 |
| 4 | 47 | 5 | 36 | 3 | 1 | 2 | 8896.0 | 1338 | 4252 | 70 | 0.741 | 0.591 | 0.15 | M | Doctorate | Single | Less than \$40K | Blue | 0 |
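Both splits are expected to share one schema with a 0/1 `churn` target, as in the rows above. A minimal sanity check, sketched with a hypothetical `check_split_schema` helper and demonstrated on tiny synthetic frames rather than the real CSVs:

```python
import pandas as pd


def check_split_schema(train: pd.DataFrame, test: pd.DataFrame, target: str = "churn") -> None:
    """Raise ValueError if the splits disagree on columns or the target is not 0/1."""
    if list(train.columns) != list(test.columns):
        differing = set(train.columns) ^ set(test.columns)
        raise ValueError(f"Columns differ between splits (names or order): {sorted(differing)}")
    if target not in train.columns:
        raise ValueError(f"No '{target}' column in the splits")
    for name, df in (("train", train), ("test", test)):
        values = set(df[target].dropna().unique())
        if not values <= {0, 1}:
            raise ValueError(f"{name} target has non-binary values: {sorted(values)}")


# Tiny synthetic frames standing in for train_df / test_df
demo_train = pd.DataFrame({"Customer_Age": [44, 39], "churn": [0, 1]})
demo_test = pd.DataFrame({"Customer_Age": [52], "churn": [0]})
check_split_schema(demo_train, demo_test)
print("✅ Schema check passed")
```

In the notebook, `check_split_schema(train_df, test_df)` would run the same checks on the loaded data.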


```python
# Separate features and target
X_train = train_df.drop(columns=["churn"])  # every column except the target
y_train = train_df["churn"]

X_test = test_df.drop(columns=["churn"])
y_test = test_df["churn"]
```

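## 2. Select Features and Build the Preprocessing Pipeline

With the target (`churn`) separated, the next cell defines how the raw columns become model inputs:

- `add_interaction_features` adds three ratio features: average transaction amount (`Total_Trans_Amt / Total_Trans_Ct`), revolving balance relative to the credit limit, and the Q4/Q1 amount change relative to the Q4/Q1 count change. Each denominator gets `1e-3` added to avoid division by zero.
- The columns are grouped into categorical, numerical-to-scale, and numerical-unscaled lists.
- Each group gets its own imputer (plus scaling or one-hot encoding); the groups are combined in a `ColumnTransformer` and chained after the feature engineering step in `full_pipeline`.

Because the engineered features are created before the `ColumnTransformer` runs, they can be listed in `numerical_to_scale`.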

```python
# === 1. Define your custom feature engineering logic ===
def add_interaction_features(df):
    df = df.copy()
    df["Avg_Transaction_Amt"] = df["Total_Trans_Amt"] / (df["Total_Trans_Ct"] + 1e-3)
    df["Revolve_to_Limit"] = df["Total_Revolving_Bal"] / (df["Credit_Limit"] + 1e-3)
    df["AmtCt_Chg_Ratio"] = df["Total_Amt_Chng_Q4_Q1"] / (df["Total_Ct_Chng_Q4_Q1"] + 1e-3)
    return df

# Wrap it as a FunctionTransformer
interaction_transformer = FunctionTransformer(add_interaction_features)

# === 2. Separate features ===
categorical = [
    "Gender", "Education_Level", "Marital_Status",
    "Income_Category", "Card_Category"
]

numerical_to_scale = [
    "Credit_Limit", "Total_Revolving_Bal",
    "Total_Trans_Amt", "Total_Trans_Ct",
    "Avg_Utilization_Ratio", "Avg_Transaction_Amt",
    "Revolve_to_Limit", "AmtCt_Chg_Ratio"
]

numerical_no_scale = [
    "Customer_Age", "Dependent_count", "Months_on_book",
    "Total_Relationship_Count", "Months_Inactive_12_mon",
    "Contacts_Count_12_mon", "Total_Amt_Chng_Q4_Q1",
    "Total_Ct_Chng_Q4_Q1"
]

# === 3. Define transformers ===
cat_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore"))
])

num_scale_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

num_noscale_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="median"))
])

# === 4. Compose all into one ColumnTransformer ===
preprocessor = ColumnTransformer([
    ("num_scaled", num_scale_transformer, numerical_to_scale),
    ("num_noscale", num_noscale_transformer, numerical_no_scale),
    ("cat", cat_transformer, categorical)
])

# === 5. Final pipeline with feature engineering + preprocessor ===
full_pipeline = Pipeline([
    ("feature_engineering", interaction_transformer),
    ("preprocessor", preprocessor)
])
```
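To see what `full_pipeline` does to a single row, the sketch below rebuilds a cut-down version of it (one engineered ratio, one scaled numeric group, one one-hot encoded column) and applies it to synthetic rows. It illustrates two behaviours of the real pipeline: the `1e-3` term keeps a zero transaction count from producing `inf` (which the downstream imputer and scaler would reject), although it yields a very large ratio; and `handle_unknown="ignore"` encodes a category never seen during `fit` as all zeros. The function `add_ratio` and the sample values are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def add_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative one-feature version of add_interaction_features."""
    df = df.copy()
    df["Avg_Transaction_Amt"] = df["Total_Trans_Amt"] / (df["Total_Trans_Ct"] + 1e-3)
    return df


demo_pipe = Pipeline([
    ("feature_engineering", FunctionTransformer(add_ratio)),
    ("preprocessor", ColumnTransformer([
        ("num_scaled", StandardScaler(), ["Total_Trans_Amt", "Avg_Transaction_Amt"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Card_Category"]),
    ])),
])

demo_train = pd.DataFrame({
    "Total_Trans_Amt": [7632, 4809, 4286],
    "Total_Trans_Ct": [95, 87, 72],
    "Card_Category": ["Blue", "Blue", "Silver"],
})
demo_test = pd.DataFrame({
    "Total_Trans_Amt": [1868],
    "Total_Trans_Ct": [0],          # zero count: the 1e-3 term keeps the ratio finite
    "Card_Category": ["Platinum"],  # category never seen during fit
})

demo_pipe.fit(demo_train)
demo_out = demo_pipe.transform(demo_test)

print(demo_out.shape)   # (1, 4): 2 scaled numeric columns + 2 one-hot columns
print(demo_out[0, 2:])  # unseen category -> all-zero one-hot block
print(np.isfinite(demo_out).all())
```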

## 3. Train and Evaluate a Baseline Model

```python
# Combine preprocessing pipeline with the classifier
rf_pipeline = Pipeline([
    ("feature_pipeline", full_pipeline),
    ("classifier", RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42))
])

# Fit the model
rf_pipeline.fit(X_train, y_train)

# Predict labels and churn probabilities (column 1 = class 1)
y_pred = rf_pipeline.predict(X_test)
y_proba = rf_pipeline.predict_proba(X_test)[:, 1]

# Evaluate
print("✅ Accuracy:", accuracy_score(y_test, y_pred))
print("✅ ROC AUC:", roc_auc_score(y_test, y_proba))
```

```text
✅ Accuracy: 0.9519578808818691
✅ ROC AUC: 0.9843969899300179
```
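An accuracy of about 0.952 is easier to judge next to a trivial baseline: churn targets are typically imbalanced, so always predicting the majority class can already score high accuracy. The sketch below uses scikit-learn's `DummyClassifier(strategy="most_frequent")` as that baseline; `majority_baseline` is a hypothetical helper and the labels are synthetic. A constant predictor always gets a ROC AUC of 0.5, which is one reason ROC AUC is the more informative of the two metrics here.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score


def majority_baseline(y_train, y_test):
    """Accuracy and ROC AUC of always predicting the most frequent training class."""
    # DummyClassifier ignores feature values, so a single placeholder column is enough
    X_fit = np.zeros((len(y_train), 1))
    X_eval = np.zeros((len(y_test), 1))
    dummy = DummyClassifier(strategy="most_frequent").fit(X_fit, y_train)
    acc = accuracy_score(y_test, dummy.predict(X_eval))
    auc = roc_auc_score(y_test, dummy.predict_proba(X_eval)[:, 1])
    return acc, auc


# Synthetic labels: 20% positives in train, 1 positive out of 5 in test
demo_y_train = pd.Series([0] * 8 + [1] * 2)
demo_y_test = pd.Series([0] * 4 + [1])

print(demo_y_train.value_counts(normalize=True))
baseline_acc, baseline_auc = majority_baseline(demo_y_train, demo_y_test)
print(baseline_acc, baseline_auc)  # 0.8 0.5
```

With the notebook's variables, `majority_baseline(y_train, y_test)` gives the floor that the Random Forest's accuracy should be compared against.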


## 4. Track the Baseline Run with MLflow

```python
# Point the client at the local tracking server and select the experiment
# (MLflow creates it if it does not exist yet)
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("churn-pipeline-baseline-models")

with mlflow.start_run():
    # Log model name as a tag
    mlflow.set_tag("model", "random_forest")

    # Define model params
    rf_params = {
        "n_estimators": 100,
        "max_depth": 10,
        "random_state": 42
    }
    mlflow.log_params(rf_params)

    # Create pipeline
    rf_pipeline = Pipeline([
        ("feature_pipeline", full_pipeline),
        ("classifier", RandomForestClassifier(**rf_params))
    ])

    # Fit
    rf_pipeline.fit(X_train, y_train)

    # Predict
    y_pred = rf_pipeline.predict(X_test)
    y_proba = rf_pipeline.predict_proba(X_test)[:, 1]

    # Metrics
    acc = accuracy_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_proba)

    mlflow.log_metric("accuracy", acc)
    mlflow.log_metric("roc_auc", roc_auc)

    # Log full pipeline (preprocessing + classifier) as the run's model artifact
    mlflow.sklearn.log_model(rf_pipeline, artifact_path="model")

    print(f"✅ Accuracy: {acc:.4f}")
    print(f"✅ ROC AUC: {roc_auc:.4f}")
```

```text
✅ Accuracy: 0.9520
✅ ROC AUC: 0.9844
🏃 View run capricious-kit-713 at: http://127.0.0.1:5000/#/experiments/1/runs/9b94b41f1f394f9dbb45fcf2e2ddf6f7
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
```


## 4.a Helper: Train, Evaluate, and Log a Model

```python
from sklearn.base import clone


def train_and_log_model(model, model_name, extra_params=None):
    """Fit full_pipeline + model, evaluate on the test set, and log everything to MLflow."""
    mlflow.set_tracking_uri("http://127.0.0.1:5000")
    mlflow.set_experiment("churn-pipeline-baseline-models")

    with mlflow.start_run():
        mlflow.set_tag("model", model_name)

        # Automatically extract model parameters
        model_params = model.get_params()

        # Merge with any manually passed params (optional)
        if extra_params:
            model_params.update(extra_params)

        mlflow.log_params(model_params)

        # clone() gives each run its own unfitted copy of the shared preprocessing pipeline
        pipeline = Pipeline([
            ("feature_pipeline", clone(full_pipeline)),
            ("classifier", model)
        ])

        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        y_proba = pipeline.predict_proba(X_test)[:, 1]

        acc = accuracy_score(y_test, y_pred)
        roc_auc = roc_auc_score(y_test, y_proba)

        mlflow.log_metric("accuracy", acc)
        mlflow.log_metric("roc_auc", roc_auc)

        mlflow.sklearn.log_model(pipeline, artifact_path="model")

        print(f"✅ {model_name} - Accuracy: {acc:.4f}, ROC AUC: {roc_auc:.4f}")
```

## 5. Model Experimentation

Now that we have a complete preprocessing and feature engineering pipeline, we can experiment with different models to evaluate their performance.

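Instead of repeating boilerplate code for each model, we use the **utility function** `train_and_log_model()` defined in section 4.a, which:

- Creates a pipeline with preprocessing + model
- Fits the pipeline on the training set and evaluates it on the test set
- Logs model parameters and test-set metrics (Accuracy, ROC AUC) to MLflow
- Logs the fitted pipeline as a model artifact with `mlflow.sklearn.log_model()`; during early experimentation this call can be removed to avoid uploading every candidate to the artifact store (for example S3)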

This way, we can try out multiple models easily and compare their performance directly in the MLflow UI (`http://127.0.0.1:5000`).

Typical models to test include:
- ✅ Random Forest
- ✅ Logistic Regression
- ✅ Gradient Boosting
- ✅ XGBoost

Once we identify the best-performing model, we can:
- Register its logged pipeline in the **Model Registry**
- Deploy or version it as needed
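`train_and_log_model()` logs `model.get_params()` verbatim. For some estimators, XGBoost's `XGBClassifier` in particular, many constructor arguments default to `None`, which clutters the run's parameter table in the MLflow UI. One option, sketched here with a hypothetical `loggable_params` helper, is to drop `None` values before calling `mlflow.log_params`:

```python
from sklearn.linear_model import LogisticRegression


def loggable_params(params: dict) -> dict:
    """Drop None-valued entries so only explicitly set values are logged (hypothetical helper)."""
    return {key: value for key, value in params.items() if value is not None}


demo_params = loggable_params(LogisticRegression(max_iter=500).get_params())
print(demo_params)
```

Inside the helper this would read `mlflow.log_params(loggable_params(model_params))`. The trade-off is that the run no longer records that those arguments were left at their defaults.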

```python
# Random Forest
train_and_log_model(
    RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42),
    model_name="random_forest",
    extra_params={"n_estimators": 100, "max_depth": 10}
)

# Logistic Regression
train_and_log_model(
    LogisticRegression(max_iter=500),
    model_name="logistic_regression",
    extra_params={"max_iter": 500}
)

# Gradient Boosting
train_and_log_model(
    GradientBoostingClassifier(n_estimators=100, learning_rate=0.1),
    model_name="gradient_boosting",
    extra_params={"n_estimators": 100, "learning_rate": 0.1}
)

# XGBoost (no use_label_encoder: the installed XGBoost ignores that parameter)
train_and_log_model(
    XGBClassifier(n_estimators=100, eval_metric="logloss"),
    model_name="xgboost",
    extra_params={"n_estimators": 100}
)
```

```text
✅ random_forest - Accuracy: 0.9520, ROC AUC: 0.9844
🏃 View run chill-bass-176 at: http://127.0.0.1:5000/#/experiments/1/runs/01b82c662ffd464aa19ebeaeb502f4fe
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
✅ logistic_regression - Accuracy: 0.9075, ROC AUC: 0.9280
🏃 View run masked-lynx-887 at: http://127.0.0.1:5000/#/experiments/1/runs/542941524bef48e89b44af1c87eb02e6
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
✅ gradient_boosting - Accuracy: 0.9599, ROC AUC: 0.9879
🏃 View run worried-horse-40 at: http://127.0.0.1:5000/#/experiments/1/runs/6dad04fb62244dbea427485d505eccf3
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
✅ xgboost - Accuracy: 0.9707, ROC AUC: 0.9920
🏃 View run wise-shrew-85 at: http://127.0.0.1:5000/#/experiments/1/runs/e2a291c40dd940dc8b3e703be56ba1a7
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
```

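## 6. Model Selection: Comparing Baselines

Now that we've trained and logged four models using a shared preprocessing pipeline, we compare them on test-set ROC AUC and accuracy. XGBoost scores highest on both (ROC AUC 0.9920, accuracy 0.9707), ahead of Gradient Boosting (0.9879, 0.9599), Random Forest (0.9844, 0.9520), and Logistic Regression (0.9280, 0.9075). We therefore tune XGBoost with a 3-fold grid search over `n_estimators`, `max_depth`, and `learning_rate`, scored by ROC AUC.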
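The same comparison can be done in code. `mlflow.search_runs()` returns a pandas DataFrame whose columns are prefixed `metrics.`, `params.`, and `tags.`; the hypothetical `rank_runs` helper below sorts such a frame by a chosen metric. To keep the sketch runnable without a server, the frame is built by hand from the four baseline results printed above.

```python
import pandas as pd


def rank_runs(runs: pd.DataFrame, metric: str = "roc_auc") -> pd.DataFrame:
    """Sort runs (in mlflow.search_runs() column format) by a metric, best first."""
    columns = ["tags.model", "metrics.roc_auc", "metrics.accuracy"]
    ranked = runs.sort_values(f"metrics.{metric}", ascending=False)
    return ranked[columns].reset_index(drop=True)


# Hand-built frame using the four baseline results printed above
baseline_runs = pd.DataFrame({
    "tags.model": ["random_forest", "logistic_regression", "gradient_boosting", "xgboost"],
    "metrics.roc_auc": [0.9844, 0.9280, 0.9879, 0.9920],
    "metrics.accuracy": [0.9520, 0.9075, 0.9599, 0.9707],
})
print(rank_runs(baseline_runs))

# Against the live server, the frame could come from (not executed here):
# mlflow.set_tracking_uri("http://127.0.0.1:5000")
# runs = mlflow.search_runs(experiment_names=["churn-pipeline-baseline-models"])
```

A live query would also return the section 4 Random Forest run, since it was logged to the same experiment.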

```python
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="xgboost")

# 1. Define the pipeline with XGBoost
xgb = XGBClassifier(eval_metric="logloss", random_state=42)
pipeline = Pipeline([
    ("feature_pipeline", full_pipeline),
    ("classifier", xgb)
])

# 2. Define parameter grid (2 x 3 x 2 = 12 candidates)
param_grid = {
    "classifier__n_estimators": [100, 150],
    "classifier__max_depth": [3, 5, 7],
    "classifier__learning_rate": [0.05, 0.1]
}

# 3. Fit grid search (3-fold CV on the training set, scored by ROC AUC)
grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=3,
    scoring="roc_auc",
    verbose=1
)

grid_search.fit(X_train, y_train)

# 4. Extract the best model and params (copy, so best_params_ itself is not modified)
best_model = grid_search.best_estimator_
best_params = dict(grid_search.best_params_)

# Optional: add back model_name to params dict
best_params["model_name"] = "xgboost_tuned"

# 5. Re-fit and log the best classifier with the helper from section 4.a
train_and_log_model(
    model=best_model.named_steps["classifier"],  # Only pass the classifier
    model_name="xgboost_tuned",
    extra_params=best_params  # the helper's keyword is extra_params, not params
)
```

```text
Fitting 3 folds for each of 12 candidates, totalling 36 fits
```
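The "Fitting 3 folds for each of 12 candidates, totalling 36 fits" line follows directly from the grid: `GridSearchCV` tries every combination (2 × 3 × 2 = 12) once per fold. With the default `refit=True`, it then fits the best pipeline one more time on the full training set, which is where `best_estimator_` comes from. `ParameterGrid` makes the count explicit before launching a search:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the tuning cell above
grid = {
    "classifier__n_estimators": [100, 150],
    "classifier__max_depth": [3, 5, 7],
    "classifier__learning_rate": [0.05, 0.1],
}
cv_folds = 3

n_candidates = len(ParameterGrid(grid))
n_cv_fits = n_candidates * cv_folds
print(n_candidates, n_cv_fits)  # 12 36
```

Adding one more value to any single parameter multiplies the candidate count, so this check is a cheap way to estimate run time before widening the grid.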

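```python
import mlflow.sklearn

# Artifact URI copied from the MLflow UI. It points to an S3 bucket, so this model was
# logged to a server whose artifact root is on S3, not the local ./mlruns root used above.
model_uri = "s3://mlops-churn-analytics-falcon/mlflow-artifacts/2/models/m-7cb7b517788c48a5ac1aa5808135197f/artifacts/"
model = mlflow.sklearn.load_model(model_uri)

# Use it for predictions
preds = model.predict(X_test)
preds  # display the predicted labels
```

```text
array([0, 0, 0, ..., 0, 0, 0])
```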
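Because the URI above was copied by hand, and its path appears to point at experiment 2 while the runs in this notebook were logged to experiment 1, it is worth confirming that the loaded model reproduces the metrics recorded for its run. A minimal sketch with a hypothetical `metrics_match` helper; to stay runnable offline, a `pickle` round-trip on synthetic data stands in for saving and reloading through MLflow.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score


def metrics_match(model, X, y, logged: dict, tol: float = 1e-4) -> bool:
    """Recompute accuracy and ROC AUC and compare them with a run's logged values."""
    acc = accuracy_score(y, model.predict(X))
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    return bool(abs(acc - logged["accuracy"]) <= tol and abs(auc - logged["roc_auc"]) <= tol)


# Synthetic stand-in: a pickle round-trip plays the role of saving and reloading via MLflow
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

fitted = LogisticRegression().fit(X_demo, y_demo)
logged_demo = {
    "accuracy": accuracy_score(y_demo, fitted.predict(X_demo)),
    "roc_auc": roc_auc_score(y_demo, fitted.predict_proba(X_demo)[:, 1]),
}
reloaded = pickle.loads(pickle.dumps(fitted))
print(metrics_match(reloaded, X_demo, y_demo, logged_demo))  # True
```

With the notebook's objects, the check would be `metrics_match(model, X_test, y_test, logged)`, where `logged` holds the accuracy and ROC AUC shown on that run's page in the MLflow UI.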