# Predictive SLA‑Breach Model
This notebook demonstrates an **end‑to‑end workflow** that explores the data, engineers features, and trains a model to predict whether a pharmacy benefit change request will breach the 5‑day service‑level agreement (SLA).

**Dataset:** synthetic 500‑row sample generated for prototyping (`dummy_change_request_data.csv`).

**Pipeline outline:**
1. Load & validate data
2. Exploratory Data Analysis (EDA)
3. Feature Engineering
4. Train/Test split
5. Model training (Logistic Regression + Gradient Boosting)
6. Evaluation & interpretation
7. Save model artifact

*Feel free to run each cell sequentially or adapt the code for your production environment.*

In [None]:

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# For modeling
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    classification_report, confusion_matrix, roc_auc_score, RocCurveDisplay
)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
import joblib  # for saving model

# Load data
csv_path = "/mnt/data/dummy_change_request_data.csv"
df = pd.read_csv(csv_path, dtype=str)

# Parse dates
date_cols = [
    "cr date created", "cr date closed",
    "test results received date", "test results approved date"
]
for col in date_cols:
    df[col] = pd.to_datetime(df[col], format="%Y%m%d")

df.head()
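The outline lists "Load & validate data", but the cell above only loads and parses. A minimal validation sketch follows. It assumes the column names used in this notebook; the function name `validate_change_requests` and the specific checks are illustrative choices, not part of the original workflow.

```python
import pandas as pd

# Columns this notebook relies on (taken from the load and feature cells).
REQUIRED_COLS = [
    "cr date created", "cr date closed",
    "test results received date", "test results approved date",
    "category",
]

def validate_change_requests(df: pd.DataFrame) -> dict:
    """Hypothetical helper: return simple data-quality counts, raise if columns are absent."""
    missing = [c for c in REQUIRED_COLS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return {
        "rows": len(df),
        # Requests without a close date have no known SLA outcome yet.
        "open_requests": int(df["cr date closed"].isna().sum()),
        # A close date before the create date points to a data-entry error.
        "closed_before_created": int((df["cr date closed"] < df["cr date created"]).sum()),
        "approved_before_received": int(
            (df["test results approved date"] < df["test results received date"]).sum()
        ),
    }

# Tiny hand-made frame with dates already parsed, as in the load cell.
sample = pd.DataFrame({
    "cr date created": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
    "cr date closed": pd.to_datetime(["2024-01-04", "2024-01-02", None]),
    "test results received date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-11"]),
    "test results approved date": pd.to_datetime(["2024-01-03", "2024-01-04", None]),
    "category": ["A", "B", "A"],
})
print(validate_change_requests(sample))
```

On the real data, `validate_change_requests(df)` would run right after the date‑parsing loop; comparisons involving missing dates (`NaT`) evaluate to `False`, so open requests are not counted as ordering errors.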


## 1. Exploratory Data Analysis

In [None]:

print("Dataset shape:", df.shape)
print("\nData types:")
print(df.dtypes)

# Missing values
print("\nMissing values per column:")
print(df.isna().sum())

# SLA durations
df['request_SLA_days'] = (df['cr date closed'] - df['cr date created']).dt.days
df['testing_SLA_days'] = (df['test results approved date'] - df['test results received date']).dt.days

fig, ax = plt.subplots()
df['request_SLA_days'].hist(ax=ax)
ax.set_title("Distribution of Request SLA (days)")
ax.set_xlabel("Days")
ax.set_ylabel("Frequency")
plt.show()
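The histogram shows the shape of the durations. Before modeling, it also helps to know what share of closed requests exceeds the threshold, because a heavily imbalanced target makes plain accuracy misleading and is the reason `stratify=y` matters in the split. A small summary sketch, where `breach_summary` is a hypothetical helper and the toy durations are made up:

```python
import pandas as pd

SLA_THRESHOLD = 5  # same 5-day threshold used for the target in Feature Engineering

def breach_summary(durations: pd.Series, threshold: int = SLA_THRESHOLD) -> pd.Series:
    """Summarise request durations (days) against the SLA threshold."""
    closed = durations.dropna()  # open requests have no duration yet
    return pd.Series({
        "closed_requests": len(closed),
        "breach_rate": float((closed > threshold).mean()),
        "median_days": float(closed.median()),
        "p90_days": float(closed.quantile(0.9)),
    })

# Toy durations: 2 of 4 closed requests exceed 5 days; one request is still open.
toy = pd.Series([3, 7, 5, 9, None])
print(breach_summary(toy))
```

Applied to this notebook it would be `breach_summary(df['request_SLA_days'])`.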


## 2. Feature Engineering

In [None]:

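# Binary target: breach if request_SLA_days > 5.
# Requests that are still open (no close date) have no known outcome, so drop them
# instead of letting NaN > 5 silently label them as non-breaches.
SLA_THRESHOLD = 5
df = df.dropna(subset=['request_SLA_days']).copy()
df['breach'] = (df['request_SLA_days'] > SLA_THRESHOLD).astype(int)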

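# Base features derived from the creation date
df['month'] = df['cr date created'].dt.month
df['year'] = df['cr date created'].dt.year

# request_SLA_days is deliberately excluded: the target is computed directly from it,
# so using it as a feature leaks the answer. testing_SLA_days is kept, but it is only
# a valid predictor if both testing dates are known when the model is scored.
feature_cols = [
    'testing_SLA_days', 'month', 'year', 'category'
]

X = df[feature_cols]
y = df['breach']

# Categorical vs. numeric columns
cat_features = ['category']
num_features = ['testing_SLA_days', 'month', 'year']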

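# Preprocess: one‑hot encode category and fill missing numeric values (e.g. requests
# without testing dates) with the training median, since LogisticRegression rejects NaN.
from sklearn.impute import SimpleImputer

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_features),
        ('num', SimpleImputer(strategy='median'), num_features)
    ]
)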
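Because `breach` is a threshold on `request_SLA_days`, any feature derived from the close date can leak the label. One quick screen, sketched here under the assumption that every candidate feature is numeric or can be filled, is to score each numeric column on its own against the target: a single‑feature AUC at or near 1.0 is a red flag. The helper name `single_feature_auc` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def single_feature_auc(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """AUC of each numeric column used directly as a score (NaNs filled with the median)."""
    scores = {}
    for col in X.select_dtypes(include="number").columns:
        x = X[col].fillna(X[col].median())
        auc = roc_auc_score(y, x)
        # Fold so that 0.5 means "uninformative" and 1.0 means "perfectly separating".
        scores[col] = max(auc, 1 - auc)
    return pd.Series(scores).sort_values(ascending=False)

rng = np.random.default_rng(0)
days = rng.integers(1, 12, size=200)            # toy request durations
toy = pd.DataFrame({
    "request_SLA_days": days,
    "month": rng.integers(1, 13, size=200),     # unrelated noise feature
})
target = pd.Series((days > 5).astype(int))
print(single_feature_auc(toy, target))
```

On the notebook's data, `single_feature_auc(df[['request_SLA_days', 'testing_SLA_days', 'month', 'year']], df['breach'])` should show `request_SLA_days` at 1.0, which is why it cannot be a model input.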


## 3. Train/Test Split & Model Training

In [None]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

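# Each pipeline gets its own copy of the preprocessor, so fitting one pipeline
# does not refit the transformer object held by the other.
from sklearn.base import clone

# Logistic Regression pipeline
logreg_pipeline = Pipeline(steps=[
    ('preprocess', clone(preprocessor)),
    ('model', LogisticRegression(max_iter=1000))
])

# Gradient Boosting pipeline (fixed random_state for reproducible results)
gb_pipeline = Pipeline(steps=[
    ('preprocess', clone(preprocessor)),
    ('model', GradientBoostingClassifier(random_state=42))
])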

models = {
    'LogisticRegression': logreg_pipeline,
    'GradientBoosting': gb_pipeline
}

for name, pipe in models.items():
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    y_proba = pipe.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_proba)
    print(f"\n=== {name} ===")
    print("AUC:", round(auc, 3))
    print("Classification Report:\n", classification_report(y_test, y_pred))
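`pipe.predict` in the loop applies scikit-learn's default decision rule, which for a binary classifier amounts to a 0.5 probability cut-off. For a risk‑monitoring list it may be preferable to flag more requests at a lower cut-off. A sketch using `confusion_matrix`; the helper `confusion_at_threshold` and the toy scores are hypothetical, and the threshold values are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_at_threshold(y_true, y_proba, threshold=0.5):
    """Confusion matrix [[TN, FP], [FN, TP]] when scores >= threshold are flagged as breaches."""
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    return confusion_matrix(y_true, y_pred, labels=[0, 1])

y_true = [0, 0, 1, 1, 1]
y_proba = [0.10, 0.40, 0.35, 0.60, 0.90]
for t in (0.5, 0.3):
    tn, fp, fn, tp = confusion_at_threshold(y_true, y_proba, t).ravel()
    print(f"threshold={t}: TN={tn} FP={fp} FN={fn} TP={tp}")
```

With the notebook's objects this would be `confusion_at_threshold(y_test, pipe.predict_proba(X_test)[:, 1], 0.3)`; lowering the threshold trades false positives for fewer missed breaches.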


## 4. Evaluation: ROC Curve

In [None]:

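# Choose the model with the higher mean cross-validated AUC on the training set,
# rather than whichever model happened to score best on the test set itself.
cv_auc = {
    name: cross_val_score(pipe, X_train, y_train, cv=5, scoring='roc_auc').mean()
    for name, pipe in models.items()
}
best_name = max(cv_auc, key=cv_auc.get)
best_model = models[best_name]
print("Cross-validated AUC:", {k: round(v, 3) for k, v in cv_auc.items()}, "| selected:", best_name)
RocCurveDisplay.from_estimator(best_model, X_test, y_test, name=best_name)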
plt.show()
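The outline promises "interpretation", but the notebook stops at the ROC curve. One model‑agnostic option is scikit-learn's `permutation_importance`, which measures how much the score drops when one input column is shuffled. The sketch below runs it on toy data with a bare `GradientBoostingClassifier`; the column names mirror the notebook, but the data and the "driver" relationship are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 300
toy_X = pd.DataFrame({
    "testing_SLA_days": rng.integers(0, 10, size=n),  # toy driver of the target
    "month": rng.integers(1, 13, size=n),             # toy noise
})
toy_y = (toy_X["testing_SLA_days"] > 4).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(toy_X, toy_y)
result = permutation_importance(
    model, toy_X, toy_y, scoring="roc_auc", n_repeats=10, random_state=0
)
# Mean drop in AUC per shuffled column, largest first.
importances = pd.Series(result.importances_mean, index=toy_X.columns).sort_values(ascending=False)
print(importances)
```

For the notebook, `permutation_importance(best_model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=42)` works on the whole pipeline, so importances are reported per raw column (a single value for `category` rather than one per one‑hot column).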


## 5. Save Model Artifact

In [None]:

model_path = "/mnt/data/sla_breach_model.pkl"
joblib.dump(best_model, model_path)
print("Model saved to:", model_path)
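A saved pipeline is only useful if it can be reloaded and applied to new rows with the same column names. A round‑trip sketch, using a small stand‑in pipeline (not the notebook's `best_model`) and a temporary directory instead of `/mnt/data`:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for best_model: any fitted scikit-learn pipeline round-trips the same way.
X = pd.DataFrame({"testing_SLA_days": [1, 2, 3, 6, 7, 8], "month": [1, 2, 3, 4, 5, 6]})
y = [0, 0, 0, 1, 1, 1]
pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sla_breach_model.pkl"
    joblib.dump(pipe, str(path))
    reloaded = joblib.load(str(path))

# Score "new" requests; columns must match the training features by name.
new_requests = pd.DataFrame({"testing_SLA_days": [2, 7], "month": [7, 8]})
risk = reloaded.predict_proba(new_requests)[:, 1]
print(np.round(risk, 3))
```

joblib files are pickles, so load them only from trusted locations and with the same scikit-learn version used for training.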



## 6. Next Steps
* Integrate with live SharePoint data via Power Automate.
* Schedule daily inference job and write predictions back to a risk‑monitoring list.
* Enhance feature set with rolling backlog metrics and requester metadata.
* Perform hyper‑parameter tuning (e.g., `GridSearchCV`) for production deployment.
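For the tuning step, parameters of a pipeline stage are addressed as `<step name>__<parameter>`, so with the notebook's step name `'model'` they become `model__max_depth` and so on. A sketch on synthetic data from `make_classification`; the grid values are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data standing in for X_train / y_train.
X_toy, y_toy = make_classification(n_samples=300, n_features=5, random_state=0)

# Same step name ('model') as gb_pipeline, so the parameter names carry over.
pipe = Pipeline([("model", GradientBoostingClassifier(random_state=0))])

param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
    "model__learning_rate": [0.05, 0.1],
}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X_toy, y_toy)
print(search.best_params_, round(search.best_score_, 3))
```

On the real data, `GridSearchCV(gb_pipeline, param_grid, scoring="roc_auc", cv=5).fit(X_train, y_train)` would tune the full pipeline, including preprocessing, inside each fold.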
