# IBD Flare-up Prediction Project

**Research Question:** Can we predict IBD flare-ups 90 days in advance using clinical, lab, medication, and procedure data?

**Abstract:** This project develops a logistic regression model to predict flare-ups in patients with inflammatory bowel disease (IBD). Features include lab measurements (CRP, ESR, Hb, Hct, WBC, albumin, ALT, AST, fecal calprotectin) and counts of medications and procedures. The goal is to alert patients and clinicians early enough for timely intervention.
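
For reference, the standard form of a logistic regression model is sketched below. The symbols are notation assumed here, not taken from the project: $x$ is a patient's feature vector, $\beta_0$ the intercept, and $\beta$ the coefficient vector.

$$
P(\text{flare\_up} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}
$$

Under this form, each coefficient $\beta_j$ is the change in the log-odds of a flare-up per unit increase in feature $x_j$ (per standard deviation once features are standardized).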

The following notebook shows the full analysis with model training, evaluation metrics, and visualizations.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns

# Load embedded dataset
df = pd.DataFrame({'count_crohn': {0: 2, 1: 0, 2: 2, 3: 2, 4: 0}, 'count_uc': {0: 1, 1: 0, 2: 0, 3: 1, 4: 1}, 'max_crp': {0: 5.798789748102473, 1: 7.095407531479817, 2: 7.824182145821345, 3: 2.079559797917673, 4: 7.080968984100551}, 'max_esr': {0: 24.476693436516047, 1: 8.739832981012887, 2: 15.220091124615296, 3: 12.425522308953443, 4: 4.147578576211197}, 'min_hb': {0: 11.854783619232597, 1: 10.916053406441486, 2: 11.74271483748895, 3: 13.287482137788935, 4: 12.167121037264037}, 'max_hct': {0: 43.74492409695887, 1: 42.15824012193549, 2: 33.2251466025566, 3: 45.90494914728543, 4: 31.455165259231272}, 'max_wbc': {0: 8.925206615517599, 1: 4.183104421905278, 2: 6.6243271788705425, 3: 7.1126824849629156, 4: 8.387859070444504}, 'min_albumin': {0: 4.424891070561206, 1: 4.376606087512163, 2: 5.236037631949703, 3: 4.310259080403771, 4: 3.1065605746541314}, 'max_alt': {0: 20.57826636044578, 1: 39.00309007538186, 2: 17.659166673019143, 3: 34.865065927704656, 4: 23.36588866860744}, 'max_ast': {0: 29.904614580325138, 1: 17.38091923477903, 2: 18.907096240413118, 3: 21.316415545001227, 4: 17.046562362820378}, 'max_fcp': {0: 65.45306633144014, 1: 163.78479303496945, 2: 209.71862387734916, 3: 104.04026159754677, 4: 162.49079448302146}, 'count_medications': {0: 1, 1: 0, 2: 0, 3: 1, 4: 1}, 'count_procedures': {0: 0, 1: 1, 2: 0, 3: 1, 4: 1}, 'flare_up': {0: 1, 1: 1, 2: 0, 3: 1, 4: 1}})  # Sample view
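# In practice, df is fully defined in a previous cell. The 5-row sample above is
# only a preview: with a single flare_up == 0 row, the stratified split below
# raises a ValueError, so run the remaining steps on the full dataset.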

# Features and target
X = df.drop(columns=['flare_up'])
y = df['flare_up']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

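# Impute & scale: fit on the training set only, then apply the same
# fitted transforms to the test set to avoid leaking test statistics
imputer = SimpleImputer(strategy='mean')
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_scaled = scaler.transform(imputer.transform(X_test))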

# Logistic regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:,1]

# Metrics
acc = accuracy_score(y_test, y_pred)
roc = roc_auc_score(y_test, y_prob)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print('Accuracy:', acc)
print('ROC-AUC:', roc)
print('Confusion Matrix:\n', cm)
print('Classification Report:\n', report)

# Confusion matrix plot
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# ROC curve
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.figure(figsize=(6,4))
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc:.3f})')
plt.plot([0,1],[0,1],'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
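
The impute, scale, and fit steps above can also be chained in a scikit-learn `Pipeline`, which keeps the preprocessing statistics tied to the training data and makes cross-validation straightforward. The following is a minimal sketch, not the project's actual analysis: the data is synthetic (200 rows, 13 features, about 5% missing values), and the names `X_demo`, `y_demo`, `pipe`, `cv_auc`, and `demo_auc` are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature table (assumption, not project data)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 13))
# Label depends on the first two features plus noise
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=200) > 0).astype(int)
# Knock out roughly 5% of the values to mimic missing lab results
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Imputer and scaler are fitted only on the data passed to pipe.fit()
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validated ROC-AUC on the training portion;
# preprocessing is refitted inside each fold
cv_auc = cross_val_score(pipe, X_tr, y_tr, cv=5, scoring="roc_auc")
print("CV ROC-AUC per fold:", np.round(cv_auc, 3))

# Final fit on the training set, evaluated once on the held-out test set
pipe.fit(X_tr, y_tr)
demo_auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
print(f"Held-out ROC-AUC: {demo_auc:.3f}")
```

To try this on the real data, `X_demo` and `y_demo` would be replaced by `X` and `y` from the cell above. Because the pipeline refits the imputer and scaler inside every fold, the cross-validated scores are not affected by test-fold statistics.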
