# 🧬 Liver Disease Prediction

**Objective**:
This notebook implements the **Hepatic Risk Ensemble**, a soft-voting classifier for liver disease prediction. Skewed bilirubin and enzyme measurements are log-transformed and then standardized before training.

**Workflow**:
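1.  **Ingestion**: Load the ILPD (Indian Liver Patient Dataset).
2.  **Preprocessing**: Log-transform bilirubin and enzyme levels, then standardize all features.
3.  **Training**: Fit a soft-voting ensemble (XGBoost + Random Forest + Gradient Boosting).
4.  **Export**: Production-ready pickle export.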
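
The export step (4) could look like the sketch below. It assumes the fitted model is bundled together with the scaler and the list of log-transformed columns, since inference needs the same preprocessing as training. The file name `liver_ensemble.pkl`, the `artifact` keys, and the placeholder values are illustrative assumptions, not the notebook's actual export.

```python
import os
import pickle
import tempfile

# Placeholders standing in for the fitted objects (hypothetical values);
# in the notebook these would be the fitted `ensemble` and `scaler`.
artifact = {
    "model": "fitted-ensemble-placeholder",
    "scaler": "fitted-scaler-placeholder",
    "log_features": ["total_bilirubin", "alkaline_phosphotase"],
}

# Hypothetical output location; a temp directory keeps the sketch side-effect free
export_path = os.path.join(tempfile.mkdtemp(), "liver_ensemble.pkl")

with open(export_path, "wb") as fh:
    pickle.dump(artifact, fh)

# Round-trip check: reload and confirm the bundle is intact
with open(export_path, "rb") as fh:
    restored = pickle.load(fh)

print(sorted(restored))
```

Bundling the preprocessing objects with the model helps avoid a mismatch between training-time and inference-time features.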

In [None]:
# Core Libraries
import os
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Ensemble Components
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

print("✅ Environment Loaded.")

In [None]:
# Load Data
DATA_FILE = "../data/processed/liver.parquet"

if not os.path.exists(DATA_FILE):
    # Fail fast: every later cell depends on `df`
    raise FileNotFoundError(f"❌ Dataset missing: {DATA_FILE}")

df = pd.read_parquet(DATA_FILE)
print(f"✅ Data Ingested: {df.shape[0]} rows | {df.shape[1]} columns")

### Feature Engineering: Handling Skew
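Liver enzyme and bilirubin measurements are typically strongly right-skewed: most patients fall in a narrow range while a few have very large values. We apply the `log1p` transformation, log(1 + x), to compress that long tail before scaling.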
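
As a quick illustration of why `log1p` helps, the sketch below draws synthetic right-skewed values and compares sample skewness before and after the transform. The data is simulated (a log-normal draw standing in for a lab measurement), and `sample_skewness` is a helper defined here for illustration; neither comes from the ILPD data.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Return the (biased) sample skewness: the mean cubed z-score."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Synthetic stand-in for a skewed lab value (NOT real ILPD data)
rng = random.Random(42)
raw = [rng.lognormvariate(3.0, 0.8) for _ in range(5000)]

# log1p(x) = log(1 + x); it is defined at x = 0, unlike log(x)
transformed = [math.log1p(v) for v in raw]

skew_before = sample_skewness(raw)
skew_after = sample_skewness(transformed)
print(f"skewness before: {skew_before:.2f}")
print(f"skewness after:  {skew_after:.2f}")
```

A skewness near zero after the transform indicates a roughly symmetric distribution, which is what standardization works best on.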

In [None]:
target = 'target'

# Log Transform Skewed Features
skewed = ['total_bilirubin', 'alkaline_phosphotase', 'alamine_aminotransferase', 'albumin_and_globulin_ratio']
for col in skewed:
    if col in df.columns:
        df[col] = np.log1p(df[col])

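X = df.drop(columns=[target])
y = df[target]

# Split first so the scaler never sees test rows (avoids data leakage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling: fit on the training split only, then apply to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)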

In [None]:
# --- ENSEMBLE DEFINITION ---

clf1 = XGBClassifier(n_estimators=200, learning_rate=0.05, max_depth=5, eval_metric='logloss', random_state=42)
clf2 = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
clf3 = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

ensemble = VotingClassifier(
    estimators=[('xgb', clf1), ('rf', clf2), ('gb', clf3)],
    voting='soft'
)

print("⏳ Training Hepatic Ensemble...")
ensemble.fit(X_train, y_train)
print("✅ Training Complete.")

In [None]:
# Evaluation
preds = ensemble.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f"🎯 Ensemble Accuracy: {acc:.4f}")
print("\nClassification Report:\n", classification_report(y_test, preds))