# ❤️ Heart Disease Prediction

**Objective**:
This notebook implements the **Cardiac Risk Ensemble** using LightGBM, XGBoost, and Random Forest. It is designed to handle the large-scale CDC BRFSS dataset (250,000+ records).

**Workflow**:
1.  **Ingestion**: Loading 250k+ rows of Health Indicators.
2.  **Training**: Training a diverse ensemble with `VotingClassifier`.
3.  **Validation**: Checking accuracy on an unseen test set.

In [None]:
# Core Libraries
import os
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Ensemble Components
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

print("✅ Environment Loaded.")

In [None]:
# Load CDC Heart Data
DATA_FILE = "../data/processed/heart.parquet"

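# Fail fast: every later cell needs `df`, so a missing file should stop the notebook here.
if not os.path.exists(DATA_FILE):
    raise FileNotFoundError(f"❌ Dataset missing: {DATA_FILE}")

df = pd.read_parquet(DATA_FILE)
print(f"✅ Data Ingested: {df.shape[0]} rows | {df.shape[1]} columns (including target)")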

### Model Architecture: Cardiac Ensemble
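1.  **LightGBM**: Histogram-based gradient boosting; trains quickly on large tabular datasets.
2.  **XGBoost**: Regularized gradient boosting; a robust second booster.
3.  **Random Forest**: Bagged decision trees; adds stability, and its feature importances aid interpretation.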
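
The ensemble cell further down combines these three models with `voting='soft'`, which averages each model's predicted class probabilities instead of counting votes. The sketch below illustrates that idea in plain Python. It is not scikit-learn's implementation, and the function names and probability values are hypothetical.

```python
from collections import Counter


def soft_vote(prob_rows, weights=None):
    """Average per-model class probabilities for one sample.

    prob_rows: one probability list per model, e.g. [P(class 0), P(class 1)].
    Returns (winning class index, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg


def hard_vote(prob_rows):
    """Majority vote over each model's most probable class."""
    votes = [max(range(len(row)), key=lambda c: row[c]) for row in prob_rows]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three models for one patient:
# two lean weakly towards class 0, one is confident in class 1.
probs = [[0.60, 0.40], [0.55, 0.45], [0.10, 0.90]]

label, avg = soft_vote(probs)
print("soft vote:", label, [round(p, 3) for p in avg])
print("hard vote:", hard_vote(probs))
```

With these made-up numbers the two schemes disagree: the majority vote picks class 0, while the averaged probabilities favour class 1 because the third model is far more confident. Soft voting lets a confident model outweigh two uncertain ones, which only helps if the models' probabilities are reasonably well calibrated.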


In [None]:
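# Separate the features from the label column
target = 'target'
X = df.drop(columns=[target])
y = df[target]

# Stratify so the train and test sets keep the same class balance as the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)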

In [None]:
# --- ENSEMBLE DEFINITION ---

clf1 = XGBClassifier(n_estimators=200, learning_rate=0.05, max_depth=5, eval_metric='logloss', random_state=42)
clf2 = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
clf3 = LGBMClassifier(n_estimators=200, learning_rate=0.05, num_leaves=31, random_state=42)

ensemble = VotingClassifier(
    estimators=[('xgb', clf1), ('rf', clf2), ('lgbm', clf3)],
    voting='soft'
)

print("⏳ Training Cardiac Ensemble...")
ensemble.fit(X_train, y_train)
print("✅ Training Complete.")

In [None]:
# Evaluation
preds = ensemble.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f"üéØ Ensemble Accuracy: {acc:.4f}")
print("\nClassification Report:\n", classification_report(y_test, preds))
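
Accuracy is easier to interpret next to a trivial baseline: the score obtained by always predicting the most frequent class. The sketch below computes that baseline in plain Python. The helper name and the demo labels are hypothetical and do not reflect the actual class ratio of this dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical labels: 90 negatives and 10 positives
y_demo = [0] * 90 + [1] * 10
baseline = majority_baseline_accuracy(y_demo)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

In this notebook, `majority_baseline_accuracy(list(y_test))` could be compared with `acc`. If the two are close, the per-class precision and recall in the classification report say more about the ensemble than the headline accuracy.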