# Week 9: Gradient Boost (EZ Classifier Version)

**Author:** James Hardison II  
**Date:** 2025-11-03

This minimal notebook loads your CKD (chronic kidney disease) dataset, converts the `class` target to 0/1,
one-hot encodes the categorical columns with pandas, trains a `GradientBoostingClassifier`, and reports metrics and feature importances.
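
To see what the target conversion and one-hot encoding do before running them on the real file, here is a minimal sketch on a made-up three-row frame. The column names `age` and `htn` and all values are assumptions for illustration, not taken from your CSV.

```python
import pandas as pd

# Hypothetical toy rows; column names and values are illustrative only.
toy = pd.DataFrame({
    "age": [48, 62, 35],
    "htn": ["yes", "no", "no"],
    "class": [" CKD", "notckd", "ckd\t"],
})

# Normalize the label text, then map it to 1 (ckd) / 0 (notckd).
toy_labels = toy["class"].astype(str).str.strip().str.lower()
y_toy = toy_labels.map({"ckd": 1, "notckd": 0}).astype(int)

# One-hot encode the categorical feature. With drop_first=True a binary
# column such as htn becomes a single indicator column (htn_yes).
X_toy = pd.get_dummies(toy.drop(columns=["class"]), drop_first=True, dtype=int)

print(y_toy.tolist())
print(X_toy)
```

Numeric columns such as `age` pass through unchanged; only text columns are expanded into indicator columns.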

> Edit `csv_path` below, then **Run All**.


In [None]:
# === CONFIG ===
csv_path = "ckd_cleaned.csv"  # set your CSV path


In [None]:
# === LOAD + QUICK CLEAN ===
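import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display  # built in to notebooks; explicit import for scripts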
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv(csv_path)
df.columns = df.columns.str.strip().str.lower()
print("Shape:", df.shape)
display(df.head())

# target -> 0/1
if "class" not in df.columns:
    raise ValueError(f"'class' column not found. Available columns: {list(df.columns)}")
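# Normalize label text, then map to 1 (ckd) / 0 (notckd); anything else becomes NaN.
df["class"] = (
    df["class"].astype(str).str.strip().str.lower()
    .map({"ckd": 1, "notckd": 0})
)
# Keep only rows whose label was recognized.
df = df[df["class"].isin([0, 1])]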

X = df.drop(columns=["class"])
y = df["class"].astype(int)

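# One-hot encode text columns; dtype=int keeps the indicator columns numeric.
X = pd.get_dummies(X, drop_first=True, dtype=int)
X = X.apply(pd.to_numeric, errors="coerce")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Impute with training-set medians only, so test rows do not leak into training.
medians = X_train.median(numeric_only=True)
X_train = X_train.fillna(medians).fillna(0)
X_test = X_test.fillna(medians).fillna(0)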

print("Train:", X_train.shape, " Test:", X_test.shape)


In [None]:
# === TRAIN + METRICS ===
gb = GradientBoostingClassifier(
    n_estimators=150, learning_rate=0.1, max_depth=3, random_state=42
)
gb.fit(X_train, y_train)
pred = gb.predict(X_test)

print(f"Accuracy : {accuracy_score(y_test, pred):.3f}")
print(f"Precision: {precision_score(y_test, pred):.3f}")
print(f"Recall   : {recall_score(y_test, pred):.3f}")
print(f"F1       : {f1_score(y_test, pred):.3f}")
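
If you want to explain the metrics in your summary, the sketch below recomputes precision, recall, and F1 by hand from confusion-matrix counts. The labels and predictions are made up for illustration, and the names `toy_true` and `toy_pred` are hypothetical so they do not overwrite anything in the notebook.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels and predictions, for illustration only.
toy_true = [1, 1, 1, 1, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 1, 0, 1]

# With labels ordered [0, 1], ravel() yields tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of the predicted CKD cases, how many were right
recall = tp / (tp + fn)     # of the true CKD cases, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.3f}  (sklearn: {precision_score(toy_true, toy_pred):.3f})")
print(f"Recall   : {recall:.3f}  (sklearn: {recall_score(toy_true, toy_pred):.3f})")
print(f"F1       : {f1:.3f}  (sklearn: {f1_score(toy_true, toy_pred):.3f})")
```

The scikit-learn functions treat label 1 as the positive class by default, which matches the `ckd` = 1 encoding used here.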


In [None]:
# === FEATURE IMPORTANCES ===
importances = pd.Series(gb.feature_importances_, index=X.columns).sort_values(ascending=False)
display(importances.head(15))

plt.figure(figsize=(8,6))
importances.head(15).sort_values(ascending=True).plot(kind="barh")
plt.title("Gradient Boost Feature Importances")
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.tight_layout()
plt.show()


**Done.** Copy the metrics and the plot into your Week 9 summary.
If you need learning-rate or estimator-count sweeps, add them below; this notebook already covers the assignment basics.
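
If you do add a sweep, one possible shape is sketched here, assuming "LR" means the learning rate. The helper `sweep_gb`, the grid values, and the synthetic stand-in data are all hypothetical; the synthetic data is only there so the cell runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def sweep_gb(X_tr, y_tr, X_te, y_te, learning_rates, n_estimators_list):
    """Fit one model per (learning_rate, n_estimators) pair and collect test F1."""
    rows = []
    for lr in learning_rates:
        for n in n_estimators_list:
            model = GradientBoostingClassifier(
                n_estimators=n, learning_rate=lr, max_depth=3, random_state=42
            )
            model.fit(X_tr, y_tr)
            rows.append({
                "learning_rate": lr,
                "n_estimators": n,
                "f1": f1_score(y_te, model.predict(X_te)),
            })
    # Best-scoring combinations first.
    return pd.DataFrame(rows).sort_values("f1", ascending=False, ignore_index=True)


# Synthetic stand-in data (an assumption) so the sketch is self-contained.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
results = sweep_gb(Xd_tr, yd_tr, Xd_te, yd_te, [0.05, 0.1, 0.2], [50, 150])
print(results)
```

In this notebook you would call `sweep_gb(X_train, y_train, X_test, y_test, ...)` instead of using the synthetic data. Picking settings by test-set score is fine for a quick comparison, but for anything beyond the assignment a separate validation split or cross-validation is the safer choice.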
