# 🩺 Diabetes Classification Project

This notebook classifies diabetes cases using two machine learning algorithms:
- Logistic Regression
- Decision Tree

The evaluation results are saved to the file `metrics_summary.json`.

In [None]:

# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_curve, auc
import json


## 1️⃣ Load Dataset

In [None]:

# Load the diabetes.csv dataset (make sure the file has been uploaded to Colab)
df = pd.read_csv("diabetes.csv")
df.head()


## 2️⃣ Exploratory Data Analysis (EDA)

In [None]:

# Basic dataset info
print("Number of rows and columns:", df.shape)
print("\nMissing values per column:")
print(df.isnull().sum())

# Visualize the target distribution
sns.countplot(x='Outcome', data=df)
plt.title("Class Distribution (0 = No Diabetes, 1 = Diabetes)")
plt.show()

# Correlation between features
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()
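
Note that `df.isnull().sum()` only counts `NaN` entries. If `diabetes.csv` is the widely used Pima Indians Diabetes file (an assumption; the notebook does not say so), physiologically impossible zeros in columns such as `Glucose` or `BMI` may stand in for missing measurements and would not show up in that check. A minimal sketch on a toy frame, with the column names and the list of affected columns both assumed:

```python
import numpy as np
import pandas as pd

# Toy data standing in for diabetes.csv; column names are assumed, values are made up
toy = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Outcome": [1, 0, 1, 0],
})

# Columns where a zero cannot be a real measurement (assumption)
zero_as_missing = ["Glucose", "BMI"]

# Count zeros per column, then mark them as NaN so isnull() can see them
zero_counts = (toy[zero_as_missing] == 0).sum()
cleaned = toy.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)

print(zero_counts)
print(cleaned.isnull().sum())
```

Whether to impute or drop such rows afterwards is a separate modelling decision that this sketch leaves open.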


## 3️⃣ Data Preprocessing

In [None]:

# Separate the features (X) from the target (y)
X = df.drop("Outcome", axis=1)
y = df["Outcome"]

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
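
The split above draws the test set at random, so the share of positive cases can differ between the train and test sets. One option, not used in the original notebook, is to pass `stratify=y`, which keeps the class ratio the same in both parts. A sketch on synthetic data (the arrays and variable names here are illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 65 negatives and 35 positives (made up for illustration)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 65 + [1] * 35)

# stratify=y_demo preserves the 65/35 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("positive share, train:", y_tr.mean())
print("positive share, test:", y_te.mean())
```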


## 4️⃣ Model Training

In [None]:

# Initialize the models
log_model = LogisticRegression()
tree_model = DecisionTreeClassifier(random_state=42)

# Train: Logistic Regression on the scaled features, Decision Tree on the raw features
log_model.fit(X_train_scaled, y_train)
tree_model.fit(X_train, y_train)
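
Because Logistic Regression is fitted on scaled features and the Decision Tree on raw ones, every later call has to pass the matching version of the data. A scikit-learn `Pipeline` is one way to bundle the scaler with the model so that raw features can always be passed in. This is a sketch on synthetic data from `make_classification`, not the notebook's own setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes features (8 columns chosen arbitrarily)
X_syn, y_syn = make_classification(n_samples=200, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# The scaler is fitted on the training data only, inside pipe.fit()
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_tr, y_tr)

# Raw test features go in; scaling happens automatically before prediction
pred = pipe.predict(X_te)
print("test accuracy:", pipe.score(X_te, y_te))
```

With this design the evaluation code no longer needs to know which model expects scaled input.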


## 5️⃣ Model Evaluation

In [None]:

models = {'Logistic Regression': log_model, 'Decision Tree': tree_model}
metrics_summary = {}

for name, model in models.items():
    # Logistic Regression was trained on scaled features, the tree on raw features
    X_eval = X_test_scaled if name == 'Logistic Regression' else X_test
    y_pred = model.predict(X_eval)

    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    # Confusion matrix (rows = actual class, columns = predicted class)
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f"Confusion Matrix - {name}")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()

    # ROC curve from the predicted probability of the positive class
    y_prob = model.predict_proba(X_eval)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    roc_auc = auc(fpr, tpr)

    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc:.2f})")
    plt.plot([0,1],[0,1],'--',color='gray')
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    plt.legend()
    plt.show()

    metrics_summary[name] = {
        "Accuracy": acc,
        "Precision": prec,
        "Recall": rec,
        "F1-Score": f1,
        "AUC": roc_auc
    }

# Save the evaluation results to a JSON file
with open("metrics_summary.json", "w") as f:
    json.dump(metrics_summary, f, indent=4)

metrics_summary
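
Accuracy, precision, recall, and F1 all follow from the four counts in the confusion matrix. As a small worked example with hand-made labels (not results from this dataset), the formulas can be checked directly against the counts:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hand-made labels for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are real
recall = tp / (tp + fn)                      # share of real positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1)
```

The hand-computed values agree with `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` on the same labels.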


## ✅ Conclusion

The Logistic Regression and Decision Tree models are compared on the evaluation metrics (Accuracy, Precision, Recall, F1-score, AUC). The results are saved to `metrics_summary.json` for documentation.
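
To compare the two models side by side, the saved file can be read back into a table. The helper `load_metrics_table` below is a hypothetical name, and the demo writes placeholder numbers to a temporary file rather than real results:

```python
import json
import os
import tempfile

import pandas as pd


def load_metrics_table(path):
    """Read a metrics JSON file and return one row per model."""
    with open(path) as f:
        metrics = json.load(f)
    # Outer keys are model names, inner keys are metric names
    return pd.DataFrame.from_dict(metrics, orient="index")


# Placeholder values only, to show the expected file layout
placeholder = {
    "Logistic Regression": {"Accuracy": 0.5, "AUC": 0.5},
    "Decision Tree": {"Accuracy": 0.25, "AUC": 0.5},
}
demo_path = os.path.join(tempfile.mkdtemp(), "metrics_summary.json")
with open(demo_path, "w") as f:
    json.dump(placeholder, f, indent=4)

table = load_metrics_table(demo_path)
print(table.sort_values("Accuracy", ascending=False))
```

In the notebook itself, `load_metrics_table("metrics_summary.json")` would read the file written in the evaluation step.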