# Notebook 6 — Logistic Regression (Binary & Multinomial)

**Dataset:** Heart Disease UCI (Kaggle)

**Purpose:** Demonstrate binary logistic regression for disease prediction, including preprocessing, ROC-based evaluation, and an optional multiclass setup.

## Setup & Load
Download the CSV (e.g., `heart.csv`) from Kaggle and place it in the working directory.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve, classification_report

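CSV = 'heart.csv'  # path to the Kaggle download, relative to the working directory
try:
    df = pd.read_csv(CSV)
    print('Loaded heart dataset:', df.shape)
    display(df.head())  # `display` is available in Jupyter/IPython sessions
except Exception as e:
    # df stays undefined if loading fails, so the cells below will fail as well
    print(f'Could not load {CSV}; ensure the file is present.\n', e)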


## Preprocess & Train Logistic Regression

In [None]:
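try:
    # Separate the features from the label column
    X = df.drop(columns=['target'])
    y = df['target']
    # One-hot encode object/category columns; numeric columns, including
    # integer-coded categoricals, pass through unchanged
    X = pd.get_dummies(X, drop_first=True)
    # Hold out 20% of the rows for evaluation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # A higher max_iter gives the solver room to converge on unscaled features
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Column 1 holds the predicted probability of class clf.classes_[1]
    y_proba = clf.predict_proba(X_test)[:, 1]
    print('Accuracy:', accuracy_score(y_test, y_pred))
    print('ROC AUC:', roc_auc_score(y_test, y_proba))
    print('\nClassification report:\n', classification_report(y_test, y_pred))
except Exception as e:
    print('Logistic regression pipeline failed.\n', e)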
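
## ROC Curve (sketch)

The cell above reports ROC AUC as a single number, and `roc_curve` is imported but never called. The following is a minimal sketch of how the curve itself could be computed and drawn. It uses a synthetic dataset from `make_classification` as a stand-in so that it runs without `heart.csv`. The names `X_demo`, `y_demo`, `demo_clf`, and `plot_roc` are illustrative assumptions, not part of the original notebook, and `plot_roc` assumes matplotlib is installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for heart.csv (assumption: any binary dataset works here).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = demo_clf.predict_proba(X_te)[:, 1]

# roc_curve sweeps the decision threshold and returns one
# (false positive rate, true positive rate) pair per threshold.
fpr, tpr, thresholds = roc_curve(y_te, proba)
auc = roc_auc_score(y_te, proba)
print(f'ROC AUC: {auc:.3f}')


def plot_roc(fpr, tpr, auc):
    """Draw the ROC curve with the chance diagonal for reference."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(fpr, tpr, label=f'Logistic regression (AUC = {auc:.3f})')
    ax.plot([0, 1], [0, 1], linestyle='--', color='grey', label='Chance')
    ax.set_xlabel('False positive rate')
    ax.set_ylabel('True positive rate')
    ax.set_title('ROC curve')
    ax.legend(loc='lower right')
    return ax
```

Calling `plot_roc(fpr, tpr, auc)` in a notebook cell draws the curve. To plot the heart-disease model instead, pass the output of `roc_curve(y_test, y_proba)` from the training cell.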

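## Optional: Multinomial Setup (sketch)

The pipeline above treats `target` as binary, since it takes a single probability column for ROC AUC. For a label with more than two classes, a sketch along the following lines may serve as a starting point. It relies on a synthetic three-class dataset, not the heart data, and the `_mc` names are illustrative assumptions. With the default `lbfgs` solver, recent scikit-learn versions fit one multinomial (softmax) model when the label has more than two classes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data (hypothetical stand-in; not the heart dataset).
X_mc, y_mc = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_tr_mc, X_te_mc, y_tr_mc, y_te_mc = train_test_split(
    X_mc, y_mc, test_size=0.2, random_state=42, stratify=y_mc
)

# Same estimator as the binary case; the multiclass handling is
# inferred from the number of distinct labels in y.
mc_clf = LogisticRegression(max_iter=1000).fit(X_tr_mc, y_tr_mc)

proba_mc = mc_clf.predict_proba(X_te_mc)  # shape: (n_samples, n_classes)
pred_mc = mc_clf.predict(X_te_mc)

print('Classes:', mc_clf.classes_)
print('Accuracy:', accuracy_score(y_te_mc, pred_mc))
# Multiclass AUC needs the full probability matrix and an averaging scheme;
# here each class is scored one-vs-rest and the results are macro-averaged.
print('ROC AUC (OvR, macro):', roc_auc_score(y_te_mc, proba_mc, multi_class='ovr'))
```

The main differences from the binary cell are that `predict_proba` returns one column per class, each row summing to 1, and that `roc_auc_score` needs an explicit `multi_class` strategy.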