# Day 03 — Decision trees + comparison

Today I add a **decision tree classifier** and compare it to the Day 02 logistic regression baseline.


## Goals for this notebook
1. Load a real dataset (Breast Cancer Wisconsin).
2. Train a logistic regression baseline.
3. Train a decision tree and compare metrics.

I keep the workflow short but add commentary so each step is clear.


In [None]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score


## Load data
The scikit-learn loader returns a clean, fully numeric dataset: 569 samples, 30 features, and a binary target (0 = malignant, 1 = benign).
I convert it into a DataFrame to make feature inspection easier.


In [None]:
dataset = load_breast_cancer()
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df["target"] = dataset.target
df.head()
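Before modeling, it can help to confirm what the target labels mean and how balanced the classes are. The following is a minimal sketch; `label_names` and `class_counts` are names introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# Map each integer label to its class name (position in target_names).
label_names = {i: str(name) for i, name in enumerate(dataset.target_names)}

# Count samples per class, keyed by class name.
class_counts = pd.Series(dataset.target).map(label_names).value_counts()

print(label_names)
print(class_counts)
```

Label 1 is the benign class, which matters later because scikit-learn's precision and recall treat label 1 as the positive class by default.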


## Train/test split
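I split once so both models see the same train/test data. `stratify=y` keeps the class ratio the same in both parts, and `random_state` makes the split reproducible.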


In [None]:
X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
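A quick sanity check on the split can be sketched as follows. It reloads the data so the snippet runs on its own; `split_summary` is a hypothetical name used only here. Because the benign class is encoded as 1, the mean of the target equals the benign share.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Reload the data so this check is self-contained.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Row counts and benign share (mean of a 0/1 target) for each part.
split_summary = {
    "train_rows": len(X_train),
    "test_rows": len(X_test),
    "train_benign_share": float(y_train.mean()),
    "test_benign_share": float(y_test.mean()),
}
print(split_summary)
```

If stratification worked, the two benign shares should be nearly identical.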


## Baseline: logistic regression
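This mirrors Day 02 and serves as the reference point for the tree. Precision and recall use scikit-learn's default positive label, 1, which is the benign class in this dataset.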


In [None]:
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
log_preds = log_reg.predict(X_test)

log_metrics = {
    "model": "logistic_regression",
    "accuracy": accuracy_score(y_test, log_preds),
    "precision": precision_score(y_test, log_preds),
    "recall": recall_score(y_test, log_preds),
}
log_metrics
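In a screening setting the malignant class is often the one of interest, so it may be worth scoring it directly. The sketch below refits the same baseline and passes `pos_label=0` to score the malignant class; `malignant_metrics` is a name introduced here, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same baseline configuration as the cell above.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
log_preds = log_reg.predict(X_test)

# pos_label=0 scores the malignant class instead of the default (1, benign).
malignant_metrics = {
    "precision_malignant": precision_score(y_test, log_preds, pos_label=0),
    "recall_malignant": recall_score(y_test, log_preds, pos_label=0),
}
print(malignant_metrics)
```

Recall for the malignant class is the share of malignant cases the model catches, which is usually the number to watch for this kind of data.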


## Decision tree
A decision tree splits on feature thresholds, so it can capture non-linear
decision boundaries. I cap the depth at 4 to limit overfitting while still
seeing how it compares to the baseline.


In [None]:
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
tree_preds = tree.predict(X_test)

tree_metrics = {
    "model": "decision_tree",
    "accuracy": accuracy_score(y_test, tree_preds),
    "precision": precision_score(y_test, tree_preds),
    "recall": recall_score(y_test, tree_preds),
}
tree_metrics
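One advantage of a shallow tree is that its learned rules are readable. As a rough sketch, `export_text` prints the threshold splits and `feature_importances_` ranks the features the tree actually used; the names `rules`, `importances`, and `top_features` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same tree configuration as the cell above.
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

# Text rendering of the learned splits, one line per node.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Impurity-based importances; features the tree never split on get 0.
importances = pd.Series(tree.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(5)
print(top_features)
```

Each printed rule is a single-feature threshold test, and the combination of such tests along a path is what lets the tree model non-linear boundaries.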


## Compare results
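I place the metrics side by side to see how the two models compare
on this test split. With only 114 test samples, small differences
can come down to a handful of predictions, so I treat the result as
indicative rather than conclusive.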


In [None]:
# Index by model name so each row is labeled in the output.
pd.DataFrame([log_metrics, tree_metrics]).set_index("model")
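A single split can be noisy, so one way to check whether the ranking holds is stratified k-fold cross-validation. The sketch below is an assumption-laden extension, not part of the original workflow: it uses 5 folds on the full dataset, accuracy as the only score, and the hypothetical names `models`, `cv`, and `cv_results`.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Same model configurations as in the cells above.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

# Shuffled, stratified folds so every fold keeps the class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    rows.append(
        {
            "model": name,
            "mean_accuracy": scores.mean(),
            "std_accuracy": scores.std(),
        }
    )

cv_results = pd.DataFrame(rows)
print(cv_results)
```

If the gap between the mean accuracies is smaller than the fold-to-fold standard deviation, the single-split ranking above should be read with caution.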
