# Breast Cancer Wisconsin in PyTorch (Logistic Regression and MLP)

One quick note: PyTorch has no native decision-tree API; tree models are usually built with scikit-learn or XGBoost. In PyTorch, the natural choices for this dataset are logistic regression and a small MLP (neural network), which is what this notebook trains.
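For comparison, here is a minimal scikit-learn sketch of the kind of tree model the note refers to. The stratified `train_test_split` call and `max_depth=4` are illustrative assumptions, not part of this notebook, and because the test rows differ from the manual split below, the accuracy is not directly comparable to the PyTorch results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same dataset the PyTorch models below use (0=malignant, 1=benign)
X, y = load_breast_cancer(return_X_y=True)

# Stratified 80/20 split via scikit-learn (not the manual permutation used later)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Trees split on raw feature thresholds, so no standardization is needed.
# max_depth=4 is an illustrative choice, not a tuned value.
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree.predict(X_test))
print("Decision tree accuracy:", tree_acc)
```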



## Import packages


In [1]:
import numpy as np
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score

In [2]:
# Repro
np.random.seed(42)
torch.manual_seed(42)


In [3]:
# Load dataset (sklearn only for data)
data = load_breast_cancer()
X = data.data.astype(np.float32)
y = data.target.astype(np.float32).reshape(-1, 1)  # 0=malignant, 1=benign


In [4]:
# Manual split
idx = np.random.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
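The manual split above is a plain random permutation, so the malignant/benign ratio in the test set can drift from the overall ratio by chance. A stratified variant shuffles and splits each class separately. `stratified_split` is a hypothetical helper, not part of the notebook, and the demo uses synthetic labels that only reproduce the dataset's class counts (212 malignant, 357 benign).

```python
import numpy as np

def stratified_split(y, train_frac=0.8, seed=42):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class train_frac / (1 - train_frac)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(y).ravel()
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle this class's rows
        cut = int(train_frac * len(cls_idx))
        train_parts.append(cls_idx[:cut])
        test_parts.append(cls_idx[cut:])
    # Shuffle again so the classes are interleaved rather than grouped
    train_idx = rng.permutation(np.concatenate(train_parts))
    test_idx = rng.permutation(np.concatenate(test_parts))
    return train_idx, test_idx

# Synthetic labels with the breast-cancer class counts: 212 malignant (0), 357 benign (1)
y_demo = np.array([0] * 212 + [1] * 357, dtype=np.float32).reshape(-1, 1)
train_idx, test_idx = stratified_split(y_demo)
print(len(train_idx), len(test_idx))                      # sizes of each part
print(y_demo[train_idx].mean(), y_demo[test_idx].mean())  # benign fraction in each part
```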

## Standardize (z-score)

In [5]:
# Standardize (fit on train only)
mu = X_train.mean(axis=0, keepdims=True)
sigma = X_train.std(axis=0, keepdims=True) + 1e-12
X_train = (X_train - mu) / sigma
X_test  = (X_test  - mu) / sigma
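Fitting the mean and standard deviation on the training rows only keeps test-set information out of preprocessing. As a sanity check, the manual z-score can be compared with scikit-learn's `StandardScaler`, which also divides by the population (ddof=0) standard deviation. The unshuffled 455/114 split in this sketch is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data.astype(np.float32)
X_train, X_test = X[:455], X[455:]  # illustrative split, not the notebook's shuffled one

# Manual z-score using training statistics only (np.std defaults to ddof=0)
mu = X_train.mean(axis=0, keepdims=True)
sigma = X_train.std(axis=0, keepdims=True) + 1e-12
X_train_manual = (X_train - mu) / sigma
X_test_manual = (X_test - mu) / sigma

# StandardScaler fitted on the same training rows, then applied to the test rows
scaler = StandardScaler().fit(X_train)
X_test_sklearn = scaler.transform(X_test)

print("max |manual - sklearn| on test:", np.abs(X_test_manual - X_test_sklearn).max())
print("train column means near 0:", np.abs(X_train_manual.mean(axis=0)).max())
print("train column stds near 1:", np.abs(X_train_manual.std(axis=0) - 1).max())
```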

In [6]:
# Torch tensors
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X_train_t = torch.tensor(X_train, device=device)
y_train_t = torch.tensor(y_train, device=device)
X_test_t  = torch.tensor(X_test, device=device)
y_test_t  = torch.tensor(y_test, device=device)

## Model

In [7]:
model = nn.Linear(X_train_t.shape[1], 1).to(device)  # logits
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
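`nn.Linear` outputs raw logits, and `nn.BCEWithLogitsLoss` applies the sigmoid inside the loss: for logits `z` and labels `y` it averages `-[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]`, computed in a numerically stable way. The sketch below checks that against a hand-written version and shows where `sigmoid` followed by `nn.BCELoss` breaks down. The random logits and the `z = 30` case are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 1) * 3                 # illustrative raw scores, like nn.Linear(..., 1) output
targets = torch.randint(0, 2, (8, 1)).float()  # 0/1 labels

# Built-in: sigmoid and binary cross-entropy fused, averaged over the batch
builtin = nn.BCEWithLogitsLoss()(logits, targets)

# Hand-written equivalent; log(1 - sigmoid(z)) == logsigmoid(-z), which avoids log(0)
manual = -(targets * F.logsigmoid(logits) + (1 - targets) * F.logsigmoid(-logits)).mean()

# Two-step version: fine for moderate logits like these
naive = nn.BCELoss()(torch.sigmoid(logits), targets)
print(builtin.item(), manual.item(), naive.item())

# With a large logit and label 0, sigmoid rounds to exactly 1.0 in float32,
# so BCELoss hits log(0) and falls back to its clamp at -100
z_big, y_zero = torch.tensor([[30.0]]), torch.tensor([[0.0]])
stable_big = nn.BCEWithLogitsLoss()(z_big, y_zero).item()     # close to the true loss of about 30
naive_big = nn.BCELoss()(torch.sigmoid(z_big), y_zero).item()  # clamped value instead of ~30
print(stable_big, naive_big)
```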

## Train

In [8]:
epochs = 2000
for epoch in range(epochs):
    logits = model(X_train_t)
    loss = criterion(logits, y_train_t)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if epoch % 200 == 0:
        print(f"Epoch {epoch} | Loss {loss.item():.4f}")

Epoch 0 | Loss 0.8492
Epoch 200 | Loss 0.0687
Epoch 400 | Loss 0.0546
Epoch 600 | Loss 0.0484
Epoch 800 | Loss 0.0442
Epoch 1000 | Loss 0.0410
Epoch 1200 | Loss 0.0383
Epoch 1400 | Loss 0.0359
Epoch 1600 | Loss 0.0338
Epoch 1800 | Loss 0.0319
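The loop is full-batch: every step feeds all training rows, so each `loss.backward()` produces the exact gradient of the mean BCE. For logistic regression that gradient has a closed form, `dL/dw = X^T (sigmoid(Xw + b) - y) / n` and `dL/db = mean(sigmoid(Xw + b) - y)`. The sketch below checks autograd against it on random data; the sizes and data are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 64, 5
X = torch.randn(n, d)
y = torch.randint(0, 2, (n, 1)).float()

model = nn.Linear(d, 1)
loss = nn.BCEWithLogitsLoss()(model(X), y)  # mean over the n rows
loss.backward()                             # autograd fills model.weight.grad and model.bias.grad

# Closed-form gradient of the mean BCE for logits z = Xw + b
with torch.no_grad():
    residual = torch.sigmoid(model(X)) - y  # shape (n, 1)
    grad_w = (X.T @ residual / n).T         # shape (1, d), matches model.weight
    grad_b = residual.mean(dim=0)           # shape (1,), matches model.bias

print(torch.allclose(model.weight.grad, grad_w, atol=1e-5))
print(torch.allclose(model.bias.grad, grad_b, atol=1e-5))
```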


## Evaluate

In [9]:
with torch.no_grad():
    logits_test = model(X_test_t)
    probs_test = torch.sigmoid(logits_test).cpu().numpy()
    preds_test = (probs_test >= 0.5).astype(int)

acc = accuracy_score(y_test.astype(int), preds_test)
auc = roc_auc_score(y_test, probs_test)

print("Accuracy:", acc)
print("ROC–AUC:", auc)

Accuracy: 0.9385964912280702
ROC–AUC: 0.9882502381708478
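Accuracy depends on the 0.5 threshold, while ROC-AUC only depends on how the probabilities rank the rows: it equals the probability that a randomly chosen positive (label 1, benign here) scores higher than a randomly chosen negative, with ties counted as one half. `pairwise_auc` is a hypothetical helper that computes this directly; the labels and scores are synthetic and rounded so that some scores tie.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, scores):
    """Hypothetical helper: P(random positive scores higher than random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores).ravel()
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # one entry per (positive, negative) pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Synthetic labels and scores; rounding to one decimal creates tied scores
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
scores_demo = np.clip(np.round(0.3 * y_demo + rng.normal(0.35, 0.2, size=200), 1), 0.0, 1.0)

auc_pairs = pairwise_auc(y_demo, scores_demo)
auc_sklearn = roc_auc_score(y_demo, scores_demo)

# Label 1 is "benign" in this notebook; ranking malignant (label 0) by 1 - p gives the same AUC
auc_flipped = pairwise_auc(1 - y_demo, 1 - scores_demo)
print(auc_pairs, auc_sklearn, auc_flipped)
```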


## Small MLP

In [10]:
mlp = nn.Sequential(
    nn.Linear(X_train_t.shape[1], 32),
    nn.ReLU(),
    nn.Linear(32, 1)
).to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.005)

for epoch in range(2000):
    logits = mlp(X_train_t)
    loss = criterion(logits, y_train_t)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

with torch.no_grad():
    probs = torch.sigmoid(mlp(X_test_t)).cpu().numpy()
    preds = (probs >= 0.5).astype(int)

print("MLP Accuracy:", accuracy_score(y_test.astype(int), preds))
print("MLP ROC–AUC:", roc_auc_score(y_test, probs))

MLP Accuracy: 0.9649122807017544
MLP ROC–AUC: 0.9901556049539536
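The logistic-regression and MLP cells repeat the same training loop. One way to factor it out is a helper that takes any model and learning rate; `train_full_batch` and `predict_proba` are hypothetical names, and the sketch runs on CPU only, so its numbers may differ slightly from the outputs above.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score

def train_full_batch(model, X, y, lr, epochs=2000):
    """Hypothetical helper: the full-batch Adam + BCEWithLogitsLoss loop used in the cells above."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()  # loss of the last step, measured before its update

@torch.no_grad()
def predict_proba(model, X):
    """Hypothetical helper: sigmoid of the logits as a NumPy column vector."""
    model.eval()
    return torch.sigmoid(model(X)).cpu().numpy()

# Same recipe as the notebook: seeded permutation, 80/20 split, train-only z-score
np.random.seed(42)
torch.manual_seed(42)
data = load_breast_cancer()
X = data.data.astype(np.float32)
y = data.target.astype(np.float32).reshape(-1, 1)
idx = np.random.permutation(len(X))
split = int(0.8 * len(X))
tr, te = idx[:split], idx[split:]
mu = X[tr].mean(axis=0, keepdims=True)
sigma = X[tr].std(axis=0, keepdims=True) + 1e-12
X_tr = torch.tensor((X[tr] - mu) / sigma)
X_te = torch.tensor((X[te] - mu) / sigma)
y_tr = torch.tensor(y[tr])

d = X_tr.shape[1]
candidates = {
    "logreg": (nn.Linear(d, 1), 0.01),
    "mlp": (nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1)), 0.005),
}
results = {}
for name, (net, lr) in candidates.items():
    final_loss = train_full_batch(net, X_tr, y_tr, lr=lr)
    results[name] = roc_auc_score(y[te].ravel(), predict_proba(net, X_te).ravel())
    print(f"{name}: last train loss {final_loss:.4f} | test ROC-AUC {results[name]:.4f}")
```

Like the cells above, this compares both models on the same held-out 20%; if that comparison is used to pick a model, a separate validation split would give a less optimistic estimate.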
