### **State University of Campinas - UNICAMP** <br>
**Course**: MC886A <br>
**Professor**: Marcelo da Silva Reis <br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Model Selection**
##### Notebook: 01 Model Selection

> Dataset from scikit-learn: [load_breast_cancer](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html), based on [Breast Cancer Wisconsin (Diagnostic)](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic) (1993) [1]
---

**This notebook covers the following topics:**

- **Model Selection and Regularization:** Using subset selection (RFE), Ridge (L2) and Lasso (L1) regression.
- **Advanced Model Selection:** Applying regularization with PyTorch for logistic regression, and a demonstration with k-Nearest Neighbors and Random Forest.

Throughout the notebook we illustrate the methods with code cells and interactive Plotly graphs of test accuracy and sparsity as functions of the regularization strength.

Based on the Jurafsky & Martin (2025) lectures [2]

---


In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd

# Plotly (instead of Matplotlib) for interactive plotting
import plotly.graph_objects as go
import plotly.express as px

from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.model_selection import train_test_split, KFold, LeaveOneOut, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

import warnings
warnings.filterwarnings('ignore')

# Set seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)


<torch._C.Generator at 0x7a86b57343d0>

#### **Basic exploration of the dataset**

In [2]:
# Let's load the Breast Cancer Dataset from Scikit-Learn
cancer_dataset = load_breast_cancer()

In [3]:
# Keys in dataset
cancer_dataset.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [4]:
# Target labels: 0 = malignant, 1 = benign
cancer_dataset['target']

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [5]:
# Class names for target values 0 and 1
cancer_dataset['target_names']

array(['malignant', 'benign'], dtype='<U9')

In [6]:
# Description of data
print(cancer_dataset['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.

    - 

In [7]:
# Name of features
print(cancer_dataset['feature_names'])

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
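
The 30 column names follow the layout given in the dataset description: three blocks of ten (mean, standard error, worst), with the ten base measurements in the same order inside each block. A minimal sketch of that index arithmetic, where `BASE` and `feature_name` are hypothetical names introduced only for this illustration:

```python
# The 10 base measurements, in the order listed in the dataset description.
BASE = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]


def feature_name(index):
    """Rebuild the column name for a feature index in 0..29."""
    block, base = divmod(index, len(BASE))
    if block == 0:
        return f"mean {BASE[base]}"
    if block == 1:
        return f"{BASE[base]} error"
    if block == 2:
        return f"worst {BASE[base]}"
    raise IndexError(f"feature index out of range: {index}")


# Fields 0, 10 and 20 are the three statistics of the radius.
for i in (0, 10, 20):
    print(i, feature_name(i))
```

The same helper can translate any list of column indices, such as the output of a feature selector, back into readable names.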


In [8]:
# Create DataFrame
cancer_df = pd.DataFrame(np.c_[cancer_dataset['data'],cancer_dataset['target']],
             columns = np.append(cancer_dataset['feature_names'], ['target']))

In [9]:
# Head of cancer DataFrame
cancer_df.head(6)

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0.0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0.0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0.0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0.0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0.0
5,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,0.07613,...,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244,0.0


In [10]:
# Tail of cancer DataFrame
cancer_df.tail(6)

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
563,20.92,25.09,143.0,1347.0,0.1099,0.2236,0.3174,0.1474,0.2149,0.06879,...,29.41,179.1,1819.0,0.1407,0.4186,0.6599,0.2542,0.2929,0.09873,0.0
564,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,0.1726,0.05623,...,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115,0.0
565,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,0.1752,0.05533,...,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637,0.0
566,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,0.159,0.05648,...,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782,0.0
567,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,0.2397,0.07016,...,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124,0.0
568,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,0.05884,...,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039,1.0


In [11]:
# Column dtypes and non-null counts of the cancer DataFrame
cancer_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [12]:
# Summary statistics for each column
cancer_df.describe()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946,0.627417
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061,0.483918
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504,0.0
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146,0.0
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004,1.0
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208,1.0
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075,1.0
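
Because the target is coded 0/1 and the class names are ordered `['malignant', 'benign']`, the mean of the `target` column is the fraction of benign samples. A small sketch, using only the numbers printed in the summary above, recovers the class counts and the accuracy of a classifier that always predicts the majority class (variable names are illustrative):

```python
n_samples = 569          # `count` row of the summary
target_mean = 0.627417   # `mean` row of the `target` column

# Label 1 is benign, so the mean of a 0/1 column is the benign fraction.
n_benign = round(target_mean * n_samples)
n_malignant = n_samples - n_benign

# Accuracy obtained by always predicting the larger class.
majority_baseline = max(n_benign, n_malignant) / n_samples

print("benign:", n_benign)
print("malignant:", n_malignant)
print("majority-class baseline accuracy:", round(majority_baseline, 4))
```

Any accuracy reported later in the notebook is best read against this baseline rather than against 50%.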


---

### **Helper Function**

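`evaluate_classifier` is borrowed from Notebook 00 (Logistic Regression and Classification and Resampling Methods). It prints the accuracy, the confusion matrix, and the classification report for a set of predictions.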


In [13]:
def evaluate_classifier(y_true, y_pred):
    """Print evaluation metrics for a classifier."""
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("\nConfusion Matrix:")
    print(confusion_matrix(y_true, y_pred))
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred))


### **Part 1: Model Selection and Regularization I**

In this part we explore:

- **Subset Selection:** using Recursive Feature Elimination (RFE)
- **Ridge Regression (L2 Regularization) and Lasso Regression (L1 Regularization)**
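
The recursive elimination idea behind RFE can be sketched in a few lines: fit a model, drop the feature with the smallest coefficient magnitude, and repeat until the desired number of features remains. The sketch below is an illustration under stated assumptions, not scikit-learn's implementation: it uses a least-squares linear model on synthetic data instead of LDA, the name `rfe_least_squares` is hypothetical, and it standardizes the remaining features before each fit so that coefficient magnitudes are comparable (scikit-learn's `RFE` does no scaling by itself).

```python
import numpy as np


def rfe_least_squares(X, y, n_keep):
    """Repeatedly drop the feature with the smallest |coefficient|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        Xr = X[:, remaining]
        # Standardize so that coefficients are on a comparable scale.
        Xr = (Xr - Xr.mean(axis=0)) / Xr.std(axis=0)
        coef, *_ = np.linalg.lstsq(Xr, y - y.mean(), rcond=None)
        weakest = int(np.argmin(np.abs(coef)))  # position inside `remaining`
        remaining.pop(weakest)
    return remaining


# Synthetic data: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 1] - 2.0 * X_demo[:, 4] + 0.1 * rng.normal(size=200)

selected = rfe_least_squares(X_demo, y_demo, n_keep=2)
print("Selected feature indices:", sorted(selected))
```

Because the ranking relies on coefficient magnitudes, features measured on very different scales can be ranked by their units rather than by their relevance, which is why the sketch standardizes first.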

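Subset selection controls overfitting by discarding features, while Ridge and Lasso do so by penalizing large weights. Since the task here is classification, the Ridge and Lasso models below apply the L2 and L1 penalties to a logistic regression (sigmoid output trained with binary cross-entropy).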
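
As a reference for the code in the next cell, the two penalized objectives can be written out explicitly. This is a transcription of what the cell computes, under the assumption that `nn.BCELoss` is used with its default mean reduction; note that the penalty there runs over all parameters, so the bias $b$ is penalized together with the weights $\mathbf{w}$.

$$
\mathcal{L}_{\text{BCE}}(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big],
\qquad \hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)
$$

$$
J_{\text{L2}}(\mathbf{w}, b) = \mathcal{L}_{\text{BCE}}(\mathbf{w}, b) + \lambda\Big(\lVert \mathbf{w} \rVert_2^2 + b^2\Big),
\qquad
J_{\text{L1}}(\mathbf{w}, b) = \mathcal{L}_{\text{BCE}}(\mathbf{w}, b) + \lambda\Big(\lVert \mathbf{w} \rVert_1 + |b|\Big)
$$

With $\lambda = 0$ both objectives reduce to plain logistic regression; larger $\lambda$ trades training fit for smaller parameters.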


In [14]:
# Load the Breast Cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Full dataset shape:", X.shape)

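# Subset Selection using Recursive Feature Elimination (RFE)
# Note: RFE ranks features by the magnitude of the LDA coefficients, which depends on
# each feature's scale; the features are not standardized at this point.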
lda = LinearDiscriminantAnalysis()
rfe = RFE(estimator=lda, n_features_to_select=5)
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)
print("Selected features (indices):", np.where(rfe.support_)[0])
lda.fit(X_train_rfe, y_train)
y_pred_rfe = lda.predict(X_test_rfe)
print("\nSubset Selection (RFE) Evaluation:")
evaluate_classifier(y_test, y_pred_rfe)

# Ridge Regression (L2 Regularization)
class RidgeRegression(nn.Module):
    def __init__(self, input_dim, lambda_reg=0.1):
        super(RidgeRegression, self).__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.lambda_reg = lambda_reg

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

    def ridge_penalty(self):
        # L2 penalty: lambda * sum of squared parameters (weights and bias)
        return self.lambda_reg * sum(torch.sum(param ** 2) for param in self.parameters())

# Standardize features for Ridge regression
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.reshape(-1, 1))
X_test_tensor = torch.FloatTensor(X_test_scaled)

lambda_values = [0.0, 0.01, 0.1, 1.0]
ridge_accuracies = []

for l in lambda_values:
    model_ridge = RidgeRegression(X_train_scaled.shape[1], lambda_reg=l)
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model_ridge.parameters(), lr=0.01)
    epochs = 1000
    for epoch in range(epochs):
        outputs = model_ridge(X_train_tensor)
        loss = criterion(outputs, y_train_tensor) + model_ridge.ridge_penalty()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model_ridge.eval()
    with torch.no_grad():
        y_pred_probs = model_ridge(X_test_tensor)
        y_pred = (y_pred_probs > 0.5).float().numpy().flatten()
        acc = accuracy_score(y_test, y_pred)
        ridge_accuracies.append(acc)
    print(f"Ridge: Lambda = {l}, Test Accuracy = {acc:.4f}")

# Plot Ridge results using Plotly
# Note: the λ = 0 point cannot be drawn on the log-scaled x-axis
fig = go.Figure()
fig.add_trace(go.Scatter(x=lambda_values, y=ridge_accuracies, mode='lines+markers'))
fig.update_layout(
    title="Effect of Ridge Regularization",
    xaxis=dict(title="Regularization Strength (λ)", type="log"),
    yaxis_title="Accuracy"
)
fig.show()

# Lasso Regression (L1 Regularization)
class LassoRegression(nn.Module):
    def __init__(self, input_dim, lambda_reg=0.1):
        super(LassoRegression, self).__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.lambda_reg = lambda_reg

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

    def lasso_penalty(self):
        # L1 penalty: lambda * sum of absolute parameters (weights and bias)
        return self.lambda_reg * sum(torch.sum(torch.abs(param)) for param in self.parameters())

lambda_values = [0.0, 0.01, 0.1, 1.0]
lasso_accuracies = []
nonzero_coeffs = []

for l in lambda_values:
    model_lasso = LassoRegression(X_train_scaled.shape[1], lambda_reg=l)
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model_lasso.parameters(), lr=0.01)
    epochs = 1000
    for epoch in range(epochs):
        outputs = model_lasso(X_train_tensor)
        loss = criterion(outputs, y_train_tensor) + model_lasso.lasso_penalty()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model_lasso.eval()
    with torch.no_grad():
        y_pred_probs = model_lasso(X_test_tensor)
        y_pred = (y_pred_probs > 0.5).float().numpy().flatten()
        acc = accuracy_score(y_test, y_pred)
        lasso_accuracies.append(acc)
        # Count weights with magnitude above 0.01 (SGD on an L1 penalty rarely lands exactly on zero)
        weight = model_lasso.linear.weight.data.numpy().flatten()
        nonzeros = np.sum(np.abs(weight) > 0.01)
        nonzero_coeffs.append(nonzeros)
    print(f"Lasso: Lambda = {l}, Test Accuracy = {acc:.4f}, Non-zero Coeffs = {nonzeros}/{len(weight)}")

# Plot Lasso results using Plotly (accuracy and sparsity)
# Note: the λ = 0 points cannot be drawn on the log-scaled x-axes
fig = go.Figure()
fig.add_trace(go.Scatter(x=lambda_values, y=lasso_accuracies, mode='lines+markers', name="Accuracy"))
fig.update_layout(
    title="Effect of Lasso Regularization on Accuracy",
    xaxis=dict(title="Regularization Strength (λ)", type="log"),
    yaxis_title="Accuracy"
)
fig.show()

fig = go.Figure()
fig.add_trace(go.Scatter(x=lambda_values, y=nonzero_coeffs, mode='lines+markers', name="Non-zero Coefficients"))
fig.update_layout(
    title="Lasso Regularization: Sparsity",
    xaxis=dict(title="Regularization Strength (λ)", type="log"),
    yaxis_title="Number of Non-zero Coefficients"
)
fig.show()


Full dataset shape: (569, 30)
Selected features (indices): [ 7  9 14 19 29]

Subset Selection (RFE) Evaluation:
Accuracy: 0.935672514619883

Confusion Matrix:
[[ 56   7]
 [  4 104]]

Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.89      0.91        63
           1       0.94      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.93       171
weighted avg       0.94      0.94      0.94       171

Ridge: Lambda = 0.0, Test Accuracy = 0.9883
Ridge: Lambda = 0.01, Test Accuracy = 0.9883
Ridge: Lambda = 0.1, Test Accuracy = 0.9825
Ridge: Lambda = 1.0, Test Accuracy = 0.9474


Lasso: Lambda = 0.0, Test Accuracy = 0.9883, Non-zero Coeffs = 30/30
Lasso: Lambda = 0.01, Test Accuracy = 0.9883, Non-zero Coeffs = 22/30
Lasso: Lambda = 0.1, Test Accuracy = 0.9591, Non-zero Coeffs = 11/30
Lasso: Lambda = 1.0, Test Accuracy = 0.6959, Non-zero Coeffs = 3/30
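
The drop in non-zero coefficients under Lasso, and the use of a `0.01` threshold to count them, can be illustrated on a toy problem. The sketch below is an assumption-laden simplification: it replaces the logistic loss with a one-dimensional quadratic, `0.5 * (w - a)**2`, applied to each weight separately, and the vector `a` of "unpenalized" weights is made up. For that stand-in the L1 minimizer is the soft-threshold operator, which yields exact zeros, while the L2 minimizer only rescales. Plain (sub)gradient descent on the L1 objective, which is what SGD on an absolute-value penalty amounts to, tends to oscillate around zero instead of reaching it.

```python
import numpy as np


def soft_threshold(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*|w|: zero whenever |a| <= lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def ridge_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*w**2: shrinks but keeps every weight."""
    return a / (1.0 + 2.0 * lam)


def subgradient_descent_l1(a, lam, lr=0.01, steps=1000):
    """Plain (sub)gradient descent on 0.5*(w - a)**2 + lam*|w|."""
    w = np.zeros_like(a)
    for _ in range(steps):
        grad = (w - a) + lam * np.sign(w)
        w = w - lr * grad
    return w


a = np.array([2.0, -0.5, 0.05, 1.2])  # made-up unpenalized weights
lam = 0.6

w_l1 = soft_threshold(a, lam)
w_l2 = ridge_shrink(a, lam)
w_sgd = subgradient_descent_l1(a, lam)

print("L1 exact minimizer :", w_l1, "non-zeros:", np.count_nonzero(w_l1))
print("L2 exact minimizer :", w_l2, "non-zeros:", np.count_nonzero(w_l2))
print("L1 via subgradient :", w_sgd)
print("|w| > 0.02 count   :", int(np.sum(np.abs(w_sgd) > 0.02)))
```

In this toy setting the subgradient iterates for the two small weights stay within roughly `lr * (lam + |a|)` of zero, so counting weights above a small magnitude threshold is a practical proxy for sparsity; a proximal update (soft-thresholding after each step) would be one way to obtain exact zeros.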


### **Part 2: Model Selection and Regularization II (PyTorch)**

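This section compares three classifiers, estimating their accuracy with 5-fold cross-validation on the training set and then measuring it on the held-out test set. Logistic regression is implemented in PyTorch with fixed L1 and L2 penalties, while the kNN and Random Forest hyperparameters are tuned with scikit-learn's `GridSearchCV`: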

- **Logistic Regression with Regularization**
- **k-Nearest Neighbors (kNN)**
- **Random Forest**


In [15]:
# Load and preprocess the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.reshape(-1, 1))
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.FloatTensor(y_test.reshape(-1, 1))

# Logistic Regression with Regularization in PyTorch
class LogisticRegressionWithReg(nn.Module):
    def __init__(self, input_dim, l1_lambda=0.0, l2_lambda=0.0):
        super(LogisticRegressionWithReg, self).__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.l1_lambda = l1_lambda
        self.l2_lambda = l2_lambda

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

    def regularization_loss(self):
        l1 = self.l1_lambda * sum(torch.sum(torch.abs(param)) for param in self.parameters())
        l2 = self.l2_lambda * sum(torch.sum(param ** 2) for param in self.parameters())
        return l1 + l2

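# Cross-validation function for PyTorch models
def cross_validate_model(X, y, model_class, model_params, cv=5, epochs=500, batch_size=32, lr=0.01):
    """Train a fresh model on each of `cv` shuffled folds; return the mean and std of validation accuracy."""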
    kf = KFold(n_splits=cv, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in kf.split(X):
        X_train_fold, X_val_fold = X[train_idx], X[val_idx]
        y_train_fold, y_val_fold = y[train_idx], y[val_idx]
        X_train_tensor = torch.FloatTensor(X_train_fold)
        y_train_tensor = torch.FloatTensor(y_train_fold).reshape(-1, 1)
        X_val_tensor = torch.FloatTensor(X_val_fold)

        train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
        loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

        model = model_class(**model_params)
        criterion = nn.BCELoss()
        optimizer = optim.SGD(model.parameters(), lr=lr)

        for epoch in range(epochs):
            for inputs, labels in loader:
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                if hasattr(model, 'regularization_loss'):
                    loss += model.regularization_loss()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        model.eval()
        with torch.no_grad():
            outputs = model(X_val_tensor)
            y_pred = (outputs > 0.5).float().numpy().flatten()
            acc = accuracy_score(y_val_fold, y_pred)
            scores.append(acc)
    return np.mean(scores), np.std(scores)

input_dim = X_train_scaled.shape[1]
model_params = {'input_dim': input_dim, 'l1_lambda': 0.01, 'l2_lambda': 0.01}
mean_acc, std_acc = cross_validate_model(X_train_scaled, y_train, LogisticRegressionWithReg, model_params, cv=5, epochs=500)
print(f"Logistic Regression with Regularization: {mean_acc:.4f} ± {std_acc:.4f}")

# Train final logistic regression model with regularization on full training set
best_model = LogisticRegressionWithReg(input_dim, l1_lambda=0.01, l2_lambda=0.01)
criterion = nn.BCELoss()
optimizer = optim.SGD(best_model.parameters(), lr=0.01)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
for epoch in range(500):
    for inputs, labels in loader:
        outputs = best_model(inputs)
        loss = criterion(outputs, labels) + best_model.regularization_loss()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

best_model.eval()
with torch.no_grad():
    outputs = best_model(X_test_tensor)
    y_pred_logreg = (outputs > 0.5).float().numpy().flatten()
    logreg_test_acc = accuracy_score(y_test, y_pred_logreg)
print(f"Final Logistic Regression with Reg - Test Accuracy: {logreg_test_acc:.4f}")

# k-Nearest Neighbors (kNN)
param_grid = {'n_neighbors': [3, 5, 7, 9]}
grid_search_knn = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search_knn.fit(X_train_scaled, y_train)
print(f"Best k for kNN: {grid_search_knn.best_params_['n_neighbors']}, CV Accuracy: {grid_search_knn.best_score_:.4f}")
best_knn = KNeighborsClassifier(n_neighbors=grid_search_knn.best_params_['n_neighbors'])
best_knn.fit(X_train_scaled, y_train)
y_pred_knn = best_knn.predict(X_test_scaled)
knn_test_acc = accuracy_score(y_test, y_pred_knn)
print(f"kNN Test Accuracy: {knn_test_acc:.4f}")

# Random Forest
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30]
}
grid_search_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid_search_rf.fit(X_train_scaled, y_train)
print(f"Best Params for Random Forest: {grid_search_rf.best_params_}, CV Accuracy: {grid_search_rf.best_score_:.4f}")
best_rf = RandomForestClassifier(**grid_search_rf.best_params_, random_state=42)
best_rf.fit(X_train_scaled, y_train)
y_pred_rf = best_rf.predict(X_test_scaled)
rf_test_acc = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Test Accuracy: {rf_test_acc:.4f}")

print("\nFinal Model Comparison on Test Set:")
print(f"Logistic Regression with Reg: {logreg_test_acc:.4f}")
print(f"kNN: {knn_test_acc:.4f}")
print(f"Random Forest: {rf_test_acc:.4f}")


Logistic Regression with Regularization: 0.9724 ± 0.0145
Final Logistic Regression with Reg - Test Accuracy: 0.9883
Best k for kNN: 3, CV Accuracy: 0.9598
kNN Test Accuracy: 0.9591
Best Params for Random Forest: {'max_depth': None, 'n_estimators': 200}, CV Accuracy: 0.9522
Random Forest Test Accuracy: 0.9708

Final Model Comparison on Test Set:
Logistic Regression with Reg: 0.9883
kNN: 0.9591
Random Forest: 0.9708
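
When reading this comparison, it helps to translate the accuracies back into counts: the test set holds 171 samples, so each sample is worth about 0.6 percentage points. The sketch below uses only the accuracies printed above. The standard error comes from a simple binomial approximation that assumes independent test samples, so it is a rough guide rather than a formal test.

```python
import math

n_test = 171  # size of the held-out test set (30% of 569 samples)

# Test accuracies as printed in the final comparison.
reported = {
    "Logistic Regression with Reg": 0.9883,
    "kNN": 0.9591,
    "Random Forest": 0.9708,
}

summary = {}
for name, acc in reported.items():
    n_correct = round(acc * n_test)
    # Binomial standard error of an accuracy estimated from n_test samples.
    std_err = math.sqrt(acc * (1.0 - acc) / n_test)
    summary[name] = (n_correct, std_err)
    print(f"{name}: {n_correct}/{n_test} correct, standard error ~ {std_err:.4f}")
```

The gaps between the models amount to a handful of test samples, which is comparable in size to the standard errors, so the ranking should be interpreted with some caution.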


---

## **REFERENCES**

[1] Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.

[2] Jurafsky, D., & Martin, J. H. (2025). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed.), Ch. 5: Logistic Regression. Online manuscript released January 12, 2025. https://web.stanford.edu/~jurafsky/slp3.